WIP: multivariate statistics / proof of concept
Hi,
attached is a WIP patch implementing multivariate statistics. The code
is certainly not "ready" - parts of it look as if written by a rogue
chimp who got bored of attempting to type the complete works of William
Shakespeare and decided to try something different.
I also cut some corners to make it work, and those limitations need to
be fixed before the eventual commit (they are not difficult problems,
but solving them was not necessary for a proof-of-concept patch).
However, the patch seems to be working well enough at this point to
gather some useful feedback. So here we go.
I expect to be busy over the next two weeks because of travel, so I
apologize in advance for somewhat delayed responses. If you happen to
attend pgconf.eu next week (Oct 20-24), we can of course discuss this
patch in person.
Goals and basics
----------------
The goal of this patch is to allow users to define multivariate
statistics (i.e. statistics on multiple columns), and to improve
estimation when the columns are correlated.
Take for example a table like this:
CREATE TABLE test (a INT, b INT, c INT);
INSERT INTO test SELECT i/10000, i/10000, i/10000
FROM generate_series(1,1000000) s(i);
ANALYZE test;
and do a query like this:
SELECT * FROM test WHERE (a = 10) AND (b = 10) AND (c = 10);
which is estimated like this:
QUERY PLAN
---------------------------------------------------------
Seq Scan on test (cost=0.00..22906.00 rows=1 width=12)
Filter: ((a = 10) AND (b = 10) AND (c = 10))
Planning time: 0.142 ms
(3 rows)
The query of course returns 10,000 rows, but the planner assumes the
columns are independent and thus multiplies the selectivities. A
selectivity of 1/100 per column means 1/1,000,000 in total, which is
a single row.
This example is of course somewhat artificial, but the problem is far
from uncommon, especially in denormalized datasets (e.g. star schemas).
If you have ever gotten an index scan instead of a sequential scan
because of a poor estimate, resulting in a query running for hours
instead of seconds, you know the pain.
The patch allows you to do this:
ALTER TABLE test ADD STATISTICS ON (a, b, c);
ANALYZE test;
which then results in this estimate:
QUERY PLAN
------------------------------------------------------------
Seq Scan on test (cost=0.00..22906.00 rows=9667 width=12)
Filter: ((a = 10) AND (b = 10) AND (c = 10))
Planning time: 0.110 ms
(3 rows)
This however is not free - both building such statistics (during
ANALYZE) and using them (during planning) costs some cycles. Even if
we optimize the hell out of it, it won't be entirely free.
One of the design goals of this patch is therefore not to make ANALYZE
or planning any more expensive unless you actually add such statistics.
Those who add them have presumably decided that the price is worth the
improved estimates and the lower risk of inefficient plans. If planning
takes a few more milliseconds, that's probably a good deal when the
alternative is the risk of queries running for minutes or hours because
of misestimates.
It also does not guarantee that the estimates will always be better.
There will be misestimates, although more likely in the other direction
(the independence assumption usually leads to underestimates, while
this may lead to overestimates). However, based on my experience from
writing the patch, I believe it's possible to reasonably limit the
extent of such errors (just as with single-column histograms, the error
is related to the bucket size - the smaller the buckets, the less
damage a single bucket can do).
Of course, there will be cases where the old approach gets lucky by
accident - there's not much we can do to beat luck. But we can't rely
on it either.
Design overview
---------------
The patch adds a new system catalog, called pg_mv_statistic, which is
used to keep track of requested statistics. There's also a pg_mv_stats
view, showing some basic info about the stats (not all the data).
There are three kinds of statistics:
- list of most common combinations of values (MCV list)
- multi-dimensional histogram
- associative rules
The first two are extensions of the single-column stats we already
have. The MCV list is a trivial extension to multiple dimensions,
simply tracking combinations of values and their frequencies. The
histogram is more complex - the structure itself is quite simple
(multi-dimensional rectangles), but there are many ways to build one.
Even the current naive and simple implementation seems to work quite
well, though.
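To illustrate how the stats may then be used during estimation, here
is a minimal sketch of estimating a conjunction of equality clauses
from the MCV list (illustrative only, not the actual code from the
patch - the EqClause struct and the function are made up for this
example, and the part of the data not covered by the MCV list would
have to fall back to the histogram or default estimates):

   /* hypothetical helper describing a "column = constant" clause */
   typedef struct EqClause
   {
       int    dim;    /* index of the column/dimension */
       Datum  value;  /* the constant */
   } EqClause;

   /* sum the frequencies of MCV items matching all equality clauses */
   static double
   mcv_selectivity(MCVList mcvlist, EqClause *clauses, int nclauses)
   {
       int    i, j;
       double sel = 0.0;

       for (i = 0; i < mcvlist->nitems; i++)
       {
           bool match = true;

           for (j = 0; j < nclauses; j++)
           {
               /* plain Datum comparison (pass-by-value types only) */
               if (mcvlist->items[i]->values[clauses[j].dim] != clauses[j].value)
               {
                   match = false;
                   break;
               }
           }

           if (match)
               sel += mcvlist->items[i]->frequency;
       }

       return sel;
   }

Roughly speaking, the histogram is used in a similar way, except that
it sums frequencies of the buckets overlapping the ranges implied by
the clauses.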
The last kind (associative rules) is an attempt to track "implications"
between columns. It is however an experiment that is not really used in
the patch yet, so I'll ignore it for now.
I'm not going to explain all the implementation details here - if you
want to learn more, the best way is to read the changes in these files
(ideally in this order):
src/include/utils/mvstats.h
src/backend/commands/analyze.c
src/backend/optimizer/path/clausesel.c
I tried to explain the ideas thoroughly in the comments, along with a
lot of TODO/FIXME items related to limitations, explained in the next
section.
Limitations
-----------
As I mentioned, the current patch has a number of practical limitations,
most importantly:
(a) only data types passed by value (no varlena types)
(b) only data types with a sort operator (needed to build the histogram)
(c) no support for NULL values
(d) no handling of DROP COLUMN, DROP TABLE and the like
(e) stats limited to 8 columns (max)
(f) the optimizer uses a single stats object per table
(g) limited list of compatible WHERE clauses
(h) incomplete ADD STATISTICS syntax
The first three limitations are really shortcuts taken to get a working
patch, and fixing them should not be difficult.
The limited number of columns is really just a sanity check. It's
possible to increase it, but I doubt stats on more columns will be
practical because of excessive size or poor accuracy.
A better approach is to support combining multiple stats defined on
various subsets of the columns. This is not implemented at the moment,
but it's certainly on the roadmap. Currently the "smallest" stats
covering the most columns are selected.
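To make the current selection a bit more concrete, it amounts to
something like this (a simplified sketch with illustrative names, not
the actual patch code):

   /*
    * Pick the stats matching the most columns referenced by the
    * clauses, breaking ties by preferring stats with fewer columns
    * in total (i.e. the "smallest" ones). Requires at least two
    * matching columns, otherwise single-column stats do fine.
    */
   static MVStats
   choose_mv_statistics(MVStats stats, int nstats, Bitmapset *clause_attnums)
   {
       int     i, j;
       MVStats choice = NULL;
       int     best_matched = 1;
       int     best_width = INT_MAX;

       for (i = 0; i < nstats; i++)
       {
           int matched = 0;
           int width = stats[i].stakeys->dim1;

           for (j = 0; j < width; j++)
               if (bms_is_member(stats[i].stakeys->values[j], clause_attnums))
                   matched++;

           if ((matched > best_matched) ||
               ((matched == best_matched) && (width < best_width)))
           {
               choice = &stats[i];
               best_matched = matched;
               best_width = width;
           }
       }

       return choice;
   }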
Regarding the compatible WHERE clauses, the patch currently handles
conditions of the form
   column OPERATOR constant
where OPERATOR is one of the comparison operators (=, <, >, <=, >=).
In the future it should be possible to add support for more conditions,
e.g. "column IS NULL" or "column OPERATOR column".
The last point is really just an unfinished implementation - the syntax
I propose is this:
   ALTER TABLE ... ADD STATISTICS (options) ON (columns)
where the options influence the MCV list and histogram sizes, etc. The
options are recognized and may give you an idea of what they might do,
but they're not really used at the moment (except for being stored in
the pg_mv_statistic catalog).
Examples
--------
Let's see a few examples of how to define the stats, and what difference
in estimates it makes:
CREATE TABLE test (a INT, b INT, c INT);
-- same value in all columns
INSERT INTO test SELECT mod(i,100), mod(i,100), mod(i,100)
FROM generate_series(1,1000000) s(i);
ANALYZE test;
=============== no multivariate stats ============================
SELECT * FROM test WHERE a = 10 AND b = 10;
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on test (cost=0.00..20406.00 rows=101 width=12)
(actual time=0.007..60.902 rows=10000 loops=1)
Filter: ((a = 10) AND (b = 10))
Rows Removed by Filter: 990000
Planning time: 0.119 ms
Execution time: 61.164 ms
(5 rows)
SELECT * FROM test WHERE a = 10 AND b = 10 AND c = 10;
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on test (cost=0.00..22906.00 rows=1 width=12)
(actual time=0.010..56.780 rows=10000 loops=1)
Filter: ((a = 10) AND (b = 10) AND (c = 10))
Rows Removed by Filter: 990000
Planning time: 0.061 ms
Execution time: 56.994 ms
(5 rows)
=============== with multivariate stats ===========================
ALTER TABLE test ADD STATISTICS ON (a, b, c);
ANALYZE test;
SELECT * FROM test WHERE a = 10 AND b = 10;
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on test (cost=0.00..20406.00 rows=10767 width=12)
(actual time=0.007..58.981 rows=10000 loops=1)
Filter: ((a = 10) AND (b = 10))
Rows Removed by Filter: 990000
Planning time: 0.114 ms
Execution time: 59.214 ms
(5 rows)
SELECT * FROM test WHERE a = 10 AND b = 10 AND c = 10;
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on test (cost=0.00..22906.00 rows=10767 width=12)
(actual time=0.008..61.838 rows=10000 loops=1)
Filter: ((a = 10) AND (b = 10) AND (c = 10))
Rows Removed by Filter: 990000
Planning time: 0.088 ms
Execution time: 62.057 ms
(5 rows)
OK, that was a rather significant improvement, but it was also a
trivial dataset. Let's see something more complicated - the following
table has correlated columns, with distributions skewed towards 0.
CREATE TABLE test (a INT, b INT, c INT);
INSERT INTO test SELECT r*MOD(i,50),
pow(r,2)*MOD(i,100),
pow(r,4)*MOD(i,500)
FROM (SELECT random() AS r, i
FROM generate_series(1,1000000) s(i)) foo;
ANALYZE test;
=============== no multivariate stats ============================
SELECT * FROM test WHERE a = 0 AND b = 0;
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on test (cost=0.00..20406.00 rows=9024 width=12)
(actual time=0.007..62.969 rows=49503 loops=1)
Filter: ((a = 0) AND (b = 0))
Rows Removed by Filter: 950497
Planning time: 0.057 ms
Execution time: 64.098 ms
(5 rows)
SELECT * FROM test WHERE a = 0 AND b = 0 AND c = 0;
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on test (cost=0.00..22906.00 rows=2126 width=12)
(actual time=0.008..63.862 rows=40770 loops=1)
Filter: ((a = 0) AND (b = 0) AND (c = 0))
Rows Removed by Filter: 959230
Planning time: 0.060 ms
Execution time: 64.794 ms
(5 rows)
=============== with multivariate stats ============================
ALTER TABLE test ADD STATISTICS ON (a, b, c);
ANALYZE test;
db=> SELECT * FROM pg_mv_stats;
schemaname | public
tablename | test
attnums | 1 2 3
mcvbytes | 25904
mcvinfo | nitems=809
histbytes | 568240
histinfo | nbuckets=13772
SELECT * FROM test WHERE a = 0 AND b = 0;
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on test (cost=0.00..20406.00 rows=47717 width=12)
(actual time=0.007..61.782 rows=49503 loops=1)
Filter: ((a = 0) AND (b = 0))
Rows Removed by Filter: 950497
Planning time: 3.181 ms
Execution time: 62.859 ms
(5 rows)
SELECT * FROM test WHERE a = 0 AND b = 0 AND c = 0;
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on test (cost=0.00..22906.00 rows=40567 width=12)
(actual time=0.009..66.685 rows=40770 loops=1)
Filter: ((a = 0) AND (b = 0) AND (c = 0))
Rows Removed by Filter: 959230
Planning time: 0.188 ms
Execution time: 67.593 ms
(5 rows)
regards
Tomas
Attachment: multivar-stats-v1.patch (text/x-diff)
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index b257b02..6e63afe 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -32,6 +32,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
+ pg_mv_statistic.h \
pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 9d9d239..68ec1aa 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -150,6 +150,18 @@ CREATE VIEW pg_indexes AS
LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
+CREATE VIEW pg_mv_stats AS
+ SELECT
+ N.nspname AS schemaname,
+ C.relname AS tablename,
+ S.stakeys AS attnums,
+ length(S.stamcv) AS mcvbytes,
+ pg_mv_stats_mvclist_info(S.stamcv) AS mcvinfo,
+ length(S.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo
+ FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
+ LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
+
CREATE VIEW pg_stats AS
SELECT
nspname AS schemaname,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index c09ca7e..df51805 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -27,6 +27,7 @@
#include "catalog/indexing.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "commands/dbcommands.h"
#include "commands/tablecmds.h"
@@ -54,7 +55,11 @@
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "utils/mvstats.h"
+#include "access/sysattr.h"
/* Data structure for Algorithm S from Knuth 3.4.2 */
typedef struct
@@ -111,6 +116,62 @@ static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
+/* multivariate statistics (histogram, MCV list, associative rules) */
+
+static void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats);
+static void update_mv_stats(Oid relid,
+ MVHistogram histogram, MCVList mcvlist);
+
+/* multivariate histograms */
+static MVHistogram build_mv_histogram(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ int attr_cnt, VacAttrStats **vacattrstats,
+ int numrows_total);
+static MVBucket create_initial_mv_bucket(int numrows, HeapTuple *rows,
+ int2vector *attrs, int natts,
+ VacAttrStats **vacattrstats);
+static MVBucket select_bucket_to_partition(int nbuckets, MVBucket * buckets);
+static MVBucket partition_bucket(MVBucket bucket, int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+static MVBucket copy_mv_bucket(MVBucket bucket, uint32 ndimensions);
+
+static void update_bucket_ndistinct(MVBucket bucket, int2vector *attrs,
+ VacAttrStats ** stats);
+static void update_dimension_ndistinct(MVBucket bucket, int dimension,
+ int2vector *attrs,
+ VacAttrStats ** stats,
+ bool update_boundaries);
+/* multivariate MCV list */
+static MCVList build_mv_mcvlist(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats,
+ int *numrows_filtered);
+
+/* multivariate associative rules */
+static void build_mv_associations(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+/* serialization */
+static bytea * serialize_mv_histogram(MVHistogram histogram);
+static bytea * serialize_mv_mcvlist(MCVList mcvlist);
+
+/* comparators, used when constructing multivariate stats */
+static int compare_scalars_simple(const void *a, const void *b, void *arg);
+static int compare_scalars_partition(const void *a, const void *b, void *arg);
+static int compare_scalars_memcmp(const void *a, const void *b, void *arg);
+static int compare_scalars_memcmp_2(const void *a, const void *b);
+
+static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+/* some debugging methods */
+#ifdef MVSTATS_DEBUG
+static void print_mv_histogram_info(MVHistogram histogram);
+#endif
+
+
/*
* analyze_rel() -- analyze one relation
*/
@@ -469,6 +530,13 @@ do_analyze_rel(Relation onerel, VacuumStmt *vacstmt,
* all analyzable columns. We use a lower bound of 100 rows to avoid
* possible overflow in Vitter's algorithm. (Note: that will also be the
* target in the corner case where there are no analyzable columns.)
+ *
+ * FIXME This sample sizing is mostly OK when computing stats for
+ * individual columns, but it's rather insufficient when computing
+ * multivariate stats (histograms, MCV lists, ...). For a small
+ * number of dimensions it works, but for complex stats it'd be
+ * nice to use a sample proportional to the table size (say,
+ * 0.5% - 1%) instead of a fixed size.
*/
targrows = 100;
for (i = 0; i < attr_cnt; i++)
@@ -571,6 +639,9 @@ do_analyze_rel(Relation onerel, VacuumStmt *vacstmt,
update_attstats(RelationGetRelid(Irel[ind]), false,
thisdata->attr_cnt, thisdata->vacattrstats);
}
+
+ /* Build multivariate stats (if there are any). */
+ build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
}
/*
@@ -2810,3 +2881,1979 @@ compare_mcvs(const void *a, const void *b)
return da - db;
}
+
+/*
+ * Compute requested multivariate stats, using the rows sampled for the
+ * plain (single-column) stats.
+ *
+ * This fetches a list of stats from pg_mv_statistic, computes the stats
+ * and serializes them back into the catalog (as bytea values).
+ */
+static void
+build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats)
+{
+ int i;
+ MVStats mvstats;
+ int nmvstats;
+
+ /*
+ * Fetch defined MV groups from pg_mv_statistic, and then compute
+ * the MV statistics (histograms for now).
+ *
+ * TODO move this to a separate method or something ...
+ */
+ mvstats = list_mv_stats(RelationGetRelid(onerel), &nmvstats, false);
+
+ for (i = 0; i < nmvstats; i++)
+ {
+ MCVList mcvlist = NULL;
+ MVHistogram histogram = NULL;
+ int numrows_filtered = 0;
+
+ /* int2 vector of attnums the stats should be computed on */
+ int2vector * attrs = mvstats[i].stakeys;
+
+ /* check allowed number of dimensions */
+ Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Analyze associations between pairs of columns.
+ *
+ * FIXME store the identified associations back to pg_mv_statistic
+ */
+ build_mv_associations(numrows, rows, attrs, natts, vacattrstats);
+
+ /* build the MCV list */
+ mcvlist = build_mv_mcvlist(numrows, rows, attrs, natts, vacattrstats, &numrows_filtered);
+
+ /*
+ * Build a multivariate histogram on the columns.
+ *
+ * FIXME remove the rows used to build the MCV from the histogram.
+ * Another option might be subtracting the MCV selectivities
+ * from the histogram, but I'm not sure whether that works
+ * accurately (maybe it introduces additional errors).
+ */
+ if (numrows_filtered > 0)
+ histogram = build_mv_histogram(numrows_filtered, rows, attrs, natts, vacattrstats, numrows);
+
+ /* store the histogram / MCV list in the catalog */
+ update_mv_stats(mvstats[i].mvoid, histogram, mcvlist);
+
+#ifdef MVSTATS_DEBUG
+ print_mv_histogram_info(histogram);
+#endif
+
+ }
+}
+
+/*
+ * Lookup the VacAttrStats info for the selected columns, with indexes
+ * matching the attrs vector (to make it easy to work with when
+ * computing multivariate stats).
+ */
+static VacAttrStats **
+lookup_var_attr_stats(int2vector *attrs, int natts, VacAttrStats **vacattrstats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ VacAttrStats **stats = (VacAttrStats**)palloc0(numattrs * sizeof(VacAttrStats*));
+
+ /* lookup VacAttrStats info for the requested columns (same attnum) */
+ for (i = 0; i < numattrs; i++)
+ {
+ stats[i] = NULL;
+ for (j = 0; j < natts; j++)
+ {
+ if (attrs->values[i] == vacattrstats[j]->tupattnum)
+ {
+ stats[i] = vacattrstats[j];
+ break;
+ }
+ }
+
+ /*
+ * Check that we found the info, that the attnum matches, that the
+ * requested 'lt' operator is available, and that the type is
+ * passed by value.
+ */
+ Assert(stats[i] != NULL);
+ Assert(stats[i]->tupattnum == attrs->values[i]);
+
+ /* FIXME This is a rather ugly way to check for 'ltopr' (which
+ * is defined for 'scalar' attributes).
+ */
+ Assert(stats[i]->compute_stats == compute_scalar_stats);
+
+ /* TODO remove the 'pass by value' requirement */
+ Assert(stats[i]->attrtype->typbyval);
+ }
+
+ return stats;
+}
+
+/*
+ * TODO Add ndistinct estimation, probably the one described in "Towards
+ * Estimation Error Guarantees for Distinct Values, PODS 2000,
+ * p. 268-279" (the ones called GEE, or maybe AE).
+ *
+ * TODO The "combined" ndistinct is more likely to scale with the number
+ * of rows (in the table), because a single column behaving this
+ * way is sufficient for such behavior.
+ */
+static MVBucket
+create_initial_mv_bucket(int numrows, HeapTuple *rows, int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ /* info for the interesting attributes only */
+ VacAttrStats **stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /* resulting bucket */
+ MVBucket bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ Assert(numrows > 0);
+ Assert(rows != NULL);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /* allocate the per-dimension arrays */
+ bucket->ndistincts = (uint32*)palloc0(numattrs * sizeof(uint32));
+ bucket->nullsonly = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ bucket->min_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+ bucket->max_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* lower/upper boundaries */
+ bucket->min = (Datum*)palloc0(numattrs * sizeof(Datum));
+ bucket->max = (Datum*)palloc0(numattrs * sizeof(Datum));
+
+ /*
+ * All the sample rows fall into the initial bucket.
+ *
+ * FIXME This is wrong (unless all columns are NOT NULL), because we
+ * skipped the NULL values.
+ */
+ bucket->numrows = numrows;
+ bucket->ntuples = numrows;
+ bucket->rows = rows;
+
+ /*
+ * Update the number of ndistinct combinations in the bucket (which
+ * we use when selecting bucket to partition), and then number of
+ * distinct values for each partition (which we use when choosing
+ * which dimension to split).
+ */
+ update_bucket_ndistinct(bucket, attrs, stats);
+
+ for (i = 0; i < numattrs; i++)
+ update_dimension_ndistinct(bucket, i, attrs, stats, true);
+
+ /*
+ * The initial bucket was not split at all, so we'll start with the
+ * first dimension in the next round (index = 0).
+ */
+ bucket->last_split_dimension = -1;
+
+ return bucket;
+}
+
+/*
+ * TODO Fix to handle arbitrarily-sized histograms (not just 2D ones)
+ * and call the right output procedures (for the particular type).
+ *
+ * TODO This should somehow fetch info about the data types, and use
+ * the appropriate output functions to print the boundary values.
+ * Right now this prints the 8B value as an integer.
+ *
+ * TODO Also, provide a special function for 2D histogram, printing
+ * a gnuplot script (with rectangles).
+ *
+ * TODO For string types (once supported) we can sort the strings first,
+ * assign them a sequence of integers and use the original values
+ * as labels.
+ */
+#ifdef MVSTATS_DEBUG
+static void
+print_mv_histogram_info(MVHistogram histogram)
+{
+ int i = 0;
+
+ elog(WARNING, "histogram nbuckets=%d", histogram->nbuckets);
+
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ MVBucket bucket = histogram->buckets[i];
+ elog(WARNING, " bucket %d : ndistinct=%f ntuples=%d min=[%ld, %ld], max=[%ld, %ld] distinct=[%d,%d]",
+ i, bucket->ndistinct, bucket->numrows,
+ bucket->min[0], bucket->min[1], bucket->max[0], bucket->max[1],
+ bucket->ndistincts[0], bucket->ndistincts[1]);
+ }
+}
+#endif
+
+/*
+ * A very simple partitioning selection criteria - choose the bucket
+ * with the highest number of distinct values.
+ *
+ * Returns either pointer to the bucket selected to be partitioned,
+ * or NULL if there are no buckets that may be split (i.e. all buckets
+ * contain a single distinct value).
+ *
+ * TODO Consider other partitioning criteria (v-optimal, maxdiff etc.).
+ *
+ * TODO Allowing the bucket to degenerate to a single combination of
+ * values makes it a rather strange MCV list. Maybe we should use
+ * a higher lower boundary, or maybe make the selection criteria
+ * more complex (e.g. consider number of rows in the bucket, etc.).
+ *
+ * That however is different from buckets 'degenerated' only for
+ * some dimensions (e.g. half of them), which is perfectly
+ * appropriate for statistics on a combination of low and high
+ * cardinality columns.
+ */
+static MVBucket
+select_bucket_to_partition(int nbuckets, MVBucket * buckets)
+{
+ int i;
+ int ndistinct = 1; /* if ndistinct=1, we can't split the bucket */
+ MVBucket bucket = NULL;
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ /* if the ndistinct count is higher, use this bucket */
+ if (buckets[i]->ndistinct > ndistinct) {
+ bucket = buckets[i];
+ ndistinct = buckets[i]->ndistinct;
+ }
+ }
+
+ /* may be NULL if there are no buckets with (ndistinct > 1) */
+ return bucket;
+}
+
+/*
+ * A simple bucket partitioning implementation - splits the dimensions in
+ * a round-robin manner (considering only those with ndistinct>1). That
+ * is, first dimension 0 is split, then 1, 2, ... until reaching the
+ * end of the attribute list, and then wrapping back to 0. Of course,
+ * dimensions with a single distinct value are skipped.
+ *
+ * This is essentially what Muralikrishna/DeWitt described in their SIGMOD
+ * article (M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For
+ * Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD
+ * Conference 1988: 28-36).
+ *
+ * There are multiple histogram options, centered around the partitioning
+ * criteria, specifying both how to choose a bucket and the dimension
+ * most in need of a split. For a nice summary and general overview, see
+ * "rK-Hist : an R-Tree based histogram for multi-dimensional selectivity
+ * estimation" thesis by J. A. Lopez, Concordia University, p.34-37 (and
+ * possibly p. 32-34 for explanation of the terms).
+ *
+ * This splits the bucket by tweaking the existing one, and returning the
+ * new bucket (essentially shrinking the existing one in-place and returning
+ * the other "half" as a new bucket). The caller is responsible for adding
+ * the new bucket into the list of buckets.
+ *
+ * TODO It requires care to prevent splitting only one dimension and not
+ * splitting another one at all (which might happen easily in case of
+ * strongly dependent columns - e.g. y=x).
+ *
+ * TODO Should probably consider statistics target for the columns (e.g. to
+ * split dimensions with higher statistics target more frequently).
+ */
+static MVBucket
+partition_bucket(MVBucket bucket, int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats)
+{
+ int i;
+ int dimension;
+ int numattrs = attrs->dim1;
+
+ Datum split_value;
+ MVBucket new_bucket;
+
+ /* needed for sort, when looking for the split value */
+ bool isNull;
+ int nvalues = 0;
+ StdAnalyzeData * mystats = NULL;
+ ScalarItem * values = (ScalarItem*)palloc0(bucket->numrows * sizeof(ScalarItem));
+ SortSupportData ssup;
+
+ /* looking for the split value */
+ int ndistinct = 1; /* number of distinct values below current value */
+ int nrows = 1; /* number of rows below current value */
+
+ /* needed when splitting the values */
+ HeapTuple * oldrows = bucket->rows;
+ int oldnrows = bucket->numrows;
+
+ /* info for the interesting attributes only */
+ VacAttrStats **stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /*
+ * We can't split buckets with a single distinct value (this also
+ * disqualifies NULL-only dimensions). Also, there have to be multiple
+ * sample rows (otherwise there couldn't be multiple distinct values).
+ */
+ Assert(bucket->ndistinct > 1);
+ Assert(bucket->numrows > 1);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Look for the next dimension to split, in a round robin manner.
+ * We'll use the first one with (ndistinct > 1).
+ *
+ * If we happen to wrap around, something clearly went wrong (we
+ * don't update last_split_dimension inside the loop, so that this
+ * check can actually detect the wrap-around).
+ */
+ dimension = bucket->last_split_dimension;
+ while (true)
+ {
+ dimension = (dimension + 1) % numattrs;
+
+ if (bucket->ndistincts[dimension] > 1)
+ break;
+
+ /* if we got back to the previous split dimension, we've wrapped around */
+ Assert(dimension != bucket->last_split_dimension);
+ }
+
+ /* Remember the dimension for the next split of this bucket. */
+ bucket->last_split_dimension = dimension;
+
+ /*
+ * Walk through the selected dimension, collect and sort the values
+ * and then choose the value to use as the new boundary.
+ */
+ mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (i = 0; i < bucket->numrows; i++)
+ {
+ /* remember the index of the sample row, to make the partitioning simpler */
+ values[nvalues].value = heap_getattr(bucket->rows[i], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+ values[nvalues].tupno = i;
+
+ /* no NULL values allowed here (we don't do splits by null-only dimensions) */
+ Assert(!isNull);
+
+ nvalues++;
+ }
+
+ /* sort the array (pass-by-value datums) */
+ qsort_arg((void *) values, nvalues, sizeof(ScalarItem),
+ compare_scalars_partition, (void *) &ssup);
+
+ /*
+ * We know there are bucket->ndistincts[dimension] distinct values
+ * in this dimension, and we want to split it in half, so walk
+ * through the array and stop once we see (ndistinct/2) values.
+ *
+ * We always choose the "next" value, i.e. (n/2+1)-th distinct value,
+ * and use it as an exclusive upper boundary (and inclusive lower
+ * boundary).
+ *
+ * TODO Maybe we should use "average" of the two middle distinct
+ * values (at least for even distinct counts), but that would
+ * require being able to do an average (which does not work
+ * for non-arithmetic types).
+ *
+ * TODO Another option is to look for a split that'd give about
+ * 50% tuples (not distinct values) in each partition. That
+ * might work better when there are a few very frequent
+ * values, and many rare ones.
+ */
+ split_value = values[0].value;
+ for (i = 1; i < bucket->numrows; i++)
+ {
+ /* count distinct values */
+ if (values[i].value != values[i-1].value)
+ ndistinct += 1;
+
+ /* once we've seen half of the distinct values, use this value */
+ if (ndistinct > bucket->ndistincts[dimension] / 2)
+ {
+ split_value = values[i].value;
+ break;
+ }
+
+ /* keep track of how many rows belong to the first bucket */
+ nrows += 1;
+ }
+
+ Assert(nrows > 0);
+ Assert(nrows < bucket->numrows);
+
+ /* create the new bucket as an (incomplete) copy of the one being partitioned */
+ new_bucket = copy_mv_bucket(bucket, numattrs);
+
+ /*
+ * Do the actual split of the chosen dimension, using the split value as the
+ * upper bound for the existing bucket, and lower bound for the new one.
+ */
+ bucket->max[dimension] = split_value;
+ new_bucket->min[dimension] = split_value;
+
+ bucket->max_inclusive[dimension] = false;
+ new_bucket->min_inclusive[dimension] = true;
+
+ /*
+ * Redistribute the sample tuples using the 'ScalarItem->tupno'
+ * index. We know 'nrows' rows should remain in the original
+ * bucket and the rest goes to the new one.
+ */
+
+ bucket->rows = (HeapTuple*)palloc0(nrows * sizeof(HeapTuple));
+ new_bucket->rows = (HeapTuple*)palloc0((oldnrows - nrows) * sizeof(HeapTuple));
+
+ bucket->numrows = nrows;
+ new_bucket->numrows = (oldnrows - nrows);
+
+ /*
+ * The first nrows should go to the first bucket, the rest should
+ * go to the new one. Use the tupno field to get the actual HeapTuple
+ * row from the original array of sample rows.
+ */
+ for (i = 0; i < nrows; i++)
+ memcpy(&bucket->rows[i], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ for (i = nrows; i < oldnrows; i++)
+ memcpy(&new_bucket->rows[i-nrows], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(new_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each partition.
+ */
+ for (i = 0; i < numattrs; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(new_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+ pfree(values);
+
+ return new_bucket;
+}
+
+/*
+ * Copy a histogram bucket. The copy does not include the build-time
+ * data, i.e. sampled rows etc.
+ */
+static MVBucket
+copy_mv_bucket(MVBucket bucket, uint32 ndimensions)
+{
+ MVBucket new_bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ /* Copy only the fields that stay the same after the split; the
+ * rest will be recomputed once the split is done. */
+
+ new_bucket->last_split_dimension = bucket->last_split_dimension;
+
+ /* allocate the per-dimension arrays */
+ new_bucket->ndistincts = (uint32*)palloc0(ndimensions * sizeof(uint32));
+ new_bucket->nullsonly = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ new_bucket->min_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+ new_bucket->max_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* lower/upper boundaries */
+ new_bucket->min = (Datum*)palloc0(ndimensions * sizeof(Datum));
+ new_bucket->max = (Datum*)palloc0(ndimensions * sizeof(Datum));
+
+ /* copy data */
+ memcpy(new_bucket->nullsonly, bucket->nullsonly, ndimensions * sizeof(bool));
+
+ memcpy(new_bucket->min_inclusive, bucket->min_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->min, bucket->min, ndimensions*sizeof(Datum));
+
+ memcpy(new_bucket->max_inclusive, bucket->max_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->max, bucket->max, ndimensions*sizeof(Datum));
+
+ return new_bucket;
+}
+
+/*
+ * Counts the number of distinct values in the bucket. This just copies
+ * the Datum values into a simple array, and sorts them using memcmp-based
+ * comparator. That means it only works for pass-by-value data types
+ * (assuming they don't use collations etc.)
+ *
+ * FIXME Make this work with all types (not just pass-by-value ones).
+ *
+ * TODO This might evaluate and store the distinct counts for all
+ * possible attribute combinations. The assumption is this might be
+ * useful for estimating things like GROUP BY cardinalities (e.g.
+ * in cases when some buckets contain a lot of low-frequency
+ * combinations, and other buckets contain few high-frequency ones).
+ *
+ * But it's unclear whether it's worth the price. Computing this
+ * is actually quite cheap, because it may be evaluated at the very
+ * end, when the buckets are rather small (so sorting it in 2^N ways
+ * is not a big deal). Assuming the partitioning algorithm does not
+ * use these values to do the decisions, of course (the current
+ * algorithm does not).
+ *
+ * The overhead with storing, fetching and parsing the data is more
+ * concerning - adding 2^N values per bucket (even if it's just
+ * a 1B or 2B value) would significantly bloat the histogram, and
+ * thus the impact on optimizer. Which is not really desirable.
+ *
+ * TODO This only updates the ndistinct for the sample (or bucket), but
+ * we eventually need an estimate of the total number of distinct
+ * values in the dataset. It's possible to either use the current
+ * 1D approach (i.e., if it's more than 10% of the sample, assume
+ * it's proportional to the number of rows). Or it's possible to
+ * implement the estimator suggested in the article, supposedly
+ * giving 'optimal' estimates (w.r.t. probability of error).
+ */
+static void
+update_bucket_ndistinct(MVBucket bucket, int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j, idx = 0;
+ int numattrs = attrs->dim1;
+ Size len = sizeof(Datum) * numattrs;
+ bool isNull;
+
+ /*
+ * We could collect this while walking through all the attributes
+ * above (this way we have to call heap_getattr twice).
+ */
+ Datum * values = palloc0(bucket->numrows * numattrs * sizeof(Datum));
+
+ for (j = 0; j < bucket->numrows; j++)
+ for (i = 0; i < numattrs; i++)
+ values[idx++] = heap_getattr(bucket->rows[j], attrs->values[i],
+ stats[i]->tupDesc, &isNull);
+
+ qsort_arg((void *) values, bucket->numrows, sizeof(Datum) * numattrs,
+ compare_scalars_memcmp, &len);
+
+ bucket->ndistinct = 1;
+
+ for (i = 1; i < bucket->numrows; i++)
+ if (memcmp(&values[i * numattrs], &values[(i-1) * numattrs], len) != 0)
+ bucket->ndistinct += 1;
+
+ pfree(values);
+
+}
+
+/*
+ * Count distinct values per bucket dimension.
+ *
+ * TODO Remove unnecessary parameters - don't pass in the whole arrays,
+ * just the proper elements.
+ */
+static void
+update_dimension_ndistinct(MVBucket bucket, int dimension, int2vector *attrs,
+ VacAttrStats ** stats, bool update_boundaries)
+{
+ int j;
+ int nvalues = 0;
+ bool isNull;
+ Datum * values = (Datum*)palloc0(bucket->numrows * sizeof(Datum));
+ SortSupportData ssup;
+
+ StdAnalyzeData * mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (j = 0; j < bucket->numrows; j++)
+ {
+ values[nvalues] = heap_getattr(bucket->rows[j], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+
+ /* ignore NULL values */
+ if (! isNull)
+ nvalues++;
+ }
+
+ /* there's always at least 1 distinct value (may be NULL) */
+ bucket->ndistincts[dimension] = 1;
+
+ /* if there are only NULL values in the column, mark it so and continue
+ * with the next one */
+ if (nvalues == 0)
+ {
+ pfree(values);
+ bucket->nullsonly[dimension] = true;
+ return;
+ }
+
+ /* sort the array (pass-by-value datums) */
+ qsort_arg((void *) values, nvalues, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /*
+ * Update min/max boundaries to the smallest bounding box. Generally, this
+ * needs to be done only when constructing the initial bucket.
+ */
+ if (update_boundaries)
+ {
+ /* store the min/max values */
+ bucket->min[dimension] = values[0];
+ bucket->min_inclusive[dimension] = true;
+
+ bucket->max[dimension] = values[nvalues-1];
+ bucket->max_inclusive[dimension] = true;
+ }
+
+ /*
+ * Walk through the array and count distinct values by comparing
+ * succeeding values.
+ *
+ * FIXME This only works for pass-by-value types (i.e. not VARCHARs etc.).
+ */
+ for (j = 1; j < nvalues; j++) {
+ if (values[j] != values[j-1])
+ bucket->ndistincts[dimension] += 1;
+ }
+
+ pfree(values);
+}
+
+/*
+ * Fetch list of MV stats defined on a table, without the actual data
+ * for histograms, MCV lists etc.
+ */
+MVStats
+list_mv_stats(Oid relid, int *nstats, bool built_only)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ MVStats result;
+
+ /* start with 16 items; that should be enough for most cases */
+ int maxitems = 16;
+ result = (MVStats)palloc0(sizeof(MVStatsData) * maxitems);
+ *nstats = 0;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ Form_pg_mv_statistic stats = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ /*
+ * Skip statistics that were not computed yet (if only stats
+ * that were already built were requested)
+ */
+ if (built_only && (! (stats->hist_built || stats->mcv_built || stats->assoc_built)))
+ continue;
+
+ /* double the array size if needed */
+ if (*nstats == maxitems)
+ {
+ maxitems *= 2;
+ result = (MVStats)repalloc(result, sizeof(MVStatsData) * maxitems);
+ }
+
+ result[*nstats].mvoid = HeapTupleGetOid(htup);
+ result[*nstats].stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ result[*nstats].hist_built = stats->hist_built;
+ result[*nstats].mcv_built = stats->mcv_built;
+ result[*nstats].assoc_built = stats->assoc_built;
+ *nstats += 1;
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which was used as inspiration for this one). */
+
+ return result;
+}
+
+
+/*
+ * Serialize the MV histogram into a bytea value.
+ *
+ * The serializer first deduplicates the boundary values into a separate
+ * array, and uses 2B indexes when serializing the buckets. This saves
+ * a significant amount of space, because each bucket split adds a single
+ * new boundary value, so e.g. with 4 attributes and 8191 splits (thus
+ * 8192 buckets), there are only ~8200 distinct boundary values.
+ *
+ * But as each bucket has 8 boundary values (4+4), that's ~64k Datums.
+ * That's roughly 65kB vs. 512kB, but we haven't included the indexes
+ * used to reference the boundary values. By using int16 indexes (which
+ * should be more than enough for all reasonable histogram sizes),
+ * this amounts to ~128kB (8192*8*2). So in total it's ~196kB vs. 512kB,
+ * i.e. more than 2x compression, which is nice.
+ *
+ * The implementation is simple - walk through the buckets, collect all
+ * the boundary values, keep only distinct values (in a sorted array)
+ * and then replace the values with indexes (using binary search).
+ *
+ * It's possible to either serialize/deserialize the histogram into
+ * a MVHistogram, or create a special structure working with this
+ * compressed structure (and keep MVBucket/MVHistogram only for the
+ * building phase). This might actually work better thanks to better
+ * CPU cache hit ratio, and simpler deserialization.
+ *
+ * This encoding will probably prevent automatic varlena compression,
+ * because first part of the serialized bytea will be an array of unique
+ * values (although sorted), and pglz decides whether to compress by
+ * trying to compress the first part (~1kB or so). Which will be poor,
+ * due to the lack of repetition.
+ *
+ * But in this case this is probably desirable - the data in general
+ * won't be really compressible (in addition to the 2x compression we
+ * got thanks to the encoding). In a sense the encoding scheme is
+ * actually a context-aware compression (usually compressing to ~30%).
+ * So this seems appropriate in this case.
+ *
+ * FIXME Make this work with arbitrary types.
+ *
+ * TODO Try to keep the compressed form, instead of deserializing it to
+ * MVHistogram/MVBucket.
+ *
+ * TODO We might get a bit better compression by considering the actual
+ * data type length. The current implementation treats all data as
+ * 8B values, but for INT it's actually 4B etc. OTOH this is only
+ * related to the lookup table, and most of the space is occupied
+ * by the buckets (with int16 indexes). And we don't have type info
+ * at the moment, so it would be difficult (but we'll need it to
+ * support all types, so maybe then).
+ */
+static bytea *
+serialize_mv_histogram(MVHistogram histogram)
+{
+ int i = 0, j = 0;
+
+ /* total size (histogram header + all buckets) */
+ Size total_len;
+ char *tmp = NULL;
+ bytea *result = NULL;
+
+ /* we need to accumulate all boundary values (min/max) */
+ int idx = 0;
+ int max_values = histogram->nbuckets * histogram->ndimensions * 2;
+ Datum * values = (Datum*)palloc0(max_values * sizeof(Datum));
+ Size len = sizeof(Datum);
+
+ /* we'll collect unique boundary values into this */
+ int ndistinct = 0;
+ Datum *lookup = NULL;
+
+ /*
+ * Collect the boundary values first, sort them and generate a small
+ * array with only distinct values.
+ */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ values[idx++] = histogram->buckets[i]->min[j];
+ values[idx++] = histogram->buckets[i]->max[j];
+ }
+ }
+
+ /*
+ * We've allocated just enough space for all boundary values, but
+ * this may change once we start handling NULL values (as we'll
+ * probably skip those).
+ *
+ * Also, we expect at least one boundary value at this moment.
+ */
+ Assert(max_values == idx);
+ Assert(idx > 1);
+
+ /*
+ * Sort the collected boundary values using a simple memcmp-based
+ * comparator (this won't work for pass-by-reference types), and
+ * then walk the data and count the distinct values.
+ */
+ qsort((void *) values, idx, len, compare_scalars_memcmp_2);
+
+ ndistinct = 1;
+ for (i = 1; i < max_values; i++)
+ ndistinct += (values[i-1] != values[i]) ? 1 : 0;
+
+ /*
+ * At this moment we can allocate the bytea value (and we'll collect
+ * the boundary values directly into it).
+ *
+ * The bytea will be structured like this:
+ *
+ * - varlena header : VARHDRSZ
+ * - histogram header : offsetof(MVHistogram,buckets)
+ * - number of boundary values : sizeof(uint32)
+ * - boundary values : ndistinct * sizeof(Datum)
+ * - buckets : nbuckets * BUCKET_SIZE_SERIALIZED
+ *
+ * We'll assume 2B indexes into the boundary values, because each
+ * bucket 'split' introduces one boundary value. Moreover, multiple
+ * splits may introduce the same value, so this should be enough for
+ * at least 65k buckets (and likely more). That's more than enough
+ * for reasonable histogram sizes.
+ */
+
+ Assert(ndistinct <= 65536);
+
+ total_len = VARHDRSZ + offsetof(MVHistogramData, buckets) +
+ (sizeof(uint32) + ndistinct * sizeof(Datum)) +
+ histogram->nbuckets * BUCKET_SIZE_SERIALIZED(histogram->ndimensions);
+
+ result = (bytea*)palloc0(total_len);
+ tmp = VARDATA(result);
+
+ SET_VARSIZE(result, total_len);
+
+ /* copy the global histogram header */
+ memcpy(tmp, histogram, offsetof(MVHistogramData, buckets));
+ tmp += offsetof(MVHistogramData, buckets);
+
+ /*
+ * Copy the number of distinct values, and then all the distinct
+ * values currently stored in the 'values' array (sorted).
+ */
+ memcpy(tmp, &ndistinct, sizeof(uint32));
+ tmp += sizeof(uint32);
+
+ lookup = (Datum*)tmp;
+
+ for (i = 0; i < max_values; i++)
+ {
+ /* skip values that are equal to the previous one */
+ if ((i > 0) && (values[i-1] == values[i]))
+ continue;
+
+ memcpy(tmp, &values[i], sizeof(Datum));
+ tmp += sizeof(Datum);
+ }
+
+ Assert(tmp - (char*)lookup == ndistinct * sizeof(Datum));
+
+ /* now serialize all the buckets - first the header, without the
+ * variable-length part, then all the variable length parts */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ MVBucket bucket = histogram->buckets[i];
+ uint16 indexes[histogram->ndimensions];
+
+ /* write the common bucket header */
+ memcpy(tmp, bucket, offsetof(MVBucketData, ndistincts));
+ tmp += offsetof(MVBucketData, ndistincts);
+
+ /* per-dimension ndistincts / nullsonly */
+ memcpy(tmp, bucket->ndistincts, sizeof(uint32)*histogram->ndimensions);
+ tmp += sizeof(uint32)*histogram->ndimensions;
+
+ memcpy(tmp, bucket->nullsonly, sizeof(bool)*histogram->ndimensions);
+ tmp += sizeof(bool)*histogram->ndimensions;
+
+ memcpy(tmp, bucket->min_inclusive, sizeof(bool)*histogram->ndimensions);
+ tmp += sizeof(bool)*histogram->ndimensions;
+
+ memcpy(tmp, bucket->max_inclusive, sizeof(bool)*histogram->ndimensions);
+ tmp += sizeof(bool)*histogram->ndimensions;
+
+ /* and now translate the min (and then max) boundaries to indexes */
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ Datum *v = (Datum*)bsearch(&bucket->min[j], lookup, ndistinct,
+ sizeof(Datum), compare_scalars_memcmp_2);
+
+ Assert(v != NULL);
+ indexes[j] = (v - lookup); /* Datum arithmetic (not char) */
+ Assert(indexes[j] < ndistinct); /* we have to be within the array */
+ }
+
+ memcpy(tmp, indexes, sizeof(uint16)*histogram->ndimensions);
+ tmp += sizeof(uint16)*histogram->ndimensions;
+
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ Datum *v = (Datum*)bsearch(&bucket->max[j], lookup, ndistinct,
+ sizeof(Datum), compare_scalars_memcmp_2);
+ Assert(v != NULL);
+ indexes[j] = (v - lookup); /* Datum arithmetic (not char) */
+ Assert(indexes[j] < ndistinct); /* we have to be within the array */
+ }
+
+ memcpy(tmp, indexes, sizeof(uint16)*histogram->ndimensions);
+ tmp += sizeof(uint16)*histogram->ndimensions;
+ }
+
+ return result;
+}
+
+/*
+ * The reverse of serialize_mv_histogram. This essentially expands the
+ * serialized form back into MVHistogram / MVBucket structures.
+ */
+MVHistogram
+deserialize_mv_histogram(bytea * data)
+{
+ int i = 0, j = 0;
+
+ Size expected_length;
+ char *tmp = NULL;
+ MVHistogram histogram;
+
+ uint32 nlookup; /* Datum lookup table */
+ Datum *lookup = NULL;
+
+ if (data == NULL)
+ return NULL;
+
+ /* get pointer to the data part of the varlena */
+ tmp = VARDATA(data);
+
+ histogram = (MVHistogram)palloc0(sizeof(MVHistogramData));
+
+ /* copy the histogram header in place */
+ memcpy(histogram, tmp, offsetof(MVHistogramData, buckets));
+ tmp += offsetof(MVHistogramData, buckets);
+
+ if (histogram->magic != MVHIST_MAGIC)
+ {
+ pfree(histogram);
+ elog(WARNING, "not an MV histogram (magic number mismatch)");
+ return NULL;
+ }
+
+ Assert(histogram->type == MVHIST_TYPE_BASIC);
+ Assert(histogram->nbuckets > 0);
+ Assert(histogram->nbuckets <= MVHIST_MAX_BUCKETS);
+ Assert(histogram->ndimensions > 0);
+ Assert(histogram->ndimensions <= MVSTATS_MAX_DIMENSIONS);
+
+ /* now, get the size of the lookup table */
+ memcpy(&nlookup, tmp, sizeof(uint32));
+ tmp += sizeof(uint32);
+ lookup = (Datum*)tmp;
+
+ /* skip to the first bucket */
+ tmp += sizeof(Datum) * nlookup;
+
+ /* check the total serialized length */
+ expected_length = offsetof(MVHistogramData, buckets) +
+ sizeof(uint32) + nlookup * sizeof(Datum) +
+ histogram->nbuckets * BUCKET_SIZE_SERIALIZED(histogram->ndimensions);
+
+ /* check serialized length */
+ if (VARSIZE_ANY_EXHDR(data) != expected_length)
+ {
+ elog(ERROR, "invalid MV histogram serialized size (expected %ld, got %ld)",
+ expected_length, VARSIZE_ANY_EXHDR(data));
+ return NULL;
+ }
+
+ /* allocate bucket pointers */
+ histogram->buckets = (MVBucket*)palloc0(histogram->nbuckets * sizeof(MVBucket));
+
+ /* deserialize the buckets, one by one */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ /* don't allocate space for the build-only fields */
+ MVBucket bucket = (MVBucket)palloc0(offsetof(MVBucketData, rows));
+ uint16 *indexes = NULL;
+
+ /* read the common bucket header */
+ memcpy(bucket, tmp, offsetof(MVBucketData, ndistincts));
+ tmp += offsetof(MVBucketData, ndistincts);
+
+ /* per-dimension ndistincts / nullsonly */
+ bucket->ndistincts = (uint32*)palloc0(sizeof(uint32)*histogram->ndimensions);
+ memcpy(bucket->ndistincts, tmp, sizeof(uint32)*histogram->ndimensions);
+ tmp += sizeof(uint32)*histogram->ndimensions;
+
+ bucket->nullsonly = (bool*)palloc0(sizeof(bool)*histogram->ndimensions);
+ memcpy(bucket->nullsonly, tmp, sizeof(bool)*histogram->ndimensions);
+ tmp += sizeof(bool)*histogram->ndimensions;
+
+ bucket->min_inclusive = (bool*)palloc0(sizeof(bool)*histogram->ndimensions);
+ memcpy(bucket->min_inclusive, tmp, sizeof(bool)*histogram->ndimensions);
+ tmp += sizeof(bool)*histogram->ndimensions;
+
+ bucket->max_inclusive = (bool*)palloc0(sizeof(bool)*histogram->ndimensions);
+ memcpy(bucket->max_inclusive, tmp, sizeof(bool)*histogram->ndimensions);
+ tmp += sizeof(bool)*histogram->ndimensions;
+
+ /* translate the indexes back to Datum values */
+ bucket->min = (Datum*)palloc0(sizeof(Datum)*histogram->ndimensions);
+ bucket->max = (Datum*)palloc0(sizeof(Datum)*histogram->ndimensions);
+
+ indexes = (uint16*)tmp;
+ tmp += sizeof(uint16) * histogram->ndimensions;
+ for (j = 0; j < histogram->ndimensions; j++)
+ memcpy(&bucket->min[j], &lookup[indexes[j]], sizeof(Datum));
+
+ indexes = (uint16*)tmp;
+ tmp += sizeof(uint16) * histogram->ndimensions;
+ for (j = 0; j < histogram->ndimensions; j++)
+ memcpy(&bucket->max[j], &lookup[indexes[j]], sizeof(Datum));
+
+ histogram->buckets[i] = bucket;
+ }
+
+ return histogram;
+}
+
+/*
+ * Serialize MCV list into a bytea value.
+ *
+ * This does not use any kind of deduplication (compared to histogram
+ * serialization), as we don't expect the same efficiency here.
+ *
+ * This simply writes an MCV header (number of items, ...) and then the
+ * Datum values for all attributes of each item, followed by the item frequency
+ * (as a double).
+ */
+static bytea *
+serialize_mv_mcvlist(MCVList mcvlist)
+{
+ int i;
+
+ /* varlena header, MCV header, plus (ndimensions Datums + a double) per item */
+ Size len = VARHDRSZ + offsetof(MCVListData, items) + mcvlist->nitems * (sizeof(Datum) * mcvlist->ndimensions + sizeof(double));
+
+ bytea * output = (bytea*)palloc0(len);
+
+ char * tmp = VARDATA(output);
+
+ SET_VARSIZE(output, len);
+
+ /* first, store the number of dimensions / items */
+ memcpy(tmp, mcvlist, offsetof(MCVListData, items));
+ tmp += offsetof(MCVListData, items);
+
+ /* now, walk through the items and store values + frequency for each MCV item */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ memcpy(tmp, mcvlist->items[i]->values, mcvlist->ndimensions * sizeof(Datum));
+ tmp += mcvlist->ndimensions * sizeof(Datum);
+
+ memcpy(tmp, &mcvlist->items[i]->frequency, sizeof(double));
+ tmp += sizeof(double);
+ }
+
+ return output;
+
+}
+
+MCVList
+deserialize_mv_mcvlist(bytea *data)
+{
+ int i;
+ Size expected_size;
+ MCVList mcvlist;
+ char *tmp;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MCVListData,items))
+ elog(ERROR, "invalid MCV size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MCVListData,items));
+
+ /* read the MCV list header */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(mcvlist, tmp, offsetof(MCVListData,items));
+ tmp += offsetof(MCVListData,items);
+
+ if (mcvlist->magic != MVSTAT_MCV_MAGIC)
+ elog(ERROR, "invalid MCV magic %d (expected %d)",
+ mcvlist->magic, MVSTAT_MCV_MAGIC);
+
+ if (mcvlist->type != MVSTAT_MCV_TYPE_BASIC)
+ elog(ERROR, "invalid MCV type %d (expected %d)",
+ mcvlist->type, MVSTAT_MCV_TYPE_BASIC);
+
+ Assert(mcvlist->nitems > 0);
+ Assert((mcvlist->ndimensions >= 2) && (mcvlist->ndimensions <= MVSTATS_MAX_DIMENSIONS));
+
+ /* what bytea size do we expect for those parameters */
+ expected_size = offsetof(MCVListData,items) +
+ mcvlist->nitems * (sizeof(Datum) * mcvlist->ndimensions + sizeof(double));
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid MCV size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* allocate space for the MCV items */
+ mcvlist->items = (MCVItem*)palloc0(sizeof(MCVItem) * mcvlist->nitems);
+
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = (MCVItem)palloc0(offsetof(MCVItemData, values) +
+ mcvlist->ndimensions * sizeof(Datum));
+
+ memcpy(item->values, tmp, mcvlist->ndimensions * sizeof(Datum));
+ tmp += mcvlist->ndimensions * sizeof(Datum);
+
+ memcpy(&item->frequency, tmp, sizeof(double));
+ tmp += sizeof(double);
+
+ mcvlist->items[i] = item;
+ }
+
+ return mcvlist;
+}
+
+static void
+update_mv_stats(Oid mvoid, MVHistogram histogram, MCVList mcvlist)
+{
+ HeapTuple stup,
+ oldtup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ bool replaces[Natts_pg_mv_statistic];
+
+ Relation sd = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ memset(nulls, 1, Natts_pg_mv_statistic * sizeof(bool));
+ memset(replaces, 0, Natts_pg_mv_statistic * sizeof(bool));
+ memset(values, 0, Natts_pg_mv_statistic * sizeof(Datum));
+
+ /*
+ * Construct a new pg_mv_statistic tuple - replace only the histogram
+ * and MCV list, depending on whether each was actually computed.
+ */
+ if (histogram != NULL)
+ {
+ nulls[Anum_pg_mv_statistic_stahist-1] = false;
+ values[Anum_pg_mv_statistic_stahist - 1]
+ = PointerGetDatum(serialize_mv_histogram(histogram));
+ }
+
+ if (mcvlist != NULL)
+ {
+ nulls[Anum_pg_mv_statistic_stamcv -1] = false;
+ values[Anum_pg_mv_statistic_stamcv - 1]
+ = PointerGetDatum(serialize_mv_mcvlist(mcvlist));
+ }
+
+ /* always replace the value (either by bytea or NULL) */
+ replaces[Anum_pg_mv_statistic_stahist-1] = true;
+ replaces[Anum_pg_mv_statistic_stamcv -1] = true;
+
+ /* always change the availability flags */
+ nulls[Anum_pg_mv_statistic_hist_built-1] = false;
+ nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
+
+ replaces[Anum_pg_mv_statistic_hist_built -1] = true;
+ replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+
+ values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
+ values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
+
+ /* Is there already a pg_mv_statistic tuple for these stats? */
+ oldtup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ if (HeapTupleIsValid(oldtup))
+ {
+ /* Yes, replace it */
+ stup = heap_modify_tuple(oldtup,
+ RelationGetDescr(sd),
+ values,
+ nulls,
+ replaces);
+ ReleaseSysCache(oldtup);
+ simple_heap_update(sd, &stup->t_self, stup);
+ }
+ else
+ elog(ERROR, "invalid pg_mv_statistic record (oid=%d)", mvoid);
+
+ /* update indexes too */
+ CatalogUpdateIndexes(sd, stup);
+
+ heap_freetuple(stup);
+
+ heap_close(sd, RowExclusiveLock);
+}
+
+
+/* MV stats */
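+
+/*
+ * Functions below inspect the serialized stats from the SQL level. For
+ * illustration (hypothetical query, assuming the matching pg_proc
+ * entries defined elsewhere in the patch):
+ *
+ * SELECT pg_mv_stats_mvclist_info(stamcv)
+ * FROM pg_mv_statistic WHERE starelid = 'test'::regclass;
+ *
+ * which prints something like "nitems=27".
+ */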
+
+Datum
+pg_mv_stats_histogram_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVHistogram hist = deserialize_mv_histogram(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nbuckets=%d", hist->nbuckets);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+Datum
+pg_mv_stats_mvclist_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MCVList mcvlist = deserialize_mv_mcvlist(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nitems=%d", mcvlist->nitems);
+
+ pfree(mcvlist);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
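+/*
+ * Dump the first two dimensions of the histogram buckets as gnuplot
+ * "rect" objects, producing lines like these (values made up):
+ *
+ * set object 1 rect from 0,0 to 4999,5199 lw 1
+ * set object 2 rect from 5000,0 to 9999,4799 lw 1
+ *
+ * The output can be pasted into a gnuplot script to visualize the
+ * buckets.
+ */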
+Datum
+pg_mv_stats_histogram_gnuplot(PG_FUNCTION_ARGS)
+{
+ int i = 0;
+
+ /* FIXME handle the length properly, e.g. by using a StringInfo */
+ Size len = 1024*1024;
+ char *buffer = palloc0(len);
+ char *str = buffer;
+ bytea *data = PG_GETARG_BYTEA_P(0);
+
+ MVHistogram hist = deserialize_mv_histogram(data);
+
+ for (i = 0; i < hist->nbuckets; i++)
+ {
+ str += snprintf(str, len - (str - buffer),
+ "set object %d rect from %ld,%ld to %ld,%ld lw 1\n",
+ (i+1),
+ hist->buckets[i]->min[0], hist->buckets[i]->min[1],
+ hist->buckets[i]->max[0], hist->buckets[i]->max[1]);
+ }
+
+ PG_RETURN_TEXT_P(cstring_to_text(buffer));
+
+}
+
+bytea *
+fetch_mv_histogram(Oid mvoid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ bytea *stahist = NULL;
+
+ /* Prepare to scan pg_mv_statistic for the entry with the given OID. */
+ ScanKeyInit(&skey,
+ ObjectIdAttributeNumber,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(mvoid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticOidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ bool isnull = false;
+ Datum hist = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stahist, &isnull);
+
+ Assert(!isnull);
+
+ stahist = DatumGetByteaP(hist);
+
+ break;
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO Maybe save the result in the relcache, similarly to
+ * RelationGetIndexList (which inspired this function). */
+
+ return stahist;
+}
+
+bytea *
+fetch_mv_mcvlist(Oid mvoid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ bytea *mcvlist = NULL;
+
+ /* Prepare to scan pg_mv_statistic for the entry with the given OID. */
+ ScanKeyInit(&skey,
+ ObjectIdAttributeNumber,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(mvoid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticOidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ bool isnull = false;
+ Datum tmp = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stamcv, &isnull);
+
+ Assert(!isnull);
+
+ mcvlist = DatumGetByteaP(tmp);
+
+ break;
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO Maybe save the result in the relcache, similarly to
+ * RelationGetIndexList (which inspired this function). */
+
+ return mcvlist;
+}
+
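+/*
+ * Translate an attribute number to the index (dimension) within the
+ * sorted stakeys vector, e.g. for stakeys = [2, 5, 7] the attribute
+ * number 5 translates to dimension 1 (the values are illustrative).
+ */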
+int
+mv_get_index(AttrNumber varattno, int2vector * stakeys)
+{
+ int i, idx = 0;
+ for (i = 0; i < stakeys->dim1; i++)
+ {
+ if (stakeys->values[i] < varattno)
+ idx += 1;
+ else
+ break;
+ }
+ return idx;
+}
+
+/*
+ * Build a multivariate histogram. In short, this first creates a single
+ * bucket containing all the rows, and then repeatedly splits it by
+ * searching for the bucket / dimension most in need of a split.
+ *
+ * The current criterion is rather simple - it looks at the number of
+ * distinct values (combinations of column values for a bucket, column
+ * values for a dimension). This is somewhat naive, but seems to work
+ * quite well. See the discussion at select_bucket_to_partition and
+ * partition_bucket for more details about alternative algorithms.
+ *
+ * So the current algorithm looks like this:
+ *
+ * while [not reaching maximum number of buckets]
+ *
+ * choose bucket to partition (max distinct combinations)
+ * if no bucket to partition
+ * terminate the algorithm
+ *
+ * choose bucket dimension to partition (max distinct values)
+ * split the bucket into two buckets
+ *
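+ * As an illustration (made-up numbers): a sample where column "a" has
+ * 1500 distinct values and "b" has 20 starts as a single bucket; the
+ * loop keeps picking the bucket with the most distinct combinations
+ * and splits it along the dimension with the most distinct values
+ * (mostly "a" here), until MVHIST_MAX_BUCKETS buckets exist or no
+ * bucket can be split further.
+ *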
+ */
+static MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ int attr_cnt, VacAttrStats **vacattrstats,
+ int numrows_total)
+{
+ int i;
+ int ndistinct;
+ int numattrs = attrs->dim1;
+ int ndistincts[numattrs];
+
+ MVHistogram histogram = (MVHistogram)palloc0(sizeof(MVHistogramData));
+
+ HeapTuple * rows_copy = (HeapTuple*)palloc0(numrows * sizeof(HeapTuple));
+ memcpy(rows_copy, rows, sizeof(HeapTuple) * numrows);
+
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ histogram->ndimensions = numattrs;
+
+ histogram->magic = MVHIST_MAGIC;
+ histogram->type = MVHIST_TYPE_BASIC;
+ histogram->nbuckets = 1;
+
+ /* create max buckets (better than repalloc for short-lived objects) */
+ histogram->buckets = (MVBucket*)palloc0(MVHIST_MAX_BUCKETS * sizeof(MVBucket));
+
+ /* create the initial bucket, covering the whole sample set */
+ histogram->buckets[0] = create_initial_mv_bucket(numrows, rows_copy, attrs,
+ attr_cnt, vacattrstats);
+
+ ndistinct = histogram->buckets[0]->ndistinct;
+
+ /* keep the global ndistinct values */
+ for (i = 0; i < numattrs; i++)
+ ndistincts[i] = histogram->buckets[0]->ndistincts[i];
+
+ while (histogram->nbuckets < MVHIST_MAX_BUCKETS)
+ {
+ MVBucket bucket = select_bucket_to_partition(histogram->nbuckets, histogram->buckets);
+
+ /* no more buckets to partition */
+ if (bucket == NULL)
+ break;
+
+ histogram->buckets[histogram->nbuckets] = partition_bucket(bucket, attrs,
+ attr_cnt, vacattrstats);
+
+ histogram->nbuckets += 1;
+ }
+
+ /*
+ * FIXME store the histogram in a catalog in a serialized form (simple for
+ * pass-by-value, more complicated for buckets on varlena types)
+ */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ int d;
+ histogram->buckets[i]->ntuples = (histogram->buckets[i]->numrows * 1.0) / numrows_total;
+ histogram->buckets[i]->ndistinct = (histogram->buckets[i]->ndistinct * 1.0) / ndistinct;
+
+ for (d = 0; d < numattrs; d++)
+ histogram->buckets[i]->ndistincts[d] = (histogram->buckets[i]->ndistincts[d] * 1.0) / ndistincts[d];
+ }
+
+ return histogram;
+
+}
+
+/*
+ * Mine associations between the columns, in the form (A => B).
+ *
+ * At the moment this only works for associations between two columns,
+ * but it might be useful to mine for rules involving multiple columns
+ * on the left side. That is rules [A,B] => C and so on. Handling
+ * multiple columns on the right side is not necessary, because such
+ * rules may be decomposed into a set of rules, one for each column.
+ * I.e. A => [B,C] is exactly the same as (A => B) & (A => C).
+ *
+ * Those rules don't immediately identify redundant clauses, because the
+ * user may choose "incompatible conditions" (e.g. by using a zip code
+ * and a mismatching city) and so on. This should however be easy to
+ * identify from a histogram, because the conditions will match a bucket
+ * with low frequencies.
+ *
+ * The question is whether this can be useful when we have a histogram,
+ * because such incompatible conditions should result in not matching
+ * any buckets (or matching only buckets with low frequencies).
+ *
+ * The problem is that histograms only work like this when the sort
+ * order is compatible with the meaning of the data. We often use data
+ * types that support sorting (e.g. INT, BIGINT) as a kind of label,
+ * where the sort order does not make much sense. Sorting by ZIP code
+ * will order the cities quite randomly, and similarly for most
+ * surrogate primary / foreign keys. In such cases the histograms are
+ * pretty useless.
+ *
+ * So, a good approach might be testing the independence of the data
+ * (by building a contingency table) and building the MV histogram only
+ * when there's a dependency. For the 'label' data this should detect
+ * that the histogram is useless, so we won't build it (and we may use
+ * that as a sign supporting the association rule).
+ *
+ * Another option is to look at selectivity of A and B separately, and
+ * then use the minimum of those.
+ *
+ * TODO investigate using histogram and MCV list to confirm the
+ * associative rule
+ *
+ * TODO investigate statistical testing of the distribution (to decide
+ * whether it makes sense to build the histogram)
+ *
+ * TODO Using a min/max of selectivities would probably make more sense
+ * for the associated columns.
+ */
+static void
+build_mv_associations(int numrows, HeapTuple *rows, int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats)
+{
+ int i;
+ bool isNull;
+ Size len = 2 * sizeof(Datum); /* only simple associations a => b */
+ int numattrs = attrs->dim1;
+
+ /* TODO Maybe this should be related to the number of distinct
+ * values in the two columns we're currently analyzing. Assuming
+ * the distribution is uniform, we can compute the average group
+ * size we'd expect to observe in the sample, and then use that
+ * as a threshold. That seems better than a static value.
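+ *
+ * For example (made-up numbers): with a 30000-row sample and 1000
+ * distinct values in the first column, the expected group size is
+ * about 30 rows, suggesting a threshold around 30 rather than the
+ * fixed 10 used below.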
+ */
+ int min_group_size = 10;
+
+ /* dimension indexes we'll check for associations [a => b] */
+ int dima, dimb;
+
+ /* info for the interesting attributes only
+ *
+ * TODO Compute this only once and pass it to all the methods
+ * that need it.
+ */
+ VacAttrStats **stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /* We'll reuse the same array for all the combinations */
+ Datum * values = (Datum*)palloc0(numrows * 2 * sizeof(Datum));
+
+ Assert(numattrs >= 2);
+
+ for (dima = 0; dima < numattrs; dima++)
+ {
+
+ for (dimb = 0; dimb < numattrs; dimb++)
+ {
+
+ int supporting = 0;
+ int contradicting = 0;
+
+ Datum val_a, val_b;
+ int violations = 0;
+ int group_size = 0;
+
+ int supporting_rows = 0;
+
+ /* skip (dima==dimb) */
+ if (dima == dimb)
+ continue;
+
+ /*
+ * FIXME Not sure if this handles NULL values properly (not sure
+ * how to do that). We assume that NULL means 0 for now,
+ * handling it just like any other value.
+ */
+ for (i = 0; i < numrows; i++)
+ {
+ values[i*2] = heap_getattr(rows[i], attrs->values[dima], stats[dima]->tupDesc, &isNull);
+ values[i*2+1] = heap_getattr(rows[i], attrs->values[dimb], stats[dimb]->tupDesc, &isNull);
+ }
+
+ qsort_arg((void *) values, numrows, sizeof(Datum) * 2, compare_scalars_memcmp, &len);
+
+ /*
+ * Walk through the array, split it into groups according to
+ * the A value, and count distinct B values in each group.
+ * If there's a single B value for the whole group, we count
+ * it as supporting the association, otherwise we count it
+ * as contradicting.
+ *
+ * Furthermore we require a group to have at least a certain
+ * number of rows to be counted as supporting. Contradicting
+ * groups are counted regardless of their size.
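+ *
+ * Example (hypothetical [zip => city] data), after sorting:
+ *
+ * (10001, NYC) x 25 - single B value, large group => supporting
+ * (10002, NYC), (10002, Albany) - two B values => contradicting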
+ */
+
+ /* start with values from the first row */
+ val_a = values[0];
+ val_b = values[1];
+ group_size = 1;
+
+ for (i = 1; i < numrows; i++)
+ {
+ if (values[2*i] != val_a) /* end of the group */
+ {
+ /*
+ * If there are no contradicting rows, count it as
+ * supporting (otherwise contradicting), but only if
+ * the group is large enough.
+ *
+ * The requirement of a minimum group size makes it
+ * impossible to identify [unique,unique] cases, but
+ * that's probably a different case. This is more
+ * about [zip => city] associations etc.
+ */
+ supporting += ((violations == 0) && (group_size >= min_group_size)) ? 1 : 0;
+ contradicting += (violations != 0) ? 1 : 0;
+
+ supporting_rows += ((violations == 0) && (group_size >= min_group_size)) ? group_size : 0;
+
+ /* current values start a new group */
+ val_a = values[2*i];
+ val_b = values[2*i+1];
+ violations = 0;
+ group_size = 1;
+ }
+ else
+ {
+ if (values[2*i+1] != val_b) /* mismatch of a B value */
+ {
+ val_b = values[2*i+1];
+ violations += 1;
+ }
+
+ group_size += 1;
+ }
+ }
+
+ /* handle the last group (FIXME: duplicates the logic above) */
+ supporting += ((violations == 0) && (group_size >= min_group_size)) ? 1 : 0;
+ contradicting += (violations != 0) ? 1 : 0;
+ supporting_rows += ((violations == 0) && (group_size >= min_group_size)) ? group_size : 0;
+
+ /*
+ * See if the number of rows supporting the association is at least
+ * 10x the number of rows violating the hypothetical rule.
+ *
+ * TODO This is rather arbitrary limit - I guess it's possible to do
+ * some math to come up with a better rule (e.g. testing a hypothesis
+ * 'this is due to randomness'). We can create a contingency table
+ * from the values and use it for testing. Possibly only when
+ * there are no contradicting rows?
+ *
+ * TODO Also, if (a => b) and (b => a) at the same time, it pretty much
+ * means the columns have the same values (or one is a 'label'),
+ * making the conditions rather redundant. Although it's possible
+ * that the query uses an incompatible combination of values.
+ */
+ if (supporting_rows > (numrows - supporting_rows) * 10)
+ {
+ // elog(WARNING, "%d => %d : supporting=%d contradicting=%d", dima, dimb, supporting, contradicting);
+ }
+
+ }
+ }
+
+ pfree(values);
+
+}
+
+/*
+ * Compute the list of most common items, where an item is a combination
+ * of values for all the columns. For a small number of distinct values,
+ * we may be able to represent the distribution quite exactly, with
+ * per-item statistics.
+ *
+ * If we can represent the distribution using a MCV list only, it's great
+ * because that allows much better estimates (especially for equality).
+ * Such discrete distributions are also easier to combine (more
+ * efficient and more accurate) than when using histograms.
+ *
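+ * For example (hypothetical): columns like (country, currency) have
+ * only a few hundred distinct combinations, so a MCV list may capture
+ * the distribution exactly, while a histogram only approximates it.
+ *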
+ * FIXME This does not handle NULL values at the moment.
+ *
+ * TODO When computing equality selectivity (a=1 AND b=2), we can do that
+ * pretty exactly assuming (a) we hit a MCV item and (b) the
+ * histogram is built on those two columns only (i.e. there are no
+ * other columns). In that case we can estimate the selectivity
+ * using only the MCV.
+ *
+ * When we don't hit a MCV item, we can use the frequency of the
+ * least probable MCV item as upper bound of the selectivity
+ * (otherwise it'd get into the MCV list). Again, this only works
+ * when the histogram size matches the restricted columns.
+ *
+ * When the histogram is larger (i.e. there are additional columns),
+ * we can't be sure how the selectivity is distributed among the MCV
+ * list and the histogram (we may get several MCV items matching
+ * the conditions and several histogram buckets at the same time).
+ *
+ * In this case we can probably clamp the selectivity by minimum of
+ * selectivities for each condition. For example if we know the
+ * number of distinct values for each column, we can use 1/ndistinct
+ * as a per-column estimate. Or rather 1/ndistinct + selectivity
+ * derived from the MCV list.
+ *
+ * If there's no histogram (thus the distribution is approximated
+ * only by the MCV list), the size of the stats (whether there are
+ * some other columns, not referenced in the conditions) does not
+ * matter. We can do pretty accurate estimation using the MCV.
+ *
+ * TODO Currently there's no logic to consider building only a MCV list
+ * (and not building the histogram at all).
+ *
+ * TODO For types that don't reasonably support ordering (either because
+ * the type does not support that or when the user adds some option
+ * to the ADD STATISTICS command - e.g. UNSORTED_STATS), building
+ * the histogram may be pointless and inefficient. This is esp.
+ * true for varlena types that may be quite large and a large MCV
+ * list may be a better choice, because it makes equality estimates
+ * more accurate. Due to the unsorted nature, range queries on those
+ * attributes are rather useless anyway.
+ *
+ * Another thing is that by restricting to MCV list and equality
+ * conditions, we can use hash values instead of long varlena values.
+ * The equality estimation will be very accurate.
+ *
+ * This however complicates matching the columns to available
+ * statistics, as it will require matching clauses (not columns) to
+ * stats. And it may get quite complex - e.g. what if there are
+ * multiple clauses, each compatible with different stats subset?
+ *
+ * FIXME Create a special-purpose type for MCV items (instead of a plain
+ * Datum array, which is very difficult to work with).
+ */
+static MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats,
+ int *numrows_filtered)
+{
+ int i, j, idx = 0;
+ int numattrs = attrs->dim1;
+ Size len = sizeof(Datum) * numattrs;
+ bool isNull;
+ int ndistinct = 0;
+ int mcv_threshold = 0;
+ int count = 0;
+ int nitems = 0;
+
+ MCVList mcvlist = NULL;
+
+ VacAttrStats **stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /*
+ * We could have collected this while walking through the attributes
+ * earlier (as it is, we have to call heap_getattr twice).
+ *
+ * TODO We're using Datum (8B), even for data types smaller than this
+ * (notably int4 and float4). Maybe we could save some space here,
+ * although it seems the bytea compression will handle it just fine.
+ */
+ Datum * values = palloc0(numrows * numattrs * sizeof(Datum));
+
+ for (j = 0; j < numrows; j++)
+ for (i = 0; i < numattrs; i++)
+ values[idx++] = heap_getattr(rows[j], attrs->values[i], stats[i]->tupDesc, &isNull);
+
+ qsort_arg((void *) values, numrows, sizeof(Datum) * numattrs, compare_scalars_memcmp, &len);
+
+ /*
+ * Count the number of distinct values - we need this to determine
+ * the threshold (125% of the average frequency).
+ */
+ ndistinct = 1;
+ for (i = 1; i < numrows; i++)
+ if (memcmp(&values[i * numattrs], &values[(i-1) * numattrs], len) != 0)
+ ndistinct += 1;
+
+ /*
+ * Determine how many groups actually exceed the threshold, and then
+ * walk the array again and collect them into an array.
+ *
+ * TODO for now the threshold is the same as in the single-column
+ * case (average + 25%), but maybe that's worth revisiting
+ *
+ * TODO see if we can fit all the distinct values in the MCV list
+ */
+ mcv_threshold = 1.25 * numrows / ndistinct;
+ mcv_threshold = (mcv_threshold < 4) ? 4 : mcv_threshold;
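+
+ /*
+ * For example (made-up numbers): with numrows = 30000 and
+ * ndistinct = 100 the threshold is 1.25 * 30000 / 100 = 375,
+ * so only groups with at least 375 rows in the sample become
+ * MCV items.
+ */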
+
+ /*
+ * If there are fewer distinct values than MVSTAT_MCVLIST_MAX_ITEMS,
+ * store all groups with at least two rows in the sample.
+ *
+ * FIXME We can do this only if we believe we got all the distinct
+ * values of the table.
+ */
+ if (ndistinct <= MVSTAT_MCVLIST_MAX_ITEMS)
+ mcv_threshold = 2;
+
+ count = 1;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or a new group */
+ if ((i == numrows) || (memcmp(&values[i * numattrs], &values[(i-1) * numattrs], len) != 0))
+ {
+ /* count the MCV item if exceeding the threshold */
+ if (count >= mcv_threshold)
+ nitems += 1;
+
+ count = 1;
+ }
+ else /* same group, just increase the number of items */
+ count += 1;
+ }
+
+ /* by default we keep all the rows (even if there's no MCV list) */
+ *numrows_filtered = numrows;
+
+ /* we know the number of mcvitems, now collect them in a 2nd pass */
+ if (nitems > 0)
+ {
+ /* each item stores the frequency in addition to the values, so (numattrs + 1) fields */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ mcvlist->magic = MVSTAT_MCV_MAGIC;
+ mcvlist->type = MVSTAT_MCV_TYPE_BASIC;
+ mcvlist->ndimensions = numattrs;
+ mcvlist->nitems = nitems;
+ mcvlist->items = (MCVItem*)palloc0(sizeof(MCVItem)*nitems);
+
+ /* now repeat the same loop as above, but this time copy the data
+ * for items exceeding the threshold */
+ count = 1;
+ nitems = 0;
+ for (i = 1; i <= numrows; i++)
+ {
+
+ /* last row or a new group */
+ if ((i == numrows) || (memcmp(&values[i * numattrs], &values[(i-1) * numattrs], len) != 0))
+ {
+ /* count the MCV item if exceeding the threshold (and copy into the array) */
+ if (count >= mcv_threshold)
+ {
+ /* first, allocate the item (with the proper size of values) */
+ MCVItem item = (MCVItem)palloc0(offsetof(MCVItemData, values) +
+ sizeof(Datum)*mcvlist->ndimensions);
+
+ /* then copy values from the _previous_ group */
+ memcpy(item->values, &values[(i-1)*numattrs], len);
+
+ /* and finally the group frequency */
+ item->frequency = (double)count / numrows;
+
+ mcvlist->items[nitems] = item;
+ nitems += 1;
+ }
+
+ count = 1;
+ }
+ else /* same group, just increase the number of items */
+ count += 1;
+ }
+
+ /* make sure the loops are consistent */
+ Assert(nitems == mcvlist->nitems);
+
+ /*
+ * Remove the rows matching the MCV items.
+ *
+ * FIXME This implementation is rather naive, effectively O(N^2).
+ * As the MCV list grows, the check will take longer and
+ * longer. And as the number of sampled rows increases (by
+ * increasing statistics target), it will take longer and
+ * longer. One option is to sort the MCV items first and
+ * then perform a binary search.
+ */
+ if (nitems == ndistinct) /* all rows are covered by MCV items */
+ *numrows_filtered = 0;
+ else /* (nitems < ndistinct) && (nitems > 0) */
+ {
+ int nfiltered = 0;
+ HeapTuple *rows_filtered = (HeapTuple*)palloc0(sizeof(HeapTuple) * numrows);
+
+ /* walk through the tuples, compare the values to MCV items */
+ for (i = 0; i < numrows; i++)
+ {
+ bool match = false;
+ Datum keys[numattrs];
+
+ /* collect the key values */
+ for (j = 0; j < numattrs; j++)
+ keys[j] = heap_getattr(rows[i], attrs->values[j], stats[j]->tupDesc, &isNull);
+
+ /* scan through the MCV list for matches */
+ for (j = 0; j < mcvlist->nitems; j++)
+ if (memcmp(keys, mcvlist->items[j]->values, sizeof(Datum)*numattrs) == 0)
+ {
+ match = true;
+ break;
+ }
+
+ /* if no match in the MCV list, copy the row into the filtered ones */
+ if (! match)
+ memcpy(&rows_filtered[nfiltered++], &rows[i], sizeof(HeapTuple));
+ }
+
+ /* replace the first part */
+ memcpy(rows, rows_filtered, sizeof(HeapTuple) * nfiltered);
+ *numrows_filtered = nfiltered;
+
+ pfree(rows_filtered);
+
+ }
+ }
+
+ pfree(values);
+
+ /*
+ * TODO Single-dimensional MCV is stored sorted by frequency (descending).
+ * Maybe this should be stored like that too?
+ */
+
+ return mcvlist;
+}
+
+/* multi-variate stats comparator */
+
+/*
+ * qsort_arg comparator for sorting Datums (MV stats)
+ *
+ * This does not maintain the tupnoLink array.
+ */
+static int
+compare_scalars_simple(const void *a, const void *b, void *arg)
+{
+ Datum da = *(Datum*)a;
+ Datum db = *(Datum*)b;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting data when partitioning a MV bucket
+ */
+static int
+compare_scalars_partition(const void *a, const void *b, void *arg)
+{
+ Datum da = ((ScalarItem*)a)->value;
+ Datum db = ((ScalarItem*)b)->value;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting Datum[] (row of Datums) when
+ * counting distinct values.
+ */
+static int
+compare_scalars_memcmp(const void *a, const void *b, void *arg)
+{
+ Size len = *(Size*)arg;
+
+ return memcmp(a, b, len);
+}
+
+static int
+compare_scalars_memcmp_2(const void *a, const void *b)
+{
+ return memcmp(a, b, sizeof(Datum));
+}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index cb16c53..28bad78 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -34,6 +34,7 @@
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_rowsecurity.h"
@@ -89,7 +90,7 @@
#include "utils/syscache.h"
#include "utils/tqual.h"
#include "utils/typcache.h"
-
+#include "utils/mvstats.h"
/*
* ON COMMIT action list
@@ -137,8 +138,9 @@ static List *on_commits = NIL;
#define AT_PASS_ADD_COL 5 /* ADD COLUMN */
#define AT_PASS_ADD_INDEX 6 /* ADD indexes */
#define AT_PASS_ADD_CONSTR 7 /* ADD constraints, defaults */
-#define AT_PASS_MISC 8 /* other stuff */
-#define AT_NUM_PASSES 9
+#define AT_PASS_ADD_STATS 8 /* ADD statistics */
+#define AT_PASS_MISC 9 /* other stuff */
+#define AT_NUM_PASSES 10
typedef struct AlteredTableInfo
{
@@ -412,7 +414,8 @@ static void ATExecReplicaIdentity(Relation rel, ReplicaIdentityStmt *stmt, LOCKM
static void ATExecGenericOptions(Relation rel, List *options);
static void ATExecEnableRowSecurity(Relation rel);
static void ATExecDisableRowSecurity(Relation rel);
-
+static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
+ StatisticsDef *def, LOCKMODE lockmode);
static void copy_relation_data(SMgrRelation rel, SMgrRelation dst,
ForkNumber forkNum, char relpersistence);
static const char *storage_name(char c);
@@ -2963,6 +2966,7 @@ AlterTableGetLockLevel(List *cmds)
* updates.
*/
case AT_SetStatistics: /* Uses MVCC in getTableAttrs() */
+ case AT_AddStatistics: /* XXX not sure if the right level */
case AT_ClusterOn: /* Uses MVCC in getIndexes() */
case AT_DropCluster: /* Uses MVCC in getIndexes() */
case AT_SetOptions: /* Uses MVCC in getTableAttrs() */
@@ -3110,6 +3114,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
pass = AT_PASS_ADD_CONSTR;
break;
case AT_SetStatistics: /* ALTER COLUMN SET STATISTICS */
+ case AT_AddStatistics: /* XXX maybe not the right place */
ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode);
/* Performs own permission checks */
ATPrepSetStatistics(rel, cmd->name, cmd->def, lockmode);
@@ -3405,6 +3410,9 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
case AT_SetStatistics: /* ALTER COLUMN SET STATISTICS */
ATExecSetStatistics(rel, cmd->name, cmd->def, lockmode);
break;
+ case AT_AddStatistics: /* ADD STATISTICS */
+ ATExecAddStatistics(tab, rel, (StatisticsDef *) cmd->def, lockmode);
+ break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
ATExecSetOptions(rel, cmd->name, cmd->def, false, lockmode);
break;
@@ -11614,3 +11622,197 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,
ReleaseSysCache(tuple);
}
+
+/* used for sorting the attnums in ATExecAddStatistics */
+static int compare_int16(const void *a, const void *b)
+{
+ return memcmp(a, b, sizeof(int16));
+}
+
+/*
+ * Implements the ALTER TABLE ... ADD STATISTICS (options) ON (columns).
+ *
+ * The code is an unholy mix of pieces that really belong to other parts
+ * of the source tree.
+ *
+ * FIXME Check that the types are pass-by-value and support sort,
+ * although maybe we can live without the sort (and only build
+ * MCV list / association rules).
+ *
+ * FIXME This should probably check for duplicate stats (i.e. same
+ * keys, same options). Although maybe it's useful to have
+ * multiple stats on the same columns with different options
+ * (say, a detailed MCV-only stats for some queries, histogram
+ * for others, etc.)
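+ *
+ * For illustration, a hypothetical invocation (using the option names
+ * parsed below; the exact grammar is defined elsewhere in the patch)
+ * might look like:
+ *
+ * ALTER TABLE test ADD STATISTICS (mcv true, max_buckets 2048)
+ * ON (a, b, c);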
+ */
+static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
+ StatisticsDef *def, LOCKMODE lockmode)
+{
+ int i, j;
+ ListCell *l;
+ int16 attnums[INDEX_MAX_KEYS];
+ Oid atttypids[INDEX_MAX_KEYS];
+ int numcols = 0;
+
+ Oid mvstatoid;
+ HeapTuple htup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ int2vector *stakeys;
+ Relation mvstatrel;
+
+ /* by default build everything */
+ bool build_histogram = true,
+ build_mcv = true,
+ build_associations = true;
+
+ /* build regular MCV (not hashed by default) */
+ bool mcv_hashed = false;
+
+ int32 max_buckets = -1,
+ max_mcv_items = -1;
+
+ Assert(IsA(def, StatisticsDef));
+
+ /* transform the column names to attnum values */
+
+ foreach(l, def->keys)
+ {
+ char *attname = strVal(lfirst(l));
+ HeapTuple atttuple;
+
+ atttuple = SearchSysCacheAttName(RelationGetRelid(rel), attname);
+
+ if (!HeapTupleIsValid(atttuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" referenced in statistics does not exist",
+ attname)));
+
+ /* more than MVSTATS_MAX_DIMENSIONS columns not allowed */
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("cannot have more than %d keys in a statistics",
+ MVSTATS_MAX_DIMENSIONS)));
+
+ attnums[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->attnum;
+ atttypids[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->atttypid;
+ ReleaseSysCache(atttuple);
+ numcols++;
+ }
+
+ /*
+ * Check the lower bound (at least 2 columns), the upper bound was
+ * already checked in the loop.
+ */
+ if (numcols < 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("multivariate stats require 2 or more columns")));
+
+ /* look for duplicates */
+ for (i = 0; i < numcols; i++)
+ for (j = 0; j < numcols; j++)
+ if ((i != j) && (attnums[i] == attnums[j]))
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_COLUMN),
+ errmsg("duplicate column name in statistics definition")));
+
+ /* parse the statistics options */
+ foreach (l, def->options)
+ {
+ DefElem *opt = (DefElem*)lfirst(l);
+
+ if (strcmp(opt->defname, "histogram") == 0)
+ build_histogram = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "mcv") == 0)
+ build_mcv = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "mcv_hashed") == 0)
+ mcv_hashed = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "associations") == 0)
+ build_associations = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_buckets") == 0)
+ {
+ max_buckets = defGetInt32(opt);
+
+ /* TODO check that this is not used with 'histogram off' */
+
+ /* sanity check */
+ if (max_buckets < 1024)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is 1024")));
+
+ else if (max_buckets > 32768) /* FIXME use the proper constant */
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("maximum number of buckets is 32768")));
+
+ }
+ else if (strcmp(opt->defname, "max_mcv_items") == 0)
+ {
+ max_mcv_items = defGetInt32(opt);
+
+ /* TODO check that this is not used with 'mcv off' */
+
+ /* sanity check */
+ if (max_mcv_items < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items must be non-negative")));
+
+ else if (max_mcv_items > 8192) /* FIXME use the proper constant */
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items is 8192")));
+
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized STATISTICS option \"%s\"",
+ opt->defname)));
+ }
+
+ /* sort the attnums and build int2vector */
+ qsort(attnums, numcols, sizeof(int16), compare_int16);
+ stakeys = buildint2vector(attnums, numcols);
+
+ /*
+ * Okay, let's create the pg_mv_statistic entry.
+ */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ /* no stats collected yet, so just the keys */
+ values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
+
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+ values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
+ values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+ values[Anum_pg_mv_statistic_mcv_hashed -1] = BoolGetDatum(mcv_hashed);
+ values[Anum_pg_mv_statistic_assoc_enabled -1] = BoolGetDatum(build_associations);
+
+ values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
+ values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
+
+ nulls[Anum_pg_mv_statistic_staassoc -1] = true;
+ nulls[Anum_pg_mv_statistic_stamcv -1] = true;
+ nulls[Anum_pg_mv_statistic_stahist -1] = true;
+
+ /* insert the tuple into pg_mv_statistic */
+ mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ htup = heap_form_tuple(mvstatrel->rd_att, values, nulls);
+
+ mvstatoid = simple_heap_insert(mvstatrel, htup);
+
+ CatalogUpdateIndexes(mvstatrel, htup);
+
+ heap_freetuple(htup);
+
+ heap_close(mvstatrel, RowExclusiveLock);
+
+ return;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 225756c..18464b9 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3879,6 +3879,17 @@ _copyAlterPolicyStmt(const AlterPolicyStmt *from)
return newnode;
}
+static StatisticsDef *
+_copyStatisticsDef(const StatisticsDef *from)
+{
+ StatisticsDef *newnode = makeNode(StatisticsDef);
+
+ COPY_NODE_FIELD(keys);
+ COPY_NODE_FIELD(options);
+
+ return newnode;
+}
+
/* ****************************************************************
* pg_list.h copy functions
* ****************************************************************
@@ -4690,6 +4701,9 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_StatisticsDef:
+ retval = _copyStatisticsDef(from);
+ break;
case T_PrivGrantee:
retval = _copyPrivGrantee(from);
break;
@@ -4702,7 +4716,6 @@ copyObject(const void *from)
case T_XmlSerialize:
retval = _copyXmlSerialize(from);
break;
-
default:
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(from));
retval = 0; /* keep compiler quiet */
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 9b657fb..9c32735 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -24,6 +24,9 @@
#include "utils/lsyscache.h"
#include "utils/selfuncs.h"
+#include "utils/mvstats.h"
+#include "catalog/pg_collation.h"
+#include "utils/typcache.h"
/*
* Data structure for accumulating info about possible range-query
@@ -43,6 +46,23 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
+static bool is_mv_compatible(Node *clause, Oid varRelid, Index *varno,
+ Bitmapset **attnums);
+static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
+ Oid varRelid, Oid *relid);
+static int choose_mv_histogram(int nmvstats, MVStats mvstats,
+ Bitmapset *attnums);
+static List *clauselist_mv_split(List *clauses, Oid varRelid,
+ List **mvclauses, MVStats mvstats);
+
+static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
+ List *clauses, MVStats mvstats);
+static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
+ List *clauses, MVStats mvstats,
+ bool *fullmatch, Selectivity *lowsel);
+static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
+ List *clauses, MVStats mvstats);
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -100,14 +120,74 @@ clauselist_selectivity(PlannerInfo *root,
RangeQueryClause *rqlist = NULL;
ListCell *l;
+ /* processing mv stats */
+ Oid relid = InvalidOid;
+ int nmvstats = 0;
+ MVStats mvstats = NULL;
+
+ /* attributes in mv-compatible clauses */
+ Bitmapset *mvattnums = NULL;
+
/*
- * If there's exactly one clause, then no use in trying to match up pairs,
- * so just go directly to clause_selectivity().
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, so just go directly to clause_selectivity().
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
varRelid, jointype, sjinfo);
+ /* collect attributes from mv-compatible clauses */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid);
+
+ /*
+ * If there are mv-compatible clauses, referencing at least two
+ * columns (otherwise it makes no sense to use mv stats), fetch the
+ * MV histograms for the relation (only the column keys, not the
+ * histograms yet - we'll decide which histogram to use first).
+ */
+ if (bms_num_members(mvattnums) >= 2)
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ /* fetch info from the catalog (not the serialized stats yet) */
+ mvstats = list_mv_stats(relid, &nmvstats, true);
+
+ /*
+ * If there are candidate statistics, choose the histogram first.
+ * At the moment we only use a single statistics, covering the
+ * most columns (using info from the previous step). If there
+ * are multiple such histograms, we'll use the smallest one
+ * (with the lowest number of dimensions).
+ *
+ * This may not be optimal choice, if the 'smaller' stats has
+ * much less buckets than the rejected one (making it less
+ * accurate).
+ *
+ * We may end up without multivariate statistics, if none of the
+ * stats matches at least two columns from the clauses (in that
+ * case we may just use the single dimensional stats).
+ */
+ if (nmvstats > 0)
+ {
+ int idx = choose_mv_histogram(nmvstats, mvstats, mvattnums);
+
+ if (idx >= 0) /* we have a matching stats */
+ {
+ MVStats mvstat = &mvstats[idx];
+
+ /* split the clauselist into regular and mv-clauses */
+ clauses = clauselist_mv_split(clauses, varRelid, &mvclauses, mvstat);
+
+ /* we've chosen the histogram to match the clauses */
+ Assert(mvclauses != NIL);
+
+ /* compute the multivariate stats */
+ s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ }
+ }
+ }
+
/*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
@@ -782,3 +862,1010 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Estimate selectivity for the list of MV-compatible clauses, using that
+ * particular histogram.
+ *
+ * When we hit a single bucket, we don't know what portion of it actually
+ * matches the clauses (e.g. equality), and we use 1/2 the bucket by
+ * default. However, the MV histograms are usually less detailed than
+ * the per-column ones, meaning the summed frequency is often quite high
+ * (thanks to combining a lot of "partially hit" buckets).
+ *
+ * There are several ways to improve this, usually with cases when it
+ * won't really help. Also, the more complex the process, the worse
+ * the failures (i.e. misestimates).
+ *
+ * (1) Use the MV histogram only as a way to combine multiple
+ * per-column histograms, essentially rewriting
+ *
+ * P(A & B) = P(A) * P(B|A)
+ *
+ * where P(B|A) may be computed using a proper "slice" of the
+ * histogram, by first selecting only buckets where A is true, and
+ * then using the boundaries to 'restrict' the per-column histogram.
+ *
+ * With more clauses, it gets more complicated, of course
+ *
+ * P(A & B & C) = P(A & C) * P(B|A & C)
+ * = P(A) * P(C|A) * P(B|A & C)
+ *
+ * and so on.
+ *
+ * Of course, the question is how well and efficiently we can
+ * compute the conditional probabilities - whether this approach
+ * can improve the estimates (instead of amplifying the errors).
+ *
+ * Also, this does not eliminate the need for histogram on [A,B,C].
+ *
+ * (2) Use multiple smaller (and more accurate) histograms, and combine
+ * them using a process similar to the above. E.g. by assuming that
+ * B and C are independent, we can rewrite
+ *
+ * P(B|A & C) = P(B|A)
+ *
+ * so we can rewrite the whole formula to
+ *
+ * P(A & B & C) = P(A) * P(C|A) * P(B|A)
+ *
+ * and we're OK with two 2D histograms [A,C] and [A,B].
+ *
+ * It'd be nice to perform some sort of statistical test (e.g.
+ * Fisher's exact test or a chi-squared test) to identify independent
+ * components and automatically separate them into smaller histograms.
+ *
+ * (3) Using the estimated number of distinct values in a bucket to
+ * decide the selectivity of equality in the bucket (instead of
+ * blindly using 1/2 of the bucket, we may use 1/ndistinct).
+ * Of course, if the ndistinct estimate is way off, or when the
+ * distribution is not uniform (some distinct values get many more
+ * rows), this will fail. Also, we currently don't have the ndistinct
+ * estimate available at this point (but it shouldn't be that
+ * difficult to compute, as ndistinct and ntuples should be available).
+ *
+ * TODO Clamp the selectivity by min of the per-clause selectivities
+ * (i.e. the selectivity of the most restrictive clause), because
+ * that's the maximum we can ever get from ANDed list of clauses.
+ * This may probably prevent issues with hitting too many buckets
+ * and low precision histograms.
+ *
+ * TODO We may support some additional conditions, most importantly
+ * those matching multiple columns (e.g. "a = b" or "a < b").
+ * Ultimately we could track multi-table histograms for join
+ * cardinality estimation.
+ *
+ * TODO Currently this is only estimating all clauses, or clauses
+ * matching varRelid (when it's not 0). I'm not sure what the
+ * purpose of varRelid is, but my assumption is that it's used for
+ * join conditions and such. In that case we can use those clauses
+ * to restrict the other (i.e. filter the histogram buckets first,
+ * before estimating the other clauses). This is essentially equal
+ * to computing P(A|B) where "B" are the clauses not matching the
+ * varRelid.
+ *
+ * TODO Further thoughts on processing equality clauses - maybe it'd be
+ * better to look for stats (with MCV) covered by the equality
+ * clauses, because then we have a chance to find an exact match
+ * in the MCV list, which is pretty much the best we can do. We may
+ * also look at the least frequent MCV item, and use it as an upper
+ * boundary for the selectivity (had there been a more frequent
+ * item, it'd be in the MCV list).
+ *
+ * These conditions may then be used as a condition for the other
+ * selectivities, i.e. we may estimate P(A,B) first, and then
+ * compute P(C|A,B) from another histogram. This may be useful when
+ * we can estimate P(A,B) accurately (e.g. because it's a complete
+ * equality match evaluated on MCV list), and then compute the
+ * conditional probability P(C|A,B), giving us the requested stats
+ *
+ * P(A,B,C) = P(A,B) * P(C|A,B)
+ *
+ * TODO There are several options for 'sanity clamping' the estimates.
+ *
+ * First, if we have selectivities for each condition, then
+ *
+ * P(A,B) <= MIN(P(A), P(B))
+ *
+ * Because additional conditions (connected by AND) can only lower
+ * the probability.
+ *
+ * So we can do some basic sanity checks using the single-variate
+ * stats (the ones we have right now).
+ *
+ * Second, when we have multivariate stats with a MCV list, then
+ *
+ * (a) if we have a full equality condition (one equality condition
+ * on each column) and we found a match in the MCV list, this is
+ * the selectivity (and it's supposed to be exact)
+ *
+ * (b) if we have a full equality condition and we haven't found a
+ * match in the MCV list, then the selectivity is below the
+ * lowest selectivity in the MCV list
+ *
+ * (c) if we have a equality condition (not full), we can still
+ * search the MCV for matches and use the sum of probabilities
+ * as a lower boundary for the histogram (if there are no
+ * matches in the MCV list, then we have no boundary)
+ *
+ * Third, if there are multiple multivariate stats for a set of
+ * clauses, we may compute all of them and then somehow aggregate
+ * them - e.g. by choosing the minimum, median or average. The
+ * multi-variate stats are susceptible to overestimation (because
+ * we take 50% of the bucket for partial matches). Some stats may
+ * give better estimates than others, but it's very difficult to
+ * determine in advance which one is the best (it depends
+ * on the number of buckets, number of additional columns not
+ * referenced in the clauses etc.) so we may compute all and then
+ * choose a sane aggregation (minimum seems like a good approach).
+ * Of course, this may result in longer / more expensive estimation
+ * (CPU-wise), but it may be worth it.
+ *
+ * There are ways to address this, though. First, it's possible to
+ * add a GUC choosing between a 'simple' estimation (using a single
+ * stats expected to give the best estimate) and a 'full' one
+ * (combining the multiple estimates).
+ *
+ * multivariate_estimates = (simple|full)
+ *
+ * Also, this might be enabled at a table level, by something like
+ *
+ * ALTER TABLE ... SET STATISTICS (simple|full)
+ *
+ * Which would make it possible to use this only for the tables
+ * where the simple approach does not work.
+ *
+ * Also, there are ways to optimize this algorithmically. E.g. we
+ * may try to get an estimate from a matching MCV list first, and
+ * if we happen to get a "full equality match" we may stop computing
+ * the estimates from other stats (for this condition) because
+ * that's probably the best estimate we can really get.
+ *
+ * TODO When applying the clauses to the histogram/MCV list, we can do
+ * that from the most selective clauses first, because that'll
+ * eliminate the buckets/items sooner (so we'll be able to skip
+ * them without the more expensive inspection).
+ */
+static Selectivity
+clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStats mvstats)
+{
+ bool fullmatch = false;
+ Selectivity s1 = 0.0, s2 = 0.0;
+
+ /*
+ * Lowest frequency in the MCV list (may be used as an upper bound
+ * for full equality conditions that did not match any MCV item).
+ */
+ Selectivity mcv_low = 0.0;
+
+ /* TODO Evaluate simple 1D selectivities, use the smallest one as
+ * an upper bound, product as lower bound, and sort the
+ * clauses in ascending order by selectivity (to optimize the
+ * MCV/histogram evaluation).
+ */
+
+ /* Evaluate the MCV first. */
+ s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ &fullmatch, &mcv_low);
+
+ /*
+ * If we got a full equality match on the MCV list, we're done (and
+ * the estimate is pretty good).
+ */
+ if (fullmatch && (s1 > 0.0))
+ return s1;
+
+ /* FIXME if (fullmatch) without matching MCV item, use the mcv_low
+ * selectivity as upper bound */
+
+ s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+
+ /* TODO clamp to <= 1.0 (or more strictly, when possible) */
+ return s1 + s2;
+}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ *
+ */
+static Bitmapset *
+collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid, Oid *relid)
+{
+ Index varno = 0;
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ Node *clause = (Node *) lfirst(l);
+
+ /* ignore the result for now - we only need the info */
+ is_mv_compatible(clause, varRelid, &varno, &attnums);
+ }
+
+ /*
+ * If there are at least two attributes referenced by the clause(s),
+ * fetch the relation info (and pass back the Oid of the relation).
+ */
+ if (bms_num_members(attnums) > 1)
+ {
+ RelOptInfo *rel = find_base_rel(root, varno);
+ *relid = root->simple_rte_array[bms_singleton_member(rel->relids)]->relid;
+ }
+ else
+ {
+ if (attnums != NULL)
+ pfree(attnums);
+ attnums = NULL;
+ *relid = InvalidOid;
+ }
+
+ return attnums;
+}
+
+/*
+ * We're looking for a histogram matching at least 2 attributes, and we
+ * want the smallest one available wrt. the number of dimensions (to
+ * get efficient estimation and likely better precision). The precision
+ * depends on the total number of buckets too, but the lower the number
+ * of dimensions, the smaller (and more precise) the buckets can get.
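+ *
+ * For example (hypothetical stats): with clauses on (a, b, c) and
+ * stats on [a,b], [b,c,d] and [a,b,c,e], the last one is chosen as it
+ * matches three columns; had there also been stats on [a,b,c], those
+ * would win instead - same number of matches, fewer dimensions.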
+ */
+static int
+choose_mv_histogram(int nmvstats, MVStats mvstats, Bitmapset *attnums)
+{
+ int i, j;
+
+ int choice = -1;
+ int current_matches = 1; /* goal #1: maximize */
+ int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+
+ for (i = 0; i < nmvstats; i++)
+ {
+ int matches = 0; /* columns matching this histogram */
+
+ int2vector * attrs = mvstats[i].stakeys;
+ int numattrs = mvstats[i].stakeys->dim1;
+
+ /* count columns covered by the histogram */
+ for (j = 0; j < numattrs; j++)
+ if (bms_is_member(attrs->values[j], attnums))
+ matches++;
+
+ /*
+ * Use this histogram when it improves the number of matches or
+ * when it keeps the number of matches and is smaller.
+ */
+ if ((matches > current_matches) ||
+ ((matches == current_matches) && (current_dims > numattrs)))
+ {
+ choice = i;
+ current_matches = matches;
+ current_dims = numattrs;
+ }
+ }
+
+ return choice;
+}
+
+/*
+ * This splits the clauses list into two parts - one containing clauses
+ * that will be evaluated using the chosen histogram, and the remaining
+ * clauses (either not mv-compatible, or not related to the histogram).
+ */
+static List *
+clauselist_mv_split(List *clauses, Oid varRelid, List **mvclauses, MVStats mvstats)
+{
+ int i;
+ ListCell *l;
+ List *non_mvclauses = NIL;
+
+ /* FIXME is there a better way to get info on int2vector? */
+ int2vector * attrs = mvstats->stakeys;
+ int numattrs = mvstats->stakeys->dim1;
+
+ /* erase the list of mv-compatible clauses */
+ *mvclauses = NIL;
+
+ foreach (l, clauses)
+ {
+ RestrictInfo *rinfo;
+ Node *clause = (Node *) lfirst(l);
+
+ /*
+ * Only RestrictInfo nodes may be mv-compatible, so everything else
+ * goes to the non-mv list directly.
+ *
+ * TODO create a macro/function to decide mv-compatible clauses
+ * (along the is_opclause for example)
+ */
+ if (! IsA(clause, RestrictInfo))
+ {
+ non_mvclauses = lappend(non_mvclauses, clause);
+ continue;
+ }
+
+ rinfo = (RestrictInfo *) clause;
+ clause = (Node*)rinfo->clause;
+
+ /* Pseudoconstants go directly to the non-mv list too. */
+ if (rinfo->pseudoconstant)
+ {
+ non_mvclauses = lappend(non_mvclauses, rinfo);
+ continue;
+ }
+
+ if (is_opclause(clause) && list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+
+ ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
+ (is_pseudo_constant_clause_relids(lsecond(expr->args),
+ rinfo->right_relids) ||
+ (varonleft = false,
+ is_pseudo_constant_clause_relids(linitial(expr->args),
+ rinfo->left_relids)));
+
+ if (ok)
+ {
+
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ {
+ non_mvclauses = lappend(non_mvclauses, rinfo);
+ continue;
+ }
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_SCALARLTSEL:
+ case F_SCALARGTSEL:
+ case F_EQSEL:
+ if (! IS_SPECIAL_VARNO(var->varno)) /* FIXME necessary here? */
+ {
+ bool match = false;
+ for (i = 0; i < numattrs; i++)
+ if (attrs->values[i] == var->varattno)
+ match = true;
+
+ if (match)
+ *mvclauses = lappend(*mvclauses, clause);
+ else
+ non_mvclauses = lappend(non_mvclauses, rinfo);
+ }
+ }
+ }
+ }
+ }
+
+ /*
+ * Perform regular estimation using the clauses incompatible
+ * with the chosen histogram (or MV stats in general).
+ */
+ return non_mvclauses;
+
+}
+
+/*
+ * Determines whether the clause is compatible with multivariate stats,
+ * and if it is, returns some additional information - varno (index
+ * into simple_rte_array) and a bitmap of attributes. This is then
+ * used to fetch related multivariate statistics.
+ *
+ * At this moment we only support basic conditions of the form
+ *
+ * variable OP constant
+ *
+ * where OP is one of [=,<,<=,>=,>] (which is however determined by
+ * looking at the associated function for estimating selectivity, just
+ * like with the single-dimensional case).
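+ *
+ * For example, (a = 1) or (b < 100) are compatible clauses, while
+ * (a = b), (a + b < 3) or (a IN (1, 2, 3)) are not (at least not in
+ * the current implementation).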
+ */
+static bool
+is_mv_compatible(Node *clause, Oid varRelid, Index *varno, Bitmapset **attnums)
+{
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ /* Pseudoconstants are not really interesting here. */
+ if (rinfo->pseudoconstant)
+ return false;
+
+ /* get the actual clause from the RestrictInfo ... */
+ clause = (Node*)rinfo->clause;
+
+ /* is it 'variable op constant' ? */
+ if (is_opclause(clause) && list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+
+ ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
+ (is_pseudo_constant_clause_relids(lsecond(expr->args),
+ rinfo->right_relids) ||
+ (varonleft = false,
+ is_pseudo_constant_clause_relids(linitial(expr->args),
+ rinfo->left_relids)));
+
+ if (ok)
+ {
+
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
+
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ * This uses the function for estimating selectivity, not the
+ * operator directly (a bit awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_SCALARLTSEL:
+ case F_SCALARGTSEL:
+ case F_EQSEL:
+ *varno = var->varno;
+ *attnums = bms_add_member(*attnums, var->varattno);
+ return true;
+ }
+ }
+ }
+ }
+
+ return false;
+
+}
+
+/*
+ * Estimate selectivity of clauses using a MCV list.
+ *
+ * If there's no MCV list for the stats, the function returns 0.0.
+ *
+ * While computing the estimate, the function checks whether all the
+ * columns were matched with an equality condition. If that's the case,
+ * it's assumed we can skip computing the estimate from histogram,
+ * because all the rows matching the condition are represented by the
+ * MCV item.
+ *
+ * The function also returns the frequency of the least frequent item
+ * on the MCV list, which may be useful for clamping the estimate
+ * from the histogram.
+ */
+static Selectivity
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
+ MVStats mvstats, bool *fullmatch,
+ Selectivity *lowsel)
+{
+ int i;
+ Selectivity s = 0.0;
+ ListCell * l;
+ char * mcvitems = NULL;
+ MCVList mcvlist = NULL;
+
+ Bitmapset *matches = NULL; /* attributes with equality matches */
+
+ /* there's no MCV list yet */
+ if (! mvstats->mcv_built)
+ return 0.0;
+
+ mcvlist = deserialize_mv_mcvlist(fetch_mv_mcvlist(mvstats->mvoid));
+
+ Assert(mcvlist != NULL);
+ Assert (clauses != NIL);
+ Assert (list_length(clauses) >= 2);
+
+ mcvitems = palloc0(sizeof(char) * mcvlist->nitems);
+ memset(mcvitems, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
+
+ /* start at 1.0, lowered as we walk through the MCV items */
+ *lowsel = 1.0;
+
+ /* loop through the list of MV-compatible clauses and do the estimation */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ /* operator */
+ FmgrInfo opproc;
+
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+
+ FmgrInfo ltproc, gtproc;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ /*
+ * TODO Fetch only when really needed (probably for equality only)
+ * TODO Technically either lt/gt is sufficient.
+ *
+ * FIXME The code in analyze.c creates histograms only for types
+ * with enough ordering (by calling get_sort_group_operators).
+ * Is this the same assumption, i.e. are we certain that we
+ * get the ltproc/gtproc every time we ask? Or are there types
+ * where get_sort_group_operators returns ltopr and here we
+ * get nothing?
+ */
+ TypeCacheEntry *typecache = lookup_type_cache(var->vartype, TYPECACHE_EQ_OPR | TYPECACHE_LT_OPR | TYPECACHE_GT_OPR);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, mvstats->stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), <proc);
+ fmgr_info(get_opcode(typecache->gt_opr), >proc);
+
+ /* process the MCV list first */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ bool tmp;
+ MCVItem item = mcvlist->items[i];
+
+ /* find the lowest selectivity in the MCV */
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+
+ /* skip MCV items already ruled out */
+ if (mcvitems[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /* TODO consider bsearch here (list is sorted by values)
+ * TODO handle other operators too (LT, GT)
+ * TODO identify "full match" when the clauses fully
+ * match the whole MCV list (so that checking the
+ * histogram is not needed)
+ */
+ if (get_oprrest(expr->opno) == F_EQSEL)
+ {
+ /*
+ * We don't care about isgt in equality, because it does not matter
+ * whether it's (var = const) or (const = var).
+ */
+ if (memcmp(&cst->constvalue, &item->values[idx], sizeof(Datum)) != 0)
+ mcvitems[i] = MVSTATS_MATCH_NONE;
+ else
+ matches = bms_add_member(matches, idx);
+ }
+ else if (get_oprrest(expr->opno) == F_SCALARLTSEL) /* column < constant */
+ {
+
+ if (! isgt) /* (var < const) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ if (tmp)
+ {
+ mcvitems[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+
+ } /* (var < const) */
+ else /* (const < var) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ item->values[idx],
+ cst->constvalue));
+ if (tmp)
+ {
+ mcvitems[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+ }
+ }
+ else if (get_oprrest(expr->opno) == F_SCALARGTSEL) /* column > constant */
+ {
+
+ if (! isgt) /* (var > const) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+ if (tmp)
+ {
+ mcvitems[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+
+ }
+ else /* (const > var) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in
+ * that case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ item->values[idx],
+ cst->constvalue));
+ if (tmp)
+ {
+ mcvitems[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+ }
+
+ } /* (get_oprrest(expr->opno) == F_SCALARGTSEL) */
+
+ }
+ }
+ }
+
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ if (mcvitems[i] != MVSTATS_MATCH_NONE)
+ s += mcvlist->items[i]->frequency;
+ }
+
+ *fullmatch = (bms_num_members(matches) == mcvlist->ndimensions);
+
+ pfree(mcvitems);
+ pfree(mcvlist);
+
+ return s;
+}
+
+/*
+ * Estimate selectivity of clauses using a histogram.
+ *
+ * If there's no histogram for the stats, the function returns 0.0.
+ */
+static Selectivity
+clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
+ MVStats mvstats)
+{
+ int i;
+ Selectivity s = 0.0;
+ ListCell * l;
+ char *buckets = NULL;
+ MVHistogram mvhist = NULL;
+
+ /* there's no histogram */
+ if (! mvstats->hist_built)
+ return 0.0;
+
+ /* There may be no histogram in the stats (check hist_built flag) */
+ mvhist = deserialize_mv_histogram(fetch_mv_histogram(mvstats->mvoid));
+
+ Assert (mvhist != NULL);
+ Assert (clauses != NIL);
+ Assert (list_length(clauses) >= 2);
+
+ /*
+ * Bitmap of bucket matches (mismatch, partial, full). By default
+ * all buckets fully match, and the clauses can only lower that.
+ */
+ buckets = palloc0(sizeof(char) * mvhist->nbuckets);
+ memset(buckets, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+
+ /* loop through the clauses and do the estimation */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ FmgrInfo opproc; /* operator */
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+ FmgrInfo ltproc;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ /*
+ * TODO Fetch only when really needed (probably for equality only)
+ *
+ * TODO Technically either lt/gt is sufficient.
+ *
+ * FIXME The code in analyze.c creates histograms only for types
+ * with enough ordering (by calling get_sort_group_operators).
+ * Is this the same assumption, i.e. are we certain that we
+ * get the ltproc/gtproc every time we ask? Or are there types
+ * where get_sort_group_operators returns ltopr and here we
+ * get nothing?
+ */
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_EQ_OPR | TYPECACHE_LT_OPR
+ | TYPECACHE_GT_OPR);
+
+ /* lookup dimension for the attribute */
+ int idx = mv_get_index(var->varattno, mvstats->stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), &ltproc);
+
+ /*
+ * Check this for all buckets that haven't been eliminated yet.
+ *
+ * We already know the clauses use suitable operators (because that's
+ * how we filtered them).
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ bool tmp;
+ MVBucket bucket = mvhist->buckets[i];
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match)
+ */
+ if (buckets[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /*
+ * Decide how the bucket matches the clause (no match, partial or
+ * full match), using the appropriate operator.
+ *
+ * TODO I'm really unsure whether the handling of the 'isgt' flag (that
+ * is, clauses with reverse order of variable/constant) is correct. I wouldn't
+ * be surprised if there was some mixup. Using the lt/gt operators
+ * instead of messing with the opproc could make it simpler.
+ * It would however be using a different operator than the query,
+ * although it's not any shadier than using the selectivity function
+ * as is done currently.
+ *
+ * FIXME Once the min/max values are deduplicated, we can easily minimize
+ * the number of calls to the comparator (assuming we keep the
+ * deduplicated structure). See the note on compression at MVBucket
+ * serialize/deserialize methods.
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_SCALARLTSEL: /* column < constant */
+
+ if (! isgt) /* (var < const) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->min[idx]));
+ if (tmp)
+ {
+ buckets[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+
+ /*
+ * Now check whether the upper boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->max[idx]));
+
+ if (tmp)
+ buckets[i] = MVSTATS_MATCH_PARTIAL; /* partial match */
+ }
+ else /* (const < var) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ bucket->max[idx],
+ cst->constvalue));
+ if (tmp)
+ {
+ buckets[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+
+ /*
+ * Now check whether the lower boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ bucket->min[idx],
+ cst->constvalue));
+
+ if (tmp)
+ buckets[i] = MVSTATS_MATCH_PARTIAL; /* partial match */
+ }
+ break;
+
+ case F_SCALARGTSEL: /* column > constant */
+
+ if (! isgt) /* (var > const) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->max[idx]));
+ if (tmp)
+ {
+ buckets[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+
+ /*
+ * Now check whether the lower boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->min[idx]));
+
+ if (tmp)
+ buckets[i] = MVSTATS_MATCH_PARTIAL; /* partial match */
+ }
+ else /* (const > var) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in
+ * that case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ bucket->min[idx],
+ cst->constvalue));
+ if (tmp)
+ {
+ buckets[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+
+ /*
+ * Now check whether the upper boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ bucket->max[idx],
+ cst->constvalue));
+
+ if (tmp)
+ buckets[i] = MVSTATS_MATCH_PARTIAL; /* partial match */
+ }
+
+ break;
+
+ case F_EQSEL:
+
+ /*
+ * We only check whether the value is within the bucket, using the lt/gt
+ * operators fetched from the type cache.
+ *
+ * TODO We'll use the default 50% estimate, but that's probably way off
+ * if there are multiple distinct values. Consider tweaking this
+ * somehow, e.g. using only a part inversely proportional to the
+ * estimated number of distinct values in the bucket.
+ *
+ * TODO This does not handle inclusion flags at the moment, thus counting
+ * some buckets twice (when hitting the boundary).
+ *
+ * TODO Optimization is that if max[i] == min[i], it's effectively an MCV
+ * item and we can count the whole bucket as a complete match (thus
+ * using 100% bucket selectivity and not just 50%).
+ *
+ * TODO Technically some buckets may "degenerate" into single-value
+ * buckets (not necessarily for all the dimensions) - maybe this
+ * is better than keeping a separate MCV list (multi-dimensional).
+ * Update: Actually, that's unlikely to be better than a separate
+ * MCV list for two reasons - first, it requires ~2x the space
+ * (because of storing lower/upper boundaries) and second because
+ * the buckets are ranges - depending on the partitioning algorithm
+ * it may not even degenerate into a (min=max) bucket. For example,
+ * the current partitioning algorithm never does that.
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&ltproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->min[idx]));
+
+ if (tmp)
+ {
+ buckets[i] = MVSTATS_MATCH_NONE; /* constvalue < min */
+ continue;
+ }
+
+ tmp = DatumGetBool(FunctionCall2Coll(&ltproc,
+ DEFAULT_COLLATION_OID,
+ bucket->max[idx],
+ cst->constvalue));
+
+ if (tmp)
+ {
+ buckets[i] = MVSTATS_MATCH_NONE; /* constvalue > max */
+ continue;
+ }
+
+ /* partial match */
+ buckets[i] = MVSTATS_MATCH_PARTIAL;
+
+ break;
+ }
+ }
+ }
+ }
+
+ /* now, walk through the buckets and sum the selectivities */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ if (buckets[i] == MVSTATS_MATCH_FULL)
+ s += mvhist->buckets[i]->ntuples;
+ else if (buckets[i] == MVSTATS_MATCH_PARTIAL)
+ s += 0.5 * mvhist->buckets[i]->ntuples;
+ }
+
+ return s;
+}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 77d2f29..038c878 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -365,6 +365,13 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
create_generic_options alter_generic_options
relation_expr_list dostmt_opt_list
+%type <list> OptStatsOptions
+%type <str> stats_options_name
+%type <node> stats_options_arg
+%type <defelt> stats_options_elem
+%type <list> stats_options_list
+
+
%type <list> opt_fdw_options fdw_options
%type <defelt> fdw_option
@@ -483,7 +490,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <keyword> unreserved_keyword type_func_name_keyword
%type <keyword> col_name_keyword reserved_keyword
-%type <node> TableConstraint TableLikeClause
+%type <node> TableConstraint TableLikeClause TableStatistics
%type <ival> TableLikeOptionList TableLikeOption
%type <list> ColQualList
%type <node> ColConstraint ColConstraintElem ConstraintAttr
@@ -2327,6 +2334,14 @@ alter_table_cmd:
n->subtype = AT_DisableRowSecurity;
$$ = (Node *)n;
}
+ /* ALTER TABLE <name> ADD STATISTICS (options) ON (columns) ... */
+ | ADD_P TableStatistics
+ {
+ AlterTableCmd *n = makeNode(AlterTableCmd);
+ n->subtype = AT_AddStatistics;
+ n->def = $2;
+ $$ = (Node *)n;
+ }
| alter_generic_options
{
AlterTableCmd *n = makeNode(AlterTableCmd);
@@ -3397,6 +3412,56 @@ OptConsTableSpace: USING INDEX TABLESPACE name { $$ = $4; }
ExistingIndex: USING INDEX index_name { $$ = $3; }
;
+/*****************************************************************************
+ *
+ * QUERY :
+ * ALTER TABLE relname ADD STATISTICS (options) ON (columns)
+ *
+ *****************************************************************************/
+
+TableStatistics:
+ STATISTICS OptStatsOptions ON '(' columnList ')'
+ {
+ StatisticsDef *n = makeNode(StatisticsDef);
+ n->keys = $5;
+ n->options = $2;
+ $$ = (Node *) n;
+ }
+ ;
+
+OptStatsOptions:
+ '(' stats_options_list ')' { $$ = $2; }
+ | /*EMPTY*/ { $$ = NIL; }
+ ;
+
+stats_options_list:
+ stats_options_elem
+ {
+ $$ = list_make1($1);
+ }
+ | stats_options_list ',' stats_options_elem
+ {
+ $$ = lappend($1, $3);
+ }
+ ;
+
+stats_options_elem:
+ stats_options_name stats_options_arg
+ {
+ $$ = makeDefElem($1, $2);
+ }
+ ;
+
+stats_options_name:
+ NonReservedWord { $$ = $1; }
+ ;
+
+stats_options_arg:
+ opt_boolean_or_string { $$ = (Node *) makeString($1); }
+ | NumericOnly { $$ = (Node *) $1; }
+ | /* EMPTY */ { $$ = NULL; }
+ ;
+
/*****************************************************************************
*
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 94d951c..ec90773 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -43,6 +43,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_language.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -499,6 +500,17 @@ static const struct cachedesc cacheinfo[] = {
},
4
},
+ {MvStatisticRelationId, /* MVSTATOID */
+ MvStatisticOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 128
+ },
{NamespaceRelationId, /* NAMESPACENAME */
NamespaceNameIndexId,
1,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 870692c..d57cdbe 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -173,6 +173,11 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
#define LargeObjectMetadataOidIndexId 2996
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_oid_index, 3259, on pg_mv_statistic using btree(oid oid_ops));
+#define MvStatisticOidIndexId 3259
+DECLARE_INDEX(pg_mv_statistic_relid_index, 3264, on pg_mv_statistic using btree(starelid oid_ops));
+#define MvStatisticRelidIndexId 3264
+
DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
#define NamespaceNameIndexId 2684
DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
new file mode 100644
index 0000000..703931e
--- /dev/null
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -0,0 +1,89 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_mv_statistic.h
+ * definition of the system "multivariate statistic" relation (pg_mv_statistic)
+ * along with the relation's initial contents.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_mv_statistic.h
+ *
+ * NOTES
+ * the genbki.pl script reads this file and generates .bki
+ * information from the DATA() statements.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_MV_STATISTIC_H
+#define PG_MV_STATISTIC_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_mv_statistic definition. cpp turns this into
+ * typedef struct FormData_pg_mv_statistic
+ * ----------------
+ */
+#define MvStatisticRelationId 3260
+
+CATALOG(pg_mv_statistic,3260)
+{
+ /* These fields form the unique key for the entry: */
+ Oid starelid; /* relation containing attributes */
+
+ /* statistics requested to build */
+ bool hist_enabled; /* build histogram? */
+ bool mcv_enabled; /* build MCV list? */
+ bool mcv_hashed; /* build hashed MCV? */
+ bool assoc_enabled; /* analyze associations? */
+
+ /* histogram / MCV size */
+ int32 hist_max_buckets; /* max buckets */
+ int32 mcv_max_items; /* max MCV items */
+
+ /* statistics that are available (if requested) */
+ bool hist_built; /* histogram was built */
+ bool mcv_built; /* MCV list was built */
+ bool assoc_built; /* associations were built */
+
+ /* variable-length fields start here, but we allow direct access to stakeys */
+ int2vector stakeys; /* array of column keys */
+
+#ifdef CATALOG_VARLEN
+ bytea staassoc; /* association rules (serialized) */
+ bytea stamcv; /* MCV list (serialized) */
+ bytea stahist; /* MV histogram (serialized) */
+#endif
+
+} FormData_pg_mv_statistic;
+
+/* ----------------
+ * Form_pg_mv_statistic corresponds to a pointer to a tuple with
+ * the format of pg_mv_statistic relation.
+ * ----------------
+ */
+typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
+
+/* ----------------
+ * compiler constants for pg_mv_statistic
+ * ----------------
+ */
+#define Natts_pg_mv_statistic 14
+#define Anum_pg_mv_statistic_starelid 1
+#define Anum_pg_mv_statistic_hist_enabled 2
+#define Anum_pg_mv_statistic_mcv_enabled 3
+#define Anum_pg_mv_statistic_mcv_hashed 4
+#define Anum_pg_mv_statistic_assoc_enabled 5
+#define Anum_pg_mv_statistic_hist_max_buckets 6
+#define Anum_pg_mv_statistic_mcv_max_items 7
+#define Anum_pg_mv_statistic_hist_built 8
+#define Anum_pg_mv_statistic_mcv_built 9
+#define Anum_pg_mv_statistic_assoc_built 10
+#define Anum_pg_mv_statistic_stakeys 11
+#define Anum_pg_mv_statistic_staassoc 12
+#define Anum_pg_mv_statistic_stamcv 13
+#define Anum_pg_mv_statistic_stahist 14
+
+#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 3ce9849..6961b7c 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2647,6 +2647,13 @@ DESCR("current user privilege on any column by rel name");
DATA(insert OID = 3029 ( has_any_column_privilege PGNSP PGUID 12 10 0 0 0 f f f f t f s 2 0 16 "26 25" _null_ _null_ _null_ _null_ has_any_column_privilege_id _null_ _null_ _null_ ));
DESCR("current user privilege on any column by rel oid");
+DATA(insert OID = 3261 ( pg_mv_stats_histogram_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_histogram_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: histogram info");
+DATA(insert OID = 3262 ( pg_mv_stats_mvclist_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_mvclist_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: MCV list info");
+DATA(insert OID = 3263 ( pg_mv_stats_histogram_gnuplot PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_histogram_gnuplot _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: 2D histogram gnuplot");
+
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
DATA(insert OID = 1929 ( pg_stat_get_tuples_returned PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_tuples_returned _null_ _null_ _null_ ));
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index a4af551..02b9aa3 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -49,6 +49,7 @@ extern void BootstrapToastTable(char *relName,
DECLARE_TOAST(pg_attrdef, 2830, 2831);
DECLARE_TOAST(pg_constraint, 2832, 2833);
DECLARE_TOAST(pg_description, 2834, 2835);
+DECLARE_TOAST(pg_mv_statistic, 3952, 3954);
DECLARE_TOAST(pg_proc, 2836, 2837);
DECLARE_TOAST(pg_rewrite, 2838, 2839);
DECLARE_TOAST(pg_seclabel, 3598, 3599);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 154d943..36e675b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -410,6 +410,7 @@ typedef enum NodeTag
T_XmlSerialize,
T_WithClause,
T_CommonTableExpr,
+ T_StatisticsDef,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index f3aa69e..e7ed773 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -542,6 +542,14 @@ typedef struct ColumnDef
int location; /* parse location, or -1 if none/unknown */
} ColumnDef;
+typedef struct StatisticsDef
+{
+ NodeTag type;
+ List *keys; /* String nodes naming referenced column(s) */
+ List *options; /* list of DefElem nodes */
+} StatisticsDef;
+
+
/*
* TableLikeClause - CREATE TABLE ( ... LIKE ... ) clause
*/
@@ -1337,7 +1345,8 @@ typedef enum AlterTableType
AT_ReplicaIdentity, /* REPLICA IDENTITY */
AT_EnableRowSecurity, /* ENABLE ROW SECURITY */
AT_DisableRowSecurity, /* DISABLE ROW SECURITY */
- AT_GenericOptions /* OPTIONS (...) */
+ AT_GenericOptions, /* OPTIONS (...) */
+ AT_AddStatistics /* add statistics */
} AlterTableType;
typedef struct ReplicaIdentityStmt
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
new file mode 100644
index 0000000..157891a
--- /dev/null
+++ b/src/include/utils/mvstats.h
@@ -0,0 +1,283 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvstats.h
+ * Multivariate statistics and selectivity estimation functions.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/mvstats.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MVSTATS_H
+#define MVSTATS_H
+
+/*
+ * Multivariate statistics for planner/optimizer, implementing extensions
+ * of the single-column statistics:
+ *
+ * - multivariate MCV list
+ * - multivariate histograms
+ *
+ * There's also an experimental support for associative rules (values in
+ * one column implying values in other columns - e.g. ZIP code implies
+ * name of a city, etc.).
+ *
+ * The current implementation has various limitations:
+ *
+ * (a) it supports only data types passed by value
+ *
+ * (b) no support for NULL values
+ *
+ * Both (a) and (b) should be straightforward to fix (and usually
+ * described in comments at related data structures or functions).
+ *
+ * The stats may be built only directly on columns, not on expressions.
+ * And there are usually some additional technical limits (e.g. number
+ * of columns in a histogram, etc.).
+ *
+ * Those limits serve mostly as sanity checks and while increasing them
+ * is possible (the implementation should not break), it's expected to
+ * lead either to very bad precision or expensive planning.
+ */
+
+/*
+ * Multivariate histograms
+ *
+ * Histograms are a collection of buckets, represented by n-dimensional
+ * rectangles. Each rectangle is delimited by an array of lower and
+ * upper boundaries, so that for the i-th attribute
+ *
+ * min[i] <= value[i] <= max[i]
+ *
+ * Each bucket tracks frequency (fraction of tuples it contains),
+ * information about the inequalities, number of distinct values in
+ * each dimension (which is used when building the histogram) etc.
+ *
+ * The boundaries may be either inclusive or exclusive, or the whole
+ * dimension may be NULL.
+ *
+ * The buckets may overlap (assuming the build algorithm keeps the
+ * frequencies additive) or may not cover the whole space (i.e. allow
+ * gaps). This entirely depends on the algorithm used to build the
+ * histogram.
+ *
+ * The histograms are marked with a 'magic' constant, mostly to make
+ * sure the bytea really is a histogram in serialized form.
+ *
+ * We do expect to support multiple histogram types, with different
+ * features etc. The 'type' field is used to identify those types.
+ * Technically some histogram types might use completely different
+ * bucket representation, but that's not expected at the moment.
+ *
+ * TODO Add pointer to 'private' data, meant for private data for
+ * other algorithms for building the histogram.
+ *
+ * TODO The current implementation does not handle NULL values (it's
+ * somewhat prepared for that, but the algorithm building the
+ * histogram ignores them). The idea is to build buckets with one
+ * or more NULL-only dimensions - there'll be at most 2^ndimensions
+ * such buckets, which for 8 attributes (current limit) is 256.
+ * That's quite reasonable, considering we expect thousands of
+ * buckets in total.
+ *
+ * TODO This structure is used both when building the histogram, and
+ * then when using it to compute estimates. That's why the last
+ * few elements are not used once the histogram is built.
+ *
+ * TODO The limit on number of buckets is quite arbitrary, aiming for
+ * sufficient accuracy while still being fast. Probably should be
+ * replaced with a dynamic limit dependent on statistics target,
+ * number of attributes (dimensions) and statistics target
+ * associated with the attributes. Also, this needs to be related
+ * to the number of sampled rows, by either clamping it to a
+ * reasonable number (after seeing the number of rows) or using
+ * it when computing the number of rows to sample. Something like
+ * 10 rows per bucket seems reasonable.
+ *
+ * TODO We may replace the bool arrays with a suitably large data type
+ * (say, uint16 or uint32) and get rid of the allocations. It's
+ * unlikely we'll ever support more than 32 columns as that'd
+ * result in poor precision, huge histograms (splitting each
+ * dimension once would mean 2^32 buckets), and very expensive
+ * estimation. MCVItem already does it this way.
+ *
+ * TODO Actually the distinct stats (both for combination of all columns
+ * and for combinations of various subsets of columns) should be
+ * moved to a separate structure (next to histogram/MCV/...) to
+ * make it useful even without a histogram computed etc.
+ */
+typedef struct MVBucketData {
+
+ /* Frequencies of this bucket. */
+ float ntuples; /* frequency of tuples */
+ float ndistinct; /* frequency of distinct values */
+
+ /*
+ * Number of distinct values in each dimension. This is used when
+ * building the histogram (and is not serialized/deserialized), but
+ * it could be useful for estimating ndistinct for combinations of
+ * columns.
+ *
+ * It would mean tracking 2^N values for each bucket, and even if
+ * those values might be stored in 1B each, it's still a lot of space
+ * (considering the expected number of buckets).
+ *
+ * TODO Consider tracking ndistincts for all attribute combinations.
+ */
+ uint32 *ndistincts;
+
+ /*
+ * Information about dimensions being NULL-only. Not yet used.
+ */
+ bool *nullsonly;
+
+ /* lower boundaries - values and information about the inequalities */
+ Datum *min;
+ bool *min_inclusive;
+
+ /* upper boundaries - values and information about the inequalities */
+ Datum *max;
+ bool *max_inclusive;
+
+ /*
+ * Sample tuples falling into this bucket, index of the dimension
+ * the bucket was split by in the last step.
+ *
+ * XXX These fields are needed only while building the histogram,
+ * and are not serialized at all.
+ */
+ HeapTuple *rows;
+ uint32 numrows;
+ int last_split_dimension;
+
+} MVBucketData;
+
+typedef MVBucketData *MVBucket;
+
+
+typedef struct MVHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ MVBucket *buckets; /* array of buckets */
+
+} MVHistogramData;
+
+typedef MVHistogramData *MVHistogram;
+
+
+/* used to flag stats serialized to bytea */
+#define MVHIST_MAGIC 0x7F8C5670 /* marks serialized bytea */
+#define MVHIST_TYPE_BASIC 1 /* basic histogram type */
+
+/* limits (mostly sanity check, may be relaxed in the future) */
+#define MVHIST_MAX_BUCKETS 16384 /* max number of buckets */
+
+/* bucket size in a serialized form */
+#define BUCKET_SIZE_SERIALIZED(ndims) \
+ (offsetof(MVBucketData, ndistincts) + \
+ (ndims) * (2 * sizeof(uint16) + sizeof(uint32) + 3 * sizeof(bool)))
+
+
+/*
+ * Multivariate MCV (most-common value) lists
+ *
+ * A straightforward extension of MCV items - i.e. a list (array) of
+ * combinations of attribute values, together with a frequency and
+ * null flags.
+ *
+ * This already uses the trick with using uint32 as a null bitmap.
+ *
+ * TODO Shouldn't the MCVItemData use a plain pointer for values,
+ * instead of the single-item array trick?
+ *
+ * TODO It's possible to build a special case of MCV list, storing not
+ * the actual values but only 32/64-bit hash. This is only useful
+ * for estimating equality clauses and for large varlena types.
+ */
+typedef struct MCVItemData {
+ double frequency; /* frequency of this combination */
+ uint32 nulls; /* flags of NULL values (up to 32 columns) */
+ Datum values[1]; /* variable-length (ndimensions) */
+} MCVItemData;
+
+typedef MCVItemData *MCVItem;
+
+/* multivariate MCV list - essentially an array of MCV items */
+typedef struct MCVListData {
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of MCV list (BASIC) */
+ uint32 ndimensions; /* number of dimensions */
+ uint32 nitems; /* number of MCV items in the array */
+ MCVItem *items; /* array of MCV items */
+} MCVListData;
+
+typedef MCVListData *MCVList;
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_MCV_MAGIC 0xE1A651C2 /* marks serialized bytea */
+#define MVSTAT_MCV_TYPE_BASIC 1 /* basic MCV list type */
+
+/* TODO consider increasing the limit, and/or using statistics target */
+#define MVSTAT_MCVLIST_MAX_ITEMS 1024 /* max items in MCV list */
+
+
+/*
+ * Basic info about the stats, used when choosing what to use
+ *
+ * TODO Add info about what statistics are available (histogram, MCV,
+ * hashed MCV, associative rules).
+ */
+typedef struct MVStatsData {
+ Oid mvoid; /* OID of the stats in pg_mv_statistic */
+ int2vector *stakeys; /* attnums for columns in the stats */
+ bool hist_built; /* histogram is already available */
+ bool mcv_built; /* MCV list is already available */
+ bool assoc_built; /* associative rules available */
+} MVStatsData;
+
+typedef struct MVStatsData *MVStats;
+
+
+/*
+ * Degree of how much MCV item / histogram bucket matches a clause.
+ * This is then considered when computing the selectivity.
+ */
+#define MVSTATS_MATCH_NONE 0 /* no match at all */
+#define MVSTATS_MATCH_PARTIAL 1 /* partial match */
+#define MVSTATS_MATCH_FULL 2 /* full match */
+
+
+#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
+
+/*
+ * TODO Maybe fetching the histogram/MCV list separately is inefficient?
+ * Consider adding a single `fetch_stats` method, fetching all
+ * stats specified using flags (or something like that).
+ */
+MVStats list_mv_stats(Oid relid, int *nstats, bool built_only);
+bytea * fetch_mv_histogram(Oid mvoid);
+bytea * fetch_mv_mcvlist(Oid mvoid);
+
+/* deserialization of stats (serialization is private to analyze) */
+MVHistogram deserialize_mv_histogram(bytea * data);
+MCVList deserialize_mv_mcvlist(bytea * data);
+
+/*
+ * Returns index of the attribute number within the vector (i.e. a
+ * dimension within the stats).
+ */
+int mv_get_index(AttrNumber varattno, int2vector * stakeys);
+
+/* FIXME this probably belongs somewhere else (not to operations stats) */
+extern Datum pg_mv_stats_histogram_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_histogram_gnuplot(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_mvclist_info(PG_FUNCTION_ARGS);
+
+#endif
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index f97229f..a275bd5 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -66,6 +66,7 @@ enum SysCacheIdentifier
INDEXRELID,
LANGNAME,
LANGOID,
+ MVSTATOID,
NAMESPACENAME,
NAMESPACEOID,
OPERNAMENSP,
Tomas Vondra wrote:
attached is a WIP patch implementing multivariate statistics.
I think that is pretty useful.
Oracle has an identical feature called "extended statistics".
That's probably an entirely different thing, but it would be very
nice to have statistics to estimate the correlation between columns
of different tables, to improve the estimate for the number of rows
in a join.
Yours,
Laurenz Albe
Hi!
On 13.10.2014 09:36, Albe Laurenz wrote:
Tomas Vondra wrote:
attached is a WIP patch implementing multivariate statistics.
I think that is pretty useful.
Oracle has an identical feature called "extended statistics".
That's probably an entirely different thing, but it would be very
nice to have statistics to estimate the correlation between columns
of different tables, to improve the estimate for the number of rows
in a join.
I don't have a clear idea of how that should work, but from the quick
look at how join selectivity estimation is implemented, I believe two
things might be possible:
(a) using conditional probabilities
Say we have a join "ta JOIN tb ON (ta.x = tb.y)"
Currently, the selectivity is derived from stats on the two keys.
Essentially probabilities P(x), P(y), represented by the MCV lists.
But if there are additional WHERE conditions on the tables, and we
have suitable multivariate stats, it's possible to use conditional
probabilities.
E.g. if the query actually uses
... ta JOIN tb ON (ta.x = tb.y) WHERE ta.z = 10
and we have stats on (ta.x, ta.z), we can use P(x|z=10) instead.
If the two columns are correlated, this might be much different.
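Just to illustrate with made-up numbers: if P(x=5) = 0.01 overall, but
P(x=5|z=10) = 0.5 because of the correlation, the contribution of the
(z=10) rows to the join estimate changes by a factor of 50 compared to
the independence assumption.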
(b) using this for multi-column conditions
If the join condition involves multiple columns, e.g.
ON (ta.x = tb.y AND ta.p = tb.q)
and we happen to have stats on (ta.x,ta.p) and (tb.y,tb.q), we may
use this to compute the cardinality (pretty much as we do today).
But I haven't really worked on this so far, I suspect there are various
subtle issues and I certainly don't plan to address this in the first
phase of the patch.
Tomas
On Mon, Oct 13, 2014 at 11:00 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
Hi,
attached is a WIP patch implementing multivariate statistics. The code
certainly is not "ready" - parts of it look as if written by a rogue
chimp who got bored of attempts to type the complete works of William
Shakespeare, and decided to try something different.
I'm really glad you're working on this. I had been thinking of looking into
doing this myself.
The last point is really just "unfinished implementation" - the syntax I
propose is this:
ALTER TABLE ... ADD STATISTICS (options) ON (columns)
where the options influence the MCV list and histogram size, etc. The
options are recognized and may give you an idea of what it might do, but
it's not really used at the moment (except for storing in the
pg_mv_statistic catalog).
I've not really gotten around to looking at the patch yet, but I'm also
wondering if it would be simple to include functional statistics too.
The pg_mv_statistic name seems to indicate multi columns, but how about
stats on date(datetime_column), or perhaps any non-volatile function. This
would help to solve the problem highlighted here
/messages/by-id/CAApHDvp2vH=7O-gp-zAf7aWy+A-WHWVg7h3Vc6=5pf9Uf34DhQ@mail.gmail.com
. Without giving it too much thought, perhaps any expression that can be
indexed should be allowed to have stats? Would that be really difficult to
implement in comparison to what you've already done with the patch so far?
I'm quite interested in reviewing your work on this, but it appears that
some of your changes are not C89:
src\backend\commands\analyze.c(3774): error C2057: expected constant
expression [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(3774): error C2466: cannot allocate an
array of constant size 0 [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(3774): error C2133: 'indexes' : unknown
size [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4302): error C2057: expected constant
expression [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4302): error C2466: cannot allocate an
array of constant size 0 [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4302): error C2133: 'ndistincts' : unknown
size [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4775): error C2057: expected constant
expression [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4775): error C2466: cannot allocate an
array of constant size 0 [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4775): error C2133: 'keys' : unknown size
[D:\Postgres\a\postgres.vcxproj]
The compiler I'm using is a bit too stupid to understand the C99 syntax.
I guess you'd need to palloc() these arrays instead in order to comply with
the project standards.
http://www.postgresql.org/docs/devel/static/install-requirements.html
I'm going to sign myself up to review this, so probably my first feedback
would be the compiling problem.
Regards
David Rowley
On 29 October 2014 10:41, David Rowley wrote:
I've not really gotten around to looking at the patch yet, but I'm also
wondering if it would be simple to include functional statistics
too.
The pg_mv_statistic name seems to indicate multi columns, but how about
stats on date(datetime_column), or perhaps any non-volatile function. This
would help to solve the problem highlighted here
/messages/by-id/CAApHDvp2vH=7O-gp-zAf7aWy+A-WHWVg7h3Vc6=5pf9Uf34DhQ@mail.gmail.com
. Without giving it too much thought, perhaps any expression that can be
indexed should be allowed to have stats? Would that be really difficult to
implement in comparison to what you've already done with the patch so far?
I don't know, but it seems mostly orthogonal to what the patch aims to do.
If we add collecting statistics on expressions (on a single column), then I'd
expect it to be reasonably simple to add this to the multi-column case.
There are features like join stats or range type stats, that are probably
more directly related to the patch (but out of scope for the initial
version).
I'm quite interested in reviewing your work on this, but it appears that
some of your changes are not C89:
src\backend\commands\analyze.c(3774): error C2057: expected constant
expression [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(3774): error C2466: cannot allocate an
array of constant size 0 [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(3774): error C2133: 'indexes' : unknown
size [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4302): error C2057: expected constant
expression [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4302): error C2466: cannot allocate an
array of constant size 0 [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4302): error C2133: 'ndistincts' : unknown
size [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4775): error C2057: expected constant
expression [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4775): error C2466: cannot allocate an
array of constant size 0 [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4775): error C2133: 'keys' : unknown size
[D:\Postgres\a\postgres.vcxproj]
The compiler I'm using is a bit too stupid to understand the C99 syntax.
I guess you'd need to palloc() these arrays instead in order to comply with
the project standards.
http://www.postgresql.org/docs/devel/static/install-requirements.html
I'm going to sign myself up to review this, so probably my first feedback
would be the compiling problem.
I'll look into that. The thing is I don't have access to MSVC, so it's a bit
difficult to spot / fix those issues :-(
regards
Tomas
On 29/10/14 10:41, David Rowley wrote:
On Mon, Oct 13, 2014 at 11:00 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
The last point is really just "unfinished implementation" - the syntax I
propose is this:
ALTER TABLE ... ADD STATISTICS (options) ON (columns)
where the options influence the MCV list and histogram size, etc. The
options are recognized and may give you an idea of what it might do, but
it's not really used at the moment (except for storing in the
pg_mv_statistic catalog).
I've not really gotten around to looking at the patch yet, but I'm also
wondering if it would be simple to include functional statistics
too. The pg_mv_statistic name seems to indicate multi columns, but how
about stats on date(datetime_column), or perhaps any non-volatile
function. This would help to solve the problem highlighted here
/messages/by-id/CAApHDvp2vH=7O-gp-zAf7aWy+A-WHWVg7h3Vc6=5pf9Uf34DhQ@mail.gmail.com
. Without giving it too much thought, perhaps any expression that can be
indexed should be allowed to have stats? Would that be really difficult
to implement in comparison to what you've already done with the patch so
far?
I would not over-complicate requirements for the first version of this,
I think it's already complicated enough.
A quick look at the patch suggests that it mainly needs discussion about
design and particular implementation choices, there is a fair amount of
TODOs and FIXMEs. I'd like to look at it too but I doubt that I'll have
time to do an in-depth review in this CF.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 29 October 2014 12:31, Petr Jelinek wrote:
On 29/10/14 10:41, David Rowley wrote:
On Mon, Oct 13, 2014 at 11:00 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
The last point is really just "unfinished implementation" - the syntax I
propose is this:
ALTER TABLE ... ADD STATISTICS (options) ON (columns)
where the options influence the MCV list and histogram size, etc. The
options are recognized and may give you an idea of what it might do, but
it's not really used at the moment (except for storing in the
pg_mv_statistic catalog).
I've not really gotten around to looking at the patch yet, but I'm also
wondering if it would be simple to include functional statistics
too. The pg_mv_statistic name seems to indicate multi columns, but how
about stats on date(datetime_column), or perhaps any non-volatile
function. This would help to solve the problem highlighted here
/messages/by-id/CAApHDvp2vH=7O-gp-zAf7aWy+A-WHWVg7h3Vc6=5pf9Uf34DhQ@mail.gmail.com
. Without giving it too much thought, perhaps any expression that can be
indexed should be allowed to have stats? Would that be really difficult
to implement in comparison to what you've already done with the patch so
far?
I would not over-complicate requirements for the first version of this,
I think it's already complicated enough.
My thoughts, exactly. I'm not willing to put more features into the
initial version of the patch. Actually, I'm thinking about ripping out
some experimental features (particularly "hashed MCV" and "associative
rules").
A quick look at the patch suggests that it mainly needs discussion about
design and particular implementation choices, there is a fair amount of
TODOs and FIXMEs. I'd like to look at it too but I doubt that I'll have
time to do an in-depth review in this CF.
Yes. I think it's a bit premature to discuss the code thoroughly at this
point - I'd like to discuss the general approach to the feature (i.e.
minimizing the impact on those not using it, etc.).
The most interesting part of the code is probably the comments,
explaining the design in more detail, known shortcomings and possible ways
to address them.
regards
Tomas
On Thu, Oct 30, 2014 at 12:48 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
On 29 October 2014 12:31, Petr Jelinek wrote:
I've not really gotten around to looking at the patch yet, but I'm also
wondering if it would be simple to include functional statistics
too. The pg_mv_statistic name seems to indicate multi columns, but how
about stats on date(datetime_column), or perhaps any non-volatile
function. This would help to solve the problem highlighted here
/messages/by-id/CAApHDvp2vH=7O-gp-zAf7aWy+A-WHWVg7h3Vc6=5pf9Uf34DhQ@mail.gmail.com
. Without giving it too much thought, perhaps any expression that can be
indexed should be allowed to have stats? Would that be really difficult
to implement in comparison to what you've already done with the patch so
far?
I would not over-complicate requirements for the first version of this,
I think it's already complicated enough.
My thoughts, exactly. I'm not willing to put more features into the
initial version of the patch. Actually, I'm thinking about ripping out
some experimental features (particularly "hashed MCV" and "associative
rules").
That's fair, but I didn't really mean to imply that you should go work on
that too and that it should be part of this patch.
I was thinking more along the lines that I don't really agree with the
table name for the new stats and that at some later date someone will want
to add expression stats, so we'd probably better come up with a design that would
be friendly towards that. At this time I can only think that the name of
the table might not suit well to expression stats, I'd hate to see someone
have to invent a 3rd table to support these when we could likely come up
with something that could be extended later and still make sense both today
and in the future.
I was just looking at how expression indexes are stored in pg_index and I
see that if it's an expression index, the expression is stored in
the indexprs column which is of type pg_node_tree, so quite possibly at
some point in the future the new stats table could just have an extra
column added, and for today, we'd just need to come up with a future-proof
name... Perhaps pg_statistic_ext or pg_statisticx, and name functions and
source files something along those lines instead?
Regards
David Rowley
On Thu, Oct 30, 2014 at 12:21 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
On 29 October 2014 10:41, David Rowley wrote:
I'm quite interested in reviewing your work on this, but it appears that
some of your changes are not C89:
src\backend\commands\analyze.c(3774): error C2057: expected constant
expression [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(3774): error C2466: cannot allocate an
array of constant size 0 [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(3774): error C2133: 'indexes' : unknown
size [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4302): error C2057: expected constant
expression [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4302): error C2466: cannot allocate an
array of constant size 0 [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4302): error C2133: 'ndistincts' : unknown
size [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4775): error C2057: expected constant
expression [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4775): error C2466: cannot allocate an
array of constant size 0 [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4775): error C2133: 'keys' : unknown size
[D:\Postgres\a\postgres.vcxproj]
I'll look into that. The thing is I don't have access to MSVC, so it's a bit
difficult to spot / fix those issues :-(
It should be a pretty simple fix, just use the files and line numbers from
the above. It's just a problem that in those 3 places you're declaring an
array of a variable size, which is not allowed in C89. The thing to do
instead would just be to palloc() the size you need and then pfree() it when
you're done.
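Just to illustrate, a minimal sketch of the conversion, using the 'keys'
array from the last error as an example (the 'natts' size variable is made
up for the example, it may be named differently in analyze.c):

    int *keys;      /* was: int keys[natts]; i.e. a C99 variable-length array */

    keys = (int *) palloc(natts * sizeof(int));

    /* ... use keys[0 .. natts - 1] exactly as before ... */

    pfree(keys);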
Regards
David Rowley
On 30 October 2014 10:17, David Rowley wrote:
On Thu, Oct 30, 2014 at 12:48 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
On 29 October 2014 12:31, Petr Jelinek wrote:
I've not really gotten around to looking at the patch yet, but I'm also
wondering if it would be simple to include functional statistics
too. The pg_mv_statistic name seems to indicate multi columns, but how
about stats on date(datetime_column), or perhaps any non-volatile
function. This would help to solve the problem highlighted here
/messages/by-id/CAApHDvp2vH=7O-gp-zAf7aWy+A-WHWVg7h3Vc6=5pf9Uf34DhQ@mail.gmail.com
. Without giving it too much thought, perhaps any expression that can be
indexed should be allowed to have stats? Would that be really difficult
to implement in comparison to what you've already done with the patch so
far?
this,
I think it's already complicated enough.
My thoughts, exactly. I'm not willing to put more features into the
initial version of the patch. Actually, I'm thinking about ripping out
some experimental features (particularly "hashed MCV" and "associative
rules").That's fair, but I didn't really mean to imply that you should go work on
that too and that it should be part of this patch..
I was thinking more along the lines that I don't really agree with the
table name for the new stats and that at some later date someone will want
to add expression stats, so we'd probably better come up with a design that would
be friendly towards that. At this time I can only think that the name of
the table might not suit well to expression stats, I'd hate to see someone
have to invent a 3rd table to support these when we could likely come up
with something that could be extended later and still make sense both today
and in the future.
I was just looking at how expression indexes are stored in pg_index and I
see that if it's an expression index, the expression is stored in
the indexprs column which is of type pg_node_tree, so quite possibly at
some point in the future the new stats table could just have an extra
column added, and for today, we'd just need to come up with a future-proof
name... Perhaps pg_statistic_ext or pg_statisticx, and name functions and
source files something along those lines instead?
Ah, OK. I don't think the catalog name "pg_mv_statistic" is
inappropriate for this purpose, though. IMHO the "multivariate" does not
mean "only columns" or "no expressions", it simply describes that the
approximated density function has multiple input variables, be it
attributes or expressions.
But maybe there's a better name.
Tomas
On 30.10.2014 10:23, David Rowley wrote:
On Thu, Oct 30, 2014 at 12:21 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
On 29 October 2014 10:41, David Rowley wrote:
I'm quite interested in reviewing your work on this, but it appears that
some of your changes are not C89:
src\backend\commands\analyze.c(3774): error C2057: expected constant
expression [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(3774): error C2466: cannot allocate an
array of constant size 0 [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(3774): error C2133: 'indexes' : unknown
size [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4302): error C2057: expected constant
expression [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4302): error C2466: cannot allocate an
array of constant size 0 [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4302): error C2133: 'ndistincts' : unknown
size [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4775): error C2057: expected constant
expression [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4775): error C2466: cannot allocate an
array of constant size 0 [D:\Postgres\a\postgres.vcxproj]
src\backend\commands\analyze.c(4775): error C2133: 'keys' : unknown size
[D:\Postgres\a\postgres.vcxproj]
I'll look into that. The thing is I don't have access to MSVC, so
it's a bit difficult to spot / fix those issues :-(
It should be a pretty simple fix, just use the files and line
numbers from the above. It's just a problem that in those 3 places
you're declaring an array of a variable size, which is not allowed in
C89. The thing to do instead would just be to palloc() the size you
need and then pfree() it when you're done.
Attached is a patch that should fix these issues.
The bad news is there are a few installcheck failures (they were in the
previous patch too, but I hadn't noticed for some reason). Apparently,
there's some mixup in how the patch handles Var->varno in some cases,
causing issues with a handful of regression tests.
The problem is that is_mv_compatible (checking whether the condition is
compatible with multivariate stats) does this
if (! ((varRelid == 0) || (varRelid == var->varno)))
return false;
/* Also skip special varno values, and system attributes ... */
if ((IS_SPECIAL_VARNO(var->varno)) ||
(! AttrNumberIsForUserDefinedAttr(var->varattno)))
return false;
assuming that after this, varno represents an index into the range
table, and passes it out to the caller.
And the caller (collect_mv_attnums) does this:
RelOptInfo *rel = find_base_rel(root, varno);
which fails with errors like these:
ERROR: no relation entry for relid 0
ERROR: no relation entry for relid 1880
or whatever. What's even stranger is this:
regression=# SELECT table_name, is_updatable, is_insertable_into
regression-# FROM information_schema.views
regression-# WHERE table_name = 'rw_view1';
ERROR: no relation entry for relid 0
regression=# SELECT table_name, is_updatable, is_insertable_into
regression-# FROM information_schema.views
regression-# ;
regression=# SELECT table_name, is_updatable, is_insertable_into
regression-# FROM information_schema.views
regression-# WHERE table_name = 'rw_view1';
table_name | is_updatable | is_insertable_into
------------+--------------+--------------------
(0 rows)
regression=# explain SELECT table_name, is_updatable, is_insertable_into
FROM information_schema.views
WHERE table_name = 'rw_view1';
ERROR: no relation entry for relid 0
So, the query fails. After removing the WHERE clause it works, and this
somehow fixes the original query (with the WHERE clause). Nevertheless,
I still can't run EXPLAIN on the query.
Clearly, I'm doing something wrong. I suspect it's caused either by
conditions involving function calls, or the fact that the view is a join
of multiple tables. But what?
For simple queries (single table, ...) it seems to be working fine.
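One thing I plan to try (just a rough sketch, not tested, and quite
possibly missing some cases) is to sanity-check the varno before the
find_base_rel() lookup, along these lines:

    /* bail out unless varno points at a plain base relation */
    if ((var->varno == 0) || (var->varno >= root->simple_rel_array_size))
        return false;

    if (planner_rt_fetch(var->varno, root)->rtekind != RTE_RELATION)
        return false;

But maybe the root cause is elsewhere - suggestions welcome.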
regards
Tomas
Attachments:
multivar-stats-v2.patch (text/x-diff)
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index b257b02..6e63afe 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -32,6 +32,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
+ pg_mv_statistic.h \
pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index a819952..bb82fe8 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -152,6 +152,18 @@ CREATE VIEW pg_indexes AS
LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
+CREATE VIEW pg_mv_stats AS
+ SELECT
+ N.nspname AS schemaname,
+ C.relname AS tablename,
+ S.stakeys AS attnums,
+ length(S.stamcv) AS mcvbytes,
+ pg_mv_stats_mvclist_info(S.stamcv) AS mcvinfo,
+ length(S.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo
+ FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
+ LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
+
CREATE VIEW pg_stats AS
SELECT
nspname AS schemaname,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 954e5a6..32e0d07 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -27,6 +27,7 @@
#include "catalog/indexing.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "commands/dbcommands.h"
#include "commands/tablecmds.h"
@@ -54,7 +55,11 @@
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "utils/mvstats.h"
+#include "access/sysattr.h"
/* Data structure for Algorithm S from Knuth 3.4.2 */
typedef struct
@@ -111,6 +116,62 @@ static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
+/* multivariate statistics (histogram, MCV list, associative rules) */
+
+static void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats);
+static void update_mv_stats(Oid relid,
+ MVHistogram histogram, MCVList mcvlist);
+
+/* multivariate histograms */
+static MVHistogram build_mv_histogram(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ int attr_cnt, VacAttrStats **vacattrstats,
+ int numrows_total);
+static MVBucket create_initial_mv_bucket(int numrows, HeapTuple *rows,
+ int2vector *attrs, int natts,
+ VacAttrStats **vacattrstats);
+static MVBucket select_bucket_to_partition(int nbuckets, MVBucket * buckets);
+static MVBucket partition_bucket(MVBucket bucket, int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+static MVBucket copy_mv_bucket(MVBucket bucket, uint32 ndimensions);
+
+static void update_bucket_ndistinct(MVBucket bucket, int2vector *attrs,
+ VacAttrStats ** stats);
+static void update_dimension_ndistinct(MVBucket bucket, int dimension,
+ int2vector *attrs,
+ VacAttrStats ** stats,
+ bool update_boundaries);
+/* multivariate MCV list */
+static MCVList build_mv_mcvlist(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats,
+ int *numrows_filtered);
+
+/* multivariate associative rules */
+static void build_mv_associations(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+/* serialization */
+static bytea * serialize_mv_histogram(MVHistogram histogram);
+static bytea * serialize_mv_mcvlist(MCVList mcvlist);
+
+/* comparators, used when constructing multivariate stats */
+static int compare_scalars_simple(const void *a, const void *b, void *arg);
+static int compare_scalars_partition(const void *a, const void *b, void *arg);
+static int compare_scalars_memcmp(const void *a, const void *b, void *arg);
+static int compare_scalars_memcmp_2(const void *a, const void *b);
+
+static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+/* some debugging methods */
+#ifdef MVSTATS_DEBUG
+static void print_mv_histogram_info(MVHistogram histogram);
+#endif
+
+
/*
* analyze_rel() -- analyze one relation
*/
@@ -472,6 +533,13 @@ do_analyze_rel(Relation onerel, VacuumStmt *vacstmt,
* all analyzable columns. We use a lower bound of 100 rows to avoid
* possible overflow in Vitter's algorithm. (Note: that will also be the
* target in the corner case where there are no analyzable columns.)
+ *
+ * FIXME This sample sizing is mostly OK when computing stats for
+ * individual columns, but rather insufficient when computing
+ * multivariate stats (histograms, MCV lists, ...). For a small
+ * number of dimensions it works, but for more complex stats it'd
+ * be nice to use a sample proportional to the table size (say,
+ * 0.5% - 1%) instead of a fixed size.
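+ *
+ * (For comparison, the default statistics target of 100 means
+ * sampling 300 * 100 = 30000 rows, while 0.5% of a 100M row
+ * table would be 500000 rows.)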
*/
targrows = 100;
for (i = 0; i < attr_cnt; i++)
@@ -574,6 +642,9 @@ do_analyze_rel(Relation onerel, VacuumStmt *vacstmt,
update_attstats(RelationGetRelid(Irel[ind]), false,
thisdata->attr_cnt, thisdata->vacattrstats);
}
+
+ /* Build multivariate stats (if there are any). */
+ build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
}
/*
@@ -2815,3 +2886,1985 @@ compare_mcvs(const void *a, const void *b)
return da - db;
}
+
+/*
+ * Compute requested multivariate stats, using the rows sampled for the
+ * plain (single-column) stats.
+ *
+ * This fetches a list of stats from pg_mv_statistic, computes the stats
+ * and serializes them back into the catalog (as bytea values).
+ */
+static void
+build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats)
+{
+ int i;
+ MVStats mvstats;
+ int nmvstats;
+
+ /*
+ * Fetch defined MV groups from pg_mv_statistic, and then compute
+ * the MV statistics (histograms for now).
+ *
+ * TODO move this to a separate method or something ...
+ */
+ mvstats = list_mv_stats(RelationGetRelid(onerel), &nmvstats, false);
+
+ for (i = 0; i < nmvstats; i++)
+ {
+ MCVList mcvlist = NULL;
+ MVHistogram histogram = NULL;
+ int numrows_filtered = 0;
+
+ /* int2 vector of attnums the stats should be computed on */
+ int2vector * attrs = mvstats[i].stakeys;
+
+ /* check allowed number of dimensions */
+ Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Analyze associations between pairs of columns.
+ *
+ * FIXME store the identified associations back to pg_mv_statistic
+ */
+ build_mv_associations(numrows, rows, attrs, natts, vacattrstats);
+
+ /* build the MCV list */
+ mcvlist = build_mv_mcvlist(numrows, rows, attrs, natts, vacattrstats, &numrows_filtered);
+
+ /*
+ * Build a multivariate histogram on the columns.
+ *
+ * FIXME remove the rows used to build the MCV from the histogram.
+ * Another option might be subtracting the MCV selectivities
+ * from the histogram, but I'm not sure whether that works
+ * accurately (maybe it introduces additional errors).
+ */
+ if (numrows_filtered > 0)
+ histogram = build_mv_histogram(numrows_filtered, rows, attrs, natts, vacattrstats, numrows);
+
+ /* store the histogram / MCV list in the catalog */
+ update_mv_stats(mvstats[i].mvoid, histogram, mcvlist);
+
+#ifdef MVSTATS_DEBUG
+ print_mv_histogram_info(histogram);
+#endif
+
+ }
+}
+
+/*
+ * Lookup the VacAttrStats info for the selected columns, with indexes
+ * matching the attrs vector (to make it easy to work with when
+ * computing multivariate stats).
+ */
+static VacAttrStats **
+lookup_var_attr_stats(int2vector *attrs, int natts, VacAttrStats **vacattrstats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ VacAttrStats **stats = (VacAttrStats**)palloc0(numattrs * sizeof(VacAttrStats*));
+
+ /* lookup VacAttrStats info for the requested columns (same attnum) */
+ for (i = 0; i < numattrs; i++)
+ {
+ stats[i] = NULL;
+ for (j = 0; j < natts; j++)
+ {
+ if (attrs->values[i] == vacattrstats[j]->tupattnum)
+ {
+ stats[i] = vacattrstats[j];
+ break;
+ }
+ }
+
+ /*
+ * Check that we found the info, that the attnum matches and
+ * that there's the requested 'lt' operator and that the type
+ * is 'passed-by-value'.
+ */
+ Assert(stats[i] != NULL);
+ Assert(stats[i]->tupattnum == attrs->values[i]);
+
+ /* FIXME This is a rather ugly way to check for 'ltopr' (which
+ * is defined for 'scalar' attributes).
+ */
+ Assert(stats[i]->compute_stats == compute_scalar_stats);
+
+ /* TODO remove the 'pass by value' requirement */
+ Assert(stats[i]->attrtype->typbyval);
+ }
+
+ return stats;
+}
+
+/*
+ * TODO Add ndistinct estimation, probably the one described in "Towards
+ * Estimation Error Guarantees for Distinct Values, PODS 2000,
+ * p. 268-279" (the ones called GEE, or maybe AE).
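+ *
+ * (If I recall correctly, GEE estimates ndistinct as
+ * sqrt(N/n) * f1 + f2 + f3 + ..., where f_k is the number of
+ * values observed exactly k times in a sample of n rows taken
+ * from N rows total.)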
+ *
+ * TODO The "combined" ndistinct is more likely to scale with the number
+ * of rows (in the table), because a single column behaving this
+ * way is sufficient for such behavior.
+ */
+static MVBucket
+create_initial_mv_bucket(int numrows, HeapTuple *rows, int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ /* info for the interesting attributes only */
+ VacAttrStats **stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /* resulting bucket */
+ MVBucket bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ Assert(numrows > 0);
+ Assert(rows != NULL);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /* allocate the per-dimension arrays */
+ bucket->ndistincts = (uint32*)palloc0(numattrs * sizeof(uint32));
+ bucket->nullsonly = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ bucket->min_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+ bucket->max_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* lower/upper boundaries */
+ bucket->min = (Datum*)palloc0(numattrs * sizeof(Datum));
+ bucket->max = (Datum*)palloc0(numattrs * sizeof(Datum));
+
+ /*
+ * All the sample rows fall into the initial bucket.
+ *
+ * FIXME This is wrong (unless all columns are NOT NULL), because we
+ * skipped the NULL values.
+ */
+ bucket->numrows = numrows;
+ bucket->ntuples = numrows;
+ bucket->rows = rows;
+
+ /*
+ * Update the number of ndistinct combinations in the bucket (which
+ * we use when selecting bucket to partition), and then number of
+ * distinct values for each partition (which we use when choosing
+ * which dimension to split).
+ */
+ update_bucket_ndistinct(bucket, attrs, stats);
+
+ for (i = 0; i < numattrs; i++)
+ update_dimension_ndistinct(bucket, i, attrs, stats, true);
+
+ /*
+ * The initial bucket was not split at all, so we'll start with the
+ * first dimension in the next round (index = 0).
+ */
+ bucket->last_split_dimension = -1;
+
+ return bucket;
+}
+
+/*
+ * TODO Fix to handle arbitrarily-sized histograms (not just 2D ones)
+ * and call the right output procedures (for the particular type).
+ *
+ * TODO This should somehow fetch info about the data types, and use
+ * the appropriate output functions to print the boundary values.
+ * Right now this prints the 8B value as an integer.
+ *
+ * TODO Also, provide a special function for 2D histogram, printing
+ * a gnuplot script (with rectangles).
+ *
+ * TODO For string types (once supported) we can sort the strings first,
+ * assign them a sequence of integers and use the original values
+ * as labels.
+ */
+#ifdef MVSTATS_DEBUG
+static void
+print_mv_histogram_info(MVHistogram histogram)
+{
+ int i = 0;
+
+ elog(WARNING, "histogram nbuckets=%d", histogram->nbuckets);
+
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ MVBucket bucket = histogram->buckets[i];
+ elog(WARNING, " bucket %d : ndistinct=%f ntuples=%d min=[%ld, %ld], max=[%ld, %ld] distinct=[%d,%d]",
+ i, bucket->ndistinct, bucket->numrows,
+ bucket->min[0], bucket->min[1], bucket->max[0], bucket->max[1],
+ bucket->ndistincts[0], bucket->ndistincts[1]);
+ }
+}
+#endif
+
+/*
+ * A very simple partitioning selection criteria - choose the bucket
+ * with the highest number of distinct values.
+ *
+ * Returns either pointer to the bucket selected to be partitioned,
+ * or NULL if there are no buckets that may be split (i.e. all buckets
+ * contain a single distinct value).
+ *
+ * TODO Consider other partitioning criteria (v-optimal, maxdiff etc.).
+ *
+ * TODO Allowing the bucket to degenerate to a single combination of
+ * values makes it rather strange MCV list. Maybe we should use
+ * higher lower boundary, or maybe make the selection criteria
+ * more complex (e.g. consider number of rows in the bucket, etc.).
+ *
+ * That however is different from buckets 'degenerated' only for
+ * some dimensions (e.g. half of them), which is perfectly
+ * appropriate for statistics on a combination of low and high
+ * cardinality columns.
+ */
+static MVBucket
+select_bucket_to_partition(int nbuckets, MVBucket * buckets)
+{
+ int i;
+ int ndistinct = 1; /* if ndistinct=1, we can't split the bucket */
+ MVBucket bucket = NULL;
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ /* if the ndistinct count is higher, use this bucket */
+ if (buckets[i]->ndistinct > ndistinct) {
+ bucket = buckets[i];
+ ndistinct = buckets[i]->ndistinct;
+ }
+ }
+
+ /* may be NULL if there are no buckets with (ndistinct > 1) */
+ return bucket;
+}
+
+/*
+ * A simple bucket partitioning implementation - splits the dimensions in
+ * a round-robin manner (considering only those with ndistinct>1). That
+ * is, first dimension 0 is split, then 1, 2, ... until reaching the
+ * end of the attribute list, and then wrapping back to 0. Of course,
+ * dimensions with a single distinct value are skipped.
+ *
+ * This is essentially what Muralikrishna/DeWitt described in their SIGMOD
+ * article (M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For
+ * Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD
+ * Conference 1988: 28-36).
+ *
+ * There are multiple histogram options, centered around the partitioning
+ * criteria, specifying both how to choose a bucket and the dimension
+ * most in need of a split. For a nice summary and general overview, see
+ * "rK-Hist : an R-Tree based histogram for multi-dimensional selectivity
+ * estimation" thesis by J. A. Lopez, Concordia University, p.34-37 (and
+ * possibly p. 32-34 for explanation of the terms).
+ *
+ * This splits the bucket by tweaking the existing one, and returning the
+ * new bucket (essentially shrinking the existing one in-place and returning
+ * the other "half" as a new bucket). The caller is responsible for adding
+ * the new bucket into the list of buckets.
+ *
+ * TODO It requires care to prevent splitting only one dimension and not
+ * splitting another one at all (which might happen easily in case of
+ * strongly dependent columns - e.g. y=x).
+ *
+ * TODO Should probably consider statistics target for the columns (e.g. to
+ * split dimensions with higher statistics target more frequently).
+ */
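+/*
+ * For example, if a bucket has per-dimension distinct counts [4, 1, 3],
+ * the first split uses dimension 0, the next split of that bucket uses
+ * dimension 2 (dimension 1 is skipped, having a single distinct value),
+ * then dimension 0 again, and so on.
+ */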
+static MVBucket
+partition_bucket(MVBucket bucket, int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats)
+{
+ int i;
+ int dimension;
+ int numattrs = attrs->dim1;
+
+ Datum split_value;
+ MVBucket new_bucket;
+
+ /* needed for sort, when looking for the split value */
+ bool isNull;
+ int nvalues = 0;
+ StdAnalyzeData * mystats = NULL;
+ ScalarItem * values = (ScalarItem*)palloc0(bucket->numrows * sizeof(ScalarItem));
+ SortSupportData ssup;
+
+ /* looking for the split value */
+ int ndistinct = 1; /* number of distinct values below current value */
+ int nrows = 1; /* number of rows below current value */
+
+ /* needed when splitting the values */
+ HeapTuple * oldrows = bucket->rows;
+ int oldnrows = bucket->numrows;
+
+ /* info for the interesting attributes only */
+ VacAttrStats **stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /*
+ * We can't split buckets with a single distinct value (this also
+ * disqualifies NULL-only dimensions). Also, there have to be multiple
+ * sample rows (otherwise there couldn't be multiple distinct values).
+ */
+ Assert(bucket->ndistinct > 1);
+ Assert(bucket->numrows > 1);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Look for the next dimension to split, in a round robin manner.
+ * We'll use the first one with (ndistinct > 1).
+ *
+ * If we happen to wrap all the way around, something clearly went
+ * wrong (we iterate a local variable instead of updating
+ * last_split_dimension in place, so that we can detect this).
+ */
+ dimension = bucket->last_split_dimension;
+ while (true)
+ {
+ dimension = (dimension + 1) % numattrs;
+
+ if (bucket->ndistincts[dimension] > 1)
+ break;
+
+ /* if we ran the previous split dimension, it's infinite loop */
+ Assert(dimension != bucket->last_split_dimension);
+ }
+
+ /* Remember the dimension for the next split of this bucket. */
+ bucket->last_split_dimension = dimension;
+
+ /*
+ * Walk through the selected dimension, collect and sort the values
+ * and then choose the value to use as the new boundary.
+ */
+ mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (i = 0; i < bucket->numrows; i++)
+ {
+ /* remember the index of the sample row, to make the partitioning simpler */
+ values[nvalues].value = heap_getattr(bucket->rows[i], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+ values[nvalues].tupno = i;
+
+ /* no NULL values allowed here (we don't do splits by null-only dimensions) */
+ Assert(!isNull);
+
+ nvalues++;
+ }
+
+ /* sort the array (pass-by-value datums) */
+ qsort_arg((void *) values, nvalues, sizeof(ScalarItem),
+ compare_scalars_partition, (void *) &ssup);
+
+ /*
+ * We know there are bucket->ndistincts[dimension] distinct values
+ * in this dimension, and we want to split this into half, so walk
+ * through the array and stop once we see (ndistinct/2) values.
+ *
+ * We always choose the "next" value, i.e. (n/2+1)-th distinct value,
+ * and use it as an exclusive upper boundary (and inclusive lower
+ * boundary).
+ *
+ * TODO Maybe we should use "average" of the two middle distinct
+ * values (at least for even distinct counts), but that would
+ * require being able to do an average (which does not work
+ * for non-arithmetic types).
+ *
+ * TODO Another option is to look for a split that'd give about
+ * 50% tuples (not distinct values) in each partition. That
+ * might work better when there are a few very frequent
+ * values, and many rare ones.
+ */
+ split_value = values[0].value;
+ for (i = 1; i < bucket->numrows; i++)
+ {
+ /* count distinct values */
+ if (values[i].value != values[i-1].value)
+ ndistinct += 1;
+
+ /* once we've seen half the distinct values, use the current value as the split */
+ if (ndistinct > bucket->ndistincts[dimension] / 2)
+ {
+ split_value = values[i].value;
+ break;
+ }
+
+ /* keep track how many rows belong to the first bucket */
+ nrows += 1;
+ }
+
+ Assert(nrows > 0);
+ Assert(nrows < bucket->numrows);
+
+ /* create the new bucket as an (incomplete) copy of the one being partitioned */
+ new_bucket = copy_mv_bucket(bucket, numattrs);
+
+ /*
+ * Do the actual split of the chosen dimension, using the split value as the
+ * upper bound for the existing bucket, and lower bound for the new one.
+ */
+ bucket->max[dimension] = split_value;
+ new_bucket->min[dimension] = split_value;
+
+ bucket->max_inclusive[dimension] = false;
+ new_bucket->min_inclusive[dimension] = true;
+
+ /*
+ * Redistribute the sample tuples using the 'ScalarItem->tupno'
+ * index. We know 'nrows' rows should remain in the original
+ * bucket and the rest goes to the new one.
+ */
+
+ bucket->rows = (HeapTuple*)palloc0(nrows * sizeof(HeapTuple));
+ new_bucket->rows = (HeapTuple*)palloc0((oldnrows - nrows) * sizeof(HeapTuple));
+
+ bucket->numrows = nrows;
+ new_bucket->numrows = (oldnrows - nrows);
+
+ /*
+ * The first nrows should go to the first bucket, the rest should
+ * go to the new one. Use the tupno field to get the actual HeapTuple
+ * row from the original array of sample rows.
+ */
+ for (i = 0; i < nrows; i++)
+ memcpy(&bucket->rows[i], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ for (i = nrows; i < oldnrows; i++)
+ memcpy(&new_bucket->rows[i-nrows], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(new_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each partition.
+ */
+ for (i = 0; i < numattrs; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(new_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+ pfree(values);
+
+ return new_bucket;
+}
+
+/*
+ * Copy a histogram bucket. The copy does not include the build-time
+ * data, i.e. sampled rows etc.
+ */
+static MVBucket
+copy_mv_bucket(MVBucket bucket, uint32 ndimensions)
+{
+ MVBucket new_bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ /* Copy only the fields that will stay the same after the split;
+ * we'll recompute the rest once the split is done. */
+
+ new_bucket->last_split_dimension = bucket->last_split_dimension;
+
+ /* allocate the per-dimension arrays */
+ new_bucket->ndistincts = (uint32*)palloc0(ndimensions * sizeof(uint32));
+ new_bucket->nullsonly = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ new_bucket->min_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+ new_bucket->max_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* lower/upper boundaries */
+ new_bucket->min = (Datum*)palloc0(ndimensions * sizeof(Datum));
+ new_bucket->max = (Datum*)palloc0(ndimensions * sizeof(Datum));
+
+ /* copy data */
+ memcpy(new_bucket->nullsonly, bucket->nullsonly, ndimensions * sizeof(bool));
+
+ memcpy(new_bucket->min_inclusive, bucket->min_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->min, bucket->min, ndimensions*sizeof(Datum));
+
+ memcpy(new_bucket->max_inclusive, bucket->max_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->max, bucket->max, ndimensions*sizeof(Datum));
+
+ return new_bucket;
+}
+
+/*
+ * Counts the number of distinct values in the bucket. This just copies
+ * the Datum values into a simple array and sorts them using a
+ * memcmp-based comparator, which means it only works for pass-by-value
+ * data types (assuming they don't use collations etc.).
+ *
+ * FIXME Make this work with all types (not just pass-by-value ones).
+ *
+ * TODO This might evaluate and store the distinct counts for all
+ * possible attribute combinations. The assumption is this might be
+ * useful for estimating things like GROUP BY cardinalities (e.g.
+ * in cases when some buckets contain a lot of low-frequency
+ * combinations, and other buckets contain few high-frequency ones).
+ *
+ * But it's unclear whether it's worth the price. Computing this
+ * is actually quite cheap, because it may be evaluated at the very
+ * end, when the buckets are rather small (so sorting it in 2^N ways
+ * is not a big deal). Assuming the partitioning algorithm does not
+ * use these values to do the decisions, of course (the current
+ * algorithm does not).
+ *
+ * The overhead with storing, fetching and parsing the data is more
+ * concerning - adding 2^N values per bucket (even if it's just
+ * a 1B or 2B value) would significantly bloat the histogram, and
+ * thus the impact on optimizer. Which is not really desirable.
+ *
+ * TODO This only updates the ndistinct for the sample (or bucket), but
+ * we eventually need an estimate of the total number of distinct
+ * values in the dataset. It's possible to either use the current
+ * 1D approach (i.e., if it's more than 10% of the sample, assume
+ * it's proportional to the number of rows). Or it's possible to
+ * implement the estimator suggested in the article, supposedly
+ * giving 'optimal' estimates (w.r.t. probability of error).
+ */
+static void
+update_bucket_ndistinct(MVBucket bucket, int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j, idx = 0;
+ int numattrs = attrs->dim1;
+ Size len = sizeof(Datum) * numattrs;
+ bool isNull;
+
+ /*
+ * We could collect this while walking through all the attributes
+ * above (as it is, we have to call heap_getattr twice).
+ */
+ Datum * values = palloc0(bucket->numrows * numattrs * sizeof(Datum));
+
+ for (j = 0; j < bucket->numrows; j++)
+ for (i = 0; i < numattrs; i++)
+ values[idx++] = heap_getattr(bucket->rows[j], attrs->values[i],
+ stats[i]->tupDesc, &isNull);
+
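+ /* Sort the rows as opaque chunks of (numattrs * sizeof(Datum))
+ * bytes, so that equal combinations of values become adjacent. */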
+ qsort_arg((void *) values, bucket->numrows, sizeof(Datum) * numattrs,
+ compare_scalars_memcmp, &len);
+
+ bucket->ndistinct = 1;
+
+ for (i = 1; i < bucket->numrows; i++)
+ if (memcmp(&values[i * numattrs], &values[(i-1) * numattrs], len) != 0)
+ bucket->ndistinct += 1;
+
+ pfree(values);
+
+}
+
+/*
+ * Count distinct values per bucket dimension.
+ *
+ * TODO Remove unnecessary parameters - don't pass in the whole arrays,
+ * just the proper elements.
+ */
+static void
+update_dimension_ndistinct(MVBucket bucket, int dimension, int2vector *attrs,
+ VacAttrStats ** stats, bool update_boundaries)
+{
+ int j;
+ int nvalues = 0;
+ bool isNull;
+ Datum * values = (Datum*)palloc0(bucket->numrows * sizeof(Datum));
+ SortSupportData ssup;
+
+ StdAnalyzeData * mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (j = 0; j < bucket->numrows; j++)
+ {
+ values[nvalues] = heap_getattr(bucket->rows[j], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+
+ /* ignore NULL values */
+ if (! isNull)
+ nvalues++;
+ }
+
+ /* there's always at least 1 distinct value (may be NULL) */
+ bucket->ndistincts[dimension] = 1;
+
+ /* if there are only NULL values in the column, mark the dimension
+ * as NULL-only and bail out */
+ if (nvalues == 0)
+ {
+ pfree(values);
+ bucket->nullsonly[dimension] = true;
+ return;
+ }
+
+ /* sort the array (pass-by-value datums) */
+ qsort_arg((void *) values, nvalues, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /*
+ * Update min/max boundaries to the smallest bounding box. Generally, this
+ * needs to be done only when constructing the initial bucket.
+ */
+ if (update_boundaries)
+ {
+ /* store the min/max values */
+ bucket->min[dimension] = values[0];
+ bucket->min_inclusive[dimension] = true;
+
+ bucket->max[dimension] = values[nvalues-1];
+ bucket->max_inclusive[dimension] = true;
+ }
+
+ /*
+ * Walk through the array and count distinct values by comparing
+ * succeeding values.
+ *
+ * FIXME This only works for pass-by-value types (i.e. not VARCHARs etc.).
+ */
+ for (j = 1; j < nvalues; j++) {
+ if (values[j] != values[j-1])
+ bucket->ndistincts[dimension] += 1;
+ }
+
+ pfree(values);
+}
+
+/*
+ * Fetch list of MV stats defined on a table, without the actual data
+ * for histograms, MCV lists etc.
+ */
+MVStats
+list_mv_stats(Oid relid, int *nstats, bool built_only)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ MVStats result;
+
+ /* start with 16 items, that should be enough for most cases */
+ int maxitems = 16;
+ result = (MVStats)palloc0(sizeof(MVStatsData) * maxitems);
+ *nstats = 0;
+
+ /* Prepare to scan pg_mv_statistic for entries having indrelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ Form_pg_mv_statistic stats = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ /*
+ * Skip statistics that were not computed yet (if only stats
+ * that were already built were requested)
+ */
+ if (built_only && (! (stats->hist_built || stats->mcv_built || stats->assoc_built)))
+ continue;
+
+ /* double the array size if needed */
+ if (*nstats == maxitems)
+ {
+ maxitems *= 2;
+ result = (MVStats)repalloc(result, sizeof(MVStatsData) * maxitems);
+ }
+
+ result[*nstats].mvoid = HeapTupleGetOid(htup);
+ result[*nstats].stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ result[*nstats].hist_built = stats->hist_built;
+ result[*nstats].mcv_built = stats->mcv_built;
+ result[*nstats].assoc_built = stats->assoc_built;
+ *nstats += 1;
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which was used as an inspiration for this one). */
+
+ return result;
+}
+
+
+/*
+ * Serialize the MV histogram into a bytea value.
+ *
+ * The serialization first deduplicates the boundary values into a
+ * separate array and uses 2B indexes when serializing the buckets.
+ * This saves a significant amount of space, because each bucket split adds a single
+ * new boundary value, so e.g. with 4 attributes and 8191 splits (thus
+ * 8192 buckets), there are only ~8200 distinct boundary values.
+ *
+ * But as each bucket has 8 boundary values (4+4), that's ~64k Datums.
+ * That's roughly 65kB vs. 512kB, but we haven't included the indexes
+ * used to reference the boundary values. By using int16 indexes (which
+ * should be more than enough for all reasonable histogram sizes),
+ * this amounts to ~128kB (8192*8*2). So in total it's ~196kB vs. 512kB,
+ * i.e. more than 2x compression, which is nice.
+ *
+ * The implementation is simple - walk through the buckets, collect all
+ * the boundary values, keep only distinct values (in a sorted array)
+ * and then replace the values with indexes (using binary search).
+ *
+ * It's possible to either serialize/deserialize the histogram into
+ * a MVHistogram, or create a special structure working with this
+ * compressed structure (and keep MVBucket/MVHistogram only for the
+ * building phase). This might actually work better thanks to better
+ * CPU cache hit ratio, and simpler deserialization.
+ *
+ * This encoding will probably prevent automatic varlena compression,
+ * because the first part of the serialized bytea will be an array of
+ * unique (although sorted) values, and pglz decides whether to compress
+ * by trying to compress the first part (~1kB or so), which will compress
+ * poorly due to the lack of repetition.
+ *
+ * But in this case this is probably desirable - the data in general
+ * won't be really compressible (in addition to the 2x compression we
+ * got thanks to the encoding). In a sense the encoding scheme is
+ * actually a context-aware compression (usually compressing to ~30%).
+ * So this seems appropriate in this case.
+ *
+ * FIXME Make this work with arbitrary types.
+ *
+ * TODO Try to keep the compressed form, instead of deserializing it to
+ * MVHistogram/MVBucket.
+ *
+ * TODO We might get a bit better compression by considering the actual
+ * data type length. The current implementation treats all data as
+ * 8B values, but for INT it's actually 4B etc. OTOH this is only
+ * related to the lookup table, and most of the space is occupied
+ * by the buckets (with int16 indexes). And we don't have type info
+ * at the moment, so it would be difficult (but we'll need it to
+ * support all types, so maybe then).
+ */
+static bytea *
+serialize_mv_histogram(MVHistogram histogram)
+{
+ int i = 0, j = 0;
+
+ /* total size (histogram header + all buckets) */
+ Size total_len;
+ char *tmp = NULL;
+ bytea *result = NULL;
+
+ /* we need to accumulate all boundary values (min/max) */
+ int idx = 0;
+ int max_values = histogram->nbuckets * histogram->ndimensions * 2;
+ Datum * values = (Datum*)palloc0(max_values * sizeof(Datum));
+ Size len = sizeof(Datum);
+
+ /* we'll collect unique boundary values into this */
+ int ndistinct = 0;
+ Datum *lookup = NULL;
+ uint16 *indexes = (uint16*)palloc0(sizeof(uint16) * histogram->ndimensions);
+
+ /*
+ * Collect the boundary values first, sort them and generate a small
+ * array with only distinct values.
+ */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ values[idx++] = histogram->buckets[i]->min[j];
+ values[idx++] = histogram->buckets[i]->max[j];
+ }
+ }
+
+ /*
+ * We've allocated just enough space for all boundary values, but
+ * this may change once we start handling NULL values (as we'll
+ * probably skip those).
+ *
+ * Also, we expect at least one boundary value at this moment.
+ */
+ Assert(max_values == idx);
+ Assert(idx > 1);
+
+ /*
+ * Sort the collected boundary values using a simple memcmp-based
+ * comparator (this won't work for pass-by-reference types), and
+ * then walk the data and count the distinct values.
+ */
+ qsort((void *) values, idx, len, compare_scalars_memcmp_2);
+
+ ndistinct = 1;
+ for (i = 1; i < max_values; i++)
+ ndistinct += (values[i-1] != values[i]) ? 1 : 0;
+
+ /*
+ * At this moment we can allocate the bytea value (and we'll collect
+ * the boundary values directly into it).
+ *
+ * The bytea will be structured like this:
+ *
+ * - varlena header : VARHDRSZ
+ * - histogram header : offsetof(MVHistogram,buckets)
+ * - number of boundary values : sizeof(uint32)
+ * - boundary values : ndistinct * sizeof(Datum)
+ * - buckets : nbuckets * BUCKET_SIZE_SERIALIZED
+ *
+ * We'll assume 2B indexes into the boundary values, because each
+ * bucket 'split' introduces one boundary value. Moreover, multiple
+ * splits may introduce the same value, so this should be enough for
+ * at least 65k buckets (and likely more). That's more than enough
+ * for reasonable histogram sizes.
+ */
+
+ Assert(ndistinct <= 65536);
+
+ total_len = VARHDRSZ + offsetof(MVHistogramData, buckets) +
+ (sizeof(uint32) + ndistinct * sizeof(Datum)) +
+ histogram->nbuckets * BUCKET_SIZE_SERIALIZED(histogram->ndimensions);
+
+ result = (bytea*)palloc0(total_len);
+ tmp = VARDATA(result);
+
+ SET_VARSIZE(result, total_len);
+
+ /* copy the global histogram header */
+ memcpy(tmp, histogram, offsetof(MVHistogramData, buckets));
+ tmp += offsetof(MVHistogramData, buckets);
+
+ /*
+ * Copy the number of distinct values, and then all the distinct
+ * values currently stored in the 'values' array (sorted).
+ */
+ memcpy(tmp, &ndistinct, sizeof(uint32));
+ tmp += sizeof(uint32);
+
+ lookup = (Datum*)tmp;
+
+ for (i = 0; i < max_values; i++)
+ {
+ /* skip values that are equal to the previous one */
+ if ((i > 0) && (values[i-1] == values[i]))
+ continue;
+
+ memcpy(tmp, &values[i], sizeof(Datum));
+ tmp += sizeof(Datum);
+ }
+
+ Assert(tmp - (char*)lookup == ndistinct * sizeof(Datum));
+
+ /* now serialize all the buckets - first the header, without the
+ * variable-length part, then all the variable length parts */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ MVBucket bucket = histogram->buckets[i];
+
+ /* write the common bucket header */
+ memcpy(tmp, bucket, offsetof(MVBucketData, ndistincts));
+ tmp += offsetof(MVBucketData, ndistincts);
+
+ /* per-dimension ndistincts / nullsonly */
+ memcpy(tmp, bucket->ndistincts, sizeof(uint32)*histogram->ndimensions);
+ tmp += sizeof(uint32)*histogram->ndimensions;
+
+ memcpy(tmp, bucket->nullsonly, sizeof(bool)*histogram->ndimensions);
+ tmp += sizeof(bool)*histogram->ndimensions;
+
+ memcpy(tmp, bucket->min_inclusive, sizeof(bool)*histogram->ndimensions);
+ tmp += sizeof(bool)*histogram->ndimensions;
+
+ memcpy(tmp, bucket->max_inclusive, sizeof(bool)*histogram->ndimensions);
+ tmp += sizeof(bool)*histogram->ndimensions;
+
+ /* and now translate the min (and then max) boundaries to indexes */
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ Datum *v = (Datum*)bsearch(&bucket->min[j], lookup, ndistinct,
+ sizeof(Datum), compare_scalars_memcmp_2);
+
+ Assert(v != NULL);
+ indexes[j] = (v - lookup); /* Datum arithmetic (not char) */
+ Assert(indexes[j] < ndistinct); /* we have to be within the array */
+ }
+
+ memcpy(tmp, indexes, sizeof(uint16)*histogram->ndimensions);
+ tmp += sizeof(uint16)*histogram->ndimensions;
+
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ Datum *v = (Datum*)bsearch(&bucket->max[j], lookup, ndistinct,
+ sizeof(Datum), compare_scalars_memcmp_2);
+ Assert(v != NULL);
+ indexes[j] = (v - lookup); /* Datum arithmetic (not char) */
+ Assert(indexes[j] < ndistinct); /* we have to be within the array */
+ }
+
+ memcpy(tmp, indexes, sizeof(uint16)*histogram->ndimensions);
+ tmp += sizeof(uint16)*histogram->ndimensions;
+ }
+
+ pfree(indexes);
+
+ return result;
+}
+
+/*
+ * Reverse of serialize_mv_histogram. This essentially expands the serialized
+ * form back to MVHistogram / MVBucket.
+ */
+MVHistogram
+deserialize_mv_histogram(bytea * data)
+{
+ int i = 0, j = 0;
+
+ Size expected_length;
+ char *tmp = NULL;
+ MVHistogram histogram;
+
+ uint32 nlookup; /* Datum lookup table */
+ Datum *lookup = NULL;
+
+ if (data == NULL)
+ return NULL;
+
+ /* get pointer to the data part of the varlena */
+ tmp = VARDATA(data);
+
+ histogram = (MVHistogram)palloc0(sizeof(MVHistogramData));
+
+ /* copy the histogram header in place */
+ memcpy(histogram, tmp, offsetof(MVHistogramData, buckets));
+ tmp += offsetof(MVHistogramData, buckets);
+
+ if (histogram->magic != MVHIST_MAGIC)
+ {
+ pfree(histogram);
+ elog(WARNING, "not a MV Histogram (magic number mismatch)");
+ return NULL;
+ }
+
+ Assert(histogram->type == MVHIST_TYPE_BASIC);
+ Assert(histogram->nbuckets > 0);
+ Assert(histogram->nbuckets <= MVHIST_MAX_BUCKETS);
+ Assert(histogram->ndimensions > 0);
+ Assert(histogram->ndimensions <= MVSTATS_MAX_DIMENSIONS);
+
+ /* now, get the size of the lookup table */
+ memcpy(&nlookup, tmp, sizeof(uint32));
+ tmp += sizeof(uint32);
+ lookup = (Datum*)tmp;
+
+ /* skip to the first bucket */
+ tmp += sizeof(Datum) * nlookup;
+
+ /* check the total serialized length */
+ expected_length = offsetof(MVHistogramData, buckets) +
+ sizeof(uint32) + nlookup * sizeof(Datum) +
+ histogram->nbuckets * BUCKET_SIZE_SERIALIZED(histogram->ndimensions);
+
+ /* check serialized length */
+ if (VARSIZE_ANY_EXHDR(data) != expected_length)
+ {
+ elog(ERROR, "invalid MV histogram serialized size (expected %ld, got %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_length);
+ return NULL;
+ }
+
+ /* allocate bucket pointers */
+ histogram->buckets = (MVBucket*)palloc0(histogram->nbuckets * sizeof(MVBucket));
+
+ /* deserialize the buckets, one by one */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ /* don't allocate space for the build-only fields */
+ MVBucket bucket = (MVBucket)palloc0(offsetof(MVBucketData, rows));
+ uint16 *indexes = NULL;
+
+ /* write the common bucket header */
+ memcpy(bucket, tmp, offsetof(MVBucketData, ndistincts));
+ tmp += offsetof(MVBucketData, ndistincts);
+
+ /* per-dimension ndistincts / nullsonly */
+ bucket->ndistincts = (uint32*)palloc0(sizeof(uint32)*histogram->ndimensions);
+ memcpy(bucket->ndistincts, tmp, sizeof(uint32)*histogram->ndimensions);
+ tmp += sizeof(uint32)*histogram->ndimensions;
+
+ bucket->nullsonly = (bool*)palloc0(sizeof(bool)*histogram->ndimensions);
+ memcpy(bucket->nullsonly, tmp, sizeof(bool)*histogram->ndimensions);
+ tmp += sizeof(bool)*histogram->ndimensions;
+
+ bucket->min_inclusive = (bool*)palloc0(sizeof(bool)*histogram->ndimensions);
+ memcpy(bucket->min_inclusive, tmp, sizeof(bool)*histogram->ndimensions);
+ tmp += sizeof(bool)*histogram->ndimensions;
+
+ bucket->max_inclusive = (bool*)palloc0(sizeof(bool)*histogram->ndimensions);
+ memcpy(bucket->max_inclusive, tmp, sizeof(bool)*histogram->ndimensions);
+ tmp += sizeof(bool)*histogram->ndimensions;
+
+ /* translate the indexes back to Datum values */
+ bucket->min = (Datum*)palloc0(sizeof(Datum)*histogram->ndimensions);
+ bucket->max = (Datum*)palloc0(sizeof(Datum)*histogram->ndimensions);
+
+ indexes = (uint16*)tmp;
+ tmp += sizeof(uint16) * histogram->ndimensions;
+ for (j = 0; j < histogram->ndimensions; j++)
+ memcpy(&bucket->min[j], &lookup[indexes[j]], sizeof(Datum));
+
+ indexes = (uint16*)tmp;
+ tmp += sizeof(uint16) * histogram->ndimensions;
+ for (j = 0; j < histogram->ndimensions; j++)
+ memcpy(&bucket->max[j], &lookup[indexes[j]], sizeof(Datum));
+
+ histogram->buckets[i] = bucket;
+ }
+
+ return histogram;
+}
+
+/*
+ * Serialize MCV list into a bytea value.
+ *
+ * This does not use any kind of deduplication (compared to histogram
+ * serialization), as we don't expect the same efficiency here.
+ *
+ * This simply writes a MCV header (number of items, ...) and then Datum
+ * values for all attributes of an item, followed by the item frequency
+ * (as a double).
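+ *
+ * So the resulting bytea is laid out as:
+ *
+ * - varlena header : VARHDRSZ
+ * - MCV list header : offsetof(MCVListData, items)
+ * - items : nitems * (ndimensions * sizeof(Datum) + sizeof(double))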
+ */
+static bytea *
+serialize_mv_mcvlist(MCVList mcvlist)
+{
+ int i;
+
+ /* we need to store nitems, and each item needs ndimensions * sizeof(Datum), plus a double */
+ Size len = VARHDRSZ + offsetof(MCVListData, items) + mcvlist->nitems * (sizeof(Datum) * mcvlist->ndimensions + sizeof(double));
+
+ bytea * output = (bytea*)palloc0(len);
+
+ char * tmp = VARDATA(output);
+
+ SET_VARSIZE(output, len);
+
+ /* first, store the number of dimensions / items */
+ memcpy(tmp, mcvlist, offsetof(MCVListData, items));
+ tmp += offsetof(MCVListData, items);
+
+ /* now, walk through the items and store values + frequency for each MCV item */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ memcpy(tmp, mcvlist->items[i]->values, mcvlist->ndimensions * sizeof(Datum));
+ tmp += mcvlist->ndimensions * sizeof(Datum);
+
+ memcpy(tmp, &mcvlist->items[i]->frequency, sizeof(double));
+ tmp += sizeof(double);
+ }
+
+ return output;
+
+}
+
+MCVList
+deserialize_mv_mcvlist(bytea * data)
+{
+ int i;
+ Size expected_size;
+ MCVList mcvlist;
+ char *tmp;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MCVListData,items))
+ elog(ERROR, "invalid MCV Size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MCVListData,items));
+
+ /* read the MCV list header */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(mcvlist, tmp, offsetof(MCVListData,items));
+ tmp += offsetof(MCVListData,items);
+
+ if (mcvlist->magic != MVSTAT_MCV_MAGIC)
+ elog(ERROR, "invalid MCV magic %d (expected %dd)",
+ mcvlist->magic, MVSTAT_MCV_MAGIC);
+
+ if (mcvlist->type != MVSTAT_MCV_TYPE_BASIC)
+ elog(ERROR, "invalid MCV type %d (expected %dd)",
+ mcvlist->type, MVSTAT_MCV_TYPE_BASIC);
+
+ Assert(mcvlist->nitems > 0);
+ Assert((mcvlist->ndimensions >= 2) && (mcvlist->ndimensions <= MVSTATS_MAX_DIMENSIONS));
+
+ /* what bytea size do we expect for those parameters */
+ expected_size = offsetof(MCVListData,items) +
+ mcvlist->nitems * (sizeof(Datum) * mcvlist->ndimensions + sizeof(double));
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* allocate space for the MCV items */
+ mcvlist->items = (MCVItem*)palloc0(sizeof(MCVItem) * mcvlist->nitems);
+
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = (MCVItem)palloc0(offsetof(MCVItemData, values) +
+ mcvlist->ndimensions * sizeof(Datum));
+
+ memcpy(item->values, tmp, mcvlist->ndimensions * sizeof(Datum));
+ tmp += mcvlist->ndimensions * sizeof(Datum);
+
+ memcpy(&item->frequency, tmp, sizeof(double));
+ tmp += sizeof(double);
+
+ mcvlist->items[i] = item;
+ }
+
+ return mcvlist;
+}
+
+static void
+update_mv_stats(Oid mvoid, MVHistogram histogram, MCVList mcvlist)
+{
+ HeapTuple stup,
+ oldtup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ bool replaces[Natts_pg_mv_statistic];
+
+ Relation sd = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ memset(nulls, 1, Natts_pg_mv_statistic * sizeof(bool));
+ memset(replaces, 0, Natts_pg_mv_statistic * sizeof(bool));
+ memset(values, 0, Natts_pg_mv_statistic * sizeof(Datum));
+
+ /*
+ * Construct a new pg_mv_statistic tuple - replace only the histogram
+ * and MCV list, depending on whether each was actually computed.
+ */
+ if (histogram != NULL)
+ {
+ nulls[Anum_pg_mv_statistic_stahist-1] = false;
+ values[Anum_pg_mv_statistic_stahist - 1]
+ = PointerGetDatum(serialize_mv_histogram(histogram));
+ }
+
+ if (mcvlist != NULL)
+ {
+ nulls[Anum_pg_mv_statistic_stamcv -1] = false;
+ values[Anum_pg_mv_statistic_stamcv - 1]
+ = PointerGetDatum(serialize_mv_mcvlist(mcvlist));
+ }
+
+ /* always replace the value (either by bytea or NULL) */
+ replaces[Anum_pg_mv_statistic_stahist-1] = true;
+ replaces[Anum_pg_mv_statistic_stamcv -1] = true;
+
+ /* always change the availability flags */
+ nulls[Anum_pg_mv_statistic_hist_built-1] = false;
+ nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
+
+ replaces[Anum_pg_mv_statistic_hist_built -1] = true;
+ replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+
+ values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
+ values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
+
+ /* Is there already a pg_mv_statistic tuple for this attribute? */
+ oldtup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ if (HeapTupleIsValid(oldtup))
+ {
+ /* Yes, replace it */
+ stup = heap_modify_tuple(oldtup,
+ RelationGetDescr(sd),
+ values,
+ nulls,
+ replaces);
+ ReleaseSysCache(oldtup);
+ simple_heap_update(sd, &stup->t_self, stup);
+ }
+ else
+ elog(ERROR, "invalid pg_mv_statistic record (oid=%d)", mvoid);
+
+ /* update indexes too */
+ CatalogUpdateIndexes(sd, stup);
+
+ heap_freetuple(stup);
+
+ heap_close(sd, RowExclusiveLock);
+}
+
+
+/* MV stats */
+
+Datum
+pg_mv_stats_histogram_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVHistogram hist = deserialize_mv_histogram(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nbuckets=%d", hist->nbuckets);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+Datum
+pg_mv_stats_mvclist_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MCVList mcvlist = deserialize_mv_mcvlist(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nitems=%d", mcvlist->nitems);
+
+ pfree(mcvlist);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+Datum
+pg_mv_stats_histogram_gnuplot(PG_FUNCTION_ARGS)
+{
+ int i = 0;
+
+ /* FIXME handle the length properly, e.g. using StringInfo */
+ Size len = 1024*1024;
+ char *buffer = palloc0(len);
+ char *str = buffer;
+ bytea *data = PG_GETARG_BYTEA_P(0);
+
+ MVHistogram hist = deserialize_mv_histogram(data);
+
+ for (i = 0; i < hist->nbuckets; i++)
+ {
+ str += snprintf(str, len - (str - buffer),
+ "set object %d rect from %ld,%ld to %ld,%ld lw 1\n",
+ (i+1),
+ hist->buckets[i]->min[0], hist->buckets[i]->min[1],
+ hist->buckets[i]->max[0], hist->buckets[i]->max[1]);
+ }
+
+ PG_RETURN_TEXT_P(cstring_to_text(buffer));
+
+}
+
+bytea *
+fetch_mv_histogram(Oid mvoid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ bytea *stahist = NULL;
+
+ /* Prepare to scan pg_mv_statistic for entries having indrelid = this rel. */
+ ScanKeyInit(&skey,
+ ObjectIdAttributeNumber,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(mvoid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticOidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ bool isnull = false;
+ Datum hist = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stahist, &isnull);
+
+ Assert(!isnull);
+
+ stahist = DatumGetByteaP(hist);
+
+ break;
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which was used as an inspiration for this one). */
+
+ return stahist;
+}
+
+bytea *
+fetch_mv_mcvlist(Oid mvoid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ bytea *mcvlist = NULL;
+
+ /* Prepare to scan pg_mv_statistic for entries having indrelid = this rel. */
+ ScanKeyInit(&skey,
+ ObjectIdAttributeNumber,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(mvoid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticOidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ bool isnull = false;
+ Datum tmp = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stamcv, &isnull);
+
+ Assert(!isnull);
+
+ mcvlist = DatumGetByteaP(tmp);
+
+ break;
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which was used as an inspiration for this one). */
+
+ return mcvlist;
+}
+
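+/*
+ * Return the index of the dimension the attribute maps to within the
+ * statistics, i.e. the position of varattno in the (sorted) stakeys
+ * vector.
+ */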
+int
+mv_get_index(AttrNumber varattno, int2vector * stakeys)
+{
+ int i, idx = 0;
+ for (i = 0; i < stakeys->dim1; i++)
+ {
+ if (stakeys->values[i] < varattno)
+ idx += 1;
+ else
+ break;
+ }
+ return idx;
+}
+
+/*
+ * Build a multivariate histogram. In short, this first creates a single
+ * bucket containing all the rows, and then repeatedly splits it,
+ * searching each time for the bucket / dimension most in need of a split.
+ *
+ * The current criterion is rather simple, looking at the number of
+ * distinct values (combinations of column values for a bucket, column
+ * values for a dimension). This is somewhat naive, but seems to work
+ * quite well. See the discussion at select_bucket_to_partition and
+ * partition_bucket for more details about alternative algorithms.
+ *
+ * So the current algorithm looks like this:
+ *
+ * while [not reaching maximum number of buckets]
+ *
+ * choose bucket to partition (max distinct combinations)
+ * if no bucket to partition
+ * terminate the algorithm
+ *
+ * choose bucket dimension to partition (max distinct values)
+ * split the bucket into two buckets
+ *
+ */
+static MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ int attr_cnt, VacAttrStats **vacattrstats,
+ int numrows_total)
+{
+ int i;
+ int ndistinct;
+ int numattrs = attrs->dim1;
+ int *ndistincts = (int*)palloc0(sizeof(int) * numattrs);
+
+ MVHistogram histogram = (MVHistogram)palloc0(sizeof(MVHistogramData));
+
+ HeapTuple * rows_copy = (HeapTuple*)palloc0(numrows * sizeof(HeapTuple));
+ memcpy(rows_copy, rows, sizeof(HeapTuple) * numrows);
+
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ histogram->ndimensions = numattrs;
+
+ histogram->magic = MVHIST_MAGIC;
+ histogram->type = MVHIST_TYPE_BASIC;
+ histogram->nbuckets = 1;
+
+ /* create max buckets (better than repalloc for short-lived objects) */
+ histogram->buckets = (MVBucket*)palloc0(MVHIST_MAX_BUCKETS * sizeof(MVBucket));
+
+ /* create the initial bucket, covering the whole sample set */
+ histogram->buckets[0] = create_initial_mv_bucket(numrows, rows_copy, attrs,
+ attr_cnt, vacattrstats);
+
+ ndistinct = histogram->buckets[0]->ndistinct;
+
+ /* keep the global ndistinct values */
+ for (i = 0; i < numattrs; i++)
+ ndistincts[i] = histogram->buckets[0]->ndistincts[i];
+
+ while (histogram->nbuckets < MVHIST_MAX_BUCKETS)
+ {
+ MVBucket bucket = select_bucket_to_partition(histogram->nbuckets, histogram->buckets);
+
+ /* no more buckets to partition */
+ if (bucket == NULL)
+ break;
+
+ histogram->buckets[histogram->nbuckets] = partition_bucket(bucket, attrs,
+ attr_cnt, vacattrstats);
+
+ histogram->nbuckets += 1;
+ }
+
+ /*
+ * FIXME store the histogram in a catalog in a serialized form (simple for
+ * pass-by-value, more complicated for buckets on varlena types)
+ */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ int d;
+ histogram->buckets[i]->ntuples = (histogram->buckets[i]->numrows * 1.0) / numrows_total;
+ histogram->buckets[i]->ndistinct = (histogram->buckets[i]->ndistinct * 1.0) / ndistinct;
+
+ for (d = 0; d < numattrs; d++)
+ histogram->buckets[i]->ndistincts[d] = (histogram->buckets[i]->ndistincts[d] * 1.0) / ndistincts[d];
+ }
+
+ pfree(ndistincts);
+
+ return histogram;
+
+}
+
+/*
+ * Mine associations between the columns, in the form (A => B).
+ *
+ * At the moment this only works for associations between two columns,
+ * but it might be useful to mine for rules involving multiple columns
+ * on the left side. That is, rules [A,B] => C and so on. Handling
+ * multiple columns on the right side is not necessary, because such
+ * rules may be decomposed into a set of rules, one for each column.
+ * I.e. A => [B,C] is exactly the same as (A => B) & (A => C).
+ *
+ * Those rules don't immediately identify redundant clauses, because the
+ * user may choose "incompatible conditions" (e.g. by using a zip code
+ * and a mismatching city) and so on. This should however be easy to
+ * identify from a histogram, because the conditions will match a bucket
+ * with low frequencies.
+ *
+ * The question is whether this can be useful when we have a histogram,
+ * because such incompatible conditions should result in not matching
+ * any buckets (or matching only buckets with low frequencies).
+ *
+ * The problem is that histograms only work like this when the sort
+ * order is compatible with the meaning of the data. We're often using
+ * data types that support sorting (e.g. INT, BIGINT) as labels, where
+ * the sorting really does not make much sense. Sorting by ZIP code will
+ * result in sorting the cities quite randomly, and similarly for most
+ * surrogate primary / foreign keys. In such cases the histograms are
+ * pretty useless.
+ *
+ * So, a good approach might be testing the independence of the data
+ * (by building a contingency table) and building the MV histogram only
+ * when there's a dependency. For 'label' data this should notice that
+ * the histogram is useless, so we won't build it (and we may use that
+ * as a sign supporting the association rule).
+ *
+ * Another option is to look at selectivity of A and B separately, and
+ * then use the minimum of those.
+ *
+ * TODO investigate using histogram and MCV list to confirm the
+ * associative rule
+ *
+ * TODO investigate statistical testing of the distribution (to decide
+ * whether it makes sense to build the histogram)
+ *
+ * TODO Using a min/max of selectivities would probably make more sense
+ * for the associated columns.
+ */
+static void
+build_mv_associations(int numrows, HeapTuple *rows, int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats)
+{
+ int i;
+ bool isNull;
+ Size len = 2 * sizeof(Datum); /* only simple associations a => b */
+ int numattrs = attrs->dim1;
+
+ /* TODO Maybe this should be somehow related to the number of
+ * distinct values in the two columns we're currently analyzing.
+ * Assuming a uniform distribution, we can compute the average
+ * group size we'd expect to observe in the sample, and use that
+ * as a threshold. That seems better than a static approach.
+ */
+ int min_group_size = 10;
+
+ /* dimension indexes we'll check for associations [a => b] */
+ int dima, dimb;
+
+ /* info for the interesting attributes only
+ *
+ * TODO Compute this only once and pass it to all the methods
+ * that need it.
+ */
+ VacAttrStats **stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /* We'll reuse the same array for all the combinations */
+ Datum * values = (Datum*)palloc0(numrows * 2 * sizeof(Datum));
+
+ Assert(numattrs >= 2);
+
+ for (dima = 0; dima < numattrs; dima++)
+ {
+
+ for (dimb = 0; dimb < numattrs; dimb++)
+ {
+
+ int supporting = 0;
+ int contradicting = 0;
+
+ Datum val_a, val_b;
+ int violations = 0;
+ int group_size = 0;
+
+ int supporting_rows = 0;
+
+ /* skip (dima==dimb) */
+ if (dima == dimb)
+ continue;
+
+ /*
+ * FIXME Not sure if this handles NULL values properly (not sure
+ * how to do that). We assume that NULL means 0 for now,
+ * handling it just like any other value.
+ */
+ for (i = 0; i < numrows; i++)
+ {
+ values[i*2] = heap_getattr(rows[i], attrs->values[dima], stats[dima]->tupDesc, &isNull);
+ values[i*2+1] = heap_getattr(rows[i], attrs->values[dimb], stats[dimb]->tupDesc, &isNull);
+ }
+
+ qsort_arg((void *) values, numrows, sizeof(Datum) * 2, compare_scalars_memcmp, &len);
+
+ /*
+ * Walk through the array, split it into groups according to
+ * the A value, and count distinct B values in each group.
+ * If there's a single B value for the whole group, we count
+ * it as supporting the association, otherwise we count it
+ * as contradicting.
+ *
+ * Furthermore we require a group to have at least a certain
+ * number of rows to be counted as supporting. Contradicting
+ * groups are counted regardless of their size.
+ */
+
+ /* start with values from the first row */
+ val_a = values[0];
+ val_b = values[1];
+ group_size = 1;
+
+ for (i = 1; i < numrows; i++)
+ {
+ if (values[2*i] != val_a) /* end of the group */
+ {
+ /*
+ * If there are no contradicting rows, count it as
+ * supporting (otherwise contradicting), but only if
+ * the group is large enough.
+ *
+ * The requirement of a minimum group size makes it
+ * impossible to identify [unique,unique] cases, but
+ * that's probably a different case. This is more
+ * about [zip => city] associations etc.
+ */
+ supporting += ((violations == 0) && (group_size >= min_group_size)) ? 1 : 0;
+ contradicting += (violations != 0) ? 1 : 0;
+
+ supporting_rows += ((violations == 0) && (group_size >= min_group_size)) ? group_size : 0;
+
+ /* current values start a new group */
+ val_a = values[2*i];
+ val_b = values[2*i+1];
+ violations = 0;
+ group_size = 1;
+ }
+ else
+ {
+ if (values[2*i+1] != val_b) /* mismatch of a B value */
+ {
+ val_b = values[2*i+1];
+ violations += 1;
+ }
+
+ group_size += 1;
+ }
+ }
+
+ /* close the last group (the loop above only closes groups on a value change) */
+ supporting += ((violations == 0) && (group_size >= min_group_size)) ? 1 : 0;
+ contradicting += (violations != 0) ? 1 : 0;
+ supporting_rows += ((violations == 0) && (group_size >= min_group_size)) ? group_size : 0;
+
+ /*
+ * See if the number of rows supporting the association is at least
+ * 10x the number of remaining rows (those in contradicting or
+ * too-small groups).
+ *
+ * TODO This is a rather arbitrary limit - I guess it's possible to do
+ * some math to come up with a better rule (e.g. testing a hypothesis
+ * 'this is due to randomness'). We can create a contingency table
+ * from the values and use it for testing. Possibly only when
+ * there are no contradicting rows?
+ *
+ * TODO Also, if (a => b) and (b => a) at the same time, it pretty much
+ * means the columns have the same values (or one is a 'label'),
+ * making the conditions rather redundant. Although it's possible
+ * that the query uses an incompatible combination of values.
+ */
+ if (supporting_rows > (numrows - supporting_rows) * 10)
+ {
+ /*
+ * TODO Store the discovered association (dima => dimb) in the
+ * catalog - for now we only detect it. Debugging output:
+ *
+ * elog(WARNING, "%d => %d : supporting=%d contradicting=%d",
+ * dima, dimb, supporting, contradicting);
+ */
+ }
+
+ }
+ }
+
+ pfree(values);
+
+}
+
+/*
+ * Compute the list of most common items, where an item is a combination
+ * of values for all the columns. For a small number of distinct values,
+ * we may be able to represent the distribution almost exactly, with
+ * per-item statistics.
+ *
+ * If we can represent the distribution using a MCV list only, it's great
+ * because that allows much better estimates (especially for equality).
+ * Such discrete distributions are also easier to combine (more
+ * efficient and more accurate) than when using histograms.
+ *
+ * FIXME This does not handle NULL values at the moment.
+ *
+ * TODO When computing equality selectivity (a=1 AND b=2), we can do that
+ * pretty exactly assuming (a) we hit a MCV item and (b) the
+ * histogram is built on those two columns only (i.e. there are no
+ * other columns). In that case we can estimate the selectivity
+ * using only the MCV.
+ *
+ * When we don't hit a MCV item, we can use the frequency of the
+ * least probable MCV item as upper bound of the selectivity
+ * (otherwise it'd get into the MCV list). Again, this only works
+ * when the histogram size matches the restricted columns.
+ *
+ * When the histogram is larger (i.e. there are additional columns),
+ * we can't be sure how the selectivity is distributed among the MCV
+ * list and the histogram (we may get several MCV items matching
+ * the conditions and several histogram buckets at the same time).
+ *
+ * In this case we can probably clamp the selectivity by minimum of
+ * selectivities for each condition. For example if we know the
+ * number of distinct values for each column, we can use 1/ndistinct
+ * as a per-column estimate. Or rather 1/ndistinct + selectivity
+ * derived from the MCV list.
+ *
+ * If there's no histogram (thus the distribution is approximated
+ * only by the MCV list), the size of the stats (whether there are
+ * some other columns, not referenced in the conditions) does not
+ * matter. We can do pretty accurate estimation using the MCV.
+ *
+ * TODO Currently there's no logic to consider building only a MCV list
+ * (and not building the histogram at all).
+ *
+ * TODO For types that don't reasonably support ordering (either because
+ * the type does not support that or when the user adds some option
+ * to the ADD STATISTICS command - e.g. UNSORTED_STATS), building
+ * the histogram may be pointless and inefficient. This is esp.
+ * true for varlena types that may be quite large and a large MCV
+ * list may be a better choice, because it makes equality estimates
+ * more accurate. Due to the unsorted nature, range queries on those
+ * attributes are rather useless anyway.
+ *
+ * Another thing is that by restricting to MCV list and equality
+ * conditions, we can use hash values instead of long varlena values.
+ * The equality estimation will be very accurate.
+ *
+ * This however complicates matching the columns to available
+ * statistics, as it will require matching clauses (not columns) to
+ * stats. And it may get quite complex - e.g. what if there are
+ * multiple clauses, each compatible with different stats subset?
+ *
+ * FIXME Create a special-purpose type for MCV items (instead of a plain
+ * Datum array, which is very difficult to work with).
+ */
+static MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats,
+ int *numrows_filtered)
+{
+ int i, j, idx = 0;
+ int numattrs = attrs->dim1;
+ Size len = sizeof(Datum) * numattrs;
+ bool isNull;
+ int ndistinct = 0;
+ int mcv_threshold = 0;
+ int count = 0;
+ int nitems = 0;
+
+ MCVList mcvlist = NULL;
+
+ VacAttrStats **stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /*
+ * We could collect this while walking through the rows in the
+ * caller - the current approach means heap_getattr gets called
+ * twice for each value (once here, once when building the other
+ * statistics).
+ *
+ * TODO We're using Datum (8B), even for data types smaller than this
+ * (notably int4 and float4). Maybe we could save some space here,
+ * although it seems the bytea compression will handle it just fine.
+ */
+ Datum * values = palloc0(numrows * numattrs * sizeof(Datum));
+
+ for (j = 0; j < numrows; j++)
+ for (i = 0; i < numattrs; i++)
+ values[idx++] = heap_getattr(rows[j], attrs->values[i], stats[i]->tupDesc, &isNull);
+
+ qsort_arg((void *) values, numrows, sizeof(Datum) * numattrs, compare_scalars_memcmp, &len);
+
+ /*
+ * Count the number of distinct values - we need this to determine
+ * the threshold (125% of the average frequency).
+ */
+ ndistinct = 1;
+ for (i = 1; i < numrows; i++)
+ if (memcmp(&values[i * numattrs], &values[(i-1) * numattrs], len) != 0)
+ ndistinct += 1;
+
+ /*
+ * Determine how many groups actually exceed the threshold, and then
+ * walk the array again and collect them into an array.
+ *
+ * TODO for now the threshold is the same as in the single-column
+ * case (average + 25%), but maybe that's worth revisiting
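+ *
+ * For example, with numrows = 30000 and ndistinct = 1000, the
+ * average group has 30 rows and the threshold works out to
+ * 1.25 * 30 = 37 rows (hypothetical numbers, for illustration).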
+ *
+ * TODO see if we can fit all the distinct values in the MCV list
+ */
+ mcv_threshold = 1.25 * numrows / ndistinct;
+ mcv_threshold = (mcv_threshold < 4) ? 4 : mcv_threshold;
+
+ /*
+ * If the number of distinct items is small enough, store all items
+ * that appear at least twice in the sample.
+ *
+ * FIXME We can do this only if we believe we got all the distinct
+ * values of the table.
+ */
+ if (ndistinct <= MVSTAT_MCVLIST_MAX_ITEMS)
+ mcv_threshold = 2;
+
+ count = 1;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or a new group */
+ if ((i == numrows) || (memcmp(&values[i * numattrs], &values[(i-1) * numattrs], len) != 0))
+ {
+ /* count the MCV item if exceeding the threshold */
+ if (count >= mcv_threshold)
+ nitems += 1;
+
+ count = 1;
+ }
+ else /* same group, just increase the row count */
+ count += 1;
+ }
+
+ /* by default we keep all the rows (even if there's no MCV list) */
+ *numrows_filtered = numrows;
+
+ /* we know the number of mcvitems, now collect them in a 2nd pass */
+ if (nitems > 0)
+ {
+ /* allocate the MCV list structure and fill the basic parameters */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ mcvlist->magic = MVSTAT_MCV_MAGIC;
+ mcvlist->type = MVSTAT_MCV_TYPE_BASIC;
+ mcvlist->ndimensions = numattrs;
+ mcvlist->nitems = nitems;
+ mcvlist->items = (MCVItem*)palloc0(sizeof(MCVItem)*nitems);
+
+ /* now repeat the same loop as above, but this time copy the data
+ * for items exceeding the threshold */
+ count = 1;
+ nitems = 0;
+ for (i = 1; i <= numrows; i++)
+ {
+
+ /* last row or a new group */
+ if ((i == numrows) || (memcmp(&values[i * numattrs], &values[(i-1) * numattrs], len) != 0))
+ {
+ /* count the MCV item if exceeding the threshold (and copy into the array) */
+ if (count >= mcv_threshold)
+ {
+ /* first, allocate the item (with the proper size of values) */
+ MCVItem item = (MCVItem)palloc0(offsetof(MCVItemData, values) +
+ sizeof(Datum)*mcvlist->ndimensions);
+
+ /* then copy values from the _previous_ group */
+ memcpy(item->values, &values[(i-1)*numattrs], len);
+
+ /* and finally the group frequency */
+ item->frequency = (double)count / numrows;
+
+ mcvlist->items[nitems] = item;
+ nitems += 1;
+ }
+
+ count = 1;
+ }
+ else /* same group, just increase the row count */
+ count += 1;
+ }
+
+ /* make sure the loops are consistent */
+ Assert(nitems == mcvlist->nitems);
+
+ /*
+ * Remove the rows matching the MCV items.
+ *
+ * FIXME This implementation is rather naive, effectively O(N^2).
+ * As the MCV list grows, the check will take longer and
+ * longer. And as the number of sampled rows increases (by
+ * increasing statistics target), it will take longer and
+ * longer. One option is to sort the MCV items first and
+ * then perform a binary search.
+ */
+ if (nitems == ndistinct) /* all rows are covered by MCV items */
+ *numrows_filtered = 0;
+ else /* (nitems < ndistinct) && (nitems > 0) */
+ {
+ int nfiltered = 0;
+ HeapTuple *rows_filtered = (HeapTuple*)palloc0(sizeof(HeapTuple) * numrows);
+
+ /* walk through the tuples, compare the values to MCV items */
+ for (i = 0; i < numrows; i++)
+ {
+ bool match = false;
+ Datum *keys = (Datum*)palloc0(numattrs * sizeof(Datum));
+
+ /* collect the key values */
+ for (j = 0; j < numattrs; j++)
+ keys[j] = heap_getattr(rows[i], attrs->values[j], stats[j]->tupDesc, &isNull);
+
+ /* scan through the MCV list for matches */
+ for (j = 0; j < mcvlist->nitems; j++)
+ if (memcmp(keys, mcvlist->items[j]->values, sizeof(Datum)*numattrs) == 0)
+ {
+ match = true;
+ break;
+ }
+
+ /* if no match in the MCV list, copy the row into the filtered ones */
+ if (! match)
+ memcpy(&rows_filtered[nfiltered++], &rows[i], sizeof(HeapTuple));
+
+ pfree(keys);
+ }
+
+ /* replace the first part */
+ memcpy(rows, rows_filtered, sizeof(HeapTuple) * nfiltered);
+ *numrows_filtered = nfiltered;
+
+ pfree(rows_filtered);
+
+ }
+ }
+
+ pfree(values);
+
+ /*
+ * TODO Single-dimensional MCV is stored sorted by frequency (descending).
+ * Maybe this should be stored like that too?
+ */
+
+ return mcvlist;
+}
+
+/* comparators for the multi-variate stats */
+
+/*
+ * qsort_arg comparator for sorting Datums (MV stats)
+ *
+ * This does not maintain the tupnoLink array.
+ */
+static int
+compare_scalars_simple(const void *a, const void *b, void *arg)
+{
+ Datum da = *(Datum*)a;
+ Datum db = *(Datum*)b;
+ SortSupport ssup = (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting data when partitioning a MV bucket
+ */
+static int
+compare_scalars_partition(const void *a, const void *b, void *arg)
+{
+ Datum da = ((ScalarItem*)a)->value;
+ Datum db = ((ScalarItem*)b)->value;
+ SortSupport ssup = (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting Datum[] (row of Datums) when
+ * counting distinct values.
+ */
+static int
+compare_scalars_memcmp(const void *a, const void *b, void *arg)
+{
+ Size len = *(Size*)arg;
+
+ return memcmp(a, b, len);
+}
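+
+/*
+ * For example, the build methods above use it like this, with 'len'
+ * serving both as the qsort element size and as the comparator
+ * argument:
+ *
+ * Size len = sizeof(Datum) * numattrs;
+ * qsort_arg((void *) values, numrows, len,
+ * compare_scalars_memcmp, &len);
+ */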
+
+static int
+compare_scalars_memcmp_2(const void *a, const void *b)
+{
+ return memcmp(a, b, sizeof(Datum));
+}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 714a9f1..7f9e54f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -35,6 +35,7 @@
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_rowsecurity.h"
@@ -91,7 +92,7 @@
#include "utils/syscache.h"
#include "utils/tqual.h"
#include "utils/typcache.h"
-
+#include "utils/mvstats.h"
/*
* ON COMMIT action list
@@ -139,8 +140,9 @@ static List *on_commits = NIL;
#define AT_PASS_ADD_COL 5 /* ADD COLUMN */
#define AT_PASS_ADD_INDEX 6 /* ADD indexes */
#define AT_PASS_ADD_CONSTR 7 /* ADD constraints, defaults */
-#define AT_PASS_MISC 8 /* other stuff */
-#define AT_NUM_PASSES 9
+#define AT_PASS_ADD_STATS 8 /* ADD statistics */
+#define AT_PASS_MISC 9 /* other stuff */
+#define AT_NUM_PASSES 10
typedef struct AlteredTableInfo
{
@@ -414,7 +416,8 @@ static void ATExecReplicaIdentity(Relation rel, ReplicaIdentityStmt *stmt, LOCKM
static void ATExecGenericOptions(Relation rel, List *options);
static void ATExecEnableRowSecurity(Relation rel);
static void ATExecDisableRowSecurity(Relation rel);
-
+static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
+ StatisticsDef *def, LOCKMODE lockmode);
static void copy_relation_data(SMgrRelation rel, SMgrRelation dst,
ForkNumber forkNum, char relpersistence);
static const char *storage_name(char c);
@@ -2965,6 +2968,7 @@ AlterTableGetLockLevel(List *cmds)
* updates.
*/
case AT_SetStatistics: /* Uses MVCC in getTableAttrs() */
+ case AT_AddStatistics: /* XXX not sure if this is the right lock level */
case AT_ClusterOn: /* Uses MVCC in getIndexes() */
case AT_DropCluster: /* Uses MVCC in getIndexes() */
case AT_SetOptions: /* Uses MVCC in getTableAttrs() */
@@ -3112,6 +3116,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
pass = AT_PASS_ADD_CONSTR;
break;
case AT_SetStatistics: /* ALTER COLUMN SET STATISTICS */
+ case AT_AddStatistics: /* XXX maybe not the right place */
ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode);
/* Performs own permission checks */
ATPrepSetStatistics(rel, cmd->name, cmd->def, lockmode);
@@ -3407,6 +3412,9 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
case AT_SetStatistics: /* ALTER COLUMN SET STATISTICS */
ATExecSetStatistics(rel, cmd->name, cmd->def, lockmode);
break;
+ case AT_AddStatistics: /* ADD STATISTICS */
+ ATExecAddStatistics(tab, rel, (StatisticsDef *) cmd->def, lockmode);
+ break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
ATExecSetOptions(rel, cmd->name, cmd->def, false, lockmode);
break;
@@ -11616,3 +11624,197 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,
ReleaseSysCache(tuple);
}
+
+/* used for sorting the attnums in ATExecAddStatistics */
+static int compare_int16(const void *a, const void *b)
+{
+ return memcmp(a, b, sizeof(int16));
+}
+
+/*
+ * Implements the ALTER TABLE ... ADD STATISTICS (options) ON (columns).
+ *
+ * The code is an unholy mix of pieces that really belong to other parts
+ * of the source tree.
+ *
+ * FIXME Check that the types are pass-by-value and support sort,
+ * although maybe we can live without the sort (and only build
+ * MCV list / association rules).
+ *
+ * FIXME This should probably check for duplicate stats (i.e. same
+ * keys, same options). Although maybe it's useful to have
+ * multiple stats on the same columns with different options
+ * (say, a detailed MCV-only stats for some queries, histogram
+ * for others, etc.)
+ */
+static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
+ StatisticsDef *def, LOCKMODE lockmode)
+{
+ int i, j;
+ ListCell *l;
+ int16 attnums[INDEX_MAX_KEYS];
+ Oid atttypids[INDEX_MAX_KEYS];
+ int numcols = 0;
+
+ Oid mvstatoid;
+ HeapTuple htup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ int2vector *stakeys;
+ Relation mvstatrel;
+
+ /* by default build everything */
+ bool build_histogram = true,
+ build_mcv = true,
+ build_associations = true;
+
+ /* build a regular MCV list (not a hashed one) by default */
+ bool mcv_hashed = false;
+
+ int32 max_buckets = -1,
+ max_mcv_items = -1;
+
+ Assert(IsA(def, StatisticsDef));
+
+ /* transform the column names to attnum values */
+
+ foreach(l, def->keys)
+ {
+ char *attname = strVal(lfirst(l));
+ HeapTuple atttuple;
+
+ atttuple = SearchSysCacheAttName(RelationGetRelid(rel), attname);
+
+ if (!HeapTupleIsValid(atttuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" referenced in statistics does not exist",
+ attname)));
+
+ /* more than MVSTATS_MAX_DIMENSIONS columns not allowed */
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("cannot have more than %d keys in a statistics",
+ MVSTATS_MAX_DIMENSIONS)));
+
+ attnums[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->attnum;
+ atttypids[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->atttypid;
+ ReleaseSysCache(atttuple);
+ numcols++;
+ }
+
+ /*
+ * Check the lower bound (at least 2 columns), the upper bound was
+ * already checked in the loop.
+ */
+ if (numcols < 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("multivariate stats require 2 or more columns")));
+
+ /* look for duplicate columns */
+ for (i = 0; i < numcols; i++)
+ for (j = 0; j < i; j++)
+ if (attnums[i] == attnums[j])
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_COLUMN),
+ errmsg("duplicate column name in statistics definition")));
+
+ /* parse the statistics options */
+ foreach (l, def->options)
+ {
+ DefElem *opt = (DefElem*)lfirst(l);
+
+ if (strcmp(opt->defname, "histogram") == 0)
+ build_histogram = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "mcv") == 0)
+ build_mcv = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "mcv_hashed") == 0)
+ mcv_hashed = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "associations") == 0)
+ build_associations = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_buckets") == 0)
+ {
+ max_buckets = defGetInt32(opt);
+
+ /* TODO check that this is not used with 'histogram off' */
+
+ /* sanity check */
+ if (max_buckets < 1024)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is 1024")));
+
+ else if (max_buckets > 32768) /* FIXME use the proper constant */
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is 1024")));
+
+ }
+ else if (strcmp(opt->defname, "max_mcv_items") == 0)
+ {
+ max_mcv_items = defGetInt32(opt);
+
+ /* TODO check that this is not used with 'mcv off' */
+
+ /* sanity check */
+ if (max_mcv_items < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items must be non-negative")));
+
+ else if (max_mcv_items > 8192) /* FIXME use the proper constant */
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items is 8192")));
+
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized STATISTICS option \"%s\"",
+ opt->defname)));
+ }
+
+ /* sort the attnums and build int2vector */
+ qsort(attnums, numcols, sizeof(int16), compare_int16);
+ stakeys = buildint2vector(attnums, numcols);
+
+ /*
+ * Okay, let's create the pg_mv_statistic entry.
+ */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ /* no stats collected yet, so just the keys */
+ values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
+
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+ values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
+ values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+ values[Anum_pg_mv_statistic_mcv_hashed -1] = BoolGetDatum(mcv_hashed);
+ values[Anum_pg_mv_statistic_assoc_enabled -1] = BoolGetDatum(build_associations);
+
+ values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
+ values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
+
+ nulls[Anum_pg_mv_statistic_staassoc -1] = true;
+ nulls[Anum_pg_mv_statistic_stamcv -1] = true;
+ nulls[Anum_pg_mv_statistic_stahist -1] = true;
+
+ /* insert the tuple into pg_mv_statistic */
+ mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ htup = heap_form_tuple(mvstatrel->rd_att, values, nulls);
+
+ mvstatoid = simple_heap_insert(mvstatrel, htup);
+
+ CatalogUpdateIndexes(mvstatrel, htup);
+
+ heap_freetuple(htup);
+
+ heap_close(mvstatrel, RowExclusiveLock);
+
+ return;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e76b5b3..da35331 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3903,6 +3903,17 @@ _copyAlterPolicyStmt(const AlterPolicyStmt *from)
return newnode;
}
+static StatisticsDef *
+_copyStatisticsDef(const StatisticsDef *from)
+{
+ StatisticsDef *newnode = makeNode(StatisticsDef);
+
+ COPY_NODE_FIELD(keys);
+ COPY_NODE_FIELD(options);
+
+ return newnode;
+}
+
/* ****************************************************************
* pg_list.h copy functions
* ****************************************************************
@@ -4717,6 +4728,9 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_StatisticsDef:
+ retval = _copyStatisticsDef(from);
+ break;
case T_PrivGrantee:
retval = _copyPrivGrantee(from);
break;
@@ -4729,7 +4743,6 @@ copyObject(const void *from)
case T_XmlSerialize:
retval = _copyXmlSerialize(from);
break;
-
default:
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(from));
retval = 0; /* keep compiler quiet */
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 9b657fb..9c32735 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -24,6 +24,9 @@
#include "utils/lsyscache.h"
#include "utils/selfuncs.h"
+#include "utils/mvstats.h"
+#include "catalog/pg_collation.h"
+#include "utils/typcache.h"
/*
* Data structure for accumulating info about possible range-query
@@ -43,6 +46,23 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
+static bool is_mv_compatible(Node *clause, Oid varRelid, Index *varno,
+ Bitmapset **attnums);
+static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
+ Oid varRelid, Oid *relid);
+static int choose_mv_histogram(int nmvstats, MVStats mvstats,
+ Bitmapset *attnums);
+static List *clauselist_mv_split(List *clauses, Oid varRelid,
+ List **mvclauses, MVStats mvstats);
+
+static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
+ List *clauses, MVStats mvstats);
+static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
+ List *clauses, MVStats mvstats,
+ bool *fullmatch, Selectivity *lowsel);
+static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
+ List *clauses, MVStats mvstats);
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -100,14 +120,74 @@ clauselist_selectivity(PlannerInfo *root,
RangeQueryClause *rqlist = NULL;
ListCell *l;
+ /* processing mv stats */
+ Oid relid = InvalidOid;
+ int nmvstats = 0;
+ MVStats mvstats = NULL;
+
+ /* attributes in mv-compatible clauses */
+ Bitmapset *mvattnums = NULL;
+
/*
- * If there's exactly one clause, then no use in trying to match up pairs,
- * so just go directly to clause_selectivity().
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, so just go directly to clause_selectivity().
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
varRelid, jointype, sjinfo);
+ /* collect attributes from mv-compatible clauses */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid);
+
+ /*
+ * If there are mv-compatible clauses, referencing at least two
+ * columns (otherwise it makes no sense to use mv stats), fetch the
+ * MV histograms for the relation (only the column keys, not the
+ * histograms yet - we'll decide which histogram to use first).
+ */
+ if (bms_num_members(mvattnums) >= 2)
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ /* fetch info from the catalog (not the serialized stats yet) */
+ mvstats = list_mv_stats(relid, &nmvstats, true);
+
+ /*
+ * If there are candidate statistics, choose the histogram first.
+ * At the moment we use only a single statistics object, the one
+ * covering the most columns (using info from the previous step).
+ * If there are multiple such candidates, we'll use the smallest
+ * one (with the lowest number of dimensions).
+ *
+ * This may not be the optimal choice if the 'smaller' stats has
+ * far fewer buckets than the rejected one (making it less
+ * accurate).
+ *
+ * We may end up without multivariate statistics, if none of the
+ * stats matches at least two columns from the clauses (in that
+ * case we may just use the single dimensional stats).
+ */
+ if (nmvstats > 0)
+ {
+ int idx = choose_mv_histogram(nmvstats, mvstats, mvattnums);
+
+ if (idx >= 0) /* we have a matching stats */
+ {
+ MVStats mvstat = &mvstats[idx];
+
+ /* split the clauselist into regular and mv-clauses */
+ clauses = clauselist_mv_split(clauses, varRelid, &mvclauses, mvstat);
+
+ /* we've chosen the histogram to match the clauses */
+ Assert(mvclauses != NIL);
+
+ /* compute the multivariate stats */
+ s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ }
+ }
+ }
+
/*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
@@ -782,3 +862,1010 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Estimate selectivity for the list of MV-compatible clauses, using that
+ * particular histogram.
+ *
+ * When we hit a single bucket, we don't know what portion of it actually
+ * matches the clauses (e.g. equality), and we use 1/2 the bucket by
+ * default. However, the MV histograms are usually less detailed than
+ * the per-column ones, so the sum over the matched buckets is often
+ * quite high (thanks to combining a lot of "partially hit" buckets).
+ *
+ * There are several ways to improve this, each usually with cases
+ * where it won't really help. Also, the more complex the process,
+ * the worse the failures (i.e. misestimates).
+ *
+ * (1) Use the MV histogram only as a way to combine multiple
+ * per-column histograms, essentially rewriting
+ *
+ * P(A & B) = P(A) * P(B|A)
+ *
+ * where P(B|A) may be computed using a proper "slice" of the
+ * histogram, by first selecting only buckets where A is true, and
+ * then using the boundaries to 'restrict' the per-column histogram.
+ *
+ * With more clauses, it gets more complicated, of course
+ *
+ * P(A & B & C) = P(A & C) * P(B|A & C)
+ * = P(A) * P(C|A) * P(B|A & C)
+ *
+ * and so on.
+ *
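+ * A worked example (hypothetical values): with P(A) = 0.01 and
+ * P(B|A) = 0.9 (strongly correlated columns), we get
+ * P(A & B) = 0.01 * 0.9 = 0.009, while the independence assumption
+ * with P(B) = 0.01 would give 0.01 * 0.01 = 0.0001, i.e. two
+ * orders of magnitude less.
+ *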
+ * Of course, the question is how well and efficiently we can
+ * compute the conditional probabilities - whether this approach
+ * can improve the estimates (instead of amplifying the errors).
+ *
+ * Also, this does not eliminate the need for histogram on [A,B,C].
+ *
+ * (2) Use multiple smaller (and more accurate) histograms, and combine
+ * them using a process similar to the above. E.g. by assuming that
+ * B and C are independent, we can rewrite
+ *
+ * P(B|A & C) = P(B|A)
+ *
+ * so we can rewrite the whole formula to
+ *
+ * P(A & B & C) = P(A) * P(C|A) * P(B|A)
+ *
+ * and we're OK with two 2D histograms [A,C] and [A,B].
+ *
+ * It'd be nice to perform some sort of statistical test (Fisher
+ * or another chi-squared test) to identify independent components
+ * and automatically separate them into smaller histograms.
+ *
+ * (3) Using the estimated number of distinct values in a bucket to
+ * decide the selectivity of equality in the bucket (instead of
+ * blindly using 1/2 of the bucket, we may use 1/ndistinct).
+ * Of course, if the ndistinct estimate is way off, or when the
+ * distribution is not uniform (some distinct values get many more
+ * rows than others), this will fail. Also, we don't have the
+ * ndistinct estimate available at this moment (but it shouldn't be
+ * that difficult to compute, as ndistinct and ntuples should be
+ * available).
+ *
+ * TODO Clamp the selectivity by min of the per-clause selectivities
+ * (i.e. the selectivity of the most restrictive clause), because
+ * that's the maximum we can ever get from ANDed list of clauses.
+ * This would probably prevent issues with hitting too many buckets
+ * and low precision histograms.
+ *
+ * TODO We may support some additional conditions, most importantly
+ * those matching multiple columns (e.g. "a = b" or "a < b").
+ * Ultimately we could track multi-table histograms for join
+ * cardinality estimation.
+ *
+ * TODO Currently this is only estimating all clauses, or clauses
+ * matching varRelid (when it's not 0). I'm not sure what the
+ * purpose of varRelid is, but my assumption is this is used for
+ * join conditions and such. In that case we can use those clauses
+ * to restrict the other (i.e. filter the histogram buckets first,
+ * before estimating the other clauses). This is essentially equal
+ * to computing P(A|B) where "B" are the clauses not matching the
+ * varRelid.
+ *
+ * TODO Further thoughts on processing equality clauses - maybe it'd be
+ * better to look for stats (with MCV) covered by the equality
+ * clauses, because then we have a chance to find an exact match
+ * in the MCV list, which is pretty much the best we can do. We may
+ * also look at the least frequent MCV item, and use it as an upper
+ * boundary for the selectivity (had there been a more frequent
+ * item, it'd be in the MCV list).
+ *
+ * These conditions may then be used as a condition for the other
+ * selectivities, i.e. we may estimate P(A,B) first, and then
+ * compute P(C|A,B) from another histogram. This may be useful when
+ * we can estimate P(A,B) accurately (e.g. because it's a complete
+ * equality match evaluated on MCV list), and then compute the
+ * conditional probability P(C|A,B), giving us the requested stats
+ *
+ * P(A,B,C) = P(A,B) * P(C|A,B)
+ *
+ * TODO There are several options for 'sanity clamping' the estimates.
+ *
+ * First, if we have selectivities for each condition, then
+ *
+ * P(A,B) <= MIN(P(A), P(B))
+ *
+ * Because additional conditions (connected by AND) can only lower
+ * the probability.
+ *
+ * So we can do some basic sanity checks using the single-variate
+ * stats (the ones we have right now).
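+ *
+ * (For example, with P(A) = 0.05 and P(B) = 0.2, any estimate of
+ * P(A,B) above 0.05 may be clamped to 0.05 - hypothetical values,
+ * for illustration only.)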
+ *
+ * Second, when we have multivariate stats with a MCV list, then
+ *
+ * (a) if we have a full equality condition (one equality condition
+ * on each column) and we found a match in the MCV list, this is
+ * the selectivity (and it's supposed to be exact)
+ *
+ * (b) if we have a full equality condition and we haven't found a
+ * match in the MCV list, then the selectivity is below the
+ * lowest selectivity in the MCV list
+ *
+ * (c) if we have a equality condition (not full), we can still
+ * search the MCV for matches and use the sum of probabilities
+ * as a lower boundary for the histogram (if there are no
+ * matches in the MCV list, then we have no boundary)
+ *
+ * Third, if there are multiple multivariate stats for a set of
+ * clauses, we may compute all of them and then somehow aggregate
+ * them - e.g. by choosing the minimum, median or average. The
+ * multi-variate stats are susceptible to overestimation (because
+ * we take 50% of the bucket for partial matches). Some stats may
+ * give better estimates than others, but it's very difficult to
+ * determine in advance which one is the best (it depends
+ * on the number of buckets, number of additional columns not
+ * referenced in the clauses etc.) so we may compute all and then
+ * choose a sane aggregation (minimum seems like a good approach).
+ * Of course, this may result in longer / more expensive estimation
+ * (CPU-wise), but it may be worth it.
+ *
+ * There are ways to address this, though. First, it's possible to
+ * add a GUC choosing between a 'simple' estimation (using a single
+ * stats object expected to give the best estimate) and a 'full' one
+ * (combining the multiple estimates), e.g.
+ *
+ * multivariate_estimates = (simple|full)
+ *
+ * Also, this might be enabled at a table level, by something like
+ *
+ * ALTER TABLE ... SET STATISTICS (simple|full)
+ *
+ * Which would make it possible to use this only for the tables
+ * where the simple approach does not work.
+ *
+ * Also, there are ways to optimize this algorithmically. E.g. we
+ * may try to get an estimate from a matching MCV list first, and
+ * if we happen to get a "full equality match" we may stop computing
+ * the estimates from other stats (for this condition) because
+ * that's probably the best estimate we can really get.
+ *
+ * TODO When applying the clauses to the histogram/MCV list, we can do
+ * that from the most selective clauses first, because that'll
+ * eliminate the buckets/items sooner (so we'll be able to skip
+ * them without inspection, which is more expensive).
+ */
+static Selectivity
+clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStats mvstats)
+{
+ bool fullmatch = false;
+ Selectivity s1 = 0.0, s2 = 0.0;
+
+ /*
+ * Lowest frequency in the MCV list (may be used as an upper bound
+ * for full equality conditions that did not match any MCV item).
+ */
+ Selectivity mcv_low = 0.0;
+
+ /* TODO Evaluate simple 1D selectivities, use the smallest one as
+ * an upper bound, product as lower bound, and sort the
+ * clauses in ascending order by selectivity (to optimize the
+ * MCV/histogram evaluation).
+ */
+
+ /* Evaluate the MCV first. */
+ s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ &fullmatch, &mcv_low);
+
+ /*
+ * If we got a full equality match on the MCV list, we're done (and
+ * the estimate is pretty good).
+ */
+ if (fullmatch && (s1 > 0.0))
+ return s1;
+
+ /* FIXME if (fullmatch) without matching MCV item, use the mcv_low
+ * selectivity as upper bound */
+
+ s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+
+ /* TODO clamp to <= 1.0 (or more strictly, when possible) */
+ return s1 + s2;
+}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ */
+static Bitmapset *
+collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid, Oid *relid)
+{
+ Index varno = 0;
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ Node *clause = (Node *) lfirst(l);
+
+ /* ignore the result for now - we only need the info */
+ is_mv_compatible(clause, varRelid, &varno, &attnums);
+ }
+
+ /*
+ * If there are at least two attributes referenced by the clause(s),
+ * fetch the relation info (and pass back the Oid of the relation).
+ */
+ if (bms_num_members(attnums) > 1)
+ {
+ RelOptInfo *rel = find_base_rel(root, varno);
+ *relid = root->simple_rte_array[bms_singleton_member(rel->relids)]->relid;
+ }
+ else
+ {
+ if (attnums != NULL)
+ pfree(attnums);
+ attnums = NULL;
+ *relid = InvalidOid;
+ }
+
+ return attnums;
+}
+
+/*
+ * We're looking for a histogram matching at least 2 attributes, and
+ * among the candidates we want the one with the lowest number of
+ * dimensions (to get efficient estimation and likely better
+ * precision). The precision depends on the total number of buckets
+ * too, but the lower the number of dimensions, the smaller (and more
+ * precise) the buckets can get. For example, given stats on (a,b)
+ * and (a,b,c), a query referencing only a and b is better served by
+ * the (a,b) stats.
+ */
+static int
+choose_mv_histogram(int nmvstats, MVStats mvstats, Bitmapset *attnums)
+{
+ int i, j;
+
+ int choice = -1;
+ int current_matches = 1; /* goal #1: maximize */
+ int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+
+ for (i = 0; i < nmvstats; i++)
+ {
+ int matches = 0; /* columns matching this histogram */
+
+ int2vector * attrs = mvstats[i].stakeys;
+ int numattrs = mvstats[i].stakeys->dim1;
+
+ /* count columns covered by the histogram */
+ for (j = 0; j < numattrs; j++)
+ if (bms_is_member(attrs->values[j], attnums))
+ matches++;
+
+ /*
+ * Use this histogram when it improves the number of matches or
+ * when it keeps the number of matches and is smaller.
+ */
+ if ((matches > current_matches) ||
+ ((matches == current_matches) && (current_dims > numattrs)))
+ {
+ choice = i;
+ current_matches = matches;
+ current_dims = numattrs;
+ }
+ }
+
+ return choice;
+}
+
+/*
+ * This splits the clauses list into two parts - one containing clauses
+ * that will be evaluated using the chosen histogram, and the remaining
+ * clauses (either non-mvcompatible, or not related to the histogram).
+ */
+static List *
+clauselist_mv_split(List *clauses, Oid varRelid, List **mvclauses, MVStats mvstats)
+{
+ int i;
+ ListCell *l;
+ List *non_mvclauses = NIL;
+
+ /* FIXME is there a better way to get info on int2vector? */
+ int2vector * attrs = mvstats->stakeys;
+ int numattrs = mvstats->stakeys->dim1;
+
+ /* erase the list of mv-compatible clauses */
+ *mvclauses = NIL;
+
+ foreach (l, clauses)
+ {
+ RestrictInfo *rinfo;
+ Node *clause = (Node *) lfirst(l);
+
+ /*
+ * Only restrictinfo may be mv-compatible, so everything else
+ * goes to the non-mv list directly
+ *
+ * TODO create a macro/function to decide mv-compatible clauses
+ * (along the lines of is_opclause, for example)
+ */
+ if (! IsA(clause, RestrictInfo))
+ {
+ non_mvclauses = lappend(non_mvclauses, clause);
+ continue;
+ }
+
+ rinfo = (RestrictInfo *) clause;
+ clause = (Node*)rinfo->clause;
+
+ /* Pseudoconstants go directly to the non-mv list too. */
+ if (rinfo->pseudoconstant)
+ {
+ non_mvclauses = lappend(non_mvclauses, rinfo);
+ continue;
+ }
+
+ if (is_opclause(clause) && list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+
+ ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
+ (is_pseudo_constant_clause_relids(lsecond(expr->args),
+ rinfo->right_relids) ||
+ (varonleft = false,
+ is_pseudo_constant_clause_relids(linitial(expr->args),
+ rinfo->left_relids)));
+
+ if (ok)
+ {
+
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ {
+ non_mvclauses = lappend(non_mvclauses, rinfo);
+ continue;
+ }
+
+ /*
+ * If it's a "<", ">" or "=" operator, check whether the column
+ * is covered by the chosen stats, and sort the clause into the
+ * matching list.
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_SCALARLTSEL:
+ case F_SCALARGTSEL:
+ case F_EQSEL:
+ if (! IS_SPECIAL_VARNO(var->varno)) /* FIXME necessary here? */
+ {
+ bool match = false;
+ for (i = 0; i < numattrs; i++)
+ if (attrs->values[i] == var->varattno)
+ match = true;
+
+ if (match)
+ {
+ *mvclauses = lappend(*mvclauses, clause);
+ break;
+ }
+ }
+ /* attribute not covered by the chosen stats */
+ non_mvclauses = lappend(non_mvclauses, rinfo);
+ break;
+
+ default:
+ /* not a supported operator, estimate the regular way */
+ non_mvclauses = lappend(non_mvclauses, rinfo);
+ break;
+ }
+ }
+ else
+ non_mvclauses = lappend(non_mvclauses, rinfo);
+ }
+ else
+ non_mvclauses = lappend(non_mvclauses, rinfo);
+ }
+
+ /*
+ * Return the remaining clauses - those will be estimated the
+ * regular way (they're either not compatible with MV stats, or
+ * not covered by the chosen histogram).
+ */
+ return non_mvclauses;
+
+}
+
+/*
+ * Determines whether the clause is compatible with multivariate stats,
+ * and if it is, returns some additional information - varno (index
+ * into simple_rte_array) and a bitmap of attributes. This is then
+ * used to fetch related multivariate statistics.
+ *
+ * At this moment we only support basic conditions of the form
+ *
+ * variable OP constant
+ *
+ * where OP is one of [=,<,<=,>=,>] (which is however determined by
+ * looking at the associated function for estimating selectivity, just
+ * like with the single-dimensional case).
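+ *
+ * For example, clauses like (a = 1) or (b < 10) qualify, while
+ * (a = b), (a IS NULL) or (a + b < 10) do not - there has to be a
+ * plain Var on one side and a pseudo-constant on the other (the
+ * clauses here are hypothetical, for illustration only).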
+ */
+static bool
+is_mv_compatible(Node *clause, Oid varRelid, Index *varno, Bitmapset **attnums)
+{
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ /* Pseudoconstants are not really interesting here. */
+ if (rinfo->pseudoconstant)
+ return false;
+
+ /* get the actual clause from the RestrictInfo ... */
+ clause = (Node*)rinfo->clause;
+
+ /* is it 'variable op constant' ? */
+ if (is_opclause(clause) && list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+
+ ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
+ (is_pseudo_constant_clause_relids(lsecond(expr->args),
+ rinfo->right_relids) ||
+ (varonleft = false,
+ is_pseudo_constant_clause_relids(linitial(expr->args),
+ rinfo->left_relids)));
+
+ if (ok)
+ {
+
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
+
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ * This uses the function for estimating selectivity, not the
+ * operator directly (a bit awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_SCALARLTSEL:
+ case F_SCALARGTSEL:
+ case F_EQSEL:
+ *varno = var->varno;
+ *attnums = bms_add_member(*attnums, var->varattno);
+ return true;
+ }
+ }
+ }
+ }
+
+ return false;
+
+}
+
+/*
+ * Estimate selectivity of clauses using a MCV list.
+ *
+ * If there's no MCV list for the stats, the function returns 0.0.
+ *
+ * While computing the estimate, the function checks whether all the
+ * columns were matched with an equality condition. If that's the case,
+ * it's assumed we can skip computing the estimate from histogram,
+ * because all the rows matching the condition are represented by the
+ * MCV item.
+ *
+ * The function also returns the frequency of the least frequent item
+ * on the MCV list, which may be useful for clamping estimate from the
+ * histogram.
+ */
+static Selectivity
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
+ MVStats mvstats, bool *fullmatch,
+ Selectivity *lowsel)
+{
+ int i;
+ Selectivity s = 0.0;
+ ListCell * l;
+ char * mcvitems = NULL;
+ MCVList mcvlist = NULL;
+
+ Bitmapset *matches = NULL; /* attributes with equality matches */
+
+ /* there's no MCV list yet */
+ if (! mvstats->mcv_built)
+ return 0.0;
+
+ mcvlist = deserialize_mv_mcvlist(fetch_mv_mcvlist(mvstats->mvoid));
+
+ Assert(mcvlist != NULL);
+ Assert (clauses != NIL);
+ Assert (list_length(clauses) >= 2);
+
+ mcvitems = palloc0(sizeof(char) * mcvlist->nitems);
+ memset(mcvitems, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
+
+ /* start with the highest possible value, lowered as we scan the MCV items */
+ *lowsel = 1.0;
+
+ /* loop through the list of MV-compatible clauses and do the estimation */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ /* operator */
+ FmgrInfo opproc;
+
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+
+ FmgrInfo ltproc, gtproc;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ /*
+ * TODO Fetch only when really needed (probably for equality only)
+ * TODO Technically either lt/gt is sufficient.
+ *
+ * FIXME The code in analyze.c creates histograms only for types
+ * with enough ordering (by calling get_sort_group_operators).
+ * Is this the same assumption, i.e. are we certain that we
+ * get the ltproc/gtproc every time we ask? Or are there types
+ * where get_sort_group_operators returns ltopr and here we
+ * get nothing?
+ */
+ TypeCacheEntry *typecache = lookup_type_cache(var->vartype, TYPECACHE_EQ_OPR | TYPECACHE_LT_OPR | TYPECACHE_GT_OPR);
+
+ /* FIXME do proper matching of the attribute to the dimension */
+ int idx = mv_get_index(var->varattno, mvstats->stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), &ltproc);
+ fmgr_info(get_opcode(typecache->gt_opr), &gtproc);
+
+ /* process the MCV list first */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ bool tmp;
+ MCVItem item = mcvlist->items[i];
+
+ /* find the lowest selectivity in the MCV */
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+
+ /* skip MCV items already ruled out */
+ if (mcvitems[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /* TODO consider bsearch here (list is sorted by values)
+ * TODO handle other operators too (LT, GT)
+ * TODO identify "full match" when the clauses fully
+ * match the whole MCV list (so that checking the
+ * histogram is not needed)
+ */
+ if (get_oprrest(expr->opno) == F_EQSEL)
+ {
+ /*
+ * We don't care about isgt in equality, because it does not matter
+ * whether it's (var = const) or (const = var).
+ */
+ if (memcmp(&cst->constvalue, &item->values[idx], sizeof(Datum)) != 0)
+ mcvitems[i] = MVSTATS_MATCH_NONE;
+ else
+ matches = bms_add_member(matches, idx);
+ }
+ else /* F_SCALARLTSEL or F_SCALARGTSEL, i.e. an inequality */
+ {
+ /*
+ * For inequalities, evaluate the operator from the query
+ * directly on the MCV value, with the arguments in the
+ * original order (var op const, or const op var when the
+ * constant is on the left). Unlike with histogram buckets
+ * there are no partial matches here - the item either
+ * satisfies the clause or it does not. This also gets the
+ * boundary (equality) cases right for both strict and
+ * non-strict operators.
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ isgt ? cst->constvalue : item->values[idx],
+ isgt ? item->values[idx] : cst->constvalue));
+
+ if (! tmp)
+ {
+ mcvitems[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+ }
+
+ }
+ }
+ }
+
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ if (mcvitems[i] != MVSTATS_MATCH_NONE)
+ s += mcvlist->items[i]->frequency;
+ }
+
+ *fullmatch = (bms_num_members(matches) == mcvlist->ndimensions);
+
+ pfree(mcvitems);
+ pfree(mcvlist);
+
+ return s;
+}
+
+/*
+ * Estimate selectivity of clauses using a histogram.
+ *
+ * If there's no histogram for the stats, the function returns 0.0.
+ */
+static Selectivity
+clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
+ MVStats mvstats)
+{
+ int i;
+ Selectivity s = 0.0;
+ ListCell * l;
+ char *buckets = NULL;
+ MVHistogram mvhist = NULL;
+
+ /* there's no histogram */
+ if (! mvstats->hist_built)
+ return 0.0;
+
+ /* There may be no histogram in the stats (check hist_built flag) */
+ mvhist = deserialize_mv_histogram(fetch_mv_histogram(mvstats->mvoid));
+
+ Assert (mvhist != NULL);
+ Assert (clauses != NIL);
+ Assert (list_length(clauses) >= 2);
+
+ /*
+ * Bitmap of bucket matches (mismatch, partial, full). By default
+ * all buckets fully match, and we eliminate or downgrade them as
+ * the clauses are evaluated.
+ */
+ buckets = palloc0(sizeof(char) * mvhist->nbuckets);
+ memset(buckets, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+
+ /* loop through the clauses and do the estimation */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ FmgrInfo opproc; /* operator */
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+ FmgrInfo ltproc;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ /*
+ * TODO Fetch only when really needed (probably for equality only)
+ *
+ * TODO Technically either lt/gt is sufficient.
+ *
+ * FIXME The code in analyze.c creates histograms only for types
+ * with enough ordering (by calling get_sort_group_operators).
+ * Is this the same assumption, i.e. are we certain that we
+ * get the ltproc/gtproc every time we ask? Or are there types
+ * where get_sort_group_operators returns ltopr and here we
+ * get nothing?
+ */
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_EQ_OPR | TYPECACHE_LT_OPR
+ | TYPECACHE_GT_OPR);
+
+ /* lookup dimension for the attribute */
+ int idx = mv_get_index(var->varattno, mvstats->stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), &ltproc);
+
+ /*
+ * Check this for all buckets that have not been eliminated yet.
+ *
+ * We already know the clauses use suitable operators (because that's
+ * how we filtered them).
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ bool tmp;
+ MVBucket bucket = mvhist->buckets[i];
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only ever lower the match)
+ */
+ if (buckets[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ *
+ * TODO I'm really unsure the handling of 'isgt' flag (that is, clauses
+ * with reverse order of variable/constant) is correct. I wouldn't
+ * be surprised if there was some mixup. Using the lt/gt operators
+ * instead of messing with the opproc could make it simpler.
+ * It would however be using a different operator than the query,
+ * although it's not any shadier than using the selectivity function
+ * as is done currently.
+ *
+ * FIXME Once the min/max values are deduplicated, we can easily minimize
+ * the number of calls to the comparator (assuming we keep the
+ * deduplicated structure). See the note on compression at MVBucket
+ * serialize/deserialize methods.
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_SCALARLTSEL: /* column < constant */
+
+ if (! isgt) /* (var < const) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->min[idx]));
+ if (tmp)
+ {
+ buckets[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+
+ /*
+ * Now check whether the upper boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->max[idx]));
+
+ if (tmp)
+ buckets[i] = MVSTATS_MATCH_PARTIAL; /* partial match */
+ }
+ else /* (const < var) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ bucket->max[idx],
+ cst->constvalue));
+ if (tmp)
+ {
+ buckets[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+
+ /*
+ * Now check whether the lower boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ bucket->min[idx],
+ cst->constvalue));
+
+ if (tmp)
+ buckets[i] = MVSTATS_MATCH_PARTIAL; /* partial match */
+ }
+ break;
+
+ case F_SCALARGTSEL: /* column > constant */
+
+ if (! isgt) /* (var > const) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->max[idx]));
+ if (tmp)
+ {
+ buckets[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+
+ /*
+ * Now check whether the lower boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->min[idx]));
+
+ if (tmp)
+ buckets[i] = MVSTATS_MATCH_PARTIAL; /* partial match */
+ }
+ else /* (const > var) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in
+ * that case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ bucket->min[idx],
+ cst->constvalue));
+ if (tmp)
+ {
+ buckets[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+
+ /*
+ * Now check whether the upper boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ bucket->max[idx],
+ cst->constvalue));
+
+ if (tmp)
+ buckets[i] = MVSTATS_MATCH_PARTIAL; /* partial match */
+ }
+
+ break;
+
+ case F_EQSEL:
+
+ /*
+ * We only check whether the value is within the bucket, using the
+ * lt operator fetched from the type cache (used for both boundary
+ * checks, with swapped arguments).
+ *
+ * TODO We'll use the default 50% estimate, but that's probably way off
+ * if there are multiple distinct values. Consider tweaking this
+ * somehow, e.g. using only a part inversely proportional to the
+ * estimated number of distinct values in the bucket.
+ *
+ * TODO This does not handle inclusion flags at the moment, thus counting
+ * some buckets twice (when hitting the boundary).
+ *
+ * TODO Optimization is that if max[i] == min[i], it's effectively a MCV
+ * item and we can count the whole bucket as a complete match (thus
+ * using 100% bucket selectivity and not just 50%).
+ *
+ * TODO Technically some buckets may "degenerate" into single-value
+ * buckets (not necessarily for all the dimensions) - maybe this
+ * is better than keeping a separate MCV list (multi-dimensional).
+ * Update: Actually, that's unlikely to be better than a separate
+ * MCV list for two reasons - first, it requires ~2x the space
+ * (because of storing lower/upper boundaries) and second because
+ * the buckets are ranges - depending on the partitioning algorithm
+ * it may not even degenerate into a (min=max) bucket. For example,
+ * the current partitioning algorithm never does that.
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&ltproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->min[idx]));
+
+ if (tmp)
+ {
+ buckets[i] = MVSTATS_MATCH_NONE; /* constvalue < min */
+ continue;
+ }
+
+ tmp = DatumGetBool(FunctionCall2Coll(&ltproc,
+ DEFAULT_COLLATION_OID,
+ bucket->max[idx],
+ cst->constvalue));
+
+ if (tmp)
+ {
+ buckets[i] = MVSTATS_MATCH_NONE; /* constvalue > max */
+ continue;
+ }
+
+ /* partial match */
+ buckets[i] = MVSTATS_MATCH_PARTIAL;
+
+ break;
+ }
+ }
+ }
+ }
+
+ /* now, walk through the buckets and sum the selectivities */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ if (buckets[i] == MVSTATS_MATCH_FULL)
+ s += mvhist->buckets[i]->ntuples;
+ else if (buckets[i] == MVSTATS_MATCH_PARTIAL)
+ s += 0.5 * mvhist->buckets[i]->ntuples;
+ }
+
+ return s;
+}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index bd180e7..d725ae0 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -366,6 +366,13 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
create_generic_options alter_generic_options
relation_expr_list dostmt_opt_list
+%type <list> OptStatsOptions
+%type <str> stats_options_name
+%type <node> stats_options_arg
+%type <defelt> stats_options_elem
+%type <list> stats_options_list
+
+
%type <list> opt_fdw_options fdw_options
%type <defelt> fdw_option
@@ -484,7 +491,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <keyword> unreserved_keyword type_func_name_keyword
%type <keyword> col_name_keyword reserved_keyword
-%type <node> TableConstraint TableLikeClause
+%type <node> TableConstraint TableLikeClause TableStatistics
%type <ival> TableLikeOptionList TableLikeOption
%type <list> ColQualList
%type <node> ColConstraint ColConstraintElem ConstraintAttr
@@ -2312,6 +2319,14 @@ alter_table_cmd:
n->subtype = AT_DisableRowSecurity;
$$ = (Node *)n;
}
+ /* ALTER TABLE <name> ADD STATISTICS (options) ON (columns) ... */
+ | ADD_P TableStatistics
+ {
+ AlterTableCmd *n = makeNode(AlterTableCmd);
+ n->subtype = AT_AddStatistics;
+ n->def = $2;
+ $$ = (Node *)n;
+ }
| alter_generic_options
{
AlterTableCmd *n = makeNode(AlterTableCmd);
@@ -3382,6 +3397,56 @@ OptConsTableSpace: USING INDEX TABLESPACE name { $$ = $4; }
ExistingIndex: USING INDEX index_name { $$ = $3; }
;
+/*****************************************************************************
+ *
+ * QUERY :
+ * ALTER TABLE relname ADD STATISTICS (options) ON (columns)
+ *
+ *****************************************************************************/
+
+TableStatistics:
+ STATISTICS OptStatsOptions ON '(' columnList ')'
+ {
+ StatisticsDef *n = makeNode(StatisticsDef);
+ n->keys = $5;
+ n->options = $2;
+ $$ = (Node *) n;
+ }
+ ;
+
+OptStatsOptions:
+ '(' stats_options_list ')' { $$ = $2; }
+ | /*EMPTY*/ { $$ = NIL; }
+ ;
+
+stats_options_list:
+ stats_options_elem
+ {
+ $$ = list_make1($1);
+ }
+ | stats_options_list ',' stats_options_elem
+ {
+ $$ = lappend($1, $3);
+ }
+ ;
+
+stats_options_elem:
+ stats_options_name stats_options_arg
+ {
+ $$ = makeDefElem($1, $2);
+ }
+ ;
+
+stats_options_name:
+ NonReservedWord { $$ = $1; }
+ ;
+
+stats_options_arg:
+ opt_boolean_or_string { $$ = (Node *) makeString($1); }
+ | NumericOnly { $$ = (Node *) $1; }
+ | /* EMPTY */ { $$ = NULL; }
+ ;
+
/*****************************************************************************
*
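To make the new grammar concrete, it accepts statements of the following shape. This is only an illustrative sketch - the option names are my guesses (presumably DefElems mapped to the hist_*/mcv_* catalog fields defined below), I have not verified which names the ALTER TABLE code actually accepts:

ALTER TABLE test ADD STATISTICS (mcv true, max_buckets 1000) ON (a, b);

Note that per stats_options_arg the value may be omitted entirely, so a bare option name like (mcv) should parse as well.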
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 94d951c..ec90773 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -43,6 +43,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_language.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -499,6 +500,17 @@ static const struct cachedesc cacheinfo[] = {
},
4
},
+ {MvStatisticRelationId, /* MVSTATOID */
+ MvStatisticOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 128
+ },
{NamespaceRelationId, /* NAMESPACENAME */
NamespaceNameIndexId,
1,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 870692c..d2266c0 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -173,6 +173,11 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
#define LargeObjectMetadataOidIndexId 2996
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_oid_index, 3259, on pg_mv_statistic using btree(oid oid_ops));
+#define MvStatisticOidIndexId 3259
+DECLARE_INDEX(pg_mv_statistic_relid_index, 3264, on pg_mv_statistic using btree(starelid oid_ops));
+#define MvStatisticRelidIndexId 3264
+
DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
#define NamespaceNameIndexId 2684
DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
new file mode 100644
index 0000000..d725957
--- /dev/null
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -0,0 +1,89 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_mv_statistic.h
+ * definition of the system "multivariate statistic" relation (pg_mv_statistic)
+ * along with the relation's initial contents.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_mv_statistic.h
+ *
+ * NOTES
+ * the genbki.pl script reads this file and generates .bki
+ * information from the DATA() statements.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_MV_STATISTIC_H
+#define PG_MV_STATISTIC_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_mv_statistic definition. cpp turns this into
+ * typedef struct FormData_pg_mv_statistic
+ * ----------------
+ */
+#define MvStatisticRelationId 3260
+
+CATALOG(pg_mv_statistic,3260)
+{
+ /* These fields form the unique key for the entry: */
+ Oid starelid; /* relation containing attributes */
+
+ /* statistics requested to build */
+ bool hist_enabled; /* build histogram? */
+ bool mcv_enabled; /* build MCV list? */
+ bool mcv_hashed; /* build hashed MCV? */
+ bool assoc_enabled; /* analyze associations? */
+
+ /* histogram / MCV size */
+ int32 hist_max_buckets; /* max buckets */
+ int32 mcv_max_items; /* max MCV items */
+
+ /* statistics that are available (if requested) */
+ bool hist_built; /* histogram was built */
+ bool mcv_built; /* MCV list was built */
+ bool assoc_built; /* associations were built */
+
+ /* variable-length fields start here, but we allow direct access to stakeys */
+ int2vector stakeys; /* array of column keys */
+
+#ifdef CATALOG_VARLEN
+ bytea staassoc; /* association rules (serialized) */
+ bytea stamcv; /* MCV list (serialized) */
+ bytea stahist; /* MV histogram (serialized) */
+#endif
+
+} FormData_pg_mv_statistic;
+
+/* ----------------
+ * Form_pg_mv_statistic corresponds to a pointer to a tuple with
+ * the format of pg_mv_statistic relation.
+ * ----------------
+ */
+typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
+
+/* ----------------
+ * compiler constants for pg_mv_statistic
+ * ----------------
+ */
+#define Natts_pg_mv_statistic 14
+#define Anum_pg_mv_statistic_starelid 1
+#define Anum_pg_mv_statistic_hist_enabled 2
+#define Anum_pg_mv_statistic_mcv_enabled 3
+#define Anum_pg_mv_statistic_mcv_hashed 4
+#define Anum_pg_mv_statistic_assoc_enabled 5
+#define Anum_pg_mv_statistic_hist_max_buckets 6
+#define Anum_pg_mv_statistic_mcv_max_items 7
+#define Anum_pg_mv_statistic_hist_built 8
+#define Anum_pg_mv_statistic_mcv_built 9
+#define Anum_pg_mv_statistic_assoc_built 10
+#define Anum_pg_mv_statistic_stakeys 11
+#define Anum_pg_mv_statistic_staassoc 12
+#define Anum_pg_mv_statistic_stamcv 13
+#define Anum_pg_mv_statistic_stahist 14
+
+#endif /* PG_MV_STATISTIC_H */
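For reviewers poking at this, the new catalog can be queried directly once some statistics are defined - e.g. this (using the columns defined above) shows what was requested and what ANALYZE has actually built:

SELECT starelid::regclass AS rel, stakeys,
       hist_enabled, hist_built, mcv_enabled, mcv_built
FROM pg_mv_statistic;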
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 497e652..c3c03b6 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2676,6 +2676,13 @@ DESCR("current user privilege on any column by rel name");
DATA(insert OID = 3029 ( has_any_column_privilege PGNSP PGUID 12 10 0 0 0 f f f f t f s 2 0 16 "26 25" _null_ _null_ _null_ _null_ has_any_column_privilege_id _null_ _null_ _null_ ));
DESCR("current user privilege on any column by rel oid");
+DATA(insert OID = 3261 ( pg_mv_stats_histogram_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_histogram_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: histogram info");
+DATA(insert OID = 3262 ( pg_mv_stats_mvclist_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_mvclist_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: MCV list info");
+DATA(insert OID = 3263 ( pg_mv_stats_histogram_gnuplot PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_histogram_gnuplot _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: 2D histogram gnuplot");
+
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
DATA(insert OID = 1929 ( pg_stat_get_tuples_returned PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_tuples_returned _null_ _null_ _null_ ));
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index a4af551..c7839c0 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -49,6 +49,7 @@ extern void BootstrapToastTable(char *relName,
DECLARE_TOAST(pg_attrdef, 2830, 2831);
DECLARE_TOAST(pg_constraint, 2832, 2833);
DECLARE_TOAST(pg_description, 2834, 2835);
+DECLARE_TOAST(pg_mv_statistic, 3265, 3954);
DECLARE_TOAST(pg_proc, 2836, 2837);
DECLARE_TOAST(pg_rewrite, 2838, 2839);
DECLARE_TOAST(pg_seclabel, 3598, 3599);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index bc71fea..b916edd 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -413,6 +413,7 @@ typedef enum NodeTag
T_XmlSerialize,
T_WithClause,
T_CommonTableExpr,
+ T_StatisticsDef,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 3e4f815..c3e458a 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -543,6 +543,14 @@ typedef struct ColumnDef
int location; /* parse location, or -1 if none/unknown */
} ColumnDef;
+typedef struct StatisticsDef
+{
+ NodeTag type;
+ List *keys; /* String nodes naming referenced column(s) */
+ List *options; /* list of DefElem nodes */
+} StatisticsDef;
+
+
/*
* TableLikeClause - CREATE TABLE ( ... LIKE ... ) clause
*/
@@ -1338,7 +1346,8 @@ typedef enum AlterTableType
AT_ReplicaIdentity, /* REPLICA IDENTITY */
AT_EnableRowSecurity, /* ENABLE ROW SECURITY */
AT_DisableRowSecurity, /* DISABLE ROW SECURITY */
- AT_GenericOptions /* OPTIONS (...) */
+ AT_GenericOptions, /* OPTIONS (...) */
+ AT_AddStatistics /* add statistics */
} AlterTableType;
typedef struct ReplicaIdentityStmt
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
new file mode 100644
index 0000000..157891a
--- /dev/null
+++ b/src/include/utils/mvstats.h
@@ -0,0 +1,283 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvstats.h
+ * Multivariate statistics and selectivity estimation functions.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/mvstats.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MVSTATS_H
+#define MVSTATS_H
+
+/*
+ * Multivariate statistics for planner/optimizer, implementing extensions
+ * of the single-column statistics:
+ *
+ * - multivariate MCV list
+ * - multivariate histograms
+ *
+ * There's also an experimental support for associative rules (values in
+ * one column implying values in other columns - e.g. ZIP code implies
+ * name of a city, etc.).
+ *
+ * The current implementation has various limitations:
+ *
+ * (a) it supports only data types passed by value
+ *
+ * (b) no support for NULL values
+ *
+ * Both (a) and (b) should be straightforward to fix (the necessary
+ * changes are usually described in comments at the related data
+ * structures or functions).
+ *
+ * The stats may be built only directly on columns, not on expressions.
+ * And there are usually some additional technical limits (e.g. number
+ * of columns in a histogram, etc.).
+ *
+ * Those limits serve mostly as sanity checks and while increasing them
+ * is possible (the implementation should not break), it's expected to
+ * lead either to very bad precision or expensive planning.
+ */
+
+/*
+ * Multivariate histograms
+ *
+ * Histograms are a collection of buckets, represented by n-dimensional
+ * rectangles. Each rectangle is delimited by an array of lower and
+ * upper boundaries, so that for the i-th attribute
+ *
+ * min[i] <= value[i] <= max[i]
+ *
+ * Each bucket tracks frequency (fraction of tuples it contains),
+ * information about the inequalities, number of distinct values in
+ * each dimension (which is used when building the histogram) etc.
+ *
+ * The boundaries may be either inclusive or exclusive, or the whole
+ * dimension may be NULL.
+ *
+ * The buckets may overlap (assuming the build algorithm keeps the
+ * frequencies additive) or may not cover the whole space (i.e. allow
+ * gaps). This entirely depends on the algorithm used to build the
+ * histogram.
+ *
+ * The histograms are marked with a 'magic' constant, mostly to make
+ * sure the bytea really is a histogram in serialized form.
+ *
+ * We do expect to support multiple histogram types, with different
+ * features etc. The 'type' field is used to identify those types.
+ * Technically some histogram types might use completely different
+ * bucket representation, but that's not expected at the moment.
+ *
+ * TODO Add pointer to 'private' data, meant for private data for
+ * other algorithms for building the histogram.
+ *
+ * TODO The current implementation does not handle NULL values (it's
+ * somehow prepared for that, but the algorithm building the
+ * histogram ignores them). The idea is to build buckets with one
+ * or more NULL-only dimensions - there'll be at most 2^ndimensions
+ * such buckets, which for 8 attributes (current limit) is 256.
+ * That's quite reasonable, considering we expect thousands of
+ * buckets in total.
+ *
+ * TODO This structure is used both when building the histogram, and
+ * then when using it to compute estimates. That's why the last
+ * few elements are not used once the histogram is built.
+ *
+ * TODO The limit on number of buckets is quite arbitrary, aiming for
+ * sufficient accuracy while still being fast. Probably should be
+ * replaced with a dynamic limit dependent on statistics target,
+ * number of attributes (dimensions) and statistics target
+ * associated with the attributes. Also, this needs to be related
+ * to the number of sampled rows, by either clamping it to a
+ * reasonable number (after seeing the number of rows) or using
+ * it when computing the number of rows to sample. Something like
+ * 10 rows per bucket seems reasonable.
+ *
+ * TODO We may replace the bool arrays with a suitably large data type
+ * (say, uint16 or uint32) and get rid of the allocations. It's
+ * unlikely we'll ever support more than 32 columns as that'd
+ * result in poor precision, huge histograms (splitting each
+ * dimension once would mean 2^32 buckets), and very expensive
+ * estimation. MCVItem already does it this way.
+ *
+ * TODO Actually the distinct stats (both for combination of all columns
+ * and for combinations of various subsets of columns) should be
+ * moved to a separate structure (next to histogram/MCV/...) to
+ * make it useful even without a histogram computed etc.
+ */
+typedef struct MVBucketData {
+
+ /* Frequencies of this bucket. */
+ float ntuples; /* frequency of tuples */
+ float ndistinct; /* frequency of distinct values */
+
+ /*
+ * Number of distinct values in each dimension. This is used when
+ * building the histogram (and is not serialized/deserialized), but
+ * it could be useful for estimating ndistinct for combinations of
+ * columns.
+ *
+ * It would mean tracking 2^N values for each bucket, and even if
+ * those values might be stored in 1B, it's still a lot of space
+ * (considering the expected number of buckets).
+ *
+ * TODO Consider tracking ndistincts for all attribute combinations.
+ */
+ uint32 *ndistincts;
+
+ /*
+ * Information about dimensions being NULL-only. Not yet used.
+ */
+ bool *nullsonly;
+
+ /* lower boundaries - values and information about the inequalities */
+ Datum *min;
+ bool *min_inclusive;
+
+ /* upper boundaries - values and information about the inequalities */
+ Datum *max;
+ bool *max_inclusive;
+
+ /*
+ * Sample tuples falling into this bucket, index of the dimension
+ * the bucket was split by in the last step.
+ *
+ * XXX These fields are needed only while building the histogram,
+ * and are not serialized at all.
+ */
+ HeapTuple *rows;
+ uint32 numrows;
+ int last_split_dimension;
+
+} MVBucketData;
+
+typedef MVBucketData *MVBucket;
+
+
+typedef struct MVHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ MVBucket *buckets; /* array of buckets */
+
+} MVHistogramData;
+
+typedef MVHistogramData *MVHistogram;
+
+
+/* used to flag stats serialized to bytea */
+#define MVHIST_MAGIC 0x7F8C5670 /* marks serialized bytea */
+#define MVHIST_TYPE_BASIC 1 /* basic histogram type */
+
+/* limits (mostly sanity check, may be relaxed in the future) */
+#define MVHIST_MAX_BUCKETS 16384 /* max number of buckets */
+
+/* bucket size in a serialized form */
+#define BUCKET_SIZE_SERIALIZED(ndims) \
+ (offsetof(MVBucketData, ndistincts) + \
+ (ndims) * (2 * sizeof(uint16) + sizeof(uint32) + 3 * sizeof(bool)))
+
+
+/*
+ * Multivariate MCV (most-common value) lists
+ *
+ * A straightforward extension of MCV items - i.e. a list (array) of
+ * combinations of attribute values, together with a frequency and
+ * null flags.
+ *
+ * This already uses the trick with using uint32 as a null bitmap.
+ *
+ * TODO Shouldn't the MCVItemData use a plain pointer for values, instead
+ * of the single-item array trick?
+ *
+ * TODO It's possible to build a special case of MCV list, storing not
+ * the actual values but only 32/64-bit hash. This is only useful
+ * for estimating equality clauses and for large varlena types.
+ */
+typedef struct MCVItemData {
+ double frequency; /* frequency of this combination */
+ uint32 nulls; /* flags of NULL values (up to 32 columns) */
+ Datum values[1]; /* variable-length (ndimensions) */
+} MCVItemData;
+
+typedef MCVItemData *MCVItem;
+
+/* multivariate MCV list - essentially an array of MCV items */
+typedef struct MCVListData {
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of MCV list (BASIC) */
+ uint32 ndimensions; /* number of dimensions */
+ uint32 nitems; /* number of MCV items in the array */
+ MCVItem *items; /* array of MCV items */
+} MCVListData;
+
+typedef MCVListData *MCVList;
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_MCV_MAGIC 0xE1A651C2 /* marks serialized bytea */
+#define MVSTAT_MCV_TYPE_BASIC 1 /* basic MCV list type */
+
+/* TODO consider increasing the limit, and/or using statistics target */
+#define MVSTAT_MCVLIST_MAX_ITEMS 1024 /* max items in MCV list */
+
+
+/*
+ * Basic info about the stats, used when choosing what to use
+ *
+ * TODO Add info about which statistics are available (histogram, MCV,
+ * hashed MCV, associative rules).
+ */
+typedef struct MVStatsData {
+ Oid mvoid; /* OID of the stats in pg_mv_statistic */
+ int2vector *stakeys; /* attnums for columns in the stats */
+ bool hist_built; /* histogram is already available */
+ bool mcv_built; /* MCV list is already available */
+ bool assoc_built; /* associative rules available */
+} MVStatsData;
+
+typedef struct MVStatsData *MVStats;
+
+
+/*
+ * Degree of how much MCV item / histogram bucket matches a clause.
+ * This is then considered when computing the selectivity.
+ */
+#define MVSTATS_MATCH_NONE 0 /* no match at all */
+#define MVSTATS_MATCH_PARTIAL 1 /* partial match */
+#define MVSTATS_MATCH_FULL 2 /* full match */
+
+
+#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
+
+/*
+ * TODO Maybe fetching the histogram/MCV list separately is inefficient?
+ * Consider adding a single `fetch_stats` method, fetching all
+ * stats specified using flags (or something like that).
+ */
+MVStats list_mv_stats(Oid relid, int *nstats, bool built_only);
+bytea * fetch_mv_histogram(Oid mvoid);
+bytea * fetch_mv_mcvlist(Oid mvoid);
+
+/* deserialization of stats (serialization is private to analyze) */
+MVHistogram deserialize_mv_histogram(bytea * data);
+MCVList deserialize_mv_mcvlist(bytea * data);
+
+/*
+ * Returns index of the attribute number within the vector (i.e. a
+ * dimension within the stats).
+ */
+int mv_get_index(AttrNumber varattno, int2vector * stakeys);
+
+/* FIXME this probably belongs somewhere else (not to operations stats) */
+extern Datum pg_mv_stats_histogram_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_histogram_gnuplot(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_mvclist_info(PG_FUNCTION_ARGS);
+
+#endif
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index f97229f..a275bd5 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -66,6 +66,7 @@ enum SysCacheIdentifier
INDEXRELID,
LANGNAME,
LANGOID,
+ MVSTATOID,
NAMESPACENAME,
NAMESPACEOID,
OPERNAMENSP,
diff --git a/src/test/regress/regression.diffs b/src/test/regress/regression.diffs
new file mode 100644
index 0000000..179c09d
--- /dev/null
+++ b/src/test/regress/regression.diffs
@@ -0,0 +1,294 @@
+*** /home/tomas/work/postgres/src/test/regress/expected/updatable_views.out 2014-10-29 00:22:04.820171312 +0100
+--- /home/tomas/work/postgres/src/test/regress/results/updatable_views.out 2014-11-10 02:54:44.083052362 +0100
+***************
+*** 657,668 ****
+ FROM information_schema.views
+ WHERE table_name LIKE 'rw_view%'
+ ORDER BY table_name;
+! table_name | is_updatable | is_insertable_into | is_trigger_updatable | is_trigger_deletable | is_trigger_insertable_into
+! ------------+--------------+--------------------+----------------------+----------------------+----------------------------
+! rw_view1 | NO | NO | NO | NO | NO
+! rw_view2 | NO | NO | NO | NO | NO
+! (2 rows)
+!
+ SELECT table_name, column_name, is_updatable
+ FROM information_schema.columns
+ WHERE table_name LIKE 'rw_view%'
+--- 657,663 ----
+ FROM information_schema.views
+ WHERE table_name LIKE 'rw_view%'
+ ORDER BY table_name;
+! ERROR: no relation entry for relid 1880
+ SELECT table_name, column_name, is_updatable
+ FROM information_schema.columns
+ WHERE table_name LIKE 'rw_view%'
+***************
+*** 710,721 ****
+ FROM information_schema.views
+ WHERE table_name LIKE 'rw_view%'
+ ORDER BY table_name;
+! table_name | is_updatable | is_insertable_into | is_trigger_updatable | is_trigger_deletable | is_trigger_insertable_into
+! ------------+--------------+--------------------+----------------------+----------------------+----------------------------
+! rw_view1 | NO | NO | NO | NO | YES
+! rw_view2 | NO | NO | NO | NO | NO
+! (2 rows)
+!
+ SELECT table_name, column_name, is_updatable
+ FROM information_schema.columns
+ WHERE table_name LIKE 'rw_view%'
+--- 705,711 ----
+ FROM information_schema.views
+ WHERE table_name LIKE 'rw_view%'
+ ORDER BY table_name;
+! ERROR: no relation entry for relid 1880
+ SELECT table_name, column_name, is_updatable
+ FROM information_schema.columns
+ WHERE table_name LIKE 'rw_view%'
+***************
+*** 746,757 ****
+ FROM information_schema.views
+ WHERE table_name LIKE 'rw_view%'
+ ORDER BY table_name;
+! table_name | is_updatable | is_insertable_into | is_trigger_updatable | is_trigger_deletable | is_trigger_insertable_into
+! ------------+--------------+--------------------+----------------------+----------------------+----------------------------
+! rw_view1 | NO | NO | YES | NO | YES
+! rw_view2 | NO | NO | NO | NO | NO
+! (2 rows)
+!
+ SELECT table_name, column_name, is_updatable
+ FROM information_schema.columns
+ WHERE table_name LIKE 'rw_view%'
+--- 736,742 ----
+ FROM information_schema.views
+ WHERE table_name LIKE 'rw_view%'
+ ORDER BY table_name;
+! ERROR: no relation entry for relid 1880
+ SELECT table_name, column_name, is_updatable
+ FROM information_schema.columns
+ WHERE table_name LIKE 'rw_view%'
+***************
+*** 782,793 ****
+ FROM information_schema.views
+ WHERE table_name LIKE 'rw_view%'
+ ORDER BY table_name;
+! table_name | is_updatable | is_insertable_into | is_trigger_updatable | is_trigger_deletable | is_trigger_insertable_into
+! ------------+--------------+--------------------+----------------------+----------------------+----------------------------
+! rw_view1 | NO | NO | YES | YES | YES
+! rw_view2 | NO | NO | NO | NO | NO
+! (2 rows)
+!
+ SELECT table_name, column_name, is_updatable
+ FROM information_schema.columns
+ WHERE table_name LIKE 'rw_view%'
+--- 767,773 ----
+ FROM information_schema.views
+ WHERE table_name LIKE 'rw_view%'
+ ORDER BY table_name;
+! ERROR: no relation entry for relid 1880
+ SELECT table_name, column_name, is_updatable
+ FROM information_schema.columns
+ WHERE table_name LIKE 'rw_view%'
+***************
+*** 1385,1398 ****
+ Options: check_option=local
+
+ SELECT * FROM information_schema.views WHERE table_name = 'rw_view1';
+! table_catalog | table_schema | table_name | view_definition | check_option | is_updatable | is_insertable_into | is_trigger_updatable | is_trigger_deletable | is_trigger_insertable_into
+! ---------------+--------------+------------+------------------------------------+--------------+--------------+--------------------+----------------------+----------------------+----------------------------
+! regression | public | rw_view1 | SELECT base_tbl.a, +| LOCAL | YES | YES | NO | NO | NO
+! | | | base_tbl.b +| | | | | |
+! | | | FROM base_tbl +| | | | | |
+! | | | WHERE (base_tbl.a < base_tbl.b); | | | | | |
+! (1 row)
+!
+ INSERT INTO rw_view1 VALUES(3,4); -- ok
+ INSERT INTO rw_view1 VALUES(4,3); -- should fail
+ ERROR: new row violates WITH CHECK OPTION for "rw_view1"
+--- 1365,1371 ----
+ Options: check_option=local
+
+ SELECT * FROM information_schema.views WHERE table_name = 'rw_view1';
+! ERROR: no relation entry for relid 1880
+ INSERT INTO rw_view1 VALUES(3,4); -- ok
+ INSERT INTO rw_view1 VALUES(4,3); -- should fail
+ ERROR: new row violates WITH CHECK OPTION for "rw_view1"
+***************
+*** 1437,1449 ****
+ Options: check_option=cascaded
+
+ SELECT * FROM information_schema.views WHERE table_name = 'rw_view2';
+! table_catalog | table_schema | table_name | view_definition | check_option | is_updatable | is_insertable_into | is_trigger_updatable | is_trigger_deletable | is_trigger_insertable_into
+! ---------------+--------------+------------+----------------------------+--------------+--------------+--------------------+----------------------+----------------------+----------------------------
+! regression | public | rw_view2 | SELECT rw_view1.a +| CASCADED | YES | YES | NO | NO | NO
+! | | | FROM rw_view1 +| | | | | |
+! | | | WHERE (rw_view1.a < 10); | | | | | |
+! (1 row)
+!
+ INSERT INTO rw_view2 VALUES (-5); -- should fail
+ ERROR: new row violates WITH CHECK OPTION for "rw_view1"
+ DETAIL: Failing row contains (-5).
+--- 1410,1416 ----
+ Options: check_option=cascaded
+
+ SELECT * FROM information_schema.views WHERE table_name = 'rw_view2';
+! ERROR: no relation entry for relid 1880
+ INSERT INTO rw_view2 VALUES (-5); -- should fail
+ ERROR: new row violates WITH CHECK OPTION for "rw_view1"
+ DETAIL: Failing row contains (-5).
+***************
+*** 1477,1489 ****
+ Options: check_option=local
+
+ SELECT * FROM information_schema.views WHERE table_name = 'rw_view2';
+! table_catalog | table_schema | table_name | view_definition | check_option | is_updatable | is_insertable_into | is_trigger_updatable | is_trigger_deletable | is_trigger_insertable_into
+! ---------------+--------------+------------+----------------------------+--------------+--------------+--------------------+----------------------+----------------------+----------------------------
+! regression | public | rw_view2 | SELECT rw_view1.a +| LOCAL | YES | YES | NO | NO | NO
+! | | | FROM rw_view1 +| | | | | |
+! | | | WHERE (rw_view1.a < 10); | | | | | |
+! (1 row)
+!
+ INSERT INTO rw_view2 VALUES (-10); -- ok, but not in view
+ INSERT INTO rw_view2 VALUES (20); -- should fail
+ ERROR: new row violates WITH CHECK OPTION for "rw_view2"
+--- 1444,1450 ----
+ Options: check_option=local
+
+ SELECT * FROM information_schema.views WHERE table_name = 'rw_view2';
+! ERROR: no relation entry for relid 1880
+ INSERT INTO rw_view2 VALUES (-10); -- ok, but not in view
+ INSERT INTO rw_view2 VALUES (20); -- should fail
+ ERROR: new row violates WITH CHECK OPTION for "rw_view2"
+***************
+*** 1517,1529 ****
+ WHERE rw_view1.a < 10;
+
+ SELECT * FROM information_schema.views WHERE table_name = 'rw_view2';
+! table_catalog | table_schema | table_name | view_definition | check_option | is_updatable | is_insertable_into | is_trigger_updatable | is_trigger_deletable | is_trigger_insertable_into
+! ---------------+--------------+------------+----------------------------+--------------+--------------+--------------------+----------------------+----------------------+----------------------------
+! regression | public | rw_view2 | SELECT rw_view1.a +| NONE | YES | YES | NO | NO | NO
+! | | | FROM rw_view1 +| | | | | |
+! | | | WHERE (rw_view1.a < 10); | | | | | |
+! (1 row)
+!
+ INSERT INTO rw_view2 VALUES (30); -- ok, but not in view
+ SELECT * FROM base_tbl;
+ a
+--- 1478,1484 ----
+ WHERE rw_view1.a < 10;
+
+ SELECT * FROM information_schema.views WHERE table_name = 'rw_view2';
+! ERROR: no relation entry for relid 1880
+ INSERT INTO rw_view2 VALUES (30); -- ok, but not in view
+ SELECT * FROM base_tbl;
+ a
+***************
+*** 1543,1559 ****
+ CREATE VIEW rw_view2 AS SELECT * FROM rw_view1 WHERE a > 0;
+ CREATE VIEW rw_view3 AS SELECT * FROM rw_view2 WITH CHECK OPTION;
+ SELECT * FROM information_schema.views WHERE table_name LIKE E'rw\\_view_' ORDER BY table_name;
+! table_catalog | table_schema | table_name | view_definition | check_option | is_updatable | is_insertable_into | is_trigger_updatable | is_trigger_deletable | is_trigger_insertable_into
+! ---------------+--------------+------------+---------------------------+--------------+--------------+--------------------+----------------------+----------------------+----------------------------
+! regression | public | rw_view1 | SELECT base_tbl.a +| CASCADED | YES | YES | NO | NO | NO
+! | | | FROM base_tbl; | | | | | |
+! regression | public | rw_view2 | SELECT rw_view1.a +| NONE | YES | YES | NO | NO | NO
+! | | | FROM rw_view1 +| | | | | |
+! | | | WHERE (rw_view1.a > 0); | | | | | |
+! regression | public | rw_view3 | SELECT rw_view2.a +| CASCADED | YES | YES | NO | NO | NO
+! | | | FROM rw_view2; | | | | | |
+! (3 rows)
+!
+ INSERT INTO rw_view1 VALUES (-1); -- ok
+ INSERT INTO rw_view1 VALUES (1); -- ok
+ INSERT INTO rw_view2 VALUES (-2); -- ok, but not in view
+--- 1498,1504 ----
+ CREATE VIEW rw_view2 AS SELECT * FROM rw_view1 WHERE a > 0;
+ CREATE VIEW rw_view3 AS SELECT * FROM rw_view2 WITH CHECK OPTION;
+ SELECT * FROM information_schema.views WHERE table_name LIKE E'rw\\_view_' ORDER BY table_name;
+! ERROR: no relation entry for relid 1880
+ INSERT INTO rw_view1 VALUES (-1); -- ok
+ INSERT INTO rw_view1 VALUES (1); -- ok
+ INSERT INTO rw_view2 VALUES (-2); -- ok, but not in view
+
+======================================================================
+
+*** /home/tomas/work/postgres/src/test/regress/expected/sanity_check.out 2014-10-29 00:22:04.812171313 +0100
+--- /home/tomas/work/postgres/src/test/regress/results/sanity_check.out 2014-11-10 02:54:44.150052357 +0100
+***************
+*** 113,118 ****
+--- 113,119 ----
+ pg_language|t
+ pg_largeobject|t
+ pg_largeobject_metadata|t
++ pg_mv_statistic|t
+ pg_namespace|t
+ pg_opclass|t
+ pg_operator|t
+
+======================================================================
+
+*** /home/tomas/work/postgres/src/test/regress/expected/rowsecurity.out 2014-10-29 00:22:04.811171313 +0100
+--- /home/tomas/work/postgres/src/test/regress/results/rowsecurity.out 2014-11-10 02:54:45.775052238 +0100
+***************
+*** 901,925 ****
+ -- prepared statement with rls_regress_user0 privilege
+ PREPARE p1(int) AS SELECT * FROM t1 WHERE a <= $1;
+ EXECUTE p1(2);
+! a | b
+! ---+-----
+! 2 | bbb
+! 2 | bcd
+! 2 | yyy
+! (3 rows)
+!
+ EXPLAIN (COSTS OFF) EXECUTE p1(2);
+! QUERY PLAN
+! ----------------------------------------------
+! Append
+! -> Seq Scan on t1
+! Filter: ((a <= 2) AND ((a % 2) = 0))
+! -> Seq Scan on t2
+! Filter: ((a <= 2) AND ((a % 2) = 0))
+! -> Seq Scan on t3
+! Filter: ((a <= 2) AND ((a % 2) = 0))
+! (7 rows)
+!
+ -- superuser is allowed to bypass RLS checks
+ RESET SESSION AUTHORIZATION;
+ SET row_security TO OFF;
+--- 901,909 ----
+ -- prepared statement with rls_regress_user0 privilege
+ PREPARE p1(int) AS SELECT * FROM t1 WHERE a <= $1;
+ EXECUTE p1(2);
+! ERROR: no relation entry for relid 530
+ EXPLAIN (COSTS OFF) EXECUTE p1(2);
+! ERROR: no relation entry for relid 530
+ -- superuser is allowed to bypass RLS checks
+ RESET SESSION AUTHORIZATION;
+ SET row_security TO OFF;
+
+======================================================================
+
+*** /home/tomas/work/postgres/src/test/regress/expected/rules.out 2014-10-29 00:22:04.812171313 +0100
+--- /home/tomas/work/postgres/src/test/regress/results/rules.out 2014-11-10 02:54:48.329052050 +0100
+***************
+*** 1353,1358 ****
+--- 1353,1368 ----
+ LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
+ LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
+ WHERE (c.relkind = 'm'::"char");
++ pg_mv_stats| SELECT n.nspname AS schemaname,
++ c.relname AS tablename,
++ s.stakeys AS attnums,
++ length(s.stamcv) AS mcvbytes,
++ pg_mv_stats_mvclist_info(s.stamcv) AS mcvinfo,
++ length(s.stahist) AS histbytes,
++ pg_mv_stats_histogram_info(s.stahist) AS histinfo
++ FROM ((pg_mv_statistic s
++ JOIN pg_class c ON ((c.oid = s.starelid)))
++ LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
+ pg_policies| SELECT n.nspname AS schemaname,
+ c.relname AS tablename,
+ rs.rsecpolname AS policyname,
+
+======================================================================
+
diff --git a/src/test/regress/regression.out b/src/test/regress/regression.out
new file mode 100644
index 0000000..48a4a25
--- /dev/null
+++ b/src/test/regress/regression.out
@@ -0,0 +1,147 @@
+test tablespace ... ok
+test boolean ... ok
+test char ... ok
+test name ... ok
+test varchar ... ok
+test text ... ok
+test int2 ... ok
+test int4 ... ok
+test int8 ... ok
+test oid ... ok
+test float4 ... ok
+test float8 ... ok
+test bit ... ok
+test numeric ... ok
+test txid ... ok
+test uuid ... ok
+test enum ... ok
+test money ... ok
+test rangetypes ... ok
+test pg_lsn ... ok
+test regproc ... ok
+test strings ... ok
+test numerology ... ok
+test point ... ok
+test lseg ... ok
+test line ... ok
+test box ... ok
+test path ... ok
+test polygon ... ok
+test circle ... ok
+test date ... ok
+test time ... ok
+test timetz ... ok
+test timestamp ... ok
+test timestamptz ... ok
+test interval ... ok
+test abstime ... ok
+test reltime ... ok
+test tinterval ... ok
+test inet ... ok
+test macaddr ... ok
+test tstypes ... ok
+test comments ... ok
+test geometry ... ok
+test horology ... ok
+test regex ... ok
+test oidjoins ... ok
+test type_sanity ... ok
+test opr_sanity ... ok
+test insert ... ok
+test create_function_1 ... ok
+test create_type ... ok
+test create_table ... ok
+test create_function_2 ... ok
+test copy ... ok
+test copyselect ... ok
+test create_misc ... ok
+test create_operator ... ok
+test create_index ... ok
+test create_view ... ok
+test create_aggregate ... ok
+test create_function_3 ... ok
+test create_cast ... ok
+test constraints ... ok
+test triggers ... ok
+test inherit ... ok
+test create_table_like ... ok
+test typed_table ... ok
+test vacuum ... ok
+test drop_if_exists ... ok
+test updatable_views ... FAILED
+test sanity_check ... FAILED
+test errors ... ok
+test select ... ok
+test select_into ... ok
+test select_distinct ... ok
+test select_distinct_on ... ok
+test select_implicit ... ok
+test select_having ... ok
+test subselect ... ok
+test union ... ok
+test case ... ok
+test join ... ok
+test aggregates ... ok
+test transactions ... ok
+test random ... ok
+test portals ... ok
+test arrays ... ok
+test btree_index ... ok
+test hash_index ... ok
+test update ... ok
+test delete ... ok
+test namespace ... ok
+test prepared_xacts ... ok
+test privileges ... ok
+test security_label ... ok
+test collate ... ok
+test matview ... ok
+test lock ... ok
+test replica_identity ... ok
+test rowsecurity ... FAILED
+test alter_generic ... ok
+test brin ... ok
+test misc ... ok
+test psql ... ok
+test async ... ok
+test rules ... FAILED
+test event_trigger ... ok
+test select_views ... ok
+test portals_p2 ... ok
+test foreign_key ... ok
+test cluster ... ok
+test dependency ... ok
+test guc ... ok
+test bitmapops ... ok
+test combocid ... ok
+test tsearch ... ok
+test tsdicts ... ok
+test foreign_data ... ok
+test window ... ok
+test xmlmap ... ok
+test functional_deps ... ok
+test advisory_lock ... ok
+test json ... ok
+test jsonb ... ok
+test indirect_toast ... ok
+test equivclass ... ok
+test plancache ... ok
+test limit ... ok
+test plpgsql ... ok
+test copy2 ... ok
+test temp ... ok
+test domain ... ok
+test rangefuncs ... ok
+test prepare ... ok
+test without_oid ... ok
+test conversion ... ok
+test truncate ... ok
+test alter_table ... ok
+test sequence ... ok
+test polymorphism ... ok
+test rowtypes ... ok
+test returning ... ok
+test largeobject ... ok
+test with ... ok
+test xml ... ok
+test stats ... ok
On 12 October 2014 23:00, Tomas Vondra <tv@fuzzy.cz> wrote:
It however seems to be working sufficiently well at this point, enough
to get some useful feedback. So here we go.
This looks interesting and useful.
What I'd like to check before a detailed review is that this has
sufficient applicability to be useful.
My understanding is that Q9 and Q18 of TPC-H have poor plans as a
result of multi-column stats errors.
Could you look at those queries and confirm that this patch can
produce better plans for them?
If so, I will work with you to review this patch.
One aspect of the patch that seems to be missing is a user declaration
of correlation, just as we have for setting n_distinct. It seems like
an even easier place to start to just let the user specify the stats
declaratively. That way we can split the patch into two parts. First,
allow multi column stats that are user declared. Then add user stats
collected by ANALYZE. The first part is possibly contentious and thus
a good initial focus. The second part will have lots of discussion, so
good to skip for a first version.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 13 November 2014 12:31, Simon Riggs wrote:
On 12 October 2014 23:00, Tomas Vondra <tv@fuzzy.cz> wrote:
It however seems to be working sufficiently well at this point, enough
to get some useful feedback. So here we go.
This looks interesting and useful.
What I'd like to check before a detailed review is that this has
sufficient applicability to be useful.
My understanding is that Q9 and Q18 of TPC-H have poor plans as a
result of multi-column stats errors.
Could you look at those queries and confirm that this patch can
produce better plans for them?
Sure. I planned to do such verification/demonstration anyway, after
discussing the overall approach.
I planned to give it a try on TPC-DS, but I can start with the TPC-H
queries you propose. I'm not sure whether the poor estimates in Q9 & Q18
come from column correlation though - if it's due to some other issues
(e.g. conditions that are difficult to estimate), this patch can't do
anything with them. But it's a good start.
If so, I will work with you to review this patch.
Thanks!
One aspect of the patch that seems to be missing is a user declaration
of correlation, just as we have for setting n_distinct. It seems like
an even easier place to start to just let the user specify the stats
declaratively. That way we can split the patch into two parts. First,
allow multi column stats that are user declared. Then add user stats
collected by ANALYZE. The first part is possibly contentious and thus
a good initial focus. The second part will have lots of discussion, so
good to skip for a first version.
I'm not a big fan of this approach, for a number of reasons.
Firstly, it only works for "simple" parameters that are trivial to specify
(say, Pearson's correlation coefficient), and the patch does not work with
those at all - it only works with histograms, MCV lists (and might work
with associative rules in the future). And we certainly can't ask users to
specify multivariate histograms - because it's very difficult to do, and
also because complex stats are more susceptible to getting stale after adding
new data to the table.
Secondly, even if we add such "simple" parameters to the patch, we have to
come up with a way to apply those parameters to the estimates. The
problem is that as the parameters get simpler, it's less and less useful
to compute the stats.
Another question is whether it should support more than 2 columns ...
The only place where I think this might work are the associative rules.
It's simple to specify rules like ("ZIP code" implies "city") and we could
even do some simple check against the data to see if it actually makes
sense (and 'disable' the rule if not).
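For example, a declared ("ZIP code" implies "city") rule could be sanity-checked with a query like this (hypothetical table/column names, of course):

SELECT zip, count(DISTINCT city) AS cities
FROM addresses
GROUP BY zip
HAVING count(DISTINCT city) > 1;

If that returns more than a tiny fraction of the ZIP codes, the rule is clearly bogus and we'd 'disable' it.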
But maybe I got it wrong and you have something particular in mind? Can
you give an example of how it would work?
regards
Tomas
On 13.11.2014 14:11, Tomas Vondra wrote:
Dne 13 Listopad 2014, 12:31, Simon Riggs napsal(a):
On 12 October 2014 23:00, Tomas Vondra <tv@fuzzy.cz> wrote:
It however seems to be working sufficiently well at this point, enough
to get some useful feedback. So here we go.
This looks interesting and useful.
What I'd like to check before a detailed review is that this has
sufficient applicability to be useful.
My understanding is that Q9 and Q18 of TPC-H have poor plans as a
result of multi-column stats errors.
Could you look at those queries and confirm that this patch can
produce better plans for them?
Sure. I planned to do such verification/demonstration anyway, after
discussing the overall approach.
I planned to give it a try on TPC-DS, but I can start with the TPC-H
queries you propose. I'm not sure whether the poor estimates in Q9 & Q18
come from column correlation though - if it's due to some other issues
(e.g. conditions that are difficult to estimate), this patch can't do
anything with them. But it's a good start.
If so, I will work with you to review this patch.
Thanks!
One aspect of the patch that seems to be missing is a user declaration
of correlation, just as we have for setting n_distinct. It seems like
an even easier place to start to just let the user specify the stats
declaratively. That way we can split the patch into two parts. First,
allow multi column stats that are user declared. Then add user stats
collected by ANALYZE. The first part is possibly contentious and thus
a good initial focus. The second part will have lots of discussion, so
good to skip for a first version.
I'm not a big fan of this approach, for a number of reasons.
Firstly, it only works for "simple" parameters that are trivial to specify
(say, Pearson's correlation coefficient), and the patch does not work with
those at all - it only works with histograms, MCV lists (and might work
with associative rules in the future). And we certainly can't ask users to
specify multivariate histograms - because it's very difficult to do, and
also because complex stats are more susceptible to getting stale after adding
new data to the table.
Secondly, even if we add such "simple" parameters to the patch, we have to
come up with a way to apply those parameters to the estimates. The
problem is that as the parameters get simpler, it's less and less useful
to compute the stats.
Another question is whether it should support more than 2 columns ...
The only place where I think this might work are the associative rules.
It's simple to specify rules like ("ZIP code" implies "city") and we could
even do some simple check against the data to see if it actually makes
sense (and 'disable' the rule if not).
and even this simple example has its limits - at least in Germany, ZIP
codes are not unique in rural areas, where several villages share the
same ZIP code.
I guess there are just a few examples where columns are completely
functionally dependent without any exceptions.
But of course, if the user gives this information just for optimizing
the statistics, some exceptions don't matter.
If this information should be used for creating different execution
plans (e.g. if there is an index on column A and column B is functionally
dependent on it, one could think about using this index on A and the
dependency instead of running through the whole table to find all tuples
that fit the query on column B), exceptions are a very important issue.
But maybe I got it wrong and you have something particular in mind? Can
you give an example of how it would work?regards
Tomas
--
Dipl.-Math. Katharina Büchse
Friedrich-Schiller-Universität Jena
Institut für Informatik
Lehrstuhl für Datenbanken und Informationssysteme
Ernst-Abbe-Platz 2
07743 Jena
Telefon 03641/946367
Webseite http://users.minet.uni-jena.de/~re89qen/
On 13 November 2014 16:51, Katharina Büchse wrote:
On 13.11.2014 14:11, Tomas Vondra wrote:
The only place where I think this might work are the associative rules.
It's simple to specify rules like ("ZIP code" implies "city") and we
could
even do some simple check against the data to see if it actually makes
sense (and 'disable' the rule if not).
and even this simple example has its limits - at least in Germany, ZIP
codes are not unique in rural areas, where several villages share the
same ZIP code.
I guess there are just a few examples where columns are completely
functionally dependent without any exceptions.
But of course, if the user gives this information just for optimizing
the statistics, some exceptions don't matter.
If this information should be used for creating different execution
plans (e.g. if there is an index on column A and column B is functionally
dependent on it, one could think about using this index on A and the
dependency instead of running through the whole table to find all tuples
that fit the query on column B), exceptions are a very important issue.
Yes, exactly. The aim of this patch is "only" improving estimates, not
removing conditions from the plan (e.g. checking only the ZIP code and not
the city name). That certainly can't be done solely based on approximate
statistics, and as you point out most real-world data either contain bugs
or are inherently imperfect (we have the same kind of ZIP/city
inconsistencies in Czech). That's not a big issue for estimates (assuming
only a small fraction of rows violates the rule) though.
Tomas
Tomas Vondra <tv@fuzzy.cz> wrote:
On 13 November 2014 16:51, Katharina Büchse wrote:
On 13.11.2014 14:11, Tomas Vondra wrote:
The only place where I think this might work are the associative rules.
It's simple to specify rules like ("ZIP code" implies "city") and we could
even do some simple check against the data to see if it actually makes
sense (and 'disable' the rule if not).and even this simple example has its limits, at least in Germany ZIP
codes are not unique for rural areas, where several villages have the
same ZIP code.
as you point out most real-world data either contain bugs
or are inherently imperfect (we have the same kind of ZIP/city
inconsistencies in Czech).
You can have lots of fun with U.S. zip codes, too. Just on the
nominally "Madison, Wisconsin" zip codes (those starting with 537),
there are several exceptions:
select zipcode, city, locationtype
from zipcode
where zipcode like '537%'
and Decommisioned = 'false'
and zipcodetype = 'STANDARD'
and locationtype in ('PRIMARY', 'ACCEPTABLE')
order by zipcode, city;
zipcode | city | locationtype
---------+-----------+--------------
53703 | MADISON | PRIMARY
53704 | MADISON | PRIMARY
53705 | MADISON | PRIMARY
53706 | MADISON | PRIMARY
53711 | FITCHBURG | ACCEPTABLE
53711 | MADISON | PRIMARY
53713 | FITCHBURG | ACCEPTABLE
53713 | MADISON | PRIMARY
53713 | MONONA | ACCEPTABLE
53714 | MADISON | PRIMARY
53714 | MONONA | ACCEPTABLE
53715 | MADISON | PRIMARY
53716 | MADISON | PRIMARY
53716 | MONONA | ACCEPTABLE
53717 | MADISON | PRIMARY
53718 | MADISON | PRIMARY
53719 | FITCHBURG | ACCEPTABLE
53719 | MADISON | PRIMARY
53725 | MADISON | PRIMARY
53726 | MADISON | PRIMARY
53744 | MADISON | PRIMARY
(21 rows)
If you eliminate the quals besides the zipcode column you get 61
rows and it gets much stranger, with legal municipalities that are
completely surrounded by Madison that the postal service would
rather you didn't use in addressing your envelopes, but they have
to deliver to anyway, and organizations inside Madison receiving
enough mail to (literally) have their own zip code -- where the
postal service allows the organization name as a deliverable
"city".
If you want to have your own fun with this data, you can download
it here:
http://federalgovernmentzipcodes.us/free-zipcode-database.csv
I was able to load it into PostgreSQL with this:
create table zipcode
(
recordnumber integer not null,
zipcode text not null,
zipcodetype text not null,
city text not null,
state text not null,
locationtype text not null,
lat double precision,
long double precision,
xaxis double precision not null,
yaxis double precision not null,
zaxis double precision not null,
worldregion text not null,
country text not null,
locationtext text,
location text,
decommisioned text not null,
taxreturnsfiled bigint,
estimatedpopulation bigint,
totalwages bigint,
notes text
);
comment on column zipcode.zipcode is 'Zipcode or military postal code(FPO/APO)';
comment on column zipcode.zipcodetype is 'Standard, PO BOX Only, Unique, Military(implies APO or FPO)';
comment on column zipcode.city is 'official city name(s)';
comment on column zipcode.state is 'official state, territory, or quasi-state (AA, AE, AP) abbreviation code';
comment on column zipcode.locationtype is 'Primary, Acceptable, Not Acceptable';
comment on column zipcode.lat is 'Decimal Latitude, if available';
comment on column zipcode.long is 'Decimal Longitude, if available';
comment on column zipcode.location is 'Standard Display (eg Phoenix, AZ ; Pago Pago, AS ; Melbourne, AU )';
comment on column zipcode.decommisioned is 'If Primary location, Yes implies historical Zipcode, No Implies current Zipcode; If not Primary, Yes implies Historical Placename';
comment on column zipcode.taxreturnsfiled is 'Number of Individual Tax Returns Filed in 2008';
copy zipcode from 'filepath' with (format csv, header);
alter table zipcode add primary key (recordnumber);
create unique index zipcode_city on zipcode (zipcode, city);
I bet there are all sorts of correlation possibilities with, for
example, latitude and longitude and other variables. With 81831
rows and so many correlations among the columns, it might be a
useful data set to test with.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 15.11.2014 18:49, Kevin Grittner wrote:
If you eliminate the quals besides the zipcode column you get 61
rows and it gets much stranger, with legal municipalities that are
completely surrounded by Madison that the postal service would
rather you didn't use in addressing your envelopes, but they have
to deliver to anyway, and organizations inside Madison receiving
enough mail to (literally) have their own zip code -- where the
postal service allows the organization name as a deliverable
"city".
If you want to have your own fun with this data, you can download
it here:
http://federalgovernmentzipcodes.us/free-zipcode-database.csv
...
I bet there are all sorts of correlation possibilities with, for
example, latitude and longitude and other variables. With 81831
rows and so many correlations among the columns, it might be a
useful data set to test with.
Thanks for the link. I've been looking for a good dataset with such
data, and this one is by far the best one.
The current version of the patch supports only data types passed by
value (i.e. no varlena types - text, ...), which means it's impossible to
build multivariate stats on some of the interesting columns (state,
city, ...).
I guess it's time to start working on removing this limitation.
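In the meantime, the pass-by-value columns should already work - e.g. something like this (using the WIP syntax and the lat/long columns from your script):

ALTER TABLE zipcode ADD STATISTICS ON (lat, long);
ANALYZE zipcode;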
Tomas
On Sun, Nov 16, 2014 at 3:35 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
Thanks for the link. I've been looking for a good dataset with such
data, and this one is by far the best one.
The current version of the patch supports only data types passed by
value (i.e. no varlena types - text, ...), which means it's impossible to
build multivariate stats on some of the interesting columns (state,
city, ...).
I guess it's time to start working on removing this limitation.
Tomas, what's your status on this patch? Are you planning to make it
more complicated than it is? For now I have switched it to a "Needs
Review" state because even your first version did not get advanced
review (that's quite big btw). I guess that we should switch it to the
next CF.
Regards,
--
Michael
On 8.12.2014 02:01, Michael Paquier wrote:
On Sun, Nov 16, 2014 at 3:35 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
Thanks for the link. I've been looking for a good dataset with such
data, and this one is by far the best one.
The current version of the patch supports only data types passed by
value (i.e. no varlena types - text, ...), which means it's impossible to
build multivariate stats on some of the interesting columns (state,
city, ...).
I guess it's time to start working on removing this limitation.
Tomas, what's your status on this patch? Are you planning to make it
more complicated than it is? For now I have switched it to a "Needs
Review" state because even your first version did not get advanced
review (that's quite big btw). I guess that we should switch it to the
next CF.
Hello Michael,
I agree with moving the patch to the next CF - I'm working on the patch,
but I will take a bit more time to submit a new version and I can do
that in the next CF.
regards
Tomas
On 10/13/2014 01:00 AM, Tomas Vondra wrote:
Hi,
attached is a WIP patch implementing multivariate statistics.
Great! Really glad to see you working on this.
+ * FIXME This sample sizing is mostly OK when computing stats for
+ * individual columns, but when computing multi-variate stats
+ * for multivariate stats (histograms, mcv, ...) it's rather
+ * insufficient. For small number of dimensions it works, but
+ * for complex stats it'd be nice use sample proportional to
+ * the table (say, 0.5% - 1%) instead of a fixed size.
I don't think a fraction of the table is appropriate. As long as the
sample is random, the accuracy of a sample doesn't depend much on the
size of the population. For example, if you sample 1,000 rows from a
table with 100,000 rows, or 1000 rows from a table with 100,000,000
rows, the accuracy is pretty much the same. That doesn't change when you
go from a single variable to multiple variables.
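(That's the usual binomial sampling error - for an estimated fraction p from a sample of n rows it's roughly sqrt(p * (1 - p) / n), with no dependence on the population size at all.)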
You do need a bigger sample with multiple variables, however. My gut
feeling is that if you sample N rows for a single variable, with two
variables you need to sample N^2 rows to get the same accuracy. But it's
not proportional to the table size. (I have no proof for that, but I'm
sure there is literature on this.)
+ * Multivariate histograms
+ *
+ * Histograms are a collection of buckets, represented by n-dimensional
+ * rectangles. Each rectangle is delimited by an array of lower and
+ * upper boundaries, so that for the i-th attribute
+ *
+ * min[i] <= value[i] <= max[i]
+ *
+ * Each bucket tracks frequency (fraction of tuples it contains),
+ * information about the inequalities, number of distinct values in
+ * each dimension (which is used when building the histogram) etc.
+ *
+ * The boundaries may be either inclusive or exclusive, or the whole
+ * dimension may be NULL.
+ *
+ * The buckets may overlap (assuming the build algorithm keeps the
+ * frequencies additive) or may not cover the whole space (i.e. allow
+ * gaps). This entirely depends on the algorithm used to build the
+ * histogram.
That sounds pretty exotic. These buckets are quite different from the
single-dimension buckets we currently have.
The paper you reference in the partition_bucket() function, M.
Muralikrishna, David J. DeWitt: Equi-Depth Histograms For Estimating
Selectivity Factors For Multi-Dimensional Queries. SIGMOD Conference
1988: 28-36, actually doesn't mention overlapping buckets at all. I
haven't read the code in detail, but if it implements the algorithm from
that paper, there will be no overlap.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 11.12.2014 17:53, Heikki Linnakangas wrote:
On 10/13/2014 01:00 AM, Tomas Vondra wrote:
Hi,
attached is a WIP patch implementing multivariate statistics.
Great! Really glad to see you working on this.
+ * FIXME This sample sizing is mostly OK when computing stats for
+ * individual columns, but when computing multivariate stats
+ * (histograms, MCV lists, ...) it's rather insufficient. For a
+ * small number of dimensions it works, but for complex stats
+ * it'd be nice to use a sample proportional to the table
+ * (say, 0.5% - 1%) instead of a fixed size.

I don't think a fraction of the table is appropriate. As long as the
sample is random, the accuracy of a sample doesn't depend much on
the size of the population. For example, if you sample 1,000 rows
from a table with 100,000 rows, or 1,000 rows from a table with
100,000,000 rows, the accuracy is pretty much the same. That doesn't
change when you go from a single variable to multiple variables.
I might be wrong, but I doubt that. First, I read a number of papers
while working on this patch, and all of them used samples proportional
to the data set. That's only indirect evidence, though.
You do need a bigger sample with multiple variables, however. My gut
feeling is that if you sample N rows for a single variable, with two
variables you need to sample N^2 rows to get the same accuracy. But
it's not proportional to the table size. (I have no proof for that,
but I'm sure there is literature on this.)
Maybe. I think it's related to the number of buckets (which determines
the precision of the histogram). If you want 1000 buckets, the number
of rows sampled needs to be e.g. 10x that. With multi-variate
histograms, we may shoot for more buckets (say, 100 in each dimension).
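
For example (illustrative numbers only, assuming ~10 sampled rows per
bucket):

    1 dimension:  100 buckets                -> ~1,000 sampled rows
    2 dimensions: 100 x 100 = 10,000 buckets -> ~100,000 sampled rows
    d dimensions: 100^d buckets              -> ~10 * 100^d sampled rows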
+ * Multivariate histograms
+ *
+ * Histograms are a collection of buckets, represented by n-dimensional
+ * rectangles. Each rectangle is delimited by an array of lower and
+ * upper boundaries, so that for the i-th attribute
+ *
+ *     min[i] <= value[i] <= max[i]
+ *
+ * Each bucket tracks frequency (fraction of tuples it contains),
+ * information about the inequalities, number of distinct values in
+ * each dimension (which is used when building the histogram) etc.
+ *
+ * The boundaries may be either inclusive or exclusive, or the whole
+ * dimension may be NULL.
+ *
+ * The buckets may overlap (assuming the build algorithm keeps the
+ * frequencies additive) or may not cover the whole space (i.e. allow
+ * gaps). This entirely depends on the algorithm used to build the
+ * histogram.

That sounds pretty exotic. These buckets are quite different from
the single-dimension buckets we currently have.

The paper you reference in the partition_bucket() function, M.
Muralikrishna, David J. DeWitt: Equi-Depth Histograms For Estimating
Selectivity Factors For Multi-Dimensional Queries. SIGMOD Conference
1988: 28-36, actually doesn't mention overlapping buckets at all. I
haven't read the code in detail, but if it implements the algorithm
from that paper, there will be no overlap.
The algorithm implemented in partition_bucket() is very simple and
naive, and it mostly resembles the algorithm described in the paper.
I'm sure there are differences - it's not a 1:1 implementation - but
you're right that it produces non-overlapping buckets.
The point is that I envision more complex algorithms or different
histogram types, and some of them may produce overlapping buckets.
Maybe that's a premature comment, and it will turn out not to be
really necessary.
regards
Tomas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Dec 10, 2014 at 5:15 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
I agree with moving the patch to the next CF - I'm working on the patch,
but I will take a bit more time to submit a new version and I can do
that in the next CF.
OK, cool. I just moved it myself - I didn't see it registered in 2014-12 yet.
Thanks,
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Dec 15, 2014 at 11:55 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Wed, Dec 10, 2014 at 5:15 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
I agree with moving the patch to the next CF - I'm working on the patch,
but I will take a bit more time to submit a new version and I can do
that in the next CF.

OK, cool. I just moved it myself - I didn't see it registered in 2014-12 yet.
Marked as returned with feedback. No new version showed up in the last
month, and this patch was waiting for input from the author.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
attached is an updated version of the multivariate stats patch. This is
going to be a somewhat longer mail, so I'll put a small ToC here ;-)
1) patch split into 4 parts
2) where to start / documentation
3) state of the code
4) main changes/improvements
5) remaining limitations
The motivation and design ideas explained in the first message of this
thread are still valid. It might be a good idea to read it first:
/messages/by-id/543AFA15.4080608@fuzzy.cz
BTW if you happen to go to FOSDEM [PGDay], I'll gladly give you an
intro to the patch in person, or discuss it in general.
1) Patch split into 4 parts
---------------------------
Firstly, the patch got broken into the following four pieces, to make
the reviews somewhat easier:
1) 0001-shared-infrastructure-and-functional-dependencies.patch
- infrastructure, shared by all the kinds of stats added
in the following patches (catalog, ALTER TABLE, ANALYZE ...)
- implementation of a simple statistics, tracking functional
dependencies between columns (previously called "associative
rules", but that's incorrect for several reasons)
- this does not modify the optimizer in any way
2) 0002-clause-reduction-using-functional-dependencies.patch
- applies the functional dependencies to optimizer (i.e. considers
the rules in clauselist_selectivity())
3) 0003-multivariate-MCV-lists.patch
- multivariate MCV lists (both ANALYZE and optimizer parts)
4) 0004-multivariate-histograms.patch
- multivariate histograms (both ANALYZE and optimizer parts)
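
As a quick illustration of the syntax added in 0001 (the table is
hypothetical, and going by the grammar, 'dependencies' is the only
option recognized so far):

    CREATE TABLE zipcodes (zip TEXT, city TEXT);

    ALTER TABLE zipcodes ADD STATISTICS (dependencies true)
                                     ON (zip, city);
    ANALYZE zipcodes;

    -- basic info about the stats built so far
    SELECT * FROM pg_mv_stats;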
You may look at the patches at github here:
https://github.com/tvondra/postgres/tree/multivariate-stats-squashed
The branch is not stable, i.e. I'll rebase / squash / force-push changes
in the future. (There's also a multivariate-stats development branch
with unsquashed changes, but you don't want to look at that, trust me.)
The patches are not exactly small (being in the 50-100 kB range), but
that's mostly because of the amount of comments explaining the goals and
implementation details.
2) Where to start / documentation
---------------------------------
I strived to document all the pieces properly, mostly in the form of
comments. There's no sgml documentation at this point, which should
obviously change in the future.
Anyway, I'd suggest reading the first e-mail in this thread, explaining
the ideas, and then these comments:
1) functional dependencies (patch 0001)
- src/backend/utils/mvstats/dependencies.c
2) MCV lists (patch 0003)
- src/backend/utils/mvstats/mcv.c
3) histograms (patch 0004)
- src/backend/utils/mvstats/histogram.c
- also see clauselist_mv_selectivity_mcvlist() in clausesel.c
- also see clauselist_mv_selectivity_histogram() in clausesel.c
4) selectivity estimation (patches 0002-0004)
- all in src/backend/optimizer/path/clausesel.c
- clauselist_selectivity() - overview of how the stats are applied
- clauselist_apply_dependencies() - functional dependencies reduction
- clauselist_mv_selectivity_mcvlist() - MCV list estimation
- clauselist_mv_selectivity_histogram() - histogram estimation
3) State of the code
--------------------
I've spent a fair amount of time testing the patches, and while I
believe there are no segfaults or the like, I know parts of the code
need a bit more love.
The part most in need of improvements / comments is probably the code in
clausesel.c - that seems a bit quirky. Reviews / comments regarding this
part of the code are very welcome - I'm sure there are many ways to
improve this part.
There are a few FIXMEs elsewhere (e.g. about memory allocation in the
(de)serialization code), but those are mostly well-defined issues that I
know how to address (at least I believe so).
4) Main changes/improvements
----------------------------
There are many significant improvements. The previous patch version was
in the 'proof of concept' category (missing pieces, knowingly broken in
some areas); the current patch should 'mostly work'.
The patch fixes the most annoying limitations of the first version:
(a) support for all data types (not just those passed by value)
(b) handles NULL values properly
(c) adds support for IS [NOT] NULL clauses
Aside from that the code was significantly improved, there are proper
regression tests and plenty of comments explaining the details.
5) Remaining limitations
------------------------
(a) limited to stats on 8 columns
This is mostly just a 'safeguard' restriction.
(b) only data types with '<' operator
I don't think this will change anytime soon, because all the
algorithms for building the stats rely on this. I don't see
this as a serious limitation though.
(c) not handling DROP COLUMN or DROP TABLE and so on
Currently this is not handled at all (so the regression tests
do an explicit DELETE from the pg_mv_statistic catalog).
Handling DROP TABLE won't be difficult - it's similar to the
current stats. Handling ALTER TABLE ... DROP COLUMN will be much
more tricky, I guess - should we drop all the stats referencing
that column, or should we just remove it from the stats? Or
should we keep it and treat it as NULL? I'm not sure what the
best solution is. (See the workaround in the example after this
list.)
(d) limited list of compatible WHERE clauses
The initial patch handled only simple operator clauses
(Var op Constant)
where the operator is one of ('<', '<=', '=', '>=', '>'). Now it also
handles IS [NOT] NULL clauses (see the example after this list).
Adding more clause types should
not be overly difficult - starting with more traditional
'BooleanTest' conditions, or even multi-column conditions
(Var op Var)
which are difficult to estimate using simple-column stats.
(e) optimizer uses single stats per table
This is still true, and I don't think this will change soon. I do
have some ideas on how to merge multiple stats etc., but it's
certainly complex stuff, unlikely to happen within this CF. The
patch makes a lot of sense even without this particular feature,
because you can create multiple stats, each suitable for different
queries.
(f) no JOIN conditions
Similarly to the previous point, it's on the TODO but it's not
going to happen in this CF.
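
To illustrate (c) and (d) - a sketch only, assuming a hypothetical
table t(a INT, b INT) with statistics defined on (a, b):

    -- clause shapes the multivariate estimator can currently use
    SELECT * FROM t WHERE (a = 1) AND (b < 10);
    SELECT * FROM t WHERE (a IS NULL) AND (b IS NOT NULL);

    -- not handled yet, falls back to the independence assumption
    SELECT * FROM t WHERE (a = b);

    -- workaround used by the regression tests for (c), until
    -- DROP TABLE / DROP COLUMN are handled properly
    DELETE FROM pg_mv_statistic WHERE starelid = 't'::regclass;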
kind regards
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-shared-infrastructure-and-functional-dependencies.patchtext/x-diff; name=0001-shared-infrastructure-and-functional-dependencies.patchDownload
>From 2b8cbad288a0cd8fb5603af447c99f706ba7bbee Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 19:51:48 +0100
Subject: [PATCH 1/4] shared infrastructure and functional dependencies
Basic infrastructure shared by all kinds of multivariate
stats, most importantly:
- adds a new system catalog (pg_mv_statistic)
- ALTER TABLE ... ADD STATISTICS syntax
- implementation of functional dependencies (the simplest
type of multivariate statistics)
- building functional dependencies in ANALYZE
- updates regression tests (new catalog etc.)
This does not include any changes to the optimizer, i.e.
it does not influence the query planning.
FIX: invalid assert in lookup_var_attr_stats()
The current implementation requires a valid 'ltopr'
so that we can sort the sample rows in various ways,
and the assert verified this by checking that the
function is 'compute_scalar_stats'. That is however a
private function in analyze.c, so the check failed
after moving the code into common.c.
Fixed by checking the 'ltopr' operator directly.
Eventually this will be removed, as ltopr is only
needed for histograms (functional dependencies and
MCV lists may be built without it).
FIX: improved comments about functional dependencies
FIX: add magic (MVSTAT_DEPS_MAGIC) into MVDependencies
FIX: improved analysis of functional dependencies
Changes:
- decreased minimum group size
- count contradicting rows ('not supporting' ones)
The algorithm is still rather simple and probably needs
other improvements.
FIX: add pg_mv_stats_dependencies_show() function
This function actually prints the rules, not just some basic
info (number of rules) as pg_mv_stats_dependencies_info() does.
FIX: (dependencies != NULL) in pg_mv_stats_dependencies_info()
STRICT is not a solution, because the deserialization may fail
for some reason (corrupted data, ...)
FIX: rename 'associative rules' to 'functional dependencies'
It's a more appropriate name, as functional dependencies
(as defined in relational theory, esp. normal forms)
track column-level dependencies.

Associative (or more correctly 'association') rules
track dependencies between particular values, and not
necessarily in different columns (market basket analysis).
Also, did a bunch of comment improvements, minor fixes.
This does not include changes in clausesel.c!
FIX: remove obsolete Assert() enforcing typbyval types
---
src/backend/catalog/Makefile | 1 +
src/backend/catalog/system_views.sql | 10 +
src/backend/commands/analyze.c | 17 +-
src/backend/commands/tablecmds.c | 149 +++++++-
src/backend/nodes/copyfuncs.c | 15 +-
src/backend/parser/gram.y | 67 +++-
src/backend/utils/Makefile | 2 +-
src/backend/utils/cache/syscache.c | 12 +
src/backend/utils/mvstats/Makefile | 17 +
src/backend/utils/mvstats/common.c | 272 ++++++++++++++
src/backend/utils/mvstats/common.h | 70 ++++
src/backend/utils/mvstats/dependencies.c | 554 +++++++++++++++++++++++++++++
src/include/catalog/indexing.h | 5 +
src/include/catalog/pg_mv_statistic.h | 69 ++++
src/include/catalog/pg_proc.h | 5 +
src/include/catalog/toasting.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/parsenodes.h | 11 +-
src/include/utils/mvstats.h | 86 +++++
src/include/utils/syscache.h | 1 +
src/test/regress/expected/rules.out | 8 +
src/test/regress/expected/sanity_check.out | 1 +
22 files changed, 1365 insertions(+), 9 deletions(-)
create mode 100644 src/backend/utils/mvstats/Makefile
create mode 100644 src/backend/utils/mvstats/common.c
create mode 100644 src/backend/utils/mvstats/common.h
create mode 100644 src/backend/utils/mvstats/dependencies.c
create mode 100644 src/include/catalog/pg_mv_statistic.h
create mode 100644 src/include/utils/mvstats.h
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index a403c64..d6c16f8 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -32,6 +32,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
+ pg_mv_statistic.h \
pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 4bc874f..da957fc 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -152,6 +152,16 @@ CREATE VIEW pg_indexes AS
LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
+CREATE VIEW pg_mv_stats AS
+ SELECT
+ N.nspname AS schemaname,
+ C.relname AS tablename,
+ S.stakeys AS attnums,
+ length(S.stadeps) as depsbytes,
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
+ LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
+
CREATE VIEW pg_stats AS
SELECT
nspname AS schemaname,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 5de2b39..a02dcb2 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -27,6 +27,7 @@
#include "catalog/indexing.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "commands/dbcommands.h"
#include "commands/tablecmds.h"
@@ -54,7 +55,11 @@
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "utils/mvstats.h"
+#include "access/sysattr.h"
/* Data structure for Algorithm S from Knuth 3.4.2 */
typedef struct
@@ -110,7 +115,6 @@ static void update_attstats(Oid relid, bool inh,
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
-
/*
* analyze_rel() -- analyze one relation
*/
@@ -472,6 +476,13 @@ do_analyze_rel(Relation onerel, VacuumStmt *vacstmt,
* all analyzable columns. We use a lower bound of 100 rows to avoid
* possible overflow in Vitter's algorithm. (Note: that will also be the
* target in the corner case where there are no analyzable columns.)
+ *
+ * FIXME This sample sizing is mostly OK when computing stats for
+ * individual columns, but when computing multivariate stats
+ * (histograms, MCV lists, ...) it's rather insufficient. For a
+ * small number of dimensions it works, but for complex stats
+ * it'd be nice to use a sample proportional to the table
+ * (say, 0.5% - 1%) instead of a fixed size.
*/
targrows = 100;
for (i = 0; i < attr_cnt; i++)
@@ -574,6 +585,9 @@ do_analyze_rel(Relation onerel, VacuumStmt *vacstmt,
update_attstats(RelationGetRelid(Irel[ind]), false,
thisdata->attr_cnt, thisdata->vacattrstats);
}
+
+ /* Build multivariate stats (if there are any). */
+ build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
}
/*
@@ -2819,3 +2833,4 @@ compare_mcvs(const void *a, const void *b)
return da - db;
}
+
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 66d5083..3ec1a5a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -35,6 +35,7 @@
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_tablespace.h"
@@ -91,7 +92,7 @@
#include "utils/syscache.h"
#include "utils/tqual.h"
#include "utils/typcache.h"
-
+#include "utils/mvstats.h"
/*
* ON COMMIT action list
@@ -139,8 +140,9 @@ static List *on_commits = NIL;
#define AT_PASS_ADD_COL 5 /* ADD COLUMN */
#define AT_PASS_ADD_INDEX 6 /* ADD indexes */
#define AT_PASS_ADD_CONSTR 7 /* ADD constraints, defaults */
-#define AT_PASS_MISC 8 /* other stuff */
-#define AT_NUM_PASSES 9
+#define AT_PASS_ADD_STATS 8 /* ADD statistics */
+#define AT_PASS_MISC 9 /* other stuff */
+#define AT_NUM_PASSES 10
typedef struct AlteredTableInfo
{
@@ -415,7 +417,8 @@ static void ATExecReplicaIdentity(Relation rel, ReplicaIdentityStmt *stmt, LOCKM
static void ATExecGenericOptions(Relation rel, List *options);
static void ATExecEnableRowSecurity(Relation rel);
static void ATExecDisableRowSecurity(Relation rel);
-
+static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
+ StatisticsDef *def, LOCKMODE lockmode);
static void copy_relation_data(SMgrRelation rel, SMgrRelation dst,
ForkNumber forkNum, char relpersistence);
static const char *storage_name(char c);
@@ -2965,6 +2968,7 @@ AlterTableGetLockLevel(List *cmds)
* updates.
*/
case AT_SetStatistics: /* Uses MVCC in getTableAttrs() */
+ case AT_AddStatistics: /* XXX not sure if the right level */
case AT_ClusterOn: /* Uses MVCC in getIndexes() */
case AT_DropCluster: /* Uses MVCC in getIndexes() */
case AT_SetOptions: /* Uses MVCC in getTableAttrs() */
@@ -3119,6 +3123,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
pass = AT_PASS_ADD_CONSTR;
break;
case AT_SetStatistics: /* ALTER COLUMN SET STATISTICS */
+ case AT_AddStatistics: /* XXX maybe not the right place */
ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode);
/* Performs own permission checks */
ATPrepSetStatistics(rel, cmd->name, cmd->def, lockmode);
@@ -3414,6 +3419,9 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
case AT_SetStatistics: /* ALTER COLUMN SET STATISTICS */
ATExecSetStatistics(rel, cmd->name, cmd->def, lockmode);
break;
+ case AT_AddStatistics: /* ADD STATISTICS */
+ ATExecAddStatistics(tab, rel, (StatisticsDef *) cmd->def, lockmode);
+ break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
ATExecSetOptions(rel, cmd->name, cmd->def, false, lockmode);
break;
@@ -11605,3 +11613,136 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,
ReleaseSysCache(tuple);
}
+
+/* used for sorting the attnums in ATExecAddStatistics */
+static int compare_int16(const void *a, const void *b)
+{
+ return memcmp(a, b, sizeof(int16));
+}
+
+/*
+ * Implements the ALTER TABLE ... ADD STATISTICS (options) ON (columns).
+ *
+ * The code is an unholy mix of pieces that really belong to other parts
+ * of the source tree.
+ *
+ * FIXME Check that the types are pass-by-value and support sort,
+ * although maybe we can live without the sort (and only build
+ * MCV list / association rules).
+ *
+ * FIXME This should probably check for duplicate stats (i.e. same
+ * keys, same options). Although maybe it's useful to have
+ * multiple stats on the same columns with different options
+ * (say, a detailed MCV-only stats for some queries, histogram
+ * for others, etc.)
+ */
+static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
+ StatisticsDef *def, LOCKMODE lockmode)
+{
+ int i, j;
+ ListCell *l;
+ int16 attnums[INDEX_MAX_KEYS];
+ int numcols = 0;
+
+ HeapTuple htup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ int2vector *stakeys;
+ Relation mvstatrel;
+
+ /* by default build everything */
+ bool build_dependencies = true;
+
+ Assert(IsA(def, StatisticsDef));
+
+ /* transform the column names to attnum values */
+
+ foreach(l, def->keys)
+ {
+ char *attname = strVal(lfirst(l));
+ HeapTuple atttuple;
+
+ atttuple = SearchSysCacheAttName(RelationGetRelid(rel), attname);
+
+ if (!HeapTupleIsValid(atttuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" referenced in statistics does not exist",
+ attname)));
+
+ /* more than MVSTATS_MAX_DIMENSIONS columns not allowed */
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("cannot have more than %d keys in a statistics",
+ MVSTATS_MAX_DIMENSIONS)));
+
+ attnums[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->attnum;
+ ReleaseSysCache(atttuple);
+ numcols++;
+ }
+
+ /*
+ * Check the lower bound (at least 2 columns), the upper bound was
+ * already checked in the loop.
+ */
+ if (numcols < 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("multivariate stats require 2 or more columns")));
+
+ /* look for duplicate columns */
+ for (i = 0; i < numcols; i++)
+ for (j = 0; j < numcols; j++)
+ if ((i != j) && (attnums[i] == attnums[j]))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("duplicate column name in statistics definition")));
+
+ /* parse the statistics options */
+ foreach (l, def->options)
+ {
+ DefElem *opt = (DefElem*)lfirst(l);
+
+ if (strcmp(opt->defname, "dependencies") == 0)
+ build_dependencies = defGetBoolean(opt);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized STATISTICS option \"%s\"",
+ opt->defname)));
+ }
+
+ /* sort the attnums and build int2vector */
+ qsort(attnums, numcols, sizeof(int16), compare_int16);
+ stakeys = buildint2vector(attnums, numcols);
+
+ /*
+ * Okay, let's create the pg_mv_statistic entry.
+ */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ /* no stats collected yet, so just the keys */
+ values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
+
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+ values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+
+ nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* insert the tuple into pg_mv_statistic */
+ mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ htup = heap_form_tuple(mvstatrel->rd_att, values, nulls);
+
+ simple_heap_insert(mvstatrel, htup);
+
+ CatalogUpdateIndexes(mvstatrel, htup);
+
+ heap_freetuple(htup);
+
+ heap_close(mvstatrel, RowExclusiveLock);
+
+ return;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index f1a24f5..eb406ff 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3909,6 +3909,17 @@ _copyAlterPolicyStmt(const AlterPolicyStmt *from)
return newnode;
}
+static StatisticsDef *
+_copyStatisticsDef(const StatisticsDef *from)
+{
+ StatisticsDef *newnode = makeNode(StatisticsDef);
+
+ COPY_NODE_FIELD(keys);
+ COPY_NODE_FIELD(options);
+
+ return newnode;
+}
+
/* ****************************************************************
* pg_list.h copy functions
* ****************************************************************
@@ -4723,6 +4734,9 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_StatisticsDef:
+ retval = _copyStatisticsDef(from);
+ break;
case T_PrivGrantee:
retval = _copyPrivGrantee(from);
break;
@@ -4735,7 +4749,6 @@ copyObject(const void *from)
case T_XmlSerialize:
retval = _copyXmlSerialize(from);
break;
-
default:
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(from));
retval = 0; /* keep compiler quiet */
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 679e1bb..7a89f6c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -366,6 +366,13 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
create_generic_options alter_generic_options
relation_expr_list dostmt_opt_list
+%type <list> OptStatsOptions
+%type <str> stats_options_name
+%type <node> stats_options_arg
+%type <defelt> stats_options_elem
+%type <list> stats_options_list
+
+
%type <list> opt_fdw_options fdw_options
%type <defelt> fdw_option
@@ -484,7 +491,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <keyword> unreserved_keyword type_func_name_keyword
%type <keyword> col_name_keyword reserved_keyword
-%type <node> TableConstraint TableLikeClause
+%type <node> TableConstraint TableLikeClause TableStatistics
%type <ival> TableLikeOptionList TableLikeOption
%type <list> ColQualList
%type <node> ColConstraint ColConstraintElem ConstraintAttr
@@ -2312,6 +2319,14 @@ alter_table_cmd:
n->subtype = AT_DisableRowSecurity;
$$ = (Node *)n;
}
+ /* ALTER TABLE <name> ADD STATISTICS (options) ON (columns) ... */
+ | ADD_P TableStatistics
+ {
+ AlterTableCmd *n = makeNode(AlterTableCmd);
+ n->subtype = AT_AddStatistics;
+ n->def = $2;
+ $$ = (Node *)n;
+ }
| alter_generic_options
{
AlterTableCmd *n = makeNode(AlterTableCmd);
@@ -3382,6 +3397,56 @@ OptConsTableSpace: USING INDEX TABLESPACE name { $$ = $4; }
ExistingIndex: USING INDEX index_name { $$ = $3; }
;
+/*****************************************************************************
+ *
+ * QUERY :
+ * ALTER TABLE relname ADD STATISTICS (options) ON (columns)
+ *
+ *****************************************************************************/
+
+TableStatistics:
+ STATISTICS OptStatsOptions ON '(' columnList ')'
+ {
+ StatisticsDef *n = makeNode(StatisticsDef);
+ n->keys = $5;
+ n->options = $2;
+ $$ = (Node *) n;
+ }
+ ;
+
+OptStatsOptions:
+ '(' stats_options_list ')' { $$ = $2; }
+ | /*EMPTY*/ { $$ = NIL; }
+ ;
+
+stats_options_list:
+ stats_options_elem
+ {
+ $$ = list_make1($1);
+ }
+ | stats_options_list ',' stats_options_elem
+ {
+ $$ = lappend($1, $3);
+ }
+ ;
+
+stats_options_elem:
+ stats_options_name stats_options_arg
+ {
+ $$ = makeDefElem($1, $2);
+ }
+ ;
+
+stats_options_name:
+ NonReservedWord { $$ = $1; }
+ ;
+
+stats_options_arg:
+ opt_boolean_or_string { $$ = (Node *) makeString($1); }
+ | NumericOnly { $$ = (Node *) $1; }
+ | /* EMPTY */ { $$ = NULL; }
+ ;
+
/*****************************************************************************
*
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..eba0352 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,7 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr mvstats resowner sort time
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index bd27168..f61ef7e 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -43,6 +43,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_language.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -499,6 +500,17 @@ static const struct cachedesc cacheinfo[] = {
},
4
},
+ {MvStatisticRelationId, /* MVSTATOID */
+ MvStatisticOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 128
+ },
{NamespaceRelationId, /* NAMESPACENAME */
NamespaceNameIndexId,
1,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
new file mode 100644
index 0000000..099f1ed
--- /dev/null
+++ b/src/backend/utils/mvstats/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/mvstats
+#
+# IDENTIFICATION
+# src/backend/utils/mvstats/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/mvstats
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = common.o dependencies.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
new file mode 100644
index 0000000..36757d5
--- /dev/null
+++ b/src/backend/utils/mvstats/common.c
@@ -0,0 +1,272 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.c
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+/*
+ * Compute requested multivariate stats, using the rows sampled for the
+ * plain (single-column) stats.
+ *
+ * This fetches a list of stats from pg_mv_statistic, computes the stats
+ * and serializes them back into the catalog (as bytea values).
+ */
+void
+build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats)
+{
+ int i;
+ MVStats mvstats;
+ int nmvstats;
+
+ /*
+ * Fetch defined MV groups from pg_mv_statistic, and then compute
+ * the MV statistics (histograms for now).
+ */
+ mvstats = list_mv_stats(RelationGetRelid(onerel), &nmvstats, false);
+
+ for (i = 0; i < nmvstats; i++)
+ {
+ MVDependencies deps = NULL;
+
+ /* int2 vector of attnums the stats should be computed on */
+ int2vector * attrs = mvstats[i].stakeys;
+
+ /* check allowed number of dimensions */
+ Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Analyze functional dependencies of columns.
+ */
+ deps = build_mv_dependencies(numrows, rows, attrs, natts, vacattrstats);
+
+ /* store the histogram / MCV list in the catalog */
+ update_mv_stats(mvstats[i].mvoid, deps);
+ }
+}
+
+/*
+ * Lookup the VacAttrStats info for the selected columns, with indexes
+ * matching the attrs vector (to make it easy to work with when
+ * computing multivariate stats).
+ */
+VacAttrStats **
+lookup_var_attr_stats(int2vector *attrs, int natts, VacAttrStats **vacattrstats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ VacAttrStats **stats = (VacAttrStats**)palloc0(numattrs * sizeof(VacAttrStats*));
+
+ /* lookup VacAttrStats info for the requested columns (same attnum) */
+ for (i = 0; i < numattrs; i++)
+ {
+ stats[i] = NULL;
+ for (j = 0; j < natts; j++)
+ {
+ if (attrs->values[i] == vacattrstats[j]->tupattnum)
+ {
+ stats[i] = vacattrstats[j];
+ break;
+ }
+ }
+
+ /*
+ * Check that we found the info, that the attnum matches and
+ * that there's the requested 'lt' operator and that the type
+ * is 'passed-by-value'.
+ */
+ Assert(stats[i] != NULL);
+ Assert(stats[i]->tupattnum == attrs->values[i]);
+
+ /* FIXME This is rather ugly way to check for 'ltopr' (which
+ * is defined for 'scalar' attributes).
+ */
+ Assert(((StdAnalyzeData *)stats[i]->extra_data)->ltopr != InvalidOid);
+ }
+
+ return stats;
+}
+
+/*
+ * Fetch list of MV stats defined on a table, without the actual data
+ * for histograms, MCV lists etc.
+ */
+MVStats
+list_mv_stats(Oid relid, int *nstats, bool built_only)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ MVStats result;
+
+ /* start with 16 items, that should be enough for most cases */
+ int maxitems = 16;
+ result = (MVStats)palloc0(sizeof(MVStatsData) * maxitems);
+ *nstats = 0;
+
+ /* Prepare to scan pg_mv_statistic for entries with starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ Form_pg_mv_statistic stats = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ /*
+ * Skip statistics that were not computed yet (if only stats
+ * that were already built were requested)
+ */
+ if (built_only && (! stats->deps_built))
+ continue;
+
+ /* double the array size if needed */
+ if (*nstats == maxitems)
+ {
+ maxitems *= 2;
+ result = (MVStats)repalloc(result, sizeof(MVStatsData) * maxitems);
+ }
+
+ result[*nstats].mvoid = HeapTupleGetOid(htup);
+ result[*nstats].stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ result[*nstats].deps_built = stats->deps_built;
+ *nstats += 1;
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which was used as an inspiration for this one)? */
+
+ return result;
+}
+
+void
+update_mv_stats(Oid mvoid, MVDependencies dependencies)
+{
+ HeapTuple stup,
+ oldtup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ bool replaces[Natts_pg_mv_statistic];
+
+ Relation sd = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ memset(nulls, 1, Natts_pg_mv_statistic * sizeof(bool));
+ memset(replaces, 0, Natts_pg_mv_statistic * sizeof(bool));
+ memset(values, 0, Natts_pg_mv_statistic * sizeof(Datum));
+
+ /*
+ * Construct a new pg_mv_statistic tuple - replace only the
+ * dependencies value, depending on whether it was actually computed.
+ */
+ if (dependencies != NULL)
+ {
+ nulls[Anum_pg_mv_statistic_stadeps -1] = false;
+ values[Anum_pg_mv_statistic_stadeps - 1]
+ = PointerGetDatum(serialize_mv_dependencies(dependencies));
+ }
+
+ /* always replace the value (either by bytea or NULL) */
+ replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* always change the availability flags */
+ nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+
+ replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+
+ values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+
+ /* Is there already a pg_mv_statistic tuple for this attribute? */
+ oldtup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ if (HeapTupleIsValid(oldtup))
+ {
+ /* Yes, replace it */
+ stup = heap_modify_tuple(oldtup,
+ RelationGetDescr(sd),
+ values,
+ nulls,
+ replaces);
+ ReleaseSysCache(oldtup);
+ simple_heap_update(sd, &stup->t_self, stup);
+ }
+ else
+ elog(ERROR, "invalid pg_mv_statistic record (oid=%d)", mvoid);
+
+ /* update indexes too */
+ CatalogUpdateIndexes(sd, stup);
+
+ heap_freetuple(stup);
+
+ heap_close(sd, RowExclusiveLock);
+}
+
+/* multi-variate stats comparator */
+
+/*
+ * qsort_arg comparator for sorting Datums (MV stats)
+ *
+ * This does not maintain the tupnoLink array.
+ */
+int
+compare_scalars_simple(const void *a, const void *b, void *arg)
+{
+ Datum da = *(Datum*)a;
+ Datum db = *(Datum*)b;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting data when partitioning a MV bucket
+ */
+int
+compare_scalars_partition(const void *a, const void *b, void *arg)
+{
+ Datum da = ((ScalarItem*)a)->value;
+ Datum db = ((ScalarItem*)b)->value;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting Datum[] (row of Datums) when
+ * counting distinct values.
+ */
+int
+compare_scalars_memcmp(const void *a, const void *b, void *arg)
+{
+ Size len = *(Size*)arg;
+
+ return memcmp(a, b, len);
+}
+
+int
+compare_scalars_memcmp_2(const void *a, const void *b)
+{
+ return memcmp(a, b, sizeof(Datum));
+}
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
new file mode 100644
index 0000000..f511c4e
--- /dev/null
+++ b/src/backend/utils/mvstats/common.h
@@ -0,0 +1,70 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.h
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/tuptoaster.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_mv_statistic.h"
+#include "foreign/fdwapi.h"
+#include "postmaster/autovacuum.h"
+#include "storage/lmgr.h"
+#include "utils/datum.h"
+#include "utils/sortsupport.h"
+#include "utils/syscache.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "access/sysattr.h"
+
+#include "utils/mvstats.h"
+
+/* FIXME private structure copied from analyze.c */
+
+typedef struct
+{
+ Oid eqopr; /* '=' operator for datatype, if any */
+ Oid eqfunc; /* and associated function */
+ Oid ltopr; /* '<' operator for datatype, if any */
+} StdAnalyzeData;
+
+typedef struct
+{
+ Datum value; /* a data value */
+ int tupno; /* position index for tuple it came from */
+} ScalarItem;
+
+typedef struct
+{
+ int count; /* # of duplicates */
+ int first; /* values[] index of first occurrence */
+} ScalarMCVItem;
+
+typedef struct
+{
+ SortSupport ssup;
+ int *tupnoLink;
+} CompareScalarsContext;
+
+
+VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+/* comparators, used when constructing multivariate stats */
+int compare_scalars_simple(const void *a, const void *b, void *arg);
+int compare_scalars_partition(const void *a, const void *b, void *arg);
+int compare_scalars_memcmp(const void *a, const void *b, void *arg);
+int compare_scalars_memcmp_2(const void *a, const void *b);
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
new file mode 100644
index 0000000..b900efd
--- /dev/null
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -0,0 +1,554 @@
+/*-------------------------------------------------------------------------
+ *
+ * dependencies.c
+ * POSTGRES multivariate functional dependencies
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/dependencies.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+/*
+ * Mine functional dependencies between columns, in the form (A => B),
+ * meaning that a value in column 'A' determines value in 'B'. A simple
+ * artificial example may be a table created like this
+ *
+ * CREATE TABLE deptest (a INT, b INT)
+ * AS SELECT i, i/10 FROM generate_series(1,100000) s(i);
+ *
+ * Clearly, once we know the value for 'A' we can easily determine the
+ * value of 'B' by dividing (A/10). A more practical example may be
+ * addresses, where (ZIP code => city name), i.e. once we know the ZIP,
+ * we probably know which city it belongs to. Larger cities usually have
+ * multiple ZIP codes, so the dependency can't be reversed.
+ *
+ * Functional dependencies are a concept well described in relational
+ * theory, especially in definition of normalization and "normal forms".
+ * Wikipedia has a nice definition of a functional dependency [1]:
+ *
+ * In a given table, an attribute Y is said to have a functional
+ * dependency on a set of attributes X (written X -> Y) if and only
+ * if each X value is associated with precisely one Y value. For
+ * example, in an "Employee" table that includes the attributes
+ * "Employee ID" and "Employee Date of Birth", the functional
+ * dependency {Employee ID} -> {Employee Date of Birth} would hold.
+ * It follows from the previous two sentences that each {Employee ID}
+ * is associated with precisely one {Employee Date of Birth}.
+ *
+ * [1] http://en.wikipedia.org/wiki/Database_normalization
+ *
+ * Most datasets could be normalized so as not to contain any such
+ * functional dependencies, but sometimes it's not practical. In some
+ * cases it's actually a conscious choice to model the dataset in a
+ * denormalized way, either for performance or to make querying easier.
+ *
+ * The current implementation supports only dependencies between two
+ * columns, but this is merely a simplification in the initial version
+ * of the patch. It's certainly useful to mine for dependencies with
+ * multiple columns on the 'left' side, i.e. in the condition of the
+ * dependency - that is, dependencies [A,B] => C and so on.
+ *
+ * Handling multiple columns on the right side is not necessary, as such
+ * dependencies may be decomposed into a set of dependencies with
+ * the same meaning, one for each column on the right side. For example
+ *
+ * A => [B,C]
+ *
+ * is exactly the same as
+ *
+ * (A => B) & (A => C).
+ *
+ * Of course, storing (A => [B, C]) may be more efficient than storing
+ * the two dependencies (A => B) and (A => C) separately.
+ *
+ *
+ * Dependency mining (ANALYZE)
+ * ---------------------------
+ *
+ * FIXME Add more details about how build_mv_dependencies() works
+ * (minimum group size, supporting/contradicting etc.).
+ *
+ * Real-world datasets are imperfect - there may be errors (e.g. due to
+ * data-entry mistakes), or factually correct records, yet contradicting
+ * the dependency (e.g. when a city splits into two, but both keep the
+ * same ZIP code). A strict ANALYZE implementation (where the functional
+ * dependencies are identified) would ignore dependencies on such noisy
+ * data, making the approach unusable in practice.
+ *
+ * The proposed implementation attempts to handle such noisy cases
+ * gracefully, by tolerating a small number of contradicting cases.
+ *
+ * In the future this might also perform some sort of test and decide
+ * whether it's worth building any other kind of multivariate stats,
+ * or whether the dependencies sufficiently describe the data. Or at
+ * least not build the MCV list / histogram on the implied columns.
+ * Such reduction would however make the 'verification' (see the next
+ * section) impossible.
+ *
+ *
+ * Clause reduction (planner/optimizer)
+ * ------------------------------------
+ *
+ * FIXME Explain how reduction works.
+ *
+ * The problem with the reduction is that the query may use conditions
+ * that are not redundant, but in fact contradictory - e.g. the user
+ * may search for a ZIP code and a city name not matching the ZIP code.
+ *
+ * In such cases, the condition on the city name is not redundant,
+ * but actually contradictory (making the result empty), and removing
+ * it while estimating the cardinality will make the estimate worse.
+ *
+ * The current estimation assuming independence (and multiplying the
+ * selectivities) works better in this case, but only by utter luck.
+ *
+ * In some cases this might be verified using the other multivariate
+ * statistics - MCV lists and histograms. For MCV lists the verification
+ * might be very simple - peek into the list to see if there are any
+ * items matching the clause on the 'A' column (e.g. ZIP code), and if
+ * such an item is found, check that the 'B' column matches the other
+ * clause. If it does not, the clauses are contradictory. We can't
+ * really say anything when no such item is found, except maybe
+ * restricting the selectivity using the MCV data (e.g. using min/max
+ * selectivity, or something).
+ *
+ * With histograms, it might work similarly - we can't check the values
+ * directly (because histograms use buckets, unlike MCV lists, which
+ * store the actual values). So we can only observe the buckets matching the
+ * clauses - if those buckets have very low frequency, it probably means
+ * the two clauses are incompatible.
+ *
+ * It's unclear what 'low frequency' is, but if one of the clauses is
+ * implied (automatically true because of the other clause), then
+ *
+ * selectivity[clause(A)] = selectivity[clause(A) & clause(B)]
+ *
+ * So we might compute selectivity of the first clause (on the column
+ * A in dependency [A=>B]) - for example using regular statistics.
+ * And then check if the selectivity computed from the histogram is
+ * about the same (or significantly lower).
+ *
+ * The problem is that histograms work well only when the data ordering
+ * matches the natural meaning. For values that serve as labels - like
+ * city names or ZIP codes, or even generated IDs, histograms really
+ * don't work all that well. For example sorting cities by name won't
+ * match the sorting of ZIP codes, rendering the histogram unusable.
+ *
+ * The MCV are probably going to work much better, because they don't
+ * really assume any sort of ordering. And it's probably more appropriate
+ * for the label-like data.
+ *
+ * TODO Support dependencies with multiple columns on left/right.
+ *
+ * TODO Investigate using histogram and MCV list to confirm the
+ * functional dependencies.
+ *
+ * TODO Investigate statistical testing of the distribution (to decide
+ * whether it makes sense to build the histogram/MCV list).
+ *
+ * TODO Using a min/max of selectivities would probably make more sense
+ * for the associated columns.
+ *
+ * TODO Consider eliminating the implied columns from the histogram and
+ * MCV lists (but maybe that's not a good idea).
+ *
+ * FIXME Not sure if this handles NULL values properly (not sure how to
+ * do that). We assume that NULL means 0 for now, handling it just
+ * like any other value.
+ */
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats)
+{
+ int i;
+ bool isNull;
+ Size len = 2 * sizeof(Datum); /* only simple associations a => b */
+ int numattrs = attrs->dim1;
+
+ /* result */
+ int ndeps = 0;
+ MVDependencies dependencies = NULL;
+
+ /* TODO Maybe this should be somehow related to the number of
+ * distinct values in the two columns we're currently analyzing.
+ * Assuming the distribution is uniform, we can compute the average
+ * group size we should expect to observe in the sample, and then
+ * use that as a threshold. That seems better than a static approach.
+ */
+ int min_group_size = 3;
+
+ /* dimension indexes we'll check for dependencies [a => b] */
+ int dima, dimb;
+
+ /* info for the interesting attributes only
+ *
+ * TODO Compute this only once and pass it to all the methods
+ * that need it.
+ */
+ VacAttrStats **stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /* We'll reuse the same array for all the combinations */
+ Datum * values = (Datum*)palloc0(numrows * 2 * sizeof(Datum));
+
+ Assert(numattrs >= 2);
+
+ /*
+ * Evaluate all possible combinations of [A => B], using a simple algorithm:
+ *
+ * (a) sort the data by [A,B]
+ * (b) split the data into groups by A (new group whenever a value changes)
+ * (c) count different values in the B column (again, value changes)
+ *
+ * TODO It should be rather simple to merge [A => B] and [A => C] into
+ * [A => B,C]. Just keep A constant, collect all the "implied" columns
+ * and you're done.
+ */
+ for (dima = 0; dima < numattrs; dima++)
+ {
+ for (dimb = 0; dimb < numattrs; dimb++)
+ {
+ Datum val_a, val_b;
+
+ /* number of groups supporting / contradicting the dependency */
+ int n_supporting = 0;
+ int n_contradicting = 0;
+
+ /* counters valid within a group */
+ int group_size = 0;
+ int n_violations = 0;
+
+ int n_supporting_rows = 0;
+ int n_contradicting_rows = 0;
+
+ /* make sure the columns are different (skip A => A) */
+ if (dima == dimb)
+ continue;
+
+ /* accumulate all the data for both columns into an array and sort it */
+ for (i = 0; i < numrows; i++)
+ {
+ values[i*2] = heap_getattr(rows[i], attrs->values[dima], stats[dima]->tupDesc, &isNull);
+ values[i*2+1] = heap_getattr(rows[i], attrs->values[dimb], stats[dimb]->tupDesc, &isNull);
+ }
+
+ qsort_arg((void *) values, numrows, sizeof(Datum) * 2, compare_scalars_memcmp, &len);
+
+ /*
+ * Walk through the array, split it into groups according to
+ * the A value, and count distinct values in the other column.
+ * If there's a single B value for the whole group, we count
+ * it as supporting the dependency, otherwise we count it
+ * as contradicting.
+ *
+ * Furthermore we require a group to have at least a certain
+ * number of rows to be considered useful for supporting the
+ * dependency. But when it's contradicting, we count it always.
+ */
+
+ /* start with values from the first row */
+ val_a = values[0];
+ val_b = values[1];
+ group_size = 1;
+
+ for (i = 1; i < numrows; i++)
+ {
+ if (values[2*i] != val_a) /* end of the group */
+ {
+ /*
+ * If there are no contradicting rows, count it as
+ * supporting (otherwise contradicting), but only if
+ * the group is large enough.
+ *
+ * The requirement of a minimum group size makes it
+ * impossible to identify [unique,unique] cases, but
+ * that's probably a different case. This is more
+ * about [zip => city] associations etc.
+ */
+ n_supporting += ((n_violations == 0) && (group_size >= min_group_size)) ? 1 : 0;
+ n_contradicting += (n_violations != 0) ? 1 : 0;
+
+ n_supporting_rows += ((n_violations == 0) && (group_size >= min_group_size)) ? group_size : 0;
+ n_contradicting_rows += (n_violations > 0) ? group_size : 0;
+
+ /* current values start a new group */
+ val_a = values[2*i];
+ val_b = values[2*i+1];
+ n_violations = 0;
+ group_size = 1;
+ }
+ else
+ {
+ if (values[2*i+1] != val_b) /* mismatch of a B value is contradicting */
+ {
+ val_b = values[2*i+1];
+ n_violations += 1;
+ }
+
+ group_size += 1;
+ }
+ }
+
+ /* handle the last group */
+ n_supporting += ((n_violations == 0) && (group_size >= min_group_size)) ? 1 : 0;
+ n_contradicting += (n_violations != 0) ? 1 : 0;
+ n_supporting_rows += ((n_violations == 0) && (group_size >= min_group_size)) ? group_size : 0;
+ n_contradicting_rows += (n_violations > 0) ? group_size : 0;
+
+ /*
+ * See if the number of rows supporting the association is at least
+ * 10x the number of rows violating the hypothetical dependency.
+ *
+ * TODO This is a rather arbitrary limit - I guess it's possible to do
+ * some math to come up with a better rule (e.g. testing a hypothesis
+ * 'this is due to randomness'). We can create a contingency table
+ * from the values and use it for testing. Possibly only when
+ * there are no contradicting rows?
+ *
+ * TODO Also, if (a => b) and (b => a) at the same time, it pretty much
+ * means the columns have the same values (or one is a 'label'),
+ * making the conditions rather redundant. Although it's possible
+ * that the query uses incompatible combination of values.
+ */
+ if (n_supporting_rows > (n_contradicting_rows * 10))
+ {
+ if (dependencies == NULL)
+ {
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+ dependencies->magic = MVSTAT_DEPS_MAGIC;
+ }
+ else
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData, deps)
+ + sizeof(MVDependency) * (dependencies->ndeps + 1));
+
+ /* add the new dependency to the list */
+ dependencies->deps[ndeps] = (MVDependency)palloc0(sizeof(MVDependencyData));
+ dependencies->deps[ndeps]->a = attrs->values[dima];
+ dependencies->deps[ndeps]->b = attrs->values[dimb];
+
+ dependencies->ndeps = (++ndeps);
+ }
+ }
+ }
+
+ pfree(values);
+
+ return dependencies;
+}
+
+/*
+ * Store the dependencies into a bytea, so that it can be stored in the
+ * pg_mv_statistic catalog.
+ *
+ * Currently this only supports simple two-column rules, and stores them
+ * as a sequence of attnum pairs. In the future, this needs to be made
+ * more complex to support multiple columns on both sides of the
+ * implication (using AND on left, OR on right).
+ */
+bytea *
+serialize_mv_dependencies(MVDependencies dependencies)
+{
+ int i;
+
+ /* we need to store ndeps, and each needs 2 * int16 */
+ Size len = VARHDRSZ + offsetof(MVDependenciesData, deps)
+ + dependencies->ndeps * (sizeof(int16) * 2);
+
+ bytea * output = (bytea*)palloc0(len);
+
+ char * tmp = VARDATA(output);
+
+ SET_VARSIZE(output, len);
+
+ /* first, store the number of dimensions / items */
+ memcpy(tmp, dependencies, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ /* walk through the dependencies and copy both columns into the bytea */
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ memcpy(tmp, &(dependencies->deps[i]->a), sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(tmp, &(dependencies->deps[i]->b), sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return output;
+}
+
+/*
+ * Reads serialized dependencies into MVDependencies structure.
+ */
+MVDependencies
+deserialize_mv_dependencies(bytea * data)
+{
+ int i;
+ Size expected_size;
+ MVDependencies dependencies;
+ char *tmp;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVDependenciesData,deps))
+ elog(ERROR, "invalid MVDependencies size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVDependenciesData,deps));
+
+ /* read the MVDependencies header */
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(dependencies, tmp, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ if (dependencies->magic != MVSTAT_DEPS_MAGIC)
+ {
+ pfree(dependencies);
+ elog(WARNING, "not a MV Dependencies (magic number mismatch)");
+ return NULL;
+ }
+
+ Assert(dependencies->ndeps > 0);
+
+ /* what bytea size do we expect for those parameters */
+ expected_size = offsetof(MVDependenciesData,deps) +
+ dependencies->ndeps * sizeof(int16) * 2;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid dependencies size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* allocate space for the dependency pointers */
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData,deps)
+ + (dependencies->ndeps * sizeof(MVDependency)));
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ dependencies->deps[i] = (MVDependency)palloc0(sizeof(MVDependencyData));
+
+ memcpy(&(dependencies->deps[i]->a), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(&(dependencies->deps[i]->b), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return dependencies;
+}
+
+/* print some basic info about dependencies (number of dependencies) */
+Datum
+pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ result = palloc0(128);
+ snprintf(result, 128, "dependencies=%d", dependencies->ndeps);
+
+ /* FIXME free the deserialized data (pfree is not enough) */
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* print the dependencies
+ *
+ * TODO Would be nice if this knew the actual column names (instead of
+ * the attnums).
+ *
+ * FIXME This is really ugly and does not really check the lengths and
+ * strcpy/snprintf return values properly. Needs to be fixed.
+ */
+Datum
+pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
+{
+ int i = 0;
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result = NULL;
+ int len = 0;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ MVDependency dependency = dependencies->deps[i];
+ char buffer[128];
+
+ int tmp = snprintf(buffer, 128, "%s%d => %d",
+ ((i == 0) ? "" : ", "), dependency->a, dependency->b);
+
+ if (tmp < 127)
+ {
+ if (result == NULL)
+ result = palloc0(len + tmp + 1);
+ else
+ result = repalloc(result, len + tmp + 1);
+
+ strcpy(result + len, buffer);
+ len += tmp;
+ }
+ }
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
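+
+/*
+ * For illustration, with dependencies [1 => 2] and [2 => 3] the
+ * function above emits the text "1 => 2, 2 => 3", where the numbers
+ * are the attnums of the columns involved.
+ */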
+
+bytea *
+fetch_mv_dependencies(Oid mvoid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ bytea *stadeps = NULL;
+
+ /* Prepare to scan pg_mv_statistic for the entry with OID = mvoid. */
+ ScanKeyInit(&skey,
+ ObjectIdAttributeNumber,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(mvoid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticOidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ bool isnull = false;
+ Datum deps = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stadeps, &isnull);
+
+ Assert(!isnull);
+
+ stadeps = DatumGetByteaP(deps);
+
+ break;
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO Maybe save the result into relcache, as RelationGetIndexList
+ * (which served as an inspiration for this function) does? */
+
+ return stadeps;
+}
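+
+/*
+ * A typical caller combines this with deserialize_mv_dependencies(),
+ * e.g. (sketch, NULL handling omitted):
+ *
+ * MVDependencies deps
+ * = deserialize_mv_dependencies(fetch_mv_dependencies(mvoid));
+ *
+ * and then walks deps->deps[0 .. deps->ndeps - 1].
+ */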
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index a680229..048cd7c 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -173,6 +173,11 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
#define LargeObjectMetadataOidIndexId 2996
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_oid_index, 3277, on pg_mv_statistic using btree(oid oid_ops));
+#define MvStatisticOidIndexId 3277
+DECLARE_INDEX(pg_mv_statistic_relid_index, 3278, on pg_mv_statistic using btree(starelid oid_ops));
+#define MvStatisticRelidIndexId 3278
+
DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
#define NamespaceNameIndexId 2684
DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
new file mode 100644
index 0000000..76b7db7
--- /dev/null
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -0,0 +1,69 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_mv_statistic.h
+ * definition of the system "multivariate statistic" relation (pg_mv_statistic)
+ * along with the relation's initial contents.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_mv_statistic.h
+ *
+ * NOTES
+ * the genbki.pl script reads this file and generates .bki
+ * information from the DATA() statements.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_MV_STATISTIC_H
+#define PG_MV_STATISTIC_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_mv_statistic definition. cpp turns this into
+ * typedef struct FormData_pg_mv_statistic
+ * ----------------
+ */
+#define MvStatisticRelationId 3281
+
+CATALOG(pg_mv_statistic,3281)
+{
+ /* These fields form the unique key for the entry: */
+ Oid starelid; /* relation containing attributes */
+
+ /* statistics requested to build */
+ bool deps_enabled; /* analyze dependencies? */
+
+ /* statistics that are available (if requested) */
+ bool deps_built; /* dependencies were built */
+
+ /* variable-length fields start here, but we allow direct access to stakeys */
+ int2vector stakeys; /* array of column keys */
+
+#ifdef CATALOG_VARLEN
+ bytea stadeps; /* dependencies (serialized) */
+#endif
+
+} FormData_pg_mv_statistic;
+
+/* ----------------
+ * Form_pg_mv_statistic corresponds to a pointer to a tuple with
+ * the format of pg_mv_statistic relation.
+ * ----------------
+ */
+typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
+
+/* ----------------
+ * compiler constants for pg_mv_statistic
+ * ----------------
+ */
+#define Natts_pg_mv_statistic 5
+#define Anum_pg_mv_statistic_starelid 1
+#define Anum_pg_mv_statistic_deps_enabled 2
+#define Anum_pg_mv_statistic_deps_built 3
+#define Anum_pg_mv_statistic_stakeys 4
+#define Anum_pg_mv_statistic_stadeps 5
+
+#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 9edfdb8..9fb118a 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2683,6 +2683,11 @@ DESCR("current user privilege on any column by rel name");
DATA(insert OID = 3029 ( has_any_column_privilege PGNSP PGUID 12 10 0 0 0 f f f f t f s 2 0 16 "26 25" _null_ _null_ _null_ _null_ has_any_column_privilege_id _null_ _null_ _null_ ));
DESCR("current user privilege on any column by rel oid");
+DATA(insert OID = 3284 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies info");
+DATA(insert OID = 3285 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies show");
+
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
DATA(insert OID = 1929 ( pg_stat_get_tuples_returned PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_tuples_returned _null_ _null_ _null_ ));
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index cba4ae7..bf11005 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -49,6 +49,7 @@ extern void BootstrapToastTable(char *relName,
DECLARE_TOAST(pg_attrdef, 2830, 2831);
DECLARE_TOAST(pg_constraint, 2832, 2833);
DECLARE_TOAST(pg_description, 2834, 2835);
+DECLARE_TOAST(pg_mv_statistic, 3279, 3280);
DECLARE_TOAST(pg_proc, 2836, 2837);
DECLARE_TOAST(pg_rewrite, 2838, 2839);
DECLARE_TOAST(pg_seclabel, 3598, 3599);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 97ef0fc..f1d79eb 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -413,6 +413,7 @@ typedef enum NodeTag
T_XmlSerialize,
T_WithClause,
T_CommonTableExpr,
+ T_StatisticsDef,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index b1dfa85..b8700dd 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -543,6 +543,14 @@ typedef struct ColumnDef
int location; /* parse location, or -1 if none/unknown */
} ColumnDef;
+typedef struct StatisticsDef
+{
+ NodeTag type;
+ List *keys; /* String nodes naming referenced column(s) */
+ List *options; /* list of DefElem nodes */
+} StatisticsDef;
+
+
/*
* TableLikeClause - CREATE TABLE ( ... LIKE ... ) clause
*/
@@ -1340,7 +1348,8 @@ typedef enum AlterTableType
AT_ReplicaIdentity, /* REPLICA IDENTITY */
AT_EnableRowSecurity, /* ENABLE ROW SECURITY */
AT_DisableRowSecurity, /* DISABLE ROW SECURITY */
- AT_GenericOptions /* OPTIONS (...) */
+ AT_GenericOptions, /* OPTIONS (...) */
+ AT_AddStatistics /* add statistics */
} AlterTableType;
typedef struct ReplicaIdentityStmt
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
new file mode 100644
index 0000000..2b59c2d
--- /dev/null
+++ b/src/include/utils/mvstats.h
@@ -0,0 +1,86 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvstats.h
+ * Multivariate statistics and selectivity estimation functions.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/mvstats.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MVSTATS_H
+#define MVSTATS_H
+
+#include "commands/vacuum.h"
+
+/*
+ * Basic info about the stats, used when choosing what to use
+ *
+ * TODO Add info about what statistics is available (histogram, MCV,
+ * hashed MCV, functional dependencies).
+ */
+typedef struct MVStatsData {
+ Oid mvoid; /* OID of the stats in pg_mv_statistic */
+ int2vector *stakeys; /* attnums for columns in the stats */
+ bool deps_built; /* functional dependencies available */
+} MVStatsData;
+
+typedef struct MVStatsData *MVStats;
+
+
+#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
+
+/* An associative rule, tracking [a => b] dependency.
+ *
+ * TODO Make this work with multiple columns on both sides.
+ */
+typedef struct MVDependencyData {
+ int16 a;
+ int16 b;
+} MVDependencyData;
+
+typedef MVDependencyData* MVDependency;
+
+typedef struct MVDependenciesData {
+ uint32 magic; /* magic constant marker */
+ int32 ndeps; /* number of dependencies */
+ MVDependency deps[1]; /* XXX why not a pointer? */
+} MVDependenciesData;
+
+typedef MVDependenciesData* MVDependencies;
+
+#define MVSTAT_DEPS_MAGIC 0xB4549A2C /* marks serialized bytea */
+#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
+
+/*
+ * TODO Maybe fetching the histogram/MCV list separately is inefficient?
+ * Consider adding a single `fetch_stats` method, fetching all
+ * stats specified using flags (or something like that).
+ */
+MVStats list_mv_stats(Oid relid, int *nstats, bool built_only);
+
+bytea * fetch_mv_dependencies(Oid mvoid);
+
+bytea * serialize_mv_dependencies(MVDependencies dependencies);
+
+/* deserialization of stats (serialization is private to analyze) */
+MVDependencies deserialize_mv_dependencies(bytea * data);
+
+/* FIXME this probably belongs somewhere else (not to operations stats) */
+extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats);
+
+void update_mv_stats(Oid relid, MVDependencies dependencies);
+
+#endif
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index ba0b090..12147ab 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -66,6 +66,7 @@ enum SysCacheIdentifier
INDEXRELID,
LANGNAME,
LANGOID,
+ MVSTATOID,
NAMESPACENAME,
NAMESPACEOID,
OPERNAMENSP,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 80c3351..82c2659 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1353,6 +1353,14 @@ pg_matviews| SELECT n.nspname AS schemaname,
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
WHERE (c.relkind = 'm'::"char");
+pg_mv_stats| SELECT n.nspname AS schemaname,
+ c.relname AS tablename,
+ s.stakeys AS attnums,
+ length(s.stadeps) AS depsbytes,
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ FROM ((pg_mv_statistic s
+ JOIN pg_class c ON ((c.oid = s.starelid)))
+ LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
pg_policies| SELECT n.nspname AS schemaname,
c.relname AS tablename,
pol.polname AS policyname,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index c7be273..00f5fe7 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -113,6 +113,7 @@ pg_inherits|t
pg_language|t
pg_largeobject|t
pg_largeobject_metadata|t
+pg_mv_statistic|t
pg_namespace|t
pg_opclass|t
pg_operator|t
--
2.0.5
Attachment: 0002-clause-reduction-using-functional-dependencies.patch (text/x-diff)
From 4881b97548ed6fa8acbef153562da617ea7e58cb Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Fri, 16 Jan 2015 22:33:41 +0100
Subject: [PATCH 2/4] clause reduction using functional dependencies
During planning, use functional dependencies to decide
which clauses to skip during cardinality estimation.
Initial and rather simplistic implementation.
FIX: second part of the rename to functional dependencies
FIX: don't build functional dependencies by default
FIX: build deps only when requested
FIX: use treat_as_join_clause() in clause_is_mv_compatible()
We don't want to process clauses that are used for joining,
but only simple WHERE clauses.
FIX: use planner_rt_fetch() to identify relation
The clause_is_mv_compatible() needs to identify the relation
(so that we can fetch the list of multivariate stats by OID).
planner_rt_fetch() seems like the appropriate way to get the
relation OID, but apparently it only works with simple vars.
Maybe examine_variable() would make this work with more
complex vars too?
FIX: comment about functional dependencies and transitivity
FIX: comment about multi-column functional dependencies
FIX: test: functional dependencies / ANALYZE
Test analyzing functional dependencies (part of ANALYZE)
on several datasets (no dependencies, no transitive
dependencies, ...).
FIX: test: clause reduction using function dependencies / EXPLAIN
Checks that a query with conditions on two columns, where one (B)
is functionally dependent on the other one (A), correctly ignores
the clause on (B) and chooses bitmap index scan instead of plain
index scan (which is what happens otherwise, thanks to assumption
of independence).
FIX: comment about building multi-column dependencies (TODO)
FIX: support functional dependencies on all data types
Until now build_mv_dependencies() only supported data types
passed by value (i.e. not varlena types or types passed by
reference). This commit adds support for these data types
by using SortSupport to do the sorting.
This however keeps the 'typbyval' assert in common.c:
Assert(stats[i]->attrtype->typbyval);
as that method is used for all types of multivariate stats
and we don't want to make this work for all of them. If
you want to play with functional dependencies on columns
with such data types, comment this assert out.
FIX: support NULL values in functional dependencies
FIX: typo in regression test of functional dependencies
FIX: added regression test for functional dependencies with TEXT
FIX: rework build_mv_dependencies() not to fail with mixed columns
FIX: readability improvement in build_mv_dependencies()
FIX: readability fixes in build_mv_dependencies()
FIX: regression test - dependencies with mix of data types / NULLs
FIX: minor formatting fixes in build_mv_dependencies()
FIX: comment about efficient building of multi-column dependencies
FIX: comment about proper NULL handling in build_mv_dependencies()
FIX: minor comment in build_mv_dependencies()
FIX: comment about handling NULLs like regular values (dependencies)
FIX: explanation of allocations in build_mv_dependencies()
FIX: move multisort typedefs/functions to common.h/c
FIX: check that at least some statistics were requested (dependencies)
FIX: comment about handling NULL values in dependencies
FIX: minor improvements in mvstat.h (functional dependencies)
FIX: add regression test for ADD STATISTICS options (dependencies)
FIX: comment about multivariate stats at clauselist_selectivity()
FIX: updated comments in clausesel.c (dependencies)
FIX: note in clauselist_selectivity()
FIX: fixed typo in tablecmds.c (comma -> semicolon)
FIX: make regression tests parallel-happy (functional dependencies)
---
src/backend/commands/tablecmds.c | 8 +-
src/backend/optimizer/path/clausesel.c | 476 +++++++++++++++++++++++++-
src/backend/utils/mvstats/common.c | 86 ++++-
src/backend/utils/mvstats/common.h | 22 ++
src/backend/utils/mvstats/dependencies.c | 170 +++++++--
src/include/utils/mvstats.h | 23 +-
src/test/regress/expected/mv_dependencies.out | 175 ++++++++++
src/test/regress/parallel_schedule | 3 +
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_dependencies.sql | 153 +++++++++
10 files changed, 1076 insertions(+), 41 deletions(-)
create mode 100644 src/test/regress/expected/mv_dependencies.out
create mode 100644 src/test/regress/sql/mv_dependencies.sql
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3ec1a5a..3c82b89 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -11651,7 +11651,7 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
Relation mvstatrel;
/* by default build everything */
- bool build_dependencies = true;
+ bool build_dependencies = false;
Assert(IsA(def, StatisticsDef));
@@ -11713,6 +11713,12 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
opt->defname)));
}
+ /* check that at least some statistics were requested */
+ if (! build_dependencies)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies) was requested")));
+
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
stakeys = buildint2vector(attnums, numcols);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index dcac1c1..36e5bce 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -24,6 +24,12 @@
#include "utils/lsyscache.h"
#include "utils/selfuncs.h"
+#include "utils/mvstats.h"
+#include "catalog/pg_collation.h"
+#include "utils/typcache.h"
+
+#include "parser/parsetree.h"
+
/*
* Data structure for accumulating info about possible range-query
@@ -43,6 +49,16 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
+static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
+ Oid *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo);
+
+static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
+ Oid varRelid, Oid *relid, SpecialJoinInfo *sjinfo);
+
+static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Oid varRelid, int nmvstats, MVStats mvstats,
+ SpecialJoinInfo *sjinfo);
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -61,7 +77,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
* subclauses. However, that's only right if the subclauses have independent
* probabilities, and in reality they are often NOT independent. So,
* we want to be smarter where we can.
-
+ *
* Currently, the only extra smarts we have is to recognize "range queries",
* such as "x > 34 AND x < 42". Clauses are recognized as possible range
* query components if they are restriction opclauses whose operators have
@@ -88,6 +104,76 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
*
* Of course this is all very dependent on the behavior of
* scalarltsel/scalargtsel; perhaps some day we can generalize the approach.
+ *
+ *
+ * Multivariate statistics
+ * -----------------------
+ * This also uses multivariate stats to estimate combinations of conditions,
+ * in a way attempting to minimize the overhead when there are no suitable
+ * multivariate stats.
+ *
+ * The following checks are performed (in this order), and the optimizer
+ * falls back to regular stats on the first 'false'.
+ *
+ * NOTE: This explains how this works with all the patches applied, not
+ * just the functional dependencies.
+ *
+ * (1) check that at least two columns are referenced from conditions
+ * compatible with multivariate stats
+ *
+ * If there are no conditions that might be handled by multivariate
+ * stats, or if the conditions reference just a single column, it
+ * makes no sense to use multivariate stats.
+ *
+ * What conditions are compatible with multivariate stats is decided
+ * by clause_is_mv_compatible(). At this moment, only simple conditions
+ * of the form "column operator constant" (for simple comparison
+ * operators), and IS NULL / IS NOT NULL are considered compatible
+ * with multivariate statistics.
+ *
+ * (2) reduce the clauses using functional dependencies
+ *
+ * This simply attempts to 'reduce' the clauses by applying functional
+ * dependencies. For example if there are two clauses:
+ *
+ * WHERE (a = 1) AND (b = 2)
+ *
+ * and we know that 'a' determines the value of 'b', we may remove
+ * the second condition (b = 2) when computing the selectivity.
+ * This is of course tricky - see mvstats/dependencies.c for details.
+ *
+ * After the reduction, step (1) is to be repeated.
+ *
+ * (3) check if there are multivariate stats built on the columns
+ *
+ * If there are no multivariate statistics, we have to fall back to
+ * the regular stats. We might perform checks (1) and (2) in reverse
+ * order, i.e. first check if there are multivariate statistics and
+ * then collect the attributes only if needed. The assumption is
+ * that checking the clauses is cheaper than querying the catalog,
+ * so this check is performed first.
+ *
+ * (4) choose the stats matching the most columns (at least two)
+ *
+ * If there are multiple instances of multivariate statistics (e.g.
+ * built on different sets of columns), we choose the stats covering
+ * the most columns from step (1). It may happen that all available
+ * stats match just a single column - for example with conditions
+ *
+ * WHERE a = 1 AND b = 2
+ *
+ * and statistics built on (a,c) and (b,c). In such a case just fall
+ * back to the regular stats because it makes no sense to use the
+ * multivariate statistics.
+ *
+ * This selection criterion (the most columns) is certainly very
+ * simple and definitely not optimal - it's simple to come up with
+ * examples where other approaches work better. More about this
+ * at choose_mv_statistics().
+ *
+ * (5) use the multivariate stats to estimate matching clauses
+ *
+ * (6) estimate the remaining clauses using the regular statistics
*/
Selectivity
clauselist_selectivity(PlannerInfo *root,
@@ -100,6 +186,14 @@ clauselist_selectivity(PlannerInfo *root,
RangeQueryClause *rqlist = NULL;
ListCell *l;
+ /* processing mv stats */
+ Oid relid = InvalidOid;
+ int nmvstats = 0;
+ MVStats mvstats = NULL;
+
+ /* attributes in mv-compatible clauses */
+ Bitmapset *mvattnums = NULL;
+
/*
* If there's exactly one clause, then no use in trying to match up pairs,
* so just go directly to clause_selectivity().
@@ -108,6 +202,28 @@ clauselist_selectivity(PlannerInfo *root,
return clause_selectivity(root, (Node *) linitial(clauses),
varRelid, jointype, sjinfo);
+ /* collect attributes referenced by mv-compatible clauses */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo);
+
+ /*
+ * If there are mv-compatible clauses, referencing at least two
+ * different columns (otherwise it makes no sense to use mv stats),
+ * try to reduce the clauses using functional dependencies, and
+ * recollect the attributes from the reduced list.
+ *
+ * We don't need to select a single statistics for this - we can
+ * apply all the functional dependencies we have.
+ */
+ if (bms_num_members(mvattnums) >= 2)
+ {
+ /* fetch info from the catalog (not the serialized stats yet) */
+ mvstats = list_mv_stats(relid, &nmvstats, true);
+
+ /* reduce clauses by applying functional dependencies rules */
+ clauses = clauselist_apply_dependencies(root, clauses, varRelid,
+ nmvstats, mvstats, sjinfo);
+ }
+
/*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
@@ -782,3 +898,361 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ * Returns a bitmap of their attnums, or NULL if fewer than two.
+ */
+static Bitmapset *
+collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
+ Oid *relid, SpecialJoinInfo *sjinfo)
+{
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(l);
+
+ /* if the clause is mv-compatible, remember the attnum it references */
+ if (clause_is_mv_compatible(root, clause, varRelid, relid, &attnum, sjinfo))
+ attnums = bms_add_member(attnums, attnum);
+ }
+
+ /*
+ * If there are not at least two attributes referenced by the clause(s),
+ * we can throw everything out (as we'll revert to simple stats).
+ */
+ if (bms_num_members(attnums) <= 1)
+ {
+ if (attnums != NULL)
+ pfree(attnums);
+ attnums = NULL;
+ *relid = InvalidOid;
+ }
+
+ return attnums;
+}
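+
+/*
+ * For illustration, given
+ *
+ * WHERE (a = 1) AND (b < 10) AND (a + b = 3)
+ *
+ * this collects the attnums of "a" and "b" from the first two
+ * clauses; the third one is skipped because its left side is not a
+ * simple Var (see clause_is_mv_compatible).
+ */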
+
+/*
+ * Determines whether the clause is compatible with multivariate stats,
+ * and if it is, returns some additional information - varno (index
+ * into simple_rte_array) and a bitmap of attributes. This is then
+ * used to fetch related multivariate statistics.
+ *
+ * At this moment we only support basic conditions of the form
+ *
+ * variable OP constant
+ *
+ * where OP is one of [=,<,<=,>=,>] (which is however determined by
+ * looking at the associated function for estimating selectivity, just
+ * like with the single-dimensional case).
+ *
+ * TODO Support 'OR clauses' - shouldn't be all that difficult to
+ * evaluate them using multivariate stats.
+ */
+static bool
+clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
+ Oid *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo)
+{
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ /* Pseudoconstants are not really interesting here. */
+ if (rinfo->pseudoconstant)
+ return false;
+
+ /* no support for OR clauses at this point */
+ if (rinfo->orclause)
+ return false;
+
+ /* get the actual clause from the RestrictInfo (it's not an OR clause) */
+ clause = (Node*)rinfo->clause;
+
+ /* only simple opclauses are compatible with multivariate stats */
+ if (! is_opclause(clause))
+ return false;
+
+ /* we don't support join conditions at this moment */
+ if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo))
+ return false;
+
+ /* is it 'variable op constant' ? */
+ if (list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+
+ ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
+ (is_pseudo_constant_clause_relids(lsecond(expr->args),
+ rinfo->right_relids) ||
+ (varonleft = false,
+ is_pseudo_constant_clause_relids(linitial(expr->args),
+ rinfo->left_relids)));
+
+ if (ok)
+ {
+ RangeTblEntry * rte;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+ * (return NULL).
+ *
+ * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
+
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not really be necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
+
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
+
+ /* Lookup info about the base relation (we need to pass the OID out) */
+ rte = planner_rt_fetch(var->varno, root);
+ *relid = rte->relid;
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ * This uses the function for estimating selectivity, not the
+ * operator directly (a bit awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_SCALARLTSEL:
+ case F_SCALARGTSEL:
+ case F_EQSEL:
+ *attnum = var->varattno;
+ return true;
+ }
+ }
+ }
+ }
+
+ return false;
+
+}
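+
+/*
+ * Examples of the compatibility check (illustrative, varRelid = 0):
+ *
+ * (a = 1) - compatible (F_EQSEL)
+ * (a < 1) - compatible (F_SCALARLTSEL)
+ * (a = b) - not compatible (neither side is a pseudo-constant)
+ * (a = 1 OR b = 2) - not compatible (OR clauses not supported yet)
+ */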
+
+/*
+ * Performs reduction of clauses using functional dependencies, i.e.
+ * removes clauses that are considered redundant. It simply walks
+ * through dependencies, and checks whether the dependency 'matches'
+ * the clauses, i.e. if there's a clause matching the condition. If yes,
+ * all clauses matching the implied part of the dependency are removed
+ * from the list.
+ *
+ * This simply looks at attnums referenced by the clauses, not at the
+ * type of the operator (equality, inequality, ...). This may not be the
+ * right way to do it - it certainly works best for equalities, which are
+ * naturally consistent with functional dependencies (implications).
+ * It's not clear that other operators are handled sensibly - for
+ * example for inequalities, like
+ *
+ * WHERE (A >= 10) AND (B <= 20)
+ *
+ * and a trivial case where [A == B], resulting in symmetric pair of
+ * rules [A => B], [B => A], it's rather clear we can't remove either of
+ * those clauses.
+ *
+ * That only highlights that functional dependencies are most suitable
+ * for label-like data, where using non-equality operators is very rare.
+ * Using the common city/zipcode example, clauses like
+ *
+ * (zipcode <= 12345)
+ *
+ * or
+ *
+ * (cityname >= 'Washington')
+ *
+ * are rare. So restricting the reduction to equality should not harm
+ * the usefulness / applicability.
+ *
+ * The other assumption is that this assumes 'compatible' clauses. For
+ * example by using mismatching zip code and city name, this is unable
+ * to identify the discrepancy and eliminates one of the clauses. The
+ * usual approach (multiplying both selectivities) thus produces a more
+ * accurate estimate, although mostly by luck - the multiplication
+ * comes from assumption of statistical independence of the two
+ * conditions (which is not valid in this case), but moves the
+ * estimate in the right direction (towards 0%).
+ *
+ * This might be somewhat improved by cross-checking the selectivities
+ * against MCV and/or histogram.
+ *
+ * The implementation needs to be careful about cyclic rules, i.e. rules
+ * like [A => B] and [B => A] at the same time. This must not reduce
+ * clauses on both attributes at the same time.
+ *
+ * Technically we might consider selectivities here too, somehow. E.g.
+ * when (A => B) and (B => A), we might use the clauses with minimum
+ * selectivity.
+ *
+ * TODO Consider restricting the reduction to equality clauses. Or maybe
+ * use equality classes somehow?
+ *
+ * TODO Merge this docs to dependencies.c, as it's saying mostly the
+ * same things as the comments there.
+ */
+static List *
+clauselist_apply_dependencies(PlannerInfo *root, List *clauses, Oid varRelid,
+ int nmvstats, MVStats mvstats, SpecialJoinInfo *sjinfo)
+{
+ int i;
+ ListCell *lc;
+ List * reduced_clauses = NIL;
+ Oid relid;
+
+ /*
+ * preallocate space for all clauses, including non-mv-compatible,
+ * so that we don't need to reallocate the arrays repeatedly
+ */
+ bool *reduced = (bool*)palloc0(list_length(clauses) * sizeof(bool));
+ AttrNumber *mvattnums = (AttrNumber*)palloc0(list_length(clauses) * sizeof(AttrNumber));
+ Node **mvclauses = (Node**)palloc0(list_length(clauses) * sizeof(Node*));
+ int nmvclauses = 0; /* number of mv-compatible clauses */
+
+ /*
+ * Walk through the clauses - copy those that are not mv-compatible
+ * directly into the result list, and store the mv-compatible ones
+ * into an array of clauses (and remember the attnum in another array).
+ */
+ foreach (lc, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(lc);
+ if (! clause_is_mv_compatible(root, clause, varRelid, &relid, &attnum, sjinfo))
+ reduced_clauses = lappend(reduced_clauses, clause);
+ else
+ {
+ mvclauses[nmvclauses] = clause;
+ mvattnums[nmvclauses] = attnum;
+ nmvclauses++;
+ }
+ }
+
+ Assert(nmvclauses >= 2);
+
+ /* walk through all the mvstats, and try to apply all the rules */
+ for (i = 0; i < nmvstats; i++)
+ {
+ int j;
+ MVDependencies dependencies = NULL;
+
+ /* skip stats without functional dependencies built */
+ if (! mvstats[i].deps_built)
+ continue;
+
+ /* fetch dependencies */
+ dependencies = deserialize_mv_dependencies(fetch_mv_dependencies(mvstats[i].mvoid));
+ if (dependencies == NULL)
+ continue;
+
+ /*
+ * Walk through the dependencies and eliminate all the implied
+ * clauses, i.e. when there's a rule [A => B], and if we find
+ * a clause referencing column A (not yet eliminated), eliminate
+ * all clauses referencing "B".
+ *
+ * This is imperfect for a number of reasons. First, this greedy
+ * approach does not guarantee eliminating the most clauses.
+ * For example consider dependency [A => B] and [B => A], and
+ * three clauses referencing A, A and B, i.e. something like
+ *
+ * WHERE (A >= 10) AND (A <= 20) AND (B = 20)
+ *
+ * Then by considering the dependency [A => B] a single clause
+ * on B is eliminated, while by considering [B => A], both
+ * clauses on A are eliminated.
+ *
+ * The order in which dependencies are applied may be due either
+ * to ordering within a single pg_mv_statistic record, or to rules
+ * placed in different records.
+ *
+ * Possible solutions:
+ *
+ * (a) backtracking/recursion, with tracking of how many clauses
+ * were eliminated
+ *
+ * (b) building adjacency matrix (where A and B are adjacent
+ * when [A => B]), and multiplying it to construct
+ * transitive implications. I.e. by having [A=>B] and [B=>C]
+ * this also results in [A=>C]. Then we can simply choose
+ * the attribute that eliminates the most clauses (and
+ * repeat).
+ *
+ * We don't expect enough clauses for this to result in long
+ * runtimes.
+ *
+ * This may also merge all the dependencies, possibly leading to
+ * longer sequences of transitive dependencies.
+ *
+ * E.g. rule [A=>B] in one pg_mv_statistic record and [B=>C] in
+ * another one results in [A=>C], which can't be deduced if the
+ * records are considered separately.
+ */
+ for (j = 0; j < dependencies->ndeps; j++)
+ {
+ int k;
+ bool applicable = false;
+
+ /* is there a not-yet-eliminated clause on 'A'? */
+ for (k = 0; k < nmvclauses; k++)
+ {
+ /* clause on 'A' and not yet eliminated */
+ if ((! reduced[k]) && (mvattnums[k] == dependencies->deps[j]->a))
+ {
+ applicable = true; /* we can apply this rule */
+ break;
+ }
+ }
+
+ /* if the rule is not applicable, skip to the next one */
+ if (! applicable)
+ continue;
+
+ /* eliminate all clauses on 'B' */
+ for (k = 0; k < nmvclauses; k++)
+ {
+ if (mvattnums[k] == dependencies->deps[j]->b)
+ reduced[k] = true;
+ }
+ }
+ }
+
+ /* now walk through the clauses, and keep those that were not reduced */
+ for (i = 0; i < nmvclauses; i++)
+ {
+ if (! reduced[i])
+ reduced_clauses = lappend(reduced_clauses, mvclauses[i]);
+ }
+
+ pfree(reduced);
+ pfree(mvclauses);
+ pfree(mvattnums);
+
+ return reduced_clauses;
+}
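+
+/*
+ * A worked example (illustrative): clauses [(a = 1), (b = 2), (c = 3)]
+ * and a single dependency [a => b]. There is a not-yet-reduced clause
+ * on "a", so the rule applies and the clause on "b" is marked as
+ * reduced, giving the result [(a = 1), (c = 3)].
+ */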
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index 36757d5..0edaaa6 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -50,7 +50,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
/*
* Analyze functional dependencies of columns.
*/
- deps = build_mv_dependencies(numrows, rows, attrs, natts, vacattrstats);
+ if (mvstats->deps_enabled)
+ deps = build_mv_dependencies(numrows, rows, attrs, natts, vacattrstats);
/* store the histogram / MCV list in the catalog */
update_mv_stats(mvstats[i].mvoid, deps);
@@ -147,6 +148,7 @@ list_mv_stats(Oid relid, int *nstats, bool built_only)
result[*nstats].mvoid = HeapTupleGetOid(htup);
result[*nstats].stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ result[*nstats].deps_enabled = stats->deps_enabled;
result[*nstats].deps_built = stats->deps_built;
*nstats += 1;
}
@@ -270,3 +272,85 @@ compare_scalars_memcmp_2(const void *a, const void *b)
{
return memcmp(a, b, sizeof(Datum));
}
+
+
+/* initialize multi-dimensional sort */
+MultiSortSupport
+multi_sort_init(int ndims)
+{
+ MultiSortSupport mss;
+
+ Assert(ndims >= 2);
+
+ mss = (MultiSortSupport)palloc0(offsetof(MultiSortSupportData, ssup)
+ + sizeof(SortSupportData)*ndims);
+
+ mss->ndims = ndims;
+
+ return mss;
+}
+
+/*
+ * add sort info for dimension 'dim' (index into vacattrstats) to mss,
+ * at position 'sortdim'
+ */
+void
+multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats)
+{
+ /* first, lookup StdAnalyzeData for the dimension (attribute) */
+ SortSupportData ssup;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)vacattrstats[dim]->extra_data;
+
+ Assert(mss != NULL);
+ Assert(sortdim < mss->ndims);
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup);
+
+ mss->ssup[sortdim] = ssup;
+}
+
+/* compare all the dimensions in the selected order */
+int
+multi_sort_compare(const void *a, const void *b, void *arg)
+{
+ int i;
+ SortItem *ia = (SortItem*)a;
+ SortItem *ib = (SortItem*)b;
+
+ MultiSortSupport mss = (MultiSortSupport)arg;
+
+ for (i = 0; i < mss->ndims; i++)
+ {
+ int compare;
+
+ compare = ApplySortComparator(ia->values[i], ia->isnull[i],
+ ib->values[i], ib->isnull[i],
+ &mss->ssup[i]);
+
+ if (compare != 0)
+ return compare;
+
+ }
+
+ /* equal by default */
+ return 0;
+}
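+
+/*
+ * Usage sketch (matching how dependencies.c drives the sort):
+ *
+ * MultiSortSupport mss = multi_sort_init(2);
+ *
+ * multi_sort_add_dimension(mss, 0, dima, vacattrstats);
+ * multi_sort_add_dimension(mss, 1, dimb, vacattrstats);
+ *
+ * qsort_arg(items, numrows, sizeof(SortItem),
+ * multi_sort_compare, mss);
+ */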
+
+/* compare selected dimension */
+int
+multi_sort_compare_dim(int dim, const SortItem *a, const SortItem *b,
+ MultiSortSupport mss)
+{
+ return ApplySortComparator(a->values[dim], a->isnull[dim],
+ b->values[dim], b->isnull[dim],
+ &mss->ssup[dim]);
+}
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
index f511c4e..b98ceb7 100644
--- a/src/backend/utils/mvstats/common.h
+++ b/src/backend/utils/mvstats/common.h
@@ -59,6 +59,28 @@ typedef struct
int *tupnoLink;
} CompareScalarsContext;
+/* multi-sort */
+typedef struct MultiSortSupportData {
+ int ndims; /* number of dimensions supported by the sort */
+ SortSupportData ssup[1]; /* sort support data for each dimension */
+} MultiSortSupportData;
+
+typedef MultiSortSupportData* MultiSortSupport;
+
+typedef struct SortItem {
+ Datum *values;
+ bool *isnull;
+} SortItem;
+
+MultiSortSupport multi_sort_init(int ndims);
+
+void multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats);
+
+int multi_sort_compare(const void *a, const void *b, void *arg);
+
+int multi_sort_compare_dim(int dim, const SortItem *a,
+ const SortItem *b, MultiSortSupport mss);
VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
int natts, VacAttrStats **vacattrstats);
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
index b900efd..93a2fa6 100644
--- a/src/backend/utils/mvstats/dependencies.c
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -15,6 +15,7 @@
*/
#include "common.h"
+#include "utils/lsyscache.h"
/*
* Mine functional dependencies between columns, in the form (A => B),
@@ -56,6 +57,20 @@
* columns on the 'left' side, i.e. a condition for the dependency.
* That is dependencies [A,B] => C and so on.
*
+ * TODO The implementation may/should be smart enough not to mine both
+ * [A => B] and [A,C => B], because the second dependency is a
+ * consequence of the first one (if values of A determine values
+ * of B, adding another column won't change that). The ANALYZE
+ * should first analyze 1:1 dependencies, then 2:1 dependencies
+ * (and skip the already identified ones), etc.
+ *
+ * For example the dependency [city name => zip code] is much weaker
+ * than [city name, state name => zip code], because there may be
+ * multiple cities with the same name in various states. It's not
+ * perfect though - there are probably cities with the same name within
+ * the same state, but that is hopefully a relatively rare occurrence.
+ * More about this in the section about dependency mining.
+ *
* Handling multiple columns on the right side is not necessary, as such
* dependencies may be decomposed into a set of dependencies with
* the same meaning, one for each column on the right side. For example
@@ -163,19 +178,61 @@
* FIXME Not sure if this handles NULL values properly (not sure how to
* do that). We assume that NULL means 0 for now, handling it just
* like any other value.
+ *
+ * FIXME This builds a complete set of dependencies, i.e. including
+ * transitive dependencies - if we identify [A => B] and [B => C],
+ * we're likely to identify [A => C] too. It might be better to
+ * keep only the minimal set of dependencies, i.e. prune all the
+ * dependencies that we can recreate by transitivity.
+ *
+ * There are two conceptual ways to do that:
+ *
+ * (a) generate all the rules, and then prune the rules that may
+ * be recreated by combining other dependencies, or
+ *
+ * (b) perform the 'is combination of other dependencies' check
+ * before actually doing the work
+ *
+ * The second option has the advantage that we don't really need
+ * to perform the sort/count. It's not sufficient alone, though,
+ * because we may discover the dependencies in the wrong order.
+ * For example [A => B], [A => C] and then [B => C]. None of those
+ * dependencies is a combination of the already known ones, yet
+ * [A => C] is a combination of [A => B] and [B => C].
+ *
+ * TODO Not sure the current NULL handling makes much sense. It's
+ * handled like a regular value (NULL == NULL), so all NULLs in
+ * a single column form a single group. Maybe that's not the right
+ * thing to do, especially with equality conditions - in that case
+ * NULLs are irrelevant. So maybe the right solution would be to
+ * just ignore NULL values here?
+ *
+ * However simply "ignoring" the NULL values does not seem like
+ * a good idea - imagine columns A and B, where for each value of
+ * A, values in B are constant (same for the whole group) or NULL.
+ * Let's say only 10% of B values in each group is not NULL. Then
+ * ignoring the NULL values will result in 10x misestimate (and
+ * it's trivial to construct arbitrary errors). So maybe handling
+ * NULL values just like a regular value is the right thing here.
+ *
+ * Or maybe NULL values should be treated differently on each side
+ * of the dependency? E.g. as ignored on the left (condition) and
+ * as regular values on the right - this seems consistent with how
+ * equality clauses work, as an equality clause implies NOT NULL.
+ * So if we say [A => B] then it may also imply "NOT NULL" on the
+ * right side.
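+ *
+ * As a small illustration of the mining itself: sorted pairs
+ * (A,B) = (1,x), (1,x), (2,y), (2,z) form two groups by A. The
+ * group A=1 contains a single B value and thus supports [A => B]
+ * (assuming it reaches min_group_size), while the group A=2
+ * contains two different B values and thus contradicts it.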
*/
MVDependencies
build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
int natts, VacAttrStats **vacattrstats)
{
int i;
- bool isNull;
- Size len = 2 * sizeof(Datum); /* only simple associations a => b */
int numattrs = attrs->dim1;
/* result */
int ndeps = 0;
MVDependencies dependencies = NULL;
+ MultiSortSupport mss = multi_sort_init(2); /* 2 dimensions for now */
/* TODO Maybe this should be somehow related to the number of
* distinct values in the two columns we're currently analyzing.
@@ -195,8 +252,24 @@ build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
*/
VacAttrStats **stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
- /* We'll reuse the same array for all the combinations */
- Datum * values = (Datum*)palloc0(numrows * 2 * sizeof(Datum));
+ /*
+ * We'll reuse the same array for all the 2-column combinations.
+ *
+ * It's possible to sort the sample rows directly, but this seemed
+ * somewhat simpler / less error prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * 2);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * 2);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * 2];
+ items[i].isnull = &isnull[i * 2];
+ }
Assert(numattrs >= 2);
@@ -213,9 +286,12 @@ build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
*/
for (dima = 0; dima < numattrs; dima++)
{
+ /* prepare the sort function for the first dimension */
+ multi_sort_add_dimension(mss, 0, dima, vacattrstats);
+
for (dimb = 0; dimb < numattrs; dimb++)
{
- Datum val_a, val_b;
+ SortItem current;
/* number of groups supporting / contradicting the dependency */
int n_supporting = 0;
@@ -232,14 +308,27 @@ build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
if (dima == dimb)
continue;
+ /* prepare the sort function for the second dimension */
+ multi_sort_add_dimension(mss, 1, dimb, vacattrstats);
+
+ /* reset the values and isnull flags */
+ memset(values, 0, sizeof(Datum) * numrows * 2);
+ memset(isnull, 0, sizeof(bool) * numrows * 2);
+
/* accumulate all the data for both columns into an array and sort it */
for (i = 0; i < numrows; i++)
{
- values[i*2] = heap_getattr(rows[i], attrs->values[dima], stats[dima]->tupDesc, &isNull);
- values[i*2+1] = heap_getattr(rows[i], attrs->values[dimb], stats[dimb]->tupDesc, &isNull);
+ items[i].values[0]
+ = heap_getattr(rows[i], attrs->values[dima],
+ stats[dima]->tupDesc, &items[i].isnull[0]);
+
+ items[i].values[1]
+ = heap_getattr(rows[i], attrs->values[dimb],
+ stats[dimb]->tupDesc, &items[i].isnull[1]);
}
- qsort_arg((void *) values, numrows, sizeof(Datum) * 2, compare_scalars_memcmp, &len);
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
/*
* Walk through the array, split it into rows according to
@@ -254,13 +343,13 @@ build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
*/
/* start with values from the first row */
- val_a = values[0];
- val_b = values[1];
+ current = items[0];
group_size = 1;
for (i = 1; i < numrows; i++)
{
- if (values[2*i] != val_a) /* end of the group */
+ /* end of the group */
+ if (multi_sort_compare_dim(0, &items[i], &current, mss) != 0)
{
/*
* If there are no contradicting rows, count it as
@@ -271,36 +360,49 @@ build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
* impossible to identify [unique,unique] cases, but
* that's probably a different case. This is more
* about [zip => city] associations etc.
+ *
+ * If there are violations, count the group/rows as
+ * a violation.
+ *
+ * It may be neither, if the group is too small (does
+ * not contain at least min_group_size rows).
*/
- n_supporting += ((n_violations == 0) && (group_size >= min_group_size)) ? 1 : 0;
- n_contradicting += (n_violations != 0) ? 1 : 0;
-
- n_supporting_rows += ((n_violations == 0) && (group_size >= min_group_size)) ? group_size : 0;
- n_contradicting_rows += (n_violations > 0) ? group_size : 0;
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations > 0)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
/* current values start a new group */
- val_a = values[2*i];
- val_b = values[2*i+1];
n_violations = 0;
- group_size = 1;
+ group_size = 0;
}
- else
+ /* mismatch of a B value is contradicting */
+ else if (multi_sort_compare_dim(1, &items[i], &current, mss) != 0)
{
- if (values[2*i+1] != val_b) /* mismatch of a B value is contradicting */
- {
- val_b = values[2*i+1];
- n_violations += 1;
- }
-
- group_size += 1;
+ n_violations += 1;
}
+
+ current = items[i];
+ group_size += 1;
}
- /* handle the last group */
- n_supporting += ((n_violations == 0) && (group_size >= min_group_size)) ? 1 : 0;
- n_contradicting += (n_violations != 0) ? 1 : 0;
- n_supporting_rows += ((n_violations == 0) && (group_size >= min_group_size)) ? group_size : 0;
- n_contradicting_rows += (n_violations > 0) ? group_size : 0;
+ /* handle the last group (just like above) */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
/*
* See if the number of rows supporting the association is at least
@@ -338,7 +440,11 @@ build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
}
}
+ pfree(items);
pfree(values);
+ pfree(isnull);
+ pfree(stats);
+ pfree(mss);
return dependencies;
}
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 2b59c2d..a074253 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -18,24 +18,34 @@
/*
* Basic info about the stats, used when choosing what to use
- *
- * TODO Add info about what statistics is available (histogram, MCV,
- * hashed MCV, functional dependencies).
*/
typedef struct MVStatsData {
Oid mvoid; /* OID of the stats in pg_mv_statistic */
int2vector *stakeys; /* attnums for columns in the stats */
+
+ /* statistics requested in ALTER TABLE ... ADD STATISTICS */
+ bool deps_enabled; /* analyze functional dependencies */
+
+ /* available statistics (computed by ANALYZE) */
bool deps_built; /* functional dependencies available */
} MVStatsData;
typedef struct MVStatsData *MVStats;
+/*
+ * Degree of how much MCV item / histogram bucket matches a clause.
+ * This is then considered when computing the selectivity.
+ */
+#define MVSTATS_MATCH_NONE 0 /* no match at all */
+#define MVSTATS_MATCH_PARTIAL 1 /* partial match */
+#define MVSTATS_MATCH_FULL 2 /* full match */
#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
-/* An associative rule, tracking [a => b] dependency.
- *
- * TODO Make this work with multiple columns on both sides.
+
+/*
+ * Functional dependencies, tracking column-level relationships (values
+ * in one column determine values in another one).
*/
typedef struct MVDependencyData {
int16 a;
@@ -61,6 +71,7 @@ typedef MVDependenciesData* MVDependencies;
* stats specified using flags (or something like that).
*/
MVStats list_mv_stats(Oid relid, int *nstats, bool built_only);
+bytea * fetch_mv_rules(Oid mvoid);
bytea * fetch_mv_dependencies(Oid mvoid);
diff --git a/src/test/regress/expected/mv_dependencies.out b/src/test/regress/expected/mv_dependencies.out
new file mode 100644
index 0000000..dbfb5cf
--- /dev/null
+++ b/src/test/regress/expected/mv_dependencies.out
@@ -0,0 +1,175 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (unknown_column);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, a);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, a, b);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+ALTER TABLE functional_dependencies ADD STATISTICS (unknown_option) ON (a, b, c);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- correct command
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+ QUERY PLAN
+---------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+DROP TABLE functional_dependencies;
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+DROP TABLE functional_dependencies;
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c, d);
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+----------------------------------------
+ t | t | 2 => 1, 3 => 1, 3 => 2, 4 => 1, 4 => 2
+(1 row)
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+DROP TABLE functional_dependencies;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index e0ae2f2..c41762c 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -109,3 +109,6 @@ test: event_trigger
# run stats by itself because its delay may be insufficient under heavy load
test: stats
+
+# run tests of multivariate stats
+test: mv_dependencies
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 7f762bd..3845b0f 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -152,3 +152,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: mv_dependencies
diff --git a/src/test/regress/sql/mv_dependencies.sql b/src/test/regress/sql/mv_dependencies.sql
new file mode 100644
index 0000000..5d1ad52
--- /dev/null
+++ b/src/test/regress/sql/mv_dependencies.sql
@@ -0,0 +1,153 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (unknown_column);
+
+-- single column
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a);
+
+-- single column, duplicated
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, a);
+
+-- two columns, one duplicated
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, a, b);
+
+-- unknown option
+ALTER TABLE functional_dependencies ADD STATISTICS (unknown_option) ON (a, b, c);
+
+-- correct command
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+DROP TABLE functional_dependencies;
+
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+DROP TABLE functional_dependencies;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c, d);
+
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+DROP TABLE functional_dependencies;
--
2.0.5
Attachment: 0003-multivariate-MCV-lists.patch (text/x-diff)
From d6d169988a8ccef1e41e9620599bdfc83a192433 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 20:15:37 +0100
Subject: [PATCH 3/4] multivariate MCV lists
- extends the pg_mv_statistic catalog (add 'mcv' fields)
- building the MCV lists during ANALYZE
- simple estimation while planning the queries
FIX: don't build MCV list by default
FIX: analyze MCV lists only when requested
FIX: improvements in clauselist_mv_selectivity_mcvlist()
FIX: comment about upper selectivity boundaries for MCV lists
FIX: switch MCV build to multi_sort functions (from dependencies)
This adds support for proper sorting (per data type), arbitrary
data types (as long as they have '<' operator) and NULL values
to building MCV lists. It still needs a fair amount of love, and
does nothing about serializing/deserializing the MCV lists.
FIX: comment about using max_mcv_items (ADD STATISTICS option)
FIX: initial support for all data types and NULL in MCV lists
This changes the serialize_mcvlist/update_mv_stats in a somewhat
strange way (passing VacAttrStats all over the place). This
needs to be improved, somehow, before rebasing into the MCV
part. Otherwise it'll cause needless conflicts.
FIX: fixed MCV build / removed debugging WARNING log message
FIX: refactoring lookup_var_attr_stats() - moving to common.c, static
This only includes changes in the common part + functional dependencies.
FIX: refactoring lookup_var_attr_stats() / MCV lists
FIX: a set of regression tests for MCV lists
This is mostly equal to a combination of all the regression tests
for functional dependencies.
One of the tests (EXPLAIN with TEXT columns) currently fails, and
produces Index Scan instead of Bitmap Index Scan. Will investigate.
FIX: comment about memory corruption in deserializing MCV list
FIX: correct MCV spelling in a few places (MVC -> MCV)
FIX: get rid of the custom comparators in mcv.c
FIX: use USE_ASSERT_CHECKING for assert-only variable (MCV)
FIX: check 'mcv' and 'mcv_max_items' options in ADD STATISTICS
FIX: proper handling of 'mcv_max_items' options (constants etc.)
FIX: check that either dependencies or MCV were requested
FIX: improved comments / docs for MCV lists
FIX: move DimensionInfo to common.h
FIX: move MCV list definitions after functional dependencies
FIX: incorrect memcpy() when building MCV list, causing segfaults
FIX: replace variables by macros in MCV serialize/deserialize
FIX: rework clauselist_mv_split() to call clause_is_mv_compatible()
It mostly duplicated the code, making it difficult to add more clause
types etc.
FIX: fixed estimation of equality clauses using MCV lists
FIX: add support for 'IS [NOT] NULL' support to MCV lists
FIX: add regression test for ADD STATISTICS options (MCV list)
FIX: added regression test to test IS [NOT] NULL with MCV lists
FIX: updated comments in clausesel.c (mcv)
FIX: obsolete Assert in mcv code (indexes -> ITEM_INDEXES)
FIX: make regression tests parallel-happy (MCV lists)
---
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/tablecmds.c | 47 +-
src/backend/optimizer/path/clausesel.c | 788 ++++++++++++++++++++++-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/common.c | 65 +-
src/backend/utils/mvstats/common.h | 12 +-
src/backend/utils/mvstats/dependencies.c | 13 +-
src/backend/utils/mvstats/mcv.c | 1002 ++++++++++++++++++++++++++++++
src/include/catalog/pg_mv_statistic.h | 18 +-
src/include/catalog/pg_proc.h | 2 +
src/include/utils/mvstats.h | 68 +-
src/test/regress/expected/mv_mcv.out | 210 +++++++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_mcv.sql | 181 ++++++
16 files changed, 2370 insertions(+), 49 deletions(-)
create mode 100644 src/backend/utils/mvstats/mcv.c
create mode 100644 src/test/regress/expected/mv_mcv.out
create mode 100644 src/test/regress/sql/mv_mcv.sql
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index da957fc..8acf160 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -158,7 +158,9 @@ CREATE VIEW pg_mv_stats AS
C.relname AS tablename,
S.stakeys AS attnums,
length(S.stadeps) as depsbytes,
- pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
+ length(S.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3c82b89..1f08c1c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -11651,7 +11651,13 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
Relation mvstatrel;
/* by default build everything */
- bool build_dependencies = false;
+ bool build_dependencies = false,
+ build_mcv = false;
+
+ int32 max_mcv_items = -1;
+
+ /* options required because of other options */
+ bool require_mcv = false;
Assert(IsA(def, StatisticsDef));
@@ -11706,6 +11712,29 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "mcv") == 0)
+ build_mcv = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_mcv_items") == 0)
+ {
+ max_mcv_items = defGetInt32(opt);
+
+ /* this option requires 'mcv' to be enabled */
+ require_mcv = true;
+
+ /* sanity check */
+ if (max_mcv_items < MVSTAT_MCVLIST_MIN_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items must be at least %d",
+ MVSTAT_MCVLIST_MIN_ITEMS)));
+
+ else if (max_mcv_items > MVSTAT_MCVLIST_MAX_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items is %d",
+ MVSTAT_MCVLIST_MAX_ITEMS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -11714,10 +11743,16 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
}
/* check that at least some statistics were requested */
- if (! build_dependencies)
+ if (! (build_dependencies || build_mcv))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies) was requested")));
+ errmsg("no statistics type (dependencies, mcv) was requested")));
+
+ /* now do some checking of the options */
+ if (require_mcv && (! build_mcv))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option 'mcv' is required by other options(s)")));
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
@@ -11733,9 +11768,13 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+ values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+ values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
- nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+ nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+ nulls[Anum_pg_mv_statistic_stamcv -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
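Going by the option parsing above, the new syntax would presumably be
exercised like this (the table and the item limit are made up for
illustration; the option names are taken from the hunk above):

ALTER TABLE test ADD STATISTICS (mcv) ON (a, b, c);
ALTER TABLE test ADD STATISTICS (mcv, max_mcv_items = 1000) ON (a, b, c);
-- fails, because 'max_mcv_items' requires 'mcv' to be enabled
ALTER TABLE test ADD STATISTICS (dependencies, max_mcv_items = 1000) ON (a, b, c);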
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 36e5bce..1446fa0 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -59,6 +59,18 @@ static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Oid varRelid, int nmvstats, MVStats mvstats,
SpecialJoinInfo *sjinfo);
+static int choose_mv_statistics(int nmvstats, MVStats mvstats,
+ Bitmapset *attnums);
+static List *clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
+ List *clauses, Oid varRelid,
+ List **mvclauses, MVStats mvstats);
+
+static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
+ List *clauses, MVStats mvstats);
+static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
+ List *clauses, MVStats mvstats,
+ bool *fullmatch, Selectivity *lowsel);
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -195,8 +207,8 @@ clauselist_selectivity(PlannerInfo *root,
Bitmapset *mvattnums = NULL;
/*
- * If there's exactly one clause, then no use in trying to match up pairs,
- * so just go directly to clause_selectivity().
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, so just go directly to clause_selectivity().
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
@@ -222,6 +234,46 @@ clauselist_selectivity(PlannerInfo *root,
/* reduce clauses by applying functional dependencies rules */
clauses = clauselist_apply_dependencies(root, clauses, varRelid,
nmvstats, mvstats, sjinfo);
+
+ /*
+ * recollect attributes from mv-compatible clauses (maybe we've
+ * removed so many clauses we have a single mv-compatible attnum)
+ */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo);
+ }
+
+ /*
+ * If there still are at least two columns, we'll try to select
+ * suitable multivariate statistics.
+ */
+ if (bms_num_members(mvattnums) >= 2)
+ {
+ /* fetch info from the catalog (not the serialized stats yet) */
+ mvstats = list_mv_stats(relid, &nmvstats, true);
+
+ /* see choose_mv_statistics() for details */
+ if (nmvstats > 0)
+ {
+ int idx = choose_mv_statistics(nmvstats, mvstats, mvattnums);
+
+ if (idx >= 0) /* we have matching stats */
+ {
+ MVStats mvstat = &mvstats[idx];
+
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ /* split the clauselist into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, sjinfo, clauses,
+ varRelid, &mvclauses, mvstat);
+
+ /* we've chosen the statistics to match the clauses */
+ Assert(mvclauses != NIL);
+
+ /* compute the multivariate stats */
+ s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ }
+ }
}
/*
@@ -899,6 +951,192 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Estimate selectivity for the list of MV-compatible clauses, using that
+ * particular histogram.
+ *
+ * When we hit a single bucket, we don't know what portion of it actually
+ * matches the clauses (e.g. equality), and we use 1/2 the bucket by
+ * default. However, the MV histograms are usually less detailed than
+ * the per-column ones, so the summed estimate is often quite high
+ * (thanks to combining a lot of "partially hit" buckets).
+ *
+ * There are several ways to improve this, each with cases where it
+ * won't really help. Also, the more complex the process, the worse
+ * the failures (i.e. misestimates) tend to be.
+ *
+ * (1) Use the MV histogram only as a way to combine multiple
+ * per-column histograms, essentially rewriting
+ *
+ * P(A & B) = P(A) * P(B|A)
+ *
+ * where P(B|A) may be computed using a proper "slice" of the
+ * histogram, by first selecting only buckets where A is true, and
+ * then using the boundaries to 'restrict' the per-column histogram.
+ *
+ * With more clauses, it gets more complicated, of course
+ *
+ * P(A & B & C) = P(A & C) * P(B|A & C)
+ * = P(A) * P(C|A) * P(B|A & C)
+ *
+ * and so on.
+ *
+ * Of course, the question is how well and efficiently we can
+ * compute the conditional probabilities - whether this approach
+ * can improve the estimates (instead of amplifying the errors).
+ *
+ * Also, this does not eliminate the need for histogram on [A,B,C].
+ *
+ * (2) Use multiple smaller (and more accurate) histograms, and combine
+ * them using a process similar to the above. E.g. by assuming that
+ * B and C are independent, we can rewrite
+ *
+ * P(B|A & C) = P(B|A)
+ *
+ * so we can rewrite the whole formula to
+ *
+ * P(A & B & C) = P(A) * P(C|A) * P(B|A)
+ *
+ * and we're OK with two 2D histograms [A,C] and [A,B].
+ *
+ * It'd be nice to perform some sort of statistical test (Fisher's
+ * exact test or a chi-squared test) to identify independent components
+ * and automatically separate them into smaller histograms.
+ *
+ * (3) Using the estimated number of distinct values in a bucket to
+ * decide the selectivity of equality in the bucket (instead of
+ * blindly using 1/2 of the bucket, we may use 1/ndistinct).
+ * Of course, if the ndistinct estimate is way off, or if the
+ * distribution is not uniform (one distinct value gets many more
+ * rows), this will fail. Also, we currently don't have ndistinct
+ * estimate available at this moment (but it shouldn't be that
+ * difficult to compute as ndistinct and ntuples should be available).
+ *
+ * TODO Clamp the selectivity by min of the per-clause selectivities
+ * (i.e. the selectivity of the most restrictive clause), because
+ * that's the maximum we can ever get from ANDed list of clauses.
+ * This would probably prevent issues with hitting too many buckets
+ * and low precision histograms.
+ *
+ * TODO We may support some additional conditions, most importantly
+ * those matching multiple columns (e.g. "a = b" or "a < b").
+ * Ultimately we could track multi-table histograms for join
+ * cardinality estimation.
+ *
+ * TODO Currently this is only estimating all clauses, or clauses
+ * matching varRelid (when it's not 0). I'm not sure what the
+ * purpose of varRelid is, but my assumption is that it's used for
+ * join conditions and such. In that case we can use those clauses
+ * to restrict the other (i.e. filter the histogram buckets first,
+ * before estimating the other clauses). This is essentially equivalent
+ * to computing P(A|B) where "B" are the clauses not matching the
+ * varRelid.
+ *
+ * TODO Further thoughts on processing equality clauses - maybe it'd be
+ * better to look for stats (with MCV) covered by the equality
+ * clauses, because then we have a chance to find an exact match
+ * in the MCV list, which is pretty much the best we can do. We may
+ * also look at the least frequent MCV item, and use it as an upper
+ * boundary for the selectivity (had there been a more frequent
+ * item, it'd be in the MCV list).
+ *
+ * These conditions may then be used as a condition for the other
+ * selectivities, i.e. we may estimate P(A,B) first, and then
+ * compute P(C|A,B) from another histogram. This may be useful when
+ * we can estimate P(A,B) accurately (e.g. because it's a complete
+ * equality match evaluated on MCV list), and then compute the
+ * conditional probability P(C|A,B), giving us the requested stats
+ *
+ * P(A,B,C) = P(A,B) * P(C|A,B)
+ *
+ * TODO There are several options for 'sanity clamping' the estimates.
+ *
+ * First, if we have selectivities for each condition, then
+ *
+ * P(A,B) <= MIN(P(A), P(B))
+ *
+ * Because additional conditions (connected by AND) can only lower
+ * the probability.
+ *
+ * So we can do some basic sanity checks using the single-variate
+ * stats (the ones we have right now).
+ *
+ * Second, when we have multivariate stats with a MCV list, then
+ *
+ * (a) if we have a full equality condition (one equality condition
+ * on each column) and we found a match in the MCV list, this is
+ * the selectivity (and it's supposed to be exact)
+ *
+ * (b) if we have a full equality condition and we haven't found a
+ * match in the MCV list, then the selectivity is below the
+ * lowest selectivity in the MCV list
+ *
+ * (c) if we have an equality condition (not full), we can still
+ * search the MCV for matches and use the sum of probabilities
+ * as a lower boundary for the histogram (if there are no
+ * matches in the MCV list, then we have no boundary)
+ *
+ * Third, if there are multiple multivariate stats for a set of
+ * clauses, we may compute all of them and then somehow aggregate
+ * them - e.g. by choosing the minimum, median or average. The
+ * multi-variate stats are susceptible to overestimation (because
+ * we take 50% of the bucket for partial matches). Some stats may
+ * give better estimates than others, but it's very difficult to
+ * determine in advance which one is best (it depends
+ * on the number of buckets, number of additional columns not
+ * referenced in the clauses etc.) so we may compute all and then
+ * choose a sane aggregation (minimum seems like a good approach).
+ * Of course, this may result in longer / more expensive estimation
+ * (CPU-wise), but it may be worth it.
+ *
+ * There are ways to address this, though. First, it's possible to
+ * add a GUC choosing between a 'simple' estimate (using the single
+ * statistics expected to give the best estimate) and a 'complex' one
+ * (combining the multiple estimates).
+ *
+ * multivariate_estimates = (simple|full)
+ *
+ * Also, this might be enabled at a table level, by something like
+ *
+ * ALTER TABLE ... SET STATISTICS (simple|full)
+ *
+ * Which would make it possible to use this only for the tables
+ * where the simple approach does not work.
+ *
+ * Also, there are ways to optimize this algorithmically. E.g. we
+ * may try to get an estimate from a matching MCV list first, and
+ * if we happen to get a "full equality match" we may stop computing
+ * the estimates from other stats (for this condition) because
+ * that's probably the best estimate we can really get.
+ *
+ * TODO When applying the clauses to the histogram/MCV list, we can do
+ * that from the most selective clauses first, because that'll
+ * eliminate the buckets/items sooner (so we'll be able to skip
+ * them without the more expensive inspection).
+ */
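+/*
+ * A made-up example of approach (1) above, for illustration only:
+ * if the per-column stats give P(A) = 0.1 and the histogram slice
+ * restricted to rows satisfying A gives P(B|A) = 0.5, we'd estimate
+ * P(A & B) = 0.1 * 0.5 = 0.05, instead of the independence-based
+ * estimate P(A) * P(B).
+ */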
+static Selectivity
+clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStats mvstats)
+{
+ bool fullmatch = false;
+
+ /*
+ * Lowest frequency in the MCV list (may be used as an upper bound
+ * for full equality conditions that did not match any MCV item).
+ */
+ Selectivity mcv_low = 0.0;
+
+ /* TODO Evaluate simple 1D selectivities, use the smallest one as
+ * an upper bound, product as lower bound, and sort the
+ * clauses in ascending order by selectivity (to optimize the
+ * MCV/histogram evaluation).
+ */
+
+ /* Evaluate the MCV selectivity */
+ return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ &fullmatch, &mcv_low);
+}
+
/*
* Collect attributes from mv-compatible clauses.
*
@@ -945,6 +1183,175 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
}
/*
+ * We're looking for statistics matching at least 2 attributes,
+ * referenced in the clauses compatible with multivariate statistics.
+ * The current selection criterion is very simple - we choose the
+ * statistics referencing the most attributes.
+ *
+ * If there are multiple statistics referencing the same number of
+ * columns (from the clauses), the one with fewer source columns
+ * (as listed in the ADD STATISTICS when creating the statistics) wins.
+ * Otherwise the first one wins.
+ *
+ * This is a very simple criterion, and it has several weaknesses:
+ *
+ * (a) does not consider the accuracy of the statistics
+ *
+ * If there are two histograms built on the same set of columns,
+ * but one has 100 buckets and the other one has 1000 buckets (thus
+ * likely providing better estimates), this is not currently
+ * considered.
+ *
+ * (b) does not consider the type of statistics
+ *
+ * If there are three statistics - one containing just a MCV list,
+ * another one with just a histogram and a third one with both,
+ * this is not considered.
+ *
+ * (c) does not consider the number of clauses
+ *
+ * As explained, only the number of referenced attributes counts,
+ * so if there are multiple clauses on a single attribute, this
+ * still counts as a single attribute.
+ *
+ * (d) does not consider type of condition
+ *
+ * Some clauses may work better with some statistics - for example
+ * equality clauses probably work better with MCV lists than with
+ * histograms. But IS [NOT] NULL conditions may often work better
+ * with histograms (thanks to NULL-buckets).
+ *
+ * So for example with five WHERE conditions
+ *
+ * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
+ *
+ * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be
+ * selected as it references the most columns.
+ *
+ * Once we have selected the multivariate statistics, we split the list
+ * of clauses into two parts - conditions that are compatible with the
+ * selected stats, and conditions estimated using simple statistics.
+ *
+ * From the example above, conditions
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
+ *
+ * will be estimated using the multivariate statistics (a,b,c,d) while
+ * the last condition (e = 1) will get estimated using the regular ones.
+ *
+ * There are various alternative selection criteria (e.g. counting
+ * conditions instead of just referenced attributes), but eventually
+ * the best option should be to combine multiple statistics. But that's
+ * much harder to do correctly.
+ *
+ * TODO Select multiple statistics and combine them when computing
+ * the estimate.
+ *
+ * TODO This will probably have to consider compatibility of clauses,
+ * because 'dependencies' will probably work only with equality
+ * clauses.
+ */
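+/*
+ * To illustrate the tie-break (made-up statistics): with clauses on
+ * (a, b) and statistics defined on (a, b) and on (a, b, c), both
+ * match two attributes, so the two-column statistics wins because
+ * it has fewer source columns (current_dims is minimized).
+ */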
+static int
+choose_mv_statistics(int nmvstats, MVStats mvstats, Bitmapset *attnums)
+{
+ int i, j;
+
+ int choice = -1;
+ int current_matches = 1; /* goal #1: maximize */
+ int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+
+ /*
+ * Walk through the statistics (simple array with nmvstats elements)
+ * and for each one count the referenced attributes (encoded in
+ * the 'attnums' bitmap).
+ */
+ for (i = 0; i < nmvstats; i++)
+ {
+ /* columns matching this statistics */
+ int matches = 0;
+
+ int2vector * attrs = mvstats[i].stakeys;
+ int numattrs = mvstats[i].stakeys->dim1;
+
+ /* count columns covered by this statistics */
+ for (j = 0; j < numattrs; j++)
+ if (bms_is_member(attrs->values[j], attnums))
+ matches++;
+
+ /*
+ * Use this statistics when it improves the number of matches or
+ * when it matches the same number of attributes but is smaller.
+ */
+ if ((matches > current_matches) ||
+ ((matches == current_matches) && (current_dims > numattrs)))
+ {
+ choice = i;
+ current_matches = matches;
+ current_dims = numattrs;
+ }
+ }
+
+ return choice;
+}
+
+/*
+ * This splits the clause list into two parts - one containing clauses
+ * that will be evaluated using the chosen statistics, and the remaining
+ * clauses (either not mv-compatible, or not covered by the statistics).
+ */
+static List *
+clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
+ List *clauses, Oid varRelid, List **mvclauses,
+ MVStats mvstats)
+{
+ int i;
+ ListCell *l;
+ List *non_mvclauses = NIL;
+
+ /* FIXME is there a better way to get info on int2vector? */
+ int2vector * attrs = mvstats->stakeys;
+ int numattrs = mvstats->stakeys->dim1;
+
+ /* erase the list of mv-compatible clauses */
+ *mvclauses = NIL;
+
+ foreach (l, clauses)
+ {
+ bool match = false; /* by default not mv-compatible */
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(l);
+
+ if (clause_is_mv_compatible(root, clause, varRelid, NULL, &attnum, sjinfo))
+ {
+ /* Is the attribute part of the selected stats? */
+ for (i = 0; i < numattrs; i++)
+ if (attrs->values[i] == attnum)
+ match = true;
+ }
+
+ if (match)
+ {
+ /*
+ * The clause matches the selected stats, so extract the
+ * clause from the RestrictInfo and put it on the
+ * multivariate list. We'll use it directly.
+ */
+ RestrictInfo * rinfo = (RestrictInfo *) clause;
+ *mvclauses = lappend(*mvclauses, (Node*)rinfo->clause);
+ }
+ else
+ non_mvclauses = lappend(non_mvclauses, clause);
+ }
+
+ /*
+ * Perform regular estimation using the clauses incompatible
+ * with the chosen histogram (or MV stats in general).
+ */
+ return non_mvclauses;
+
+}
+
+/*
* Determines whether the clause is compatible with multivariate stats,
* and if it is, returns some additional information - varno (index
* into simple_rte_array) and a bitmap of attributes. This is then
@@ -981,21 +1388,23 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
/* get the actual clause from the RestrictInfo (it's not an OR clause) */
clause = (Node*)rinfo->clause;
- /* only simple opclauses are compatible with multivariate stats */
- if (! is_opclause(clause))
- return false;
-
/* we don't support join conditions at this moment */
if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo))
return false;
- /* is it 'variable op constant' ? */
- if (list_length(((OpExpr *) clause)->args) == 2)
+ /*
+ * Only simple opclauses and IS NULL tests are compatible with
+ * multivariate stats at this point.
+ */
+ if ((is_opclause(clause))
+ && (list_length(((OpExpr *) clause)->args) == 2))
{
OpExpr *expr = (OpExpr *) clause;
bool varonleft = true;
bool ok;
+ /* is it 'variable op constant' ? */
+
ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
(is_pseudo_constant_clause_relids(lsecond(expr->args),
rinfo->right_relids) ||
@@ -1032,8 +1441,11 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
return false;
/* Lookup info about the base relation (we need to pass the OID out) */
- rte = planner_rt_fetch(var->varno, root);
- *relid = rte->relid;
+ if (relid != NULL)
+ {
+ rte = planner_rt_fetch(var->varno, root);
+ *relid = rte->relid;
+ }
/*
* If it's not a "<" or ">" or "=" operator, just ignore the
@@ -1051,6 +1463,45 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
}
}
}
+ else if (IsA(clause, NullTest)
+ && IsA(((NullTest*)clause)->arg, Var))
+ {
+ RangeTblEntry * rte;
+ Var * var = (Var*)((NullTest*)clause)->arg;
+
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+ * (return NULL).
+ *
+ * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
+
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not really be necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
+
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
+
+ /* Lookup info about the base relation (we need to pass the OID out) */
+ if (relid != NULL)
+ {
+ rte = planner_rt_fetch(var->varno, root);
+ *relid = rte->relid;
+ }
+ *attnum = var->varattno;
+
+ return true;
+ }
}
return false;
@@ -1256,3 +1707,320 @@ clauselist_apply_dependencies(PlannerInfo *root, List *clauses, Oid varRelid,
return reduced_clauses;
}
+
+/*
+ * Estimate selectivity of clauses using a MCV list.
+ *
+ * If there's no MCV list for the stats, the function returns 0.0.
+ *
+ * While computing the estimate, the function checks whether all the
+ * columns were matched with an equality condition. If that's the case,
+ * we can skip processing the histogram, as there can be no rows in
+ * it with the same values - all the rows matching the condition are
+ * represented by the MCV item. This can only happen with equality
+ * on all the attributes.
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all items as 'match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the items
+ * 4) skip items that are already 'no match'
+ * 5) check clause for items that still match
+ * 6) sum frequencies for items to get selectivity
+ *
+ * The function also returns the frequency of the least frequent item
+ * on the MCV list, which may be useful for clamping the estimate from the
+ * histogram (all items not present in the MCV list are less frequent).
+ * This however seems useful only for cases with conditions on all
+ * attributes.
+ *
+ * TODO This only handles AND-ed clauses, but it might work for OR-ed
+ * lists too - it just needs to reverse the logic a bit. I.e. start
+ * with 'no match' for all items, and mark the items as a match
+ * as the clauses are processed (and skip items that are 'match').
+ */
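+/*
+ * A made-up illustration of the algorithm: given MCV items {1,2}
+ * (frequency 0.2), {1,7} (0.1) and {3,2} (0.3), and the clauses
+ * (a = 1) AND (b < 5), the first clause marks {3,2} as 'no match',
+ * the second eliminates {1,7}, and summing the remaining items
+ * yields a selectivity of 0.2.
+ */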
+static Selectivity
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
+ MVStats mvstats, bool *fullmatch,
+ Selectivity *lowsel)
+{
+ int i;
+ Selectivity s = 0.0;
+ ListCell * l;
+ MCVList mcvlist = NULL;
+ int nmatches = 0;
+
+ char * matches = NULL; /* match/mismatch for each MCV item */
+ Bitmapset *eqmatches = NULL; /* attributes with equality matches */
+
+ /* there's no MCV list built yet */
+ if (! mvstats->mcv_built)
+ return 0.0;
+
+ mcvlist = deserialize_mv_mcvlist(fetch_mv_mcvlist(mvstats->mvoid));
+
+ Assert(mcvlist != NULL);
+ Assert(clauses != NIL);
+ Assert(mcvlist->nitems > 0);
+ Assert(list_length(clauses) >= 2);
+
+ /* by default all the MCV items match the clauses fully */
+ matches = palloc0(sizeof(char) * mcvlist->nitems);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
+
+ /* frequency of the lowest MCV item */
+ *lowsel = 1.0;
+
+ /* number of matching MCV items */
+ nmatches = mcvlist->nitems;
+
+ /*
+ * Loop through the list of clauses, and for each of them evaluate
+ * all the MCV items not yet eliminated by the preceding clauses.
+ *
+ * FIXME This would probably deserve a refactoring, I guess. Unify
+ * the two loops and put the checks inside, or something like
+ * that.
+ */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if there are no remaining matches possible, we can stop */
+ if (nmatches == 0)
+ break;
+
+ /* it's either OpClause, or NullTest */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ /* operator */
+ FmgrInfo opproc;
+
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+
+ FmgrInfo ltproc, gtproc;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ /*
+ * TODO Fetch only when really needed (probably for equality only)
+ * TODO Technically either lt/gt is sufficient.
+ *
+ * FIXME The code in analyze.c creates histograms only for types
+ * with enough ordering (by calling get_sort_group_operators).
+ * Is this the same assumption, i.e. are we certain that we
+ * get the ltproc/gtproc every time we ask? Or are there types
+ * where get_sort_group_operators returns ltopr and here we
+ * get nothing?
+ */
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype,
+ TYPECACHE_EQ_OPR | TYPECACHE_LT_OPR | TYPECACHE_GT_OPR);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, mvstats->stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), <proc);
+ fmgr_info(get_opcode(typecache->gt_opr), >proc);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ bool tmp;
+ MCVItem item = mcvlist->items[i];
+
+ /*
+ * find the lowest selectivity in the MCV
+ * FIXME Maybe not the best place to do this (it runs for all clauses).
+ */
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+
+ /* if there are no more matches, we can stop processing this clause */
+ if (nmatches == 0)
+ break;
+
+ /* skip MCV items that were already ruled out */
+ if (matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /* TODO consider bsearch here (list is sorted by values)
+ * TODO handle other operators too (LT, GT)
+ * TODO identify "full match" when the clauses fully
+ * match the whole MCV list (so that checking the
+ * histogram is not needed)
+ */
+ if (get_oprrest(expr->opno) == F_EQSEL)
+ {
+ /*
+ * We don't care about isgt in equality, because it does not
+ * matter whether it's (var = const) or (const = var).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+ if (! tmp)
+ matches[i] = MVSTATS_MATCH_NONE;
+ else
+ eqmatches = bms_add_member(eqmatches, idx);
+ }
+ else if (get_oprrest(expr->opno) == F_SCALARLTSEL) /* column < constant */
+ {
+
+ if (! isgt) /* (var < const) */
+ {
+ /*
+ * Check whether the constant is below the item's value - in that
+ * case the item cannot satisfy (var < const), so it's no match.
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ if (tmp)
+ {
+ matches[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+
+ } /* (get_oprrest(expr->opno) == F_SCALARLTSEL) */
+ else /* (const < var) */
+ {
+ /*
+ * Check whether the item's value is below the constant - in that
+ * case the item cannot satisfy (const < var), so it's no match.
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ item->values[idx],
+ cst->constvalue));
+ if (tmp)
+ {
+ matches[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+ }
+ }
+ else if (get_oprrest(expr->opno) == F_SCALARGTSEL) /* column > constant */
+ {
+
+ if (! isgt) /* (var > const) */
+ {
+ /*
+ * Check whether the constant is above the item's value - in that
+ * case the item cannot satisfy (var > const), so it's no match.
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+ if (tmp)
+ {
+ matches[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+
+ }
+ else /* (const > var) */
+ {
+ /*
+ * Check whether the item's value is above the constant - in
+ * that case the item cannot satisfy (const > var), so no match.
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ item->values[idx],
+ cst->constvalue));
+ if (tmp)
+ {
+ matches[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+ }
+
+ } /* (get_oprrest(expr->opno) == F_SCALARGTSEL) */
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, mvstats->stakeys);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = mcvlist->items[i];
+
+ /*
+ * find the lowest selectivity in the MCV
+ * FIXME Maybe not the best place to do this (it runs for all clauses).
+ */
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+
+ /* if there are no more matches, we can stop processing this clause */
+ if (nmatches == 0)
+ break;
+
+ /* skip MCV items that were already ruled out */
+ if (matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /* if the clause mismatches the MCV item, set it as MATCH_NONE */
+ if ((expr->nulltesttype == IS_NULL)
+ && (! mcvlist->items[i]->isnull[idx]))
+ matches[i] = MVSTATS_MATCH_NONE;
+ else if ((expr->nulltesttype == IS_NOT_NULL) &&
+ (mcvlist->items[i]->isnull[idx]))
+ matches[i] = MVSTATS_MATCH_NONE;
+ }
+ }
+ }
+
+ /* sum frequencies for all the matching MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ if (matches[i] != MVSTATS_MATCH_NONE)
+ s += mcvlist->items[i]->frequency;
+ }
+
+ /*
+ * If all the columns were matched by equality, it's a full match.
+ * In this case at most one MCV item can match (two distinct
+ * items cannot both equal the same set of constants).
+ */
+ *fullmatch = (bms_num_members(eqmatches) == mcvlist->ndimensions);
+
+ pfree(matches);
+ pfree(mcvlist);
+
+ return s;
+}
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 099f1ed..3c0aff4 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o
+OBJS = common.o mcv.o dependencies.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index 0edaaa6..69ab805 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -16,6 +16,10 @@
#include "common.h"
+static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
+ int natts,
+ VacAttrStats **vacattrstats);
+
/*
* Compute requested multivariate stats, using the rows sampled for the
* plain (single-column) stats.
@@ -40,10 +44,15 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
for (i = 0; i < nmvstats; i++)
{
MVDependencies deps = NULL;
+ MCVList mcvlist = NULL;
+ int numrows_filtered = 0;
/* int2 vector of attnums the stats should be computed on */
int2vector * attrs = mvstats[i].stakeys;
+ /* filter only the interesting vacattrstats records */
+ VacAttrStats **stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
/* check allowed number of dimensions */
Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));
@@ -51,10 +60,14 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
* Analyze functional dependencies of columns.
*/
if (mvstats->deps_enabled)
- deps = build_mv_dependencies(numrows, rows, attrs, natts, vacattrstats);
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
+
+ /* build the MCV list */
+ if (mvstats->mcv_enabled)
+ mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
/* store the histogram / MCV list in the catalog */
- update_mv_stats(mvstats[i].mvoid, deps);
+ update_mv_stats(mvstats[i].mvoid, deps, mcvlist, attrs, stats);
}
}
@@ -63,7 +76,7 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
* matching the attrs vector (to make it easy to work with when
* computing multivariate stats).
*/
-VacAttrStats **
+static VacAttrStats **
lookup_var_attr_stats(int2vector *attrs, int natts, VacAttrStats **vacattrstats)
{
int i, j;
@@ -136,7 +149,7 @@ list_mv_stats(Oid relid, int *nstats, bool built_only)
* Skip statistics that were not computed yet (if only stats
* that were already built were requested)
*/
- if (built_only && (! stats->deps_built))
+ if (built_only && (! (stats->mcv_built || stats->deps_built)))
continue;
/* double the array size if needed */
@@ -149,7 +162,9 @@ list_mv_stats(Oid relid, int *nstats, bool built_only)
result[*nstats].mvoid = HeapTupleGetOid(htup);
result[*nstats].stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
result[*nstats].deps_enabled = stats->deps_enabled;
+ result[*nstats].mcv_enabled = stats->mcv_enabled;
result[*nstats].deps_built = stats->deps_built;
+ result[*nstats].mcv_built = stats->mcv_built;
*nstats += 1;
}
@@ -164,7 +179,9 @@ list_mv_stats(Oid relid, int *nstats, bool built_only)
}
void
-update_mv_stats(Oid mvoid, MVDependencies dependencies)
+update_mv_stats(Oid mvoid,
+ MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -189,15 +206,26 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies)
= PointerGetDatum(serialize_mv_dependencies(dependencies));
}
+ if (mcvlist != NULL)
+ {
+ bytea * data = serialize_mv_mcvlist(mcvlist, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stamcv -1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+ replaces[Anum_pg_mv_statistic_stamcv -1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
/* Is there already a pg_mv_statistic tuple for this attribute? */
oldtup = SearchSysCache1(MVSTATOID,
@@ -225,6 +253,21 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies)
heap_close(sd, RowExclusiveLock);
}
+
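+/*
+ * Map an attribute number to its dimension index, relying on the
+ * stakeys vector being sorted (it is built from qsort-ed attnums).
+ * For illustration: with stakeys {2, 5, 7}, varattno = 5 maps to
+ * dimension 1.
+ */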
+int
+mv_get_index(AttrNumber varattno, int2vector * stakeys)
+{
+ int i, idx = 0;
+ for (i = 0; i < stakeys->dim1; i++)
+ {
+ if (stakeys->values[i] < varattno)
+ idx += 1;
+ else
+ break;
+ }
+ return idx;
+}
+
/* multi-variate stats comparator */
/*
@@ -235,11 +278,15 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies)
int
compare_scalars_simple(const void *a, const void *b, void *arg)
{
- Datum da = *(Datum*)a;
- Datum db = *(Datum*)b;
- SortSupport ssup= (SortSupport) arg;
+ return compare_datums_simple(*(Datum*)a,
+ *(Datum*)b,
+ (SortSupport)arg);
+}
- return ApplySortComparator(da, false, db, false, ssup);
+int
+compare_datums_simple(Datum a, Datum b, SortSupport ssup)
+{
+ return ApplySortComparator(a, false, b, false, ssup);
}
/*
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
index b98ceb7..fca2782 100644
--- a/src/backend/utils/mvstats/common.h
+++ b/src/backend/utils/mvstats/common.h
@@ -59,6 +59,14 @@ typedef struct
int *tupnoLink;
} CompareScalarsContext;
+/* (de)serialization info */
+typedef struct DimensionInfo {
+ int nvalues; /* number of deduplicated values */
+ int nbytes; /* number of bytes (serialized) */
+ int typlen; /* pg_type.typlen */
+ bool typbyval; /* pg_type.typbyval */
+} DimensionInfo;
+
/* multi-sort */
typedef struct MultiSortSupportData {
int ndims; /* number of dimensions supported by the sort */
@@ -82,10 +90,8 @@ int multi_sort_compare(const void *a, const void *b, void *arg);
int multi_sort_compare_dim(int dim, const SortItem *a,
const SortItem *b, MultiSortSupport mss);
-VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
- int natts, VacAttrStats **vacattrstats);
-
/* comparators, used when constructing multivariate stats */
+int compare_datums_simple(Datum a, Datum b, SortSupport ssup);
int compare_scalars_simple(const void *a, const void *b, void *arg);
int compare_scalars_partition(const void *a, const void *b, void *arg);
int compare_scalars_memcmp(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
index 93a2fa6..0543690 100644
--- a/src/backend/utils/mvstats/dependencies.c
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -224,7 +224,7 @@
*/
MVDependencies
build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
- int natts, VacAttrStats **vacattrstats)
+ VacAttrStats **stats)
{
int i;
int numattrs = attrs->dim1;
@@ -245,13 +245,6 @@ build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
/* dimension indexes we'll check for associations [a => b] */
int dima, dimb;
- /* info for the interesting attributes only
- *
- * TODO Compute this only once and pass it to all the methods
- * that need it.
- */
- VacAttrStats **stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
-
/*
* We'll reuse the same array for all the 2-column combinations.
*
@@ -287,7 +280,7 @@ build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
for (dima = 0; dima < numattrs; dima++)
{
/* prepare the sort function for the first dimension */
- multi_sort_add_dimension(mss, 0, dima, vacattrstats);
+ multi_sort_add_dimension(mss, 0, dima, stats);
for (dimb = 0; dimb < numattrs; dimb++)
{
@@ -309,7 +302,7 @@ build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
continue;
/* prepare the sort function for the second dimension */
- multi_sort_add_dimension(mss, 1, dimb, vacattrstats);
+ multi_sort_add_dimension(mss, 1, dimb, stats);
/* reset the values and isnull flags */
memset(values, 0, sizeof(Datum) * numrows * 2);
diff --git a/src/backend/utils/mvstats/mcv.c b/src/backend/utils/mvstats/mcv.c
new file mode 100644
index 0000000..2b3d171
--- /dev/null
+++ b/src/backend/utils/mvstats/mcv.c
@@ -0,0 +1,1002 @@
+/*-------------------------------------------------------------------------
+ *
+ * mcv.c
+ * POSTGRES multivariate MCV lists
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mcv.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+/*
+ * Multivariate MCVs (most-common values lists) are a straightforward
+ * extension of regular MCV lists, tracking combinations of values for
+ * several attributes (columns), including NULL flags and the frequency
+ * of each combination.
+ *
+ * For columns with a small number of distinct values, this works quite
+ * well and may represent the distribution pretty exactly. For columns
+ * with a large number of distinct values (e.g. stored as FLOAT), this
+ * does not work that well.
+ *
+ * If we can represent the distribution as a MCV list, we can estimate
+ * some clauses (e.g. equality clauses) much more accurately than
+ * using histograms, for example.
+ *
+ * Discrete distributions are also easier to combine into a larger
+ * distribution (but this is not yet implemented).
+ *
+ *
+ * TODO For types that don't reasonably support ordering (either because
+ * the type does not support that or when the user adds some option
+ * to the ADD STATISTICS command - e.g. UNSORTED_STATS), building
+ * the histogram may be pointless and inefficient. This is esp.
+ * true for varlena types that may be quite large and a large MCV
+ * list may be a better choice, because it makes equality estimates
+ * more accurate. Due to the unsorted nature, range queries on those
+ * attributes are rather useless anyway.
+ *
+ * Another thing is that by restricting to MCV list and equality
+ * conditions, we can use hash values instead of long varlena values.
+ * The equality estimation will be very accurate.
+ *
+ * This however complicates matching the columns to available
+ * statistics, as it will require matching clauses (not columns) to
+ * stats. And it may get quite complex - e.g. what if there are
+ * multiple clauses, each compatible with different stats subset?
+ *
+ *
+ * Selectivity estimation
+ * ----------------------
+ * The estimation, implemented in clauselist_mv_selectivity_mcvlist(),
+ * is quite simple in principle - walk through the MCV items and sum
+ * frequencies of all the items that match all the clauses.
+ *
+ * The current implementation uses MCV lists to estimate these types
+ * of clauses (think of WHERE conditions):
+ *
+ * (a) equality clauses WHERE (a = 1) AND (b = 2)
+ *
+ * (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ *
+ * It's possible to add more clauses, for example:
+ *
+ * (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ *
+ * (d) multi-var clauses WHERE (a > b)
+ *
+ * and so on. These are tasks for the future, not yet implemented.
+ *
+ *
+ * Estimating equality clauses
+ * ---------------------------
+ * When computing selectivity estimate for equality clauses
+ *
+ * (a = 1) AND (b = 2)
+ *
+ * we can do this estimate pretty exactly assuming that two conditions
+ * are met:
+ *
+ * (1) there's an equality condition on each attribute
+ *
+ * (2) we find a matching item in the MCV list
+ *
+ * In that case we know the MCV item represents all the tuples matching
+ * the clauses, and the selectivity estimate is complete. This is what
+ * we call 'full match'.
+ *
+ * When only (1) holds, but there's no matching MCV item, we don't know
+ * whether there are no such rows or whether they are just not frequent
+ * enough. We can however use the frequency of the least frequent MCV
+ * item as an upper bound for the selectivity.
+ *
+ * If the equality conditions match only a subset of the attributes
+ * the MCV list is built on, we can't get a full match - we may get
+ * multiple MCV items matching the clauses, and even if we get a single
+ * match there may be items that did not get into the MCV list. In this
+ * case we can still use the frequency of the last MCV item to clamp
+ * the 'additional' selectivity not accounted for by the matching items.
+ *
+ * If there's no histogram because the MCV list approximates the
+ * distribution accurately (and not because the histogram was disabled),
+ * it does not really matter whether there are equality conditions on
+ * all the columns - we can do pretty accurate estimation using the MCV.
+ *
+ * TODO For a combination of equality conditions (not full-match case)
+ * we probably can clamp the selectivity by the minimum of
+ * selectivities for each condition. For example if we know the
+ * number of distinct values for each column, we can use 1/ndistinct
+ * as a per-column estimate. Or rather 1/ndistinct + selectivity
+ * derived from the MCV list.
+ *
+ * If we know the estimated number of distinct combinations of the
+ * columns (i.e. ndistinct(A,B)), we may estimate the average
+ * frequency of items in the remaining 10% (see the example in the
+ * next section) as [10% / ndistinct(A,B)].
+ *
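+ * A worked example (made-up numbers): assume an MCV list on (a, b)
+ * whose least frequent item has frequency 0.001. If the clauses
+ * (a = 1) AND (b = 2) match the item {1, 2} with frequency 0.015,
+ * that's a full match and the selectivity is exactly 0.015. If no
+ * item matches, the selectivity is still bounded by 0.001 - a more
+ * frequent combination would have made it into the list.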
+ *
+ * Bounding estimates
+ * ------------------
+ * In general the MCV lists may not provide estimates as accurate as
+ * for the full-match equality case, but may provide some useful
+ * lower/upper boundaries for the estimation error.
+ *
+ * With equality clauses we can do a few more tricks to narrow this
+ * error range (see the previous section and TODO), but with inequality
+ * clauses (or generally non-equality clauses), it's rather difficult.
+ * There's nothing like a 'full match' - we have to consider both the
+ * MCV items and the remaining part every time. We can't use the minimum
+ * selectivity of MCV items, as the clauses may match multiple items.
+ *
+ * For example, with an MCV list on columns (A, B) covering 90% of the
+ * table (a fraction computed while building the MCV list), about 10%
+ * of the table is not represented by the MCV list. So even if the
+ * clauses match all the remaining rows (not represented by the MCV
+ * items), the selectivity can't exceed those 10%. We may use 1/2 of
+ * the remaining selectivity as an estimate (minimizing average error).
+ *
+ * TODO Most of these ideas (error limiting) are not yet implemented.
+ *
+ *
+ * General TODO
+ * ------------
+ *
+ * FIXME Use max_mcv_items from ALTER TABLE ADD STATISTICS command.
+ *
+ * TODO Add support for IS [NOT] NULL clauses, and clauses referencing
+ * multiple columns (a < b).
+ *
+ * TODO It's possible to build a special case of MCV list, storing not
+ * the actual values but only 32/64-bit hash. This is only useful
+ * for estimating equality clauses and for large varlena types,
+ * which are very impractical for plain MCV list because of size.
+ * But for those data types we really want just the equality
+ * clauses, so it's actually a good solution.
+ *
+ * TODO Currently there's no logic to consider building only a MCV list
+ * (and not building the histogram at all), except for doing this
+ * decision manually in ADD STATISTICS.
+ */
+
+/*
+ * Each serialized item needs to store (in this order):
+ *
+ * - indexes (ndim * sizeof(int32))
+ * - null flags (ndim * sizeof(bool))
+ * - frequency (sizeof(double))
+ *
+ * So in total:
+ *
+ * ndim * (sizeof(int32) + sizeof(bool)) + sizeof(double)
+ */
+#define ITEM_SIZE(ndims) \
+ ((ndims) * (sizeof(int32) + sizeof(bool)) + sizeof(double))
+
+/* pointers into a flat serialized item of ITEM_SIZE(n) bytes */
+#define ITEM_INDEXES(item) ((int32*)(item))
+#define ITEM_NULLS(item,ndims) ((bool*)(ITEM_INDEXES(item) + (ndims)))
+#define ITEM_FREQUENCY(item,ndims) ((double*)(ITEM_NULLS(item,ndims) + (ndims)))
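+
+/*
+ * For example (illustration only), with ndims = 2 an item occupies
+ * ITEM_SIZE(2) = 2 * (4 + 1) + 8 = 18 bytes, assuming the usual
+ * 4-byte int32, 1-byte bool and 8-byte double:
+ *
+ *     bytes  0- 7   two int32 indexes (ITEM_INDEXES)
+ *     bytes  8- 9   two bool NULL flags (ITEM_NULLS)
+ *     bytes 10-17   double frequency (ITEM_FREQUENCY)
+ *
+ * Note the frequency is not 8-byte aligned - that's fine, as the
+ * serialize/deserialize code always accesses it through memcpy(),
+ * never by dereferencing the double pointer directly.
+ */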
+
+/*
+ * Builds an MCV list from the sample rows, and removes the rows
+ * represented by the MCV list from the sample (the number of remaining
+ * sample rows is returned via the numrows_filtered parameter).
+ *
+ * The method is quite simple - in short, it performs these steps:
+ *
+ * (1) sort the data (default collation, '<' for the data type)
+ *
+ * (2) count distinct groups, decide how many to keep
+ *
+ * (3) build the MCV list using the threshold determined in (2)
+ *
+ * (4) remove rows represented by the MCV from the sample
+ *
+ * For more details, see the comments in the code.
+ *
+ * FIXME Single-dimensional MCV is sorted by frequency (descending). We
+ * should do that too, because when walking through the list we
+ * want to check the most frequent items first.
+ *
+ * TODO We're using Datum (8B), even for smaller data types (e.g.
+ * int4 or float4). Maybe we could save some space here, but the
+ * bytea compression should handle it just fine.
+ *
+ * TODO This probably should not use the ndistinct computed from the
+ * sample directly, but should rather estimate the number of distinct
+ * values in the table, no?
+ */
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ int ndistinct = 0;
+ int mcv_threshold = 0;
+ int count = 0;
+ int nitems = 0;
+
+ MCVList mcvlist = NULL;
+
+ /* Sort by multiple columns (using array of SortSupport) */
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * Preallocate space for all the items as a single chunk, and point
+ * the items to the appropriate parts of the array.
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * numattrs);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * numattrs);
+
+ /* keep all the rows by default (as if there was no MCV list) */
+ *numrows_filtered = numrows;
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* load the values/null flags from sample rows */
+ for (j = 0; j < numrows; j++)
+ for (i = 0; i < numattrs; i++)
+ items[j].values[i] = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+
+ /* prepare the sort functions for all the attributes */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* do the sort, using the multi-sort */
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Count the number of distinct groups - just walk through the
+ * sorted list and count the number of key changes. We use this to
+ * determine the threshold (125% of the average frequency).
+ */
+ ndistinct = 1;
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ ndistinct += 1;
+
+ /*
+ * Determine how many groups actually exceed the threshold, and then
+ * walk the array again and collect them into an array. We'll always
+ * require at least 4 rows per group.
+ *
+ * But if we can fit all the distinct values in the MCV list (i.e.
+ * if there are fewer distinct groups than MVSTAT_MCVLIST_MAX_ITEMS),
+ * we'll require only 2 rows per group.
+ *
+ * TODO For now the threshold is the same as in the single-column
+ * case (average + 25%), but maybe that's worth revisiting
+ * for the multivariate case.
+ *
+ * TODO We can do this only if we believe we got all the distinct
+ * values of the table.
+ *
+ * FIXME This should really reference mcv_max_items (from catalog)
+ * instead of the constant MVSTAT_MCVLIST_MAX_ITEMS.
+ */
+ mcv_threshold = 1.25 * numrows / ndistinct;
+ mcv_threshold = (mcv_threshold < 4) ? 4 : mcv_threshold;
+
+ if (ndistinct <= MVSTAT_MCVLIST_MAX_ITEMS)
+ mcv_threshold = 2;
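+
+ /*
+ * Example with made-up numbers: for numrows = 30000 sample rows
+ * and ndistinct = 1000 groups, the threshold works out as
+ * 1.25 * 30000 / 1000 = 37 (integer truncation), so only groups
+ * of at least 37 sample rows become MCV items.
+ */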
+
+ /*
+ * Walk through the sorted data again, and see how many groups
+ * reach the mcv_threshold (and become an item in the MCV list).
+ */
+ count = 1;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or new group, so check if we exceed mcv_threshold */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* group hits the threshold, count the group as MCV item */
+ if (count >= mcv_threshold)
+ nitems += 1;
+
+ count = 1;
+ }
+ else /* within group, so increase the number of items */
+ count += 1;
+ }
+
+ /* we know the number of MCV list items, so let's build the list */
+ if (nitems > 0)
+ {
+ /* allocate the MCV list structure, set parameters we know */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ mcvlist->magic = MVSTAT_MCV_MAGIC;
+ mcvlist->type = MVSTAT_MCV_TYPE_BASIC;
+ mcvlist->ndimensions = numattrs;
+ mcvlist->nitems = nitems;
+
+ /*
+ * Preallocate the Datum/isnull arrays (not as a single chunk, as
+ * we'll pass this out of this method and thus it needs to be easy
+ * to pfree() the data - with a single chunk we wouldn't know where
+ * the arrays start).
+ *
+ * TODO Maybe the reasoning that we can't allocate a single
+ * piece because we're passing it out is bogus? Who'd
+ * free a single item of the MCV list, anyway?
+ *
+ * TODO With a proper encoding (stuffing all the values into a
+ * list-level array), this may no longer be true?
+ */
+ mcvlist->items = (MCVItem*)palloc0(sizeof(MCVItem)*nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ mcvlist->items[i] = (MCVItem)palloc0(sizeof(MCVItemData));
+ mcvlist->items[i]->values = (Datum*)palloc0(sizeof(Datum)*numattrs);
+ mcvlist->items[i]->isnull = (bool*)palloc0(sizeof(bool)*numattrs);
+ }
+
+ /*
+ * Repeat the same loop as above, but this time copy the data
+ * into the MCV list (for items exceeding the threshold).
+ *
+ * TODO Maybe we could simply remember indexes of the last item
+ * in each group (from the previous loop)?
+ */
+ count = 1;
+ nitems = 0;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or a new group */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* count the MCV item if exceeding the threshold (and copy into the array) */
+ if (count >= mcv_threshold)
+ {
+ /* just pointer to the proper place in the list */
+ MCVItem item = mcvlist->items[nitems];
+
+ /* copy values from the _previous_ group (its last item) */
+ memcpy(item->values, items[(i-1)].values, sizeof(Datum) * numattrs);
+ memcpy(item->isnull, items[(i-1)].isnull, sizeof(bool) * numattrs);
+
+ /* and finally the group frequency */
+ item->frequency = (double)count / numrows;
+
+ /* next item */
+ nitems += 1;
+ }
+
+ count = 1;
+ }
+ else /* same group, just increase the number of items */
+ count += 1;
+ }
+
+ /* make sure the loops are consistent */
+ Assert(nitems == mcvlist->nitems);
+
+ /*
+ * Remove the rows matching the MCV list (i.e. keep only rows
+ * that are not represented by the MCV list).
+ *
+ * FIXME This implementation is rather naive, effectively O(N^2).
+ * As the MCV list grows, the check will take longer and
+ * longer. And as the number of sampled rows increases (by
+ * increasing statistics target), it will take longer and
+ * longer. One option is to sort the MCV items first and
+ * then perform a binary search.
+ *
+ * A better option would be keeping the ID of the row in
+ * the sort item, and then just walk through the items and
+ * mark rows to remove (in a bitmap of the same size).
+ * There's not space for that in SortItem at this moment,
+ * but it's trivial to add 'private' pointer, or just
+ * using another structure with extra field (starting with
+ * SortItem, so that the comparators etc. still work).
+ *
+ * Another option is to use the sorted array of items
+ * (because that's how we sorted the source data), and
+ * simply do a bsearch() into it. If we find a matching
+ * item, the row belongs to the MCV list.
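+ *
+ * A sketch of that last option (not implemented, and the helpers
+ * are hypothetical): the MCV items are built from the sorted
+ * sample, so they are already ordered by multi_sort_compare(),
+ * and each row could be tested with
+ *
+ *     found = (bsearch_arg(&item, mcvlist->items,
+ *                          mcvlist->nitems, sizeof(MCVItem),
+ *                          mcv_compare, mss) != NULL);
+ *
+ * where bsearch_arg() would be a bsearch() variant accepting a
+ * context argument (see bsearch_comparator below for why stock
+ * bsearch() is not enough).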
+ */
+ if (nitems == ndistinct) /* all rows are covered by MCV items */
+ *numrows_filtered = 0;
+ else /* (nitems < ndistinct) && (nitems > 0) */
+ {
+ int nfiltered = 0;
+ HeapTuple *rows_filtered = (HeapTuple*)palloc0(sizeof(HeapTuple) * numrows);
+
+ /* used for the searches */
+ SortItem item, mcvitem;
+
+ item.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ item.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /*
+ * FIXME we don't need to allocate this, we can reference
+ * the MCV item directly ...
+ */
+ mcvitem.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ mcvitem.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* walk through the tuples, compare the values to MCV items */
+ for (i = 0; i < numrows; i++)
+ {
+ bool match = false;
+
+ /* collect the key values from the row */
+ for (j = 0; j < numattrs; j++)
+ item.values[j] = heap_getattr(rows[i], attrs->values[j],
+ stats[j]->tupDesc, &item.isnull[j]);
+
+ /* scan through the MCV list for matches */
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ /*
+ * TODO Create a SortItem/MCVItem comparator so that
+ * we don't need to do memcpy() like crazy.
+ */
+ memcpy(mcvitem.values, mcvlist->items[j]->values,
+ numattrs * sizeof(Datum));
+ memcpy(mcvitem.isnull, mcvlist->items[j]->isnull,
+ numattrs * sizeof(bool));
+
+ if (multi_sort_compare(&item, &mcvitem, mss) == 0)
+ {
+ match = true;
+ break;
+ }
+ }
+
+ /* if no match in the MCV list, copy the row into the filtered ones */
+ if (! match)
+ memcpy(&rows_filtered[nfiltered++], &rows[i], sizeof(HeapTuple));
+ }
+
+ /* replace the rows and remember how many rows we kept */
+ memcpy(rows, rows_filtered, sizeof(HeapTuple) * nfiltered);
+ *numrows_filtered = nfiltered;
+
+ /* free all the data used here */
+ pfree(rows_filtered);
+ pfree(item.values);
+ pfree(item.isnull);
+ pfree(mcvitem.values);
+ pfree(mcvitem.isnull);
+ }
+ }
+
+ pfree(values);
+ pfree(items);
+ pfree(isnull);
+
+ return mcvlist;
+}
+
+
+/* fetch the MCV list (as a bytea) from the pg_mv_statistic catalog */
+bytea *
+fetch_mv_mcvlist(Oid mvoid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ bytea *mcvlist = NULL;
+
+ /* Prepare to scan pg_mv_statistic for the row with OID = mvoid. */
+ ScanKeyInit(&skey,
+ ObjectIdAttributeNumber,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(mvoid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticOidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ bool isnull = false;
+ Datum tmp = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stamcv, &isnull);
+
+ Assert(!isnull);
+
+ mcvlist = DatumGetByteaP(tmp);
+
+ break;
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which served as the inspiration for this function)? */
+
+ return mcvlist;
+}
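+
+/*
+ * Expected usage (a sketch - the caller is responsible for pairing
+ * this with deserialization):
+ *
+ *     bytea  *data = fetch_mv_mcvlist(mvoid);
+ *     MCVList mcvlist = deserialize_mv_mcvlist(data);
+ *
+ * Note the FIXME in deserialize_mv_mcvlist() - the result may still
+ * reference 'data', so don't pfree() it while the list is in use.
+ */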
+
+/* print some basic info about the MCV list
+ *
+ * TODO Add info about what part of the table this covers.
+ */
+Datum
+pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MCVList mcvlist = deserialize_mv_mcvlist(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nitems=%d", mcvlist->nitems);
+
+ pfree(mcvlist);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* used to pass context into bsearch() */
+static SortSupport ssup_private = NULL;
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Serialize MCV list into a bytea value. The basic algorithm is simple:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ * (a) collect all (non-NULL) attribute values from all MCV items
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all MCV list items
+ * (a) replace values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, because we're mixing
+ * different datatypes, and we don't know what equality means for them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ * We'll use 32-bit values for the indexes in step (3), although we
+ * could probably use just 16 bits as we don't allow more than 8k
+ * items in the MCV list (max_mcv_items) - well, we might increase
+ * this to 32k and still fit into a signed 16-bit value. But let's be
+ * lazy and rely on the varlena compression to kick in - most of the
+ * index bytes will be 0x00, so it should compress nicely.
+ *
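+ * As a small example of the deduplication, an MCV list with items
+ * (1, 'x'), (2, 'x') and (2, 'y') gets per-dimension arrays [1, 2]
+ * and ['x', 'y'], and the serialized items store the index pairs
+ * (0, 0), (1, 0) and (1, 1), plus the NULL flags and frequencies.
+ *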
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider using 16-bit values for the indexes in step (3).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
+bytea *
+serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i, j;
+ int ndims = mcvlist->ndimensions;
+ int itemsize = ITEM_SIZE(ndims);
+
+ Size total_length = 0;
+
+ char *item = palloc0(itemsize);
+
+ /* serialized items (indexes into arrays, etc.) */
+ bytea *output;
+ char *data = NULL;
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /* allocate space for all values, including NULLs (won't use them) */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * mcvlist->nitems);
+
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ if (! mcvlist->items[j]->isnull[i]) /* skip NULL values */
+ {
+ values[i][counts[i]] = mcvlist->items[j]->values[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do a bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typbyval)
+ /*
+ * passed by value, so just Datum array (int4, int8, ...)
+ *
+ * TODO Might save a few bytes here by storing just typlen
+ * bytes instead of the whole Datum (8B on 64-bit platforms).
+ */
+ info[i].nbytes = info[i].nvalues * sizeof(Datum);
+ else if (info[i].typlen > 0)
+ /* passed by reference, but fixed length (name, tid, ...) */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so strlen + 1 byte for the '\0' terminator
+ * (the write loop below copies strlen + 1 bytes) */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j])) + 1;
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * MCV list, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nitems (4B)
+ * - info (ndim * sizeof(DimensionInfo))
+ * - arrays of values for each dimension
+ * - serialized items (nitems * itemsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data.
+ */
+ total_length = (sizeof(int32) + offsetof(MCVListData, items)
+ + ndims * sizeof(DimensionInfo)
+ + mcvlist->nitems * itemsize);
+
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 1MB */
+ if (total_length > 1024 * 1024)
+ elog(ERROR, "serialized MCV exceeds 1MB (%ld)", total_length);
+
+ /* allocate space for the serialized MCV list, set header fields */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write to */
+ data = VARDATA(output);
+
+ memcpy(data, mcvlist, offsetof(MCVListData, items));
+ data += offsetof(MCVListData, items);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum */
+ memcpy(data, &values[i][j], sizeof(Datum));
+ data += sizeof(Datum);
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - itemsize);
+
+ /* reset the values for each item */
+ memset(item, 0, itemsize);
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! mcvlist->items[i]->isnull[j])
+ {
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ v = (Datum*)bsearch(&mcvlist->items[i]->values[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ ITEM_INDEXES(item)[j] = (v - values[j]);
+
+ /* check the index is within expected bounds */
+ Assert(ITEM_INDEXES(item)[j] >= 0);
+ Assert(ITEM_INDEXES(item)[j] < info[j].nvalues);
+ }
+ }
+
+ /* copy NULL and frequency flags into the item */
+ memcpy(ITEM_NULLS(item, ndims),
+ mcvlist->items[i]->isnull, sizeof(bool) * ndims);
+ memcpy(ITEM_FREQUENCY(item, ndims),
+ &mcvlist->items[i]->frequency, sizeof(double));
+
+ /* copy the item into the array */
+ memcpy(data, item, itemsize);
+
+ data += itemsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ return output;
+}
+
+/* inverse to serialize_mv_mcvlist() - see the comment there */
+MCVList
+deserialize_mv_mcvlist(bytea *data)
+{
+ int i, j;
+ Size expected_size;
+ MCVList mcvlist;
+ char *tmp;
+
+ int ndims, nitems, itemsize;
+ DimensionInfo *info = NULL;
+
+ int32 *indexes = NULL;
+ Datum **values = NULL;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MCVListData,items))
+ elog(ERROR, "invalid MCV Size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MCVListData,items));
+
+ /* read the MCV list header */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(mcvlist, tmp, offsetof(MCVListData,items));
+ tmp += offsetof(MCVListData,items);
+
+ if (mcvlist->magic != MVSTAT_MCV_MAGIC)
+ elog(ERROR, "invalid MCV magic %d (expected %dd)",
+ mcvlist->magic, MVSTAT_MCV_MAGIC);
+
+ if (mcvlist->type != MVSTAT_MCV_TYPE_BASIC)
+ elog(ERROR, "invalid MCV type %d (expected %dd)",
+ mcvlist->type, MVSTAT_MCV_TYPE_BASIC);
+
+ nitems = mcvlist->nitems;
+ ndims = mcvlist->ndimensions;
+ itemsize = ITEM_SIZE(ndims);
+
+ Assert(nitems > 0);
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Compute the size we expect with these parameters - it's incomplete,
+ * as we have yet to add the sizes of the value arrays (from the
+ * DimensionInfo records).
+ */
+ expected_size = offsetof(MCVListData,items) +
+ ndims * sizeof(DimensionInfo) +
+ (nitems * itemsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /* let's parse the value arrays */
+ values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mv_mcvlist(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. This needs to
+ * copy the pieces.
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum - simply reuse the array */
+ values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * info[i].nvalues);
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * info[i].nvalues);
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * info[i].nvalues);
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+
+ /* allocate space for the MCV items */
+ mcvlist->items = (MCVItem*)palloc0(sizeof(MCVItem) * nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ MCVItem item = (MCVItem)palloc0(sizeof(MCVItemData));
+
+ item->values = (Datum*)palloc0(sizeof(Datum)*ndims);
+ item->isnull = (bool*) palloc0(sizeof(bool) *ndims);
+
+ /* just point to the right place */
+ indexes = ITEM_INDEXES(tmp);
+
+ memcpy(item->isnull, ITEM_NULLS(tmp, ndims), sizeof(bool) * ndims);
+ memcpy(&item->frequency, ITEM_FREQUENCY(tmp, ndims), sizeof(double));
+
+ /* translate the values */
+ for (j = 0; j < ndims; j++)
+ if (! item->isnull[j])
+ item->values[j] = values[j][indexes[j]];
+
+ mcvlist->items[i] = item;
+
+ tmp += ITEM_SIZE(ndims);
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+ }
+
+ /* check that we processed all the data */
+ Assert(tmp == (char*)data + VARSIZE_ANY(data));
+
+ return mcvlist;
+}
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index 76b7db7..f88e200 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -35,15 +35,21 @@ CATALOG(pg_mv_statistic,3281)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
+ bool mcv_enabled; /* build MCV list? */
+
+ /* MCV size */
+ int32 mcv_max_items; /* max MCV items */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
+ bool mcv_built; /* MCV list was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
+ bytea stamcv; /* MCV list (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -59,11 +65,15 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_attrdef
* ----------------
*/
-#define Natts_pg_mv_statistic 5
+#define Natts_pg_mv_statistic 9
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_deps_enabled 2
-#define Anum_pg_mv_statistic_deps_built 3
-#define Anum_pg_mv_statistic_stakeys 4
-#define Anum_pg_mv_statistic_stadeps 5
+#define Anum_pg_mv_statistic_mcv_enabled 3
+#define Anum_pg_mv_statistic_mcv_max_items 4
+#define Anum_pg_mv_statistic_deps_built 5
+#define Anum_pg_mv_statistic_mcv_built 6
+#define Anum_pg_mv_statistic_stakeys 7
+#define Anum_pg_mv_statistic_stadeps 8
+#define Anum_pg_mv_statistic_stamcv 9
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 9fb118a..b4e7b4f 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2687,6 +2687,8 @@ DATA(insert OID = 3284 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0
DESCR("multivariate stats: functional dependencies info");
DATA(insert OID = 3285 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
DESCR("multivariate stats: functional dependencies show");
+DATA(insert OID = 3283 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_mcvlist_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: MCV list info");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index a074253..e11aefc 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -25,9 +25,11 @@ typedef struct MVStatsData {
/* statistics requested in ALTER TABLE ... ADD STATISTICS */
bool deps_enabled; /* analyze functional dependencies */
+ bool mcv_enabled; /* analyze MCV lists */
/* available statistics (computed by ANALYZE) */
bool deps_built; /* functional dependencies available */
+ bool mcv_built; /* MCV list is already available */
} MVStatsData;
typedef struct MVStatsData *MVStats;
@@ -66,6 +68,47 @@ typedef MVDependenciesData* MVDependencies;
#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
/*
+ * Multivariate MCV (most-common value) lists
+ *
+ * A straight-forward extension of MCV items - i.e. a list (array) of
+ * combinations of attribute values, together with a frequency and
+ * null flags.
+ */
+typedef struct MCVItemData {
+ double frequency; /* frequency of this combination */
+ bool *isnull; /* flags of NULL values (up to 32 columns) */
+ Datum *values; /* variable-length (ndimensions) */
+} MCVItemData;
+
+typedef MCVItemData *MCVItem;
+
+/* multivariate MCV list - essentially an array of MCV items */
+typedef struct MCVListData {
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of MCV list (BASIC) */
+ uint32 ndimensions; /* number of dimensions */
+ uint32 nitems; /* number of MCV items in the array */
+ MCVItem *items; /* array of MCV items */
+} MCVListData;
+
+typedef MCVListData *MCVList;
+
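+/*
+ * A minimal sketch of walking a (deserialized) MCV list:
+ *
+ *     for (i = 0; i < mcvlist->nitems; i++)
+ *     {
+ *         MCVItem item = mcvlist->items[i];
+ *
+ *         if (!item->isnull[dim])
+ *             ... inspect item->values[dim] ...
+ *     }
+ */
+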
+/* used to flag stats serialized to bytea */
+#define MVSTAT_MCV_MAGIC 0xE1A651C2 /* marks serialized bytea */
+#define MVSTAT_MCV_TYPE_BASIC 1 /* basic MCV list type */
+
+/*
+ * Limits used for mcv_max_items option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_MCVLIST_MIN_ITEMS, and we cannot
+ * have more than MVSTAT_MCVLIST_MAX_ITEMS items.
+ *
+ * This is just a boundary for the 'max' threshold - the actual list
+ * may of course contain fewer items than MVSTAT_MCVLIST_MIN_ITEMS.
+ */
+#define MVSTAT_MCVLIST_MIN_ITEMS 128 /* min items in MCV list */
+#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
+
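+/*
+ * These limits apply to the max_mcv_items option of ADD STATISTICS,
+ * e.g. (with a hypothetical table t, mirroring the regression tests):
+ *
+ *     ALTER TABLE t ADD STATISTICS (mcv, max_mcv_items 200) ON (a, b);
+ */
+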
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
@@ -74,24 +117,39 @@ MVStats list_mv_stats(Oid relid, int *nstats, bool built_only);
bytea * fetch_mv_rules(Oid mvoid);
bytea * fetch_mv_dependencies(Oid mvoid);
+bytea * fetch_mv_mcvlist(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
+bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
+MCVList deserialize_mv_mcvlist(bytea * data);
+
+/*
+ * Returns index of the attribute number within the vector (i.e. a
+ * dimension within the stats).
+ */
+int mv_get_index(AttrNumber varattno, int2vector * stakeys);
/* FIXME this probably belongs somewhere else (not to operations stats) */
extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
MVDependencies
-build_mv_dependencies(int numrows, HeapTuple *rows,
- int2vector *attrs,
- int natts, VacAttrStats **vacattrstats);
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats);
+
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered);
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
- int natts, VacAttrStats **vacattrstats);
+ int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies);
+void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats);
#endif
diff --git a/src/test/regress/expected/mv_mcv.out b/src/test/regress/expected/mv_mcv.out
new file mode 100644
index 0000000..fa298ea
--- /dev/null
+++ b/src/test/regress/expected/mv_mcv.out
@@ -0,0 +1,210 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (unknown_column);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, a);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, a, b);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+ALTER TABLE mcv_list ADD STATISTICS (unknown_option) ON (a, b, c);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing MCV statistics
+ALTER TABLE mcv_list ADD STATISTICS (dependencies, max_mcv_items 200) ON (a, b, c);
+ERROR: option 'mcv' is required by other option(s)
+-- invalid mcv_max_items value / too low
+ALTER TABLE mcv_list ADD STATISTICS (mcv, max_mcv_items 10) ON (a, b, c);
+ERROR: max number of MCV items must be at least 128
+-- invalid mcv_max_items value / too high
+ALTER TABLE mcv_list ADD STATISTICS (mcv, max_mcv_items 10000) ON (a, b, c);
+ERROR: max number of MCV items is 8192
+-- correct command
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+DROP TABLE mcv_list;
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+DROP TABLE mcv_list;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c, d);
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1200
+(1 row)
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+DROP TABLE mcv_list;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 82c2659..80375b8 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1357,7 +1357,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
c.relname AS tablename,
s.stakeys AS attnums,
length(s.stadeps) AS depsbytes,
- pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
+ length(s.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index c41762c..78c9b04 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -111,4 +111,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies
+test: mv_dependencies mv_mcv
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 3845b0f..3f9884f 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -153,3 +153,4 @@ test: xml
test: event_trigger
test: stats
test: mv_dependencies
+test: mv_mcv
diff --git a/src/test/regress/sql/mv_mcv.sql b/src/test/regress/sql/mv_mcv.sql
new file mode 100644
index 0000000..090731e
--- /dev/null
+++ b/src/test/regress/sql/mv_mcv.sql
@@ -0,0 +1,181 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (unknown_column);
+
+-- single column
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a);
+
+-- single column, duplicated
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, a);
+
+-- two columns, one duplicated
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, a, b);
+
+-- unknown option
+ALTER TABLE mcv_list ADD STATISTICS (unknown_option) ON (a, b, c);
+
+-- missing MCV statistics
+ALTER TABLE mcv_list ADD STATISTICS (dependencies, max_mcv_items 200) ON (a, b, c);
+
+-- invalid mcv_max_items value / too low
+ALTER TABLE mcv_list ADD STATISTICS (mcv, max_mcv_items 10) ON (a, b, c);
+
+-- invalid mcv_max_items value / too high
+ALTER TABLE mcv_list ADD STATISTICS (mcv, max_mcv_items 10000) ON (a, b, c);
+
+-- correct command
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+DROP TABLE mcv_list;
+
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+DROP TABLE mcv_list;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c, d);
+
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+DROP TABLE mcv_list;
--
2.0.5
Attachment: 0004-multivariate-histograms.patch (text/x-diff)
>From 397a9b96670097df72f95af04687b1874fb6ae31 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 20:18:24 +0100
Subject: [PATCH 4/4] multivariate histograms
- extends the pg_mv_statistic catalog (add 'hist' fields)
- building the histograms during ANALYZE
- simple estimation while planning the queries
FIX: don't build histogram by default
FIX: analyze histogram only when requested
FIX: improvements in clauselist_mv_selectivity_histogram()
FIX: minor cleanup in multivariate histograms
- move BUCKET_SIZE_SERIALIZED() macro to histogram.c
- rename MVHIST_ constants to MVSTAT_HIST
FIX: comment about building histograms on too many columns
FIX: added comment about check in ALTER TABLE ... ADD STATISTICS
FIX: added comment about handling DROP TABLE / DROP COLUMN
FIX: added comment about ALTER TABLE ... DROP STATISTICS
FIX: comment about building NULL-buckets for a histogram
FIX: initial support for all data types and NULL in MV histograms
This changes the serialize_histogram/update_mv_stats in a bit
strange way (passing VacAttrStats all over the place). This
needs to be improved, somehow, before rebasing into the
histogram part. Otherwise it'll cause needless conflicts.
FIX: refactoring lookup_var_attr_stats() / histograms
FIX: a set of regression tests for MV histograms
This is mostly equal to a combination of all the regression tests
for functional dependencies / MCV lists.
The last test fails due to Assert(!isNull) in partition_bucket()
which prevents NULL values in histograms.
FIX: remove the two memcmp-based comparators (used for histograms)
FIX: comment about memory corruption in deserializing histograms
FIX: remove CompareScalarsContext/ScalarMCVItem from common.h
FIX: fix the lookup_vac_attr() refactoring in histograms
FIX: get rid of the custom comparators in histogram.c
FIX: building NULL-buckets - buckets with just NULLs in some dimension(s)
FIX: fixed bugs in serialize/deserialize methods for histogram
When serializing, BUCKET_MIN_INDEXES were set twice (once instead
of BUCKET_MAX_INDEXES, which were not set at all).
When deserializing, the 'tmp' pointer was not advanced, so only
the first bucket was ever deserialized (and copied into all the
histogram buckets).
Added a few asserts to the deserialize method, similar to how
it's done in serialize.
FIX: formatting issues in the histogram regression test
FIX: remove sample-dependent results from histogram regression test
FIX: add USE_ASSERT_CHECKING to assert-only variable (histogram)
FIX: check ADD STATISTICS options (histograms)
FIX: improved comments/docs for the multivariate histograms
FIX: reuse DimensionInfo (after move to common.h)
FIX: remove obsolete TODO about NULL-buckets, improve comments
FIX: move multivariate histogram definitions after MCV lists
FIX: correct variable names in error message (dimension index 'j')
FIX: add support for 'IS [NOT] NULL' support to histograms
FIX: add regression test for ADD STATISTICS options (histograms)
FIX: added regression test to test IS [NOT] NULL with histograms
FIX: make regression tests parallel-happy (histograms)
---
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/tablecmds.c | 55 +-
src/backend/optimizer/path/clausesel.c | 391 +++++-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/common.c | 67 +-
src/backend/utils/mvstats/common.h | 14 -
src/backend/utils/mvstats/histogram.c | 1778 ++++++++++++++++++++++++++++
src/backend/utils/mvstats/mcv.c | 1 +
src/include/catalog/pg_mv_statistic.h | 24 +-
src/include/catalog/pg_proc.h | 2 +
src/include/utils/mvstats.h | 99 +-
src/test/regress/expected/mv_histogram.out | 210 ++++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_histogram.sql | 179 +++
16 files changed, 2776 insertions(+), 57 deletions(-)
create mode 100644 src/backend/utils/mvstats/histogram.c
create mode 100644 src/test/regress/expected/mv_histogram.out
create mode 100644 src/test/regress/sql/mv_histogram.sql
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 8acf160..3aa7d2b 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -160,7 +160,9 @@ CREATE VIEW pg_mv_stats AS
length(S.stadeps) as depsbytes,
pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
length(S.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo,
+ length(S.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1f08c1c..2bd3884 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -11635,6 +11635,16 @@ static int compare_int16(const void *a, const void *b)
* multiple stats on the same columns with different options
* (say, a detailed MCV-only stats for some queries, histogram
* for others, etc.)
+ *
+ * FIXME Check that at least one of the statistics types is enabled,
+ * and that only compatible options are used. For example if 'mcv' is
+ * not selected, then 'mcv_max_items' can't be used (an alternative
+ * might be to enable it automatically).
+ *
+ * TODO It might be useful to have ALTER TABLE DROP STATISTICS too, but
+ * it's tricky because there may be multiple kinds of stats for the
+ * same list of columns, with different options (e.g. one just MCV
+ * list, another with histogram, etc.).
*/
static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
StatisticsDef *def, LOCKMODE lockmode)
@@ -11652,12 +11662,15 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
/* by default build nothing */
bool build_dependencies = false,
- build_mcv = false;
+ build_mcv = false,
+ build_histogram = false;
- int32 max_mcv_items = -1;
+ int32 max_buckets = -1,
+ max_mcv_items = -1;
/* options required because of other options */
- bool require_mcv = false;
+ bool require_mcv = false,
+ require_histogram = false;
Assert(IsA(def, StatisticsDef));
@@ -11735,6 +11748,29 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
MVSTAT_MCVLIST_MAX_ITEMS)));
}
+ else if (strcmp(opt->defname, "histogram") == 0)
+ build_histogram = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_buckets") == 0)
+ {
+ max_buckets = defGetInt32(opt);
+
+ /* this option requires 'histogram' to be enabled */
+ require_histogram = true;
+
+ /* sanity check */
+ if (max_buckets < MVSTAT_HIST_MIN_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is %d",
+ MVSTAT_HIST_MIN_BUCKETS)));
+
+ else if (max_buckets > MVSTAT_HIST_MAX_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is %d",
+ MVSTAT_HIST_MAX_BUCKETS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -11743,10 +11779,10 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv))
+ if (! (build_dependencies || build_mcv || build_histogram))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -11754,6 +11790,11 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("option 'mcv' is required by other options(s)")));
+ if (require_histogram && (! build_histogram))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option 'histogram' is required by other options(s)")));
+
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
stakeys = buildint2vector(attnums, numcols);
@@ -11771,10 +11812,14 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+ values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
+
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
+ values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
+ nulls[Anum_pg_mv_statistic_stahist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 1446fa0..a4e6d16 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -70,6 +70,8 @@ static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
List *clauses, MVStats mvstats,
bool *fullmatch, Selectivity *lowsel);
+static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
+ List *clauses, MVStats mvstats);
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
@@ -1119,6 +1121,7 @@ static Selectivity
clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStats mvstats)
{
bool fullmatch = false;
+ Selectivity s1 = 0.0, s2 = 0.0;
/*
* Lowest frequency in the MCV list (may be used as an upper bound
@@ -1132,9 +1135,24 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStats mvstats)
* MCV/histogram evaluation).
*/
- /* Evaluate the MCV selectivity */
- return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ /* Evaluate the MCV first. */
+ s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
&fullmatch, &mcv_low);
+
+ /*
+ * If we got a full equality match on the MCV list, we're done (and
+ * the estimate is pretty good).
+ */
+ if (fullmatch && (s1 > 0.0))
+ return s1;
+
+ /* FIXME if (fullmatch) without matching MCV item, use the mcv_low
+ * selectivity as upper bound */
+
+ s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
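+
+ /*
+ * XXX Adding s1 and s2 together assumes the MCV list and the
+ * histogram represent (mostly) disjoint parts of the sample
+ * (see the FIXME in build_mv_stats), so the sum should not
+ * count the same rows twice.
+ */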
+
+ /* TODO clamp to <= 1.0 (or more strictly, when possible) */
+ return s1 + s2;
}
/*
@@ -2024,3 +2042,372 @@ clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
return s;
}
+
+/*
+ * Estimate selectivity of clauses using a histogram.
+ *
+ * If there's no histogram for the stats, the function returns 0.0.
+ *
+ * The general idea of this method is similar to how MCV lists are
+ * processed, except that this introduces the concept of a partial
+ * match (MCV only works with full match / mismatch).
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all buckets as 'full match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the buckets
+ * 4) skip buckets that are already 'no match'
+ * 5) check clause for buckets that still match (at least partially)
+ * 6) sum frequencies for buckets to get selectivity
+ *
+ * Unlike MCV lists, histograms have a concept of a partial match.
+ *
+ * TODO This only handles AND-ed clauses, but it might work for OR-ed
+ * lists too - it just needs to reverse the logic a bit. I.e. start
+ * with 'no match' for all buckets, and increase the match level
+ * for the clauses (and skip buckets that are 'full match').
+ *
+ * TODO This might use a similar shortcut to MCV lists - count buckets
+ * marked as partial/full match, and terminate once this drops to 0.
+ * Not sure if it's really worth it - for MCV lists a situation like
+ * this is not uncommon, but for histograms it's not that clear.
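+ *
+ * As a simple example, assume a 1-D histogram with buckets [0,2],
+ * [2,4] and [4,6], and a clause (a < 3). The first bucket is a full
+ * match, the second a partial match and the third no match - so the
+ * estimate is freq(bucket1) + 0.5 * freq(bucket2).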
+ */
+static Selectivity
+clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
+ MVStats mvstats)
+{
+ int i;
+ Selectivity s = 0.0;
+ ListCell * l;
+ char *matches = NULL;
+ MVHistogram mvhist = NULL;
+
+ /* there's no histogram */
+ if (! mvstats->hist_built)
+ return 0.0;
+
+ /* fetch and deserialize the histogram from the catalog */
+ mvhist = deserialize_mv_histogram(fetch_mv_histogram(mvstats->mvoid));
+
+ Assert (mvhist != NULL);
+ Assert (clauses != NIL);
+ Assert (list_length(clauses) >= 2);
+
+ /*
+ * Bitmap of bucket matches (mismatch, partial, full). By default
+ * all buckets fully match, and we'll gradually eliminate them.
+ */
+ matches = palloc0(sizeof(char) * mvhist->nbuckets);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+
+ /* loop through the clauses and do the estimation */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* it's either OpClause, or NullTest */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ FmgrInfo opproc; /* operator */
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+ FmgrInfo ltproc;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ /*
+ * TODO Fetch only when really needed (probably for equality only)
+ *
+ * TODO Technically either lt/gt is sufficient.
+ *
+ * FIXME The code in analyze.c creates histograms only for types
+ * with enough ordering (by calling get_sort_group_operators).
+ * Is this the same assumption, i.e. are we certain that we
+ * get the ltproc/gtproc every time we ask? Or are there types
+ * where get_sort_group_operators returns ltopr and here we
+ * get nothing?
+ */
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_EQ_OPR | TYPECACHE_LT_OPR
+ | TYPECACHE_GT_OPR);
+
+ /* lookup dimension for the attribute */
+ int idx = mv_get_index(var->varattno, mvstats->stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), <proc);
+
+ /*
+ * Check this for all buckets that still have "true" in the bitmap
+ *
+ * We already know the clauses use suitable operators (because that's
+ * how we filtered them).
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ bool tmp;
+ MVBucket bucket = mvhist->buckets[i];
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match).
+ */
+ if (matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ *
+ * TODO I'm really unsure whether the handling of the 'isgt' flag (that is, clauses
+ * with reverse order of variable/constant) is correct. I wouldn't
+ * be surprised if there was some mixup. Using the lt/gt operators
+ * instead of messing with the opproc could make it simpler.
+ * It would however be using a different operator than the query,
+ * although it's not any shadier than using the selectivity function
+ * as is done currently.
+ *
+ * FIXME Once the min/max values are deduplicated, we can easily minimize
+ * the number of calls to the comparator (assuming we keep the
+ * deduplicated structure). See the note on compression at MVBucket
+ * serialize/deserialize methods.
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_SCALARLTSEL: /* column < constant */
+
+ if (! isgt) /* (var < const) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->min[idx]));
+ if (tmp)
+ {
+ matches[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+
+ /*
+ * Now check whether the upper boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->max[idx]));
+
+ if (tmp)
+ matches[i] = MVSTATS_MATCH_PARTIAL; /* partial match */
+ }
+ else /* (const < var) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ bucket->max[idx],
+ cst->constvalue));
+ if (tmp)
+ {
+ matches[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+
+ /*
+ * Now check whether the lower boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ bucket->min[idx],
+ cst->constvalue));
+
+ if (tmp)
+ matches[i] = MVSTATS_MATCH_PARTIAL; /* partial match */
+ }
+ break;
+
+ case F_SCALARGTSEL: /* column > constant */
+
+ if (! isgt) /* (var > const) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->max[idx]));
+ if (tmp)
+ {
+ matches[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+
+ /*
+ * Now check whether the lower boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->min[idx]));
+
+ if (tmp)
+ matches[i] = MVSTATS_MATCH_PARTIAL; /* partial match */
+ }
+ else /* (const > var) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in
+ * that case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ bucket->min[idx],
+ cst->constvalue));
+ if (tmp)
+ {
+ matches[i] = MVSTATS_MATCH_NONE; /* no match */
+ continue;
+ }
+
+ /*
+ * Now check whether the upper boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ bucket->max[idx],
+ cst->constvalue));
+
+ if (tmp)
+ matches[i] = MVSTATS_MATCH_PARTIAL; /* partial match */
+ }
+
+ break;
+
+ case F_EQSEL:
+
+ /*
+ * We only check whether the value is within the bucket, using the lt/gt
+ * operators fetched from type cache.
+ *
+ * TODO We'll use the default 50% estimate, but that's probably way off
+ * if there are multiple distinct values. Consider tweaking this a
+ * somehow, e.g. using only a part inversely proportional to the
+ * estimated number of distinct values in the bucket.
+ *
+ * TODO This does not handle inclusion flags at the moment, thus counting
+ * some buckets twice (when hitting the boundary).
+ *
+ * TODO Optimization is that if max[i] == min[i], it's effectively a MCV
+ * item and we can count the whole bucket as a complete match (thus
+ * using 100% bucket selectivity and not just 50%).
+ *
+ * TODO Technically some buckets may "degenerate" into single-value
+ * buckets (not necessarily for all the dimensions) - maybe this
+ * is better than keeping a separate MCV list (multi-dimensional).
+ * Update: Actually, that's unlikely to be better than a separate
+ * MCV list for two reasons - first, it requires ~2x the space
+ * (because of storing lower/upper boundaries) and second because
+ * the buckets are ranges - depending on the partitioning algorithm
+ * it may not even degenerate into a (min=max) bucket. For example,
+ * the current partitioning algorithm never does that.
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(<proc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->min[idx]));
+
+ if (tmp)
+ {
+ matches[i] = MVSTATS_MATCH_NONE; /* constvalue < min */
+ continue;
+ }
+
+ tmp = DatumGetBool(FunctionCall2Coll(<proc,
+ DEFAULT_COLLATION_OID,
+ bucket->max[idx],
+ cst->constvalue));
+
+ if (tmp)
+ {
+ matches[i] = MVSTATS_MATCH_NONE; /* constvalue > max */
+ continue;
+ }
+
+ /* partial match */
+ matches[i] = MVSTATS_MATCH_PARTIAL;
+
+ break;
+ }
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, mvstats->stakeys);
+
+ /*
+ * Walk through the buckets and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining buckets that might possibly match.
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVBucket bucket = mvhist->buckets[i];
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match).
+ */
+ if (matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /* if the clause mismatches the bucket, set it to MATCH_NONE */
+ if ((expr->nulltesttype == IS_NULL)
+ && (! bucket->nullsonly[idx]))
+ matches[i] = MVSTATS_MATCH_NONE;
+ else if ((expr->nulltesttype == IS_NOT_NULL) &&
+ (bucket->nullsonly[idx]))
+ matches[i] = MVSTATS_MATCH_NONE;
+ }
+ }
+ }
+
+ /* now, walk through the buckets and sum the selectivities */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ if (matches[i] == MVSTATS_MATCH_FULL)
+ s += mvhist->buckets[i]->ntuples;
+ else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+ s += 0.5 * mvhist->buckets[i]->ntuples;
+ }
+
+ return s;
+}
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 3c0aff4..9dbb3b6 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o mcv.o dependencies.o
+OBJS = common.o dependencies.o histogram.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index 69ab805..f6edb2f 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -45,7 +45,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
{
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
- int numrows_filtered = 0;
+ MVHistogram histogram = NULL;
+ int numrows_filtered = numrows;
/* int2 vector of attnums the stats should be computed on */
int2vector * attrs = mvstats[i].stakeys;
@@ -66,8 +67,23 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (mvstats->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+ /*
+ * Build a multivariate histogram on the columns.
+ *
+ * FIXME remove the rows used to build the MCV from the histogram.
+ * Another option might be subtracting the MCV selectivities
+ * from the histogram, but I'm not sure whether that works
+ * accurately (maybe it introduces additional errors).
+ */
+ if ((numrows_filtered > 0) && (mvstats->hist_enabled))
+ histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(mvstats[i].mvoid, deps, mcvlist, attrs, stats);
+ update_mv_stats(mvstats[i].mvoid, deps, mcvlist, histogram, attrs, stats);
+
+#ifdef MVSTATS_DEBUG
+ print_mv_histogram_info(histogram);
+#endif
}
}
@@ -149,7 +165,7 @@ list_mv_stats(Oid relid, int *nstats, bool built_only)
* Skip statistics that were not computed yet (if only stats
* that were already built were requested)
*/
- if (built_only && (! (stats->mcv_built || stats->deps_built)))
+ if (built_only && (! (stats->mcv_built || stats->deps_built || stats->hist_built)))
continue;
/* double the array size if needed */
@@ -161,10 +177,15 @@ list_mv_stats(Oid relid, int *nstats, bool built_only)
result[*nstats].mvoid = HeapTupleGetOid(htup);
result[*nstats].stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+
result[*nstats].deps_enabled = stats->deps_enabled;
result[*nstats].mcv_enabled = stats->mcv_enabled;
+ result[*nstats].hist_enabled = stats->hist_enabled;
+
result[*nstats].deps_built = stats->deps_built;
result[*nstats].mcv_built = stats->mcv_built;
+ result[*nstats].hist_built = stats->hist_built;
+
*nstats += 1;
}
@@ -178,9 +199,16 @@ list_mv_stats(Oid relid, int *nstats, bool built_only)
return result;
}
+/*
+ * FIXME This adds statistics, but we need to drop statistics when the
+ * table is dropped. Not sure what to do when a column is dropped.
+ * Either we can (a) remove all stats on that column, (b) remove
+ * the column from defined stats and force rebuild, (c) remove the
+ * column on next ANALYZE. Or maybe something else?
+ */
void
update_mv_stats(Oid mvoid,
- MVDependencies dependencies, MCVList mcvlist,
+ MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
@@ -213,19 +241,31 @@ update_mv_stats(Oid mvoid,
values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
}
+ if (histogram != NULL)
+ {
+ bytea * data = serialize_mv_histogram(histogram, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stahist-1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stahist - 1]
+ = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
+ replaces[Anum_pg_mv_statistic_stahist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
+ nulls[Anum_pg_mv_statistic_hist_built-1] = false;
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_hist_built -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
+ values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
/* Is there already a pg_mv_statistic tuple for this attribute? */
oldtup = SearchSysCache1(MVSTATOID,
@@ -302,25 +342,6 @@ compare_scalars_partition(const void *a, const void *b, void *arg)
return ApplySortComparator(da, false, db, false, ssup);
}
-/*
- * qsort_arg comparator for sorting Datum[] (row of Datums) when
- * counting distinct values.
- */
-int
-compare_scalars_memcmp(const void *a, const void *b, void *arg)
-{
- Size len = *(Size*)arg;
-
- return memcmp(a, b, len);
-}
-
-int
-compare_scalars_memcmp_2(const void *a, const void *b)
-{
- return memcmp(a, b, sizeof(Datum));
-}
-
-
/* initialize multi-dimensional sort */
MultiSortSupport
multi_sort_init(int ndims)
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
index fca2782..f4309f7 100644
--- a/src/backend/utils/mvstats/common.h
+++ b/src/backend/utils/mvstats/common.h
@@ -47,18 +47,6 @@ typedef struct
int tupno; /* position index for tuple it came from */
} ScalarItem;
-typedef struct
-{
- int count; /* # of duplicates */
- int first; /* values[] index of first occurrence */
-} ScalarMCVItem;
-
-typedef struct
-{
- SortSupport ssup;
- int *tupnoLink;
-} CompareScalarsContext;
-
/* (de)serialization info */
typedef struct DimensionInfo {
int nvalues; /* number of deduplicated values */
@@ -94,5 +82,3 @@ int multi_sort_compare_dim(int dim, const SortItem *a,
int compare_datums_simple(Datum a, Datum b, SortSupport ssup);
int compare_scalars_simple(const void *a, const void *b, void *arg);
int compare_scalars_partition(const void *a, const void *b, void *arg);
-int compare_scalars_memcmp(const void *a, const void *b, void *arg);
-int compare_scalars_memcmp_2(const void *a, const void *b);
diff --git a/src/backend/utils/mvstats/histogram.c b/src/backend/utils/mvstats/histogram.c
new file mode 100644
index 0000000..3acbea2
--- /dev/null
+++ b/src/backend/utils/mvstats/histogram.c
@@ -0,0 +1,1778 @@
+/*-------------------------------------------------------------------------
+ *
+ * histogram.c
+ * POSTGRES multivariate histograms
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/histogram.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+/*
+ * Multivariate histograms
+ *
+ * Histograms are a collection of buckets, represented by n-dimensional
+ * rectangles. Each rectangle is delimited by a min/max value in each
+ * dimension, stored in an array, so that the bucket includes values
+ * fulfilling condition
+ *
+ * min[i] <= value[i] <= max[i]
+ *
+ * where 'i' is the dimension. In 1D this corresponds to a simple
+ * interval, in 2D to a rectangle, and in 3D to a block. If you can
+ * imagine this in 4D, congrats!
+ *
+ * In addition to the boundaries, each bucket tracks additional details:
+ *
+ * * frequency (fraction of tuples it matches)
+ * * whether the boundaries are inclusive or exclusive
+ * * whether the dimension contains only NULL values
+ * * number of distinct values in each dimension (for building)
+ *
+ * and possibly some additional information.
+ *
+ * We do expect to support multiple histogram types, with different
+ * features etc. The 'type' field is used to identify those types.
+ * Technically some histogram types might use completely different
+ * bucket representation, but that's not expected at the moment.
+ *
+ * Although the current implementation builds non-overlapping buckets,
+ * the code does not rely on the non-overlapping nature - there are
+ * interesting types of histograms / histogram building algorithms
+ * producing overlapping buckets.
+ *
+ * TODO Currently the histogram does not include information about what
+ * part of the table it covers (because the frequencies are
+ * computed from the rows that may be filtered by MCV list). Seems
+ * wrong, possibly causing misestimates (when not matching the MCV
+ * list, we'll probably get much higher selectivity).
+ *
+ *
+ * Estimating selectivity
+ * ----------------------
+ * With histograms, we always "match" a whole bucket, not individual
+ * rows (or values), irrespective of the type of clause. Therefore we
+ * can't use the optimizations for equality clauses, as in MCV lists.
+ *
+ * The current implementation uses histograms to estimate these types
+ * of clauses (think of WHERE conditions):
+ *
+ * (a) equality clauses WHERE (a = 1) AND (b = 2)
+ * (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ *
+ * It's possible to add more clauses, for example:
+ *
+ * (a) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ * (b) multi-var clauses WHERE (a > b)
+ *
+ * and so on. These are tasks for the future, not yet implemented.
+ *
+ * When used on low-cardinality data, histograms usually perform
+ * considerably worse than MCV lists (which are a good fit for this
+ * kind of data). This is especially true on categorical data, where
+ * the ordering of the values is only loosely related to the meaning
+ * of the data, as proper ordering is crucial for histograms.
+ *
+ * On high-cardinality data the histograms are usually a better choice,
+ * because MCV lists can't accurately represent the distribution.
+ *
+ * By evaluating a clause on a bucket, we may get one of three results:
+ *
+ * (a) FULL_MATCH - The bucket definitely matches the clause.
+ *
+ * (b) PARTIAL_MATCH - The bucket matches the clause, but not
+ * necessarily all the tuples it represents.
+ *
+ * (c) NO_MATCH - The bucket definitely does not match the clause.
+ *
+ * This may be illustrated using a range [1, 5], which is essentially
+ * a 1D bucket. With clause
+ *
+ * WHERE (a < 10) => FULL_MATCH (all range values are below
+ * 10, so the whole bucket matches)
+ *
+ * WHERE (a < 3) => PARTIAL_MATCH (there may be values matching
+ * the clause, but we don't know how many)
+ *
+ * WHERE (a < 0) => NO_MATCH (all range values are at least 1,
+ * no values from the bucket match)
+ *
+ * Some clauses may produce only some of those results - for example
+ * equality clauses may never produce FULL_MATCH as we always hit only
+ * part of the bucket, not all the values. This results in less accurate
+ * estimates compared to MCV lists, where we can match an MCV item exactly
+ * (an extreme case of that is 'full match').
+ *
+ * There are clauses that may not produce any PARTIAL_MATCH results.
+ * A nice example of that is 'IS [NOT] NULL' clause, which either
+ * matches the bucket completely (FULL_MATCH) or not at all (NO_MATCH),
+ * thanks to how the NULL-buckets are constructed.
+ *
+ * TODO The IS [NOT] NULL clause is not yet implemented, but should be
+ * rather trivial to add.
+ *
+ * Computing the total selectivity estimate is trivial - simply sum
+ * selectivities from all the FULL_MATCH and PARTIAL_MATCH buckets, but
+ * multiply the PARTIAL_MATCH buckets by 0.5 to minimize average error.
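+ *
+ * That is, with freq(b) being the frequency of bucket 'b':
+ *
+ * s = sum(freq(full)) + 0.5 * sum(freq(partial))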
+ *
+ *
+ * NULL handling
+ * -------------
+ * Buckets may not contain tuples with NULL and non-NULL values in
+ * a single dimension (attribute). To handle this, the histogram may
+ * contain NULL-buckets, i.e. buckets with one or more NULL-only
+ * dimensions.
+ *
+ * The maximum number of NULL-buckets is determined by the number of
+ * attributes the histogram is built on. For N-dimensional histogram,
+ * the maximum number of NULL-buckets is 2^N. So for 8 attributes
+ * (which is the current value of MVSTATS_MAX_DIMENSIONS), there may be
+ * up to 256 NULL-buckets.
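+ *
+ * For example, with two columns (a,b) there may be buckets where only
+ * 'a' is NULL-only, where only 'b' is NULL-only, and where both are,
+ * i.e. up to 2^2 = 4 combinations (including the no-NULLs case).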
+ *
+ * Those buckets are only built if needed - if there are no NULL values
+ * in the data, no such buckets are built.
+ *
+ *
+ * Serialization
+ * -------------
+ * After serialization, the histograms are marked with a 'magic' constant,
+ * to make sure the bytea really is a histogram in serialized form.
+ *
+ * FIXME info about deduplication
+ *
+ *
+ * TODO This structure is used both when building the histogram, and
+ * then when using it to compute estimates. That's why the last
+ * few elements are not used once the histogram is built.
+ *
+ * Add a pointer to 'private' data, meant for data specific to
+ * other algorithms for building the histogram. That would also
+ * remove the bogus / unnecessary fields.
+ *
+ * TODO The limit on number of buckets is quite arbitrary, aiming for
+ * sufficient accuracy while still being fast. Probably should be
+ * replaced with a dynamic limit dependent on the statistics target,
+ * the number of attributes (dimensions) and the statistics targets
+ * associated with the attributes. Also, this needs to be related
+ * to the number of sampled rows, by either clamping it to a
+ * reasonable number (after seeing the number of rows) or using
+ * it when computing the number of rows to sample. Something like
+ * 10 rows per bucket seems reasonable.
+ *
+ * TODO Add MVSTAT_HIST_ROWS_PER_BUCKET tracking minimal number of
+ * tuples per bucket (also, see the previous TODO).
+ *
+ * TODO We may replace the bool arrays with a suitably large data type
+ * (say, uint16 or uint32) and get rid of the allocations. It's
+ * unlikely we'll ever support more than 32 columns as that'd
+ * result in poor precision, huge histograms (splitting each
+ * dimension once would mean 2^32 buckets), and very expensive
+ * estimation. MCVItem already does it this way.
+ *
+ * Update: Actually, this is not 100% true, because we're splitting
+ * a single bucket, not all the buckets at the same time. So each
+ * split simply adds one new bucket, and we choose the bucket that
+ * is most in need of a split. So even with 32 columns this might
+ * give reasonable accuracy, maybe? After 1000 splits we'll get
+ * about 1001 buckets, and some may be quite large (if that area
+ * has a low frequency of tuples).
+ *
+ * There are other challenges though - e.g. with this many columns
+ * it's more likely to reference both label/non-label columns,
+ * which is rather quirky (especially with histograms).
+ *
+ * However, while this would save some space for histograms built
+ * on many columns, it won't save anything for up to 4 columns
+ * (actually, on less than 3 columns it's probably wasteful).
+ *
+ * TODO Maybe the distinct stats (both for combination of all columns
+ * and for combinations of various subsets of columns) should be
+ * moved to a separate structure (next to histogram/MCV/...) to
+ * make it useful even without a histogram computed etc.
+ */
+
+static MVBucket create_initial_mv_bucket(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+static MVBucket select_bucket_to_partition(int nbuckets, MVBucket * buckets);
+
+static MVBucket partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats);
+
+static MVBucket copy_mv_bucket(MVBucket bucket, uint32 ndimensions);
+
+static void update_bucket_ndistinct(MVBucket bucket, int2vector *attrs,
+ VacAttrStats ** stats);
+
+static void update_dimension_ndistinct(MVBucket bucket, int dimension,
+ int2vector *attrs,
+ VacAttrStats ** stats,
+ bool update_boundaries);
+
+static void create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats);
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Each serialized bucket needs to store (in this order):
+ *
+ * - number of tuples (float)
+ * - number of distinct (float)
+ * - min inclusive flags (ndim * sizeof(bool))
+ * - max inclusive flags (ndim * sizeof(bool))
+ * - null dimension flags (ndim * sizeof(bool))
+ * - min boundary indexes (2 * ndim * sizeof(int32))
+ * - max boundary indexes (2 * ndim * sizeof(int32))
+ *
+ * So in total:
+ *
+ * ndim * (4 * sizeof(int32) + 3 * sizeof(bool)) +
+ * 2 * sizeof(float)
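+ *
+ * For example, assuming 4-byte int32/float and 1-byte bool (as the
+ * macro below is written), a 2-dimensional bucket needs
+ *
+ * 2 * (4 * 4 + 3 * 1) + 2 * 4 = 46 bytes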
+ */
+#define BUCKET_SIZE(ndims) \
+ (ndims * (4 * sizeof(int32) + 3 * sizeof(bool)) + 2 * sizeof(float))
+
+/* pointers into a flat serialized bucket of BUCKET_SIZE(n) bytes */
+#define BUCKET_NTUPLES(b) ((float*)b)
+#define BUCKET_NDISTINCT(b) ((float*)(b + sizeof(float)))
+#define BUCKET_MIN_INCL(b,n) ((bool*)(b + 2 * sizeof(float)))
+#define BUCKET_MAX_INCL(b,n) (BUCKET_MIN_INCL(b,n) + n)
+#define BUCKET_NULLS_ONLY(b,n) (BUCKET_MAX_INCL(b,n) + n)
+#define BUCKET_MIN_INDEXES(b,n) ((int32*)(BUCKET_NULLS_ONLY(b,n) + n))
+#define BUCKET_MAX_INDEXES(b,n) ((BUCKET_MIN_INDEXES(b,n) + n))
+
+/* some debugging methods */
+#ifdef MVSTATS_DEBUG
+static void print_mv_histogram_info(MVHistogram histogram);
+#endif
+
+/*
+ * Building a multivariate histogram. In short, it first creates a single
+ * bucket containing all the rows, and then repeatedly splits it by
+ * searching for the bucket / dimension most in need of a split.
+ *
+ * The current criterion is rather simple - it looks at the number of
+ * distinct values (combinations of column values for a bucket, column
+ * values for a dimension). This is somewhat naive, but seems to work
+ * quite well. See the discussion at select_bucket_to_partition and
+ * partition_bucket for more details about alternative algorithms.
+ *
+ * So the current algorithm looks like this:
+ *
+ * build NULL-buckets (create_null_buckets)
+ *
+ * while [not reaching maximum number of buckets]
+ *
+ * choose bucket to partition (max distinct combinations)
+ * if no bucket to partition
+ * terminate the algorithm
+ *
+ * choose bucket dimension to partition (max distinct values)
+ * split the bucket into two buckets
+ */
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total)
+{
+ int i;
+ int ndistinct;
+ int numattrs = attrs->dim1;
+ int *ndistincts = (int*)palloc0(sizeof(int) * numattrs);
+
+ MVHistogram histogram = (MVHistogram)palloc0(sizeof(MVHistogramData));
+
+ HeapTuple * rows_copy = (HeapTuple*)palloc0(numrows * sizeof(HeapTuple));
+ memcpy(rows_copy, rows, sizeof(HeapTuple) * numrows);
+
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ histogram->ndimensions = numattrs;
+
+ histogram->magic = MVSTAT_HIST_MAGIC;
+ histogram->type = MVSTAT_HIST_TYPE_BASIC;
+ histogram->nbuckets = 1;
+
+ /* create max buckets (better than repalloc for short-lived objects) */
+ histogram->buckets = (MVBucket*)palloc0(MVSTAT_HIST_MAX_BUCKETS * sizeof(MVBucket));
+
+ /* create the initial bucket, covering the whole sample set */
+ histogram->buckets[0]
+ = create_initial_mv_bucket(numrows, rows_copy, attrs, stats);
+
+ /*
+ * We may use this to limit number of buckets too - there can never
+ * be more than ndistinct buckets (or ndistinct/k if we require at
+ * least k tuples per bucket).
+ *
+ * With NULL buckets it's a bit more complicated, because there may
+ * be 2^ndims NULL buckets, and if each contains a single tuple then
+ * there may be up to
+ *
+ * (ndistinct - 2^ndims)/k + 2^ndims
+ *
+ * buckets. But of course, it may happen that (ndistinct < 2^ndims)
+ * which needs to be checked.
+ *
+ * TODO Use this for alternative estimate of number of buckets.
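+ *
+ * For example (hypothetical numbers): with ndims=3, k=10 and
+ * ndistinct=1000, this gives (1000 - 8)/10 + 8 = ~107 buckets.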
+ */
+ ndistinct = histogram->buckets[0]->ndistinct;
+
+ /*
+ * The initial bucket may contain NULL values, so we have to create
+ * buckets with NULL-only dimensions.
+ *
+ * FIXME We may need up to 2^ndims buckets - check that there are
+ * enough buckets (MVSTAT_HIST_MAX_BUCKETS >= 2^ndims).
+ */
+ create_null_buckets(histogram, 0, attrs, stats);
+
+ /* keep the global ndistinct values */
+ for (i = 0; i < numattrs; i++)
+ ndistincts[i] = histogram->buckets[0]->ndistincts[i];
+
+ while (histogram->nbuckets < MVSTAT_HIST_MAX_BUCKETS)
+ {
+ MVBucket bucket = select_bucket_to_partition(histogram->nbuckets,
+ histogram->buckets);
+
+ /* no more buckets to partition */
+ if (bucket == NULL)
+ break;
+
+ histogram->buckets[histogram->nbuckets]
+ = partition_bucket(bucket, attrs, stats);
+
+ histogram->nbuckets += 1;
+ }
+
+ /* finalize the frequencies etc. */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ int d;
+ histogram->buckets[i]->ntuples
+ = (histogram->buckets[i]->numrows * 1.0) / numrows_total;
+ histogram->buckets[i]->ndistinct
+ = (histogram->buckets[i]->ndistinct * 1.0) / ndistinct;
+
+ for (d = 0; d < numattrs; d++)
+ histogram->buckets[i]->ndistincts[d]
+ = (histogram->buckets[i]->ndistincts[d] * 1.0) / ndistincts[d];
+ }
+
+ pfree(ndistincts);
+
+ return histogram;
+}
+
+/* fetch the histogram (as a bytea) from the pg_mv_statistic catalog */
+bytea *
+fetch_mv_histogram(Oid mvoid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ bytea *stahist = NULL;
+
+ /* Prepare to scan pg_mv_statistic for the entry with the given OID. */
+ ScanKeyInit(&skey,
+ ObjectIdAttributeNumber,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(mvoid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticOidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ bool isnull = false;
+ Datum hist = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stahist, &isnull);
+
+ Assert(!isnull);
+
+ stahist = DatumGetByteaP(hist);
+
+ break;
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /*
+ * TODO Maybe save the histogram into relcache, as in RelationGetIndexList
+ * (which served as an inspiration for this function)?
+ */
+
+ return stahist;
+}
+
+/* print some basic info about the histogram */
+Datum
+pg_mv_stats_histogram_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVHistogram hist = deserialize_mv_histogram(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nbuckets=%d", hist->nbuckets);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
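+
+/*
+ * A simple way to inspect this from SQL is the pg_mv_stats view, e.g.
+ * (a sketch, showing only the histogram-related columns):
+ *
+ * SELECT histbytes, histinfo FROM pg_mv_stats;
+ */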
+
+
+/*
+ * Used to pass the sort-support context into bsearch_comparator, as
+ * bsearch() (unlike qsort_arg) has no 'extra argument' parameter.
+ */
+static SortSupport ssup_private = NULL;
+
+/*
+ * Serialize the MV histogram into a bytea value. The basic algorithm
+ * is simple, and mostly mimics the MCV serialization:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ * (a) collect all (non-NULL) attribute values from all buckets
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all buckets
+ * (a) replace min/max values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, because we're mixing
+ * different datatypes, and we don't know what equality means for them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ * We'll use 32-bit values for the indexes in step (3), although we
+ * could probably use just 16 bits, as we don't allow more than 8k
+ * buckets in the histogram (well, we might increase this to 16k and
+ * still fit into signed 16 bits). But let's be lazy and rely on the
+ * varlena compression to kick in - most bytes will be 0x00, so it
+ * should compress nicely.
+ *
+ *
+ * Deduplication in serialization
+ * ------------------------------
+ * The deduplication is very effective and important here, because every
+ * time we split a bucket, we keep all the boundary values, except for
+ * the dimension that was used for the split. Another way to look at
+ * this is that each split introduces 1 new value (the value used to do
+ * the split). A histogram with M buckets was created by (M-1) splits
+ * of the initial bucket, and each bucket has 2*N boundary values. So
+ * assuming the initial bucket does not have any 'collapsed' dimensions,
+ * the number of distinct values is
+ *
+ * (2*N + (M-1))
+ *
+ * but the total number of boundary values is
+ *
+ * 2*N*M
+ *
+ * which is clearly much higher. For a histogram on two columns, with
+ * 1024 buckets, it's 1027 vs. 4096. Of course, we're not saving all
+ * the difference (because we'll use 32-bit indexes into the values).
+ * But with large values (e.g. stored as varlena), this saves a lot.
+ *
+ * An interesting feature is that the total number of distinct values
+ * does not really grow with the number of dimensions, except for the
+ * size of the initial bucket. After that it only depends on number of
+ * buckets (i.e. number of splits).
+ *
+ * XXX Of course this only holds for the current histogram building
+ * algorithm. Algorithms doing the splits differently (e.g.
+ * producing overlapping buckets) may behave differently.
+ *
+ * TODO This only confirms we can use the uint16 indexes. The worst
+ * that could happen is if all the splits happened by a single
+ * dimension. To exhaust the uint16 this would require ~64k
+ * splits (needs to be reflected in MVSTAT_HIST_MAX_BUCKETS).
+ *
+ * TODO We don't need to use a separate boolean for each flag, instead
+ * use a single char and set bits.
+ *
+ * TODO We might get a bit better compression by considering the actual
+ * data type length. The current implementation treats all data
+ * types passed by value as requiring 8B, but for INT it's actually
+ * just 4B etc.
+ *
+ * OTOH this is only related to the lookup table, and most of the
+ * space is occupied by the buckets (with int16 indexes).
+ *
+ *
+ * Varlena compression
+ * -------------------
+ * This encoding may prevent automatic varlena compression (similarly
+ * to JSONB), because first part of the serialized bytea will be an
+ * array of unique values (although sorted), and pglz decides whether
+ * to compress by trying to compress the first part (~1kB or so), which
+ * is likely to compress poorly, due to the lack of repetition.
+ *
+ * One possible cure to that might be storing the buckets first, and
+ * then the deduplicated arrays. The buckets might be better suited
+ * for compression.
+ *
+ * On the other hand the encoding scheme is a context-aware compression,
+ * usually compressing to ~30% (or less, with large data types). So the
+ * lack of pglz compression may be OK.
+ *
+ * XXX But maybe we don't really want to compress this, to save on
+ * planning time?
+ *
+ * TODO Try storing the buckets / deduplicated arrays in reverse order,
+ * measure impact on compression.
+ *
+ *
+ * Deserialization
+ * ---------------
+ * The deserialization is currently implemented so that it reconstructs
+ * the histogram back into the same structures - this involves quite
+ * a few memcpy() and palloc() calls, but maybe we could create a special
+ * structure for the serialized histogram, and access the data directly,
+ * without the unpacking.
+ *
+ * Not only would it save some memory and CPU time, but it might
+ * actually work better with CPU caches (not polluting the caches).
+ *
+ * TODO Try to keep the compressed form, instead of deserializing it to
+ * MVHistogram/MVBucket.
+ *
+ *
+ * General TODOs
+ * -------------
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
+bytea *
+serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i = 0, j = 0;
+ Size total_length = 0;
+
+ bytea *output = NULL;
+ char *data = NULL;
+
+ int nbuckets = histogram->nbuckets;
+ int ndims = histogram->ndimensions;
+
+ /* allocated for serialized bucket data */
+ int bucketsize = BUCKET_SIZE(ndims);
+ char *bucket = palloc0(bucketsize);
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension separately */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /*
+ * Allocate space for all min/max values, including NULLs
+ * (we won't use them, but we don't know how many there are),
+ * and then collect all non-NULL values.
+ */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * nbuckets * 2);
+
+ for (j = 0; j < histogram->nbuckets; j++)
+ {
+ /* skip buckets where this dimension is NULL-only */
+ if (! histogram->buckets[j]->nullsonly[i])
+ {
+ values[i][counts[i]] = histogram->buckets[j]->min[i];
+ counts[i] += 1;
+
+ values[i][counts[i]] = histogram->buckets[j]->max[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typbyval)
+ /*
+ * passed by value, so just Datum array (int4, int8, ...)
+ *
+ * TODO Might save a few bytes here, by storing just typlen
+ * bytes instead of whole Datum (8B) on 64-bits.
+ */
+ info[i].nbytes = info[i].nvalues * sizeof(Datum);
+ else if (info[i].typlen > 0)
+ /* passed by reference, but fixed length (name, tid, ...) */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so simply strlen */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j]));
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * histogram, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nbuckets (4B)
+ * - info (ndim * sizeof(DimensionInfo)
+ * - arrays of values for each dimension
+ * - serialized buckets (nbuckets * bucketsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data (and buckets).
+ */
+ total_length = (sizeof(int32) + offsetof(MVHistogramData, buckets)
+ + ndims * sizeof(DimensionInfo)
+ + nbuckets * bucketsize);
+
+ /* account for the deduplicated data */
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 1MB */
+ if (total_length > 1024 * 1024)
+ elog(ERROR, "serialized histogram exceeds 1MB (%ld)", total_length);
+
+ /* allocate space for the serialized histogram list, set header */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write data */
+ data = VARDATA(output);
+
+ memcpy(data, histogram, offsetof(MVHistogramData, buckets));
+ data += offsetof(MVHistogramData, buckets);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum */
+ memcpy(data, &values[i][j], sizeof(Datum));
+ data += sizeof(Datum);
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the histogram buckets */
+ for (i = 0; i < nbuckets; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - bucketsize);
+
+ /* reset the values for each item */
+ memset(bucket, 0, bucketsize);
+
+ *BUCKET_NTUPLES(bucket) = histogram->buckets[i]->ntuples;
+ *BUCKET_NDISTINCT(bucket) = histogram->buckets[i]->ndistinct;
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! histogram->buckets[i]->nullsonly[j])
+ {
+ int idx;
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ /* min boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->min[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MIN_INDEXES(bucket, ndims)[j] = idx;
+
+ /* max boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->max[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MAX_INDEXES(bucket, ndims)[j] = idx;
+ }
+ }
+
+ /* copy flags (nulls, min/max inclusive) */
+ memcpy(BUCKET_NULLS_ONLY(bucket, ndims),
+ histogram->buckets[i]->nullsonly, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MIN_INCL(bucket, ndims),
+ histogram->buckets[i]->min_inclusive, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MAX_INCL(bucket, ndims),
+ histogram->buckets[i]->max_inclusive, sizeof(bool) * ndims);
+
+ /* copy the item into the array */
+ memcpy(data, bucket, bucketsize);
+
+ data += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ /* FIXME free the values/counts arrays here */
+
+ return output;
+}
+
+/*
+ * Reverse of serialize_mv_histogram. This essentially expands the serialized
+ * form back to MVHistogram / MVBucket.
+ */
+MVHistogram
+deserialize_mv_histogram(bytea * data)
+{
+ int i = 0, j = 0;
+
+ Size expected_size;
+ char *tmp = NULL;
+ Datum **values = NULL;
+
+ MVHistogram histogram;
+ DimensionInfo *info;
+
+ int nbuckets;
+ int ndims;
+ int bucketsize;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVHistogramData,buckets))
+ elog(ERROR, "invalid histogram size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVHistogramData,buckets));
+
+ /* read the histogram header */
+ histogram = (MVHistogram)palloc0(sizeof(MVHistogramData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(histogram, tmp, offsetof(MVHistogramData, buckets));
+ tmp += offsetof(MVHistogramData, buckets);
+
+ if (histogram->magic != MVSTAT_HIST_MAGIC)
+ elog(ERROR, "invalid histogram magic %d (expected %dd)",
+ histogram->magic, MVSTAT_HIST_MAGIC);
+
+ if (histogram->type != MVSTAT_HIST_TYPE_BASIC)
+ elog(ERROR, "invalid histogram type %d (expected %dd)",
+ histogram->type, MVSTAT_HIST_TYPE_BASIC);
+
+ nbuckets = histogram->nbuckets;
+ ndims = histogram->ndimensions;
+ bucketsize = BUCKET_SIZE(ndims);
+
+ Assert((nbuckets > 0) && (nbuckets <= MVSTAT_HIST_MAX_BUCKETS));
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * What size do we expect with those parameters? (This is incomplete,
+ * as we have yet to add the value-array sizes from the DimensionInfo
+ * records.)
+ */
+ expected_size = offsetof(MVHistogramData,buckets) +
+ ndims * sizeof(DimensionInfo) +
+ (nbuckets * bucketsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /* let's parse the value arrays */
+ values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MVHistogram histogram = deserialize_mv_histogram(data);
+ * pfree(data);
+ *
+ * then 'histogram' references the freed memory. This needs to
+ * copy the pieces.
+ *
+ * TODO same as in MCV deserialization / consider moving to common.c
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum - simply reuse the array */
+ values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * info[i].nvalues);
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * info[i].nvalues);
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * info[i].nvalues);
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+
+ /* allocate space for the buckets */
+ histogram->buckets = (MVBucket*)palloc0(sizeof(MVBucket) * nbuckets);
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ MVBucket bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ bucket->nullsonly = (bool*) palloc0(sizeof(bool) * ndims);
+ bucket->min_inclusive = (bool*) palloc0(sizeof(bool) * ndims);
+ bucket->max_inclusive = (bool*) palloc0(sizeof(bool) * ndims);
+
+ bucket->min = (Datum*) palloc0(sizeof(Datum) * ndims);
+ bucket->max = (Datum*) palloc0(sizeof(Datum) * ndims);
+
+ bucket->ntuples = *BUCKET_NTUPLES(tmp);
+ bucket->ndistinct = *BUCKET_NDISTINCT(tmp);
+
+ memcpy(bucket->nullsonly, BUCKET_NULLS_ONLY(tmp, ndims),
+ sizeof(bool) * ndims);
+
+ memcpy(bucket->min_inclusive, BUCKET_MIN_INCL(tmp, ndims),
+ sizeof(bool) * ndims);
+
+ memcpy(bucket->max_inclusive, BUCKET_MAX_INCL(tmp, ndims),
+ sizeof(bool) * ndims);
+
+ /* translate the indexes to values */
+ for (j = 0; j < ndims; j++)
+ {
+ if (! bucket->nullsonly[j])
+ {
+ bucket->min[j] = values[j][BUCKET_MIN_INDEXES(tmp, ndims)[j]];
+ bucket->max[j] = values[j][BUCKET_MAX_INDEXES(tmp, ndims)[j]];
+ }
+ }
+
+ histogram->buckets[i] = bucket;
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+
+ tmp += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((tmp - VARDATA(data)) == expected_size);
+
+ return histogram;
+}
+
+/*
+ * Build the initial bucket, which will be then split into smaller
+ * buckets.
+ *
+ * TODO Add ndistinct estimation, probably the one described in "Towards
+ * Estimation Error Guarantees for Distinct Values" (PODS 2000,
+ * p. 268-279) - the estimators called GEE, or maybe AE.
+ *
+ * TODO The "combined" ndistinct is more likely to scale with the number
+ * of rows (in the table), because a single column behaving this
+ * way is sufficient for such behavior.
+ */
+static MVBucket
+create_initial_mv_bucket(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ /* TODO allocate bucket as a single piece, including all the fields. */
+ MVBucket bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ Assert(numrows > 0);
+ Assert(rows != NULL);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /* allocate the per-dimension arrays */
+ bucket->ndistincts = (uint32*)palloc0(numattrs * sizeof(uint32));
+ bucket->nullsonly = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ bucket->min_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+ bucket->max_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* lower/upper boundaries */
+ bucket->min = (Datum*)palloc0(numattrs * sizeof(Datum));
+ bucket->max = (Datum*)palloc0(numattrs * sizeof(Datum));
+
+ /* all the sample rows fall into the initial bucket */
+ bucket->numrows = numrows;
+ bucket->ntuples = numrows;
+ bucket->rows = rows;
+
+ /*
+ * Update the number of distinct combinations in the bucket (which
+ * we use when selecting a bucket to partition), and then the number
+ * of distinct values for each dimension (which we use when choosing
+ * which dimension to split).
+ */
+ update_bucket_ndistinct(bucket, attrs, stats);
+
+ /* Update ndistinct (and also set min/max) for all dimensions. */
+ for (i = 0; i < numattrs; i++)
+ update_dimension_ndistinct(bucket, i, attrs, stats, true);
+
+ /*
+ * The initial bucket was not split at all, so we'll start with the
+ * first dimension in the next round (index = 0).
+ */
+ bucket->last_split_dimension = -1;
+
+ return bucket;
+}
+
+/*
+ * TODO Fix to handle arbitrarily-sized histograms (not just 2D ones)
+ * and call the right output procedures (for the particular type).
+ *
+ * TODO This should somehow fetch info about the data types, and use
+ * the appropriate output functions to print the boundary values.
+ * Right now this prints the 8B value as an integer.
+ *
+ * TODO Also, provide a special function for 2D histogram, printing
+ * a gnuplot script (with rectangles).
+ *
+ * TODO For string types (once supported) we can sort the strings first,
+ * assign them a sequence of integers and use the original values
+ * as labels.
+ */
+#ifdef MVSTATS_DEBUG
+static void
+print_mv_histogram_info(MVHistogram histogram)
+{
+ int i = 0;
+
+ elog(WARNING, "histogram nbuckets=%d", histogram->nbuckets);
+
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ MVBucket bucket = histogram->buckets[i];
+ elog(WARNING, " bucket %d : ndistinct=%f ntuples=%d min=[%ld, %ld], max=[%ld, %ld] distinct=[%d,%d]",
+ i, bucket->ndistinct, bucket->numrows,
+ bucket->min[0], bucket->min[1], bucket->max[0], bucket->max[1],
+ bucket->ndistincts[0], bucket->ndistincts[1]);
+ }
+}
+#endif
+
+/*
+ * A very simple partitioning selection criteria - choose the bucket
+ * with the highest number of distinct values.
+ *
+ * Returns either pointer to the bucket selected to be partitioned,
+ * or NULL if there are no buckets that may be split (i.e. all buckets
+ * contain a single distinct value).
+ *
+ * TODO Consider other partitioning criteria (v-optimal, maxdiff etc.).
+ *
+ * TODO Allowing the bucket to degenerate to a single combination of
+ * values makes it a rather strange MCV list. Maybe we should use
+ * a higher lower boundary, or maybe make the selection criteria
+ * more complex (e.g. consider number of rows in the bucket, etc.).
+ *
+ * That however is different from buckets 'degenerated' only for
+ * some dimensions (e.g. half of them), which is perfectly
+ * appropriate for statistics on a combination of low and high
+ * cardinality columns.
+ */
+static MVBucket
+select_bucket_to_partition(int nbuckets, MVBucket * buckets)
+{
+ int i;
+ int ndistinct = 1; /* if ndistinct=1, we can't split the bucket */
+ MVBucket bucket = NULL;
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ /* if the ndistinct count is higher, use this bucket */
+ if (buckets[i]->ndistinct > ndistinct)
+ {
+ bucket = buckets[i];
+ ndistinct = buckets[i]->ndistinct;
+ }
+ }
+
+ /* may be NULL if there are no buckets with (ndistinct > 1) */
+ return bucket;
+}
+
+/*
+ * A simple bucket partitioning implementation - splits the dimensions in
+ * a round-robin manner (considering only those with ndistinct > 1).
+ * That is, dimension 0 is split first, then 1, 2, ... until reaching
+ * the end of the attribute list, and then wrapping back to 0. Of
+ * course, dimensions with a single distinct value are skipped.
+ *
+ * This is essentially what Muralikrishna/DeWitt described in their SIGMOD
+ * article (M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For
+ * Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD
+ * Conference 1988: 28-36).
+ *
+ * There are multiple histogram options, centered around the partitioning
+ * criteria, specifying both how to choose a bucket and the dimension
+ * most in need of a split. For a nice summary and general overview, see
+ * "rK-Hist : an R-Tree based histogram for multi-dimensional selectivity
+ * estimation" thesis by J. A. Lopez, Concordia University, p.34-37 (and
+ * possibly p. 32-34 for explanation of the terms).
+ *
+ * This splits the bucket by tweaking the existing one, and returning the
+ * new bucket (essentially shrinking the existing one in-place and returning
+ * the other "half" as a new bucket). The caller is responsible for adding
+ * the new bucket into the list of buckets.
+ *
+ * TODO It requires care to prevent splitting only one dimension and not
+ * splitting another one at all (which might happen easily in case of
+ * strongly dependent columns - e.g. y=x).
+ *
+ * TODO Should probably consider statistics target for the columns (e.g. to
+ * split dimensions with higher statistics target more frequently).
+ */
+static MVBucket
+partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int dimension;
+ int numattrs = attrs->dim1;
+
+ Datum split_value;
+ MVBucket new_bucket;
+
+ /* needed for sort, when looking for the split value */
+ bool isNull;
+ int nvalues = 0;
+ StdAnalyzeData * mystats = NULL;
+ ScalarItem * values = (ScalarItem*)palloc0(bucket->numrows * sizeof(ScalarItem));
+ SortSupportData ssup;
+
+ /* looking for the split value */
+ int ndistinct = 1; /* number of distinct values below current value */
+ int nrows = 1; /* number of rows below current value */
+
+ /* needed when splitting the values */
+ HeapTuple * oldrows = bucket->rows;
+ int oldnrows = bucket->numrows;
+
+ /*
+ * We can't split buckets with a single distinct value (this also
+ * disqualifies NULL-only dimensions). Also, there have to be multiple
+ * sample rows (otherwise there couldn't be multiple distinct values).
+ */
+ Assert(bucket->ndistinct > 1);
+ Assert(bucket->numrows > 1);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Look for the next dimension to split, in a round robin manner.
+ * We'll use the first one with (ndistinct > 1).
+ *
+ * If we happen to wrap all the way around to the dimension we split
+ * last, something clearly went wrong - the Assert below catches that
+ * (which is also why last_split_dimension is not updated until the
+ * loop terminates).
+ */
+ dimension = bucket->last_split_dimension;
+ while (true)
+ {
+ dimension = (dimension + 1) % numattrs;
+
+ if (bucket->ndistincts[dimension] > 1)
+ break;
+
+ /* if we reached the previous split dimension again, we're in an infinite loop */
+ Assert(dimension != bucket->last_split_dimension);
+ }
+
+ /* Remember the dimension for the next split of this bucket. */
+ bucket->last_split_dimension = dimension;
+
+ /*
+ * Walk through the selected dimension, collect and sort the values
+ * and then choose the value to use as the new boundary.
+ */
+ mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (i = 0; i < bucket->numrows; i++)
+ {
+ /* remember the index of the sample row, to make the partitioning simpler */
+ values[nvalues].value = heap_getattr(bucket->rows[i], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+ values[nvalues].tupno = i;
+
+ /* no NULL values allowed here (we don't do splits by null-only dimensions) */
+ Assert(!isNull);
+
+ nvalues++;
+ }
+
+ /* sort the array of values */
+ qsort_arg((void *) values, nvalues, sizeof(ScalarItem),
+ compare_scalars_partition, (void *) &ssup);
+
+ /*
+ * We know there are bucket->ndistincts[dimension] distinct values
+ * in this dimension, and we want to split this into half, so walk
+ * through the array and stop once we see (ndistinct/2) values.
+ *
+ * We always choose the "next" value, i.e. (n/2+1)-th distinct value,
+ * and use it as an exclusive upper boundary (and inclusive lower
+ * boundary).
+ *
+ * TODO Maybe we should use "average" of the two middle distinct
+ * values (at least for even distinct counts), but that would
+ * require being able to do an average (which does not work
+ * for non-arithmetic types).
+ *
+ * TODO Another option is to look for a split that'd give about
+ * 50% tuples (not distinct values) in each partition. That
+ * might work better when there are a few very frequent
+ * values, and many rare ones.
+ */
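+ /*
+ * A worked example (illustration only): for sorted values
+ * (1,1,2,2,3,3) the dimension has ndistinct = 3, so the loop below
+ * stops at the first '2' (the second distinct value). Thus '2'
+ * becomes the exclusive upper bound of this bucket and the inclusive
+ * lower bound of the new one, with nrows = 2 rows staying here.
+ */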
+ split_value = values[0].value;
+ for (i = 1; i < bucket->numrows; i++)
+ {
+ /* count distinct values */
+ if (values[i].value != values[i-1].value)
+ ndistinct += 1;
+
+ /* once we've seen half of the distinct values, use the current one */
+ if (ndistinct > bucket->ndistincts[dimension] / 2)
+ {
+ split_value = values[i].value;
+ break;
+ }
+
+ /* keep track how many rows belong to the first bucket */
+ nrows += 1;
+ }
+
+ Assert(nrows > 0);
+ Assert(nrows < bucket->numrows);
+
+ /* create the new bucket as an (incomplete) copy of the one being partitioned */
+ new_bucket = copy_mv_bucket(bucket, numattrs);
+
+ /*
+ * Do the actual split of the chosen dimension, using the split value as the
+ * upper bound for the existing bucket, and lower bound for the new one.
+ */
+ bucket->max[dimension] = split_value;
+ new_bucket->min[dimension] = split_value;
+
+ bucket->max_inclusive[dimension] = false;
+ new_bucket->min_inclusive[dimension] = true;
+
+ /*
+ * Redistribute the sample tuples using the 'ScalarItem->tupno'
+ * index. We know 'nrows' rows should remain in the original
+ * bucket and the rest goes to the new one.
+ */
+
+ bucket->rows = (HeapTuple*)palloc0(nrows * sizeof(HeapTuple));
+ new_bucket->rows = (HeapTuple*)palloc0((oldnrows - nrows) * sizeof(HeapTuple));
+
+ bucket->numrows = nrows;
+ new_bucket->numrows = (oldnrows - nrows);
+
+ /*
+ * The first nrows should go to the first bucket, the rest should
+ * go to the new one. Use the tupno field to get the actual HeapTuple
+ * row from the original array of sample rows.
+ */
+ for (i = 0; i < nrows; i++)
+ memcpy(&bucket->rows[i], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ for (i = nrows; i < oldnrows; i++)
+ memcpy(&new_bucket->rows[i-nrows], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(new_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each partition.
+ */
+ for (i = 0; i < numattrs; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(new_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+ pfree(values);
+
+ return new_bucket;
+}
+
+/*
+ * Copy a histogram bucket. The copy does not include the build-time
+ * data, i.e. sampled rows etc.
+ */
+static MVBucket
+copy_mv_bucket(MVBucket bucket, uint32 ndimensions)
+{
+ /* TODO allocate as a single piece (including all the fields) */
+ MVBucket new_bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ /*
+ * Copy only the attributes that will stay the same after the split;
+ * the rest will be recomputed once the split is done.
+ */
+
+ new_bucket->last_split_dimension = bucket->last_split_dimension;
+
+ /* allocate the per-dimension arrays */
+ new_bucket->ndistincts = (uint32*)palloc0(ndimensions * sizeof(uint32));
+ new_bucket->nullsonly = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ new_bucket->min_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+ new_bucket->max_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* lower/upper boundaries */
+ new_bucket->min = (Datum*)palloc0(ndimensions * sizeof(Datum));
+ new_bucket->max = (Datum*)palloc0(ndimensions * sizeof(Datum));
+
+ /* copy data */
+ memcpy(new_bucket->nullsonly, bucket->nullsonly, ndimensions * sizeof(bool));
+
+ memcpy(new_bucket->min_inclusive, bucket->min_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->min, bucket->min, ndimensions*sizeof(Datum));
+
+ memcpy(new_bucket->max_inclusive, bucket->max_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->max, bucket->max, ndimensions*sizeof(Datum));
+
+ return new_bucket;
+}
+
+/*
+ * Counts the number of distinct value combinations in the bucket. This
+ * just copies the Datum values into a simple array of items, sorts it
+ * using a multi-column comparator (built from the per-column sort
+ * support), and compares adjacent items.
+ *
+ * TODO This might evaluate and store the distinct counts for all
+ * possible attribute combinations. The assumption is this might be
+ * useful for estimating things like GROUP BY cardinalities (e.g.
+ * in cases when some buckets contain a lot of low-frequency
+ * combinations, and other buckets contain few high-frequency ones).
+ *
+ * But it's unclear whether it's worth the price. Computing this
+ * is actually quite cheap, because it may be evaluated at the very
+ * end, when the buckets are rather small (so sorting it in 2^N ways
+ * is not a big deal). Assuming the partitioning algorithm does not
+ * use these values to do the decisions, of course (the current
+ * algorithm does not).
+ *
+ * The overhead with storing, fetching and parsing the data is more
+ * concerning - adding 2^N values per bucket (even if it's just
+ * a 1B or 2B value) would significantly bloat the histogram, and
+ * thus the impact on optimizer. Which is not really desirable.
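+ *
+ * (For illustration: with 3 columns there are 2^3 - 1 = 7 non-empty
+ * attribute combinations - {a}, {b}, {c}, {a,b}, {a,c}, {b,c} and
+ * {a,b,c} - hence the 2^N concern above.)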
+ *
+ * TODO This only updates the ndistinct for the sample (or bucket), but
+ * we eventually need an estimate of the total number of distinct
+ * values in the dataset. It's possible to either use the current
+ * 1D approach (i.e., if it's more than 10% of the sample, assume
+ * it's proportional to the number of rows). Or it's possible to
+ * implement the estimator suggested in the article, supposedly
+ * giving 'optimal' estimates (w.r.t. probability of error).
+ */
+static void
+update_bucket_ndistinct(MVBucket bucket, int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int numrows = bucket->numrows;
+ int numattrs = attrs->dim1;
+
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * We could collect these values while already walking through the
+ * sample rows elsewhere (as implemented, heap_getattr ends up being
+ * called twice for each value).
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(numrows * sizeof(Datum) * numattrs);
+ bool *isnull = (bool*)palloc0(numrows * sizeof(bool) * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* prepare the sort functions for all dimensions */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* collect the values */
+ for (i = 0; i < numrows; i++)
+ for (j = 0; j < numattrs; j++)
+ items[i].values[j]
+ = heap_getattr(bucket->rows[i], attrs->values[j],
+ stats[j]->tupDesc, &items[i].isnull[j]);
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ bucket->ndistinct = 1;
+
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ bucket->ndistinct += 1;
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+}
+
+/*
+ * Count distinct values per bucket dimension.
+ */
+static void
+update_dimension_ndistinct(MVBucket bucket, int dimension, int2vector *attrs,
+ VacAttrStats ** stats, bool update_boundaries)
+{
+ int j;
+ int nvalues = 0;
+ bool isNull;
+ Datum * values = (Datum*)palloc0(bucket->numrows * sizeof(Datum));
+ SortSupportData ssup;
+
+ StdAnalyzeData * mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* if we already know this is a NULL-only dimension, we're done */
+ if (bucket->nullsonly[dimension])
+ {
+ bucket->ndistincts[dimension] = 1;
+ pfree(values);
+ return;
+ }
+
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (j = 0; j < bucket->numrows; j++)
+ {
+ values[nvalues] = heap_getattr(bucket->rows[j], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+
+ /* ignore NULL values */
+ if (! isNull)
+ nvalues++;
+ }
+
+ /* there's always at least 1 distinct value (may be NULL) */
+ bucket->ndistincts[dimension] = 1;
+
+ /*
+ * If there are only NULL values in the column, mark the dimension
+ * as NULL-only and bail out.
+ */
+ if (nvalues == 0)
+ {
+ pfree(values);
+ bucket->nullsonly[dimension] = true;
+ return;
+ }
+
+ /* sort the array of (pass-by-value) datums */
+ qsort_arg((void *) values, nvalues, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /*
+ * Update min/max boundaries to the smallest bounding box. Generally, this
+ * needs to be done only when constructing the initial bucket.
+ */
+ if (update_boundaries)
+ {
+ /* store the min/max values */
+ bucket->min[dimension] = values[0];
+ bucket->min_inclusive[dimension] = true;
+
+ bucket->max[dimension] = values[nvalues-1];
+ bucket->max_inclusive[dimension] = true;
+ }
+
+ /*
+ * Walk through the array and count distinct values by comparing
+ * succeeding values.
+ *
+ * FIXME This only works for pass-by-value types (i.e. not VARCHARs
+ * etc.). Although thanks to the deduplication it might work
+ * even for those types (equal values will get the same item
+ * in the deduplicated array).
+ */
+ for (j = 1; j < nvalues; j++)
+ if (values[j] != values[j-1])
+ bucket->ndistincts[dimension] += 1;
+
+ pfree(values);
+}
+
+/*
+ * A properly built histogram must not contain buckets mixing NULL and
+ * non-NULL values in a single dimension. Each dimension may either be
+ * marked as 'nulls only', and thus containing only NULL values, or
+ * it must not contain any NULL values.
+ *
+ * Therefore, if the sample contains NULL values in any of the columns,
+ * it's necessary to build those NULL-buckets. This is done in an
+ * iterative way using this algorithm, operating on a single bucket:
+ *
+ * (1) Check that all dimensions are well-formed (not mixing NULL
+ * and non-NULL values).
+ *
+ * (2) If all dimensions are well-formed, terminate.
+ *
+ * (3) If the dimension contains only NULL values, but is not
+ * marked as NULL-only, mark it as NULL-only and run the
+ * algorithm again (on this bucket).
+ *
+ * (4) If the dimension mixes NULL and non-NULL values, split the
+ * bucket into two parts - one with NULL values, one with
+ * non-NULL values (replacing the current one). Then run
+ * the algorithm on both buckets.
+ *
+ * This is executed in a recursive manner, but the number of executions
+ * should be quite low - limited by the number of NULL-buckets. Also,
+ * in each branch the number of nested calls is limited by the number
+ * of dimensions (attributes) of the histogram.
+ *
+ * At the end, there should be buckets with no mixed dimensions. The
+ * number of buckets produced by this algorithm is rather limited - with
+ * N dimensions, there may be only 2^N such buckets (each dimension may
+ * be either NULL or non-NULL). So with 8 dimensions (current value of
+ * MVSTATS_MAX_DIMENSIONS) there may be only 256 such buckets.
+ *
+ * After this, a 'regular' bucket-split algorithm shall run, further
+ * optimizing the histogram.
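+ *
+ * For example (illustration only), with two dimensions where both
+ * columns contain NULL values, repeated application of steps (3)
+ * and (4) produces up to four buckets: (NULL, NULL), (NULL, non-NULL),
+ * (non-NULL, NULL) and (non-NULL, non-NULL).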
+ */
+static void
+create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int null_dim = -1;
+ int null_count = 0;
+ bool null_found = false;
+ MVBucket bucket, null_bucket;
+ int null_idx, curr_idx;
+
+ /* remember original values from the bucket */
+ int numrows;
+ HeapTuple *oldrows = NULL;
+
+ Assert(bucket_idx < histogram->nbuckets);
+ Assert(histogram->ndimensions == attrs->dim1);
+
+ bucket = histogram->buckets[bucket_idx];
+
+ /*
+ * Walk through all rows / dimensions, and stop once we find NULL
+ * in a dimension not yet marked as NULL-only.
+ */
+ for (i = 0; i < bucket->numrows; i++)
+ {
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ /* Is this a NULL-only dimension? If yes, skip. */
+ if (bucket->nullsonly[j])
+ continue;
+
+ /* found a NULL in that dimension? */
+ if (heap_attisnull(bucket->rows[i], attrs->values[j]))
+ {
+ null_found = true;
+ null_dim = j;
+ break;
+ }
+ }
+
+ /* terminate if we found attribute with NULL values */
+ if (null_found)
+ break;
+ }
+
+ /* no regular dimension contains NULL values => we're done */
+ if (! null_found)
+ return;
+
+ /* walk through the rows again, count NULL values in 'null_dim' */
+ for (i = 0; i < bucket->numrows; i++)
+ {
+ if (heap_attisnull(bucket->rows[i], attrs->values[null_dim]))
+ null_count += 1;
+ }
+
+ Assert(null_count <= bucket->numrows);
+
+ /*
+ * If (null_count == numrows) the dimension already is NULL-only,
+ * but is not yet marked as such. It's enough to mark it and
+ * repeat the process recursively (until we run out of dimensions).
+ */
+ if (null_count == bucket->numrows)
+ {
+ bucket->nullsonly[null_dim] = true;
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+ return;
+ }
+
+ /*
+ * We have to split the bucket into two - one with NULL values in
+ * the dimension, one with non-NULL values. We don't need to sort
+ * the data or anything, but otherwise it's similar to what's done
+ * in partition_bucket().
+ */
+
+ /* create bucket with NULL-only dimension 'null_dim' */
+ null_bucket = copy_mv_bucket(bucket, histogram->ndimensions);
+
+ /* remember the current array info */
+ oldrows = bucket->rows;
+ numrows = bucket->numrows;
+
+ /* we'll keep non-NULL values in the current bucket */
+ bucket->numrows = (numrows - null_count);
+ bucket->rows
+ = (HeapTuple*)palloc0(bucket->numrows * sizeof(HeapTuple));
+
+ /* and the NULL values will go to the new one */
+ null_bucket->numrows = null_count;
+ null_bucket->rows
+ = (HeapTuple*)palloc0(null_bucket->numrows * sizeof(HeapTuple));
+
+ /* mark the dimension as NULL-only (in the new bucket) */
+ null_bucket->nullsonly[null_dim] = true;
+
+ /* walk through the sample rows and distribute them accordingly */
+ null_idx = 0;
+ curr_idx = 0;
+ for (i = 0; i < numrows; i++)
+ {
+ if (heap_attisnull(oldrows[i], attrs->values[null_dim]))
+ /* NULL => copy to the new bucket */
+ memcpy(&null_bucket->rows[null_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ else
+ memcpy(&bucket->rows[curr_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ }
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(null_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each
+ * bucket (NULL is not a value, so 0, and the other bucket got
+ * all the ndistinct values).
+ */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(null_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+
+ /* add the NULL bucket to the histogram */
+ histogram->buckets[histogram->nbuckets++] = null_bucket;
+
+ /*
+ * And now run the function recursively on both buckets (the new
+ * one first, because the call may change number of buckets, and
+ * it's used as an index).
+ */
+ create_null_buckets(histogram, (histogram->nbuckets-1), attrs, stats);
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+}
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
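+
+/*
+ * Usage sketch (illustration only, not taken from the actual callers):
+ * set the global before calling bsearch() on one of the sorted arrays
+ * of pass-by-value datums.
+ *
+ *     ssup_private = &ssup;
+ *     match = (Datum *) bsearch(&value, values, nvalues,
+ *                               sizeof(Datum), bsearch_comparator);
+ */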
diff --git a/src/backend/utils/mvstats/mcv.c b/src/backend/utils/mvstats/mcv.c
index 2b3d171..b0cea61 100644
--- a/src/backend/utils/mvstats/mcv.c
+++ b/src/backend/utils/mvstats/mcv.c
@@ -961,6 +961,7 @@ MCVList deserialize_mv_mcvlist(bytea * data)
for (i = 0; i < nitems; i++)
{
+ /* FIXME allocate as a single chunk (minimize palloc overhead) */
MCVItem item = (MCVItem)palloc0(sizeof(MCVItemData));
item->values = (Datum*)palloc0(sizeof(Datum)*ndims);
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index f88e200..08424bd 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -36,13 +36,16 @@ CATALOG(pg_mv_statistic,3281)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
+ bool hist_enabled; /* build histogram? */
- /* MCV size */
+ /* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
+ int32 hist_max_buckets; /* max histogram buckets */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
+ bool hist_built; /* histogram was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -50,6 +53,7 @@ CATALOG(pg_mv_statistic,3281)
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
+ bytea stahist; /* MV histogram (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -65,15 +69,19 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_attrdef
* ----------------
*/
-#define Natts_pg_mv_statistic 9
+#define Natts_pg_mv_statistic 13
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_deps_enabled 2
#define Anum_pg_mv_statistic_mcv_enabled 3
-#define Anum_pg_mv_statistic_mcv_max_items 4
-#define Anum_pg_mv_statistic_deps_built 5
-#define Anum_pg_mv_statistic_mcv_built 6
-#define Anum_pg_mv_statistic_stakeys 7
-#define Anum_pg_mv_statistic_stadeps 8
-#define Anum_pg_mv_statistic_stamcv 9
+#define Anum_pg_mv_statistic_hist_enabled 4
+#define Anum_pg_mv_statistic_mcv_max_items 5
+#define Anum_pg_mv_statistic_hist_max_buckets 6
+#define Anum_pg_mv_statistic_deps_built 7
+#define Anum_pg_mv_statistic_mcv_built 8
+#define Anum_pg_mv_statistic_hist_built 9
+#define Anum_pg_mv_statistic_stakeys 10
+#define Anum_pg_mv_statistic_stadeps 11
+#define Anum_pg_mv_statistic_stamcv 12
+#define Anum_pg_mv_statistic_stahist 13
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index b4e7b4f..448e76a 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2689,6 +2689,8 @@ DATA(insert OID = 3285 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0
DESCR("multivariate stats: functional dependencies show");
DATA(insert OID = 3283 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_mcvlist_info _null_ _null_ _null_ ));
DESCR("multi-variate statistics: MCV list info");
+DATA(insert OID = 3282 ( pg_mv_stats_histogram_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_histogram_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: histogram info");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index e11aefc..028a634 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -26,10 +26,12 @@ typedef struct MVStatsData {
/* statistics requested in ALTER TABLE ... ADD STATISTICS */
bool deps_enabled; /* analyze functional dependencies */
bool mcv_enabled; /* analyze MCV lists */
+ bool hist_enabled; /* analyze histogram */
/* available statistics (computed by ANALYZE) */
bool deps_built; /* functional dependencies available */
bool mcv_built; /* MCV list is already available */
+ bool hist_built; /* histogram is already available */
} MVStatsData;
typedef struct MVStatsData *MVStats;
@@ -109,6 +111,91 @@ typedef MCVListData *MCVList;
#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
/*
+ * Multivariate histograms
+ */
+typedef struct MVBucketData {
+
+ /* Frequencies of this bucket. */
+ float ntuples; /* frequency of tuples */
+ float ndistinct; /* frequency of distinct values */
+
+ /*
+ * Number of distinct values in each dimension. This is used when
+ * building the histogram (and is not serialized/deserialized), but
+ * it could be useful for estimating ndistinct for combinations of
+ * columns.
+ *
+ * It would mean tracking 2^N values for each bucket, and even if
+ * those values might be stored in 1B each, it's still a lot of space
+ * (considering the expected number of buckets).
+ *
+ * TODO Consider tracking ndistincts for all attribute combinations.
+ */
+ uint32 *ndistincts;
+
+ /*
+ * Information about dimensions being NULL-only. Not yet used.
+ */
+ bool *nullsonly;
+
+ /* lower boundaries - values and information about the inequalities */
+ Datum *min;
+ bool *min_inclusive;
+
+ /* upper boundaries - values and information about the inequalities */
+ Datum *max;
+ bool *max_inclusive;
+
+ /*
+ * Sample tuples falling into this bucket, index of the dimension
+ * the bucket was split by in the last step.
+ *
+ * XXX These fields are needed only while building the histogram,
+ * and are not serialized at all.
+ */
+ HeapTuple *rows;
+ uint32 numrows;
+ int last_split_dimension;
+
+} MVBucketData;
+
+typedef MVBucketData *MVBucket;
+
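+/*
+ * Example (illustration only): in a two-dimensional histogram, a
+ * bucket covering the range [10, 20) in the first dimension and the
+ * single value 5 in the second would have min = {10, 5}, max = {20, 5},
+ * min_inclusive = {true, true} and max_inclusive = {false, true}.
+ */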
+
+typedef struct MVHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ MVBucket *buckets; /* array of buckets */
+
+} MVHistogramData;
+
+typedef MVHistogramData *MVHistogram;
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_HIST_MAGIC 0x7F8C5670 /* marks serialized bytea */
+#define MVSTAT_HIST_TYPE_BASIC 1 /* basic histogram type */
+
+/*
+ * Limits used for max_buckets option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_HIST_MIN_BUCKETS, and we cannot
+ * have more than MVSTAT_HIST_MAX_BUCKETS buckets.
+ *
+ * This is just a boundary for the 'max' threshold - the actual
+ * histogram may use fewer buckets than MVSTAT_HIST_MAX_BUCKETS.
+ *
+ * TODO The MVSTAT_HIST_MIN_BUCKETS should be related to the number of
+ * attributes (MVSTATS_MAX_DIMENSIONS) because of NULL-buckets.
+ * There should be at least 2^N buckets, otherwise we may be unable
+ * to build the NULL buckets (e.g. with 8 dimensions up to 2^8 = 256
+ * NULL-buckets may be needed, exceeding the current minimum of 128).
+ */
+#define MVSTAT_HIST_MIN_BUCKETS 128 /* min number of buckets */
+#define MVSTAT_HIST_MAX_BUCKETS 16384 /* max number of buckets */
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
@@ -118,14 +205,18 @@ bytea * fetch_mv_rules(Oid mvoid);
bytea * fetch_mv_dependencies(Oid mvoid);
bytea * fetch_mv_mcvlist(Oid mvoid);
+bytea * fetch_mv_histogram(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
VacAttrStats **stats);
+bytea * serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
MCVList deserialize_mv_mcvlist(bytea * data);
+MVHistogram deserialize_mv_histogram(bytea * data);
/*
* Returns index of the attribute number within the vector (i.e. a
@@ -137,6 +228,7 @@ int mv_get_index(AttrNumber varattno, int2vector * stakeys);
extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_histogram_info(PG_FUNCTION_ARGS);
MVDependencies
build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
@@ -146,10 +238,15 @@ MCVList
build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int *numrows_filtered);
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total);
+
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+void update_mv_stats(Oid relid, MVDependencies dependencies,
+ MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats);
#endif
diff --git a/src/test/regress/expected/mv_histogram.out b/src/test/regress/expected/mv_histogram.out
new file mode 100644
index 0000000..a0cf37f
--- /dev/null
+++ b/src/test/regress/expected/mv_histogram.out
@@ -0,0 +1,210 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (unknown_column);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, a);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, a, b);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+ALTER TABLE mv_histogram ADD STATISTICS (unknown_option) ON (a, b, c);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing histogram statistics
+ALTER TABLE mv_histogram ADD STATISTICS (dependencies, max_buckets 200) ON (a, b, c);
+ERROR: option 'histogram' is required by other option(s)
+-- invalid max_buckets value / too low
+ALTER TABLE mv_histogram ADD STATISTICS (mcv, max_buckets 10) ON (a, b, c);
+ERROR: minimum number of buckets is 128
+-- invalid max_buckets value / too high
+ALTER TABLE mv_histogram ADD STATISTICS (mcv, max_buckets 100000) ON (a, b, c);
+ERROR: maximum number of buckets is 16384
+-- correct command
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built | pg_mv_stats_histogram_info
+--------------+------------+----------------------------
+ t | t | nbuckets=10000
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built | pg_mv_stats_histogram_info
+--------------+------------+----------------------------
+ t | t | nbuckets=1001
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built | pg_mv_stats_histogram_info
+--------------+------------+----------------------------
+ t | t | nbuckets=1001
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+DROP TABLE mv_histogram;
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built | pg_mv_stats_histogram_info
+--------------+------------+----------------------------
+ t | t | nbuckets=10000
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built | pg_mv_stats_histogram_info
+--------------+------------+----------------------------
+ t | t | nbuckets=3492
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built | pg_mv_stats_histogram_info
+--------------+------------+----------------------------
+ t | t | nbuckets=3433
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+DROP TABLE mv_histogram;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c, d);
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+DROP TABLE mv_histogram;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 80375b8..07896b4 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1359,7 +1359,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
length(s.stadeps) AS depsbytes,
pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
length(s.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo,
+ length(s.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(s.stahist) AS histinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 78c9b04..d9864b7 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -111,4 +111,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies mv_mcv
+test: mv_dependencies mv_mcv mv_histogram
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 3f9884f..d901a78 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -154,3 +154,4 @@ test: event_trigger
test: stats
test: mv_dependencies
test: mv_mcv
+test: mv_histogram
diff --git a/src/test/regress/sql/mv_histogram.sql b/src/test/regress/sql/mv_histogram.sql
new file mode 100644
index 0000000..a693e35
--- /dev/null
+++ b/src/test/regress/sql/mv_histogram.sql
@@ -0,0 +1,179 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (unknown_column);
+
+-- single column
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a);
+
+-- single column, duplicated
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, a);
+
+-- two columns, one duplicated
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, a, b);
+
+-- unknown option
+ALTER TABLE mv_histogram ADD STATISTICS (unknown_option) ON (a, b, c);
+
+-- missing histogram statistics
+ALTER TABLE mv_histogram ADD STATISTICS (dependencies, max_buckets 200) ON (a, b, c);
+
+-- invalid max_buckets value / too low
+ALTER TABLE mv_histogram ADD STATISTICS (mcv, max_buckets 10) ON (a, b, c);
+
+-- invalid max_buckets value / too high
+ALTER TABLE mv_histogram ADD STATISTICS (mcv, max_buckets 100000) ON (a, b, c);
+
+-- correct command
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+DROP TABLE mv_histogram;
+
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+DROP TABLE mv_histogram;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c, d);
+
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+DROP TABLE mv_histogram;
--
2.0.5
Hello,
Patch 0001 needs changes for OIDs since my patch was
committed. The attached is compatible with current master.
I also gave it a try as follows, and got the following error during
analyze. Unfortunately I don't have enough time to
investigate it now.
postgres=# create table t1 (a int, b int, c int);
postgres=# insert into t1 (select a / 10000, a / 10000, a / 10000 from generate_series(0, 99999) a);
postgres=# analyze t1;
ERROR: invalid memory alloc request size 1485176862
regards,
At Sat, 24 Jan 2015 21:21:39 +0100, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <54C3FED3.1060600@2ndquadrant.com>
Hi,
attached is an updated version of the multivariate stats patch. This is
going to be a bit longer mail, so I'll put here a small ToC ;-)

1) patch split into 4 parts
2) where to start / documentation
3) state of the code
4) main changes/improvements
5) remaining limitations

The motivation and design ideas, explained in the first message of this
thread, are still valid. It might be a good idea to read it first:

/messages/by-id/543AFA15.4080608@fuzzy.cz

BTW if you happen to go to FOSDEM [PGDay], I'll gladly give you an intro
to the patch in person, or discuss the patch in general.

1) Patch split into 4 parts
---------------------------
Firstly, the patch got broken into the following four pieces, to make
the reviews somewhat easier:

1) 0001-shared-infrastructure-and-functional-dependencies.patch

- infrastructure, shared by all the kinds of stats added
in the following patches (catalog, ALTER TABLE, ANALYZE ...)

- implementation of a simple statistics, tracking functional
dependencies between columns (previously called "associative
rules", but that's incorrect for several reasons)

- this does not modify the optimizer in any way

2) 0002-clause-reduction-using-functional-dependencies.patch

- applies the functional dependencies to the optimizer (i.e.
considers the rules in clauselist_selectivity())

3) 0003-multivariate-MCV-lists.patch
- multivariate MCV lists (both ANALYZE and optimizer parts)
4) 0004-multivariate-histograms.patch
- multivariate histograms (both ANALYZE and optimizer parts)

You may look at the patches at github here:
https://github.com/tvondra/postgres/tree/multivariate-stats-squashed
The branch is not stable, i.e. I'll rebase / squash / force-push changes
in the future. (There's also multivariate-stats development branch with
unsquashed changes, but you don't want to look at that, trust me.)

The patches are not exactly small (being in the 50-100 kB range), but
that's mostly because of the amount of comments explaining the goals and
implementation details.

2) Where to start / documentation
---------------------------------
I strived to document all the pieces properly, mostly in the form of
comments. There's no sgml documentation at this point, which should
obviously change in the future.

Anyway, I'd suggest reading the first e-mail in this thread, explaining
the ideas, and then these comments:

1) functional dependencies (patch 0001)
- src/backend/utils/mvstats/dependencies.c

2) MCV lists (patch 0003)
- src/backend/utils/mvstats/mcv.c
- also see clauselist_mv_selectivity_mcvlist() in clausesel.c

3) histograms (patch 0004)
- src/backend/utils/mvstats/histogram.c
- also see clauselist_mv_selectivity_histogram() in clausesel.c

4) selectivity estimation (patches 0002-0004)
- all in src/backend/optimizer/path/clausesel.c
- clauselist_selectivity() - overview of how the stats are applied
- clauselist_apply_dependencies() - functional dependencies reduction
- clauselist_mv_selectivity_mcvlist() - MCV list estimation
- clauselist_mv_selectivity_histogram() - histogram estimation

3) State of the code
--------------------
I've spent a fair amount of time testing the patches, and while I
believe there are no segfaults or such, I know parts of the code need
a bit more love.

The part most in need of improvements / comments is probably the code
in clausesel.c - it seems a bit quirky. Reviews / comments regarding
this part of the code are very welcome - I'm sure there are many ways
to improve it.

There are a few FIXMEs elsewhere (e.g. about memory allocation in the
(de)serialization code), but those are mostly well-defined issues that
I know how to address (at least I believe so).

4) Main changes/improvements
----------------------------
There are many significant improvements. The previous patch version
was in the 'proof of concept' category (missing pieces, knowingly
broken in some areas), while the current patch should 'mostly work'.

The patch fixes the most annoying limitations of the first version:

(a) support for all data types (not just those passed by value)
(b) handles NULL values properly
(c) adds support for IS [NOT] NULL clauses

Aside from that, the code was significantly improved, there are proper
regression tests and plenty of comments explaining the details.

5) Remaining limitations
------------------------

(a) limited to stats on 8 columns

This is mostly just a 'safeguard' restriction.

(b) only data types with '<' operator

I don't think this will change anytime soon, because all the
algorithms for building the stats rely on this. I don't see
this as a serious limitation though.

(c) not handling DROP COLUMN or DROP TABLE and so on

Currently this is not handled at all (so the regression tests
do an explicit DELETE from the pg_mv_statistic catalog).

Handling DROP TABLE won't be difficult - it's similar to the
current stats. Handling ALTER TABLE ... DROP COLUMN will be much
more tricky, I guess - should we drop all the stats referencing
that column, or should we just remove it from the stats? Or
should we keep it and treat it as NULL? Not sure what's the best
solution.

(d) limited list of compatible WHERE clauses
The initial patch handled only simple operator clauses
(Var op Constant)
where operator is one of ('<', '<=', '=', '>=', '>'). Now it also
handles IS [NOT] NULL clauses. Adding more clause types should
not be overly difficult - starting with more traditional
'BooleanTest' conditions, or even multi-column conditions
(Var op Var)

which are difficult to estimate using single-column stats.

(e) optimizer uses single stats per table

This is still true, and I don't think this will change soon. I do
have some ideas on how to merge multiple stats etc., but it's
certainly complex stuff, unlikely to happen within this CF. The
patch makes a lot of sense even without this particular feature,
because you can create multiple stats, each suitable for different
queries.

(f) no JOIN conditions

Similarly to the previous point, it's on the TODO but it's not
going to happen in this CF.

kind regards
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-shared-infrastructure-and-functional-dependencies.patch (text/x-patch)
>From 9ebfadb5d6cd9b55dd2707cfc8c789884dafa7fa Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 19:51:48 +0100
Subject: [PATCH 1/4] shared infrastructure and functional dependencies
Basic infrastructure shared by all kinds of multivariate
stats, most importantly:
- adds a new system catalog (pg_mv_statistic)
- ALTER TABLE ... ADD STATISTICS syntax
- implementation of functional dependencies (the simplest
type of multivariate statistics)
- building functional dependencies in ANALYZE
- updates regression tests (new catalog etc.)
This does not include any changes to the optimizer, i.e.
it does not influence the query planning.
FIX: invalid assert in lookup_var_attr_stats()
The current implementation requires a valid 'ltopr'
so that we can sort the sample rows in various ways,
and the assert did verify this by checking that the
function is 'compute_scalar_stats'. This is however
private function in analyze.c, so the check failed
after moving the code into common.c.
Fixed by checking the 'ltopr' operator directly.
Eventually this will be removed, as ltopr is only
needed for histograms (functional dependencies and
MCV lists may be built without it).
FIX: improved comments about functional dependencies
FIX: add magic (MVSTAT_DEPS_MAGIC) into MVDependencies
FIX: improved analysis of functional dependencies
Changes:
- decreased minimum group size
- count contradicting rows ('not supporting' ones)
The algorithm is still rather simple and probably needs
other improvements.
FIX: add pg_mv_stats_dependencies_show() function
This function actually prints the rules, not just some basic
info (number of rules) as pg_mv_stats_dependencies_info().
FIX: (dependencies != NULL) in pg_mv_stats_dependencies_info()
STRICT is not a solution, because the deserialization may fail
for some reason (corrupted data, ...)
FIX: rename 'associative rules' to 'functional dependencies'
It's a more appropriate name as functional dependencies,
as defined in relational theory (esp. Normal Forms) are
tracking column-level dependencies.
Associative (or more correctly 'association') rules are
tracking dependencies between particular values, and not
necessarily in different columns (shopping bag analysis).
Also, did a bunch of comment improvements, minor fixes.
This does not include changes in clausesel.c!
FIX: remove obsolete Assert() enforcing typbyval types
---
src/backend/catalog/Makefile | 1 +
src/backend/catalog/system_views.sql | 10 +
src/backend/commands/analyze.c | 17 +-
src/backend/commands/tablecmds.c | 149 +++++++-
src/backend/nodes/copyfuncs.c | 15 +-
src/backend/parser/gram.y | 67 +++-
src/backend/utils/Makefile | 2 +-
src/backend/utils/cache/syscache.c | 12 +
src/backend/utils/mvstats/Makefile | 17 +
src/backend/utils/mvstats/common.c | 272 ++++++++++++++
src/backend/utils/mvstats/common.h | 70 ++++
src/backend/utils/mvstats/dependencies.c | 554 +++++++++++++++++++++++++++++
src/include/catalog/indexing.h | 5 +
src/include/catalog/pg_mv_statistic.h | 69 ++++
src/include/catalog/pg_proc.h | 5 +
src/include/catalog/toasting.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/parsenodes.h | 11 +-
src/include/utils/mvstats.h | 86 +++++
src/include/utils/syscache.h | 1 +
src/test/regress/expected/rules.out | 8 +
src/test/regress/expected/sanity_check.out | 1 +
22 files changed, 1365 insertions(+), 9 deletions(-)
create mode 100644 src/backend/utils/mvstats/Makefile
create mode 100644 src/backend/utils/mvstats/common.c
create mode 100644 src/backend/utils/mvstats/common.h
create mode 100644 src/backend/utils/mvstats/dependencies.c
create mode 100644 src/include/catalog/pg_mv_statistic.h
create mode 100644 src/include/utils/mvstats.h
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index a403c64..d6c16f8 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -32,6 +32,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
+ pg_mv_statistic.h \
pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2800f73..d05a716 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -150,6 +150,16 @@ CREATE VIEW pg_indexes AS
LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
+CREATE VIEW pg_mv_stats AS
+ SELECT
+ N.nspname AS schemaname,
+ C.relname AS tablename,
+ S.stakeys AS attnums,
+ length(S.stadeps) as depsbytes,
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
+ LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
+
CREATE VIEW pg_stats AS
SELECT
nspname AS schemaname,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 75b45f7..da98d54 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -27,6 +27,7 @@
#include "catalog/indexing.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "commands/dbcommands.h"
#include "commands/tablecmds.h"
@@ -54,7 +55,11 @@
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "utils/mvstats.h"
+#include "access/sysattr.h"
/* Data structure for Algorithm S from Knuth 3.4.2 */
typedef struct
@@ -110,7 +115,6 @@ static void update_attstats(Oid relid, bool inh,
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
-
/*
* analyze_rel() -- analyze one relation
*/
@@ -472,6 +476,13 @@ do_analyze_rel(Relation onerel, int options, List *va_cols,
* all analyzable columns. We use a lower bound of 100 rows to avoid
* possible overflow in Vitter's algorithm. (Note: that will also be the
* target in the corner case where there are no analyzable columns.)
+ *
+ * FIXME This sample sizing is mostly OK when computing stats for
+ * individual columns, but rather insufficient when computing
+ * multivariate stats (histograms, MCV lists, ...). For a small
+ * number of dimensions it works, but for complex stats it'd be
+ * nice to use a sample proportional to the table size (say,
+ * 0.5% - 1%) instead of a fixed size.
*/
targrows = 100;
for (i = 0; i < attr_cnt; i++)
@@ -574,6 +585,9 @@ do_analyze_rel(Relation onerel, int options, List *va_cols,
update_attstats(RelationGetRelid(Irel[ind]), false,
thisdata->attr_cnt, thisdata->vacattrstats);
}
+
+ /* Build multivariate stats (if there are any). */
+ build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
}
/*
@@ -2825,3 +2839,4 @@ compare_mcvs(const void *a, const void *b)
return da - db;
}
+
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 623e6bf..0df7f03 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -35,6 +35,7 @@
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_tablespace.h"
@@ -92,7 +93,7 @@
#include "utils/syscache.h"
#include "utils/tqual.h"
#include "utils/typcache.h"
-
+#include "utils/mvstats.h"
/*
* ON COMMIT action list
@@ -140,8 +141,9 @@ static List *on_commits = NIL;
#define AT_PASS_ADD_COL 5 /* ADD COLUMN */
#define AT_PASS_ADD_INDEX 6 /* ADD indexes */
#define AT_PASS_ADD_CONSTR 7 /* ADD constraints, defaults */
-#define AT_PASS_MISC 8 /* other stuff */
-#define AT_NUM_PASSES 9
+#define AT_PASS_ADD_STATS 8 /* ADD statistics */
+#define AT_PASS_MISC 9 /* other stuff */
+#define AT_NUM_PASSES 10
typedef struct AlteredTableInfo
{
@@ -416,7 +418,8 @@ static void ATExecReplicaIdentity(Relation rel, ReplicaIdentityStmt *stmt, LOCKM
static void ATExecGenericOptions(Relation rel, List *options);
static void ATExecEnableRowSecurity(Relation rel);
static void ATExecDisableRowSecurity(Relation rel);
-
+static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
+ StatisticsDef *def, LOCKMODE lockmode);
static void copy_relation_data(SMgrRelation rel, SMgrRelation dst,
ForkNumber forkNum, char relpersistence);
static const char *storage_name(char c);
@@ -2989,6 +2992,7 @@ AlterTableGetLockLevel(List *cmds)
* updates.
*/
case AT_SetStatistics: /* Uses MVCC in getTableAttrs() */
+ case AT_AddStatistics: /* XXX not sure if the right level */
case AT_ClusterOn: /* Uses MVCC in getIndexes() */
case AT_DropCluster: /* Uses MVCC in getIndexes() */
case AT_SetOptions: /* Uses MVCC in getTableAttrs() */
@@ -3145,6 +3149,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
pass = AT_PASS_ADD_CONSTR;
break;
case AT_SetStatistics: /* ALTER COLUMN SET STATISTICS */
+ case AT_AddStatistics: /* XXX maybe not the right place */
ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode);
/* Performs own permission checks */
ATPrepSetStatistics(rel, cmd->name, cmd->def, lockmode);
@@ -3440,6 +3445,9 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
case AT_SetStatistics: /* ALTER COLUMN SET STATISTICS */
ATExecSetStatistics(rel, cmd->name, cmd->def, lockmode);
break;
+ case AT_AddStatistics: /* ADD STATISTICS */
+ ATExecAddStatistics(tab, rel, (StatisticsDef *) cmd->def, lockmode);
+ break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
ATExecSetOptions(rel, cmd->name, cmd->def, false, lockmode);
break;
@@ -11638,3 +11646,136 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,
ReleaseSysCache(tuple);
}
+
+/* used for sorting the attnums in ATExecAddStatistics */
+static int compare_int16(const void *a, const void *b)
+{
+ return memcmp(a, b, sizeof(int16));
+}
+
+/*
+ * Implements the ALTER TABLE ... ADD STATISTICS (options) ON (columns).
+ *
+ * The code is an unholy mix of pieces that really belong to other parts
+ * of the source tree.
+ *
+ * FIXME Check that the types are pass-by-value and support sort,
+ * although maybe we can live without the sort (and only build
+ * MCV list / association rules).
+ *
+ * FIXME This should probably check for duplicate stats (i.e. same
+ * keys, same options). Although maybe it's useful to have
+ * multiple stats on the same columns with different options
+ * (say, a detailed MCV-only stats for some queries, histogram
+ * for others, etc.)
+ */
+static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
+ StatisticsDef *def, LOCKMODE lockmode)
+{
+ int i, j;
+ ListCell *l;
+ int16 attnums[INDEX_MAX_KEYS];
+ int numcols = 0;
+
+ HeapTuple htup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ int2vector *stakeys;
+ Relation mvstatrel;
+
+ /* by default build everything */
+ bool build_dependencies = true;
+
+ Assert(IsA(def, StatisticsDef));
+
+ /* transform the column names to attnum values */
+
+ foreach(l, def->keys)
+ {
+ char *attname = strVal(lfirst(l));
+ HeapTuple atttuple;
+
+ atttuple = SearchSysCacheAttName(RelationGetRelid(rel), attname);
+
+ if (!HeapTupleIsValid(atttuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" referenced in statistics does not exist",
+ attname)));
+
+ /* more than MVSTATS_MAX_DIMENSIONS columns not allowed */
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("cannot have more than %d keys in a statistics",
+ MVSTATS_MAX_DIMENSIONS)));
+
+ attnums[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->attnum;
+ ReleaseSysCache(atttuple);
+ numcols++;
+ }
+
+ /*
+ * Check the lower bound (at least 2 columns), the upper bound was
+ * already checked in the loop.
+ */
+ if (numcols < 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("multivariate stats require 2 or more columns")));
+
+ /* look for duplicates */
+ for (i = 0; i < numcols; i++)
+ for (j = 0; j < numcols; j++)
+ if ((i != j) && (attnums[i] == attnums[j]))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("duplicate column name in statistics definition")));
+
+ /* parse the statistics options */
+ foreach (l, def->options)
+ {
+ DefElem *opt = (DefElem*)lfirst(l);
+
+ if (strcmp(opt->defname, "dependencies") == 0)
+ build_dependencies = defGetBoolean(opt);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized STATISTICS option \"%s\"",
+ opt->defname)));
+ }
+
+ /* sort the attnums and build int2vector */
+ qsort(attnums, numcols, sizeof(int16), compare_int16);
+ stakeys = buildint2vector(attnums, numcols);
+
+ /*
+ * Okay, let's create the pg_mv_statistic entry.
+ */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ /* no stats collected yet, so just the keys */
+ values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
+
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+ values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+
+ nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* insert the tuple into pg_mv_statistic */
+ mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ htup = heap_form_tuple(mvstatrel->rd_att, values, nulls);
+
+ simple_heap_insert(mvstatrel, htup);
+
+ CatalogUpdateIndexes(mvstatrel, htup);
+
+ heap_freetuple(htup);
+
+ heap_close(mvstatrel, RowExclusiveLock);
+
+ return;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 029761e..df230d6 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3918,6 +3918,17 @@ _copyAlterPolicyStmt(const AlterPolicyStmt *from)
return newnode;
}
+static StatisticsDef *
+_copyStatisticsDef(const StatisticsDef *from)
+{
+ StatisticsDef *newnode = makeNode(StatisticsDef);
+
+ COPY_NODE_FIELD(keys);
+ COPY_NODE_FIELD(options);
+
+ return newnode;
+}
+
/* ****************************************************************
* pg_list.h copy functions
* ****************************************************************
@@ -4744,7 +4755,9 @@ copyObject(const void *from)
case T_RoleSpec:
retval = _copyRoleSpec(from);
break;
-
+ case T_StatisticsDef:
+ retval = _copyStatisticsDef(from);
+ break;
default:
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(from));
retval = 0; /* keep compiler quiet */
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 82405b9..0346a00 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -367,6 +367,13 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
create_generic_options alter_generic_options
relation_expr_list dostmt_opt_list
+%type <list> OptStatsOptions
+%type <str> stats_options_name
+%type <node> stats_options_arg
+%type <defelt> stats_options_elem
+%type <list> stats_options_list
+
+
%type <list> opt_fdw_options fdw_options
%type <defelt> fdw_option
@@ -486,7 +493,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <keyword> unreserved_keyword type_func_name_keyword
%type <keyword> col_name_keyword reserved_keyword
-%type <node> TableConstraint TableLikeClause
+%type <node> TableConstraint TableLikeClause TableStatistics
%type <ival> TableLikeOptionList TableLikeOption
%type <list> ColQualList
%type <node> ColConstraint ColConstraintElem ConstraintAttr
@@ -2311,6 +2318,14 @@ alter_table_cmd:
n->subtype = AT_DisableRowSecurity;
$$ = (Node *)n;
}
+ /* ALTER TABLE <name> ADD STATISTICS (options) ON (columns) ... */
+ | ADD_P TableStatistics
+ {
+ AlterTableCmd *n = makeNode(AlterTableCmd);
+ n->subtype = AT_AddStatistics;
+ n->def = $2;
+ $$ = (Node *)n;
+ }
| alter_generic_options
{
AlterTableCmd *n = makeNode(AlterTableCmd);
@@ -3381,6 +3396,56 @@ OptConsTableSpace: USING INDEX TABLESPACE name { $$ = $4; }
ExistingIndex: USING INDEX index_name { $$ = $3; }
;
+/*****************************************************************************
+ *
+ * QUERY :
+ * ALTER TABLE relname ADD STATISTICS (options) ON (columns)
+ *
+ *****************************************************************************/
+
+TableStatistics:
+ STATISTICS OptStatsOptions ON '(' columnList ')'
+ {
+ StatisticsDef *n = makeNode(StatisticsDef);
+ n->keys = $5;
+ n->options = $2;
+ $$ = (Node *) n;
+ }
+ ;
+
+OptStatsOptions:
+ '(' stats_options_list ')' { $$ = $2; }
+ | /*EMPTY*/ { $$ = NIL; }
+ ;
+
+stats_options_list:
+ stats_options_elem
+ {
+ $$ = list_make1($1);
+ }
+ | stats_options_list ',' stats_options_elem
+ {
+ $$ = lappend($1, $3);
+ }
+ ;
+
+stats_options_elem:
+ stats_options_name stats_options_arg
+ {
+ $$ = makeDefElem($1, $2);
+ }
+ ;
+
+stats_options_name:
+ NonReservedWord { $$ = $1; }
+ ;
+
+stats_options_arg:
+ opt_boolean_or_string { $$ = (Node *) makeString($1); }
+ | NumericOnly { $$ = (Node *) $1; }
+ | /* EMPTY */ { $$ = NULL; }
+ ;
+
/*****************************************************************************
*
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..eba0352 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,7 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr mvstats resowner sort time
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index bd27168..f61ef7e 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -43,6 +43,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_language.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -499,6 +500,17 @@ static const struct cachedesc cacheinfo[] = {
},
4
},
+ {MvStatisticRelationId, /* MVSTATOID */
+ MvStatisticOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 128
+ },
{NamespaceRelationId, /* NAMESPACENAME */
NamespaceNameIndexId,
1,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
new file mode 100644
index 0000000..099f1ed
--- /dev/null
+++ b/src/backend/utils/mvstats/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/mvstats
+#
+# IDENTIFICATION
+# src/backend/utils/mvstats/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/mvstats
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = common.o dependencies.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
new file mode 100644
index 0000000..36757d5
--- /dev/null
+++ b/src/backend/utils/mvstats/common.c
@@ -0,0 +1,272 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.c
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+/*
+ * Compute requested multivariate stats, using the rows sampled for the
+ * plain (single-column) stats.
+ *
+ * This fetches a list of stats from pg_mv_statistic, computes the stats
+ * and serializes them back into the catalog (as bytea values).
+ */
+void
+build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats)
+{
+ int i;
+ MVStats mvstats;
+ int nmvstats;
+
+ /*
+ * Fetch defined MV groups from pg_mv_statistic, and then compute
+ * the MV statistics (functional dependencies for now).
+ */
+ mvstats = list_mv_stats(RelationGetRelid(onerel), &nmvstats, false);
+
+ for (i = 0; i < nmvstats; i++)
+ {
+ MVDependencies deps = NULL;
+
+ /* int2 vector of attnums the stats should be computed on */
+ int2vector * attrs = mvstats[i].stakeys;
+
+ /* check allowed number of dimensions */
+ Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Analyze functional dependencies of columns.
+ */
+ deps = build_mv_dependencies(numrows, rows, attrs, natts, vacattrstats);
+
+ /* store the functional dependencies in the catalog */
+ update_mv_stats(mvstats[i].mvoid, deps);
+ }
+}
+
+/*
+ * Lookup the VacAttrStats info for the selected columns, with indexes
+ * matching the attrs vector (to make it easy to work with when
+ * computing multivariate stats).
+ */
+VacAttrStats **
+lookup_var_attr_stats(int2vector *attrs, int natts, VacAttrStats **vacattrstats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ VacAttrStats **stats = (VacAttrStats**)palloc0(numattrs * sizeof(VacAttrStats*));
+
+ /* lookup VacAttrStats info for the requested columns (same attnum) */
+ for (i = 0; i < numattrs; i++)
+ {
+ stats[i] = NULL;
+ for (j = 0; j < natts; j++)
+ {
+ if (attrs->values[i] == vacattrstats[j]->tupattnum)
+ {
+ stats[i] = vacattrstats[j];
+ break;
+ }
+ }
+
+ /*
+ * Check that we found the info, that the attnum matches,
+ * and that the requested 'lt' operator is available for
+ * the column's data type.
+ */
+ Assert(stats[i] != NULL);
+ Assert(stats[i]->tupattnum == attrs->values[i]);
+
+ /* FIXME This is a rather ugly way to check for 'ltopr' (which
+ * is defined for 'scalar' attributes).
+ */
+ Assert(((StdAnalyzeData *)stats[i]->extra_data)->ltopr != InvalidOid);
+ }
+
+ return stats;
+}
+
+/*
+ * Fetch list of MV stats defined on a table, without the actual data
+ * for histograms, MCV lists etc.
+ */
+MVStats
+list_mv_stats(Oid relid, int *nstats, bool built_only)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ MVStats result;
+
+ /* start with 16 items, that should be enough for most cases */
+ int maxitems = 16;
+ result = (MVStats)palloc0(sizeof(MVStatsData) * maxitems);
+ *nstats = 0;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ Form_pg_mv_statistic stats = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ /*
+ * Skip statistics that were not computed yet (if only stats
+ * that were already built were requested)
+ */
+ if (built_only && (! stats->deps_built))
+ continue;
+
+ /* double the array size if needed */
+ if (*nstats == maxitems)
+ {
+ maxitems *= 2;
+ result = (MVStats)repalloc(result, sizeof(MVStatsData) * maxitems);
+ }
+
+ result[*nstats].mvoid = HeapTupleGetOid(htup);
+ result[*nstats].stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ result[*nstats].deps_built = stats->deps_built;
+ *nstats += 1;
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO Maybe save the list into relcache, as in RelationGetIndexList
+ * (which was used as inspiration for this one). */
+
+ return result;
+}
+
+void
+update_mv_stats(Oid mvoid, MVDependencies dependencies)
+{
+ HeapTuple stup,
+ oldtup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ bool replaces[Natts_pg_mv_statistic];
+
+ Relation sd = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ memset(nulls, 1, Natts_pg_mv_statistic * sizeof(bool));
+ memset(replaces, 0, Natts_pg_mv_statistic * sizeof(bool));
+ memset(values, 0, Natts_pg_mv_statistic * sizeof(Datum));
+
+ /*
+ * Construct a new pg_mv_statistic tuple - replace only the functional
+ * dependencies, depending on whether they were actually computed.
+ */
+ if (dependencies != NULL)
+ {
+ nulls[Anum_pg_mv_statistic_stadeps -1] = false;
+ values[Anum_pg_mv_statistic_stadeps - 1]
+ = PointerGetDatum(serialize_mv_dependencies(dependencies));
+ }
+
+ /* always replace the value (either by bytea or NULL) */
+ replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* always change the availability flags */
+ nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+
+ replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+
+ values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+
+ /* Is there already a pg_mv_statistic tuple for this attribute? */
+ oldtup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ if (HeapTupleIsValid(oldtup))
+ {
+ /* Yes, replace it */
+ stup = heap_modify_tuple(oldtup,
+ RelationGetDescr(sd),
+ values,
+ nulls,
+ replaces);
+ ReleaseSysCache(oldtup);
+ simple_heap_update(sd, &stup->t_self, stup);
+ }
+ else
+ elog(ERROR, "invalid pg_mv_statistic record (oid=%d)", mvoid);
+
+ /* update indexes too */
+ CatalogUpdateIndexes(sd, stup);
+
+ heap_freetuple(stup);
+
+ heap_close(sd, RowExclusiveLock);
+}
+
+/* multi-variate stats comparator */
+
+/*
+ * qsort_arg comparator for sorting Datums (MV stats)
+ *
+ * This does not maintain the tupnoLink array.
+ */
+int
+compare_scalars_simple(const void *a, const void *b, void *arg)
+{
+ Datum da = *(Datum*)a;
+ Datum db = *(Datum*)b;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting data when partitioning a MV bucket
+ */
+int
+compare_scalars_partition(const void *a, const void *b, void *arg)
+{
+ Datum da = ((ScalarItem*)a)->value;
+ Datum db = ((ScalarItem*)b)->value;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting Datum[] (row of Datums) when
+ * counting distinct values.
+ */
+int
+compare_scalars_memcmp(const void *a, const void *b, void *arg)
+{
+ Size len = *(Size*)arg;
+
+ return memcmp(a, b, len);
+}
+
+int
+compare_scalars_memcmp_2(const void *a, const void *b)
+{
+ return memcmp(a, b, sizeof(Datum));
+}
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
new file mode 100644
index 0000000..f511c4e
--- /dev/null
+++ b/src/backend/utils/mvstats/common.h
@@ -0,0 +1,70 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.h
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/tuptoaster.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_mv_statistic.h"
+#include "foreign/fdwapi.h"
+#include "postmaster/autovacuum.h"
+#include "storage/lmgr.h"
+#include "utils/datum.h"
+#include "utils/sortsupport.h"
+#include "utils/syscache.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "access/sysattr.h"
+
+#include "utils/mvstats.h"
+
+/* FIXME private structure copied from analyze.c */
+
+typedef struct
+{
+ Oid eqopr; /* '=' operator for datatype, if any */
+ Oid eqfunc; /* and associated function */
+ Oid ltopr; /* '<' operator for datatype, if any */
+} StdAnalyzeData;
+
+typedef struct
+{
+ Datum value; /* a data value */
+ int tupno; /* position index for tuple it came from */
+} ScalarItem;
+
+typedef struct
+{
+ int count; /* # of duplicates */
+ int first; /* values[] index of first occurrence */
+} ScalarMCVItem;
+
+typedef struct
+{
+ SortSupport ssup;
+ int *tupnoLink;
+} CompareScalarsContext;
+
+
+VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+/* comparators, used when constructing multivariate stats */
+int compare_scalars_simple(const void *a, const void *b, void *arg);
+int compare_scalars_partition(const void *a, const void *b, void *arg);
+int compare_scalars_memcmp(const void *a, const void *b, void *arg);
+int compare_scalars_memcmp_2(const void *a, const void *b);
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
new file mode 100644
index 0000000..b900efd
--- /dev/null
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -0,0 +1,554 @@
+/*-------------------------------------------------------------------------
+ *
+ * dependencies.c
+ * POSTGRES multivariate functional dependencies
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/dependencies.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+/*
+ * Mine functional dependencies between columns, in the form (A => B),
+ * meaning that a value in column 'A' determines value in 'B'. A simple
+ * artificial example may be a table created like this
+ *
+ * CREATE TABLE deptest (a INT, b INT)
+ * AS SELECT i, i/10 FROM generate_series(1,100000) s(i);
+ *
+ * Clearly, once we know the value for 'A' we can easily determine the
+ * value of 'B' by dividing (A/10). A more practical example may be
+ * addresses, where (ZIP code => city name), i.e. once we know the ZIP,
+ * we probably know which city it belongs to. Larger cities usually have
+ * multiple ZIP codes, so the dependency can't be reversed.
+ *
+ * Functional dependencies are a concept well described in relational
+ * theory, especially in definition of normalization and "normal forms".
+ * Wikipedia has a nice definition of a functional dependency [1]:
+ *
+ * In a given table, an attribute Y is said to have a functional
+ * dependency on a set of attributes X (written X -> Y) if and only
+ * if each X value is associated with precisely one Y value. For
+ * example, in an "Employee" table that includes the attributes
+ * "Employee ID" and "Employee Date of Birth", the functional
+ * dependency {Employee ID} -> {Employee Date of Birth} would hold.
+ * It follows from the previous two sentences that each {Employee ID}
+ * is associated with precisely one {Employee Date of Birth}.
+ *
+ * [1] http://en.wikipedia.org/wiki/Database_normalization
+ *
+ * Most datasets might be normalized not to contain any such functional
+ * dependencies, but sometimes it's not practical. In some cases it's
+ * actually a conscious choice to model the dataset in a denormalized way,
+ * either because of performance or to make querying easier.
+ *
+ * The current implementation supports only dependencies between two
+ * columns, but this is merely a simplification of the initial patch.
+ * It's certainly useful to mine for dependencies involving multiple
+ * columns on the 'left' side, i.e. a condition for the dependency.
+ * That is, dependencies [A,B] => C and so on.
+ *
+ * Handling multiple columns on the right side is not necessary, as such
+ * dependencies may be decomposed into a set of dependencies with
+ * the same meaning, one for each column on the right side. For example
+ *
+ * A => [B,C]
+ *
+ * is exactly the same as
+ *
+ * (A => B) & (A => C).
+ *
+ * Of course, storing (A => [B, C]) may be more efficient than storing
+ * the two dependencies (A => B) and (A => C) separately.
+ *
+ *
+ * Dependency mining (ANALYZE)
+ * ---------------------------
+ *
+ * FIXME Add more details about how build_mv_dependencies() works
+ * (minimum group size, supporting/contradicting etc.).
+ *
+ * Real-world datasets are imperfect - there may be errors (e.g. due to
+ * data-entry mistakes), or factually correct records, yet contradicting
+ * the dependency (e.g. when a city splits into two, but both keep the
+ * same ZIP code). A strict ANALYZE implementation (where the functional
+ * dependencies are identified) would ignore dependencies on such noisy
+ * data, making the approach unusable in practice.
+ *
+ * The proposed implementation attempts to handle such noisy cases
+ * gracefully, by tolerating a small number of contradicting cases.
+ *
+ * In the future this might also perform some sort of test and decide
+ * whether it's worth building any other kind of multivariate stats,
+ * or whether the dependencies sufficiently describe the data. Or at
+ * least not build the MCV list / histogram on the implied columns.
+ * Such reduction would however make the 'verification' (see the next
+ * section) impossible.
+ *
+ *
+ * Clause reduction (planner/optimizer)
+ * ------------------------------------
+ *
+ * FIXME Explain how reduction works.
+ *
+ * The problem with the reduction is that the query may use conditions
+ * that are not redundant, but in fact contradictory - e.g. the user
+ * may search for a ZIP code and a city name not matching the ZIP code.
+ *
+ * In such cases, the condition on the city name is not actually
+ * redundant, but actually contradictory (making the result empty), and
+ * removing it while estimating the cardinality will make the estimate
+ * worse.
+ *
+ * The current estimation assuming independence (and multiplying the
+ * selectivities) works better in this case, but only by utter luck.
+ *
+ * In some cases this might be verified using the other multivariate
+ * statistics - MCV lists and histograms. For MCV lists the verification
+ * might be very simple - peek into the list to see whether any items
+ * match the clause on the 'A' column (e.g. ZIP code), and if such an
+ * item is found, check that the 'B' column matches the other clause.
+ * If it does not, the clauses are contradictory. We can't really say
+ * anything if no such item is found, except maybe restricting the
+ * selectivity using the MCV data (e.g. using min/max selectivity).
+ *
+ * With histograms, it might work similarly - we can't check the values
+ * directly (because histograms use buckets, unlike MCV lists, which
+ * store the actual values). So we can only observe the buckets matching the
+ * clauses - if those buckets have very low frequency, it probably means
+ * the two clauses are incompatible.
+ *
+ * It's unclear what 'low frequency' is, but if one of the clauses is
+ * implied (automatically true because of the other clause), then
+ *
+ * selectivity[clause(A)] = selectivity[clause(A) & clause(B)]
+ *
+ * So we might compute selectivity of the first clause (on the column
+ * A in dependency [A=>B]) - for example using regular statistics.
+ * And then check if the selectivity computed from the histogram is
+ * about the same (or significantly lower).
+ *
+ * The problem is that histograms work well only when the data ordering
+ * matches the natural meaning. For values that serve as labels - like
+ * city names or ZIP codes, or even generated IDs, histograms really
+ * don't work all that well. For example sorting cities by name won't
+ * match the sorting of ZIP codes, rendering the histogram unusable.
+ *
+ * The MCV lists are probably going to work much better, because they don't
+ * really assume any sort of ordering. And it's probably more appropriate
+ * for the label-like data.
+ *
+ * TODO Support dependencies with multiple columns on left/right.
+ *
+ * TODO Investigate using histogram and MCV list to confirm the
+ * functional dependencies.
+ *
+ * TODO Investigate statistical testing of the distribution (to decide
+ * whether it makes sense to build the histogram/MCV list).
+ *
+ * TODO Using a min/max of selectivities would probably make more sense
+ * for the associated columns.
+ *
+ * TODO Consider eliminating the implied columns from the histogram and
+ * MCV lists (but maybe that's not a good idea).
+ *
+ * FIXME Not sure if this handles NULL values properly (not sure how to
+ * do that). We assume that NULL means 0 for now, handling it just
+ * like any other value.
+ */
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats)
+{
+ int i;
+ bool isNull;
+ Size len = 2 * sizeof(Datum); /* only simple associations a => b */
+ int numattrs = attrs->dim1;
+
+ /* result */
+ int ndeps = 0;
+ MVDependencies dependencies = NULL;
+
+ /* TODO Maybe this should be somehow related to the number of
+ * distinct values in the two columns we're currently analyzing.
+ * Assuming a uniform distribution, we can estimate the group
+ * sizes we'd expect to see in the sample, and then use the
+ * average group size as a threshold. That seems better than
+ * a static approach. */
+ int min_group_size = 3;
+
+ /* dimension indexes we'll check for associations [a => b] */
+ int dima, dimb;
+
+ /* info for the interesting attributes only
+ *
+ * TODO Compute this only once and pass it to all the methods
+ * that need it.
+ */
+ VacAttrStats **stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /* We'll reuse the same array for all the combinations */
+ Datum * values = (Datum*)palloc0(numrows * 2 * sizeof(Datum));
+
+ Assert(numattrs >= 2);
+
+ /*
+ * Evaluate all possible combinations of [A => B], using a simple algorithm:
+ *
+ * (a) sort the data by [A,B]
+ * (b) split the data into groups by A (new group whenever a value changes)
+ * (c) count different values in the B column (again, value changes)
+ *
+ * TODO It should be rather simple to merge [A => B] and [A => C] into
+ * [A => B,C]. Just keep A constant, collect all the "implied" columns
+ * and you're done.
+ */
+ for (dima = 0; dima < numattrs; dima++)
+ {
+ for (dimb = 0; dimb < numattrs; dimb++)
+ {
+ Datum val_a, val_b;
+
+ /* number of groups supporting / contradicting the dependency */
+ int n_supporting = 0;
+ int n_contradicting = 0;
+
+ /* counters valid within a group */
+ int group_size = 0;
+ int n_violations = 0;
+
+ int n_supporting_rows = 0;
+ int n_contradicting_rows = 0;
+
+ /* make sure the columns are different (i.e. skip A => A) */
+ if (dima == dimb)
+ continue;
+
+ /* accumulate all the data for both columns into an array and sort it */
+ for (i = 0; i < numrows; i++)
+ {
+ values[i*2] = heap_getattr(rows[i], attrs->values[dima], stats[dima]->tupDesc, &isNull);
+ values[i*2+1] = heap_getattr(rows[i], attrs->values[dimb], stats[dimb]->tupDesc, &isNull);
+ }
+
+ qsort_arg((void *) values, numrows, sizeof(Datum) * 2, compare_scalars_memcmp, &len);
+
+ /*
+ * Walk through the array, split it into groups according to
+ * the A value, and count distinct B values in each group.
+ * If there's a single B value for the whole group, we count
+ * it as supporting the dependency, otherwise we count it
+ * as contradicting.
+ *
+ * Furthermore we require a group to have at least a certain
+ * number of rows to be considered useful for supporting the
+ * dependency. A contradicting group always counts, though.
+ */
+
+ /* start with values from the first row */
+ val_a = values[0];
+ val_b = values[1];
+ group_size = 1;
+
+ for (i = 1; i < numrows; i++)
+ {
+ if (values[2*i] != val_a) /* end of the group */
+ {
+ /*
+ * If there are no contradicting rows, count it as
+ * supporting (otherwise contradicting), but only if
+ * the group is large enough.
+ *
+ * The requirement of a minimum group size makes it
+ * impossible to identify [unique,unique] cases, but
+ * that's probably a different case. This is more
+ * about [zip => city] associations etc.
+ */
+ n_supporting += ((n_violations == 0) && (group_size >= min_group_size)) ? 1 : 0;
+ n_contradicting += (n_violations != 0) ? 1 : 0;
+
+ n_supporting_rows += ((n_violations == 0) && (group_size >= min_group_size)) ? group_size : 0;
+ n_contradicting_rows += (n_violations > 0) ? group_size : 0;
+
+ /* current values start a new group */
+ val_a = values[2*i];
+ val_b = values[2*i+1];
+ n_violations = 0;
+ group_size = 1;
+ }
+ else
+ {
+ if (values[2*i+1] != val_b) /* mismatch of a B value is contradicting */
+ {
+ val_b = values[2*i+1];
+ n_violations += 1;
+ }
+
+ group_size += 1;
+ }
+ }
+
+ /* handle the last group */
+ n_supporting += ((n_violations == 0) && (group_size >= min_group_size)) ? 1 : 0;
+ n_contradicting += (n_violations != 0) ? 1 : 0;
+ n_supporting_rows += ((n_violations == 0) && (group_size >= min_group_size)) ? group_size : 0;
+ n_contradicting_rows += (n_violations > 0) ? group_size : 0;
+
+ /*
+ * See if the number of rows supporting the association is at least
+ * 10x the number of rows violating the hypothetical dependency.
+ *
+ * TODO This is a rather arbitrary limit - I guess it's possible to do
+ * some math to come up with a better rule (e.g. testing a hypothesis
+ * 'this is due to randomness'). We can create a contingency table
+ * from the values and use it for testing. Possibly only when
+ * there are no contradicting rows?
+ *
+ * TODO Also, if (a => b) and (b => a) at the same time, it pretty much
+ * means the columns have the same values (or one is a 'label'),
+ * making the conditions rather redundant. Although it's possible
+ * that the query uses an incompatible combination of values.
+ */
+ if (n_supporting_rows > (n_contradicting_rows * 10))
+ {
+ if (dependencies == NULL)
+ {
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+ dependencies->magic = MVSTAT_DEPS_MAGIC;
+ }
+ else
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData, deps)
+ + sizeof(MVDependency) * (dependencies->ndeps + 1));
+
+ /* append the new dependency */
+ dependencies->deps[ndeps] = (MVDependency)palloc0(sizeof(MVDependencyData));
+ dependencies->deps[ndeps]->a = attrs->values[dima];
+ dependencies->deps[ndeps]->b = attrs->values[dimb];
+
+ dependencies->ndeps = (++ndeps);
+ }
+ }
+ }
+
+ pfree(values);
+
+ return dependencies;
+}
+
+/*
+ * Store the dependencies into a bytea, so that it can be stored in the
+ * pg_mv_statistic catalog.
+ *
+ * Currently this only supports simple two-column rules, and stores them
+ * as a sequence of attnum pairs. In the future, this needs to be made
+ * more complex to support multiple columns on both sides of the
+ * implication (using AND on left, OR on right).
+ */
+bytea *
+serialize_mv_dependencies(MVDependencies dependencies)
+{
+ int i;
+
+ /* we need to store ndeps, and each dependency needs 2 * int16 */
+ Size len = VARHDRSZ + offsetof(MVDependenciesData, deps)
+ + dependencies->ndeps * (sizeof(int16) * 2);
+
+ bytea * output = (bytea*)palloc0(len);
+
+ char * tmp = VARDATA(output);
+
+ SET_VARSIZE(output, len);
+
+ /* first, store the header (the magic number and number of deps) */
+ memcpy(tmp, dependencies, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ /* walk through the dependencies and copy both columns into the bytea */
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ memcpy(tmp, &(dependencies->deps[i]->a), sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(tmp, &(dependencies->deps[i]->b), sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return output;
+}
+
+/*
+ * Reads serialized dependencies into MVDependencies structure.
+ */
+MVDependencies
+deserialize_mv_dependencies(bytea * data)
+{
+ int i;
+ Size expected_size;
+ MVDependencies dependencies;
+ char *tmp;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVDependenciesData,deps))
+ elog(ERROR, "invalid MVDependencies size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVDependenciesData,deps));
+
+ /* read the MVDependencies header */
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(dependencies, tmp, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ if (dependencies->magic != MVSTAT_DEPS_MAGIC)
+ {
+ pfree(dependencies);
+ elog(WARNING, "not a MV Dependencies (magic number mismatch)");
+ return NULL;
+ }
+
+ Assert(dependencies->ndeps > 0);
+
+ /* what bytea size do we expect for those parameters */
+ expected_size = offsetof(MVDependenciesData,deps) +
+ dependencies->ndeps * sizeof(int16) * 2;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid dependencies size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* allocate space for the dependencies */
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData,deps)
+ + (dependencies->ndeps * sizeof(MVDependency)));
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ dependencies->deps[i] = (MVDependency)palloc0(sizeof(MVDependencyData));
+
+ memcpy(&(dependencies->deps[i]->a), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(&(dependencies->deps[i]->b), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return dependencies;
+}
+
+/* print some basic info about dependencies (number of dependencies) */
+Datum
+pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ result = palloc0(128);
+ snprintf(result, 128, "dependencies=%d", dependencies->ndeps);
+
+ /* FIXME free the deserialized data (pfree is not enough) */
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* print the dependencies
+ *
+ * TODO Would be nice if this knew the actual column names (instead of
+ * the attnums).
+ *
+ * FIXME This is really ugly and does not really check the lengths and
+ * strcpy/snprintf return values properly. Needs to be fixed.
+ */
+Datum
+pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
+{
+ int i = 0;
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result = NULL;
+ int len = 0;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ MVDependency dependency = dependencies->deps[i];
+ char buffer[128];
+
+ int tmp = snprintf(buffer, 128, "%s%d => %d",
+ ((i == 0) ? "" : ", "), dependency->a, dependency->b);
+
+ if (tmp < 127)
+ {
+ if (result == NULL)
+ result = palloc0(len + tmp + 1);
+ else
+ result = repalloc(result, len + tmp + 1);
+
+ strcpy(result + len, buffer);
+ len += tmp;
+ }
+ }
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+bytea *
+fetch_mv_dependencies(Oid mvoid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ bytea *stadeps = NULL;
+
+ /* Prepare to scan pg_mv_statistic for the entry with this OID. */
+ ScanKeyInit(&skey,
+ ObjectIdAttributeNumber,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(mvoid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticOidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ bool isnull = false;
+ Datum deps = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stadeps, &isnull);
+
+ Assert(!isnull);
+
+ stadeps = DatumGetByteaP(deps);
+
+ break;
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO Maybe save the list into relcache, as in RelationGetIndexList
+ * (which was used as inspiration for this one). */
+
+ return stadeps;
+}
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index a680229..f69eb7c 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -173,6 +173,11 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
#define LargeObjectMetadataOidIndexId 2996
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_oid_index, 3286, on pg_mv_statistic using btree(oid oid_ops));
+#define MvStatisticOidIndexId 3286
+DECLARE_INDEX(pg_mv_statistic_relid_index, 3287, on pg_mv_statistic using btree(starelid oid_ops));
+#define MvStatisticRelidIndexId 3287
+
DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
#define NamespaceNameIndexId 2684
DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
new file mode 100644
index 0000000..76b7db7
--- /dev/null
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -0,0 +1,69 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_mv_statistic.h
+ * definition of the system "multivariate statistic" relation (pg_mv_statistic)
+ * along with the relation's initial contents.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_mv_statistic.h
+ *
+ * NOTES
+ * the genbki.pl script reads this file and generates .bki
+ * information from the DATA() statements.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_MV_STATISTIC_H
+#define PG_MV_STATISTIC_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_mv_statistic definition. cpp turns this into
+ * typedef struct FormData_pg_mv_statistic
+ * ----------------
+ */
+#define MvStatisticRelationId 3281
+
+CATALOG(pg_mv_statistic,3281)
+{
+ /* These fields form the unique key for the entry: */
+ Oid starelid; /* relation containing attributes */
+
+ /* statistics requested to build */
+ bool deps_enabled; /* analyze dependencies? */
+
+ /* statistics that are available (if requested) */
+ bool deps_built; /* dependencies were built */
+
+ /* variable-length fields start here, but we allow direct access to stakeys */
+ int2vector stakeys; /* array of column keys */
+
+#ifdef CATALOG_VARLEN
+ bytea stadeps; /* dependencies (serialized) */
+#endif
+
+} FormData_pg_mv_statistic;
+
+/* ----------------
+ * Form_pg_mv_statistic corresponds to a pointer to a tuple with
+ * the format of pg_mv_statistic relation.
+ * ----------------
+ */
+typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
+
+/* ----------------
+ * compiler constants for pg_mv_statistic
+ * ----------------
+ */
+#define Natts_pg_mv_statistic 5
+#define Anum_pg_mv_statistic_starelid 1
+#define Anum_pg_mv_statistic_deps_enabled 2
+#define Anum_pg_mv_statistic_deps_built 3
+#define Anum_pg_mv_statistic_stakeys 4
+#define Anum_pg_mv_statistic_stadeps 5
+
+#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 6a757f3..4b7ae1f 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2693,6 +2693,11 @@ DESCR("current user privilege on any column by rel name");
DATA(insert OID = 3029 ( has_any_column_privilege PGNSP PGUID 12 10 0 0 0 f f f f t f s 2 0 16 "26 25" _null_ _null_ _null_ _null_ has_any_column_privilege_id _null_ _null_ _null_ ));
DESCR("current user privilege on any column by rel oid");
+DATA(insert OID = 3284 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies info");
+DATA(insert OID = 3285 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies show");
+
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
DATA(insert OID = 1929 ( pg_stat_get_tuples_returned PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_tuples_returned _null_ _null_ _null_ ));
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index cba4ae7..45d3b5a 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -49,6 +49,7 @@ extern void BootstrapToastTable(char *relName,
DECLARE_TOAST(pg_attrdef, 2830, 2831);
DECLARE_TOAST(pg_constraint, 2832, 2833);
DECLARE_TOAST(pg_description, 2834, 2835);
+DECLARE_TOAST(pg_mv_statistic, 3288, 3289);
DECLARE_TOAST(pg_proc, 2836, 2837);
DECLARE_TOAST(pg_rewrite, 2838, 2839);
DECLARE_TOAST(pg_seclabel, 3598, 3599);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 38469ef..3a0e7c4 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -414,6 +414,7 @@ typedef enum NodeTag
T_WithClause,
T_CommonTableExpr,
T_RoleSpec,
+ T_StatisticsDef,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index ec0d0ea..b256162 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -570,6 +570,14 @@ typedef struct ColumnDef
int location; /* parse location, or -1 if none/unknown */
} ColumnDef;
+typedef struct StatisticsDef
+{
+ NodeTag type;
+ List *keys; /* String nodes naming referenced column(s) */
+ List *options; /* list of DefElem nodes */
+} StatisticsDef;
+
+
/*
* TableLikeClause - CREATE TABLE ( ... LIKE ... ) clause
*/
@@ -1362,7 +1370,8 @@ typedef enum AlterTableType
AT_ReplicaIdentity, /* REPLICA IDENTITY */
AT_EnableRowSecurity, /* ENABLE ROW SECURITY */
AT_DisableRowSecurity, /* DISABLE ROW SECURITY */
- AT_GenericOptions /* OPTIONS (...) */
+ AT_GenericOptions, /* OPTIONS (...) */
+ AT_AddStatistics /* add statistics */
} AlterTableType;
typedef struct ReplicaIdentityStmt
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
new file mode 100644
index 0000000..2b59c2d
--- /dev/null
+++ b/src/include/utils/mvstats.h
@@ -0,0 +1,86 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvstats.h
+ * Multivariate statistics and selectivity estimation functions.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/mvstats.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MVSTATS_H
+#define MVSTATS_H
+
+#include "commands/vacuum.h"
+
+/*
+ * Basic info about the stats, used when choosing what to use
+ *
+ * TODO Add info about what statistics are available (histogram, MCV,
+ * hashed MCV, functional dependencies).
+ */
+typedef struct MVStatsData {
+ Oid mvoid; /* OID of the stats in pg_mv_statistic */
+ int2vector *stakeys; /* attnums for columns in the stats */
+ bool deps_built; /* functional dependencies available */
+} MVStatsData;
+
+typedef struct MVStatsData *MVStats;
+
+
+#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
+
+/* A functional dependency, tracking the [a => b] dependency.
+ *
+ * TODO Make this work with multiple columns on both sides.
+ */
+typedef struct MVDependencyData {
+ int16 a;
+ int16 b;
+} MVDependencyData;
+
+typedef MVDependencyData* MVDependency;
+
+typedef struct MVDependenciesData {
+ uint32 magic; /* magic constant marker */
+ int32 ndeps; /* number of dependencies */
+ MVDependency deps[1]; /* XXX why not a pointer? */
+} MVDependenciesData;
+
+typedef MVDependenciesData* MVDependencies;
+
+#define MVSTAT_DEPS_MAGIC 0xB4549A2C /* marks serialized bytea */
+#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
+
+/*
+ * TODO Maybe fetching the histogram/MCV list separately is inefficient?
+ * Consider adding a single `fetch_stats` method, fetching all
+ * stats specified using flags (or something like that).
+ */
+MVStats list_mv_stats(Oid relid, int *nstats, bool built_only);
+
+bytea * fetch_mv_dependencies(Oid mvoid);
+
+bytea * serialize_mv_dependencies(MVDependencies dependencies);
+
+/* deserialization of stats (serialization is private to analyze) */
+MVDependencies deserialize_mv_dependencies(bytea * data);
+
+/* FIXME this probably belongs somewhere else (not to operations stats) */
+extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats);
+
+void update_mv_stats(Oid mvoid, MVDependencies dependencies);
+
+#endif
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index ba0b090..12147ab 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -66,6 +66,7 @@ enum SysCacheIdentifier
INDEXRELID,
LANGNAME,
LANGOID,
+ MVSTATOID,
NAMESPACENAME,
NAMESPACEOID,
OPERNAMENSP,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 1788270..f0117ca 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1353,6 +1353,14 @@ pg_matviews| SELECT n.nspname AS schemaname,
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
WHERE (c.relkind = 'm'::"char");
+pg_mv_stats| SELECT n.nspname AS schemaname,
+ c.relname AS tablename,
+ s.stakeys AS attnums,
+ length(s.stadeps) AS depsbytes,
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ FROM ((pg_mv_statistic s
+ JOIN pg_class c ON ((c.oid = s.starelid)))
+ LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
pg_policies| SELECT n.nspname AS schemaname,
c.relname AS tablename,
pol.polname AS policyname,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index c7be273..00f5fe7 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -113,6 +113,7 @@ pg_inherits|t
pg_language|t
pg_largeobject|t
pg_largeobject_metadata|t
+pg_mv_statistic|t
pg_namespace|t
pg_opclass|t
pg_operator|t
--
2.1.0.GIT
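To make the new syntax concrete, here is a short usage sketch against the patch as posted (the table and column names are made up, and the exact output will of course vary):
CREATE TABLE zips (zip INT, city INT, state INT);
INSERT INTO zips SELECT i/10, i/100, i/1000
FROM generate_series(1,100000) s(i);
-- two separate statistics on the same table, each useful for different queries
ALTER TABLE zips ADD STATISTICS (dependencies true) ON (zip, city);
ALTER TABLE zips ADD STATISTICS (dependencies true) ON (city, state);
ANALYZE zips;
-- basic info, using the pg_mv_stats view added by the patch
SELECT tablename, attnums, depsinfo FROM pg_mv_stats;
-- print the discovered dependencies (as pairs of attnums)
SELECT pg_mv_stats_dependencies_show(stadeps)
FROM pg_mv_statistic WHERE starelid = 'zips'::regclass;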
Hello,
On 20.3.2015 09:33, Kyotaro HORIGUCHI wrote:
Hello,
Patch 0001 needs changes for OIDs since my patch was
committed. The attached is compatible with current master.
Thanks. I plan to submit a new version of the patch in a few days, with
significant progress in various directions. I'll have to rebase to
current master before submitting the new version anyway (which includes
fixing duplicate OIDs).
And I tried this like this, and got the following error on
analyze. But unfortunately I don't have enough time to
investigate it now.
postgres=# create table t1 (a int, b int, c int);
insert into t1 (select a/ 10000, a / 10000, a / 10000 from
generate_series(0, 99999) a);
postgres=# analyze t1;
ERROR: invalid memory alloc request size 1485176862
Interesting - particularly because this does not involve any
multivariate stats. I can't reproduce it with the current version of the
patch, so either it's unrelated, or I've fixed it since posting the last
version.
regards
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello,
Sorry, not shown above: the *previous* t1 had "alter
table t1 add statistics (a, b, c)" applied to it. Dropping t1 didn't
remove that setting; re-initializing the cluster let me run the
ANALYZE without error.
The complete steps were as follows.
===
create table t1 (a int, b int, c int);
alter table t1 add statistics (histogram) on (a, b, c);
drop table t1; -- This does not remove the above setting.
create table t1 (a int, b int, c int);
insert into t1 (select a / 10000, a / 10000, a / 10000 from generate_series(0, 99999) a);
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello,
On 03/24/15 06:34, Kyotaro HORIGUCHI wrote:
Sorry, not shown above: the *previous* t1 had "alter table
t1 add statistics (a, b, c)" applied to it. Dropping t1 didn't remove
that setting; re-initializing the cluster let me run the ANALYZE
without error.
OK, thanks. My guess is this issue has already been fixed in my working
copy, but I will double-check that.
Admittedly, the management of the stats (e.g. removing stats when the
table is dropped) is one of the incomplete parts. You have to delete the
rows manually from pg_mv_statistic.
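Until that's implemented, a manual cleanup of entries referencing
dropped tables might look like this (just a sketch - deleting from
catalogs directly requires superuser privileges):

DELETE FROM pg_mv_statistic
 WHERE starelid NOT IN (SELECT oid FROM pg_class);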
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello,
attached is a new version of the patch series. Aside from fixing various
issues (crashes, memory leaks), the patches are rebased to current
master, and I also attach a few SQL scripts I used for testing (nothing
fancy, just stress-testing all the parts the patch touches).
The main changes in the patches (each requiring plenty of changes in
other parts) are these:
(1) combining multiple statistics on a table
--------------------------------------------
In the previous version of the patch, it was only possible to use a
single statistics on a table - when there was a statistics "covering"
all the conditions, it worked fine, but that's not always the case.
The new patch is able to combine multiple statistics by decomposing the
probability (=selectivity) into conditional probabilities. Imagine
estimating selectivity of clauses
WHERE (a=1) AND (b=1) AND (c=1) AND (d=1)
with statistics on [a,b,c] and [b,c,d]. The selectivity may be split for
example like this:
P(a=1,b=1,c=1,d=1) = P(a=1,b=1,c=1) * P(d=1|a=1,b=1,c=1)
where P(a=1,b=1,c=1) may be estimated using statistics [a,b,c], and the
second may be simplified like this:
P(d=1|a=1,b=1,c=1) = P(d=1|b=1,c=1)
using the assumption "no multivariate stats => independent". Both these
probabilities match the existing statistics.
The idea is described in a bit more detail in part #5 of the patch.
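For illustration, a setup exercising this combination might look like
this (a sketch using the syntax from this patch; the table and column
names are made up):

CREATE TABLE t (a INT, b INT, c INT, d INT);

-- two overlapping statistics, neither covering all four columns
ALTER TABLE t ADD STATISTICS ON (a, b, c);
ALTER TABLE t ADD STATISTICS ON (b, c, d);
ANALYZE t;

-- may now be estimated by combining both statistics as above
SELECT * FROM t WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1);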
(2) choosing the best combination of statistics
-----------------------------------------------
There may be multiple statistics on a table, and multiple possible ways
to use them to estimate the clauses (different ordering, overlapping
statistics, etc.).
The patch formulates this as an optimization task with two goals.
(a) cover as many clauses as possible
(b) reuse as many conditions (i.e. dependencies) as possible
and implements two algorithms to solve this: (a) exhaustive, walking
through all possible states (using dynamic programming), and (b) greedy,
choosing the best local solution in each step.
The time requirements of the exhaustive solution grow pretty quickly
with the number of clauses and statistics on a table (~O(N!)). The
greedy algorithm is much faster, as it's ~O(N), and in fact much more
time is spent actually processing the selected statistics (walking
through the histograms etc.).
I assume the exhaustive search may find a better solution in some cases
(that the greedy algorithm misses), but so far I've been unable to come
up with such an example.
To make this easier to test, I've added a GUC to switch between these
algorithms (set to 'greedy' by default):
mvstat_search = {'greedy', 'exhaustive'}
I assume this GUC will be removed eventually, after we figure out which
algorithm is the right one.
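Comparing the two is then just a matter of flipping the GUC before
running EXPLAIN (a sketch, reusing the made-up table from above):

SET mvstat_search = 'exhaustive';
EXPLAIN SELECT * FROM t WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1);

SET mvstat_search = 'greedy';
EXPLAIN SELECT * FROM t WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1);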
(3) estimation of more complex conditions (AND/OR clauses)
----------------------------------------------------------
I've added the ability to estimate more complex clauses - combinations of
AND/OR clauses and such. It's somewhat incomplete at the moment, but
hopefully the ideas will be clear from the TODOs/FIXMEs along the way.
Let me know if you have any questions about this version of the patch,
or about the ideas it implements in general.
I also welcome real-world examples of poorly estimated queries, so that
I can test if these patches improve that particular situation.
regards
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-shared-infrastructure-and-functional-dependencies.patch (text/x-diff)
From 7c8f0ce0017beea314219c24146cbb64d0d37a3d Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 19:51:48 +0100
Subject: [PATCH 1/5] shared infrastructure and functional dependencies
Basic infrastructure shared by all kinds of multivariate
stats, most importantly:
- adds a new system catalog (pg_mv_statistic)
- ALTER TABLE ... ADD STATISTICS syntax
- implementation of functional dependencies (the simplest
type of multivariate statistics)
- building functional dependencies in ANALYZE
- updates regression tests (new catalog etc.)
This does not include any changes to the optimizer, i.e.
it does not influence the query planning (subject to
follow-up patches).
The current implementation requires a valid 'ltopr' for
the columns, so that we can sort the sample rows in various
ways, both in this patch and other kinds of statistics.
Maybe this restriction could be relaxed in the future,
requiring just 'eqopr' in case of stats not sorting the
data (e.g. functional dependencies and MCV lists).
The algorithm detecting the dependencies is rather simple
and probably needs improvements.
The name 'functional dependencies' is more correct (than
'association rules') as it's exactly the name used in
relational theory (esp. Normal Forms) for tracking
column-level dependencies.
---
src/backend/catalog/Makefile | 1 +
src/backend/catalog/system_views.sql | 10 +
src/backend/commands/analyze.c | 20 +-
src/backend/commands/tablecmds.c | 149 ++++++-
src/backend/nodes/copyfuncs.c | 14 +
src/backend/parser/gram.y | 67 ++-
src/backend/utils/Makefile | 2 +-
src/backend/utils/cache/syscache.c | 12 +
src/backend/utils/mvstats/Makefile | 17 +
src/backend/utils/mvstats/common.c | 342 +++++++++++++++
src/backend/utils/mvstats/common.h | 75 ++++
src/backend/utils/mvstats/dependencies.c | 680 +++++++++++++++++++++++++++++
src/include/catalog/indexing.h | 5 +
src/include/catalog/pg_mv_statistic.h | 69 +++
src/include/catalog/pg_proc.h | 5 +
src/include/catalog/toasting.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/parsenodes.h | 11 +-
src/include/utils/mvstats.h | 86 ++++
src/include/utils/syscache.h | 1 +
src/test/regress/expected/rules.out | 8 +
src/test/regress/expected/sanity_check.out | 1 +
22 files changed, 1569 insertions(+), 8 deletions(-)
create mode 100644 src/backend/utils/mvstats/Makefile
create mode 100644 src/backend/utils/mvstats/common.c
create mode 100644 src/backend/utils/mvstats/common.h
create mode 100644 src/backend/utils/mvstats/dependencies.c
create mode 100644 src/include/catalog/pg_mv_statistic.h
create mode 100644 src/include/utils/mvstats.h
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index a403c64..d6c16f8 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -32,6 +32,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
+ pg_mv_statistic.h \
pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2800f73..d05a716 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -150,6 +150,16 @@ CREATE VIEW pg_indexes AS
LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
+CREATE VIEW pg_mv_stats AS
+ SELECT
+ N.nspname AS schemaname,
+ C.relname AS tablename,
+ S.stakeys AS attnums,
+ length(S.stadeps) as depsbytes,
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
+ LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
+
CREATE VIEW pg_stats AS
SELECT
nspname AS schemaname,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d4d1914..f82fcf5 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -27,6 +27,7 @@
#include "catalog/indexing.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "commands/dbcommands.h"
#include "commands/tablecmds.h"
@@ -54,7 +55,11 @@
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "utils/mvstats.h"
+#include "access/sysattr.h"
/* Data structure for Algorithm S from Knuth 3.4.2 */
typedef struct
@@ -110,7 +115,6 @@ static void update_attstats(Oid relid, bool inh,
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
-
/*
* analyze_rel() -- analyze one relation
*/
@@ -471,6 +475,17 @@ do_analyze_rel(Relation onerel, int options, List *va_cols,
* all analyzable columns. We use a lower bound of 100 rows to avoid
* possible overflow in Vitter's algorithm. (Note: that will also be the
* target in the corner case where there are no analyzable columns.)
+ *
+ * FIXME This sample sizing is mostly OK when computing stats for
+ * individual columns, but when computing multivariate stats
+ * (histograms, MCV lists, ...) it's rather
+ * insufficient. For stats on multiple columns / complex stats
+ * we need larger sample sizes, and in some cases samples
+ * proportional to the table (say, 0.5% - 1%) instead of a
+ * fixed size might be more appropriate. Also, this should be
+ * bound to the requested statistics size - e.g. number of MCV
+ * items or histogram buckets should require several sample
+ * rows per item/bucket (so the sample should be k*size).
*/
targrows = 100;
for (i = 0; i < attr_cnt; i++)
@@ -573,6 +588,9 @@ do_analyze_rel(Relation onerel, int options, List *va_cols,
update_attstats(RelationGetRelid(Irel[ind]), false,
thisdata->attr_cnt, thisdata->vacattrstats);
}
+
+ /* Build multivariate stats (if there are any). */
+ build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
}
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 002319e..a321755 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -35,6 +35,7 @@
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_tablespace.h"
@@ -92,7 +93,7 @@
#include "utils/syscache.h"
#include "utils/tqual.h"
#include "utils/typcache.h"
-
+#include "utils/mvstats.h"
/*
* ON COMMIT action list
@@ -140,8 +141,9 @@ static List *on_commits = NIL;
#define AT_PASS_ADD_COL 5 /* ADD COLUMN */
#define AT_PASS_ADD_INDEX 6 /* ADD indexes */
#define AT_PASS_ADD_CONSTR 7 /* ADD constraints, defaults */
-#define AT_PASS_MISC 8 /* other stuff */
-#define AT_NUM_PASSES 9
+#define AT_PASS_ADD_STATS 8 /* ADD statistics */
+#define AT_PASS_MISC 9 /* other stuff */
+#define AT_NUM_PASSES 10
typedef struct AlteredTableInfo
{
@@ -416,7 +418,8 @@ static void ATExecReplicaIdentity(Relation rel, ReplicaIdentityStmt *stmt, LOCKM
static void ATExecGenericOptions(Relation rel, List *options);
static void ATExecEnableRowSecurity(Relation rel);
static void ATExecDisableRowSecurity(Relation rel);
-
+static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
+ StatisticsDef *def, LOCKMODE lockmode);
static void copy_relation_data(SMgrRelation rel, SMgrRelation dst,
ForkNumber forkNum, char relpersistence);
static const char *storage_name(char c);
@@ -2999,6 +3002,7 @@ AlterTableGetLockLevel(List *cmds)
* updates.
*/
case AT_SetStatistics: /* Uses MVCC in getTableAttrs() */
+ case AT_AddStatistics: /* XXX not sure if the right level */
case AT_ClusterOn: /* Uses MVCC in getIndexes() */
case AT_DropCluster: /* Uses MVCC in getIndexes() */
case AT_SetOptions: /* Uses MVCC in getTableAttrs() */
@@ -3155,6 +3159,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
pass = AT_PASS_ADD_CONSTR;
break;
case AT_SetStatistics: /* ALTER COLUMN SET STATISTICS */
+ case AT_AddStatistics: /* XXX maybe not the right place */
ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode);
/* Performs own permission checks */
ATPrepSetStatistics(rel, cmd->name, cmd->def, lockmode);
@@ -3457,6 +3462,9 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
case AT_SetStatistics: /* ALTER COLUMN SET STATISTICS */
address = ATExecSetStatistics(rel, cmd->name, cmd->def, lockmode);
break;
+ case AT_AddStatistics: /* ADD STATISTICS */
+ ATExecAddStatistics(tab, rel, (StatisticsDef *) cmd->def, lockmode);
+ break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
address = ATExecSetOptions(rel, cmd->name, cmd->def, false, lockmode);
break;
@@ -11854,3 +11862,136 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,
ReleaseSysCache(tuple);
}
+
+/* used for sorting the attnums in ATExecAddStatistics */
+static int compare_int16(const void *a, const void *b)
+{
+ return memcmp(a, b, sizeof(int16));
+}
+
+/*
+ * Implements the ALTER TABLE ... ADD STATISTICS (options) ON (columns).
+ *
+ * The code is an unholy mix of pieces that really belong to other parts
+ * of the source tree.
+ *
+ * FIXME Check that the types are pass-by-value and support sort,
+ * although maybe we can live without the sort (and only build
+ * MCV list / association rules).
+ *
+ * FIXME This should probably check for duplicate stats (i.e. same
+ * keys, same options). Although maybe it's useful to have
+ * multiple stats on the same columns with different options
+ * (say, a detailed MCV-only stats for some queries, histogram
+ * for others, etc.)
+ */
+static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
+ StatisticsDef *def, LOCKMODE lockmode)
+{
+ int i, j;
+ ListCell *l;
+ int16 attnums[INDEX_MAX_KEYS];
+ int numcols = 0;
+
+ HeapTuple htup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ int2vector *stakeys;
+ Relation mvstatrel;
+
+ /* by default build everything */
+ bool build_dependencies = true;
+
+ Assert(IsA(def, StatisticsDef));
+
+ /* transform the column names to attnum values */
+
+ foreach(l, def->keys)
+ {
+ char *attname = strVal(lfirst(l));
+ HeapTuple atttuple;
+
+ atttuple = SearchSysCacheAttName(RelationGetRelid(rel), attname);
+
+ if (!HeapTupleIsValid(atttuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" referenced in statistics does not exist",
+ attname)));
+
+ /* more than MVSTATS_MAX_DIMENSIONS columns not allowed */
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("cannot have more than %d keys in a statistics",
+ MVSTATS_MAX_DIMENSIONS)));
+
+ attnums[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->attnum;
+ ReleaseSysCache(atttuple);
+ numcols++;
+ }
+
+ /*
+ * Check the lower bound (at least 2 columns), the upper bound was
+ * already checked in the loop.
+ */
+ if (numcols < 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("multivariate stats require 2 or more columns")));
+
+ /* look for duplicate columns */
+ for (i = 0; i < numcols; i++)
+ for (j = 0; j < numcols; j++)
+ if ((i != j) && (attnums[i] == attnums[j]))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("duplicate column name in statistics definition")));
+
+ /* parse the statistics options */
+ foreach (l, def->options)
+ {
+ DefElem *opt = (DefElem*)lfirst(l);
+
+ if (strcmp(opt->defname, "dependencies") == 0)
+ build_dependencies = defGetBoolean(opt);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized STATISTICS option \"%s\"",
+ opt->defname)));
+ }
+
+ /* sort the attnums and build int2vector */
+ qsort(attnums, numcols, sizeof(int16), compare_int16);
+ stakeys = buildint2vector(attnums, numcols);
+
+ /*
+ * Okay, let's create the pg_mv_statistic entry.
+ */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ /* no stats collected yet, so just the keys */
+ values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
+
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+ values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+
+ nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* insert the tuple into pg_mv_statistic */
+ mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ htup = heap_form_tuple(mvstatrel->rd_att, values, nulls);
+
+ simple_heap_insert(mvstatrel, htup);
+
+ CatalogUpdateIndexes(mvstatrel, htup);
+
+ heap_freetuple(htup);
+
+ heap_close(mvstatrel, RowExclusiveLock);
+
+ return;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 029761e..a4ce2c9 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3918,6 +3918,17 @@ _copyAlterPolicyStmt(const AlterPolicyStmt *from)
return newnode;
}
+static StatisticsDef *
+_copyStatisticsDef(const StatisticsDef *from)
+{
+ StatisticsDef *newnode = makeNode(StatisticsDef);
+
+ COPY_NODE_FIELD(keys);
+ COPY_NODE_FIELD(options);
+
+ return newnode;
+}
+
/* ****************************************************************
* pg_list.h copy functions
* ****************************************************************
@@ -4732,6 +4743,9 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_StatisticsDef:
+ retval = _copyStatisticsDef(from);
+ break;
case T_FuncWithArgs:
retval = _copyFuncWithArgs(from);
break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 3aa9e42..17183ef 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -367,6 +367,13 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
create_generic_options alter_generic_options
relation_expr_list dostmt_opt_list
+%type <list> OptStatsOptions
+%type <str> stats_options_name
+%type <node> stats_options_arg
+%type <defelt> stats_options_elem
+%type <list> stats_options_list
+
+
%type <list> opt_fdw_options fdw_options
%type <defelt> fdw_option
@@ -486,7 +493,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <keyword> unreserved_keyword type_func_name_keyword
%type <keyword> col_name_keyword reserved_keyword
-%type <node> TableConstraint TableLikeClause
+%type <node> TableConstraint TableLikeClause TableStatistics
%type <ival> TableLikeOptionList TableLikeOption
%type <list> ColQualList
%type <node> ColConstraint ColConstraintElem ConstraintAttr
@@ -2311,6 +2318,14 @@ alter_table_cmd:
n->subtype = AT_DisableRowSecurity;
$$ = (Node *)n;
}
+ /* ALTER TABLE <name> ADD STATISTICS (options) ON (columns) ... */
+ | ADD_P TableStatistics
+ {
+ AlterTableCmd *n = makeNode(AlterTableCmd);
+ n->subtype = AT_AddStatistics;
+ n->def = $2;
+ $$ = (Node *)n;
+ }
| alter_generic_options
{
AlterTableCmd *n = makeNode(AlterTableCmd);
@@ -3385,6 +3400,56 @@ OptConsTableSpace: USING INDEX TABLESPACE name { $$ = $4; }
ExistingIndex: USING INDEX index_name { $$ = $3; }
;
+/*****************************************************************************
+ *
+ * QUERY :
+ * ALTER TABLE relname ADD STATISTICS (options) ON (columns)
+ *
+ *****************************************************************************/
+
+TableStatistics:
+ STATISTICS OptStatsOptions ON '(' columnList ')'
+ {
+ StatisticsDef *n = makeNode(StatisticsDef);
+ n->keys = $5;
+ n->options = $2;
+ $$ = (Node *) n;
+ }
+ ;
+
+OptStatsOptions:
+ '(' stats_options_list ')' { $$ = $2; }
+ | /*EMPTY*/ { $$ = NIL; }
+ ;
+
+stats_options_list:
+ stats_options_elem
+ {
+ $$ = list_make1($1);
+ }
+ | stats_options_list ',' stats_options_elem
+ {
+ $$ = lappend($1, $3);
+ }
+ ;
+
+stats_options_elem:
+ stats_options_name stats_options_arg
+ {
+ $$ = makeDefElem($1, $2);
+ }
+ ;
+
+stats_options_name:
+ NonReservedWord { $$ = $1; }
+ ;
+
+stats_options_arg:
+ opt_boolean_or_string { $$ = (Node *) makeString($1); }
+ | NumericOnly { $$ = (Node *) $1; }
+ | /* EMPTY */ { $$ = NULL; }
+ ;
+
/*****************************************************************************
*
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..eba0352 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,7 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr mvstats resowner sort time
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index bd27168..f61ef7e 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -43,6 +43,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_language.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -499,6 +500,17 @@ static const struct cachedesc cacheinfo[] = {
},
4
},
+ {MvStatisticRelationId, /* MVSTATOID */
+ MvStatisticOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 128
+ },
{NamespaceRelationId, /* NAMESPACENAME */
NamespaceNameIndexId,
1,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
new file mode 100644
index 0000000..099f1ed
--- /dev/null
+++ b/src/backend/utils/mvstats/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/mvstats
+#
+# IDENTIFICATION
+# src/backend/utils/mvstats/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/mvstats
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = common.o dependencies.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
new file mode 100644
index 0000000..8efc5ba
--- /dev/null
+++ b/src/backend/utils/mvstats/common.c
@@ -0,0 +1,342 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.c
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+
+/*
+ * Compute requested multivariate stats, using the rows sampled for the
+ * plain (single-column) stats.
+ *
+ * This fetches a list of stats from pg_mv_statistic, computes the stats
+ * and serializes them back into the catalog (as bytea values).
+ */
+void
+build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats)
+{
+ int i;
+ MVStats mvstats;
+ int nmvstats;
+
+ /*
+ * Fetch defined MV groups from pg_mv_statistic, and then compute
+ * the MV statistics (functional dependencies for now).
+ */
+ mvstats = list_mv_stats(RelationGetRelid(onerel), &nmvstats, false);
+
+ for (i = 0; i < nmvstats; i++)
+ {
+ MVDependencies deps = NULL;
+
+ /* int2 vector of attnums the stats should be computed on */
+ int2vector * attrs = mvstats[i].stakeys;
+
+ /* filter only the interesting vacattrstats records */
+ VacAttrStats **stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /* check allowed number of dimensions */
+ Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Analyze functional dependencies of columns.
+ */
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
+
+ /* store the dependencies in the catalog */
+ update_mv_stats(mvstats[i].mvoid, deps);
+ }
+}
+
+/*
+ * Lookup the VacAttrStats info for the selected columns, with indexes
+ * matching the attrs vector (to make it easy to work with when
+ * computing multivariate stats).
+ */
+static VacAttrStats **
+lookup_var_attr_stats(int2vector *attrs, int natts, VacAttrStats **vacattrstats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ VacAttrStats **stats = (VacAttrStats**)palloc0(numattrs * sizeof(VacAttrStats*));
+
+ /* lookup VacAttrStats info for the requested columns (same attnum) */
+ for (i = 0; i < numattrs; i++)
+ {
+ stats[i] = NULL;
+ for (j = 0; j < natts; j++)
+ {
+ if (attrs->values[i] == vacattrstats[j]->tupattnum)
+ {
+ stats[i] = vacattrstats[j];
+ break;
+ }
+ }
+
+ /*
+ * Check that we found the info, that the attnum matches,
+ * and that the requested 'lt' operator is available for
+ * the column's data type.
+ */
+ Assert(stats[i] != NULL);
+ Assert(stats[i]->tupattnum == attrs->values[i]);
+
+ /* FIXME This is a rather ugly way to check for 'ltopr' (which
+ * is defined for 'scalar' attributes).
+ */
+ Assert(((StdAnalyzeData *)stats[i]->extra_data)->ltopr != InvalidOid);
+ }
+
+ return stats;
+}
+
+/*
+ * Fetch list of MV stats defined on a table, without the actual data
+ * for histograms, MCV lists etc.
+ */
+MVStats
+list_mv_stats(Oid relid, int *nstats, bool built_only)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ MVStats result;
+
+ /* start with 16 items, that should be enough for most cases */
+ int maxitems = 16;
+ result = (MVStats)palloc0(sizeof(MVStatsData) * maxitems);
+ *nstats = 0;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ Form_pg_mv_statistic stats = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ /*
+ * Skip statistics that were not computed yet (if only stats
+ * that were already built were requested)
+ */
+ if (built_only && (! stats->deps_built))
+ continue;
+
+ /* double the array size if needed */
+ if (*nstats == maxitems)
+ {
+ maxitems *= 2;
+ result = (MVStats)repalloc(result, sizeof(MVStatsData) * maxitems);
+ }
+
+ result[*nstats].mvoid = HeapTupleGetOid(htup);
+ result[*nstats].stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ result[*nstats].deps_built = stats->deps_built;
+ *nstats += 1;
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which was used as an inspiration for this one)? */
+
+ return result;
+}
+
+void
+update_mv_stats(Oid mvoid, MVDependencies dependencies)
+{
+ HeapTuple stup,
+ oldtup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ bool replaces[Natts_pg_mv_statistic];
+
+ Relation sd = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ memset(nulls, 1, Natts_pg_mv_statistic * sizeof(bool));
+ memset(replaces, 0, Natts_pg_mv_statistic * sizeof(bool));
+ memset(values, 0, Natts_pg_mv_statistic * sizeof(Datum));
+
+ /*
+ * Construct a new pg_mv_statistic tuple - replace only the
+ * dependencies, depending on whether they were actually computed.
+ */
+ if (dependencies != NULL)
+ {
+ nulls[Anum_pg_mv_statistic_stadeps -1] = false;
+ values[Anum_pg_mv_statistic_stadeps - 1]
+ = PointerGetDatum(serialize_mv_dependencies(dependencies));
+ }
+
+ /* always replace the value (either by bytea or NULL) */
+ replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* always change the availability flags */
+ nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+
+ replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+
+ values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+
+ /* Is there already a pg_mv_statistic tuple for this attribute? */
+ oldtup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ if (HeapTupleIsValid(oldtup))
+ {
+ /* Yes, replace it */
+ stup = heap_modify_tuple(oldtup,
+ RelationGetDescr(sd),
+ values,
+ nulls,
+ replaces);
+ ReleaseSysCache(oldtup);
+ simple_heap_update(sd, &stup->t_self, stup);
+ }
+ else
+ elog(ERROR, "invalid pg_mv_statistic record (oid=%u)", mvoid);
+
+ /* update indexes too */
+ CatalogUpdateIndexes(sd, stup);
+
+ heap_freetuple(stup);
+
+ heap_close(sd, RowExclusiveLock);
+}
+
+/* multi-variate stats comparator */
+
+/*
+ * qsort_arg comparator for sorting Datums (MV stats)
+ *
+ * This does not maintain the tupnoLink array.
+ */
+int
+compare_scalars_simple(const void *a, const void *b, void *arg)
+{
+ Datum da = *(Datum*)a;
+ Datum db = *(Datum*)b;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting data when partitioning a MV bucket
+ */
+int
+compare_scalars_partition(const void *a, const void *b, void *arg)
+{
+ Datum da = ((ScalarItem*)a)->value;
+ Datum db = ((ScalarItem*)b)->value;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/* initialize multi-dimensional sort */
+MultiSortSupport
+multi_sort_init(int ndims)
+{
+ MultiSortSupport mss;
+
+ Assert(ndims >= 2);
+
+ mss = (MultiSortSupport)palloc0(offsetof(MultiSortSupportData, ssup)
+ + sizeof(SortSupportData)*ndims);
+
+ mss->ndims = ndims;
+
+ return mss;
+}
+
+/*
+ * add sort info for dimension 'dim' (index into vacattrstats) to mss,
+ * at the position 'sortattr'
+ */
+void
+multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats)
+{
+ /* first, lookup StdAnalyzeData for the dimension (attribute) */
+ SortSupportData ssup;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)vacattrstats[dim]->extra_data;
+
+ Assert(mss != NULL);
+ Assert(sortdim < mss->ndims);
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup);
+
+ mss->ssup[sortdim] = ssup;
+}
+
+/* compare all the dimensions in the selected order */
+int
+multi_sort_compare(const void *a, const void *b, void *arg)
+{
+ int i;
+ SortItem *ia = (SortItem*)a;
+ SortItem *ib = (SortItem*)b;
+
+ MultiSortSupport mss = (MultiSortSupport)arg;
+
+ for (i = 0; i < mss->ndims; i++)
+ {
+ int compare;
+
+ compare = ApplySortComparator(ia->values[i], ia->isnull[i],
+ ib->values[i], ib->isnull[i],
+ &mss->ssup[i]);
+
+ if (compare != 0)
+ return compare;
+
+ }
+
+ /* equal by default */
+ return 0;
+}
+
+/* compare selected dimension */
+int
+multi_sort_compare_dim(int dim, const SortItem *a, const SortItem *b,
+ MultiSortSupport mss)
+{
+ return ApplySortComparator(a->values[dim], a->isnull[dim],
+ b->values[dim], b->isnull[dim],
+ &mss->ssup[dim]);
+}
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
new file mode 100644
index 0000000..6d5465b
--- /dev/null
+++ b/src/backend/utils/mvstats/common.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.h
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/tuptoaster.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_mv_statistic.h"
+#include "foreign/fdwapi.h"
+#include "postmaster/autovacuum.h"
+#include "storage/lmgr.h"
+#include "utils/datum.h"
+#include "utils/sortsupport.h"
+#include "utils/syscache.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "access/sysattr.h"
+
+#include "utils/mvstats.h"
+
+/* FIXME private structure copied from analyze.c */
+
+typedef struct
+{
+ Oid eqopr; /* '=' operator for datatype, if any */
+ Oid eqfunc; /* and associated function */
+ Oid ltopr; /* '<' operator for datatype, if any */
+} StdAnalyzeData;
+
+typedef struct
+{
+ Datum value; /* a data value */
+ int tupno; /* position index for tuple it came from */
+} ScalarItem;
+
+/* multi-sort */
+typedef struct MultiSortSupportData {
+ int ndims; /* number of dimensions supported by the */
+ SortSupportData ssup[1]; /* sort support data for each dimension */
+} MultiSortSupportData;
+
+typedef MultiSortSupportData* MultiSortSupport;
+
+typedef struct SortItem {
+ Datum *values;
+ bool *isnull;
+} SortItem;
+
+MultiSortSupport multi_sort_init(int ndims);
+
+void multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats);
+
+int multi_sort_compare(const void *a, const void *b, void *arg);
+
+int multi_sort_compare_dim(int dim, const SortItem *a,
+ const SortItem *b, MultiSortSupport mss);
+
+/* comparators, used when constructing multivariate stats */
+int compare_scalars_simple(const void *a, const void *b, void *arg);
+int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
new file mode 100644
index 0000000..9e7f294
--- /dev/null
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -0,0 +1,680 @@
+/*-------------------------------------------------------------------------
+ *
+ * dependencies.c
+ * POSTGRES multivariate functional dependencies
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/dependencies.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Mine functional dependencies between columns, in the form (A => B),
+ * meaning that a value in column 'A' determines value in 'B'. A simple
+ * artificial example may be a table created like this
+ *
+ * CREATE TABLE deptest (a INT, b INT)
+ * AS SELECT i, i/10 FROM generate_series(1,100000) s(i);
+ *
+ * Clearly, once we know the value for 'A' we can easily determine the
+ * value of 'B' by dividing (A/10). A more practical example may be
+ * addresses, where (ZIP code => city name), i.e. once we know the ZIP,
+ * we probably know which city it belongs to. Larger cities usually have
+ * multiple ZIP codes, so the dependency can't be reversed.
+ *
+ * Functional dependencies are a concept well described in relational
+ * theory, especially in definition of normalization and "normal forms".
+ * Wikipedia has a nice definition of a functional dependency [1]:
+ *
+ * In a given table, an attribute Y is said to have a functional
+ * dependency on a set of attributes X (written X -> Y) if and only
+ * if each X value is associated with precisely one Y value. For
+ * example, in an "Employee" table that includes the attributes
+ * "Employee ID" and "Employee Date of Birth", the functional
+ * dependency {Employee ID} -> {Employee Date of Birth} would hold.
+ * It follows from the previous two sentences that each {Employee ID}
+ * is associated with precisely one {Employee Date of Birth}.
+ *
+ * [1] http://en.wikipedia.org/wiki/Database_normalization
+ *
+ * Most datasets might be normalized not to contain any such functional
+ * dependencies, but sometimes it's not practical. In some cases it's
+ * actually a conscious choice to model the dataset in denormalized way,
+ * either because of performance or to make querying easier.
+ *
+ * The current implementation supports only dependencies between two
+ * columns, but this is merely a simplification of the initial patch.
+ * It's certainly useful to mine for dependencies involving multiple
+ * columns on the 'left' side, i.e. a condition for the dependency.
+ * That is dependencies [A,B] => C and so on.
+ *
+ * TODO The implementation may/should be smart enough not to mine both
+ * [A => B] and [A,C => B], because the second dependency is a
+ * consequence of the first one (if values of A determine values
+ * of B, adding another column won't change that). The ANALYZE
+ * should first analyze 1:1 dependencies, then 2:1 dependencies
+ * (and skip the already identified ones), etc.
+ *
+ * For example the dependency [city name => zip code] is much weaker
+ * than [city name, state name => zip code], because there may be
+ * multiple cities with the same name in various states. It's not
+ * perfect though - there are probably cities with the same name within
+ * the same state, but this is hopefully a relatively rare occurrence.
+ * More about this in the section about dependency mining.
+ *
+ * Handling multiple columns on the right side is not necessary, as such
+ * dependencies may be decomposed into a set of dependencies with
+ * the same meaning, one for each column on the right side. For example
+ *
+ * A => [B,C]
+ *
+ * is exactly the same as
+ *
+ * (A => B) & (A => C).
+ *
+ * Of course, storing (A => [B, C]) may be more efficient than storing
+ * the two dependencies (A => B) and (A => C) separately.
+ *
+ *
+ * Dependency mining (ANALYZE)
+ * ---------------------------
+ *
+ * The current build algorithm is rather simple - for each pair [A,B] of
+ * columns, the data are sorted lexicographically (first by A, then B),
+ * and then a number of metrics is computed by walking the sorted data.
+ *
+ * In general the algorithm counts distinct values of A (forming groups
+ * thanks to the sorting), supporting or contradicting the hypothesis
+ * that A => B (i.e. that values of B are predetermined by A). If there
+ * are multiple values of B for a single value of A, it's counted as
+ * contradicting.
+ *
+ * A group may be neither supporting nor contradicting. To be counted as
+ * supporting, the group has to have at least min_group_size(=3) rows.
+ * Smaller 'supporting' groups are counted as neutral.
+ *
+ * Finally, the number of rows in supporting and contradicting groups is
+ * compared, and if there is at least 10x more supporting rows, the
+ * dependency is considered valid.
+ *
+ *
+ * Real-world datasets are imperfect - there may be errors (e.g. due to
+ * data-entry mistakes), or factually correct records, yet contradicting
+ * the dependency (e.g. when a city splits into two, but both keep the
+ * same ZIP code). A strict ANALYZE implementation (where the functional
+ * dependencies are identified) would ignore dependencies on such noisy
+ * data, making the approach unusable in practice.
+ *
+ * The proposed implementation attempts to handle such noisy cases
+ * gracefully, by tolerating small number of contradicting cases.
+ *
+ * In the future this might also perform some sort of test and decide
+ * whether it's worth building any other kind of multivariate stats,
+ * or whether the dependencies sufficiently describe the data. Or at
+ * least not build the MCV list / histogram on the implied columns.
+ * Such reduction would however make the 'verification' (see the next
+ * section) impossible.
+ *
+ *
+ * Clause reduction (planner/optimizer)
+ * ------------------------------------
+ *
+ * Applying the dependencies is quite simple - given a list of clauses,
+ * try to apply all the dependencies. For example given clause list
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d < 100)
+ *
+ * and dependencies [a=>b] and [a=>d], this may be reduced to
+ *
+ * (a = 1) AND (c = 1) AND (d < 100)
+ *
+ * The (d<100) can't be reduced as it's not an equality clause, so the
+ * dependency [a=>d] can't be applied.
+ *
+ * See clauselist_apply_dependencies() for more details.
+ *
+ * The problem with the reduction is that the query may use conditions
+ * that are not redundant, but in fact contradictory - e.g. the user
+ * may search for a ZIP code and a city name not matching the ZIP code.
+ *
+ * In such cases, the condition on the city name is not
+ * redundant, but actually contradictory (making the result empty), and
+ * removing it while estimating the cardinality will make the estimate
+ * worse.
+ *
+ * The current estimation assuming independence (and multiplying the
+ * selectivities) works better in this case, but only by utter luck.
+ *
+ * In some cases this might be verified using the other multivariate
+ * statistics - MCV lists and histograms. For MCV lists the verification
+ * might be very simple - peek into the list if there are any items
+ * matching the clause on the 'A' column (e.g. ZIP code), and if such
+ * item is found, check that the 'B' column matches the other clause.
+ * If it does not, the clauses are contradictory. We can't really say
+ * if such item was not found, except maybe restricting the selectivity
+ * using the MCV data (e.g. using min/max selectivity, or something).
+ *
+ * With histograms, it might work similarly - we can't check the values
+ * directly (because histograms use buckets, unlike MCV lists, storing
+ * the actual values). So we can only observe the buckets matching the
+ * clauses - if those buckets have very low frequency, it probably means
+ * the two clauses are incompatible.
+ *
+ * It's unclear what 'low frequency' is, but if one of the clauses is
+ * implied (automatically true because of the other clause), then
+ *
+ * selectivity[clause(A)] = selectivity[clause(A) & clause(B)]
+ *
+ * So we might compute selectivity of the first clause (on the column
+ * A in dependency [A=>B]) - for example using regular statistics.
+ * And then check if the selectivity computed from the histogram is
+ * about the same (or significantly lower).
+ *
+ * The problem is that histograms work well only when the data ordering
+ * matches the natural meaning. For values that serve as labels - like
+ * city names or ZIP codes, or even generated IDs, histograms really
+ * don't work all that well. For example sorting cities by name won't
+ * match the sorting of ZIP codes, rendering the histogram unusable.
+ *
+ * The MCV are probably going to work much better, because they don't
+ * really assume any sort of ordering. And it's probably more appropriate
+ * for the label-like data.
+ *
+ * TODO Support dependencies with multiple columns on left/right.
+ *
+ * TODO Investigate using histogram and MCV list to confirm the
+ * functional dependencies.
+ *
+ * TODO Investigate statistical testing of the distribution (to decide
+ * whether it makes sense to build the histogram/MCV list).
+ *
+ * TODO Using a min/max of selectivities would probably make more sense
+ * for the associated columns.
+ *
+ * TODO Consider eliminating the implied columns from the histogram and
+ * MCV lists (but maybe that's not a good idea, because that'd make
+ * it impossible to use these stats for non-equality clauses and
+ * also it wouldn't be possible to use the stats for verification
+ * of the dependencies as proposed in another TODO).
+ *
+ * TODO This builds a complete set of dependencies, i.e. including
+ * transitive dependencies - if we identify [A => B] and [B => C],
+ * we're likely to identify [A => C] too. It might be better to
+ * keep only the minimal set of dependencies, i.e. prune all the
+ * dependencies that we can recreate by transitivity.
+ *
+ * There are two conceptual ways to do that:
+ *
+ * (a) generate all the rules, and then prune the rules that may
+ * be recreated by combining other dependencies, or
+ *
+ * (b) performing the 'is combination of other dependencies' check
+ * before actually doing the work
+ *
+ * The second option has the advantage that we don't really need
+ * to perform the sort/count. It's not sufficient alone, though,
+ * because we may discover the dependencies in the wrong order.
+ * For example [A => B], [A => C] and then [B => C]. None of those
+ * dependencies is a combination of the already known ones, yet
+ * [A => C] is a combination of [A => B] and [B => C].
+ *
+ * FIXME Not sure the current NULL handling makes much sense. We assume
+ * that NULL is 0, so it's handled like a regular value
+ * (NULL == NULL), so all NULLs in a single column form a single
+ * group. Maybe that's not the right thing to do, especially with
+ * equality conditions - in that case NULLs are irrelevant. So
+ * maybe the right solution would be to just ignore NULL values?
+ *
+ * However simply "ignoring" the NULL values does not seem like
+ * a good idea - imagine columns A and B, where for each value of
+ * A, values in B are constant (same for the whole group) or NULL.
+ * Let's say only 10% of B values in each group is not NULL. Then
+ * ignoring the NULL values will result in 10x misestimate (and
+ * it's trivial to construct arbitrary errors). So maybe handling
+ * NULL values just like a regular value is the right thing here.
+ *
+ * Or maybe NULL values should be treated differently on each side
+ * of the dependency? E.g. as ignored on the left (condition) and
+ * as regular values on the right - this seems consistent with how
+ * equality clauses work, as equality clause means 'NOT NULL'.
+ * So if we say [A => B] then it may also imply "NOT NULL" on the
+ * right side.
+ */
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ /* result */
+ int ndeps = 0;
+ MVDependencies dependencies = NULL;
+ MultiSortSupport mss = multi_sort_init(2); /* 2 dimensions for now */
+
+ /* TODO Maybe this should be somehow related to the number of
+ * distinct values in the two columns we're currently analyzing.
+ * Assuming the distribution is uniform, we can estimate the
+ * average group size we'd expect in the sample, and use that
+ * as a threshold. That seems better than a static approach.
+ */
+ int min_group_size = 3;
+
+ /* dimension indexes we'll check for associations [a => b] */
+ int dima, dimb;
+
+ /*
+ * We'll reuse the same array for all the 2-column combinations.
+ *
+ * It's possible to sort the sample rows directly, but this seemed
+ * somewhat simpler / less error prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * 2);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * 2);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * 2];
+ items[i].isnull = &isnull[i * 2];
+ }
+
+ Assert(numattrs >= 2);
+
+ /*
+ * Evaluate all possible combinations of [A => B], using a simple algorithm:
+ *
+ * (a) sort the data by [A,B]
+ * (b) split the data into groups by A (new group whenever a value changes)
+ * (c) count different values in the B column (again, value changes)
+ *
+ * TODO It should be rather simple to merge [A => B] and [A => C] into
+ * [A => B,C]. Just keep A constant, collect all the "implied" columns
+ * and you're done.
+ */
+ for (dima = 0; dima < numattrs; dima++)
+ {
+ /* prepare the sort function for the first dimension */
+ multi_sort_add_dimension(mss, 0, dima, stats);
+
+ for (dimb = 0; dimb < numattrs; dimb++)
+ {
+ SortItem current;
+
+ /* number of groups supporting / contradicting the dependency */
+ int n_supporting = 0;
+ int n_contradicting = 0;
+
+ /* counters valid within a group */
+ int group_size = 0;
+ int n_violations = 0;
+
+ int n_supporting_rows = 0;
+ int n_contradicting_rows = 0;
+
+ /* make sure the columns are different (i.e. skip A => A) */
+ if (dima == dimb)
+ continue;
+
+ /* prepare the sort function for the second dimension */
+ multi_sort_add_dimension(mss, 1, dimb, stats);
+
+ /* reset the values and isnull flags */
+ memset(values, 0, sizeof(Datum) * numrows * 2);
+ memset(isnull, 0, sizeof(bool) * numrows * 2);
+
+ /* accumulate all the data for both columns into an array and sort it */
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values[0]
+ = heap_getattr(rows[i], attrs->values[dima],
+ stats[dima]->tupDesc, &items[i].isnull[0]);
+
+ items[i].values[1]
+ = heap_getattr(rows[i], attrs->values[dimb],
+ stats[dimb]->tupDesc, &items[i].isnull[1]);
+ }
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Walk through the array, split it into rows according to
+ * the A value, and count distinct values in the other one.
+ * If there's a single B value for the whole group, we count
+ * it as supporting the association, otherwise we count it
+ * as contradicting.
+ *
+ * Furthermore we require a group to have at least a certain
+ * number of rows to be considered useful for supporting the
+ * dependency. But a contradicting group is always counted.
+ */
+
+ /* start with values from the first row */
+ current = items[0];
+ group_size = 1;
+
+ for (i = 1; i < numrows; i++)
+ {
+ /* end of the group */
+ if (multi_sort_compare_dim(0, &items[i], &current, mss) != 0)
+ {
+ /*
+ * If there are no contradicting rows, count it as
+ * supporting (otherwise contradicting), but only if
+ * the group is large enough.
+ *
+ * The requirement of a minimum group size makes it
+ * impossible to identify [unique,unique] cases, but
+ * that's probably a different case. This is more
+ * about [zip => city] associations etc.
+ *
+ * If there are violations, count the group/rows as
+ * a violation.
+ *
+ * It may be neither, if the group is too small (does
+ * not contain at least min_group_size rows).
+ */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations > 0)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /* current values start a new group */
+ n_violations = 0;
+ group_size = 0;
+ }
+ /* mismatch of a B value is contradicting */
+ else if (multi_sort_compare_dim(1, &items[i], &current, mss) != 0)
+ {
+ n_violations += 1;
+ }
+
+ current = items[i];
+ group_size += 1;
+ }
+
+ /* handle the last group (just like above) */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /*
+ * See if the number of rows supporting the association is at least
+ * 10x the number of rows violating the hypothetical dependency.
+ *
+ * TODO This is rather arbitrary limit - I guess it's possible to do
+ * some math to come up with a better rule (e.g. testing a hypothesis
+ * 'this is due to randomness'). We can create a contingency table
+ * from the values and use it for testing. Possibly only when
+ * there are no contradicting rows?
+ *
+ * TODO Also, if (a => b) and (b => a) at the same time, it pretty much
+ * means there's a 1:1 relation (or one is a 'label'), making the
+ * conditions rather redundant. Although it's possible that the
+ * query uses incompatible combination of values.
+ */
+ if (n_supporting_rows > (n_contradicting_rows * 10))
+ {
+ if (dependencies == NULL)
+ {
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+ dependencies->magic = MVSTAT_DEPS_MAGIC;
+ }
+ else
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData, deps)
+ + sizeof(MVDependency) * (dependencies->ndeps + 1));
+
+ /* update the list of dependencies */
+ dependencies->deps[ndeps] = (MVDependency)palloc0(sizeof(MVDependencyData));
+ dependencies->deps[ndeps]->a = attrs->values[dima];
+ dependencies->deps[ndeps]->b = attrs->values[dimb];
+
+ dependencies->ndeps = (++ndeps);
+ }
+ }
+ }
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+ pfree(stats);
+ pfree(mss);
+
+ return dependencies;
+}
+
+/*
+ * Store the dependencies into a bytea, so that it can be stored in the
+ * pg_mv_statistic catalog.
+ *
+ * Currently this only supports simple two-column rules, and stores them
+ * as a sequence of attnum pairs. In the future, this needs to be made
+ * more complex to support multiple columns on both sides of the
+ * implication (using AND on left, OR on right).
+ */
+bytea *
+serialize_mv_dependencies(MVDependencies dependencies)
+{
+ int i;
+
+ /* we need to store ndeps, and each needs 2 * int16 */
+ Size len = VARHDRSZ + offsetof(MVDependenciesData, deps)
+ + dependencies->ndeps * (sizeof(int16) * 2);
+
+ bytea * output = (bytea*)palloc0(len);
+
+ char * tmp = VARDATA(output);
+
+ SET_VARSIZE(output, len);
+
+ /* first, store the number of dimensions / items */
+ memcpy(tmp, dependencies, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ /* walk through the dependencies and copy both columns into the bytea */
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ memcpy(tmp, &(dependencies->deps[i]->a), sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(tmp, &(dependencies->deps[i]->b), sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return output;
+}
+
+/*
+ * Reads serialized dependencies into MVDependencies structure.
+ */
+MVDependencies
+deserialize_mv_dependencies(bytea * data)
+{
+ int i;
+ Size expected_size;
+ MVDependencies dependencies;
+ char *tmp;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVDependenciesData,deps))
+ elog(ERROR, "invalid MVDependencies size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVDependenciesData,deps));
+
+ /* read the MVDependencies header */
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(dependencies, tmp, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ if (dependencies->magic != MVSTAT_DEPS_MAGIC)
+ {
+ pfree(dependencies);
+ elog(WARNING, "not a MV Dependencies (magic number mismatch)");
+ return NULL;
+ }
+
+ Assert(dependencies->ndeps > 0);
+
+ /* what bytea size do we expect for those parameters */
+ expected_size = offsetof(MVDependenciesData,deps) +
+ dependencies->ndeps * sizeof(int16) * 2;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid dependencies size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* allocate space for the dependency pointers */
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData,deps)
+ + (dependencies->ndeps * sizeof(MVDependency)));
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ dependencies->deps[i] = (MVDependency)palloc0(sizeof(MVDependencyData));
+
+ memcpy(&(dependencies->deps[i]->a), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(&(dependencies->deps[i]->b), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return dependencies;
+}
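+/*
+ * Illustration (not part of the patch logic): serialize/deserialize are
+ * meant to be inverse operations, so e.g. a debugging cross-check might
+ * look like this:
+ *
+ * MVDependencies copy
+ *     = deserialize_mv_dependencies(serialize_mv_dependencies(deps));
+ * Assert(copy->ndeps == deps->ndeps);
+ */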
+
+/* print some basic info about dependencies (number of dependencies) */
+Datum
+pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ result = palloc0(128);
+ snprintf(result, 128, "dependencies=%d", dependencies->ndeps);
+
+ /* FIXME free the deserialized data (pfree is not enough) */
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/*
+ * Print the dependencies.
+ *
+ * TODO Would be nice if this knew the actual column names (instead of
+ * the attnums).
+ *
+ * FIXME This is really ugly and does not really check the lengths and
+ * strcpy/snprintf return values properly. Needs to be fixed.
+ */
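+/*
+ * Example output (as seen in the regression tests below):
+ *
+ * "1 => 2, 1 => 3, 2 => 3"
+ *
+ * i.e. a comma-separated list of (attnum => attnum) implications.
+ */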
+Datum
+pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
+{
+ int i = 0;
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result = NULL;
+ int len = 0;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ MVDependency dependency = dependencies->deps[i];
+ char buffer[128];
+
+ int tmp = snprintf(buffer, 128, "%s%d => %d",
+ ((i == 0) ? "" : ", "), dependency->a, dependency->b);
+
+ if (tmp < 127)
+ {
+ if (result == NULL)
+ result = palloc0(len + tmp + 1);
+ else
+ result = repalloc(result, len + tmp + 1);
+
+ strcpy(result + len, buffer);
+ len += tmp;
+ }
+ }
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+bytea *
+fetch_mv_dependencies(Oid mvoid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ bytea *stadeps = NULL;
+
+ /* Prepare to scan pg_mv_statistic for the entry with OID = mvoid. */
+ ScanKeyInit(&skey,
+ ObjectIdAttributeNumber,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(mvoid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticOidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ bool isnull = false;
+ Datum deps = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stadeps, &isnull);
+
+ Assert(!isnull);
+
+ stadeps = DatumGetByteaP(deps);
+
+ break;
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO Maybe save the result into relcache, as RelationGetIndexList
+ * (which inspired this function) does? */
+
+ return stadeps;
+}
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index a680229..22bb781 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -173,6 +173,11 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
#define LargeObjectMetadataOidIndexId 2996
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_oid_index, 3380, on pg_mv_statistic using btree(oid oid_ops));
+#define MvStatisticOidIndexId 3380
+DECLARE_INDEX(pg_mv_statistic_relid_index, 3379, on pg_mv_statistic using btree(starelid oid_ops));
+#define MvStatisticRelidIndexId 3379
+
DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
#define NamespaceNameIndexId 2684
DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
new file mode 100644
index 0000000..81ec23b
--- /dev/null
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -0,0 +1,69 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_mv_statistic.h
+ * definition of the system "multivariate statistic" relation (pg_mv_statistic)
+ * along with the relation's initial contents.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_mv_statistic.h
+ *
+ * NOTES
+ * the genbki.pl script reads this file and generates .bki
+ * information from the DATA() statements.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_MV_STATISTIC_H
+#define PG_MV_STATISTIC_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_mv_statistic definition. cpp turns this into
+ * typedef struct FormData_pg_mv_statistic
+ * ----------------
+ */
+#define MvStatisticRelationId 3381
+
+CATALOG(pg_mv_statistic,3381)
+{
+ /* These fields form the unique key for the entry: */
+ Oid starelid; /* relation containing attributes */
+
+ /* statistics requested to build */
+ bool deps_enabled; /* analyze dependencies? */
+
+ /* statistics that are available (if requested) */
+ bool deps_built; /* dependencies were built */
+
+ /* variable-length fields start here, but we allow direct access to stakeys */
+ int2vector stakeys; /* array of column keys */
+
+#ifdef CATALOG_VARLEN
+ bytea stadeps; /* dependencies (serialized) */
+#endif
+
+} FormData_pg_mv_statistic;
+
+/* ----------------
+ * Form_pg_mv_statistic corresponds to a pointer to a tuple with
+ * the format of pg_mv_statistic relation.
+ * ----------------
+ */
+typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
+
+/* ----------------
+ * compiler constants for pg_mv_statistic
+ * ----------------
+ */
+#define Natts_pg_mv_statistic 5
+#define Anum_pg_mv_statistic_starelid 1
+#define Anum_pg_mv_statistic_deps_enabled 2
+#define Anum_pg_mv_statistic_deps_built 3
+#define Anum_pg_mv_statistic_stakeys 4
+#define Anum_pg_mv_statistic_stadeps 5
+
+#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 8890ade..f728d88 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2712,6 +2712,11 @@ DESCR("current user privilege on any column by rel name");
DATA(insert OID = 3029 ( has_any_column_privilege PGNSP PGUID 12 10 0 0 0 f f f f t f s 2 0 16 "26 25" _null_ _null_ _null_ _null_ has_any_column_privilege_id _null_ _null_ _null_ ));
DESCR("current user privilege on any column by rel oid");
+DATA(insert OID = 3284 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies info");
+DATA(insert OID = 3285 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies show");
+
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
DATA(insert OID = 1929 ( pg_stat_get_tuples_returned PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_tuples_returned _null_ _null_ _null_ ));
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index fb2f035..724a169 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -49,6 +49,7 @@ extern void BootstrapToastTable(char *relName,
DECLARE_TOAST(pg_attrdef, 2830, 2831);
DECLARE_TOAST(pg_constraint, 2832, 2833);
DECLARE_TOAST(pg_description, 2834, 2835);
+DECLARE_TOAST(pg_mv_statistic, 3288, 3289);
DECLARE_TOAST(pg_proc, 2836, 2837);
DECLARE_TOAST(pg_rewrite, 2838, 2839);
DECLARE_TOAST(pg_seclabel, 3598, 3599);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 38469ef..3a0e7c4 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -414,6 +414,7 @@ typedef enum NodeTag
T_WithClause,
T_CommonTableExpr,
T_RoleSpec,
+ T_StatisticsDef,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2893cef..81ca159 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -570,6 +570,14 @@ typedef struct ColumnDef
int location; /* parse location, or -1 if none/unknown */
} ColumnDef;
+typedef struct StatisticsDef
+{
+ NodeTag type;
+ List *keys; /* String nodes naming referenced column(s) */
+ List *options; /* list of DefElem nodes */
+} StatisticsDef;
+
+
/*
* TableLikeClause - CREATE TABLE ( ... LIKE ... ) clause
*/
@@ -1362,7 +1370,8 @@ typedef enum AlterTableType
AT_ReplicaIdentity, /* REPLICA IDENTITY */
AT_EnableRowSecurity, /* ENABLE ROW SECURITY */
AT_DisableRowSecurity, /* DISABLE ROW SECURITY */
- AT_GenericOptions /* OPTIONS (...) */
+ AT_GenericOptions, /* OPTIONS (...) */
+ AT_AddStatistics /* add statistics */
} AlterTableType;
typedef struct ReplicaIdentityStmt
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
new file mode 100644
index 0000000..5c8643d
--- /dev/null
+++ b/src/include/utils/mvstats.h
@@ -0,0 +1,86 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvstats.h
+ * Multivariate statistics and selectivity estimation functions.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/mvstats.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MVSTATS_H
+#define MVSTATS_H
+
+#include "commands/vacuum.h"
+
+/*
+ * Basic info about the stats, used when choosing what to use
+ *
+ * TODO Add info about what statistics is available (histogram, MCV,
+ * hashed MCV, functional dependencies).
+ */
+typedef struct MVStatsData {
+ Oid mvoid; /* OID of the stats in pg_mv_statistic */
+ int2vector *stakeys; /* attnums for columns in the stats */
+ bool deps_built; /* functional dependencies available */
+} MVStatsData;
+
+typedef struct MVStatsData *MVStats;
+
+
+#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
+
+/* An associative rule, tracking [a => b] dependency.
+ *
+ * TODO Make this work with multiple columns on both sides.
+ */
+typedef struct MVDependencyData {
+ int16 a;
+ int16 b;
+} MVDependencyData;
+
+typedef MVDependencyData* MVDependency;
+
+typedef struct MVDependenciesData {
+ uint32 magic; /* magic constant marker */
+ int32 ndeps; /* number of dependencies */
+ MVDependency deps[1]; /* XXX why not a pointer? */
+} MVDependenciesData;
+
+typedef MVDependenciesData* MVDependencies;
+
+#define MVSTAT_DEPS_MAGIC 0xB4549A2C /* marks serialized bytea */
+#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
+
+/*
+ * TODO Maybe fetching the histogram/MCV list separately is inefficient?
+ * Consider adding a single `fetch_stats` method, fetching all
+ * stats specified using flags (or something like that).
+ */
+MVStats list_mv_stats(Oid relid, int *nstats, bool built_only);
+
+bytea * fetch_mv_dependencies(Oid mvoid);
+
+bytea * serialize_mv_dependencies(MVDependencies dependencies);
+
+/* deserialization of stats (serialization is private to analyze) */
+MVDependencies deserialize_mv_dependencies(bytea * data);
+
+/* FIXME this probably belongs somewhere else (not to operations stats) */
+extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats);
+
+void update_mv_stats(Oid relid, MVDependencies dependencies);
+
+#endif
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index ba0b090..12147ab 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -66,6 +66,7 @@ enum SysCacheIdentifier
INDEXRELID,
LANGNAME,
LANGOID,
+ MVSTATOID,
NAMESPACENAME,
NAMESPACEOID,
OPERNAMENSP,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 1788270..f0117ca 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1353,6 +1353,14 @@ pg_matviews| SELECT n.nspname AS schemaname,
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
WHERE (c.relkind = 'm'::"char");
+pg_mv_stats| SELECT n.nspname AS schemaname,
+ c.relname AS tablename,
+ s.stakeys AS attnums,
+ length(s.stadeps) AS depsbytes,
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ FROM ((pg_mv_statistic s
+ JOIN pg_class c ON ((c.oid = s.starelid)))
+ LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
pg_policies| SELECT n.nspname AS schemaname,
c.relname AS tablename,
pol.polname AS policyname,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index c7be273..00f5fe7 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -113,6 +113,7 @@ pg_inherits|t
pg_language|t
pg_largeobject|t
pg_largeobject_metadata|t
+pg_mv_statistic|t
pg_namespace|t
pg_opclass|t
pg_operator|t
--
2.0.5
Attachment: 0002-clause-reduction-using-functional-dependencies.patch (text/x-diff)
From 47a48180be115db2fa29ac659f4e4f259e01600d Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Fri, 16 Jan 2015 22:33:41 +0100
Subject: [PATCH 2/5] clause reduction using functional dependencies
During planning, use functional dependencies to decide
which clauses to skip during cardinality estimation.
Initial and rather simplistic implementation.
This only works with regular WHERE clauses, not clauses
used for joining.
Note: clause_is_mv_compatible() needs to identify the relation
(so that we can fetch the list of multivariate stats by OID).
planner_rt_fetch() seems like the appropriate way to get the
relation OID, but apparently it only works with simple vars.
Maybe using examine_variable() would make this work with more
complex vars too?
Includes regression tests analyzing functional dependencies
(part of ANALYZE) on several datasets (no dependencies, no
transitive dependencies, ...).
Checks that a query with conditions on two columns, where one (B)
is functionally dependent on the other one (A), correctly ignores
the clause on (B) and chooses a bitmap index scan instead of a
plain index scan (which is what happens otherwise, thanks to the
assumption of independence).
Note: Functional dependencies only work with equality clauses,
no inequalities etc.
---
src/backend/commands/analyze.c | 1 +
src/backend/commands/tablecmds.c | 9 +-
src/backend/optimizer/path/clausesel.c | 650 +++++++++++++++++++++++++-
src/backend/utils/mvstats/common.c | 5 +-
src/include/catalog/pg_proc.h | 4 +-
src/include/utils/mvstats.h | 23 +-
src/test/regress/expected/mv_dependencies.out | 175 +++++++
src/test/regress/parallel_schedule | 3 +
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_dependencies.sql | 153 ++++++
10 files changed, 1013 insertions(+), 11 deletions(-)
create mode 100644 src/test/regress/expected/mv_dependencies.out
create mode 100644 src/test/regress/sql/mv_dependencies.sql
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index f82fcf5..e247f84 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -115,6 +115,7 @@ static void update_attstats(Oid relid, bool inh,
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
+
/*
* analyze_rel() -- analyze one relation
*/
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index a321755..965d342 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -420,6 +420,7 @@ static void ATExecEnableRowSecurity(Relation rel);
static void ATExecDisableRowSecurity(Relation rel);
static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
StatisticsDef *def, LOCKMODE lockmode);
+
static void copy_relation_data(SMgrRelation rel, SMgrRelation dst,
ForkNumber forkNum, char relpersistence);
static const char *storage_name(char c);
@@ -11900,7 +11901,7 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
Relation mvstatrel;
/* by default build everything */
- bool build_dependencies = true;
+ bool build_dependencies = false;
Assert(IsA(def, StatisticsDef));
@@ -11962,6 +11963,12 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
opt->defname)));
}
+ /* check that at least some statistics were requested */
+ if (! build_dependencies)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies) was requested")));
+
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
stakeys = buildint2vector(attnums, numcols);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index dcac1c1..e742827 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -24,6 +24,14 @@
#include "utils/lsyscache.h"
#include "utils/selfuncs.h"
+#include "utils/mvstats.h"
+#include "catalog/pg_collation.h"
+#include "utils/typcache.h"
+
+#include "parser/parsetree.h"
+
+
+#include <stdio.h>
/*
* Data structure for accumulating info about possible range-query
@@ -43,6 +51,16 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
+static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
+ Oid *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo);
+
+static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
+ Oid varRelid, Oid *relid, SpecialJoinInfo *sjinfo);
+
+static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Oid varRelid, int nmvstats, MVStats mvstats,
+ SpecialJoinInfo *sjinfo);
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -61,7 +79,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
* subclauses. However, that's only right if the subclauses have independent
* probabilities, and in reality they are often NOT independent. So,
* we want to be smarter where we can.
-
+ *
* Currently, the only extra smarts we have is to recognize "range queries",
* such as "x > 34 AND x < 42". Clauses are recognized as possible range
* query components if they are restriction opclauses whose operators have
@@ -88,6 +106,76 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
*
* Of course this is all very dependent on the behavior of
* scalarltsel/scalargtsel; perhaps some day we can generalize the approach.
+ *
+ *
+ * Multivariate statistics
+ * -----------------------
+ * This also uses multivariate stats to estimate combinations of conditions,
+ * while attempting to minimize the overhead when there are no suitable
+ * multivariate stats.
+ *
+ * The following checks are performed (in this order), and the optimizer
+ * falls back to regular stats on the first 'false'.
+ *
+ * NOTE: This explains how this works with all the patches applied, not
+ * just the functional dependencies.
+ *
+ * (1) check that at least two columns are referenced from conditions
+ * compatible with multivariate stats
+ *
+ * If there are no conditions that might be handled by multivariate
+ * stats, or if the conditions reference just a single column, it
+ * makes no sense to use multivariate stats.
+ *
+ * What conditions are compatible with multivariate stats is decided
+ * by clause_is_mv_compatible(). At this moment, only simple conditions
+ * of the form "column operator constant" (for simple comparison
+ * operators), and IS NULL / IS NOT NULL are considered compatible
+ * with multivariate statistics.
+ *
+ * (2) reduce the clauses using functional dependencies
+ *
+ * This simply attempts to 'reduce' the clauses by applying functional
+ * dependencies. For example if there are two clauses:
+ *
+ * WHERE (a = 1) AND (b = 2)
+ *
+ * and we know that 'a' determines the value of 'b', we may remove
+ * the second condition (b = 2) when computing the selectivity.
+ * This is of course tricky - see mvstats/dependencies.c for details.
+ *
+ * After the reduction, step (1) is to be repeated.
+ *
+ * (3) check if there are multivariate stats built on the columns
+ *
+ * If there are no multivariate statistics, we have to fall back to
+ * the regular stats. We might perform checks (1) and (2) in reverse
+ * order, i.e. first check if there are multivariate statistics and
+ * then collect the attributes only if needed. The assumption is
+ * that checking the clauses is cheaper than querying the catalog,
+ * so this check is performed first.
+ *
+ * (4) choose the stats matching the most columns (at least two)
+ *
+ * If there are multiple instances of multivariate statistics (e.g.
+ * built on different sets of columns), we choose the stats covering
+ * the most columns from step (1). It may happen that all available
+ * stats match just a single column - for example with conditions
+ *
+ * WHERE a = 1 AND b = 2
+ *
+ * and statistics built on (a,c) and (b,c). In such case just fall
+ * back to the regular stats because it makes no sense to use the
+ * multivariate statistics.
+ *
+ * This selection criterion (the most columns) is certainly very
+ * simple and definitely not optimal - it's easy to come up with
+ * examples where other approaches work better. More about this
+ * at choose_mv_statistics().
+ *
+ * (5) use the multivariate stats to estimate matching clauses
+ *
+ * (6) estimate the remaining clauses using the regular statistics
*/
Selectivity
clauselist_selectivity(PlannerInfo *root,
@@ -100,6 +188,14 @@ clauselist_selectivity(PlannerInfo *root,
RangeQueryClause *rqlist = NULL;
ListCell *l;
+ /* processing mv stats */
+ Oid relid = InvalidOid;
+ int nmvstats = 0;
+ MVStats mvstats = NULL;
+
+ /* attributes in mv-compatible clauses */
+ Bitmapset *mvattnums = NULL;
+
/*
* If there's exactly one clause, then no use in trying to match up pairs,
* so just go directly to clause_selectivity().
@@ -108,6 +204,28 @@ clauselist_selectivity(PlannerInfo *root,
return clause_selectivity(root, (Node *) linitial(clauses),
varRelid, jointype, sjinfo);
+ /* collect attributes referenced by mv-compatible clauses */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo);
+
+ /*
+ * If there are mv-compatible clauses, referencing at least two
+ * different columns (otherwise it makes no sense to use mv stats),
+ * try to reduce the clauses using functional dependencies, and
+ * recollect the attributes from the reduced list.
+ *
+ * We don't need to select a single statistics for this - we can
+ * apply all the functional dependencies we have.
+ */
+ if (bms_num_members(mvattnums) >= 2)
+ {
+ /* fetch info from the catalog (not the serialized stats yet) */
+ mvstats = list_mv_stats(relid, &nmvstats, true);
+
+ /* reduce clauses by applying functional dependencies rules */
+ clauses = clauselist_apply_dependencies(root, clauses, varRelid,
+ nmvstats, mvstats, sjinfo);
+ }
+
/*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
@@ -782,3 +900,533 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ */
+static Bitmapset *
+collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
+ Oid *relid, SpecialJoinInfo *sjinfo)
+{
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(l);
+
+ /* ignore the result for now - we only need the info */
+ if (clause_is_mv_compatible(root, clause, varRelid, relid, &attnum, sjinfo))
+ attnums = bms_add_member(attnums, attnum);
+ }
+
+ /*
+ * If there are not at least two attributes referenced by the clause(s),
+ * we can throw everything out (as we'll revert to simple stats).
+ */
+ if (bms_num_members(attnums) <= 1)
+ {
+ if (attnums != NULL)
+ pfree(attnums);
+ attnums = NULL;
+ *relid = InvalidOid;
+ }
+
+ return attnums;
+}
+
+/*
+ * Determines whether the clause is compatible with multivariate stats,
+ * and if it is, returns some additional information - varno (index
+ * into simple_rte_array) and a bitmap of attributes. This is then
+ * used to fetch related multivariate statistics.
+ *
+ * At this moment we only support basic conditions of the form
+ *
+ * variable OP constant
+ *
+ * where OP is one of [=,<,<=,>=,>] (which is however determined by
+ * looking at the associated function for estimating selectivity, just
+ * like with the single-dimensional case).
+ *
+ * TODO Support 'OR clauses' - shouldn't be all that difficult to
+ * evaluate them using multivariate stats.
+ */
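+/*
+ * Examples (illustration of the checks performed below):
+ *
+ * (a = 1) - compatible (Var = Const, estimated by eqsel)
+ * (1 = a) - compatible (Const = Var, i.e. varonleft = false)
+ * (a = b) - not compatible (no pseudo-constant side)
+ * (a < 1) - not used here (only F_EQSEL is handled in the switch)
+ */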
+static bool
+clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
+ Oid *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo)
+{
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ /* Pseudoconstants are not really interesting here. */
+ if (rinfo->pseudoconstant)
+ return false;
+
+ /* no support for OR clauses at this point */
+ if (rinfo->orclause)
+ return false;
+
+ /* get the actual clause from the RestrictInfo (it's not an OR clause) */
+ clause = (Node*)rinfo->clause;
+
+ /* only simple opclauses are compatible with multivariate stats */
+ if (! is_opclause(clause))
+ return false;
+
+ /* we don't support join conditions at this moment */
+ if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo))
+ return false;
+
+ /* is it 'variable op constant' ? */
+ if (list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+
+ ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
+ (is_pseudo_constant_clause_relids(lsecond(expr->args),
+ rinfo->right_relids) ||
+ (varonleft = false,
+ is_pseudo_constant_clause_relids(linitial(expr->args),
+ rinfo->left_relids)));
+
+ if (ok)
+ {
+ RangeTblEntry * rte;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+ * (return NULL).
+ *
+ * TODO Maybe use examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
+
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not be really necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
+
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
+
+ /* Lookup info about the base relation (we need to pass the OID out) */
+ rte = planner_rt_fetch(var->varno, root);
+ *relid = rte->relid;
+
+ /*
+ * If it's not an "=" operator, just ignore the clause - functional
+ * dependencies only work with equality clauses. Otherwise note the
+ * relid and attnum for the variable. This looks at the function used
+ * for estimating selectivity, not at the operator directly (a bit
+ * awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_EQSEL:
+ *attnum = var->varattno;
+ return true;
+ }
+ }
+ }
+ }
+
+ return false;
+
+}
+
+/*
+ * Performs reduction of clauses using functional dependencies, i.e.
+ * removes clauses that are considered redundant. It simply walks
+ * through dependencies, and checks whether the dependency 'matches'
+ * the clauses, i.e. if there's a clause matching the condition. If yes,
+ * all clauses matching the implied part of the dependency are removed
+ * from the list.
+ *
+ * This simply looks at the attnums referenced by the clauses, not at the
+ * type of the operator (equality, inequality, ...). This may not be the
+ * right way to do it - it certainly works best for equalities, which is
+ * naturally consistent with functional dependencies (implications).
+ * It's not clear that other operators are handled sensibly - for
+ * example for inequalities, like
+ *
+ * WHERE (A >= 10) AND (B <= 20)
+ *
+ * and a trivial case where [A == B], resulting in symmetric pair of
+ * rules [A => B], [B => A], it's rather clear we can't remove either of
+ * those clauses.
+ *
+ * That only highlights that functional dependencies are most suitable
+ * for label-like data, where using non-equality operators is very rare.
+ * Using the common city/zipcode example, clauses like
+ *
+ * (zipcode <= 12345)
+ *
+ * or
+ *
+ * (cityname >= 'Washington')
+ *
+ * are rare. So restricting the reduction to equality should not harm
+ * the usefulness / applicability.
+ *
+ * Another limitation is that this assumes 'compatible' clauses. For
+ * example, with a mismatched zip code and city name, this is unable
+ * to identify the discrepancy and still eliminates one of the clauses. The
+ * usual approach (multiplying both selectivities) thus produces a more
+ * accurate estimate, although mostly by luck - the multiplication
+ * comes from assumption of statistical independence of the two
+ * conditions (which is not valid in this case), but moves the
+ * estimate in the right direction (towards 0%).
+ *
+ * This might be somewhat improved by cross-checking the selectivities
+ * against MCV and/or histogram.
+ *
+ * The implementation needs to be careful about cyclic rules, i.e. rules
+ * like [A => B] and [B => A] at the same time. This must not reduce
+ * clauses on both attributes at the same time.
+ *
+ * Technically we might consider selectivities here too, somehow. E.g.
+ * when (A => B) and (B => A), we might keep the clause with the minimum
+ * selectivity.
+ *
+ * TODO Consider restricting the reduction to equality clauses. Or maybe
+ * use equality classes somehow?
+ *
+ * TODO Merge these docs into dependencies.c, as they say mostly the
+ * same things as the comments there.
+ */
+static List *
+clauselist_apply_dependencies(PlannerInfo *root, List *clauses, Oid varRelid,
+ int nmvstats, MVStats mvstats, SpecialJoinInfo *sjinfo)
+{
+ int i;
+ ListCell *lc;
+ List * reduced_clauses = NIL;
+ Oid relid;
+
+ /*
+ * preallocate space for all clauses, including non-mv-compatible,
+ * so that we don't need to reallocate the arrays repeatedly
+ *
+ * XXX This assumes each clause references exactly one Var, so the
+ * arrays are sized accordingly - for functional dependencies
+ * this is safe, because it only works with Var=Const.
+ */
+ bool *reduced;
+ AttrNumber *mvattnums;
+ Node **mvclauses;
+ int nmvclauses = 0; /* number clauses in the arrays */
+
+ /*
+ * matrix of (natts x natts), 1 means x=>y
+ *
+ * This serves two purposes - first, it merges dependencies from all
+ * the statistics, second it makes generating all the transitive
+ * dependencies easier.
+ *
+ * We need to build this only for attributes from the dependencies,
+ * not for all attributes in the table.
+ *
+ * We can't do that only for attributes from the clauses, because we
+ * want to build transitive dependencies (including those going
+ * through attributes not listed in the stats).
+ *
+ * This only works for A=>B dependencies, not sure how to do that
+ * for complex dependencies.
+ */
+ bool *deps_matrix;
+ int deps_natts; /* size of the matrix */
+
+ /* mapping attnum <=> matrix index */
+ int *deps_idx_to_attnum;
+ int *deps_attnum_to_idx;
+
+ /* attnums in dependencies and clauses (and intersection) */
+ Bitmapset *deps_attnums = NULL;
+ Bitmapset *clause_attnums = NULL;
+ Bitmapset *intersect_attnums = NULL;
+
+ int attnum, attidx, attnum_max;
+
+ bool has_deps_built = false;
+
+ /* see if there's at least one statistics with dependencies */
+ for (i = 0; i < nmvstats; i++)
+ {
+ if (mvstats[i].deps_built)
+ {
+ has_deps_built = true;
+ break;
+ }
+ }
+
+ /* no dependencies available - return the original clauses */
+ if (! has_deps_built)
+ return clauses;
+
+ mvclauses = (Node**)palloc0(list_length(clauses) * sizeof(Node*));
+ mvattnums = (AttrNumber*)palloc0(list_length(clauses) * sizeof(AttrNumber));
+
+ /*
+ * Walk through the clauses - clauses that are not mv-compatible copy
+ * directly into the result list, and mv-compatible ones store into
+ * an array of clauses (and remember the attnum in another array).
+ */
+ foreach (lc, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(lc);
+ if (! clause_is_mv_compatible(root, clause, varRelid, &relid, &attnum, sjinfo))
+ reduced_clauses = lappend(reduced_clauses, clause);
+ else
+ {
+ mvclauses[nmvclauses] = clause;
+ mvattnums[nmvclauses] = attnum;
+ nmvclauses++;
+
+ clause_attnums = bms_add_member(clause_attnums, attnum);
+ }
+ }
+
+ /*
+ * we need at least two clauses referencing two different attributes
+ * to do the reduction
+ */
+ if ((nmvclauses < 2) || (bms_num_members(clause_attnums) < 2))
+ {
+ pfree(mvattnums);
+ pfree(mvclauses);
+
+ bms_free(clause_attnums);
+ list_free(reduced_clauses);
+
+ return clauses;
+ }
+
+ reduced = (bool*)palloc0(list_length(clauses) * sizeof(bool));
+
+ /* build the dependency matrix */
+ attnum_max = -1;
+ for (i = 0; i < nmvstats; i++)
+ {
+ int j;
+ int2vector *stakeys = mvstats[i].stakeys;
+
+ /* skip stats without functional dependencies built */
+ if (! mvstats[i].deps_built)
+ continue;
+
+ for (j = 0; j < stakeys->dim1; j++)
+ {
+ int attnum = stakeys->values[j];
+ deps_attnums = bms_add_member(deps_attnums, attnum);
+
+ /* keep the max attnum in the dependencies */
+ attnum_max = (attnum > attnum_max) ? attnum : attnum_max;
+ }
+ }
+
+ /*
+ * We need at least two matching attributes in the clauses and
+ * dependencies, otherwise we can't reduce anything.
+ */
+ intersect_attnums = bms_intersect(clause_attnums, deps_attnums);
+ if (bms_num_members(intersect_attnums) < 2)
+ {
+ pfree(mvattnums);
+ pfree(mvclauses);
+
+ bms_free(clause_attnums);
+ bms_free(deps_attnums);
+ bms_free(intersect_attnums);
+
+ list_free(reduced_clauses);
+
+ return clauses;
+ }
+
+ /* allocate the matrix and mappings */
+ deps_natts = bms_num_members(deps_attnums);
+ deps_matrix = (bool*)palloc0(deps_natts * deps_natts * sizeof(bool));
+ deps_idx_to_attnum = (int*)palloc0(deps_natts * sizeof(int));
+ deps_attnum_to_idx = (int*)palloc0((attnum_max+1) * sizeof(int));
+
+ /* build the (attnum => attidx) and (attidx => attnum) mappings */
+ attidx = 0;
+ attnum = -1;
+
+ while (true)
+ {
+ attnum = bms_next_member(deps_attnums, attnum);
+ if (attnum == -2)
+ break;
+
+ deps_idx_to_attnum[attidx] = attnum;
+ deps_attnum_to_idx[attnum] = attidx;
+
+ attidx += 1;
+ }
+
+ /* do we have all the attributes mapped? */
+ Assert(attidx == deps_natts);
+
+ /* walk through all the mvstats, build the adjacency matrix */
+ for (i = 0; i < nmvstats; i++)
+ {
+ int j;
+ MVDependencies dependencies = NULL;
+
+ /* skip stats without functional dependencies built */
+ if (! mvstats[i].deps_built)
+ continue;
+
+ /* fetch dependencies */
+ dependencies = deserialize_mv_dependencies(fetch_mv_dependencies(mvstats[i].mvoid));
+ if (dependencies == NULL)
+ continue;
+
+ /* set deps_matrix[a,b] to 'true' if 'a=>b' */
+ for (j = 0; j < dependencies->ndeps; j++)
+ {
+ int aidx = deps_attnum_to_idx[dependencies->deps[j]->a];
+ int bidx = deps_attnum_to_idx[dependencies->deps[j]->b];
+
+ /* a=> b */
+ deps_matrix[aidx * deps_natts + bidx] = true;
+ }
+ }
+
+ /*
+ * Multiply the matrix N-times (N = size of the matrix), so that we
+ * get all the transitive dependencies. That makes the next step
+ * much easier and faster.
+ *
+ * This is essentially an adjacency matrix from graph theory, and
+ * by multiplying it we get transitive edges. We don't really care
+ * about the exact number (number of paths between vertices) though,
+ * so we can do the multiplication in-place (we don't care whether
+ * we found the dependency in this round or in the previous one).
+ *
+ * Track how many new dependencies were added, and stop when 0, but
+ * we can't multiply more than N-times (longest path in the graph).
+ */
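+ /*
+ * Example (illustration only): with dependencies (a => b) and
+ * (b => c), mapped to indexes 0, 1 and 2, the initial adjacency
+ * matrix is
+ *
+ * 0 1 0
+ * 0 0 1
+ * 0 0 0
+ *
+ * and the first multiplication adds the transitive edge (a => c),
+ * i.e. sets the cell [0,2].
+ */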
+ for (i = 0; i < deps_natts; i++)
+ {
+ int k, l, m;
+ int nchanges = 0;
+
+ /* k => l */
+ for (k = 0; k < deps_natts; k++)
+ {
+ for (l = 0; l < deps_natts; l++)
+ {
+ /* we already have this dependency */
+ if (deps_matrix[k * deps_natts + l])
+ continue;
+
+ /* we don't really care about the exact value, just 0/1 */
+ for (m = 0; m < deps_natts; m++)
+ {
+ if (deps_matrix[k * deps_natts + m] * deps_matrix[m * deps_natts + l])
+ {
+ deps_matrix[k * deps_natts + l] = true;
+ nchanges += 1;
+ break;
+ }
+ }
+ }
+ }
+
+ /* no transitive dependency added here, so terminate */
+ if (nchanges == 0)
+ break;
+ }
+
+ /*
+ * Walk through the clauses, and see which other clauses we may
+ * reduce. The matrix contains all transitive dependencies, which
+ * makes this very fast.
+ *
+ * We have to be careful not to reduce the clause using itself, or
+ * reducing all clauses forming a cycle (so we have to skip already
+ * eliminated clauses).
+ *
+ * I'm not sure whether this guarantees finding the best solution,
+ * i.e. reducing the most clauses, but it probably does (thanks to
+ * having all the transitive dependencies).
+ */
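+ /*
+ * Example (illustration only): with clauses (a = 1) AND (b = 2) and
+ * a symmetric pair of dependencies (a => b) and (b => a), only one
+ * of the clauses gets reduced - once (b = 2) is marked as reduced,
+ * it can no longer be used to reduce (a = 1).
+ */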
+ for (i = 0; i < nmvclauses; i++)
+ {
+ int j;
+
+ /* not covered by dependencies */
+ if (! bms_is_member(mvattnums[i], deps_attnums))
+ continue;
+
+ /* this clause was already reduced, so let's skip it */
+ if (reduced[i])
+ continue;
+
+ /* walk the potentially 'implied' clauses */
+ for (j = 0; j < nmvclauses; j++)
+ {
+ int aidx, bidx;
+
+ /* not covered by dependencies */
+ if (! bms_is_member(mvattnums[j], deps_attnums))
+ continue;
+
+ aidx = deps_attnum_to_idx[mvattnums[i]];
+ bidx = deps_attnum_to_idx[mvattnums[j]];
+
+ /* can't reduce the clause by itself, or if already reduced */
+ if ((i == j) || reduced[j])
+ continue;
+
+ /* mark the clause as reduced (if aidx => bidx) */
+ reduced[j] = deps_matrix[aidx * deps_natts + bidx];
+ }
+ }
+
+ /* now walk through the clauses, and keep only those not reduced */
+ for (i = 0; i < nmvclauses; i++)
+ {
+ if (! reduced[i])
+ reduced_clauses = lappend(reduced_clauses, mvclauses[i]);
+ }
+
+ pfree(reduced);
+ pfree(mvclauses);
+ pfree(mvattnums);
+
+ pfree(deps_matrix);
+ pfree(deps_idx_to_attnum);
+ pfree(deps_attnum_to_idx);
+
+ bms_free(deps_attnums);
+ bms_free(clause_attnums);
+ bms_free(intersect_attnums);
+
+ return reduced_clauses;
+}
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index 8efc5ba..d44b95a 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -57,7 +57,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
/*
* Analyze functional dependencies of columns.
*/
- deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (mvstats->deps_enabled)
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
/* store the histogram / MCV list in the catalog */
update_mv_stats(mvstats[i].mvoid, deps);
@@ -154,6 +155,7 @@ list_mv_stats(Oid relid, int *nstats, bool built_only)
result[*nstats].mvoid = HeapTupleGetOid(htup);
result[*nstats].stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ result[*nstats].deps_enabled = stats->deps_enabled;
result[*nstats].deps_built = stats->deps_built;
*nstats += 1;
}
@@ -260,6 +262,7 @@ compare_scalars_partition(const void *a, const void *b, void *arg)
return ApplySortComparator(da, false, db, false, ssup);
}
+
/* initialize multi-dimensional sort */
MultiSortSupport
multi_sort_init(int ndims)
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index f728d88..2916f11 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2712,9 +2712,9 @@ DESCR("current user privilege on any column by rel name");
DATA(insert OID = 3029 ( has_any_column_privilege PGNSP PGUID 12 10 0 0 0 f f f f t f s 2 0 16 "26 25" _null_ _null_ _null_ _null_ has_any_column_privilege_id _null_ _null_ _null_ ));
DESCR("current user privilege on any column by rel oid");
-DATA(insert OID = 3284 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
+DATA(insert OID = 3377 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
DESCR("multivariate stats: functional dependencies info");
-DATA(insert OID = 3285 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
+DATA(insert OID = 3378 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
DESCR("multivariate stats: functional dependencies show");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 5c8643d..ec6764b 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -18,24 +18,34 @@
/*
* Basic info about the stats, used when choosing what to use
- *
- * TODO Add info about what statistics is available (histogram, MCV,
- * hashed MCV, functional dependencies).
*/
typedef struct MVStatsData {
Oid mvoid; /* OID of the stats in pg_mv_statistic */
int2vector *stakeys; /* attnums for columns in the stats */
+
+ /* statistics requested in ALTER TABLE ... ADD STATISTICS */
+ bool deps_enabled; /* analyze functional dependencies */
+
+ /* available statistics (computed by ANALYZE) */
bool deps_built; /* functional dependencies available */
} MVStatsData;
typedef struct MVStatsData *MVStats;
+/*
+ * Degree of how much MCV item / histogram bucket matches a clause.
+ * This is then considered when computing the selectivity.
+ */
+#define MVSTATS_MATCH_NONE 0 /* no match at all */
+#define MVSTATS_MATCH_PARTIAL 1 /* partial match */
+#define MVSTATS_MATCH_FULL 2 /* full match */
#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
-/* An associative rule, tracking [a => b] dependency.
- *
- * TODO Make this work with multiple columns on both sides.
+
+/*
+ * Functional dependencies, tracking column-level relationships (values
+ * in one column determine values in another one).
*/
typedef struct MVDependencyData {
int16 a;
@@ -61,6 +71,7 @@ typedef MVDependenciesData* MVDependencies;
* stats specified using flags (or something like that).
*/
MVStats list_mv_stats(Oid relid, int *nstats, bool built_only);
+bytea * fetch_mv_rules(Oid mvoid);
bytea * fetch_mv_dependencies(Oid mvoid);
diff --git a/src/test/regress/expected/mv_dependencies.out b/src/test/regress/expected/mv_dependencies.out
new file mode 100644
index 0000000..159d317
--- /dev/null
+++ b/src/test/regress/expected/mv_dependencies.out
@@ -0,0 +1,175 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (unknown_column);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, a);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, a, b);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+ALTER TABLE functional_dependencies ADD STATISTICS (unknown_option) ON (a, b, c);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- correct command
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+ QUERY PLAN
+---------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+DROP TABLE functional_dependencies;
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+DROP TABLE functional_dependencies;
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c, d);
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+----------------------------------------
+ t | t | 2 => 1, 3 => 1, 3 => 2, 4 => 1, 4 => 2
+(1 row)
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+DROP TABLE functional_dependencies;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 6d3b865..00c6ddf 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -109,3 +109,6 @@ test: event_trigger
# run stats by itself because its delay may be insufficient under heavy load
test: stats
+
+# run tests of multivariate stats
+test: mv_dependencies
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 8326894..b818be9 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -153,3 +153,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: mv_dependencies
diff --git a/src/test/regress/sql/mv_dependencies.sql b/src/test/regress/sql/mv_dependencies.sql
new file mode 100644
index 0000000..f95dbf5
--- /dev/null
+++ b/src/test/regress/sql/mv_dependencies.sql
@@ -0,0 +1,153 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (unknown_column);
+
+-- single column
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a);
+
+-- single column, duplicated
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, a);
+
+-- two columns, one duplicated
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, a, b);
+
+-- unknown option
+ALTER TABLE functional_dependencies ADD STATISTICS (unknown_option) ON (a, b, c);
+
+-- correct command
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+DROP TABLE functional_dependencies;
+
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+DROP TABLE functional_dependencies;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c, d);
+
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+DROP TABLE functional_dependencies;
--
2.0.5
Attachment: 0003-multivariate-MCV-lists.patch (text/x-diff)
From 13c3d4cbe85bbbe6b9509de15dd08384df1df97f Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 20:15:37 +0100
Subject: [PATCH 3/5] multivariate MCV lists
- extends the pg_mv_statistic catalog (adds the 'mcv' fields)
- builds the MCV lists during ANALYZE
- uses the MCV lists for simple estimation while planning queries
Includes regression tests, mostly mirroring the regression tests for
functional dependencies.
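As a usage sketch (the option names follow the tablecmds.c changes
below; 1024 is just a value assumed to fall between
MVSTAT_MCVLIST_MIN_ITEMS and MVSTAT_MCVLIST_MAX_ITEMS), the MCV lists
might be enabled like this:

    ALTER TABLE t ADD STATISTICS (mcv) ON (a, b);
    ALTER TABLE t ADD STATISTICS (mcv, max_mcv_items = 1024) ON (a, b);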
---
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/tablecmds.c | 47 +-
src/backend/optimizer/path/clausesel.c | 1153 ++++++++++++++++++++++++++++++--
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/common.c | 58 +-
src/backend/utils/mvstats/common.h | 11 +-
src/backend/utils/mvstats/mcv.c | 1002 +++++++++++++++++++++++++++
src/include/catalog/pg_mv_statistic.h | 18 +-
src/include/catalog/pg_proc.h | 2 +
src/include/utils/mvstats.h | 68 +-
src/test/regress/expected/mv_mcv.out | 210 ++++++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_mcv.sql | 181 +++++
15 files changed, 2662 insertions(+), 101 deletions(-)
create mode 100644 src/backend/utils/mvstats/mcv.c
create mode 100644 src/test/regress/expected/mv_mcv.out
create mode 100644 src/test/regress/sql/mv_mcv.sql
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index d05a716..4538e63 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -156,7 +156,9 @@ CREATE VIEW pg_mv_stats AS
C.relname AS tablename,
S.stakeys AS attnums,
length(S.stadeps) as depsbytes,
- pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
+ length(S.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 965d342..fae0fc7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -11901,7 +11901,13 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
Relation mvstatrel;
/* by default build everything */
- bool build_dependencies = false;
+ bool build_dependencies = false,
+ build_mcv = false;
+
+ int32 max_mcv_items = -1;
+
+ /* options required because of other options */
+ bool require_mcv = false;
Assert(IsA(def, StatisticsDef));
@@ -11956,6 +11962,29 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "mcv") == 0)
+ build_mcv = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_mcv_items") == 0)
+ {
+ max_mcv_items = defGetInt32(opt);
+
+ /* this option requires 'mcv' to be enabled */
+ require_mcv = true;
+
+ /* sanity check */
+ if (max_mcv_items < MVSTAT_MCVLIST_MIN_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items must be at least %d",
+ MVSTAT_MCVLIST_MIN_ITEMS)));
+
+ else if (max_mcv_items > MVSTAT_MCVLIST_MAX_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items is %d",
+ MVSTAT_MCVLIST_MAX_ITEMS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -11964,10 +11993,16 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
}
/* check that at least some statistics were requested */
- if (! build_dependencies)
+ if (! (build_dependencies || build_mcv))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies) was requested")));
+ errmsg("no statistics type (dependencies, mcv) was requested")));
+
+ /* now do some checking of the options */
+ if (require_mcv && (! build_mcv))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option 'mcv' is required by other options(s)")));
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
@@ -11983,9 +12018,13 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+ values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+ values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
- nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+ nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+ nulls[Anum_pg_mv_statistic_stamcv -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index e742827..d24aedf 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -20,6 +20,7 @@
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/plancat.h"
+#include "optimizer/var.h"
#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
#include "utils/selfuncs.h"
@@ -50,17 +51,46 @@ typedef struct RangeQueryClause
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
+#define MV_CLAUSE_TYPE_FDEP 0x01
+#define MV_CLAUSE_TYPE_MCV 0x02
static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
- Oid *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo);
+ Oid *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
+ int type);
static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
- Oid varRelid, Oid *relid, SpecialJoinInfo *sjinfo);
+ Oid varRelid, Oid *relid, SpecialJoinInfo *sjinfo,
+ int type);
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Oid varRelid, int nmvstats, MVStats mvstats,
SpecialJoinInfo *sjinfo);
+static int choose_mv_statistics(int nmvstats, MVStats mvstats,
+ Bitmapset *attnums);
+static List *clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
+ List *clauses, Oid varRelid,
+ List **mvclauses, MVStats mvstats, int types);
+
+static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
+ List *clauses, MVStats mvstats);
+static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
+ List *clauses, MVStats mvstats,
+ bool *fullmatch, Selectivity *lowsel);
+
+static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or);
+
+/* used for merging bitmaps - AND (min), OR (max) */
+#define MAX(x, y) (((x) > (y)) ? (x) : (y))
+#define MIN(x, y) (((x) < (y)) ? (x) : (y))
+
+#define UPDATE_RESULT(m,r,or) \
+ (m) = (or) ? (MAX(m,r)) : (MIN(m,r))
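+
+/*
+ * Illustration (not from the original patch): this assumes
+ * MVSTATS_MATCH_FULL is numerically greater than MVSTATS_MATCH_NONE,
+ * which the MIN/MAX merge relies on. Merging a "full match" with a
+ * "no match" for the same MCV item then gives:
+ *
+ *   m = MVSTATS_MATCH_FULL; r = MVSTATS_MATCH_NONE;
+ *   UPDATE_RESULT(m, r, false);   AND-merge (MIN) => m = MATCH_NONE
+ *   UPDATE_RESULT(m, r, true);    OR-merge (MAX)  => m = MATCH_FULL
+ */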
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -197,15 +227,19 @@ clauselist_selectivity(PlannerInfo *root,
Bitmapset *mvattnums = NULL;
/*
- * If there's exactly one clause, then no use in trying to match up pairs,
- * so just go directly to clause_selectivity().
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, so just go directly to clause_selectivity().
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
varRelid, jointype, sjinfo);
- /* collect attributes referenced by mv-compatible clauses */
- mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo);
+ /*
+ * Collect attributes referenced by mv-compatible clauses (looking
+ * for clauses compatible with functional dependencies for now).
+ */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
+ MV_CLAUSE_TYPE_FDEP);
/*
* If there are mv-compatible clauses, referencing at least two
@@ -227,6 +261,49 @@ clauselist_selectivity(PlannerInfo *root,
}
/*
+ * Recollect attributes from mv-compatible clauses (maybe we've
+ * removed so many clauses that only a single mv-compatible attnum
+ * remains). From now on we're only interested in MCV-compatible
+ * clauses.
+ */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
+ MV_CLAUSE_TYPE_MCV);
+
+ /*
+ * If there still are at least two columns, we'll try to select
+ * suitable multivariate statistics.
+ */
+ if (bms_num_members(mvattnums) >= 2)
+ {
+ /* fetch info from the catalog (not the serialized stats yet) */
+ mvstats = list_mv_stats(relid, &nmvstats, true);
+
+ /* see choose_mv_statistics() for details */
+ if (nmvstats > 0)
+ {
+ int idx = choose_mv_statistics(nmvstats, mvstats, mvattnums);
+
+ if (idx >= 0) /* we found matching stats */
+ {
+ MVStats mvstat = &mvstats[idx];
+
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ /* split the clauselist into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, sjinfo, clauses,
+ varRelid, &mvclauses, mvstat,
+ MV_CLAUSE_TYPE_MCV);
+
+ /* we've chosen the stats to match the clauses */
+ Assert(mvclauses != NIL);
+
+ /* compute the multivariate stats */
+ s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ }
+ }
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -901,12 +978,198 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Estimate selectivity for the list of MV-compatible clauses, using that
+ * particular histogram.
+ *
+ * When we hit a single bucket, we don't know what portion of it actually
+ * matches the clauses (e.g. equality), and we use 1/2 the bucket by
+ * default. However, the MV histograms are usually less detailed than
+ * the per-column ones, so the sum over the buckets is often quite high
+ * (thanks to combining a lot of "partially hit" buckets).
+ *
+ * There are several ways to improve this, usually with cases when it
+ * won't really help. Also, the more complex the process, the worse
+ * the failures (i.e. misestimates).
+ *
+ * (1) Use the MV histogram only as a way to combine multiple
+ * per-column histograms, essentially rewriting
+ *
+ * P(A & B) = P(A) * P(B|A)
+ *
+ * where P(B|A) may be computed using a proper "slice" of the
+ * histogram, by first selecting only buckets where A is true, and
+ * then using the boundaries to 'restrict' the per-column histogram.
+ *
+ * With more clauses, it gets more complicated, of course
+ *
+ * P(A & B & C) = P(A & C) * P(B|A & C)
+ * = P(A) * P(C|A) * P(B|A & C)
+ *
+ * and so on.
+ *
+ * Of course, the question is how well and efficiently we can
+ * compute the conditional probabilities - whether this approach
+ * can improve the estimates (instead of amplifying the errors).
+ *
+ * Also, this does not eliminate the need for histogram on [A,B,C].
+ *
+ * (2) Use multiple smaller (and more accurate) histograms, and combine
+ * them using a process similar to the above. E.g. by assuming that
+ * B and C are independent, we can rewrite
+ *
+ * P(B|A & C) = P(B|A)
+ *
+ * so we can rewrite the whole formula to
+ *
+ * P(A & B & C) = P(A) * P(C|A) * P(B|A)
+ *
+ * and we're OK with two 2D histograms [A,C] and [A,B].
+ *
+ * It'd be nice to perform some sort of statistical test (Fisher's
+ * exact test or a chi-squared test) to identify independent
+ * components and automatically separate them into smaller histograms.
+ *
+ * (3) Using the estimated number of distinct values in a bucket to
+ * decide the selectivity of equality in the bucket (instead of
+ * blindly using 1/2 of the bucket, we may use 1/ndistinct).
+ * Of course, if the ndistinct estimate is way off, or when the
+ * distribution is not uniform (some distinct values occur much more
+ * often than others), this will fail. Also, we don't have an
+ * ndistinct estimate available at this point (but it shouldn't be
+ * difficult to compute, as ndistinct and ntuples should be available).
+ *
+ * TODO Clamp the selectivity by min of the per-clause selectivities
+ * (i.e. the selectivity of the most restrictive clause), because
+ * that's the maximum we can ever get from ANDed list of clauses.
+ * This would probably prevent issues with hitting too many buckets
+ * and low precision histograms.
+ *
+ * TODO We may support some additional conditions, most importantly
+ * those matching multiple columns (e.g. "a = b" or "a < b").
+ * Ultimately we could track multi-table histograms for join
+ * cardinality estimation.
+ *
+ * TODO Currently this is only estimating all clauses, or clauses
+ * matching varRelid (when it's not 0). I'm not sure what the
+ * purpose of varRelid is, but my assumption is that it's used for
+ * join conditions and such. In that case we can use those clauses
+ * to restrict the others (i.e. filter the histogram buckets first,
+ * before estimating the other clauses). This is essentially equal
+ * to computing P(A|B) where "B" are the clauses not matching the
+ * varRelid.
+ *
+ * TODO Further thoughts on processing equality clauses - maybe it'd be
+ * better to look for stats (with MCV) covered by the equality
+ * clauses, because then we have a chance to find an exact match
+ * in the MCV list, which is pretty much the best we can do. We may
+ * also look at the least frequent MCV item, and use it as an upper
+ * boundary for the selectivity (had there been a more frequent
+ * item, it'd be in the MCV list).
+ *
+ * These conditions may then be used as a condition for the other
+ * selectivities, i.e. we may estimate P(A,B) first, and then
+ * compute P(C|A,B) from another histogram. This may be useful when
+ * we can estimate P(A,B) accurately (e.g. because it's a complete
+ * equality match evaluated on MCV list), and then compute the
+ * conditional probability P(C|A,B), giving us the requested stats
+ *
+ * P(A,B,C) = P(A,B) * P(C|A,B)
+ *
+ * TODO There are several options for 'sanity clamping' the estimates.
+ *
+ * First, if we have selectivities for each condition, then
+ *
+ * P(A,B) <= MIN(P(A), P(B))
+ *
+ * Because additional conditions (connected by AND) can only lower
+ * the probability.
+ *
+ * So we can do some basic sanity checks using the single-variate
+ * stats (the ones we have right now).
+ *
+ * Second, when we have multivariate stats with a MCV list, then
+ *
+ * (a) if we have a full equality condition (one equality condition
+ * on each column) and we found a match in the MCV list, this is
+ * the selectivity (and it's supposed to be exact)
+ *
+ * (b) if we have a full equality condition and we haven't found a
+ * match in the MCV list, then the selectivity is below the
+ * lowest selectivity in the MCV list
+ *
+ * (c) if we have an equality condition (not full), we can still
+ * search the MCV for matches and use the sum of probabilities
+ * as a lower boundary for the histogram (if there are no
+ * matches in the MCV list, then we have no boundary)
+ *
+ * Third, if there are multiple multivariate stats for a set of
+ * clauses, we may compute all of them and then somehow aggregate
+ * them - e.g. by choosing the minimum, median or average. The
+ * multi-variate stats are susceptible to overestimation (because
+ * we take 50% of the bucket for partial matches). Some stats may
+ * give better estimates than others, but it's very difficult to
+ * determine in advance which one is the best (it depends
+ * on the number of buckets, number of additional columns not
+ * referenced in the clauses etc.) so we may compute all and then
+ * choose a sane aggregation (minimum seems like a good approach).
+ * Of course, this may result in longer / more expensive estimation
+ * (CPU-wise), but it may be worth it.
+ *
+ * There are ways to address this, though. First, it's possible to
+ * add a GUC choosing between a 'simple' estimation (using the single
+ * statistics expected to give the best estimate) and a 'full' one
+ * (combining the multiple estimates).
+ *
+ * multivariate_estimates = (simple|full)
+ *
+ * Also, this might be enabled at a table level, by something like
+ *
+ * ALTER TABLE ... SET STATISTICS (simple|full)
+ *
+ * Which would make it possible to use this only for the tables
+ * where the simple approach does not work.
+ *
+ * Also, there are ways to optimize this algorithmically. E.g. we
+ * may try to get an estimate from a matching MCV list first, and
+ * if we happen to get a "full equality match" we may stop computing
+ * the estimates from other stats (for this condition) because
+ * that's probably the best estimate we can really get.
+ *
+ * TODO When applying the clauses to the histogram/MCV list, we can
+ * process the most selective clauses first, because that'll
+ * eliminate the buckets/items sooner (so we'll be able to skip
+ * them without the more expensive inspection).
+ */
+static Selectivity
+clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStats mvstats)
+{
+ bool fullmatch = false;
+
+ /*
+ * Lowest frequency in the MCV list (may be used as an upper bound
+ * for full equality conditions that did not match any MCV item).
+ */
+ Selectivity mcv_low = 0.0;
+
+ /* TODO Evaluate simple 1D selectivities, use the smallest one as
+ * an upper bound, product as lower bound, and sort the
+ * clauses in ascending order by selectivity (to optimize the
+ * MCV/histogram evaluation).
+ */
+
+ /* Evaluate the MCV selectivity */
+ return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ &fullmatch, &mcv_low);
+}
+
/*
* Collect attributes from mv-compatible clauses.
*/
static Bitmapset *
collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
- Oid *relid, SpecialJoinInfo *sjinfo)
+ Oid *relid, SpecialJoinInfo *sjinfo, int types)
{
Bitmapset *attnums = NULL;
ListCell *l;
@@ -922,12 +1185,11 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
*/
foreach (l, clauses)
{
- AttrNumber attnum;
Node *clause = (Node *) lfirst(l);
- /* ignore the result for now - we only need the info */
- if (clause_is_mv_compatible(root, clause, varRelid, relid, &attnum, sjinfo))
- attnums = bms_add_member(attnums, attnum);
+ /* ignore the result here - we only need the attnums */
+ clause_is_mv_compatible(root, clause, varRelid, relid, &attnums,
+ sjinfo, types);
}
/*
@@ -946,6 +1208,180 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
}
/*
+ * We're looking for statistics matching at least 2 attributes,
+ * referenced in the clauses compatible with multivariate statistics.
+ * The current selection criterion is very simple - we choose the
+ * statistics referencing the most attributes.
+ *
+ * If there are multiple statistics referencing the same number of
+ * columns (from the clauses), the one with fewer source columns
+ * (as listed in the ADD STATISTICS when creating the statistics) wins.
+ * Otherwise the first one wins.
+ *
+ * This is a very simple criterion, and it has several weaknesses:
+ *
+ * (a) does not consider the accuracy of the statistics
+ *
+ * If there are two histograms built on the same set of columns,
+ * but one has 100 buckets and the other one has 1000 buckets (thus
+ * likely providing better estimates), this is not currently
+ * considered.
+ *
+ * (b) does not consider the type of statistics
+ *
+ * If there are three statistics - one containing just a MCV list,
+ * another one with just a histogram and a third one with both,
+ * this is not considered.
+ *
+ * (c) does not consider the number of clauses
+ *
+ * As explained, only the number of referenced attributes counts,
+ * so if there are multiple clauses on a single attribute, this
+ * still counts as a single attribute.
+ *
+ * (d) does not consider type of condition
+ *
+ * Some clauses may work better with some statistics - for example
+ * equality clauses probably work better with MCV lists than with
+ * histograms. But IS [NOT] NULL conditions may often work better
+ * with histograms (thanks to NULL-buckets).
+ *
+ * So for example with five WHERE conditions
+ *
+ * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
+ *
+ * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be
+ * selected as it references the most columns.
+ *
+ * Once we have selected the multivariate statistics, we split the list
+ * of clauses into two parts - conditions that are compatible with the
+ * selected stats, and conditions that will be estimated using simple
+ * statistics.
+ *
+ * From the example above, conditions
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
+ *
+ * will be estimated using the multivariate statistics (a,b,c,d) while
+ * the last condition (e = 1) will get estimated using the regular ones.
+ *
+ * There are various alternative selection criteria (e.g. counting
+ * conditions instead of just referenced attributes), but eventually
+ * the best option should be to combine multiple statistics. But that's
+ * much harder to do correctly.
+ *
+ * TODO Select multiple statistics and combine them when computing
+ * the estimate.
+ *
+ * TODO This will probably have to consider compatibility of clauses,
+ * because 'dependencies' will probably work only with equality
+ * clauses.
+ */
+static int
+choose_mv_statistics(int nmvstats, MVStats mvstats, Bitmapset *attnums)
+{
+ int i, j;
+
+ int choice = -1;
+ int current_matches = 1; /* goal #1: maximize */
+ int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+
+ /*
+ * Walk through the statistics (simple array with nmvstats elements)
+ * and for each one count the referenced attributes (encoded in
+ * the 'attnums' bitmap).
+ */
+ for (i = 0; i < nmvstats; i++)
+ {
+ /* columns matching this statistics */
+ int matches = 0;
+
+ int2vector * attrs = mvstats[i].stakeys;
+ int numattrs = mvstats[i].stakeys->dim1;
+
+ /* count columns covered by the histogram */
+ for (j = 0; j < numattrs; j++)
+ if (bms_is_member(attrs->values[j], attnums))
+ matches++;
+
+ /*
+ * Use these statistics if they improve the number of matches, or
+ * if they match the same number of attributes but have fewer
+ * source columns.
+ */
+ if ((matches > current_matches) ||
+ ((matches == current_matches) && (current_dims > numattrs)))
+ {
+ choice = i;
+ current_matches = matches;
+ current_dims = numattrs;
+ }
+ }
+
+ return choice;
+}
+
+
+/*
+ * This splits the clauses list into two parts - one containing clauses
+ * that will be evaluated using the chosen statistics, and the remaining
+ * clauses (either non-mvcompatible, or not related to the histogram).
+ */
+static List *
+clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
+ List *clauses, Oid varRelid, List **mvclauses,
+ MVStats mvstats, int types)
+{
+ int i;
+ ListCell *l;
+ List *non_mvclauses = NIL;
+
+ /* FIXME is there a better way to get info on int2vector? */
+ int2vector * attrs = mvstats->stakeys;
+ int numattrs = mvstats->stakeys->dim1;
+
+ Bitmapset *mvattnums = NULL;
+
+ /* build bitmap of attributes covered by the stats, so we can
+ * do bms_is_subset later */
+ for (i = 0; i < numattrs; i++)
+ mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+
+ /* erase the list of mv-compatible clauses */
+ *mvclauses = NIL;
+
+ foreach (l, clauses)
+ {
+ bool match = false; /* by default not mv-compatible */
+ Bitmapset *attnums = NULL;
+ Node *clause = (Node *) lfirst(l);
+
+ if (clause_is_mv_compatible(root, clause, varRelid, NULL,
+ &attnums, sjinfo, types))
+ {
+ /* are all the attributes part of the selected stats? */
+ if (bms_is_subset(attnums, mvattnums))
+ match = true;
+ }
+
+ /*
+ * If the clause matches the selected stats, put it into the list
+ * of mv-compatible clauses. Otherwise keep it in the list of
+ * 'regular' clauses (to be estimated the usual way).
+ */
+ if (match)
+ *mvclauses = lappend(*mvclauses, clause);
+ else
+ non_mvclauses = lappend(non_mvclauses, clause);
+ }
+
+ /*
+ * Return the remaining clauses, to be estimated the regular way
+ * (they are incompatible with the chosen MV stats).
+ */
+ return non_mvclauses;
+
+}
+
+/*
* Determines whether the clause is compatible with multivariate stats,
* and if it is, returns some additional information - varno (index
* into simple_rte_array) and a bitmap of attributes. This is then
@@ -964,96 +1400,205 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
*/
static bool
clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
- Oid *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo)
+ Oid *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
+ int types)
{
+ Relids clause_relids;
+ Relids left_relids;
+ Relids right_relids;
if (IsA(clause, RestrictInfo))
{
RestrictInfo *rinfo = (RestrictInfo *) clause;
/* Pseudoconstants are not really interesting here. */
if (rinfo->pseudoconstant)
return false;
- /* no support for OR clauses at this point */
- if (rinfo->orclause)
- return false;
/* get the actual clause from the RestrictInfo (it's not an OR clause) */
clause = (Node*)rinfo->clause;
- /* only simple opclauses are compatible with multivariate stats */
- if (! is_opclause(clause))
- return false;
-
/* we don't support join conditions at this moment */
if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo))
return false;
+ clause_relids = rinfo->clause_relids;
+ left_relids = rinfo->left_relids;
+ right_relids = rinfo->right_relids;
+ }
+ else if (is_opclause(clause) && list_length(((OpExpr *) clause)->args) == 2)
+ {
+ left_relids = pull_varnos(get_leftop((Expr*)clause));
+ right_relids = pull_varnos(get_rightop((Expr*)clause));
+
+ clause_relids = bms_union(left_relids,
+ right_relids);
+ }
+ else
+ {
+ /* Not a binary opclause, so mark left/right relid sets as empty */
+ left_relids = NULL;
+ right_relids = NULL;
+ /* and get the total relid set the hard way */
+ clause_relids = pull_varnos((Node *) clause);
+ }
+
+ /*
+ * Only simple opclauses and IS NULL tests are compatible with
+ * multivariate stats at this point.
+ */
+ if ((is_opclause(clause))
+ && (list_length(((OpExpr *) clause)->args) == 2))
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+
/* is it 'variable op constant' ? */
- if (list_length(((OpExpr *) clause)->args) == 2)
- {
- OpExpr *expr = (OpExpr *) clause;
- bool varonleft = true;
- bool ok;
- ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
- (is_pseudo_constant_clause_relids(lsecond(expr->args),
- rinfo->right_relids) ||
- (varonleft = false,
- is_pseudo_constant_clause_relids(linitial(expr->args),
- rinfo->left_relids)));
+ ok = (bms_membership(clause_relids) == BMS_SINGLETON) &&
+ (is_pseudo_constant_clause_relids(lsecond(expr->args),
+ right_relids) ||
+ (varonleft = false,
+ is_pseudo_constant_clause_relids(linitial(expr->args),
+ left_relids)));
- if (ok)
- {
- RangeTblEntry * rte;
- Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ if (ok)
+ {
+ RangeTblEntry * rte;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
- /*
- * Simple variables only - otherwise the planner_rt_fetch seems to fail
- * (return NULL).
- *
- * TODO Maybe use examine_variable() would fix that?
- */
- if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
- return false;
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+ * (return NULL).
+ *
+ * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
- /*
- * Only consider this variable if (varRelid == 0) or when the varno
- * matches varRelid (see explanation at clause_selectivity).
- *
- * FIXME I suspect this may not be really necessary. The (varRelid == 0)
- * part seems to be enforced by treat_as_join_clause().
- */
- if (! ((varRelid == 0) || (varRelid == var->varno)))
- return false;
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not be really necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
- /* Also skip special varno values, and system attributes ... */
- if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
- return false;
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
- /* Lookup info about the base relation (we need to pass the OID out) */
+ /* Lookup info about the base relation (we need to pass the OID out) */
+ if (relid != NULL)
+ {
rte = planner_rt_fetch(var->varno, root);
*relid = rte->relid;
-
- /*
- * If it's not a "<" or ">" or "=" operator, just ignore the
- * clause. Otherwise note the relid and attnum for the variable.
- * This uses the function for estimating selectivity, ont the
- * operator directly (a bit awkward, but well ...).
- */
- switch (get_oprrest(expr->opno))
- {
- case F_EQSEL:
- *attnum = var->varattno;
- return true;
- }
}
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ * This uses the function for estimating selectivity, not the
+ * operator directly (a bit awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_SCALARLTSEL:
+ case F_SCALARGTSEL:
+ /* not compatible with functional dependencies */
+ if (types & MV_CLAUSE_TYPE_MCV)
+ {
+ *attnums = bms_add_member(*attnums, var->varattno);
+ return true;
+ }
+ return false;
+
+ case F_EQSEL:
+ *attnums = bms_add_member(*attnums, var->varattno);
+ return true;
+ }
}
}
+ else if (IsA(clause, NullTest)
+ && IsA(((NullTest*)clause)->arg, Var))
+ {
+ RangeTblEntry * rte;
+ Var * var = (Var*)((NullTest*)clause)->arg;
- return false;
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+ * (return NULL).
+ *
+ * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not be really necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
+
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
+
+ /* Lookup info about the base relation (we need to pass the OID out) */
+ if (relid != NULL)
+ {
+ rte = planner_rt_fetch(var->varno, root);
+ *relid = rte->relid;
+ }
+
+ *attnums = bms_add_member(*attnums, var->varattno);
+
+ return true;
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /*
+ * AND/OR-clauses are supported if all sub-clauses are supported
+ *
+ * TODO We might support mixed case, where some of the clauses
+ * are supported and some are not, and treat all supported
+ * subclauses as a single clause, compute its selectivity
+ * using mv stats, and compute the total selectivity using
+ * the current algorithm.
+ *
+ * TODO For RestrictInfo above an OR-clause, we might use the
+ * orclause with nested RestrictInfo - we won't have to
+ * call pull_varnos() for each clause, saving time.
+ */
+ Bitmapset *tmp = NULL;
+ ListCell *l;
+ foreach (l, ((BoolExpr*)clause)->args)
+ {
+ if (! clause_is_mv_compatible(root, (Node*)lfirst(l),
+ varRelid, relid, &tmp, sjinfo, types))
+ return false;
+ }
+
+ /* add the attnums from the AND/OR-clause to the set of attnums */
+ *attnums = bms_join(*attnums, tmp);
+
+ return true;
+ }
+
+ return false;
}
/*
@@ -1115,6 +1660,13 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
*
* TODO Merge this docs to dependencies.c, as it's saying mostly the
* same things as the comments there.
+ *
+ * TODO Currently this is applied only to the top-level clauses, but
+ * maybe we could apply it to lists at subtrees too, e.g. to the
+ * two AND-clauses in
+ *
+ * (x=1 AND y=2) OR (z=3 AND q=10)
+ *
*/
static List *
clauselist_apply_dependencies(PlannerInfo *root, List *clauses, Oid varRelid,
@@ -1195,17 +1747,27 @@ clauselist_apply_dependencies(PlannerInfo *root, List *clauses, Oid varRelid,
*/
foreach (lc, clauses)
{
- AttrNumber attnum;
+ Bitmapset *attnums = NULL;
Node *clause = (Node *) lfirst(lc);
- if (! clause_is_mv_compatible(root, clause, varRelid, &relid, &attnum, sjinfo))
+ if (! clause_is_mv_compatible(root, clause, varRelid, &relid, &attnums,
+ sjinfo, MV_CLAUSE_TYPE_FDEP))
+ reduced_clauses = lappend(reduced_clauses, clause);
+ else if (bms_num_members(attnums) > 1)
+ /* FIXME This may happen thanks to OR-clauses, which should
+ * really be handled differently for functional
+ * dependencies.
+ */
reduced_clauses = lappend(reduced_clauses, clause);
else
{
+ /* functional dependencies support only [Var = Const] */
+ Assert(bms_num_members(attnums) == 1);
mvclauses[nmvclauses] = clause;
- mvattnums[nmvclauses] = attnum;
+ mvattnums[nmvclauses] = bms_singleton_member(attnums);
nmvclauses++;
- clause_attnums = bms_add_member(clause_attnums, attnum);
+ clause_attnums = bms_add_member(clause_attnums,
+ bms_singleton_member(attnums));
}
}
@@ -1430,3 +1992,446 @@ clauselist_apply_dependencies(PlannerInfo *root, List *clauses, Oid varRelid,
return reduced_clauses;
}
+
+/*
+ * Estimate selectivity of clauses using a MCV list.
+ *
+ * If there's no MCV list for the stats, the function returns 0.0.
+ *
+ * While computing the estimate, the function checks whether all the
+ * columns were matched with an equality condition. If that's the case,
+ * we can skip processing the histogram, as there can be no rows in
+ * it with the same values - all the rows matching the condition are
+ * represented by the MCV item. This can only happen with equality
+ * on all the attributes.
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all items as 'match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the items
+ * 4) skip items that are already 'no match'
+ * 5) check clause for items that still match
+ * 6) sum frequencies for items to get selectivity
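+ *
+ * For example (an illustration, not from the original patch): with
+ * MCV items {(1,1): 0.3, (1,2): 0.2, (2,2): 0.1} and the clauses
+ * (a = 1) AND (b = 2), the clause (a = 1) rules out (2,2), the
+ * clause (b = 2) rules out (1,1), and the estimate is the frequency
+ * of the remaining item (1,2), i.e. 0.2.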
+ *
+ * The function also returns the frequency of the least frequent item
+ * on the MCV list, which may be useful for clamping estimate from the
+ * histogram (all items not present in the MCV list are less frequent).
+ * This however seems useful only for cases with conditions on all
+ * attributes.
+ *
+ * TODO This only handles AND-ed clauses, but it might work for OR-ed
+ * lists too - it just needs to reverse the logic a bit. I.e. start
+ * with 'no match' for all items, and mark the items as a match
+ * as the clauses are processed (and skip items that are 'match').
+ */
+static Selectivity
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
+ MVStats mvstats, bool *fullmatch,
+ Selectivity *lowsel)
+{
+ int i;
+ Selectivity s = 0.0;
+ MCVList mcvlist = NULL;
+ int nmatches = 0;
+
+ /* match/mismatch bitmap for each MCV item */
+ char * matches = NULL;
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 2);
+
+ /* there's no MCV list built yet */
+ if (! mvstats->mcv_built)
+ return 0.0;
+
+ mcvlist = deserialize_mv_mcvlist(fetch_mv_mcvlist(mvstats->mvoid));
+
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* by default all the MCV items match the clauses fully */
+ matches = palloc0(sizeof(char) * mcvlist->nitems);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
+
+ /* number of matching MCV items */
+ nmatches = mcvlist->nitems;
+
+ nmatches = update_match_bitmap_mcvlist(root, clauses,
+ mvstats->stakeys, mcvlist,
+ nmatches, matches,
+ lowsel, fullmatch, false);
+
+ /* sum frequencies for all the matching MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ if (matches[i] != MVSTATS_MATCH_NONE)
+ s += mcvlist->items[i]->frequency;
+ }
+
+ pfree(matches);
+ pfree(mcvlist);
+
+ return s;
+}
+
+/*
+ * Evaluate clauses using the MCV list, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * TODO This works with 'bitmap' where each bit is represented as a char,
+ * which is slightly wasteful. Instead, we could use a regular
+ * bitmap, reducing the size to ~1/8. Another thing is merging the
+ * bitmaps using & and |, which might be faster than min/max.
+ */
+static int
+update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ Bitmapset *eqmatches = NULL; /* attributes with equality matches */
+
+ /* The bitmap may be partially built. */
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mcvlist->nitems);
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* nothing to do - no remaining matches (AND) or everything matches (OR) */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ return nmatches;
+
+ /* frequency of the lowest MCV item */
+ *lowsel = 1.0;
+
+ /*
+ * Loop through the list of clauses, and for each of them evaluate
+ * all the MCV items not yet eliminated by the preceding clauses.
+ *
+ * FIXME This would probably deserve a refactoring, I guess. Unify
+ * the two loops and put the checks inside, or something like
+ * that.
+ */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* if there are no remaining matches possible, we can stop */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /* it's an OpExpr, a NullTest or an AND/OR clause */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ /* operator */
+ FmgrInfo opproc;
+
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+
+ FmgrInfo ltproc, gtproc;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ /*
+ * TODO Fetch only when really needed (probably for equality only)
+ * TODO Technically either lt/gt is sufficient.
+ *
+ * FIXME The code in analyze.c creates histograms only for types
+ * with enough ordering (by calling get_sort_group_operators).
+ * Is this the same assumption, i.e. are we certain that we
+ * get the ltproc/gtproc every time we ask? Or are there types
+ * where get_sort_group_operators returns ltopr and here we
+ * get nothing?
+ */
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype,
+ TYPECACHE_EQ_OPR | TYPECACHE_LT_OPR | TYPECACHE_GT_OPR);
+
+ /* FIXME do proper matching of the attribute to the dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), <proc);
+ fmgr_info(get_opcode(typecache->gt_opr), >proc);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ bool mismatch = false;
+ MCVItem item = mcvlist->items[i];
+
+ /*
+ * find the lowest selectivity in the MCV
+ * FIXME Maybe not the best place to do this (it's repeated for each clause).
+ */
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+
+ /*
+ * If there are no more matches (AND) or no remaining unmatched
+ * items (OR), we can stop processing this clause.
+ */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /*
+ * For AND-lists, we can also mark NULL items as 'no match' (and
+ * then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && item->isnull[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /* skip MCV items that were already ruled out */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* TODO consider bsearch here (list is sorted by values)
+ * TODO handle other operators too (LT, GT)
+ * TODO identify "full match" when the clauses fully
+ * match the whole MCV list (so that checking the
+ * histogram is not needed)
+ */
+ if (get_oprrest(expr->opno) == F_EQSEL)
+ {
+ /*
+ * We don't care about isgt in equality, because it does not
+ * matter whether it's (var = const) or (const = var).
+ */
+ bool match = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ if (match)
+ eqmatches = bms_add_member(eqmatches, idx);
+
+ mismatch = (! match);
+ }
+ else if (get_oprrest(expr->opno) == F_SCALARLTSEL) /* column < constant */
+ {
+
+ if (! isgt) /* (var < const) */
+ {
+ /*
+ * Check whether the constant is below the MCV item's value (in
+ * that case the item cannot match the clause).
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ } /* (get_oprrest(expr->opno) == F_SCALARLTSEL) */
+ else /* (const < var) */
+ {
+ /*
+ * Check whether the constant is above the MCV item's value (in
+ * that case the item cannot match the clause).
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ item->values[idx],
+ cst->constvalue));
+ }
+ }
+ else if (get_oprrest(expr->opno) == F_SCALARGTSEL) /* column > constant */
+ {
+
+ if (! isgt) /* (var > const) */
+ {
+ /*
+ * Check whether the constant is above the MCV item's value (in
+ * that case the item cannot match the clause).
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+ }
+ else /* (const > var) */
+ {
+ /*
+ * Check whether the constant is below the MCV item's value (in
+ * that case the item cannot match the clause).
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ item->values[idx],
+ cst->constvalue));
+ }
+
+ } /* (get_oprrest(expr->opno) == F_SCALARGTSEL) */
+
+ /* XXX The conditions on matches[i] are not needed, as we
+ * skip MCV items that can't become true/false, depending
+ * on the current flag. See beginning of the loop over
+ * MCV items.
+ */
+
+ if ((is_or) && (matches[i] == MVSTATS_MATCH_NONE) && (! mismatch))
+ {
+ /* OR - was MATCH_NONE, but will be MATCH_FULL */
+ matches[i] = MVSTATS_MATCH_FULL;
+ ++nmatches;
+ continue;
+ }
+ else if ((! is_or) && (matches[i] == MVSTATS_MATCH_FULL) && mismatch)
+ {
+ /* AND - was MATCH_FULL, but will be MATCH_NONE */
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ continue;
+ }
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME do proper matching of the attribute to the dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = mcvlist->items[i];
+
+ /*
+ * find the lowest selectivity in the MCV
+ * FIXME Maybe not the best place to do this (it's repeated for each clause).
+ */
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+
+ /* if there are no more matches, we can stop processing this clause */
+ if (nmatches == 0)
+ break;
+
+ /* skip MCV items that were already ruled out */
+ if (matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /* if the clause mismatches the MCV item, set it as MATCH_NONE */
+ if (((expr->nulltesttype == IS_NULL) && (! mcvlist->items[i]->isnull[idx])) ||
+ ((expr->nulltesttype == IS_NOT_NULL) && (mcvlist->items[i]->isnull[idx])))
+ {
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ }
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each MCV item */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching MCV items */
+ or_nmatches = mcvlist->nitems;
+
+ /* by default none of the MCV items matches the clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the OR-clauses */
+ or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ stakeys, mcvlist,
+ or_nmatches, or_matches,
+ lowsel, fullmatch, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ {
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+ }
+
+ /*
+ * If all the columns were matched by equality, it's a full match.
+ * In this case at most one MCV item can match the clauses (two
+ * matching items would have to be equal on all the columns, i.e.
+ * be the same item).
+ */
+ *fullmatch = (bms_num_members(eqmatches) == mcvlist->ndimensions);
+
+ /* free the allocated pieces */
+ if (eqmatches)
+ pfree(eqmatches);
+
+ return nmatches;
+}
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 099f1ed..3c0aff4 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o
+OBJS = common.o mcv.o dependencies.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index d44b95a..bd952c6 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -17,8 +17,8 @@
#include "common.h"
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
- int natts, VacAttrStats **vacattrstats);
-
+ int natts,
+ VacAttrStats **vacattrstats);
/*
* Compute requested multivariate stats, using the rows sampled for the
@@ -44,6 +44,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
for (i = 0; i < nmvstats; i++)
{
MVDependencies deps = NULL;
+ MCVList mcvlist = NULL;
+ int numrows_filtered = 0;
/* int2 vector of attnums the stats should be computed on */
int2vector * attrs = mvstats[i].stakeys;
@@ -60,8 +62,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (mvstats->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ /* build the MCV list */
+ if (mvstats[i].mcv_enabled)
+ mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(mvstats[i].mvoid, deps);
+ update_mv_stats(mvstats[i].mvoid, deps, mcvlist, attrs, stats);
}
}
@@ -143,7 +149,7 @@ list_mv_stats(Oid relid, int *nstats, bool built_only)
* Skip statistics that were not computed yet (if only stats
* that were already built were requested)
*/
- if (built_only && (! stats->deps_built))
+ if (built_only && (! (stats->mcv_built || stats->deps_built)))
continue;
/* double the array size if needed */
@@ -156,7 +162,9 @@ list_mv_stats(Oid relid, int *nstats, bool built_only)
result[*nstats].mvoid = HeapTupleGetOid(htup);
result[*nstats].stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
result[*nstats].deps_enabled = stats->deps_enabled;
+ result[*nstats].mcv_enabled = stats->mcv_enabled;
result[*nstats].deps_built = stats->deps_built;
+ result[*nstats].mcv_built = stats->mcv_built;
*nstats += 1;
}
@@ -171,7 +179,9 @@ list_mv_stats(Oid relid, int *nstats, bool built_only)
}
void
-update_mv_stats(Oid mvoid, MVDependencies dependencies)
+update_mv_stats(Oid mvoid,
+ MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -196,15 +206,26 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies)
= PointerGetDatum(serialize_mv_dependencies(dependencies));
}
+ if (mcvlist != NULL)
+ {
+ bytea * data = serialize_mv_mcvlist(mcvlist, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stamcv -1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+ replaces[Anum_pg_mv_statistic_stamcv -1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
/* Is there already a pg_mv_statistic tuple for this attribute? */
oldtup = SearchSysCache1(MVSTATOID,
@@ -232,6 +253,21 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies)
heap_close(sd, RowExclusiveLock);
}
+
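+/*
+ * Return the index of the dimension (within the stats) the attribute
+ * maps to - e.g. for stakeys [2,5,7] and varattno 5 this returns 1
+ * (an illustration, not from the original patch). This relies on the
+ * attnums being sorted when the statistics are defined (see the qsort
+ * in ATExecAddStatistics).
+ */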
+int
+mv_get_index(AttrNumber varattno, int2vector * stakeys)
+{
+ int i, idx = 0;
+ for (i = 0; i < stakeys->dim1; i++)
+ {
+ if (stakeys->values[i] < varattno)
+ idx += 1;
+ else
+ break;
+ }
+ return idx;
+}
+
/* multi-variate stats comparator */
/*
@@ -242,11 +278,15 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies)
int
compare_scalars_simple(const void *a, const void *b, void *arg)
{
- Datum da = *(Datum*)a;
- Datum db = *(Datum*)b;
- SortSupport ssup= (SortSupport) arg;
+ return compare_datums_simple(*(Datum*)a,
+ *(Datum*)b,
+ (SortSupport)arg);
+}
- return ApplySortComparator(da, false, db, false, ssup);
+int
+compare_datums_simple(Datum a, Datum b, SortSupport ssup)
+{
+ return ApplySortComparator(a, false, b, false, ssup);
}
/*
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
index 6d5465b..f4309f7 100644
--- a/src/backend/utils/mvstats/common.h
+++ b/src/backend/utils/mvstats/common.h
@@ -46,7 +46,15 @@ typedef struct
Datum value; /* a data value */
int tupno; /* position index for tuple it came from */
} ScalarItem;
-
+
+/* (de)serialization info */
+typedef struct DimensionInfo {
+ int nvalues; /* number of deduplicated values */
+ int nbytes; /* number of bytes (serialized) */
+ int typlen; /* pg_type.typlen */
+ bool typbyval; /* pg_type.typbyval */
+} DimensionInfo;
+
/* multi-sort */
typedef struct MultiSortSupportData {
int ndims; /* number of dimensions supported by the */
@@ -71,5 +79,6 @@ int multi_sort_compare_dim(int dim, const SortItem *a,
const SortItem *b, MultiSortSupport mss);
/* comparators, used when constructing multivariate stats */
+int compare_datums_simple(Datum a, Datum b, SortSupport ssup);
int compare_scalars_simple(const void *a, const void *b, void *arg);
int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/mcv.c b/src/backend/utils/mvstats/mcv.c
new file mode 100644
index 0000000..4466cee
--- /dev/null
+++ b/src/backend/utils/mvstats/mcv.c
@@ -0,0 +1,1002 @@
+/*-------------------------------------------------------------------------
+ *
+ * mcv.c
+ * POSTGRES multivariate MCV lists
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mcv.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+/*
+ * Multivariate MCVs (most-common values lists) are a straightforward
+ * extension of the regular MCV list - they track combinations of values
+ * for several attributes (columns), including NULL flags, and the
+ * frequency of each combination.
+ *
+ * For columns with a small number of distinct values, this works quite
+ * well and may represent the distribution pretty exactly. For columns
+ * with a large number of distinct values (e.g. stored as FLOAT), this
+ * does not work that well.
+ *
+ * If we can represent the distribution as an MCV list, we can estimate
+ * some clauses (e.g. equality clauses) much more accurately than when
+ * using histograms, for example.
+ *
+ * Discrete distributions are also easier to combine into a larger
+ * distribution (but this is not yet implemented).
+ *
+ *
+ * TODO For types that don't reasonably support ordering (either because
+ * the type does not support that or when the user adds some option
+ * to the ADD STATISTICS command - e.g. UNSORTED_STATS), building
+ * the histogram may be pointless and inefficient. This is esp.
+ * true for varlena types that may be quite large and a large MCV
+ * list may be a better choice, because it makes equality estimates
+ * more accurate. Due to the unsorted nature, range queries on those
+ * attributes are rather useless anyway.
+ *
+ * Another thing is that by restricting to MCV list and equality
+ * conditions, we can use hash values instead of long varlena values.
+ * The equality estimation will be very accurate.
+ *
+ * This however complicates matching the columns to available
+ * statistics, as it will require matching clauses (not columns) to
+ * stats. And it may get quite complex - e.g. what if there are
+ * multiple clauses, each compatible with different stats subset?
+ *
+ *
+ * Selectivity estimation
+ * ----------------------
+ * The estimation, implemented in clauselist_mv_selectivity_mcvlist(),
+ * is quite simple in principle - walk through the MCV items and sum
+ * frequencies of all the items that match all the clauses.
+ *
+ * The current implementation uses MCV lists to estimate these types
+ * of clauses (think of WHERE conditions):
+ *
+ * (a) equality clauses WHERE (a = 1) AND (b = 2)
+ *
+ * (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ *
+ * It's possible to add more clauses, for example:
+ *
+ * (a) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ *
+ * (b) multi-var clauses WHERE (a > b)
+ *
+ * and so on. These are tasks for the future, not yet implemented.
+ *
+ *
+ * Estimating equality clauses
+ * ---------------------------
+ * When computing selectivity estimate for equality clauses
+ *
+ * (a = 1) AND (b = 2)
+ *
+ * we can do this estimate pretty exactly assuming that two conditions
+ * are met:
+ *
+ * (1) there's an equality condition on each attribute
+ *
+ * (2) we find a matching item in the MCV list
+ *
+ * In that case we know the MCV item represents all the tuples matching
+ * the clauses, and the selectivity estimate is complete. This is what
+ * we call 'full match'.
+ *
+ * When only (1) holds, but there's no matching MCV item, we don't
+ * know whether there are no such rows or they are just not frequent
+ * enough. We can however use the frequency of the least frequent
+ * MCV item as an upper bound for the selectivity.
+ *
+ * If the equality conditions match only a subset of the attributes
+ * the MCV list is built on, we can't get a full match - we may get
+ * multiple MCV items matching the clauses, and even a single match
+ * does not rule out rows that never made it into the MCV list. But
+ * in this case we can still use the frequency of the least frequent
+ * MCV item to clamp the 'additional' selectivity not accounted for
+ * by the matching items.
+ *
+ * If there's no histogram, because the MCV list approximates the
+ * distribution accurately (not because the histogram was disabled),
+ * it does not really matter whether there are equality conditions on
+ * all the columns - we can do pretty accurate estimation using the MCV.
+ *
+ * TODO For a combination of equality conditions (not full-match case)
+ * we probably can clamp the selectivity by the minimum of
+ * selectivities for each condition. For example if we know the
+ * number of distinct values for each column, we can use 1/ndistinct
+ * as a per-column estimate. Or rather 1/ndistinct + selectivity
+ * derived from the MCV list.
+ *
+ * If we know an estimate of the number of distinct combinations of
+ * the columns (i.e. ndistinct(A,B)), we may estimate the average
+ * frequency of the items in the part not covered by the MCV list
+ * (say the remaining 10%) as [10% / ndistinct(A,B)].
+ *
+ *
+ * Bounding estimates
+ * ------------------
+ * In general the MCV list may not provide estimates as accurate as
+ * in the full-match equality case, but it may still provide useful
+ * lower/upper bounds limiting the estimation error.
+ *
+ * With equality clauses we can do a few more tricks to narrow this
+ * error range (see the previous section and TODO), but with inequality
+ * clauses (or generally non-equality clauses), it's rather difficult.
+ * There's nothing like a 'full match' - we have to consider both the
+ * MCV items and the remaining part every time. We can't use the minimum
+ * selectivity of MCV items, as the clauses may match multiple items.
+ *
+ * For example with a MCV list on columns (A, B) covering 90% of the
+ * table (a fraction computed while building the MCV list), ~10% of
+ * the table is not represented by the MCV list. So even if the
+ * conditions match all the remaining rows (not represented by the
+ * MCV items), we can't get a selectivity higher than those 10%. We
+ * may use 1/2 of the remaining selectivity as an estimate
+ * (minimizing the average error).
+ *
+ * TODO Most of these ideas (error limiting) are not yet implemented.
+ *
+ *
+ * General TODO
+ * ------------
+ *
+ * FIXME Use max_mcv_items from ALTER TABLE ADD STATISTICS command.
+ *
+ * TODO Add support for IS [NOT] NULL clauses, and clauses referencing
+ * multiple columns (a < b).
+ *
+ * TODO It's possible to build a special case of MCV list, storing not
+ * the actual values but only 32/64-bit hash. This is only useful
+ * for estimating equality clauses and for large varlena types,
+ * which are very impractical for plain MCV list because of size.
+ * But for those data types we really want just the equality
+ * clauses, so it's actually a good solution.
+ *
+ * TODO Currently there's no logic to consider building only a MCV
+ * list (and not building the histogram at all), except for making
+ * this decision manually in ADD STATISTICS.
+ */
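+
+/*
+ * A worked example of the estimation logic described above, with
+ * made-up numbers (illustration only):
+ *
+ * MCV list on (a,b): (1,1) -> 0.20, (1,2) -> 0.10, (2,2) -> 0.05
+ *
+ * WHERE (a = 1) AND (b = 2) ... full match, estimate 0.10
+ *
+ * WHERE (a = 3) AND (b = 3) ... no matching item, but the estimate
+ * can't exceed 0.05 (the least frequent item), so we may use
+ * that as an upper bound
+ *
+ * WHERE (a = 1) ... partial match, the matching items sum to
+ * 0.20 + 0.10 = 0.30, plus possibly some rows not represented
+ * in the MCV list at all
+ */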
+
+/*
+ * Each serialized item needs to store (in this order):
+ *
+ * - indexes (ndim * sizeof(int32))
+ * - null flags (ndim * sizeof(bool))
+ * - frequency (sizeof(double))
+ *
+ * So in total:
+ *
+ * ndim * (sizeof(int32) + sizeof(bool)) + sizeof(double)
+ */
+#define ITEM_SIZE(ndims) \
+ ((ndims) * (sizeof(int32) + sizeof(bool)) + sizeof(double))
+
+/* pointers into a flat serialized item of ITEM_SIZE(n) bytes */
+#define ITEM_INDEXES(item) ((int32*)(item))
+#define ITEM_NULLS(item,ndims) ((bool*)(ITEM_INDEXES(item) + (ndims)))
+#define ITEM_FREQUENCY(item,ndims) ((double*)(ITEM_NULLS(item,ndims) + (ndims)))
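+
+/*
+ * For example, with ndims = 2 the serialized item layout is
+ *
+ * [int32 idx0][int32 idx1][bool null0][bool null1][double freq]
+ *
+ * i.e. ITEM_SIZE(2) = 2 * (4 + 1) + 8 = 18 bytes (assuming the usual
+ * 4-byte int32, 1-byte bool and 8-byte double).
+ */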
+
+/*
+ * Builds a MCV list from the sample rows, and removes the rows
+ * represented by the MCV list from the sample (the number of
+ * remaining sample rows is returned through the numrows_filtered
+ * parameter).
+ *
+ * The method is quite simple - in short, it consists of these steps:
+ *
+ * (1) sort the data (default collation, '<' for the data type)
+ *
+ * (2) count distinct groups, decide how many to keep
+ *
+ * (3) build the MCV list using the threshold determined in (2)
+ *
+ * (4) remove rows represented by the MCV from the sample
+ *
+ * For more details, see the comments in the code.
+ *
+ * FIXME Single-dimensional MCV is sorted by frequency (descending). We
+ * should do that too, because when walking through the list we
+ * want to check the most frequent items first.
+ *
+ * TODO We're using the full 8B Datum even for narrower data types
+ * (e.g. int4 or float4). Maybe we could save some space here,
+ * but the bytea compression should handle it just fine.
+ *
+ * TODO This probably should not use the ndistinct computed from the
+ * sample directly, but rather an estimate of the number of
+ * distinct values in the whole table, no?
+ */
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ int ndistinct = 0;
+ int mcv_threshold = 0;
+ int count = 0;
+ int nitems = 0;
+
+ MCVList mcvlist = NULL;
+
+ /* Sort by multiple columns (using array of SortSupport) */
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * Preallocate space for all the items as a single chunk, and point
+ * the items to the appropriate parts of the array.
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * numattrs);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * numattrs);
+
+ /* keep all the rows by default (as if there was no MCV list) */
+ *numrows_filtered = numrows;
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* load the values/null flags from sample rows */
+ for (j = 0; j < numrows; j++)
+ for (i = 0; i < numattrs; i++)
+ items[j].values[i] = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+
+ /* prepare the sort functions for all the attributes */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* do the sort, using the multi-sort */
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Count the number of distinct groups - just walk through the
+ * sorted list and count the number of key changes. We use this to
+ * determine the threshold (125% of the average frequency).
+ */
+ ndistinct = 1;
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ ndistinct += 1;
+
+ /*
+ * Determine how many groups actually exceed the threshold, and then
+ * walk the array again and collect them into an array. We'll always
+ * require at least 4 rows per group.
+ *
+ * But if we can fit all the distinct values in the MCV list (i.e.
+ * if there are fewer distinct groups than MVSTAT_MCVLIST_MAX_ITEMS),
+ * we'll require only 2 rows per group.
+ *
+ * TODO For now the threshold is the same as in the single-column
+ * case (average + 25%), but maybe that's worth revisiting
+ * for the multivariate case.
+ *
+ * TODO We can do this only if we believe we got all the distinct
+ * values of the table.
+ *
+ * FIXME This should really reference mcv_max_items (from catalog)
+ * instead of the constant MVSTAT_MCVLIST_MAX_ITEMS.
+ */
+ mcv_threshold = 1.25 * numrows / ndistinct;
+ mcv_threshold = (mcv_threshold < 4) ? 4 : mcv_threshold;
+
+ if (ndistinct <= MVSTAT_MCVLIST_MAX_ITEMS)
+ mcv_threshold = 2;
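+
+ /*
+ * Example with made-up numbers: for numrows = 30000 and
+ * ndistinct = 10000, the average group has 3 rows, giving a
+ * threshold of 1.25 * 3 = 3.75, clamped up to the minimum of 4.
+ * The relaxed threshold of 2 does not apply, because 10000
+ * groups would not fit into MVSTAT_MCVLIST_MAX_ITEMS (8192).
+ */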
+
+ /*
+ * Walk through the sorted data again, and see how many groups
+ * reach the mcv_threshold (and become an item in the MCV list).
+ */
+ count = 1;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or new group, so check if we exceed mcv_threshold */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* group hits the threshold, count the group as MCV item */
+ if (count >= mcv_threshold)
+ nitems += 1;
+
+ count = 1;
+ }
+ else /* within group, so increase the number of items */
+ count += 1;
+ }
+
+ /* we know the number of MCV list items, so let's build the list */
+ if (nitems > 0)
+ {
+ /* allocate the MCV list structure, set parameters we know */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ mcvlist->magic = MVSTAT_MCV_MAGIC;
+ mcvlist->type = MVSTAT_MCV_TYPE_BASIC;
+ mcvlist->ndimensions = numattrs;
+ mcvlist->nitems = nitems;
+
+ /*
+ * Preallocate the Datum/isnull arrays (not as a single chunk, as
+ * we'll pass this outside this method and thus it needs to be
+ * easy to pfree() the data - we wouldn't know where the arrays
+ * start).
+ *
+ * TODO Maybe the reasoning that we can't allocate a single
+ * piece because we're passing it out is bogus? Who'd
+ * free a single item of the MCV list, anyway?
+ *
+ * TODO Maybe with a proper encoding (stuffing all the values
+ * into a list-level array), this will no longer be true?
+ */
+ mcvlist->items = (MCVItem*)palloc0(sizeof(MCVItem)*nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ mcvlist->items[i] = (MCVItem)palloc0(sizeof(MCVItemData));
+ mcvlist->items[i]->values = (Datum*)palloc0(sizeof(Datum)*numattrs);
+ mcvlist->items[i]->isnull = (bool*)palloc0(sizeof(bool)*numattrs);
+ }
+
+ /*
+ * Repeat the same loop as above, but this time copy the data
+ * into the MCV list (for items exceeding the threshold).
+ *
+ * TODO Maybe we could simply remember indexes of the last item
+ * in each group (from the previous loop)?
+ */
+ count = 1;
+ nitems = 0;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or a new group */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* count the MCV item if exceeding the threshold (and copy into the array) */
+ if (count >= mcv_threshold)
+ {
+ /* just pointer to the proper place in the list */
+ MCVItem item = mcvlist->items[nitems];
+
+ /* copy values from the _previous_ group (i.e. its last item) */
+ memcpy(item->values, items[(i-1)].values, sizeof(Datum) * numattrs);
+ memcpy(item->isnull, items[(i-1)].isnull, sizeof(bool) * numattrs);
+
+ /* and finally the group frequency */
+ item->frequency = (double)count / numrows;
+
+ /* next item */
+ nitems += 1;
+ }
+
+ count = 1;
+ }
+ else /* same group, just increase the number of items */
+ count += 1;
+ }
+
+ /* make sure the loops are consistent */
+ Assert(nitems == mcvlist->nitems);
+
+ /*
+ * Remove the rows matching the MCV list (i.e. keep only rows
+ * that are not represented by the MCV list).
+ *
+ * FIXME This implementation is rather naive, effectively O(N^2).
+ * As the MCV list grows, the check will take longer and
+ * longer. And as the number of sampled rows increases (by
+ * increasing statistics target), it will take longer and
+ * longer. One option is to sort the MCV items first and
+ * then perform a binary search.
+ *
+ * A better option would be keeping the ID of the row in
+ * the sort item, and then just walk through the items and
+ * mark rows to remove (in a bitmap of the same size).
+ * There's not space for that in SortItem at this moment,
+ * but it's trivial to add 'private' pointer, or just
+ * using another structure with extra field (starting with
+ * SortItem, so that the comparators etc. still work).
+ *
+ * Another option is to use the sorted array of items
+ * (because that's how we sorted the source data), and
+ * simply do a bsearch() into it. If we find a matching
+ * item, the row belongs to the MCV list.
+ */
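+
+ /*
+ * For illustration, the bsearch option suggested above might look
+ * roughly like this (a sketch only, not what the code below does):
+ * sort the MCV items using multi_sort_compare(), then for each
+ * sample row do
+ *
+ * bsearch(&item, mcv_items, nitems, sizeof(SortItem), cmp)
+ *
+ * passing the comparator context through a static variable, the
+ * same trick serialize_mv_mcvlist() uses below. That would turn
+ * the O(numrows * nitems) scan into O(numrows * log(nitems)).
+ */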
+ if (nitems == ndistinct) /* all rows are covered by MCV items */
+ *numrows_filtered = 0;
+ else /* (nitems < ndistinct) && (nitems > 0) */
+ {
+ int nfiltered = 0;
+ HeapTuple *rows_filtered = (HeapTuple*)palloc0(sizeof(HeapTuple) * numrows);
+
+ /* used for the searches */
+ SortItem item, mcvitem;
+
+ item.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ item.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /*
+ * FIXME we don't need to allocate this, we can reference
+ * the MCV item directly ...
+ */
+ mcvitem.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ mcvitem.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* walk through the tuples, compare the values to MCV items */
+ for (i = 0; i < numrows; i++)
+ {
+ bool match = false;
+
+ /* collect the key values from the row */
+ for (j = 0; j < numattrs; j++)
+ item.values[j] = heap_getattr(rows[i], attrs->values[j],
+ stats[j]->tupDesc, &item.isnull[j]);
+
+ /* scan through the MCV list for matches */
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ /*
+ * TODO Create a SortItem/MCVItem comparator so that
+ * we don't need to do memcpy() like crazy.
+ */
+ memcpy(mcvitem.values, mcvlist->items[j]->values,
+ numattrs * sizeof(Datum));
+ memcpy(mcvitem.isnull, mcvlist->items[j]->isnull,
+ numattrs * sizeof(bool));
+
+ if (multi_sort_compare(&item, &mcvitem, mss) == 0)
+ {
+ match = true;
+ break;
+ }
+ }
+
+ /* if no match in the MCV list, copy the row into the filtered ones */
+ if (! match)
+ memcpy(&rows_filtered[nfiltered++], &rows[i], sizeof(HeapTuple));
+ }
+
+ /* replace the rows and remember how many rows we kept */
+ memcpy(rows, rows_filtered, sizeof(HeapTuple) * nfiltered);
+ *numrows_filtered = nfiltered;
+
+ /* free all the data used here */
+ pfree(rows_filtered);
+ pfree(item.values);
+ pfree(item.isnull);
+ pfree(mcvitem.values);
+ pfree(mcvitem.isnull);
+ }
+ }
+
+ pfree(values);
+ pfree(items);
+ pfree(isnull);
+
+ return mcvlist;
+}
+
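+/*
+ * A minimal sketch of the expected caller (the variable names are
+ * illustrative, not taken from the actual ANALYZE code):
+ *
+ * int numrows_filtered;
+ * MCVList mcvlist;
+ *
+ * mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats,
+ * &numrows_filtered);
+ *
+ * On return 'rows' contains only the numrows_filtered rows that are
+ * not represented by the MCV list, so the remaining statistics (e.g.
+ * a histogram) can be built on just those rows.
+ */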
+
+/* fetch the MCV list (as a bytea) from the pg_mv_statistic catalog */
+bytea *
+fetch_mv_mcvlist(Oid mvoid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ bytea *mcvlist = NULL;
+
+ /* Prepare to scan pg_mv_statistic for the entry with this OID. */
+ ScanKeyInit(&skey,
+ ObjectIdAttributeNumber,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(mvoid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticOidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ bool isnull = false;
+ Datum tmp = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stamcv, &isnull);
+
+ Assert(!isnull);
+
+ mcvlist = DatumGetByteaP(tmp);
+
+ break;
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO Maybe save the list into relcache, as in RelationGetIndexList
+ * (which served as an inspiration for this function)? */
+
+ return mcvlist;
+}
+
+/* print some basic info about the MCV list
+ *
+ * TODO Add info about what part of the table this covers.
+ */
+Datum
+pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MCVList mcvlist = deserialize_mv_mcvlist(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nitems=%d", mcvlist->nitems);
+
+ pfree(mcvlist);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* used to pass context into bsearch() */
+static SortSupport ssup_private = NULL;
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Serialize MCV list into a bytea value. The basic algorithm is simple:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ * (a) collect all (non-NULL) attribute values from all MCV items
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all MCV list items
+ * (a) replace values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, because we're mixing
+ * different datatypes, and we don't know what equality means for them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ * We'll use 32-bit values for the indexes in step (3), although we
+ * could probably use just 16 bits as we don't allow more than 8k
+ * items in the MCV list (max_mcv_items) - and we might increase
+ * that to 32k and still fit into a signed 16-bit integer. But let's
+ * be lazy and rely on the varlena compression to kick in; most of
+ * the bytes will be 0x00, so it should compress nicely.
+ *
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider using 16-bit values for the indexes in step (3).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
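+ *
+ * As an example of the encoding (made-up values): for the MCV items
+ *
+ * (1, 'x'), (1, 'y'), (2, 'x')
+ *
+ * the deduplicated per-dimension arrays are {1, 2} and {'x', 'y'},
+ * and the three items are then stored as pairs of indexes into
+ * those arrays:
+ *
+ * (0, 0), (0, 1), (1, 0)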
+ */
+bytea *
+serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i, j;
+ int ndims = mcvlist->ndimensions;
+ int itemsize = ITEM_SIZE(ndims);
+
+ Size total_length = 0;
+
+ char *item = palloc0(itemsize);
+
+ /* serialized items (indexes into arrays, etc.) */
+ bytea *output;
+ char *data = NULL;
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /* allocate space for all values, including NULLs (won't use them) */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * mcvlist->nitems);
+
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ if (! mcvlist->items[j]->isnull[i]) /* skip NULL values */
+ {
+ values[i][counts[i]] = mcvlist->items[j]->values[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typbyval)
+ /*
+ * passed by value, so just Datum array (int4, int8, ...)
+ *
+ * TODO Might save a few bytes here, by storing just typlen
+ * bytes instead of the whole Datum (8B on 64-bit builds).
+ */
+ info[i].nbytes = info[i].nvalues * sizeof(Datum);
+ else if (info[i].typlen > 0)
+ /* passed by reference, but fixed length (name, tid, ...) */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so strlen + 1 byte for the \0 terminator */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j])) + 1;
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * MCV list, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nitems (4B)
+ * - info (ndim * sizeof(DimensionInfo)
+ * - arrays of values for each dimension
+ * - serialized items (nitems * itemsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data.
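+ *
+ * For example, with ndims = 3 and nitems = 100 this works out as
+ * 20 + 3 * sizeof(DimensionInfo) + 100 * ITEM_SIZE(3) bytes, plus
+ * whatever the three deduplicated value arrays need.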
+ */
+ total_length = (sizeof(int32) + offsetof(MCVListData, items)
+ + ndims * sizeof(DimensionInfo)
+ + mcvlist->nitems * itemsize);
+
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 1MB */
+ if (total_length > 1024 * 1024)
+ elog(ERROR, "serialized MCV exceeds 1MB (%ld)", total_length);
+
+ /* allocate space for the serialized MCV list, set header fields */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the position to write to */
+ data = VARDATA(output);
+
+ memcpy(data, mcvlist, offsetof(MCVListData, items));
+ data += offsetof(MCVListData, items);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum */
+ memcpy(data, &values[i][j], sizeof(Datum));
+ data += sizeof(Datum);
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - itemsize);
+
+ /* reset the values for each item */
+ memset(item, 0, itemsize);
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! mcvlist->items[i]->isnull[j])
+ {
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ v = (Datum*)bsearch(&mcvlist->items[i]->values[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ ITEM_INDEXES(item)[j] = (v - values[j]);
+
+ /* check the index is within expected bounds */
+ Assert(ITEM_INDEXES(item)[j] >= 0);
+ Assert(ITEM_INDEXES(item)[j] < info[j].nvalues);
+ }
+ }
+
+ /* copy NULL and frequency flags into the item */
+ memcpy(ITEM_NULLS(item, ndims),
+ mcvlist->items[i]->isnull, sizeof(bool) * ndims);
+ memcpy(ITEM_FREQUENCY(item, ndims),
+ &mcvlist->items[i]->frequency, sizeof(double));
+
+ /* copy the item into the array */
+ memcpy(data, item, itemsize);
+
+ data += itemsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ return output;
+}
+
+/* inverse to serialize_mv_mcvlist() - see the comment there */
+MCVList
+deserialize_mv_mcvlist(bytea *data)
+{
+ int i, j;
+ Size expected_size;
+ MCVList mcvlist;
+ char *tmp;
+
+ int ndims, nitems, itemsize;
+ DimensionInfo *info = NULL;
+
+ int32 *indexes = NULL;
+ Datum **values = NULL;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MCVListData,items))
+ elog(ERROR, "invalid MCV Size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MCVListData,items));
+
+ /* read the MCV list header */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(mcvlist, tmp, offsetof(MCVListData,items));
+ tmp += offsetof(MCVListData,items);
+
+ if (mcvlist->magic != MVSTAT_MCV_MAGIC)
+ elog(ERROR, "invalid MCV magic %d (expected %dd)",
+ mcvlist->magic, MVSTAT_MCV_MAGIC);
+
+ if (mcvlist->type != MVSTAT_MCV_TYPE_BASIC)
+ elog(ERROR, "invalid MCV type %d (expected %dd)",
+ mcvlist->type, MVSTAT_MCV_TYPE_BASIC);
+
+ nitems = mcvlist->nitems;
+ ndims = mcvlist->ndimensions;
+ itemsize = ITEM_SIZE(ndims);
+
+ Assert(nitems > 0);
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Compute the size we expect with these parameters (it's still
+ * incomplete, as we have yet to add the sizes of the value arrays,
+ * from the DimensionInfo records).
+ */
+ expected_size = offsetof(MCVListData,items) +
+ ndims * sizeof(DimensionInfo) +
+ (nitems * itemsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - the data does not seem corrupted */
+
+ /* let's parse the value arrays */
+ values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mcv_list(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. This needs to
+ * copy the pieces.
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum - simply reuse the array */
+ values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * info[i].nvalues);
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * info[i].nvalues);
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * info[i].nvalues);
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+
+ /* allocate space for the MCV items */
+ mcvlist->items = (MCVItem*)palloc0(sizeof(MCVItem) * nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ MCVItem item = (MCVItem)palloc0(sizeof(MCVItemData));
+
+ item->values = (Datum*)palloc0(sizeof(Datum)*ndims);
+ item->isnull = (bool*) palloc0(sizeof(bool) *ndims);
+
+ /* just point to the right place */
+ indexes = ITEM_INDEXES(tmp);
+
+ memcpy(item->isnull, ITEM_NULLS(tmp, ndims), sizeof(bool) * ndims);
+ memcpy(&item->frequency, ITEM_FREQUENCY(tmp, ndims), sizeof(double));
+
+ /* translate the values */
+ for (j = 0; j < ndims; j++)
+ if (! item->isnull[j])
+ item->values[j] = values[j][indexes[j]];
+
+ mcvlist->items[i] = item;
+
+ tmp += ITEM_SIZE(ndims);
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+ }
+
+ /* check that we processed all the data */
+ Assert(tmp == (char*)data + VARSIZE_ANY(data));
+
+ return mcvlist;
+}
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index 81ec23b..c6e7d74 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -35,15 +35,21 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
+ bool mcv_enabled; /* build MCV list? */
+
+ /* MCV size */
+ int32 mcv_max_items; /* max MCV items */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
+ bool mcv_built; /* MCV list was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
+ bytea stamcv; /* MCV list (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -59,11 +65,15 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 5
+#define Natts_pg_mv_statistic 9
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_deps_enabled 2
-#define Anum_pg_mv_statistic_deps_built 3
-#define Anum_pg_mv_statistic_stakeys 4
-#define Anum_pg_mv_statistic_stadeps 5
+#define Anum_pg_mv_statistic_mcv_enabled 3
+#define Anum_pg_mv_statistic_mcv_max_items 4
+#define Anum_pg_mv_statistic_deps_built 5
+#define Anum_pg_mv_statistic_mcv_built 6
+#define Anum_pg_mv_statistic_stakeys 7
+#define Anum_pg_mv_statistic_stadeps 8
+#define Anum_pg_mv_statistic_stamcv 9
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 2916f11..b2aa815 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2716,6 +2716,8 @@ DATA(insert OID = 3377 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0
DESCR("multivariate stats: functional dependencies info");
DATA(insert OID = 3378 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
DESCR("multivariate stats: functional dependencies show");
+DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_mcvlist_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: MCV list info");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index ec6764b..6ff29d6 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -25,9 +25,11 @@ typedef struct MVStatsData {
/* statistics requested in ALTER TABLE ... ADD STATISTICS */
bool deps_enabled; /* analyze functional dependencies */
+ bool mcv_enabled; /* analyze MCV lists */
/* available statistics (computed by ANALYZE) */
bool deps_built; /* functional dependencies available */
+ bool mcv_built; /* MCV list is already available */
} MVStatsData;
typedef struct MVStatsData *MVStats;
@@ -66,6 +68,47 @@ typedef MVDependenciesData* MVDependencies;
#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
/*
+ * Multivariate MCV (most-common value) lists
+ *
+ * A straightforward extension of MCV items - i.e. a list (array) of
+ * combinations of attribute values, together with a frequency and
+ * null flags.
+ */
+typedef struct MCVItemData {
+ double frequency; /* frequency of this combination */
+ bool *isnull; /* flags of NULL values (up to 32 columns) */
+ Datum *values; /* variable-length (ndimensions) */
+} MCVItemData;
+
+typedef MCVItemData *MCVItem;
+
+/* multivariate MCV list - essentially an array of MCV items */
+typedef struct MCVListData {
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of MCV list (BASIC) */
+ uint32 ndimensions; /* number of dimensions */
+ uint32 nitems; /* number of MCV items in the array */
+ MCVItem *items; /* array of MCV items */
+} MCVListData;
+
+typedef MCVListData *MCVList;
+
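+/*
+ * A consumer is expected to walk a deserialized list roughly like
+ * this (an illustrative sketch, not code from this patch):
+ *
+ * for (i = 0; i < mcvlist->nitems; i++)
+ * {
+ * MCVItem item = mcvlist->items[i];
+ *
+ * if (! item->isnull[dim])
+ * ... compare item->values[dim] to the clause constant ...
+ * }
+ */
+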
+/* used to flag stats serialized to bytea */
+#define MVSTAT_MCV_MAGIC 0xE1A651C2 /* marks serialized bytea */
+#define MVSTAT_MCV_TYPE_BASIC 1 /* basic MCV list type */
+
+/*
+ * Limits for the max_mcv_items option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_MCVLIST_MIN_ITEMS items, and we
+ * cannot have more than MVSTAT_MCVLIST_MAX_ITEMS items.
+ *
+ * These are just boundaries for the 'max' threshold - the actual list
+ * may of course contain fewer items than MVSTAT_MCVLIST_MIN_ITEMS.
+ */
+#define MVSTAT_MCVLIST_MIN_ITEMS 128 /* min items in MCV list */
+#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
@@ -74,24 +117,39 @@ MVStats list_mv_stats(Oid relid, int *nstats, bool built_only);
bytea * fetch_mv_rules(Oid mvoid);
bytea * fetch_mv_dependencies(Oid mvoid);
+bytea * fetch_mv_mcvlist(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
+bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
+MCVList deserialize_mv_mcvlist(bytea * data);
+
+/*
+ * Returns index of the attribute number within the vector (i.e. a
+ * dimension within the stats).
+ */
+int mv_get_index(AttrNumber varattno, int2vector * stakeys);
/* FIXME this probably belongs somewhere else (not to operations stats) */
extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
MVDependencies
-build_mv_dependencies(int numrows, HeapTuple *rows,
- int2vector *attrs,
- VacAttrStats **stats);
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats);
+
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered);
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
- int natts, VacAttrStats **vacattrstats);
+ int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies);
+void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats);
#endif
diff --git a/src/test/regress/expected/mv_mcv.out b/src/test/regress/expected/mv_mcv.out
new file mode 100644
index 0000000..595cfbf
--- /dev/null
+++ b/src/test/regress/expected/mv_mcv.out
@@ -0,0 +1,210 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (unknown_column);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, a);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, a, b);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+ALTER TABLE mcv_list ADD STATISTICS (unknown_option) ON (a, b, c);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing MCV statistics
+ALTER TABLE mcv_list ADD STATISTICS (dependencies, max_mcv_items 200) ON (a, b, c);
+ERROR: option 'mcv' is required by other option(s)
+-- invalid max_mcv_items value / too low
+ALTER TABLE mcv_list ADD STATISTICS (mcv, max_mcv_items 10) ON (a, b, c);
+ERROR: max number of MCV items must be at least 128
+-- invalid max_mcv_items value / too high
+ALTER TABLE mcv_list ADD STATISTICS (mcv, max_mcv_items 10000) ON (a, b, c);
+ERROR: max number of MCV items is 8192
+-- correct command
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+DROP TABLE mcv_list;
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+DROP TABLE mcv_list;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c, d);
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1200
+(1 row)
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+DROP TABLE mcv_list;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index f0117ca..6d9ab2f 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1357,7 +1357,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
c.relname AS tablename,
s.stakeys AS attnums,
length(s.stadeps) AS depsbytes,
- pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
+ length(s.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 00c6ddf..63727a4 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -111,4 +111,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies
+test: mv_dependencies mv_mcv
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index b818be9..5b07b3b 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -154,3 +154,4 @@ test: xml
test: event_trigger
test: stats
test: mv_dependencies
+test: mv_mcv
diff --git a/src/test/regress/sql/mv_mcv.sql b/src/test/regress/sql/mv_mcv.sql
new file mode 100644
index 0000000..410b52d
--- /dev/null
+++ b/src/test/regress/sql/mv_mcv.sql
@@ -0,0 +1,181 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (unknown_column);
+
+-- single column
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a);
+
+-- single column, duplicated
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, a);
+
+-- two columns, one duplicated
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, a, b);
+
+-- unknown option
+ALTER TABLE mcv_list ADD STATISTICS (unknown_option) ON (a, b, c);
+
+-- missing MCV statistics
+ALTER TABLE mcv_list ADD STATISTICS (dependencies, max_mcv_items 200) ON (a, b, c);
+
+-- invalid max_mcv_items value / too low
+ALTER TABLE mcv_list ADD STATISTICS (mcv, max_mcv_items 10) ON (a, b, c);
+
+-- invalid max_mcv_items value / too high
+ALTER TABLE mcv_list ADD STATISTICS (mcv, max_mcv_items 10000) ON (a, b, c);
+
+-- correct command
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+DROP TABLE mcv_list;
+
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+DROP TABLE mcv_list;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c, d);
+
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+DROP TABLE mcv_list;
--
2.0.5
[Attachment: 0004-multivariate-histograms.patch (text/x-diff)]
From 166a13ed6152ebc0e384c53f765946ae8be5193f Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 20:18:24 +0100
Subject: [PATCH 4/5] multivariate histograms
- extends the pg_mv_statistic catalog (add 'hist' fields)
- building the histograms during ANALYZE
- simple estimation while planning the queries
Includes regression tests mostly equal to those for functional
dependencies / MCV lists.
---
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/tablecmds.c | 67 +-
src/backend/optimizer/path/clausesel.c | 549 ++++++++-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/common.c | 41 +-
src/backend/utils/mvstats/histogram.c | 1800 ++++++++++++++++++++++++++++
src/backend/utils/mvstats/mcv.c | 1 +
src/include/catalog/pg_mv_statistic.h | 24 +-
src/include/catalog/pg_proc.h | 2 +
src/include/utils/mvstats.h | 76 +-
src/test/regress/expected/mv_histogram.out | 210 ++++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_histogram.sql | 179 +++
15 files changed, 2924 insertions(+), 38 deletions(-)
create mode 100644 src/backend/utils/mvstats/histogram.c
create mode 100644 src/test/regress/expected/mv_histogram.out
create mode 100644 src/test/regress/sql/mv_histogram.sql
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 4538e63..87086f9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -158,7 +158,9 @@ CREATE VIEW pg_mv_stats AS
length(S.stadeps) as depsbytes,
pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
length(S.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo,
+ length(S.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index fae0fc7..6b01660 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -11876,15 +11876,19 @@ static int compare_int16(const void *a, const void *b)
* The code is an unholy mix of pieces that really belong to other parts
* of the source tree.
*
- * FIXME Check that the types are pass-by-value and support sort,
- * although maybe we can live without the sort (and only build
- * MCV list / association rules).
- *
- * FIXME This should probably check for duplicate stats (i.e. same
- * keys, same options). Although maybe it's useful to have
- * multiple stats on the same columns with different options
- * (say, a detailed MCV-only stats for some queries, histogram
- * for others, etc.)
+ * TODO Check that the types support sort, although maybe we can live
+ * without it (and only build MCV list / association rules).
+ *
+ * TODO This should probably check for duplicate stats (i.e. same
+ * keys, same options). Although maybe it's useful to have
+ * multiple stats on the same columns with different options
+ * (say, a detailed MCV-only stats for some queries, histogram
+ * for others, etc.)
+ *
+ * TODO It might be useful to have ALTER TABLE DROP STATISTICS too, but
+ * it's tricky because there may be multiple kinds of stats for the
+ * same list of columns, with different options (e.g. one just MCV
+ * list, another with histogram, etc.).
*/
static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
StatisticsDef *def, LOCKMODE lockmode)
@@ -11902,12 +11906,15 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
/* by default build everything */
bool build_dependencies = false,
- build_mcv = false;
+ build_mcv = false,
+ build_histogram = false;
- int32 max_mcv_items = -1;
+ int32 max_buckets = -1,
+ max_mcv_items = -1;
/* options required because of other options */
- bool require_mcv = false;
+ bool require_mcv = false,
+ require_histogram = false;
Assert(IsA(def, StatisticsDef));
@@ -11985,6 +11992,29 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
MVSTAT_MCVLIST_MAX_ITEMS)));
}
+ else if (strcmp(opt->defname, "histogram") == 0)
+ build_histogram = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_buckets") == 0)
+ {
+ max_buckets = defGetInt32(opt);
+
+ /* this option requires 'histogram' to be enabled */
+ require_histogram = true;
+
+ /* sanity check */
+ if (max_buckets < MVSTAT_HIST_MIN_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is %d",
+ MVSTAT_HIST_MIN_BUCKETS)));
+
+ else if (max_buckets > MVSTAT_HIST_MAX_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is %d",
+ MVSTAT_HIST_MAX_BUCKETS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -11993,10 +12023,10 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv))
+ if (! (build_dependencies || build_mcv || build_histogram))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -12004,6 +12034,11 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("option 'mcv' is required by other options(s)")));
+ if (require_histogram && (! build_histogram))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option 'histogram' is required by other options(s)")));
+
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
stakeys = buildint2vector(attnums, numcols);
@@ -12021,10 +12056,14 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+ values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
+
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
+ values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
+ nulls[Anum_pg_mv_statistic_stahist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index d24aedf..ea4d588 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -53,6 +53,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
+#define MV_CLAUSE_TYPE_HIST 0x04
static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
Oid *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
@@ -77,6 +78,8 @@ static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
List *clauses, MVStats mvstats,
bool *fullmatch, Selectivity *lowsel);
+static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
+ List *clauses, MVStats mvstats);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -84,6 +87,11 @@ static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
Selectivity *lowsel, bool *fullmatch,
bool is_or);
+static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MVHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or);
+
/* used for merging bitmaps - AND (min), OR (max) */
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
#define MIN(x, y) (((x) < (y)) ? (x) : (y))
@@ -266,7 +274,7 @@ clauselist_selectivity(PlannerInfo *root,
* From now on we're only interested in MCV-compatible clauses.
*/
mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
- MV_CLAUSE_TYPE_MCV);
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
/*
* If there still are at least two columns, we'll try to select
@@ -292,7 +300,7 @@ clauselist_selectivity(PlannerInfo *root,
/* split the clauselist into regular and mv-clauses */
clauses = clauselist_mv_split(root, sjinfo, clauses,
varRelid, &mvclauses, mvstat,
- MV_CLAUSE_TYPE_MCV);
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
/* we've chosen the histogram to match the clauses */
Assert(mvclauses != NIL);
@@ -1146,6 +1154,7 @@ static Selectivity
clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStats mvstats)
{
bool fullmatch = false;
+ Selectivity s1 = 0.0, s2 = 0.0;
/*
* Lowest frequency in the MCV list (may be used as an upper bound
@@ -1159,9 +1168,24 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStats mvstats)
* MCV/histogram evaluation).
*/
- /* Evaluate the MCV selectivity */
- return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ /* Evaluate the MCV first. */
+ s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
&fullmatch, &mcv_low);
+
+ /*
+ * If we got a full equality match on the MCV list, we're done (and
+ * the estimate is pretty good).
+ */
+ if (fullmatch && (s1 > 0.0))
+ return s1;
+
+ /* FIXME if (fullmatch) without matching MCV item, use the mcv_low
+ * selectivity as upper bound */
+
+ s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+
+ /* TODO clamp to <= 1.0 (or more strictly, when possible) */
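+
+ /*
+ * XXX The histogram is built only from the sample rows not covered
+ * by the MCV list (see build_mv_stats), so the two selectivities
+ * should refer to disjoint sets of rows and may simply be added.
+ */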
+ return s1 + s2;
}
/*
@@ -1461,7 +1485,6 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
bool ok;
/* is it 'variable op constant' ? */
-
ok = (bms_membership(clause_relids) == BMS_SINGLETON) &&
(is_pseudo_constant_clause_relids(lsecond(expr->args),
right_relids) ||
@@ -1515,10 +1538,10 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
case F_SCALARLTSEL:
case F_SCALARGTSEL:
/* not compatible with functional dependencies */
- if (types & MV_CLAUSE_TYPE_MCV)
+ if (types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
{
*attnums = bms_add_member(*attnums, var->varattno);
- return (types & MV_CLAUSE_TYPE_MCV);
+ return (types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
}
return false;
@@ -2435,3 +2458,515 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
return nmatches;
}
+
+/*
+ * Estimate selectivity of clauses using a histogram.
+ *
+ * If there's no histogram for the stats, the function returns 0.0.
+ *
+ * The general idea of this method is similar to how MCV lists are
+ * processed, except that this introduces the concept of a partial
+ * match (MCV only works with full match / mismatch).
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all buckets as 'full match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the buckets
+ * 4) skip buckets that are already 'no match'
+ * 5) check clause for buckets that still match (at least partially)
+ * 6) sum frequencies for buckets to get selectivity
+ *
+ * Unlike MCV lists, histograms have a concept of a partial match.
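+ *
+ * For example a bucket with range [1, 5] in some dimension matches
+ * a clause (a < 3) only partially - some of the tuples it represents
+ * may match, but we don't know how many of them.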
+ *
+ * TODO This only handles AND-ed clauses, but it might work for OR-ed
+ * lists too - it just needs to reverse the logic a bit. I.e. start
+ * with 'no match' for all buckets, and increase the match level
+ * for the clauses (and skip buckets that are 'full match').
+ *
+ * TODO This might use a similar shortcut to MCV lists - count buckets
+ * marked as partial/full match, and terminate once this drop to 0.
+ * Not sure if it's really worth it - for MCV lists a situation like
+ * this is not uncommon, but for histograms it's not that clear.
+ */
+static Selectivity
+clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
+ MVStats mvstats)
+{
+ int i;
+ Selectivity s = 0.0;
+ int nmatches = 0;
+ char *matches = NULL;
+ MVHistogram mvhist = NULL;
+
+ /* there's no histogram */
+ if (! mvstats->hist_built)
+ return 0.0;
+
+ /* fetch and deserialize the histogram from the catalog */
+ mvhist = deserialize_mv_histogram(fetch_mv_histogram(mvstats->mvoid));
+
+ Assert (mvhist != NULL);
+ Assert (clauses != NIL);
+ Assert (list_length(clauses) >= 2);
+
+ /*
+ * Bitmap of bucket matches (mismatch, partial, full). By default
+ * all buckets fully match, and the clauses gradually eliminate them.
+ */
+ matches = palloc0(sizeof(char) * mvhist->nbuckets);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+
+ nmatches = mvhist->nbuckets;
+
+ /* build the match bitmap */
+ update_match_bitmap_histogram(root, clauses,
+ mvstats->stakeys, mvhist,
+ nmatches, matches, false);
+
+ /* now, walk through the buckets and sum the selectivities */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ if (matches[i] == MVSTATS_MATCH_FULL)
+ s += mvhist->buckets[i]->ntuples;
+ else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+ s += 0.5 * mvhist->buckets[i]->ntuples;
+ }
+
+ /* release the allocated bitmap and deserialized histogram */
+ pfree(matches);
+ pfree(mvhist);
+
+ return s;
+}
+
+/*
+ * Evaluate clauses using the histogram, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * Note: This is not a simple bitmap in the sense that there are more
+ * than two possible values for each item - no match, partial
+ * match and full match. So we need 2 bits per item.
+ *
+ * TODO This works with 'bitmap' where each item is represented as a
+ * char, which is slightly wasteful. Instead, we could use a bitmap
+ * with 2 bits per item, reducing the size to ~1/4. By using values
+ * 0, 1 and 3 (instead of 0, 1 and 2), the operations (merging etc.)
+ * might be performed just like for simple bitmap by using & and |,
+ * which might be faster than min/max.
+ */
+static int
+update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MVHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ Assert (mvhist != NULL);
+ Assert(mvhist->nbuckets > 0);
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mvhist->nbuckets);
+
+ Assert (clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+
+ /* loop through the clauses and do the estimation */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* it's either an OpExpr, a NullTest, or an AND/OR clause */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ FmgrInfo opproc; /* operator */
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+ FmgrInfo ltproc;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ /*
+ * TODO Fetch only when really needed (probably for equality only)
+ *
+ * TODO Technically either lt/gt is sufficient.
+ *
+ * FIXME The code in analyze.c creates histograms only for types
+ * with enough ordering (by calling get_sort_group_operators).
+ * Is this the same assumption, i.e. are we certain that we
+ * get the ltproc/gtproc every time we ask? Or are there types
+ * where get_sort_group_operators returns ltopr and here we
+ * get nothing?
+ */
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_EQ_OPR | TYPECACHE_LT_OPR
+ | TYPECACHE_GT_OPR);
+
+ /* lookup dimension for the attribute */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), &ltproc);
+
+ /*
+ * Check this for all buckets that still have "true" in the bitmap
+ *
+ * We already know the clauses use suitable operators (because that's
+ * how we filtered them).
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ bool tmp;
+ MVBucket bucket = mvhist->buckets[i];
+
+ /*
+ * For AND-lists, we can also mark NULL buckets as 'no match'
+ * (and then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && bucket->nullsonly[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match).
+ * We can't really do anything about the MATCH_PARTIAL buckets.
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /*
+ * TODO Maybe it's possible to add here a similar optimization
+ * as for the MCV lists:
+ *
+ * (nmatches == 0) && AND-list => all eliminated (FALSE)
+ * (nmatches == N) && OR-list => all eliminated (TRUE)
+ *
+ * But it's more complex because of the partial matches.
+ */
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ *
+ * TODO I'm really unsure the handling of 'isgt' flag (that is, clauses
+ * with reverse order of variable/constant) is correct. I wouldn't
+ * be surprised if there was some mixup. Using the lt/gt operators
+ * instead of messing with the opproc could make it simpler.
+ * It would however be using a different operator than the query,
+ * although it's not any shadier than using the selectivity function
+ * as is done currently.
+ *
+ * FIXME Once the min/max values are deduplicated, we can easily minimize
+ * the number of calls to the comparator (assuming we keep the
+ * deduplicated structure). See the note on compression at MVBucket
+ * serialize/deserialize methods.
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_SCALARLTSEL: /* column < constant */
+
+ if (! isgt) /* (var < const) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->min[idx]));
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the upper boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->max[idx]));
+
+ if (tmp)
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+
+ }
+ else /* (const < var) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ bucket->max[idx],
+ cst->constvalue));
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the lower boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ bucket->min[idx],
+ cst->constvalue));
+
+ if (tmp)
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ }
+ break;
+
+ case F_SCALARGTSEL: /* column > constant */
+
+ if (! isgt) /* (var > const) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->max[idx]));
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the lower boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->min[idx]));
+
+ if (tmp)
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ }
+ else /* (const > var) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in
+ * that case we can skip the bucket, because there's no overlap).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ bucket->min[idx],
+ cst->constvalue));
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the upper boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ bucket->max[idx],
+ cst->constvalue));
+
+ if (tmp)
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ }
+ break;
+
+ case F_EQSEL:
+
+ /*
+ * We only check whether the value is within the bucket, using the lt/gt
+ * operators fetched from type cache.
+ *
+ * TODO We'll use the default 50% estimate, but that's probably way off
+ * if there are multiple distinct values. Consider tweaking this
+ * somehow, e.g. using only a part inversely proportional to the
+ * estimated number of distinct values in the bucket.
+ *
+ * TODO This does not handle inclusion flags at the moment, thus counting
+ * some buckets twice (when hitting the boundary).
+ *
+ * TODO Optimization is that if max[i] == min[i], it's effectively a MCV
+ * item and we can count the whole bucket as a complete match (thus
+ * using 100% bucket selectivity and not just 50%).
+ *
+ * TODO Technically some buckets may "degenerate" into single-value
+ * buckets (not necessarily for all the dimensions) - maybe this
+ * is better than keeping a separate MCV list (multi-dimensional).
+ * Update: Actually, that's unlikely to be better than a separate
+ * MCV list for two reasons - first, it requires ~2x the space
+ * (because of storing lower/upper boundaries) and second because
+ * the buckets are ranges - depending on the partitioning algorithm
+ * it may not even degenerate into a (min=max) bucket. For example
+ * the current partitioning algorithm never does that.
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&ltproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ bucket->min[idx]));
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ tmp = DatumGetBool(FunctionCall2Coll(&ltproc,
+ DEFAULT_COLLATION_OID,
+ bucket->max[idx],
+ cst->constvalue));
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+
+ break;
+ }
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the buckets and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining buckets that might possibly match.
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVBucket bucket = mvhist->buckets[i];
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match)
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* if the clause mismatches the bucket, set it as MATCH_NONE */
+ if ((expr->nulltesttype == IS_NULL)
+ && (! bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+
+ else if ((expr->nulltesttype == IS_NOT_NULL) &&
+ (bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each bucket */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching buckets */
+ or_nmatches = mvhist->nbuckets;
+
+ /* match bitmap for the nested clauses (initialized below) */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the OR-clauses */
+ or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ stakeys, mvhist,
+ or_nmatches, or_matches, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ {
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+ }
+
+ return nmatches;
+}
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 3c0aff4..9dbb3b6 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o mcv.o dependencies.o
+OBJS = common.o dependencies.o histogram.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index bd952c6..6e824bd 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -45,7 +45,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
{
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
- int numrows_filtered = 0;
+ MVHistogram histogram = NULL;
+ int numrows_filtered = numrows;
/* int2 vector of attnums the stats should be computed on */
int2vector * attrs = mvstats[i].stakeys;
@@ -66,8 +67,16 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (mvstats->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+ /*
+ * Build a multivariate histogram on the columns, using only the
+ * rows not covered by the MCV list (numrows_filtered), so that
+ * the two statistics do not overlap.
+ */
+ if ((numrows_filtered > 0) && (mvstats->hist_enabled))
+ histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(mvstats[i].mvoid, deps, mcvlist, attrs, stats);
+ update_mv_stats(mvstats[i].mvoid, deps, mcvlist, histogram, attrs, stats);
+
+#ifdef MVSTATS_DEBUG
+ print_mv_histogram_info(histogram);
+#endif
}
}
@@ -149,7 +158,7 @@ list_mv_stats(Oid relid, int *nstats, bool built_only)
* Skip statistics that were not computed yet (if only stats
* that were already built were requested)
*/
- if (built_only && (! (stats->mcv_built || stats->deps_built)))
+ if (built_only && (! (stats->mcv_built || stats->deps_built || stats->hist_built)))
continue;
/* double the array size if needed */
@@ -161,10 +170,15 @@ list_mv_stats(Oid relid, int *nstats, bool built_only)
result[*nstats].mvoid = HeapTupleGetOid(htup);
result[*nstats].stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+
result[*nstats].deps_enabled = stats->deps_enabled;
result[*nstats].mcv_enabled = stats->mcv_enabled;
+ result[*nstats].hist_enabled = stats->hist_enabled;
+
result[*nstats].deps_built = stats->deps_built;
result[*nstats].mcv_built = stats->mcv_built;
+ result[*nstats].hist_built = stats->hist_built;
+
*nstats += 1;
}
@@ -178,9 +192,16 @@ list_mv_stats(Oid relid, int *nstats, bool built_only)
return result;
}
+/*
+ * FIXME This adds statistics, but we need to drop statistics when the
+ * table is dropped. Not sure what to do when a column is dropped.
+ * Either we can (a) remove all stats on that column, (b) remove
+ * the column from defined stats and force rebuild, (c) remove the
+ * column on next ANALYZE. Or maybe something else?
+ */
void
update_mv_stats(Oid mvoid,
- MVDependencies dependencies, MCVList mcvlist,
+ MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
@@ -213,19 +234,31 @@ update_mv_stats(Oid mvoid,
values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
}
+ if (histogram != NULL)
+ {
+ bytea * data = serialize_mv_histogram(histogram, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stahist-1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stahist - 1]
+ = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
+ replaces[Anum_pg_mv_statistic_stahist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
+ nulls[Anum_pg_mv_statistic_hist_built-1] = false;
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_hist_built -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
+ values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
/* Is there already a pg_mv_statistic tuple for this attribute? */
oldtup = SearchSysCache1(MVSTATOID,
diff --git a/src/backend/utils/mvstats/histogram.c b/src/backend/utils/mvstats/histogram.c
new file mode 100644
index 0000000..2a7f660
--- /dev/null
+++ b/src/backend/utils/mvstats/histogram.c
@@ -0,0 +1,1800 @@
+/*-------------------------------------------------------------------------
+ *
+ * histogram.c
+ * POSTGRES multivariate histograms
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/histogram.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+/*
+ * Multivariate histograms
+ *
+ * Histograms are a collection of buckets, represented by n-dimensional
+ * rectangles. Each rectangle is delimited by a min/max value in each
+ * dimension, stored in an array, so that the bucket includes values
+ * fulfilling condition
+ *
+ * min[i] <= value[i] <= max[i]
+ *
+ * where 'i' is the dimension. In 1D this corresponds to a simple
+ * interval, in 2D to a rectangle, and in 3D to a block. If you can
+ * imagine this in 4D, congrats!
+ *
+ * In addition to the boundaries, each bucket tracks additional details:
+ *
+ * * frequency (fraction of tuples it matches)
+ * * whether the boundaries are inclusive or exclusive
+ * * whether the dimension contains only NULL values
+ * * number of distinct values in each dimension (for building)
+ *
+ * and possibly some additional information.
+ *
+ * We do expect to support multiple histogram types, with different
+ * features etc. The 'type' field is used to identify those types.
+ * Technically some histogram types might use completely different
+ * bucket representation, but that's not expected at the moment.
+ *
+ * Although the current implementation builds non-overlapping buckets,
+ * the code does not rely on the non-overlapping nature - there are
+ * interesting types of histograms / histogram building algorithms
+ * producing overlapping buckets.
+ *
+ * TODO Currently the histogram does not include information about what
+ * part of the table it covers (because the frequencies are
+ * computed from the rows that may be filtered by the MCV list). Seems
+ * wrong, possibly causing misestimates (when not matching the MCV
+ * list, we'll probably get much higher selectivity).
+ *
+ *
+ * Estimating selectivity
+ * ----------------------
+ * With histograms, we always "match" a whole bucket, not individual
+ * rows (or values), irrespective of the type of clause. Therefore we
+ * can't use the optimizations for equality clauses, as in MCV lists.
+ *
+ * The current implementation uses histograms to estimate these types
+ * of clauses (think of WHERE conditions):
+ *
+ * (a) equality clauses WHERE (a = 1) AND (b = 2)
+ * (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ *
+ * It's possible to add more clauses, for example:
+ *
+ * (a) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ * (b) multi-var clauses WHERE (a > b)
+ *
+ * and so on. These are tasks for the future, not yet implemented.
+ *
+ * When used on low-cardinality data, histograms usually perform
+ * considerably worse than MCV lists (which are a good fit for this
+ * kind of data). This is especially true on categorical data, where
+ * ordering of the values is only loosely related to the meaning of the
+ * data, as proper ordering is crucial for histograms.
+ *
+ * On high-cardinality data the histograms are usually a better choice,
+ * because MCV lists can't accurately represent the distribution.
+ *
+ * By evaluating a clause on a bucket, we may get one of three results:
+ *
+ * (a) FULL_MATCH - The bucket definitely matches the clause.
+ *
+ * (b) PARTIAL_MATCH - The bucket matches the clause, but not
+ * necessarily all the tuples it represents.
+ *
+ * (c) NO_MATCH - The bucket definitely does not match the clause.
+ *
+ * This may be illustrated using a range [1, 5], which is essentially
+ * a 1D bucket. With clause
+ *
+ * WHERE (a < 10) => FULL_MATCH (all range values are below
+ * 10, so the whole bucket matches)
+ *
+ * WHERE (a < 3) => PARTIAL_MATCH (there may be values matching
+ * the clause, but we don't know how many)
+ *
+ * WHERE (a < 0) => NO_MATCH (all range values are at least 1, so
+ * no values from the bucket match)
+ *
+ * Some clauses may produce only some of those results - for example
+ * equality clauses may never produce FULL_MATCH as we always hit only
+ * part of the bucket, not all the values. This results in less accurate
+ * estimates compared to MCV lists, where we can hit an MCV item exactly
+ * (an extreme case of that is 'full match').
+ *
+ * There are clauses that may not produce any PARTIAL_MATCH results.
+ * A nice example of that is 'IS [NOT] NULL' clause, which either
+ * matches the bucket completely (FULL_MATCH) or not at all (NO_MATCH),
+ * thanks to how the NULL-buckets are constructed.
+ *
+ * TODO The IS [NOT] NULL clause is not yet implemented, but should be
+ * rather trivial to add.
+ *
+ * Computing the total selectivity estimate is trivial - simply sum
+ * selectivities from all the FULL_MATCH and PARTIAL_MATCH buckets, but
+ * multiply the PARTIAL_MATCH buckets by 0.5 to minimize average error.
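+ *
+ * For example, assume three buckets with frequencies 0.2, 0.1 and
+ * 0.1, where the first one is a FULL_MATCH, the second one a
+ * PARTIAL_MATCH and the last one a NO_MATCH. Then the estimate is
+ *
+ * 0.2 + 0.5 * 0.1 + 0.0 = 0.25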
+ *
+ *
+ * NULL handling
+ * -------------
+ * Buckets may not contain tuples with NULL and non-NULL values in
+ * a single dimension (attribute). To handle this, the histogram may
+ * contain NULL-buckets, i.e. buckets with one or more NULL-only
+ * dimensions.
+ *
+ * The maximum number of NULL-buckets is determined by the number of
+ * attributes the histogram is built on. For N-dimensional histogram,
+ * the maximum number of NULL-buckets is 2^N. So for 8 attributes
+ * (which is the current value of MVSTATS_MAX_DIMENSIONS), there may be
+ * up to 256 NULL-buckets.
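+ *
+ * For example, for two columns (a,b) there may be buckets with
+ * dimensions (NULL, NULL), (NULL, non-NULL) and (non-NULL, NULL),
+ * in addition to the regular (non-NULL, non-NULL) buckets.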
+ *
+ * Those buckets are only built if needed - if there are no NULL values
+ * in the data, no such buckets are built.
+ *
+ *
+ * Serialization
+ * -------------
+ * After building, the histogram is serialized into a more efficient
+ * form (dedup boundary values etc.). See serialize_mv_histogram() for
+ * more details about how it's done.
+ *
+ * Serialized histograms are marked with 'magic' constant, to make it
+ * easier to check the bytea really is a histogram in serialized form.
+ *
+ *
+ * TODO This structure is used both when building the histogram, and
+ * then when using it to compute estimates. That's why the last
+ * few elements are not used once the histogram is built.
+ *
+ * Add a pointer to 'private' data, meant for data private to the
+ * various histogram-building algorithms. That would also remove
+ * the bogus / unnecessary fields.
+ *
+ * TODO The limit on number of buckets is quite arbitrary, aiming for
+ * sufficient accuracy while still being fast. Probably should be
+ * replaced with a dynamic limit dependent on statistics target,
+ * number of attributes (dimensions) and statistics target
+ * associated with the attributes. Also, this needs to be related
+ * to the number of sampled rows, by either clamping it to a
+ * reasonable number (after seeing the number of rows) or using
+ * it when computing the number of rows to sample. Something like
+ * 10 rows per bucket seems reasonable.
+ *
+ * TODO Add MVSTAT_HIST_ROWS_PER_BUCKET tracking minimal number of
+ * tuples per bucket (also, see the previous TODO).
+ *
+ * TODO We may replace the bool arrays with a suitably large data type
+ * (say, uint16 or uint32) and get rid of the allocations. It's
+ * unlikely we'll ever support more than 32 columns as that'd
+ * result in poor precision, huge histograms (splitting each
+ * dimension once would mean 2^32 buckets), and very expensive
+ * estimation. MCVItem already does it this way.
+ *
+ * Update: Actually, this is not 100% true, because we're splitting
+ * a single bucket, not all the buckets at the same time. So each
+ * split simply adds one new bucket, and we choose the bucket that
+ * is most in need of a split. So even with 32 columns this might
+ * give reasonable accuracy, maybe? After 1000 splits we'll get
+ * about 1001 buckets, and some may be quite large (if that area
+ * has a low frequency of tuples).
+ *
+ * There are other challenges though - e.g. with this many columns
+ * it's more likely to reference both label/non-label columns,
+ * which is rather quirky (especially with histograms).
+ *
+ * However, while this would save some space for histograms built
+ * on many columns, it won't save anything for up to 4 columns
+ * (actually, on less than 3 columns it's probably wasteful).
+ *
+ * TODO Maybe the distinct stats (both for combination of all columns
+ * and for combinations of various subsets of columns) should be
+ * moved to a separate structure (next to histogram/MCV/...) to
+ * make it useful even without a histogram computed etc.
+ */
+
+static MVBucket create_initial_mv_bucket(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+static MVBucket select_bucket_to_partition(int nbuckets, MVBucket * buckets);
+
+static MVBucket partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats);
+
+static MVBucket copy_mv_bucket(MVBucket bucket, uint32 ndimensions);
+
+static void update_bucket_ndistinct(MVBucket bucket, int2vector *attrs,
+ VacAttrStats ** stats);
+
+static void update_dimension_ndistinct(MVBucket bucket, int dimension,
+ int2vector *attrs,
+ VacAttrStats ** stats,
+ bool update_boundaries);
+
+static void create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats);
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Each serialized bucket needs to store (in this order):
+ *
+ * - number of tuples (float)
+ * - min inclusive flags (ndim * sizeof(bool))
+ * - max inclusive flags (ndim * sizeof(bool))
+ * - null dimension flags (ndim * sizeof(bool))
+ * - min boundary indexes (ndim * sizeof(int32))
+ * - max boundary indexes (ndim * sizeof(int32))
+ *
+ * So in total:
+ *
+ * ndim * (2 * sizeof(int32) + 3 * sizeof(bool)) +
+ * sizeof(float)
+ */
+#define BUCKET_SIZE(ndims) \
+ (ndims * (2 * sizeof(int32) + 3 * sizeof(bool)) + sizeof(float))
+
+/* pointers into a flat serialized bucket of BUCKET_SIZE(n) bytes */
+#define BUCKET_NTUPLES(b) ((float*)b)
+#define BUCKET_MIN_INCL(b,n) ((bool*)(b + sizeof(float)))
+#define BUCKET_MAX_INCL(b,n) (BUCKET_MIN_INCL(b,n) + n)
+#define BUCKET_NULLS_ONLY(b,n) (BUCKET_MAX_INCL(b,n) + n)
+#define BUCKET_MIN_INDEXES(b,n) ((int32*)(BUCKET_NULLS_ONLY(b,n) + n))
+#define BUCKET_MAX_INDEXES(b,n) ((BUCKET_MIN_INDEXES(b,n) + n))
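+
+/*
+ * For example, for a 2-dimensional bucket (and assuming 1B booleans
+ * and 4B floats), the serialized layout is:
+ *
+ * [ntuples 4B][min_incl 2B][max_incl 2B][nulls_only 2B]
+ * [min_indexes 2 x 4B][max_indexes 2 x 4B]
+ */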
+
+/* some debugging methods */
+#ifdef MVSTATS_DEBUG
+static void print_mv_histogram_info(MVHistogram histogram);
+#endif
+
+/*
+ * Data used while building the histogram.
+ */
+typedef struct HistogramBuildData {
+
+ float ndistinct; /* frequency of distinct values */
+
+ HeapTuple *rows; /* array of sample rows */
+ uint32 numrows; /* number of sample rows (array size) */
+
+ /* index of the dimension the bucket was split previously */
+ int last_split_dimension;
+
+ /*
+ * Number of distinct values in each dimension. This is used when
+ * building the histogram (and is not serialized/deserialized).
+ *
+ * XXX Maybe it could be useful for improving ndistinct estimates for
+ * combinations of columns (e.g. in GROUP BY queries). It would
+ * probably mean tracking 2^N values for each bucket, and even if
+ * those values might be stored in 1B (which is unlikely) it's
+ * still a lot of space (considering the expected number of
+ * buckets). So maybe that might be tracked just at the top level.
+ *
+ * TODO Consider tracking ndistincts for all attribute combinations.
+ */
+ uint32 *ndistincts;
+
+} HistogramBuildData;
+
+typedef HistogramBuildData *HistogramBuild;
+
+/*
+ * Building a multivariate histogram. In short, it first creates a single
+ * bucket containing all the rows, and then repeatedly splits it by first
+ * searching for the bucket / dimension most in need of a split.
+ *
+ * The current criteria is rather simple, by looking at the number of
+ * distinct values (combinations of column values for a bucket, column
+ * values for a dimension). This is somewhat naive, but seems to work
+ * quite well. See the discussion at select_bucket_to_partition and
+ * partition_bucket for more details about alternative algorithms.
+ *
+ * So the current algorithm looks like this:
+ *
+ * build NULL-buckets (create_null_buckets)
+ *
+ * while [not reaching maximum number of buckets]
+ *
+ * choose bucket to partition (max distinct combinations)
+ * if no bucket to partition
+ * terminate the algorithm
+ *
+ * choose bucket dimension to partition (max distinct values)
+ * split the bucket into two buckets
+ */
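+
+/*
+ * For example, with a limit of N buckets the loop below performs at
+ * most (N-1) splits, as each split adds exactly one bucket (a few
+ * more buckets may be created up front by create_null_buckets).
+ */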
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ MVHistogram histogram = (MVHistogram)palloc0(sizeof(MVHistogramData));
+
+ HeapTuple * rows_copy = (HeapTuple*)palloc0(numrows * sizeof(HeapTuple));
+ memcpy(rows_copy, rows, sizeof(HeapTuple) * numrows);
+
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ histogram->ndimensions = numattrs;
+
+ histogram->magic = MVSTAT_HIST_MAGIC;
+ histogram->type = MVSTAT_HIST_TYPE_BASIC;
+ histogram->nbuckets = 1;
+
+ /* create max buckets (better than repalloc for short-lived objects) */
+ histogram->buckets = (MVBucket*)palloc0(MVSTAT_HIST_MAX_BUCKETS * sizeof(MVBucket));
+
+ /* create the initial bucket, covering the whole sample set */
+ histogram->buckets[0]
+ = create_initial_mv_bucket(numrows, rows_copy, attrs, stats);
+
+ /*
+ * The initial bucket may contain NULL values, so we have to create
+ * buckets with NULL-only dimensions.
+ *
+ * FIXME We may need up to 2^ndims buckets - check that there are
+ * enough buckets (MVSTAT_HIST_MAX_BUCKETS >= 2^ndims).
+ */
+ create_null_buckets(histogram, 0, attrs, stats);
+
+ while (histogram->nbuckets < MVSTAT_HIST_MAX_BUCKETS)
+ {
+ MVBucket bucket = select_bucket_to_partition(histogram->nbuckets,
+ histogram->buckets);
+
+ /* no more buckets to partition */
+ if (bucket == NULL)
+ break;
+
+ histogram->buckets[histogram->nbuckets]
+ = partition_bucket(bucket, attrs, stats);
+
+ histogram->nbuckets += 1;
+ }
+
+ /* finalize the frequencies etc. */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ HistogramBuild build_data = ((HistogramBuild)histogram->buckets[i]->build_data);
+ histogram->buckets[i]->ntuples
+ = (build_data->numrows * 1.0) / numrows_total;
+ }
+
+ return histogram;
+}
+
+/* fetch the histogram (as a bytea) from the pg_mv_statistic catalog */
+bytea *
+fetch_mv_histogram(Oid mvoid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ bytea *stahist = NULL;
+
+ /* Prepare to scan pg_mv_statistic for the entry with the given OID. */
+ ScanKeyInit(&skey,
+ ObjectIdAttributeNumber,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(mvoid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticOidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ bool isnull = false;
+ Datum hist = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stahist, &isnull);
+
+ Assert(!isnull);
+
+ stahist = DatumGetByteaP(hist);
+
+ break;
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /*
+ * TODO Maybe save the histogram into relcache, as in RelationGetIndexList
+ * (which served as an inspiration for this one)?
+ */
+
+ return stahist;
+}
+
+/* print some basic info about the histogram */
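+/*
+ * A sketch of the intended usage, assuming the function is exposed
+ * in SQL under the same name:
+ *
+ * SELECT pg_mv_stats_histogram_info(stahist) FROM pg_mv_statistic;
+ */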
+Datum
+pg_mv_stats_histogram_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVHistogram hist = deserialize_mv_histogram(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nbuckets=%d", hist->nbuckets);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+
+/* used to pass context into bsearch() */
+static SortSupport ssup_private = NULL;
+
+/*
+ * Serialize the MV histogram into a bytea value. The basic algorithm
+ * is simple, and mostly mimics the MCV serialization:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ * (a) collect all (non-NULL) attribute values from all buckets
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all buckets
+ * (a) replace min/max values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, because we're mixing
+ * different datatypes, and we don't know what equality means for them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ * We'll use 32-bit values for the indexes in step (3), although we
+ * could probably use just 16 bits as we don't allow more than 8k
+ * buckets in the histogram (max_buckets). Well, we might increase
+ * this to 16k and still fit into signed 16 bits. But let's be lazy
+ * and rely on the varlena compression to kick in - most bytes will
+ * be 0x00, so it should compress nicely.
+ *
+ *
+ * Deduplication in serialization
+ * ------------------------------
+ * The deduplication is very effective and important here, because every
+ * time we split a bucket, we keep all the boundary values, except for
+ * the dimension that was used for the split. Another way to look at
+ * this is that each split introduces 1 new value (the value used to do
+ * the split). A histogram with M buckets was created by (M-1) splits
+ * of the initial bucket, and each bucket has 2*N boundary values. So
+ * assuming the initial bucket does not have any 'collapsed' dimensions,
+ * the number of distinct values is
+ *
+ * (2*N + (M-1))
+ *
+ * but the total number of boundary values is
+ *
+ * 2*N*M
+ *
+ * which is clearly much higher. For a histogram on two columns, with
+ * 1024 buckets, it's 1027 vs. 4096. Of course, we're not saving all
+ * the difference (because we'll use 32-bit indexes into the values).
+ * But with large values (e.g. stored as varlena), this saves a lot.
+ *
+ * An interesting feature is that the total number of distinct values
+ * does not really grow with the number of dimensions, except for the
+ * size of the initial bucket. After that it only depends on number of
+ * buckets (i.e. number of splits).
+ *
+ * XXX Of course this only holds for the current histogram building
+ * algorithm. Algorithms doing the splits differently (e.g.
+ * producing overlapping buckets) may behave differently.
+ *
+ * TODO This only confirms we can use the uint16 indexes. The worst
+ * that could happen is if all the splits happened by a single
+ * dimension. To exhaust the uint16 this would require ~64k
+ * splits (needs to be reflected in MVSTAT_HIST_MAX_BUCKETS).
+ *
+ * TODO We don't need to use a separate boolean for each flag, instead
+ * use a single char and set bits.
+ *
+ * TODO We might get a bit better compression by considering the actual
+ * data type length. The current implementation treats all data
+ * types passed by value as requiring 8B, but for INT it's actually
+ * just 4B etc.
+ *
+ * OTOH this is only related to the lookup table, and most of the
+ * space is occupied by the buckets (with int16 indexes).
+ *
+ *
+ * Varlena compression
+ * -------------------
+ * This encoding may prevent automatic varlena compression (similarly
+ * to JSONB), because the first part of the serialized bytea will be
+ * an array of unique values (although sorted), and pglz decides
+ * whether to compress by trying to compress the first part (~1kB or
+ * so), which is likely to compress poorly due to the lack of
+ * repetition.
+ *
+ * One possible cure to that might be storing the buckets first, and
+ * then the deduplicated arrays. The buckets might be better suited
+ * for compression.
+ *
+ * On the other hand the encoding scheme is a context-aware compression,
+ * usually compressing to ~30% (or less, with large data types). So the
+ * lack of pglz compression may be OK.
+ *
+ * XXX But maybe we don't really want to compress this, to save on
+ * planning time?
+ *
+ * TODO Try storing the buckets / deduplicated arrays in reverse order,
+ * measure impact on compression.
+ *
+ *
+ * Deserialization
+ * ---------------
+ * The deserialization is currently implemented so that it reconstructs
+ * the histogram back into the same structures - this involves quite
+ * a few memcpy() and palloc() calls, but maybe we could create a special
+ * structure for the serialized histogram, and access the data directly,
+ * without the unpacking.
+ *
+ * Not only would it save some memory and CPU time, but it might actually
+ * work better with CPU caches (not polluting the caches).
+ *
+ * TODO Try to keep the compressed form, instead of deserializing it to
+ * MVHistogram/MVBucket.
+ *
+ *
+ * General TODOs
+ * -------------
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
+bytea *
+serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i = 0, j = 0;
+ Size total_length = 0;
+
+ bytea *output = NULL;
+ char *data = NULL;
+
+ int nbuckets = histogram->nbuckets;
+ int ndims = histogram->ndimensions;
+
+ /* allocated for serialized bucket data */
+ int bucketsize = BUCKET_SIZE(ndims);
+ char *bucket = palloc0(bucketsize);
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension separately */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /*
+ * Allocate space for all min/max values, including NULLs
+ * (we won't use them, but we don't know how many there are),
+ * and then collect all non-NULL values.
+ */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * nbuckets * 2);
+
+ for (j = 0; j < histogram->nbuckets; j++)
+ {
+ /* skip buckets where this dimension is NULL-only */
+ if (! histogram->buckets[j]->nullsonly[i])
+ {
+ values[i][counts[i]] = histogram->buckets[j]->min[i];
+ counts[i] += 1;
+
+ values[i][counts[i]] = histogram->buckets[j]->max[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typbyval)
+ /*
+ * passed by value, so just Datum array (int4, int8, ...)
+ *
+ * TODO Might save a few bytes here, by storing just typlen
+ * bytes instead of whole Datum (8B) on 64-bits.
+ */
+ info[i].nbytes = info[i].nvalues * sizeof(Datum);
+ else if (info[i].typlen > 0)
+ /* passed by reference, but fixed length (name, tid, ...) */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so simply strlen */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j]));
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * histogram, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nbuckets (4B)
+ * - info (ndim * sizeof(DimensionInfo))
+ * - arrays of values for each dimension
+ * - serialized buckets (nbuckets * bucketsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data (and buckets).
+ */
+ total_length = (sizeof(int32) + offsetof(MVHistogramData, buckets)
+ + ndims * sizeof(DimensionInfo)
+ + nbuckets * bucketsize);
+
+ /* account for the deduplicated data */
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 1MB */
+ if (total_length > 1024 * 1024)
+ elog(ERROR, "serialized histogram exceeds 1MB (%ld)", total_length);
+
+ /* allocate space for the serialized histogram list, set header */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write data */
+ data = VARDATA(output);
+
+ memcpy(data, histogram, offsetof(MVHistogramData, buckets));
+ data += offsetof(MVHistogramData, buckets);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum */
+ memcpy(data, &values[i][j], sizeof(Datum));
+ data += sizeof(Datum);
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the histogram buckets */
+ for (i = 0; i < nbuckets; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - bucketsize);
+
+ /* reset the values for each item */
+ memset(bucket, 0, bucketsize);
+
+ *BUCKET_NTUPLES(bucket) = histogram->buckets[i]->ntuples;
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! histogram->buckets[i]->nullsonly[j])
+ {
+ int idx;
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ /* min boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->min[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MIN_INDEXES(bucket, ndims)[j] = idx;
+
+ /* max boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->max[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MAX_INDEXES(bucket, ndims)[j] = idx;
+ }
+ }
+
+ /* copy flags (nulls, min/max inclusive) */
+ memcpy(BUCKET_NULLS_ONLY(bucket, ndims),
+ histogram->buckets[i]->nullsonly, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MIN_INCL(bucket, ndims),
+ histogram->buckets[i]->min_inclusive, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MAX_INCL(bucket, ndims),
+ histogram->buckets[i]->max_inclusive, sizeof(bool) * ndims);
+
+ /* copy the item into the array */
+ memcpy(data, bucket, bucketsize);
+
+ data += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ /* FIXME free the values/counts arrays here */
+
+ return output;
+}
+
+/*
+ * Reverse of serialize_mv_histogram. This essentially expands the serialized
+ * form back to MVHistogram / MVBucket.
+ */
+MVHistogram
+deserialize_mv_histogram(bytea * data)
+{
+ int i = 0, j = 0;
+
+ Size expected_size;
+ char *tmp = NULL;
+ Datum **values = NULL;
+
+ MVHistogram histogram;
+ DimensionInfo *info;
+
+ int nbuckets;
+ int ndims;
+ int bucketsize;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVHistogramData,buckets))
+ elog(ERROR, "invalid histogram size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVHistogramData,buckets));
+
+ /* read the histogram header */
+ histogram = (MVHistogram)palloc0(sizeof(MVHistogramData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(histogram, tmp, offsetof(MVHistogramData, buckets));
+ tmp += offsetof(MVHistogramData, buckets);
+
+ if (histogram->magic != MVSTAT_HIST_MAGIC)
+ elog(ERROR, "invalid histogram magic %d (expected %dd)",
+ histogram->magic, MVSTAT_HIST_MAGIC);
+
+ if (histogram->type != MVSTAT_HIST_TYPE_BASIC)
+ elog(ERROR, "invalid histogram type %d (expected %dd)",
+ histogram->type, MVSTAT_HIST_TYPE_BASIC);
+
+ nbuckets = histogram->nbuckets;
+ ndims = histogram->ndimensions;
+ bucketsize = BUCKET_SIZE(ndims);
+
+ Assert((nbuckets > 0) && (nbuckets <= MVSTAT_HIST_MAX_BUCKETS));
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * What size do we expect with those parameters? It's incomplete at
+ * this point, as we have yet to add the sizes of the value arrays
+ * (from the DimensionInfo records).
+ */
+ expected_size = offsetof(MVHistogramData,buckets) +
+ ndims * sizeof(DimensionInfo) +
+ (nbuckets * bucketsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /* let's parse the value arrays */
+ values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mcv_list(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. This needs to
+ * copy the pieces.
+ *
+ * TODO same as in MCV deserialization / consider moving to common.c
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum - simply reuse the array */
+ values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * info[i].nvalues);
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * info[i].nvalues);
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * info[i].nvalues);
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+
+ /* allocate space for the buckets */
+ histogram->buckets = (MVBucket*)palloc0(sizeof(MVBucket) * nbuckets);
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ MVBucket bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ bucket->nullsonly = (bool*) palloc0(sizeof(bool) * ndims);
+ bucket->min_inclusive = (bool*) palloc0(sizeof(bool) * ndims);
+ bucket->max_inclusive = (bool*) palloc0(sizeof(bool) * ndims);
+
+ bucket->min = (Datum*) palloc0(sizeof(Datum) * ndims);
+ bucket->max = (Datum*) palloc0(sizeof(Datum) * ndims);
+
+ bucket->ntuples = *BUCKET_NTUPLES(tmp);
+
+ memcpy(bucket->nullsonly, BUCKET_NULLS_ONLY(tmp, ndims),
+ sizeof(bool) * ndims);
+
+ memcpy(bucket->min_inclusive, BUCKET_MIN_INCL(tmp, ndims),
+ sizeof(bool) * ndims);
+
+ memcpy(bucket->max_inclusive, BUCKET_MAX_INCL(tmp, ndims),
+ sizeof(bool) * ndims);
+
+ /* translate the indexes to values */
+ for (j = 0; j < ndims; j++)
+ {
+ if (! bucket->nullsonly[j])
+ {
+ bucket->min[j] = values[j][BUCKET_MIN_INDEXES(tmp, ndims)[j]];
+ bucket->max[j] = values[j][BUCKET_MAX_INDEXES(tmp, ndims)[j]];
+ }
+ }
+
+ histogram->buckets[i] = bucket;
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+
+ tmp += bucketsize;
+ }
+
+ /* at this point we expect to have consumed exactly expected_size bytes */
+ Assert((tmp - VARDATA(data)) == expected_size);
+
+ return histogram;
+}
+
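For orientation, the serialized format parsed above can be summarized
by the following sketch. The helper histogram_expected_size() is
hypothetical (not part of the patch); DimensionInfo and BUCKET_SIZE
are the ones used by the deserialization code:

    /*
     * Serialized layout assumed by deserialize_mv_histogram():
     *
     *   [varlena header]
     *   [MVHistogramData header, up to offsetof(MVHistogramData, buckets)]
     *   [ndimensions x DimensionInfo]  (typlen, typbyval, nvalues, nbytes)
     *   [per-dimension deduplicated value arrays, info[i].nbytes each]
     *   [nbuckets x BUCKET_SIZE(ndimensions) bucket records]
     */
    static Size
    histogram_expected_size(MVHistogram histogram, DimensionInfo *info)
    {
        int     i;
        Size    size;

        size = offsetof(MVHistogramData, buckets)
             + histogram->ndimensions * sizeof(DimensionInfo)
             + histogram->nbuckets * BUCKET_SIZE(histogram->ndimensions);

        /* add the (variable-length) deduplicated value arrays */
        for (i = 0; i < histogram->ndimensions; i++)
            size += info[i].nbytes;

        return size;
    }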
+/*
+ * Build the initial bucket, which will be then split into smaller
+ * buckets.
+ *
+ * TODO Add ndistinct estimation, probably the one described in "Towards
+ * Estimation Error Guarantees for Distinct Values, PODS 2000,
+ * p. 268-279" (the ones called GEE, or maybe AE).
+ *
+ * TODO The "combined" ndistinct is more likely to scale with the number
+ * of rows (in the table), because a single column behaving this
+ * way is sufficient for such behavior.
+ */
+static MVBucket
+create_initial_mv_bucket(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+ HistogramBuild data = NULL;
+
+ /* TODO allocate bucket as a single piece, including all the fields. */
+ MVBucket bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ Assert(numrows > 0);
+ Assert(rows != NULL);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /* allocate the per-dimension arrays */
+
+ /* flags for null-only dimensions */
+ bucket->nullsonly = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ bucket->min_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+ bucket->max_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* lower/upper boundaries */
+ bucket->min = (Datum*)palloc0(numattrs * sizeof(Datum));
+ bucket->max = (Datum*)palloc0(numattrs * sizeof(Datum));
+
+ /* build-data */
+ data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /* number of distinct values (per dimension) */
+ data->ndistincts = (uint32*)palloc0(numattrs * sizeof(uint32));
+
+ /* all the sample rows fall into the initial bucket */
+ data->numrows = numrows;
+ data->rows = rows;
+
+ /*
+ * The initial bucket was not split at all, so we'll start with the
+ * first dimension in the next round (index = 0).
+ */
+ data->last_split_dimension = -1;
+
+ bucket->build_data = data;
+
+ /*
+ * Update the number of ndistinct combinations in the bucket (which
+ * we use when selecting bucket to partition), and then number of
+ * distinct values for each partition (which we use when choosing
+ * which dimension to split).
+ */
+ update_bucket_ndistinct(bucket, attrs, stats);
+
+ /* Update ndistinct (and also set min/max) for all dimensions. */
+ for (i = 0; i < numattrs; i++)
+ update_dimension_ndistinct(bucket, i, attrs, stats, true);
+
+ return bucket;
+}
+
+/*
+ * TODO Fix to handle arbitrarily-sized histograms (not just 2D ones)
+ * and call the right output procedures (for the particular type).
+ *
+ * TODO This should somehow fetch info about the data types, and use
+ * the appropriate output functions to print the boundary values.
+ * Right now this prints the 8B value as an integer.
+ *
+ * TODO Also, provide a special function for 2D histogram, printing
+ * a gnuplot script (with rectangles).
+ *
+ * TODO For string types (once supported) we can sort the strings first,
+ * assign them a sequence of integers and use the original values
+ * as labels.
+ */
+#ifdef MVSTATS_DEBUG
+static void
+print_mv_histogram_info(MVHistogram histogram)
+{
+ int i = 0;
+
+ elog(WARNING, "histogram nbuckets=%d", histogram->nbuckets);
+
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ MVBucket bucket = histogram->buckets[i];
+ elog(WARNING, " bucket %d : ndistinct=%f ntuples=%d min=[%ld, %ld], max=[%ld, %ld] distinct=[%d,%d]",
+ i, bucket->ndistinct, bucket->numrows,
+ bucket->min[0], bucket->min[1], bucket->max[0], bucket->max[1],
+ bucket->ndistincts[0], bucket->ndistincts[1]);
+ }
+}
+#endif
+
+/*
+ * A very simple partitioning selection criteria - choose the bucket
+ * with the highest number of distinct values.
+ *
+ * Returns either pointer to the bucket selected to be partitioned,
+ * or NULL if there are no buckets that may be split (i.e. all buckets
+ * contain a single distinct value).
+ *
+ * TODO Consider other partitioning criteria (v-optimal, maxdiff etc.).
+ *
+ * TODO Allowing the bucket to degenerate to a single combination of
+ * values makes it a rather strange MCV list. Maybe we should use
+ * a higher lower boundary, or maybe make the selection criteria
+ * more complex (e.g. consider number of rows in the bucket, etc.).
+ *
+ * That however is different from buckets 'degenerated' only for
+ * some dimensions (e.g. half of them), which is perfectly
+ * appropriate for statistics on a combination of low and high
+ * cardinality columns.
+ */
+static MVBucket
+select_bucket_to_partition(int nbuckets, MVBucket * buckets)
+{
+ int i;
+ int ndistinct = 1; /* if ndistinct=1, we can't split the bucket */
+ MVBucket bucket = NULL;
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ HistogramBuild data = (HistogramBuild)buckets[i]->build_data;
+ /* if the ndistinct count is higher, use this bucket */
+ if (data->ndistinct > ndistinct) {
+ bucket = buckets[i];
+ ndistinct = data->ndistinct;
+ }
+ }
+
+ /* may be NULL if there are no buckets with (ndistinct>1) */
+ return bucket;
+}
+
+/*
+ * A simple bucket partitioning implementation - splits the dimensions in
+ * a round-robin manner (considering only those with ndistinct>1). That
+ * is, first dimension 0 is split, then 1, 2, ... until reaching the
+ * end of the attribute list, and then wrapping back to 0. Of course,
+ * dimensions with a single distinct value are skipped.
+ *
+ * This is essentially what Muralikrishna/DeWitt described in their SIGMOD
+ * article (M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For
+ * Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD
+ * Conference 1988: 28-36).
+ *
+ * There are multiple histogram options, centered around the partitioning
+ * criteria, specifying both how to choose a bucket and the dimension
+ * most in need of a split. For a nice summary and general overview, see
+ * "rK-Hist : an R-Tree based histogram for multi-dimensional selectivity
+ * estimation" thesis by J. A. Lopez, Concordia University, p.34-37 (and
+ * possibly p. 32-34 for explanation of the terms).
+ *
+ * This splits the bucket by tweaking the existing one, and returning the
+ * new bucket (essentially shrinking the existing one in-place and returning
+ * the other "half" as a new bucket). The caller is responsible for adding
+ * the new bucket into the list of buckets.
+ *
+ * TODO It requires care to prevent splitting only one dimension and not
+ * splitting another one at all (which might happen easily in case of
+ * strongly dependent columns - e.g. y=x).
+ *
+ * TODO Should probably consider statistics target for the columns (e.g. to
+ * split dimensions with higher statistics target more frequently).
+ */
+static MVBucket
+partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int dimension;
+ int numattrs = attrs->dim1;
+
+ Datum split_value;
+ MVBucket new_bucket;
+ HistogramBuild new_data;
+
+ /* needed for sort, when looking for the split value */
+ bool isNull;
+ int nvalues = 0;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ StdAnalyzeData * mystats = NULL;
+ ScalarItem * values = (ScalarItem*)palloc0(data->numrows * sizeof(ScalarItem));
+ SortSupportData ssup;
+
+ /* looking for the split value */
+ int ndistinct = 1; /* number of distinct values below current value */
+ int nrows = 1; /* number of rows below current value */
+
+ /* needed when splitting the values */
+ HeapTuple * oldrows = data->rows;
+ int oldnrows = data->numrows;
+
+ /*
+ * We can't split buckets with a single distinct value (this also
+ * disqualifies NULL-only dimensions). Also, there have to be multiple
+ * sample rows (otherwise there couldn't be multiple distinct values).
+ */
+ Assert(data->ndistinct > 1);
+ Assert(data->numrows > 1);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Look for the next dimension to split, in a round robin manner.
+ * We'll use the first one with (ndistinct > 1).
+ *
+ * If we happen to wrap around, something clearly went wrong (we
+ * don't update last_split_dimension inside the loop, because then
+ * we couldn't detect the wrap-around).
+ */
+ dimension = data->last_split_dimension;
+ while (true)
+ {
+ dimension = (dimension + 1) % numattrs;
+
+ if (data->ndistincts[dimension] > 1)
+ break;
+
+ /* if we reach the previous split dimension again, we're in an infinite loop */
+ Assert(dimension != data->last_split_dimension);
+ }
+
+ /* Remember the dimension for the next split of this bucket. */
+ data->last_split_dimension = dimension;
+
+ /*
+ * Walk through the selected dimension, collect and sort the values
+ * and then choose the value to use as the new boundary.
+ */
+ mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (i = 0; i < data->numrows; i++)
+ {
+ /* remember the index of the sample row, to make the partitioning simpler */
+ values[nvalues].value = heap_getattr(data->rows[i], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+ values[nvalues].tupno = i;
+
+ /* no NULL values allowed here (we don't do splits by null-only dimensions) */
+ Assert(!isNull);
+
+ nvalues++;
+ }
+
+ /* sort the array of values */
+ qsort_arg((void *) values, nvalues, sizeof(ScalarItem),
+ compare_scalars_partition, (void *) &ssup);
+
+ /*
+ * We know there are bucket->ndistincts[dimension] distinct values
+ * in this dimension, and we want to split them in half, so walk
+ * through the array and stop once we see (ndistinct/2) values.
+ *
+ * We always choose the "next" value, i.e. (n/2+1)-th distinct value,
+ * and use it as an exclusive upper boundary (and inclusive lower
+ * boundary).
+ *
+ * TODO Maybe we should use "average" of the two middle distinct
+ * values (at least for even distinct counts), but that would
+ * require being able to do an average (which does not work
+ * for non-arithmetic types).
+ *
+ * TODO Another option is to look for a split that'd give about
+ * 50% tuples (not distinct values) in each partition. That
+ * might work better when there are a few very frequent
+ * values, and many rare ones.
+ */
+ split_value = values[0].value;
+ for (i = 1; i < data->numrows; i++)
+ {
+ /* count distinct values */
+ if (values[i].value != values[i-1].value)
+ ndistinct += 1;
+
+ /* once we've seen half of the distinct values, use this value as the split */
+ if (ndistinct > data->ndistincts[dimension] / 2)
+ {
+ split_value = values[i].value;
+ break;
+ }
+
+ /* keep track how many rows belong to the first bucket */
+ nrows += 1;
+ }
+
+ Assert(nrows > 0);
+ Assert(nrows < data->numrows);
+
+ /* create the new bucket as an (incomplete) copy of the one being partitioned */
+ new_bucket = copy_mv_bucket(bucket, numattrs);
+ new_data = (HistogramBuild)new_bucket->build_data;
+
+ /*
+ * Do the actual split of the chosen dimension, using the split value as the
+ * upper bound for the existing bucket, and lower bound for the new one.
+ */
+ bucket->max[dimension] = split_value;
+ new_bucket->min[dimension] = split_value;
+
+ bucket->max_inclusive[dimension] = false;
+ new_bucket->max_inclusive[dimension] = true;
+
+ /*
+ * Redistribute the sample tuples using the 'ScalarItem->tupno'
+ * index. We know 'nrows' rows should remain in the original
+ * bucket and the rest goes to the new one.
+ */
+
+ data->rows = (HeapTuple*)palloc0(nrows * sizeof(HeapTuple));
+ new_data->rows = (HeapTuple*)palloc0((oldnrows - nrows) * sizeof(HeapTuple));
+
+ data->numrows = nrows;
+ new_data->numrows = (oldnrows - nrows);
+
+ /*
+ * The first nrows should go to the first bucket, the rest should
+ * go to the new one. Use the tupno field to get the actual HeapTuple
+ * row from the original array of sample rows.
+ */
+ for (i = 0; i < nrows; i++)
+ memcpy(&data->rows[i], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ for (i = nrows; i < oldnrows; i++)
+ memcpy(&new_data->rows[i-nrows], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(new_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each partition.
+ */
+ for (i = 0; i < numattrs; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(new_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+ pfree(values);
+
+ return new_bucket;
+}
+
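To make the split-value choice above concrete, here is a stripped-down
sketch (a hypothetical helper, not in the patch) operating on an
already sorted array of pass-by-value datums:

    /*
     * Return the index of the first row that belongs to the new bucket,
     * i.e. the first occurrence of the (ndistinct/2 + 1)-th distinct
     * value. Rows before this index stay in the original bucket.
     */
    static int
    find_split_index(Datum *values, int nvalues, int ndistinct)
    {
        int     i;
        int     seen = 1;       /* values[0] is the first distinct value */

        for (i = 1; i < nvalues; i++)
        {
            if (values[i] != values[i - 1])
                seen++;

            if (seen > ndistinct / 2)
                return i;       /* values[i] is the split value */
        }

        return nvalues;         /* can't happen for ndistinct > 1 */
    }

So for values = {1, 1, 2, 2, 3, 3} (ndistinct = 3) this returns index 2,
using '2' as the split value - two rows stay in the original bucket and
four rows go to the new one.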
+/*
+ * Copy a histogram bucket. The copy does not include the build-time
+ * data, i.e. sampled rows etc.
+ */
+static MVBucket
+copy_mv_bucket(MVBucket bucket, uint32 ndimensions)
+{
+ /* TODO allocate as a single piece (including all the fields) */
+ MVBucket new_bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+ HistogramBuild data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /* Copy only the attributes that will stay the same after the split;
+ * the rest will be recomputed once the split is done. */
+
+ /* allocate the per-dimension arrays */
+ new_bucket->nullsonly = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ new_bucket->min_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+ new_bucket->max_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* lower/upper boundaries */
+ new_bucket->min = (Datum*)palloc0(ndimensions * sizeof(Datum));
+ new_bucket->max = (Datum*)palloc0(ndimensions * sizeof(Datum));
+
+ /* copy data */
+ memcpy(new_bucket->nullsonly, bucket->nullsonly, ndimensions * sizeof(bool));
+
+ memcpy(new_bucket->min_inclusive, bucket->min_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->min, bucket->min, ndimensions*sizeof(Datum));
+
+ memcpy(new_bucket->max_inclusive, bucket->max_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->max, bucket->max, ndimensions*sizeof(Datum));
+
+ /* allocate and copy the interesting part of the build data */
+ data->last_split_dimension = ((HistogramBuild)bucket->build_data)->last_split_dimension;
+ data->ndistincts = (uint32*)palloc0(ndimensions * sizeof(uint32));
+
+ new_bucket->build_data = data;
+
+ return new_bucket;
+}
+
+/*
+ * Counts the number of distinct values in the bucket. This just copies
+ * the Datum values into a simple array, and sorts them using memcmp-based
+ * comparator. That means it only works for pass-by-value data types
+ * (assuming they don't use collations etc.)
+ *
+ * TODO This might evaluate and store the distinct counts for all
+ * possible attribute combinations. The assumption is this might be
+ * useful for estimating things like GROUP BY cardinalities (e.g.
+ * in cases when some buckets contain a lot of low-frequency
+ * combinations, and other buckets contain few high-frequency ones).
+ *
+ * But it's unclear whether it's worth the price. Computing this
+ * is actually quite cheap, because it may be evaluated at the very
+ * end, when the buckets are rather small (so sorting it in 2^N ways
+ * is not a big deal). Assuming the partitioning algorithm does not
+ * use these values to make its decisions, of course (the current
+ * algorithm does not).
+ *
+ * The overhead with storing, fetching and parsing the data is more
+ * concerning - adding 2^N values per bucket (even if it's just
+ * a 1B or 2B value) would significantly bloat the histogram, and
+ * thus the impact on the optimizer, which is not really desirable.
+ *
+ * TODO This only updates the ndistinct for the sample (or bucket), but
+ * we eventually need an estimate of the total number of distinct
+ * values in the dataset. We can either use the current 1D approach
+ * (i.e., if it's more than 10% of the sample, assume it's
+ * proportional to the number of rows), or implement the estimator
+ * suggested in the article, supposedly giving 'optimal' estimates
+ * (w.r.t. probability of error).
+ */
+static void
+update_bucket_ndistinct(MVBucket bucket, int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ int numrows = data->numrows;
+
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * We could collect this while walking through all the attributes
+ * above (as it is, we have to call heap_getattr twice).
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(numrows * sizeof(Datum) * numattrs);
+ bool *isnull = (bool*)palloc0(numrows * sizeof(bool) * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* prepare the sort functions for all the dimensions */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* collect the values */
+ for (i = 0; i < numrows; i++)
+ for (j = 0; j < numattrs; j++)
+ items[i].values[j]
+ = heap_getattr(data->rows[i], attrs->values[j],
+ stats[j]->tupDesc, &items[i].isnull[j]);
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ data->ndistinct = 1;
+
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ data->ndistinct += 1;
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+}
+
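The same sort-and-scan idea in miniature - a hypothetical single-column
variant (assuming pass-by-value datums, n > 0, and the
compare_scalars_simple comparator used elsewhere in this file):

    static int
    count_distinct(Datum *values, int n, SortSupport ssup)
    {
        int     i;
        int     ndistinct = 1;

        /* sort first, then count the positions where the value changes */
        qsort_arg((void *) values, n, sizeof(Datum),
                  compare_scalars_simple, (void *) ssup);

        for (i = 1; i < n; i++)
            if (values[i] != values[i - 1])
                ndistinct++;

        return ndistinct;
    }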
+/*
+ * Count distinct values per bucket dimension.
+ */
+static void
+update_dimension_ndistinct(MVBucket bucket, int dimension, int2vector *attrs,
+ VacAttrStats ** stats, bool update_boundaries)
+{
+ int j;
+ int nvalues = 0;
+ bool isNull;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ Datum * values = (Datum*)palloc0(data->numrows * sizeof(Datum));
+ SortSupportData ssup;
+
+ StdAnalyzeData * mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* we may already know this is a NULL-only dimension */
+ if (bucket->nullsonly[dimension])
+ data->ndistincts[dimension] = 1;
+
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (j = 0; j < data->numrows; j++)
+ {
+ values[nvalues] = heap_getattr(data->rows[j], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+
+ /* ignore NULL values */
+ if (! isNull)
+ nvalues++;
+ }
+
+ /* there's always at least 1 distinct value (may be NULL) */
+ data->ndistincts[dimension] = 1;
+
+ /* if there are only NULL values in this dimension, mark it as
+ * NULL-only and bail out */
+ if (nvalues == 0)
+ {
+ pfree(values);
+ bucket->nullsonly[dimension] = true;
+ return;
+ }
+
+ /* sort the array (pass-by-value datums) */
+ qsort_arg((void *) values, nvalues, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /*
+ * Update min/max boundaries to the smallest bounding box. Generally, this
+ * needs to be done only when constructing the initial bucket.
+ */
+ if (update_boundaries)
+ {
+ /* store the min/max values */
+ bucket->min[dimension] = values[0];
+ bucket->min_inclusive[dimension] = true;
+
+ bucket->max[dimension] = values[nvalues-1];
+ bucket->max_inclusive[dimension] = true;
+ }
+
+ /*
+ * Walk through the array and count distinct values by comparing
+ * succeeding values.
+ *
+ * FIXME This only works for pass-by-value types (i.e. not VARCHARs
+ * etc.). Although thanks to the deduplication it might work
+ * even for those types (equal values will get the same item
+ * in the deduplicated array).
+ */
+ for (j = 1; j < nvalues; j++) {
+ if (values[j] != values[j-1])
+ data->ndistincts[dimension] += 1;
+ }
+
+ pfree(values);
+}
+
+/*
+ * A properly built histogram must not contain buckets mixing NULL and
+ * non-NULL values in a single dimension. Each dimension may either be
+ * marked as 'nulls only', and thus containing only NULL values, or
+ * it must not contain any NULL values.
+ *
+ * Therefore, if the sample contains NULL values in any of the columns,
+ * it's necessary to build those NULL-buckets. This is done in an
+ * iterative way using this algorithm, operating on a single bucket:
+ *
+ * (1) Check that all dimensions are well-formed (not mixing NULL
+ * and non-NULL values).
+ *
+ * (2) If all dimensions are well-formed, terminate.
+ *
+ * (3) If a dimension contains only NULL values, but is not
+ * marked as NULL-only, mark it as NULL-only and run the
+ * algorithm again (on this bucket).
+ *
+ * (4) If a dimension mixes NULL and non-NULL values, split the
+ * bucket into two parts - one with NULL values, one with
+ * non-NULL values (replacing the current one). Then run
+ * the algorithm on both buckets.
+ *
+ * This is executed in a recursive manner, but the number of executions
+ * should be quite low - limited by the number of NULL-buckets. Also,
+ * in each branch the number of nested calls is limited by the number
+ * of dimensions (attributes) of the histogram.
+ *
+ * At the end, there should be buckets with no mixed dimensions. The
+ * number of buckets produced by this algorithm is rather limited - with
+ * N dimensions, there may be only 2^N such buckets (each dimension may
+ * be either NULL or non-NULL). So with 8 dimensions (current value of
+ * MVSTATS_MAX_DIMENSIONS) there may be only 256 such buckets.
+ *
+ * After this, a 'regular' bucket-split algorithm shall run, further
+ * optimizing the histogram.
+ */
+static void
+create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int null_dim = -1;
+ int null_count = 0;
+ bool null_found = false;
+ MVBucket bucket, null_bucket;
+ int null_idx, curr_idx;
+ HistogramBuild data, null_data;
+
+ /* remember original values from the bucket */
+ int numrows;
+ HeapTuple *oldrows = NULL;
+
+ Assert(bucket_idx < histogram->nbuckets);
+ Assert(histogram->ndimensions == attrs->dim1);
+
+ bucket = histogram->buckets[bucket_idx];
+ data = (HistogramBuild)bucket->build_data;
+
+ numrows = data->numrows;
+ oldrows = data->rows;
+
+ /*
+ * Walk through all rows / dimensions, and stop once we find NULL
+ * in a dimension not yet marked as NULL-only.
+ */
+ for (i = 0; i < data->numrows; i++)
+ {
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ /* Is this a NULL-only dimension? If yes, skip. */
+ if (bucket->nullsonly[j])
+ continue;
+
+ /* found a NULL in that dimension? */
+ if (heap_attisnull(data->rows[i], attrs->values[j]))
+ {
+ null_found = true;
+ null_dim = j;
+ break;
+ }
+ }
+
+ /* terminate if we found attribute with NULL values */
+ if (null_found)
+ break;
+ }
+
+ /* no regular dimension contains NULL values => we're done */
+ if (! null_found)
+ return;
+
+ /* walk through the rows again, count NULL values in 'null_dim' */
+ for (i = 0; i < data->numrows; i++)
+ {
+ if (heap_attisnull(data->rows[i], attrs->values[null_dim]))
+ null_count += 1;
+ }
+
+ Assert(null_count <= data->numrows);
+
+ /*
+ * If (null_count == numrows) the dimension already is NULL-only,
+ * but is not yet marked as such. It's enough to mark it and
+ * repeat the process recursively (until we run out of dimensions).
+ */
+ if (null_count == data->numrows)
+ {
+ bucket->nullsonly[null_dim] = true;
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+ return;
+ }
+
+ /*
+ * We have to split the bucket into two - one with NULL values in
+ * the dimension, one with non-NULL values. We don't need to sort
+ * the data or anything, but otherwise it's similar to what's done
+ * in partition_bucket().
+ */
+
+ /* create bucket with NULL-only dimension 'dim' */
+ null_bucket = copy_mv_bucket(bucket, histogram->ndimensions);
+ null_data = (HistogramBuild)null_bucket->build_data;
+
+ /* remember the current array info */
+ oldrows = data->rows;
+ numrows = data->numrows;
+
+ /* we'll keep non-NULL values in the current bucket */
+ data->numrows = (numrows - null_count);
+ data->rows
+ = (HeapTuple*)palloc0(data->numrows * sizeof(HeapTuple));
+
+ /* and the NULL values will go to the new one */
+ null_data->numrows = null_count;
+ null_data->rows
+ = (HeapTuple*)palloc0(null_data->numrows * sizeof(HeapTuple));
+
+ /* mark the dimension as NULL-only (in the new bucket) */
+ null_bucket->nullsonly[null_dim] = true;
+
+ /* walk through the sample rows and distribute them accordingly */
+ null_idx = 0;
+ curr_idx = 0;
+ for (i = 0; i < numrows; i++)
+ {
+ if (heap_attisnull(oldrows[i], attrs->values[null_dim]))
+ /* NULL => copy to the new bucket */
+ memcpy(&null_data->rows[null_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ else
+ memcpy(&data->rows[curr_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ }
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(null_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each
+ * bucket (NULL is not a value, so 0, and the other bucket got
+ * all the ndistinct values).
+ */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(null_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+
+ /* add the NULL bucket to the histogram */
+ histogram->buckets[histogram->nbuckets++] = null_bucket;
+
+ /*
+ * And now run the function recursively on both buckets (the new
+ * one first, because the call may change the number of buckets, and
+ * it's used as an index).
+ */
+ create_null_buckets(histogram, (histogram->nbuckets-1), attrs, stats);
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+
+}
+
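To illustrate the 2^N bound mentioned above: each bucket produced by
create_null_buckets() is fully described by its NULL/non-NULL pattern,
one bit per dimension, so there can be at most 2^ndimensions such
buckets per initial bucket. A hypothetical helper (not in the patch)
encoding that pattern:

    static uint32
    null_pattern(MVBucket bucket, int ndims)
    {
        int     i;
        uint32  pattern = 0;

        /* one bit per dimension, set if the dimension is NULL-only */
        for (i = 0; i < ndims; i++)
            if (bucket->nullsonly[i])
                pattern |= ((uint32) 1 << i);

        return pattern;     /* at most 2^ndims distinct patterns */
    }

For example, with 2 dimensions the possible patterns are 00, 01, 10
and 11, i.e. at most 4 NULL-buckets.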
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
diff --git a/src/backend/utils/mvstats/mcv.c b/src/backend/utils/mvstats/mcv.c
index 4466cee..4f60bd1 100644
--- a/src/backend/utils/mvstats/mcv.c
+++ b/src/backend/utils/mvstats/mcv.c
@@ -961,6 +961,7 @@ MCVList deserialize_mv_mcvlist(bytea * data)
for (i = 0; i < nitems; i++)
{
+ /* FIXME allocate as a single chunk (minimize palloc overhead) */
MCVItem item = (MCVItem)palloc0(sizeof(MCVItemData));
item->values = (Datum*)palloc0(sizeof(Datum)*ndims);
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index c6e7d74..84579da 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -36,13 +36,16 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
+ bool hist_enabled; /* build histogram? */
- /* MCV size */
+ /* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
+ int32 hist_max_buckets; /* max histogram buckets */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
+ bool hist_built; /* histogram was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -50,6 +53,7 @@ CATALOG(pg_mv_statistic,3381)
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
+ bytea stahist; /* MV histogram (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -65,15 +69,19 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_attrdef
* ----------------
*/
-#define Natts_pg_mv_statistic 9
+#define Natts_pg_mv_statistic 13
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_deps_enabled 2
#define Anum_pg_mv_statistic_mcv_enabled 3
-#define Anum_pg_mv_statistic_mcv_max_items 4
-#define Anum_pg_mv_statistic_deps_built 5
-#define Anum_pg_mv_statistic_mcv_built 6
-#define Anum_pg_mv_statistic_stakeys 7
-#define Anum_pg_mv_statistic_stadeps 8
-#define Anum_pg_mv_statistic_stamcv 9
+#define Anum_pg_mv_statistic_hist_enabled 4
+#define Anum_pg_mv_statistic_mcv_max_items 5
+#define Anum_pg_mv_statistic_hist_max_buckets 6
+#define Anum_pg_mv_statistic_deps_built 7
+#define Anum_pg_mv_statistic_mcv_built 8
+#define Anum_pg_mv_statistic_hist_built 9
+#define Anum_pg_mv_statistic_stakeys 10
+#define Anum_pg_mv_statistic_stadeps 11
+#define Anum_pg_mv_statistic_stamcv 12
+#define Anum_pg_mv_statistic_stahist 13
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index b2aa815..a1b5e2b 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2718,6 +2718,8 @@ DATA(insert OID = 3378 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0
DESCR("multivariate stats: functional dependencies show");
DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_mcvlist_info _null_ _null_ _null_ ));
DESCR("multi-variate statistics: MCV list info");
+DATA(insert OID = 3375 ( pg_mv_stats_histogram_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ pg_mv_stats_histogram_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: histogram info");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 6ff29d6..673e546 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -26,10 +26,12 @@ typedef struct MVStatsData {
/* statistics requested in ALTER TABLE ... ADD STATISTICS */
bool deps_enabled; /* analyze functional dependencies */
bool mcv_enabled; /* analyze MCV lists */
+ bool hist_enabled; /* analyze histogram */
/* available statistics (computed by ANALYZE) */
bool deps_built; /* functional dependencies available */
bool mcv_built; /* MCV list is already available */
+ bool hist_built; /* histogram is already available */
} MVStatsData;
typedef struct MVStatsData *MVStats;
@@ -109,6 +111,68 @@ typedef MCVListData *MCVList;
#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
/*
+ * Multivariate histograms
+ */
+typedef struct MVBucketData {
+
+ /* Frequency of this bucket. */
+ float ntuples; /* frequency of tuples in this bucket */
+
+ /*
+ * Information about dimensions being NULL-only. Not yet used.
+ */
+ bool *nullsonly;
+
+ /* lower boundaries - values and information about the inequalities */
+ Datum *min;
+ bool *min_inclusive;
+
+ /* upper boundaries - values and information about the inequalities */
+ Datum *max;
+ bool *max_inclusive;
+
+ /* used when building the histogram (not serialized/deserialized) */
+ void *build_data;
+
+} MVBucketData;
+
+typedef MVBucketData *MVBucket;
+
+
+typedef struct MVHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ MVBucket *buckets; /* array of buckets */
+
+} MVHistogramData;
+
+typedef MVHistogramData *MVHistogram;
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_HIST_MAGIC 0x7F8C5670 /* marks serialized bytea */
+#define MVSTAT_HIST_TYPE_BASIC 1 /* basic histogram type */
+
+/*
+ * Limits used for max_buckets option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_HIST_MIN_BUCKETS, and we cannot
+ * have more than MVSTAT_HIST_MAX_BUCKETS buckets.
+ *
+ * This is just a boundary for the 'max' threshold - the actual
+ * histogram may use fewer buckets than MVSTAT_HIST_MAX_BUCKETS.
+ *
+ * TODO The MVSTAT_HIST_MIN_BUCKETS should be related to the number of
+ * attributes (MVSTATS_MAX_DIMENSIONS) because of NULL-buckets.
+ * There should be at least 2^N buckets, otherwise we may be unable
+ * to build the NULL buckets.
+ */
+#define MVSTAT_HIST_MIN_BUCKETS 128 /* min number of buckets */
+#define MVSTAT_HIST_MAX_BUCKETS 16384 /* max number of buckets */
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
@@ -118,14 +182,18 @@ bytea * fetch_mv_rules(Oid mvoid);
bytea * fetch_mv_dependencies(Oid mvoid);
bytea * fetch_mv_mcvlist(Oid mvoid);
+bytea * fetch_mv_histogram(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
VacAttrStats **stats);
+bytea * serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
MCVList deserialize_mv_mcvlist(bytea * data);
+MVHistogram deserialize_mv_histogram(bytea * data);
/*
* Returns index of the attribute number within the vector (i.e. a
@@ -137,6 +205,7 @@ int mv_get_index(AttrNumber varattno, int2vector * stakeys);
extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_histogram_info(PG_FUNCTION_ARGS);
MVDependencies
build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
@@ -146,10 +215,15 @@ MCVList
build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int *numrows_filtered);
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total);
+
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+void update_mv_stats(Oid relid, MVDependencies dependencies,
+ MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats);
#endif
diff --git a/src/test/regress/expected/mv_histogram.out b/src/test/regress/expected/mv_histogram.out
new file mode 100644
index 0000000..ff2f0cc
--- /dev/null
+++ b/src/test/regress/expected/mv_histogram.out
@@ -0,0 +1,210 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (unknown_column);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, a);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, a, b);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+ALTER TABLE mv_histogram ADD STATISTICS (unknown_option) ON (a, b, c);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing histogram statistics
+ALTER TABLE mv_histogram ADD STATISTICS (dependencies, max_buckets 200) ON (a, b, c);
+ERROR: option 'histogram' is required by other option(s)
+-- invalid max_buckets value / too low
+ALTER TABLE mv_histogram ADD STATISTICS (mcv, max_buckets 10) ON (a, b, c);
+ERROR: minimum number of buckets is 128
+-- invalid max_buckets value / too high
+ALTER TABLE mv_histogram ADD STATISTICS (mcv, max_buckets 100000) ON (a, b, c);
+ERROR: maximum number of buckets is 16384
+-- correct command
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built | pg_mv_stats_histogram_info
+--------------+------------+----------------------------
+ t | t | nbuckets=10000
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built | pg_mv_stats_histogram_info
+--------------+------------+----------------------------
+ t | t | nbuckets=1001
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built | pg_mv_stats_histogram_info
+--------------+------------+----------------------------
+ t | t | nbuckets=1001
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+DROP TABLE mv_histogram;
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built | pg_mv_stats_histogram_info
+--------------+------------+----------------------------
+ t | t | nbuckets=10000
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built | pg_mv_stats_histogram_info
+--------------+------------+----------------------------
+ t | t | nbuckets=3492
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built | pg_mv_stats_histogram_info
+--------------+------------+----------------------------
+ t | t | nbuckets=3433
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+DROP TABLE mv_histogram;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c, d);
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+DROP TABLE mv_histogram;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6d9ab2f..ccc778a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1359,7 +1359,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
length(s.stadeps) AS depsbytes,
pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
length(s.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo,
+ length(s.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(s.stahist) AS histinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 63727a4..aeb89f8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -111,4 +111,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies mv_mcv
+test: mv_dependencies mv_mcv mv_histogram
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 5b07b3b..ee1468d 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -155,3 +155,4 @@ test: event_trigger
test: stats
test: mv_dependencies
test: mv_mcv
+test: mv_histogram
diff --git a/src/test/regress/sql/mv_histogram.sql b/src/test/regress/sql/mv_histogram.sql
new file mode 100644
index 0000000..78890c8
--- /dev/null
+++ b/src/test/regress/sql/mv_histogram.sql
@@ -0,0 +1,179 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (unknown_column);
+
+-- single column
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a);
+
+-- single column, duplicated
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, a);
+
+-- two columns, one duplicated
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, a, b);
+
+-- unknown option
+ALTER TABLE mv_histogram ADD STATISTICS (unknown_option) ON (a, b, c);
+
+-- missing histogram statistics
+ALTER TABLE mv_histogram ADD STATISTICS (dependencies, max_buckets 200) ON (a, b, c);
+
+-- invalid max_buckets value / too low
+ALTER TABLE mv_histogram ADD STATISTICS (mcv, max_buckets 10) ON (a, b, c);
+
+-- invalid max_buckets value / too high
+ALTER TABLE mv_histogram ADD STATISTICS (mcv, max_buckets 100000) ON (a, b, c);
+
+-- correct command
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+DROP TABLE mv_histogram;
+
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+DROP TABLE mv_histogram;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c, d);
+
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+DELETE FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+DROP TABLE mv_histogram;
--
2.0.5
>From db24cc534985ce97b238ea539b4216d8e33397a5 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Fri, 6 Feb 2015 01:42:38 +0100
Subject: [PATCH 5/5] multi-statistics estimation
The general idea is that a probability (which
is what selectivity is) can be split into a product of
conditional probabilities like this:
P(A & B & C) = P(A & B) * P(C|A & B)
If we assume that C and B are independent (given A), the
last term may be simplified like this
P(A & B & C) = P(A & B) * P(C|A)
so we only need probabilities on [A,B] and [C,A] to compute
the original probability.
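As a made-up numeric example: with P(A & B) = 0.05 and
P(C|A) = 0.2, this gives

P(A & B & C) = 0.05 * 0.2 = 0.01

whereas the plain independence assumption would instead
multiply the three per-column selectivities
P(A) * P(B) * P(C), which can be off by orders of
magnitude for correlated columns.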
The implementation works in the other direction, though.
We know what probability P(A & B & C) we need to compute,
and also what statistics are available.
So we search for a combination of statistics, covering
the clauses in an optimal way (most clauses covered, most
dependencies exploited).
There are two possible approaches - exhaustive and greedy.
The exhaustive one walks through all permutations of
stats using dynamic programming, so it's guaranteed to
find the optimal solution, but it soon gets very slow as
it's roughly O(N!). The dynamic programming may improve
that a bit, but it's still far too expensive for large
numbers of statistics (on a single table).
The greedy algorithm is very simple - in every step it
chooses the locally best statistics. That may not find
the globally best solution (but maybe it does?), but it
only needs N steps, so it's very fast (processing the
selected stats is usually way more expensive).
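A minimal sketch of the greedy step (hypothetical code,
ignoring conditions and tie-breaking; the real logic is in
choose_mv_statistics()):

    /*
     * In each round, pick the statistics covering the most
     * not-yet-covered clause attributes; stop once no
     * statistics adds anything new.
     */
    static int
    greedy_search(int nmvstats, Bitmapset **stat_attnums,
                  Bitmapset *clause_attnums, int *solution)
    {
        int         nsolution = 0;
        Bitmapset  *covered = NULL;

        while (true)
        {
            int     i;
            int     best = -1;
            int     best_gain = 0;

            for (i = 0; i < nmvstats; i++)
            {
                /* attributes this statistics would newly cover */
                Bitmapset  *gain = bms_difference(clause_attnums,
                                                  covered);

                gain = bms_int_members(gain, stat_attnums[i]);

                if (bms_num_members(gain) > best_gain)
                {
                    best = i;
                    best_gain = bms_num_members(gain);
                }

                bms_free(gain);
            }

            if (best < 0)
                break;

            covered = bms_add_members(covered, stat_attnums[best]);
            solution[nsolution++] = best;
        }

        return nsolution;
    }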
There's a GUC for selecting the search algorithm
mvstat_search = {'greedy', 'exhaustive'}
The default value is 'greedy' as that's much safer (with
respect to runtime). See choose_mv_statistics().
Once we have found a sequence of statistics, we apply
them to the clauses using the conditional probabilities.
We process the selected stats one by one, and for each
we select the estimated clauses and conditions. See
clauselist_selectivity() for more details.
Limitations
-----------
It's still true that each clause at a given level has to
be covered by a single MV statistics. So with this query
WHERE (clause1) AND (clause2) AND (clause3 OR clause4)
each parenthesized clause has to be covered by a single
multivariate statistics.
Clauses not covered by a single statistics at this level
will be passed to clause_selectivity() but this will treat
them as a collection of simpler clauses (connected by AND
or OR), and the clauses from the previous level will be
used as conditions.
So using the same example, the last clause will be passed
to clause_selectivity() with 'clause1' and 'clause2' as
conditions, and it will be processed using multivariate
stats if possible.
The other limitation is that all the expressions have to
be mv-compatible, i.e. there can't be a mix of compatible
and incompatible expressions.
Fixing this should be relatively simple - just split the
list into two parts (mv-compatible/incompatible), as at
the top level.
---
src/backend/optimizer/path/clausesel.c | 1533 ++++++++++++++++++++++++++++++--
src/backend/optimizer/path/costsize.c | 23 +-
src/backend/optimizer/util/orclauses.c | 4 +-
src/backend/utils/adt/selfuncs.c | 17 +-
src/backend/utils/misc/guc.c | 20 +
src/include/optimizer/cost.h | 6 +-
src/include/utils/mvstats.h | 8 +
7 files changed, 1513 insertions(+), 98 deletions(-)
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index ea4d588..98ad802 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -30,7 +30,7 @@
#include "utils/typcache.h"
#include "parser/parsetree.h"
-
+#include "miscadmin.h"
#include <stdio.h>
@@ -63,23 +63,25 @@ static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
Oid varRelid, Oid *relid, SpecialJoinInfo *sjinfo,
int type);
+static Bitmapset *clause_mv_get_attnums(PlannerInfo *root, Node *clause);
+
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Oid varRelid, int nmvstats, MVStats mvstats,
SpecialJoinInfo *sjinfo);
-static int choose_mv_statistics(int nmvstats, MVStats mvstats,
- Bitmapset *attnums);
static List *clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
List *clauses, Oid varRelid,
List **mvclauses, MVStats mvstats, int types);
static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
- List *clauses, MVStats mvstats);
+ MVStats mvstats, List *clauses, List *conditions);
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
- List *clauses, MVStats mvstats,
+ MVStats mvstats,
+ List *clauses, List *conditions,
bool *fullmatch, Selectivity *lowsel);
static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
- List *clauses, MVStats mvstats);
+ MVStats mvstats,
+ List *clauses, List *conditions);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -92,6 +94,31 @@ static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
int nmatches, char * matches,
bool is_or);
+/*
+ * Describes a combination of multiple statistics to cover attributes
+ * referenced by the clauses. The array 'stats' (with nstats elements)
+ * lists the chosen statistics (in the order they are applied), along
+ * with the number of clause attributes covered by this solution.
+ *
+ * choose_mv_statistics_exhaustive() uses this to track both the current
+ * and the best solutions, while walking through the space of possible
+ * combinations.
+ */
+typedef struct mv_solution_t {
+ int nclauses; /* number of clauses covered */
+ int nconditions; /* number of conditions covered */
+ int nstats; /* number of stats applied */
+ int *stats; /* stats (in the apply order) */
+} mv_solution_t;
+
+static mv_solution_t *choose_mv_statistics(PlannerInfo *root,
+ int nmvstats, MVStats mvstats,
+ List *clauses, List *conditions,
+ Oid varRelid,
+ SpecialJoinInfo *sjinfo, int type);
+
+int mvstat_search_type = MVSTAT_SEARCH_GREEDY;
+
/* used for merging bitmaps - AND (min), OR (max) */
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
#define MIN(x, y) (((x) < (y)) ? (x) : (y))
@@ -220,7 +247,8 @@ clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 1.0;
RangeQueryClause *rqlist = NULL;
@@ -234,13 +262,8 @@ clauselist_selectivity(PlannerInfo *root,
/* attributes in mv-compatible clauses */
Bitmapset *mvattnums = NULL;
- /*
- * If there's exactly one clause, then no use in trying to match up
- * pairs, so just go directly to clause_selectivity().
- */
- if (list_length(clauses) == 1)
- return clause_selectivity(root, (Node *) linitial(clauses),
- varRelid, jointype, sjinfo);
+ /* local conditions, accumulated and passed to clauses in this list */
+ List *conditions_local = list_copy(conditions);
/*
* Collect attributes referenced by mv-compatible clauses (looking
@@ -288,30 +311,185 @@ clauselist_selectivity(PlannerInfo *root,
/* see choose_mv_statistics() for details */
if (nmvstats > 0)
{
- int idx = choose_mv_statistics(nmvstats, mvstats, mvattnums);
+ mv_solution_t * solution
+ = choose_mv_statistics(root, nmvstats, mvstats,
+ clauses, conditions,
+ varRelid, sjinfo,
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
- if (idx >= 0) /* we have a matching stats */
+ /*
+ * FIXME This probably leaks memory a bit - the lists of clauses
+ * should be freed properly.
+ */
+
+ /* we have a usable solution */
+ if (solution != NULL)
{
- MVStats mvstat = &mvstats[idx];
+ int i, j, k;
+
+ for (i = 0; i < solution->nstats; i++)
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+ List *mvclauses_new = NIL;
+ List *mvclauses_conditions = NIL;
+ Bitmapset *stat_attnums = NULL;
+
+ MVStats mvstat = &mvstats[solution->stats[i]];
+
+ /* build attnum bitmapset for this statistics */
+ for (k = 0; k < mvstat->stakeys->dim1; k++)
+ stat_attnums = bms_add_member(stat_attnums,
+ mvstat->stakeys->values[k]);
+
+ /*
+ * Append the compatible conditions (passed from above)
+ * to mvclauses_conditions.
+ */
+ foreach (l, conditions)
+ {
+ Node *c = (Node*)lfirst(l);
+ Bitmapset *tmp = clause_mv_get_attnums(root, c);
- /* clauses compatible with multi-variate stats */
- List *mvclauses = NIL;
+ if (bms_is_subset(tmp, stat_attnums))
+ mvclauses_conditions
+ = lappend(mvclauses_conditions, c);
+
+ bms_free(tmp);
+ }
+
+ /* split the clauselist into regular and mv-clauses
+ *
+ * We keep the list of clauses (we don't remove the
+ * clauses yet, because we want to use the clauses
+ * as conditions of other clauses).
+ *
+ * FIXME Do this only once, i.e. filter the clauses
+ * once (selecting clauses covered by at least
+ * one statistics) and then convert them into
+ * smaller per-statistics lists of conditions
+ * and estimated clauses.
+ */
+ clauselist_mv_split(root, sjinfo, clauses,
+ varRelid, &mvclauses, mvstat,
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+
+ /*
+ * We've chosen the statistics to match the clauses, so
+ * each statistics from the solution should have at least
+ * one new clause (not covered by the previous stats).
+ */
+ Assert(mvclauses != NIL);
+
+ /*
+ * Mvclauses now contains only clauses compatible
+ * with the currently selected stats, but we have to
+ * split that into conditions (already matched by
+ * the previous stats), and the new clauses we need
+ * to estimate using this statistics.
+ */
+ foreach (l, mvclauses)
+ {
+ bool covered = false;
+ Node *clause = (Node *) lfirst(l);
+ Bitmapset *clause_attnums = clause_mv_get_attnums(root, clause);
+
+ /*
+ * If already covered by previous stats, add it to
+ * conditions.
+ *
+ * TODO Maybe this could be relaxed a bit? Because
+ * with complex and/or clauses, this might
+ * mean no statistics actually covers such
+ * a complex clause.
+ */
+ for (j = 0; j < i; j++)
+ {
+ int k;
+ Bitmapset *stat_attnums = NULL;
+ MVStats prev_stat = &mvstats[solution->stats[j]];
+
+ for (k = 0; k < prev_stat->stakeys->dim1; k++)
+ stat_attnums = bms_add_member(stat_attnums,
+ prev_stat->stakeys->values[k]);
+
+ covered = bms_is_subset(clause_attnums, stat_attnums);
+
+ bms_free(stat_attnums);
+
+ if (covered)
+ break;
+ }
+
+ if (covered)
+ mvclauses_conditions
+ = lappend(mvclauses_conditions, clause);
+ else
+ mvclauses_new
+ = lappend(mvclauses_new, clause);
+ }
+
+ /*
+ * We need at least one new clause (not just conditions).
+ */
+ Assert(mvclauses_new != NIL);
+
+ /* compute the multivariate stats */
+ s1 *= clauselist_mv_selectivity(root, mvstat,
+ mvclauses_new,
+ mvclauses_conditions);
+ }
+
+ /*
+ * And now finally remove all the mv-compatible clauses.
+ *
+ * This only repeats the same split as above, but this
+ * time we actually use the result list (and feed it to
+ * the next call).
+ */
+ for (i = 0; i < solution->nstats; i++)
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
- /* split the clauselist into regular and mv-clauses */
- clauses = clauselist_mv_split(root, sjinfo, clauses,
- varRelid, &mvclauses, mvstat,
- (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+ MVStats mvstat = &mvstats[solution->stats[i]];
- /* we've chosen the histogram to match the clauses */
- Assert(mvclauses != NIL);
+ /* split the list into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, sjinfo, clauses,
+ varRelid, &mvclauses, mvstat,
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
- /* compute the multivariate stats */
- s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ /*
+ * Add the clauses to the conditions (to be passed
+ * to regular clauses), irrespective of whether each
+ * will be used as a condition or a clause here.
+ *
+ * We only keep the remaining conditions in the
+ * clauses (we keep what clauselist_mv_split returns)
+ * so we add each MV condition exactly once.
+ */
+ foreach (l, mvclauses)
+ conditions_local = lappend(conditions_local,
+ (Node*)lfirst(l));
+ }
}
}
}
/*
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, so just go directly to clause_selectivity().
+ */
+ if (list_length(clauses) == 1)
+ {
+ Selectivity s = clause_selectivity(root, (Node *) linitial(clauses),
+ varRelid, jointype, sjinfo,
+ conditions_local);
+ list_free(conditions_local);
+ return s;
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -323,7 +501,8 @@ clauselist_selectivity(PlannerInfo *root,
Selectivity s2;
/* Always compute the selectivity using clause_selectivity */
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo);
+ s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo,
+ conditions_local);
/*
* Check for being passed a RestrictInfo.
@@ -478,6 +657,9 @@ clauselist_selectivity(PlannerInfo *root,
rqlist = rqnext;
}
+ /* free the local conditions */
+ list_free(conditions_local);
+
return s1;
}
@@ -688,7 +870,8 @@ clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 0.5; /* default for any unhandled clause type */
RestrictInfo *rinfo = NULL;
@@ -818,7 +1001,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) get_notclausearg((Expr *) clause),
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (and_clause(clause))
{
@@ -827,7 +1011,8 @@ clause_selectivity(PlannerInfo *root,
((BoolExpr *) clause)->args,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (or_clause(clause))
{
@@ -839,6 +1024,13 @@ clause_selectivity(PlannerInfo *root,
*/
ListCell *arg;
+ /* TODO Split the clause list into mv-compatible part, pretty
+ * much just like in clauselist_selectivity(), and call
+ * clauselist_mv_selectivity(). It has to be taught about
+ * OR-semantics (right now it assumes AND) or maybe just
+ * create a fake OR clause here, and pass it in.
+ */
+
s1 = 0.0;
foreach(arg, ((BoolExpr *) clause)->args)
{
@@ -846,7 +1038,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) lfirst(arg),
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
s1 = s1 + s2 - s1 * s2;
}
@@ -958,7 +1151,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((RelabelType *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (IsA(clause, CoerceToDomain))
{
@@ -967,7 +1161,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((CoerceToDomain *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
/* Cache the result if possible */
@@ -1149,9 +1344,67 @@ clause_selectivity(PlannerInfo *root,
* that from the most selective clauses first, because that'll
* eliminate the buckets/items sooner (so we'll be able to skip
* them without inspection, which is more expensive).
+ *
+ * TODO All this is based on the assumption that the statistics represent
+ * the necessary dependencies, i.e. that if two columns are not in
+ * the same statistics, there's no dependency. If that's not the
+ * case, we may get misestimates, just like before. For example
+ * assume we have a table with three columns [a,b,c] with exactly
+ * the same values, and statistics on [a,b] and [b,c]. So something
+ * like this:
+ *
+ * CREATE TABLE test AS SELECT i AS a, i AS b, i AS c
+ * FROM generate_series(1,1000) s(i);
+ *
+ * ALTER TABLE test ADD STATISTICS (mcv) ON (a,b);
+ * ALTER TABLE test ADD STATISTICS (mcv) ON (b,c);
+ *
+ * ANALYZE test;
+ *
+ * EXPLAIN ANALYZE SELECT * FROM test
+ * WHERE (a < 10) AND (b < 20) AND (c < 10);
+ *
+ * The problem here is that the only shared column between the two
+ * statistics is 'b' so the probability will be computed like this
+ *
+ * P[(a < 10) & (b < 20) & (c < 10)]
+ * = P[(a < 10) & (b < 20)] * P[(c < 10) | (a < 10) & (b < 20)]
+ * = P[(a < 10) & (b < 20)] * P[(c < 10) | (b < 20)]
+ *
+ * or like this
+ *
+ * P[(a < 10) & (b < 20) & (c < 10)]
+ * = P[(b < 20) & (c < 10)] * P[(a < 10) | (b < 20) & (c < 10)]
+ * = P[(b < 20) & (c < 10)] * P[(a < 10) | (b < 20)]
+ *
+ * In both cases the conditional probabilities will be evaluated as
+ * 0.5, because they lack the other column (which would make it 1.0).
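+ *
+ * With the example above, that means an estimate of roughly
+ * 1000 * (9/1000) * 0.5 = ~4.5 rows, instead of the correct
+ * 9 rows (i < 10).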
+ *
+ * Theoretically it might be possible to transfer the dependency,
+ * e.g. by building a bitmap for [a,b] and then combining it with
+ * [b,c] by doing something like this:
+ *
+ * 1) build bitmap on [a,b] using [(a<10) & (b < 20)]
+ * 2) for each element in [b,c] check the bitmap
+ *
+ * But that's certainly nontrivial - for example the statistics may
+ * be different (MCV list vs. histogram) and/or the items may not
+ * match (e.g. MCV items or histogram buckets will be built
+ * differently). Also, for one value of 'b' there might be multiple
+ * MCV items (because of the other column values) with different
+ * bitmap values (some will match, some won't) - so it's not exactly
+ * a bitmap but a partial match.
+ *
+ * Maybe a hash table with number of matches and mismatches (or
+ * maybe sums of frequencies) would work? The step (2) would then
+ * lookup the values and use that to weight the item somehow.
+ *
+ * Currently the only solution is to build statistics on all three
+ * columns.
*/
static Selectivity
-clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStats mvstats)
+clauselist_mv_selectivity(PlannerInfo *root, MVStats mvstats,
+ List *clauses, List *conditions)
{
bool fullmatch = false;
Selectivity s1 = 0.0, s2 = 0.0;
@@ -1169,7 +1422,8 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStats mvstats)
*/
/* Evaluate the MCV first. */
- s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ s1 = clauselist_mv_selectivity_mcvlist(root, mvstats,
+ clauses, conditions,
&fullmatch, &mcv_low);
/*
@@ -1182,7 +1436,8 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStats mvstats)
/* FIXME if (fullmatch) without matching MCV item, use the mcv_low
* selectivity as upper bound */
- s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+ s2 = clauselist_mv_selectivity_histogram(root, mvstats,
+ clauses, conditions);
/* TODO clamp to <= 1.0 (or more strictly, when possible) */
return s1 + s2;
@@ -1232,6 +1487,665 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
}
/*
+ * Selects the best combination of multivariate statistics, where
+ * 'best' means:
+ *
+ * (a) covering the most attributes (referenced by clauses)
+ * (b) using the least number of multivariate stats
+ *
+ * There may be other optimality criteria, not considered in the initial
+ * implementation (more on that in the 'Weaknesses' section below).
+ *
+ * This is pretty much equal to splitting the probability of clauses
+ * (aka selectivity) into a sequence of conditional probabilities, like
+ * this
+ *
+ * P(A,B,C,D) = P(A,B) * P(C|A,B) * P(D|A,B,C)
+ *
+ * and removing the attributes not referenced by the existing stats,
+ * under the assumption that there's no dependency (otherwise the DBA
+ * would create the stats).
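+ *
+ * For example (a hypothetical illustration), with statistics on
+ * [A,B] and [B,C,D] the decomposition gets approximated as
+ *
+ * P(A,B,C,D) ~ P(A,B) * P(C,D|B)
+ *
+ * where the second term drops the condition on A, as A is not
+ * covered by the second statistics.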
+ *
+ *
+ * Algorithm
+ * ---------
+ * The algorithm is a recursive implementation of backtracking, with
+ * maximum 'depth' equal to the number of multi-variate statistics
+ * available on the table.
+ *
+ * It explores all the possible permutations of the stats.
+ *
+ * Whenever it considers adding the next statistics, the clauses it
+ * matches are divided into 'conditions' (clauses already matched by at
+ * least one previous statistics) and clauses that are estimated.
+ *
+ * Then several checks are performed:
+ *
+ * (a) The statistics covers at least 2 columns, referenced in the
+ * estimated clauses (otherwise multi-variate stats are useless).
+ *
+ * (b) The statistics covers at least 1 new column, i.e. column not
+ * referenced by the already used stats (and the new column has
+ * to be referenced by the clauses, of course). Otherwise the
+ * statistics would not add any new information.
+ *
+ * There are some other sanity checks (e.g. that the stats must not be
+ * used twice etc.).
+ *
+ * Finally the new solution is compared to the currently best one, and
+ * if it's considered better, it's used instead.
+ *
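+ * In pseudo-code, the recursion looks roughly like this (a
+ * simplified sketch of choose_mv_statistics_exhaustive below):
+ *
+ *     solve(step, current, best)
+ *         for each statistics S not yet used or ruled out:
+ *             split the clauses covered by S into conditions
+ *             (covered by previous stats) and new clauses
+ *             if S covers >= 2 attributes and >= 1 new clause:
+ *                 add S to the current solution
+ *                 if current is better than best, replace best
+ *                 solve(step + 1, current, best)
+ *                 remove S from the current solution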
+ *
+ * Weaknesses
+ * ----------
+ * The current implementation uses a somewhat simple optimality criterion,
+ * suffering from the following weaknesses.
+ *
+ * (a) There may be multiple solutions with the same number of covered
+ * attributes and number of statistics (e.g. the same solution but
+ * with statistics in a different order). It's unclear which solution
+ * is the best one - in a sense all of them are equal.
+ *
+ * TODO It might be possible to compute estimate for each of those
+ * solutions, and then combine them to get the final estimate
+ * (e.g. by using average or median).
+ *
+ * (b) Does not consider that some types of stats are a better match for
+ * some types of clauses (e.g. an MCV list is a better match for
+ * equality clauses than a histogram).
+ *
+ * XXX Maybe MCV is almost always better / more accurate?
+ *
+ * But maybe this is pointless - generally, each column is either
+ * a label (it's not important whether because of the data type or
+ * how it's used), or a value with ordering that makes sense. So
+ * either a MCV list is more appropriate (labels) or a histogram
+ * (values with orderings).
+ *
+ * Not sure what to do with statistics mixing columns of
+ * both types - maybe it'd be better to invent a new type of stats
+ * combining MCV list and histogram (keeping a small histogram for
+ * each MCV item, and a separate histogram for values not on the
+ * MCV list). But that's not implemented at this moment.
+ *
+ * (c) Does not consider that some solutions may better exploit the
+ * dependencies. For example with clauses on columns [A,B,C,D] and
+ * statistics on [A,B,C] and [C,D] cover all the columns just like
+ * [A,B,C] and [B,C,D], but the latter probably exploits additional
+ * dependencies thanks to having 'B' in both stats (thus allowing
+ * using it as a condition for the second stats). Of course, if
+ * B and [C,D] are independent, this is untrue - but if we have that
+ * statistics created, it's a sign that the DBA/developer believes
+ * there's a dependency.
+ *
+ * (d) Does not consider the order of clauses, which may be significant.
+ * For example, when there's a mix of simple and complex clauses,
+ * i.e. something like
+ *
+ * (a=2) AND (b=3 OR (c=3 AND d=4)) AND (c=3)
+ *
+ * It may be better to evaluate the simple clauses first, and then
+ * use them as conditions for the complex clause.
+ *
+ * We can for example count number of different attributes
+ * referenced in the clause, and use that as a metric of complexity
+ * (lower number -> simpler). Maybe use ratio (#vars/#atts) or
+ * (#clauses/#atts) as secondary metrics? Also the general complexity
+ * of the clause (levels of nesting etc.) might be useful.
+ *
+ * Hopefully most clauses will be reasonably simple, though.
+ *
+ * Update: On second thought, I believe the order of clauses is
+ * determined by choosing the order of statistics, and therefore
+ * optimized by the current algorithm.
+ *
+ * TODO Consider adding a counter of attributes covered by previous
+ * stats (possibly tracking the number of how many stats reference
+ * it too), and use this 'dependency_count' when selecting the best
+ * solution (not sure how). Similarly to (a) it might be possible
+ * to build estimate for each solution (different criteria) and then
+ * combine them somehow.
+ *
+ * TODO The current implementation repeatedly walks through the previous
+ * stats, just to compute the number of covered attributes over and
+ * over. With non-trivial number of statistics this might be an
+ * issue, so maybe we should keep track of 'covered' attributes by
+ * each step, so that we can get rid of this. We'll need this
+ * information anyway (when splitting clauses into condition and
+ * the estimated part).
+ *
+ * TODO This needs to consider the conditions passed from the preceding
+ * and upper clauses (in complex cases), but only as conditions
+ * and not as estimated clauses. So it needs to somehow affect the
+ * score (the more conditions we use the better).
+ *
+ * TODO The algorithm should probably count number of Vars (not just
+ * attnums) when computing the 'score' of each solution. Computing
+ * the ratio of (num of all vars) / (num of condition vars) as a
+ * measure of how well the solution uses conditions might be
+ * useful.
+ *
+ * TODO This might be much easier if we kept Bitmapset of attributes
+ * covered by the stats up to that step.
+ *
+ * FIXME When comparing the solutions, we currently use this condition:
+ *
+ * ((current->nstats > (*best)->nstats))
+ *
+ * i.e. we're choosing the solution with more stats, because with
+ * clauses
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
+ *
+ * and stats on [a,b], [b,c], [c,d] we want to choose the solution
+ * with all three stats, and not just [a,b], [c,d]. Otherwise we'd
+ * fail to exploit one of the dependencies.
+ *
+ * This is however a workaround for another issue - we're not
+ * tracking number of 'dependencies' covered by the solution, only
+ * number of clauses, and that's the same for both solutions.
+ * ([a,b], [c,d]) and ([a,b], [b,c], [c,d]) both cover all 4 clauses.
+ *
+ * Once a suitable metric is added, we want to choose the solution
+ * with less stats, assuming it covers the same number of clauses
+ * and exploits the same number of dependencies.
+ */
+static void
+choose_mv_statistics_exhaustive(PlannerInfo *root, int step,
+ int nmvstats, MVStats mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ CHECK_FOR_INTERRUPTS();
+
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ /*
+ * Now try to apply each statistics, matching at least two attributes,
+ * unless it's already used in one of the previous steps.
+ */
+ for (i = 0; i < nmvstats; i++)
+ {
+ int c;
+
+ int ncovered_clauses = 0; /* number of covered clauses */
+ int ncovered_conditions = 0; /* number of covered conditions */
+ int nattnums = 0; /* number of covered attributes */
+
+ Bitmapset *all_attnums = NULL;
+ Bitmapset *new_attnums = NULL;
+
+ /* skip statistics that were already used or eliminated */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /*
+ * See if we have clauses covered by this statistics, but not
+ * yet covered by any of the preceding ones.
+ */
+ for (c = 0; c < nclauses; c++)
+ {
+ bool covered = false;
+ Bitmapset *clause_attnums = clauses_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this clause is not covered by this statistics, we
+ * can't use the statistics to estimate it at all.
+ */
+ if (! cover_map[i * nclauses + c])
+ continue;
+
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add the
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
+
+ /* let's see if it's covered by any of the previous stats */
+ for (j = 0; j < step; j++)
+ {
+ /* already covered by the previous stats */
+ if (cover_map[current->stats[j] * nclauses + c])
+ covered = true;
+
+ if (covered)
+ break;
+ }
+
+ /* if already covered, continue with the next clause */
+ if (covered)
+ {
+ ncovered_conditions += 1;
+ continue;
+ }
+
+ /*
+ * OK, this clause is covered by this statistics (and not by
+ * any of the previous ones)
+ */
+ ncovered_clauses += 1;
+
+ /* add the attnums into attnums from 'new clauses' */
+ // new_attnums = bms_union(new_attnums, clause_attnums);
+ }
+
+ /* can't have more new clauses than original clauses */
+ Assert(nclauses >= ncovered_clauses);
+ Assert(ncovered_clauses >= 0); /* mostly paranoia */
+
+ nattnums = bms_num_members(all_attnums);
+
+ /* free all the bitmapsets - we don't need them anymore */
+ bms_free(all_attnums);
+ bms_free(new_attnums);
+
+ all_attnums = NULL;
+ new_attnums = NULL;
+
+ /*
+ * See which conditions (clauses passed from above) are
+ * covered by this statistics.
+ */
+ for (c = 0; c < nconditions; c++)
+ {
+ Bitmapset *clause_attnums = conditions_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this condition is not covered by this statistics,
+ * we can't use it here at all.
+ */
+ if (! condition_map[i * nconditions + c])
+ continue;
+
+ /* count this as a condition */
+ ncovered_conditions += 1;
+
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add the
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
+ }
+
+ /*
+ * Let's mark the statistics as 'ruled out' - either we'll use
+ * it (and proceed to the next step), or it's incompatible.
+ */
+ ruled_out[i] = step;
+
+ /*
+ * There are no clauses usable with this statistics (not already
+ * covered by some of the previous stats).
+ *
+ * Similarly, if the clauses only use a single attribute, we
+ * can't really use that.
+ */
+ if ((ncovered_clauses == 0) || (nattnums < 2))
+ continue;
+
+ /*
+ * TODO Not sure if it's possible to add a clause referencing
+ * only attributes already covered by previous stats?
+ * Introducing only some new dependency, not a new
+ * attribute. Couldn't come up with an example, though.
+ * Might be worth adding some assert.
+ */
+
+ /*
+ * got a suitable statistics - let's update the current solution,
+ * maybe use it as the best solution
+ */
+ current->nclauses += ncovered_clauses;
+ current->nconditions += ncovered_conditions;
+ current->nstats += 1;
+ current->stats[step] = i;
+
+ /*
+ * We can never cover more clauses, or use more stats than we
+ * actually have at the beginning.
+ */
+ Assert(nclauses >= current->nclauses);
+ Assert(nmvstats >= current->nstats);
+ Assert(step < nmvstats);
+
+ /* we can't get more conditions than clauses and conditions combined
+ *
+ * FIXME This assert does not work because we count the conditions
+ * repeatedly (once for each statistics covering it).
+ */
+ /* Assert((nconditions + nclauses) >= current->nconditions); */
+
+ if (*best == NULL)
+ {
+ *best = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ (*best)->nstats = 0;
+ (*best)->nclauses = 0;
+ (*best)->nconditions = 0;
+ }
+
+ /* see if it's better than the current 'best' solution */
+ if ((current->nclauses > (*best)->nclauses) ||
+ ((current->nclauses == (*best)->nclauses) &&
+ ((current->nstats > (*best)->nstats))))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+
+ /*
+ * The recursion only makes sense if we haven't covered all the
+ * attributes (then adding stats is not really possible).
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_exhaustive(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= ncovered_clauses;
+ current->nconditions -= ncovered_conditions;
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[i] = -1;
+
+ Assert(current->nclauses >= 0);
+ Assert(current->nstats >= 0);
+ }
+
+ /* reset all statistics eliminated in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+}
+
+/*
+ * Greedy search for a multivariate solution - a sequence of statistics
+ * covering the clauses. This chooses the "best" statistics at each step,
+ * so the resulting solution may not be the best solution globally, but
+ * this produces the solution in only N steps (where N is the number of
+ * statistics), while the exhaustive approach may have to walk through
+ * ~N! combinations (although some of those are terminated early).
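+ *
+ * For example, with clauses on columns (a,b,c,d) and statistics
+ * on [a,b], [b,c] and [c,d], the greedy search picks one
+ * statistics per step (3 steps at most), while the exhaustive
+ * search may walk through up to 3! = 6 orderings of those stats.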
+ *
+ * TODO There are probably other metrics we might use - e.g. using
+ * number of columns (num_cond_columns / num_cov_columns), which
+ * might work better with a mix of simple and complex clauses.
+ *
+ * TODO Also the choice at the very first step should be handled
+ * in a special way, because there will be 0 conditions at that
+ * moment, so there needs to be some other criterion - e.g. using
+ * the simplest (or most complex?) clause might be a good idea.
+ *
+ * TODO We might also select multiple stats using different criteria,
+ * and branch the search. This is however tricky, because if we
+ * choose k statistics at each step, we get k^N branches to
+ * walk through (with N steps). That's not really good with
+ * a large number of stats (yet still better than exhaustive search).
+ */
+static void
+choose_mv_statistics_greedy(PlannerInfo *root, int step,
+ int nmvstats, MVStats mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
+ int best_stat = -1;
+ double gain, max_gain = -1.0;
+
+ /*
+ * Bitmap tracking which clauses are already covered (by the previous
+ * statistics) and may thus serve only as a condition in this step.
+ */
+ bool *covered_clauses = (bool*)palloc0(nclauses * sizeof(bool));
+
+ /*
+ * Number of clauses and columns covered by each statistics - this
+ * includes both conditions and clauses covered by the statistics for
+ * the first time. The number of columns may count some columns
+ * repeatedly - if a column is shared by multiple clauses, it will
+ * be counted once for each clause (covered by the statistics).
+ * So with two clauses [(a=1 OR b=2),(a<2 OR c>1)] the column "a"
+ * will be counted twice (if both clauses are covered).
+ *
+ * The values for reduded statistics (that can't be applied) are
+ * not computed, because that'd be pointless.
+ */
+ int *num_cov_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cov_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Same as above, but this only includes clauses that are already
+ * covered by the previous stats (and the current one).
+ */
+ int *num_cond_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cond_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Number of attributes for each clause.
+ *
+ * TODO Might be computed in choose_mv_statistics() and then passed
+ * here, but then the function would not have the same signature
+ * as _exhaustive().
+ */
+ int *attnum_counts = (int*)palloc0(sizeof(int) * nclauses);
+ int *attnum_cond_counts = (int*)palloc0(sizeof(int) * nconditions);
+
+ CHECK_FOR_INTERRUPTS();
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ /* compute attributes (columns) for each clause */
+ for (i = 0; i < nclauses; i++)
+ attnum_counts[i] = bms_num_members(clauses_attnums[i]);
+
+ /* compute attributes (columns) for each condition */
+ for (i = 0; i < nconditions; i++)
+ attnum_cond_counts[i] = bms_num_members(conditions_attnums[i]);
+
+ /* see which clauses are already covered at this point (by previous stats) */
+ for (i = 0; i < step; i++)
+ for (j = 0; j < nclauses; j++)
+ covered_clauses[j] |= (cover_map[current->stats[i] * nclauses + j]);
+
+ /* which remaining statistics covers most clauses / uses most conditions? */
+ for (i = 0; i < nmvstats; i++)
+ {
+ Bitmapset *attnums_covered = NULL;
+ Bitmapset *attnums_conditions = NULL;
+
+ /* skip stats that are already ruled out (either used or inapplicable) */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* count covered clauses and conditions (for the statistics) */
+ for (j = 0; j < nclauses; j++)
+ {
+ if (cover_map[i * nclauses + j])
+ {
+ Bitmapset *new = bms_union(attnums_covered, clauses_attnums[j]);
+
+ /* get rid of the old bitmap and keep the unified result */
+ bms_free(attnums_covered);
+ attnums_covered = new;
+
+ num_cov_clauses[i] += 1;
+ num_cov_columns[i] += attnum_counts[j];
+
+ /* is the clause already covered (i.e. a condition)? */
+ if (covered_clauses[j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_counts[j];
+ new = bms_union(attnums_conditions,
+ clauses_attnums[j]);
+
+ bms_free(attnums_conditions);
+ attnums_conditions = new;
+ }
+ }
+ }
+
+ /* if all covered clauses are covered by prev stats (thus conditions) */
+ if (num_cov_clauses[i] == num_cond_clauses[i])
+ ruled_out[i] = step;
+
+ /* same if there are no new attributes */
+ else if (bms_num_members(attnums_conditions) == bms_num_members(attnums_covered))
+ ruled_out[i] = step;
+
+ bms_free(attnums_covered);
+ bms_free(attnums_conditions);
+
+ /* if the statistics is inapplicable, try the next one */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* now let's walk through conditions and count the covered */
+ for (j = 0; j < nconditions; j++)
+ {
+ if (condition_map[i * nconditions + j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_cond_counts[j];
+ }
+ }
+
+ /* otherwise see if this statistics improves the gain metric */
+ gain = num_cond_columns[i] / (double)num_cov_columns[i];
+
+ if (gain > max_gain)
+ {
+ max_gain = gain;
+ best_stat = i;
+ }
+ }
+
+ /*
+ * Have we found a suitable statistics? Add it to the solution and
+ * try next step.
+ */
+ if (best_stat != -1)
+ {
+ /* mark the statistics, so that we skip it in next steps */
+ ruled_out[best_stat] = step;
+
+ /* allocate current solution if necessary */
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ current->nclauses += num_cov_clauses[best_stat];
+ current->nconditions += num_cond_clauses[best_stat];
+ current->stats[step] = best_stat;
+ current->nstats++;
+
+ if (*best == NULL)
+ {
+ (*best) = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ else
+ {
+ /* see if this is a better solution */
+ double current_gain = (double)current->nconditions / current->nclauses;
+ double best_gain = (double)(*best)->nconditions / (*best)->nclauses;
+
+ if ((current_gain > best_gain) ||
+ ((current_gain == best_gain) && (current->nstats < (*best)->nstats)))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ }
+
+ /*
+ * The recursion only makes sense if we haven't covered all the
+ * attributes (then adding stats is not really possible).
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_greedy(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= num_cov_clauses[best_stat];
+ current->nconditions -= num_cond_clauses[best_stat];
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[best_stat] = -1;
+ }
+
+ /* reset all statistics eliminated in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+ /* free everything allocated in this step */
+ pfree(covered_clauses);
+ pfree(attnum_counts);
+ pfree(attnum_cond_counts);
+ pfree(num_cov_clauses);
+ pfree(num_cov_columns);
+ pfree(num_cond_clauses);
+ pfree(num_cond_columns);
+}
+
+/*
* We're looking for statistics matching at least 2 attributes,
* referenced in the clauses compatible with multivariate statistics.
* The current selection criteria is very simple - we choose the
@@ -1299,48 +2213,386 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
* TODO This will probably have to consider compatibility of clauses,
* because 'dependencies' will probably work only with equality
* clauses.
+ *
+ * TODO Another way to make the optimization problems smaller might
+ * be splitting the statistics into several disjoint subsets, i.e.
+ * if we can split the graph of statistics (after the elimination)
+ * into multiple components (so that stats in different components
+ * share no attributes), we can do the optimization for each
+ * component separately.
+ *
+ * TODO Another possible optimization might be removing redundant
+ * statistics - if statistics S1 covers S2 (covers S2 attributes
+ * and possibly some more), we can probably remove S2. What
+ * actually matters are attributes from covered clauses (not all
+ * the original attributes). This might however prefer larger,
+ * and thus less accurate, statistics.
+ *
+ * TODO If we could compute what is a "perfect solution" maybe we could
+ * terminate the search after reaching ~90% of it? Say, if we knew
+ * that we can cover 10 clauses and reuse 8 dependencies, maybe
+ * covering 9 clauses and 7 dependencies would be OK?
*/
-static int
-choose_mv_statistics(int nmvstats, MVStats mvstats, Bitmapset *attnums)
+static mv_solution_t *
+choose_mv_statistics(PlannerInfo *root, int nmvstats, MVStats mvstats,
+ List *clauses, List *conditions,
+ Oid varRelid, SpecialJoinInfo *sjinfo, int type)
{
int i, j;
+ mv_solution_t *best = NULL;
+ ListCell *l;
+
+ /* pass only stats matching at least two attributes (from clauses) */
+ MVStats mvstats_filtered = (MVStats)palloc0(nmvstats * sizeof(MVStatsData));
+ int nmvstats_filtered;
+ bool repeat = true;
+ bool *clause_cover_map = NULL,
+ *condition_cover_map = NULL;
+ int *ruled_out = NULL;
+
+ /* build bitmapsets for all stats and clauses */
+ Bitmapset **stats_attnums;
+ Bitmapset **clauses_attnums;
+ Bitmapset **conditions_attnums;
+
+ int nclauses, nconditions;
+ Node ** clauses_array;
+ Node ** conditions_array;
- int choice = -1;
- int current_matches = 1; /* goal #1: maximize */
- int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+ /* copy lists, so that we can free them during elimination easily */
+ clauses = list_copy(clauses);
+ conditions = list_copy(conditions);
/*
- * Walk through the statistics (simple array with nmvstats elements)
- * and for each one count the referenced attributes (encoded in
- * the 'attnums' bitmap).
+ * Reduce the optimization problem size as much as possible.
+ *
+ * Eliminate clauses and conditions not covered by any statistics,
+ * or statistics not matching at least two attributes (one of them
+ * has to be in a regular clause).
+ *
+ * It's possible that removing a statistics in one iteration
+ * eliminates a clause in the next one, so we'll repeat this until we
+ * eliminate no clauses/stats in that iteration.
+ *
+ * This can only happen after eliminating a statistics - clauses are
+ * eliminated first, so statistics always reflect that.
*/
+ while (repeat)
+ {
+ /* pass only mv-compatible clauses covered by at least one statistics */
+ List *compatible_clauses = NIL;
+ List *compatible_conditions = NIL;
+
+ Bitmapset *compatible_attnums = NULL;
+ Bitmapset *condition_attnums = NULL;
+ Bitmapset *all_attnums = NULL;
+
+ /*
+ * Clauses
+ *
+ * Walk through clauses and keep only those covered by at least
+ * one of the statistics we still have. Also, collect bitmap of
+ * attributes so that we can make sure we add at least one new.
+ */
+ foreach (l, clauses)
+ {
+ Node *clause = (Node*)lfirst(l);
+ Bitmapset *clause_attnums = NULL;
+ Oid relid;
+
+ /*
+ * The clause has to be mv-compatible (suitable operators etc.).
+ */
+ if (! clause_is_mv_compatible(root, clause, varRelid,
+ &relid, &clause_attnums, sjinfo, type))
+ continue;
+
+ /* is there a statistics covering this clause? */
+ for (i = 0; i < nmvstats; i++)
+ {
+ int k, matches = 0;
+ for (k = 0; k < mvstats[i].stakeys->dim1; k++)
+ {
+ if (bms_is_member(mvstats[i].stakeys->values[k],
+ clause_attnums))
+ matches += 1;
+ }
+
+ /*
+ * The clause is compatible if all attributes it references
+ * are covered by the statistics.
+ */
+ if (bms_num_members(clause_attnums) == matches)
+ {
+ compatible_attnums = bms_union(compatible_attnums,
+ clause_attnums);
+ compatible_clauses = lappend(compatible_clauses,
+ clause);
+ break;
+ }
+ }
+
+ bms_free(clause_attnums);
+ }
+
+ /* we can't have more compatible clauses than source clauses */
+ Assert(list_length(clauses) >= list_length(compatible_clauses));
+
+ /* work with only compatible clauses from now */
+ list_free(clauses);
+ clauses = compatible_clauses;
+
+ /*
+ * Conditions
+ *
+ * Walk through conditions and keep only those covered by at
+ * least one of the statistics we still have. Also, collect a
+ * bitmap of their attributes.
+ */
+
+ /* next, generate bitmap of attnums from all mv_compatible conditions */
+ foreach (l, conditions)
+ {
+ Node *clause = (Node*)lfirst(l);
+ Bitmapset *clause_attnums = NULL;
+ Oid relid;
+
+ /*
+ * The clause has to be mv-compatible (suitable operators etc.).
+ */
+ if (! clause_is_mv_compatible(root, clause, varRelid,
+ &relid, &clause_attnums, sjinfo, type))
+ continue;
+
+ /* is there a statistics covering this clause? */
+ for (i = 0; i < nmvstats; i++)
+ {
+ int k, matches = 0;
+ for (k = 0; k < mvstats[i].stakeys->dim1; k++)
+ {
+ if (bms_is_member(mvstats[i].stakeys->values[k],
+ clause_attnums))
+ matches += 1;
+ }
+
+ if (bms_num_members(clause_attnums) == matches)
+ {
+ condition_attnums = bms_union(condition_attnums,
+ clause_attnums);
+ compatible_conditions = lappend(compatible_conditions,
+ clause);
+ break;
+ }
+ }
+
+ bms_free(clause_attnums);
+ }
+
+ /* we can't have more compatible conditions than source conditions */
+ Assert(list_length(conditions) >= list_length(compatible_conditions));
+
+ /* keep only compatible clauses */
+ list_free(conditions);
+ conditions = compatible_conditions;
+
+ /* get a union of attnums (from conditions and clauses) */
+ all_attnums = bms_union(compatible_attnums, condition_attnums);
+
+ /*
+ * Statistics
+ *
+ * Walk through statistics and only keep those covering at least
+ * one new attribute (excluding conditions) and at least two
+ * attributes in clauses and conditions combined.
+ */
+ nmvstats_filtered = 0;
+
+ for (i = 0; i < nmvstats; i++)
+ {
+ int k;
+ int matches_new = 0,
+ matches_all = 0;
+
+ for (k = 0; k < mvstats[i].stakeys->dim1; k++)
+ {
+ /* attribute covered by new clause(s) */
+ if (bms_is_member(mvstats[i].stakeys->values[k],
+ compatible_attnums))
+ matches_new += 1;
+
+ /* attribute covered by clause(s) or condition(s) */
+ if (bms_is_member(mvstats[i].stakeys->values[k],
+ all_attnums))
+ matches_all += 1;
+ }
+
+ /* check we have enough attributes for this statistics */
+ if ((matches_new >= 1) && (matches_all >= 2))
+ {
+ mvstats_filtered[nmvstats_filtered] = mvstats[i];
+ nmvstats_filtered += 1;
+ }
+ }
+
+ /* we can't have more useful stats than we had originally */
+ Assert(nmvstats >= nmvstats_filtered);
+
+ /* if we've eliminated a statistics, trigger another round */
+ repeat = (nmvstats > nmvstats_filtered);
+
+ /*
+ * work only with filtered statistics from now
+ *
+ * FIXME This rewrites the input 'mvstats' array, which is not
+ * exactly pretty as it's an unexpected side-effect (the
+ * caller may use the stats for something else). But the
+ * solution contains indexes into this 'reduced' array so
+ * we can't stop doing that easily.
+ *
+ * Another issue is that we only modify the local 'mvstats'
+ * value, so the caller will still see the original number
+ * of stats (and thus maybe duplicate entries).
+ *
+ * We should make a copy of the array, and only mess with
+ * that copy (and map the indexes to the original ones at
+ * the end, when returning the solution to the user). Or
+ * simply work with OIDs.
+ */
+ if (nmvstats_filtered < nmvstats)
+ {
+ nmvstats = nmvstats_filtered;
+ memcpy(mvstats, mvstats_filtered, sizeof(MVStatsData)*nmvstats);
+ nmvstats_filtered = 0;
+ }
+ }
+
+ /* only do the optimization if we have clauses/statistics */
+ if ((nmvstats == 0) || (list_length(clauses) == 0))
+ return NULL;
+
+ stats_attnums
+ = (Bitmapset **)palloc0(nmvstats * sizeof(Bitmapset *));
+
+ /*
+ * TODO We should sort the stats to make the order deterministic,
+ * otherwise we may get different estimates on different
+ * executions - if there are multiple "equally good" solutions,
+ * we'll keep the first solution we see.
+ *
+ * Sorting by OID probably is not the right solution though,
+ * because we'd like it to be somehow reproducible,
+ * irrespective of the order of ADD STATISTICS commands.
+ * So maybe statkeys?
+ */
+
for (i = 0; i < nmvstats; i++)
{
- /* columns matching this statistics */
- int matches = 0;
+ for (j = 0; j < mvstats[i].stakeys->dim1; j++)
+ stats_attnums[i] = bms_add_member(stats_attnums[i],
+ mvstats[i].stakeys->values[j]);
+ }
- int2vector * attrs = mvstats[i].stakeys;
- int numattrs = mvstats[i].stakeys->dim1;
+ /* collect clauses and bitmaps of attnums */
+ nclauses = 0;
+ clauses_attnums = (Bitmapset **)palloc0(list_length(clauses)
+ * sizeof(Bitmapset *));
+ clauses_array = (Node **)palloc0(list_length(clauses)
+ * sizeof(Node *));
- /* count columns covered by the histogram */
- for (j = 0; j < numattrs; j++)
- if (bms_is_member(attrs->values[j], attnums))
- matches++;
+ foreach (l, clauses)
+ {
+ Oid relid;
+ Bitmapset * attnums = NULL;
/*
- * Use this statistics when it improves the number of matches or
- * when it matches the same number of attributes but is smaller.
+ * The clause has to be mv-compatible (suitable operators etc.).
*/
- if ((matches > current_matches) ||
- ((matches == current_matches) && (current_dims > numattrs)))
+ if (! clause_is_mv_compatible(root, (Node *)lfirst(l), varRelid,
+ &relid, &attnums, sjinfo, type))
+ elog(ERROR, "should not get non-mv-compatible clause");
+
+ clauses_attnums[nclauses] = attnums;
+ clauses_array[nclauses] = (Node *)lfirst(l);
+ nclauses += 1;
+ }
+
+ /* collect conditions and bitmap of attnums */
+ nconditions = 0;
+ conditions_attnums = (Bitmapset **)palloc0(list_length(conditions)
+ * sizeof(Bitmapset *));
+ conditions_array = (Node **)palloc0(list_length(conditions)
+ * sizeof(Node *));
+
+ foreach (l, conditions)
+ {
+ Oid relid;
+ Bitmapset * attnums = NULL;
+
+ /* conditions are mv-compatible (thanks to the reduction) */
+ if (! clause_is_mv_compatible(root, (Node *)lfirst(l), varRelid,
+ &relid, &attnums, sjinfo, type))
+ elog(ERROR, "should not get non-mv-compatible clause");
+
+ conditions_attnums[nconditions] = attnums;
+ conditions_array[nconditions] = (Node *)lfirst(l);
+ nconditions += 1;
+ }
+
+ /*
+ * Build bitmaps with info about which clauses/conditions are
+ * covered by each statistics (so that we don't need to call the
+ * bms_is_subset over and over again).
+ */
+ clause_cover_map = (bool*)palloc0(nclauses * nmvstats * sizeof(bool));
+ condition_cover_map = (bool*)palloc0(nconditions * nmvstats * sizeof(bool));
+ ruled_out = (int*)palloc0(nmvstats * sizeof(int));
+
+ for (i = 0; i < nmvstats; i++)
+ {
+ ruled_out[i] = -1; /* not ruled out by default */
+ for (j = 0; j < nclauses; j++)
+ {
+ clause_cover_map[i * nclauses + j]
+ = bms_is_subset(clauses_attnums[j],
+ stats_attnums[i]);
+ }
+
+ for (j = 0; j < nconditions; j++)
{
- choice = i;
- current_matches = matches;
- current_dims = numattrs;
+ condition_cover_map[i * nconditions + j]
+ = bms_is_subset(conditions_attnums[j],
+ stats_attnums[i]);
}
}
- return choice;
+ /* do the optimization itself */
+ if (mvstat_search_type == MVSTAT_SEARCH_EXHAUSTIVE)
+ choose_mv_statistics_exhaustive(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+ else
+ choose_mv_statistics_greedy(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+
+ /* maybe we should leave the cleanup up to the memory context */
+ pfree(mvstats_filtered);
+ pfree(stats_attnums);
+ pfree(clauses_attnums);
+ pfree(clauses_array);
+ pfree(conditions_attnums);
+ pfree(conditions_array);
+ pfree(clause_cover_map);
+ pfree(condition_cover_map);
+ pfree(ruled_out);
+
+ return best;
}
@@ -1624,6 +2876,51 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
return false;
}
+
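+/*
+ * Recursively collect attnums referenced by a clause. Only simple
+ * opclauses, IS NULL tests and AND/OR clauses are handled, matching
+ * what clause_is_mv_compatible() accepts - the clause is expected
+ * to have already passed that check.
+ */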
+static Bitmapset *
+clause_mv_get_attnums(PlannerInfo *root, Node *clause)
+{
+ Bitmapset * attnums = NULL;
+
+ /* Extract clause from restrict info, if needed. */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /*
+ * Only simple opclauses, IS NULL tests and AND/OR clauses are
+ * compatible with multivariate stats at this point.
+ */
+ if ((is_opclause(clause))
+ && (list_length(((OpExpr *) clause)->args) == 2))
+ {
+ OpExpr *expr = (OpExpr *) clause;
+
+ if (IsA(linitial(expr->args), Var))
+ attnums = bms_add_member(attnums,
+ ((Var*)linitial(expr->args))->varattno);
+ else
+ attnums = bms_add_member(attnums,
+ ((Var*)lsecond(expr->args))->varattno);
+ }
+ else if (IsA(clause, NullTest)
+ && IsA(((NullTest*)clause)->arg, Var))
+ {
+ attnums = bms_add_member(attnums,
+ ((Var*)((NullTest*)clause)->arg)->varattno);
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ ListCell *l;
+ foreach (l, ((BoolExpr*)clause)->args)
+ {
+ attnums = bms_join(attnums,
+ clause_mv_get_attnums(root, (Node*)lfirst(l)));
+ }
+ }
+
+ return attnums;
+}
+
/*
* Performs reduction of clauses using functional dependencies, i.e.
* removes clauses that are considered redundant. It simply walks
@@ -2049,20 +3346,24 @@ clauselist_apply_dependencies(PlannerInfo *root, List *clauses, Oid varRelid,
* as the clauses are processed (and skip items that are 'match').
*/
static Selectivity
-clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
- MVStats mvstats, bool *fullmatch,
- Selectivity *lowsel)
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, MVStats mvstats,
+ List *clauses, List *conditions,
+ bool *fullmatch, Selectivity *lowsel)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
MCVList mcvlist = NULL;
+
int nmatches = 0;
+ int nconditions = 0;
/* match/mismatch bitmap for each MCV item */
char * matches = NULL;
+ char * condition_matches = NULL;
Assert(clauses != NIL);
- Assert(list_length(clauses) >= 2);
+ Assert(list_length(clauses) >= 1);
/* there's no MCV list built yet */
if (! mvstats->mcv_built)
@@ -2073,13 +3374,34 @@ clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
Assert(mcvlist != NULL);
Assert(mcvlist->nitems > 0);
- /* by default all the MCV items match the clauses fully */
- matches = palloc0(sizeof(char) * mcvlist->nitems);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
-
/* number of matching MCV items */
nmatches = mcvlist->nitems;
+ nconditions = mcvlist->nitems;
+
+ /* conditions */
+ condition_matches = palloc0(sizeof(char) * nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+ /* by default all the MCV items match the clauses fully */
+ matches = palloc0(sizeof(char) * nmatches);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+
+ /*
+ * build the match bitmap for the conditions
+ */
+ if (conditions != NIL)
+ nconditions = update_match_bitmap_mcvlist(root, conditions,
+ mvstats->stakeys, mcvlist,
+ nconditions, condition_matches,
+ lowsel, fullmatch, false);
+
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all MCV items, even those
+ * ruled out by the conditions. The final result should be the
+ * same, but it might be faster.
+ */
nmatches = update_match_bitmap_mcvlist(root, clauses,
mvstats->stakeys, mcvlist,
nmatches, matches,
@@ -2088,14 +3410,25 @@ clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
/* sum frequencies for all the matching MCV items */
for (i = 0; i < mcvlist->nitems; i++)
{
+ /* skip MCV items not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
if (matches[i] != MVSTATS_MATCH_NONE)
s += mcvlist->items[i]->frequency;
+
+ t += mcvlist->items[i]->frequency;
}
pfree(matches);
+ pfree(condition_matches);
pfree(mcvlist);
- return s;
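+ /*
+ * The result is a conditional probability - 's' sums frequencies
+ * of MCV items matching both the conditions and the clauses, 't'
+ * sums the items matching the conditions alone, so (s / t)
+ * estimates P(clauses | conditions).
+ */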
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
+ return (s / t);
}
/*
@@ -2490,13 +3823,17 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
* this is not uncommon, but for histograms it's not that clear.
*/
static Selectivity
-clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
- MVStats mvstats)
+clauselist_mv_selectivity_histogram(PlannerInfo *root, MVStats mvstats,
+ List *clauses, List *conditions)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
int nmatches = 0;
+ int nconditions = 0;
char *matches = NULL;
+ char *condition_matches = NULL;
+
MVHistogram mvhist = NULL;
/* there's no histogram */
@@ -2508,18 +3845,34 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
Assert (mvhist != NULL);
Assert (clauses != NIL);
- Assert (list_length(clauses) >= 2);
+ Assert (list_length(clauses) >= 1);
+
+ nmatches = mvhist->nbuckets;
+ nconditions = mvhist->nbuckets;
/*
* Bitmap of bucket matches (mismatch, partial, full). by default
* all buckets fully match (and we'll eliminate them).
*/
- matches = palloc0(sizeof(char) * mvhist->nbuckets);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+ matches = palloc0(sizeof(char) * nmatches);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
- nmatches = mvhist->nbuckets;
+ condition_matches = palloc0(sizeof(char)*nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+
+ /* build the match bitmap for the conditions */
+ if (conditions != NIL)
+ update_match_bitmap_histogram(root, conditions,
+ mvstats->stakeys, mvhist,
+ nconditions, condition_matches, false);
- /* build the match bitmap */
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all buckets, even those
+ * ruled out by the conditions. The final result should be
+ * the same, but it might be faster.
+ */
update_match_bitmap_histogram(root, clauses,
mvstats->stakeys, mvhist,
nmatches, matches, false);
@@ -2527,17 +3880,37 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
/* now, walk through the buckets and sum the selectivities */
for (i = 0; i < mvhist->nbuckets; i++)
{
+ float coeff = 1.0;
+
+ /* skip buckets not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+ else if (condition_matches[i] == MVSTATS_MATCH_PARTIAL)
+ coeff = 0.5;
+
+ t += coeff * mvhist->buckets[i]->ntuples;
+
if (matches[i] == MVSTATS_MATCH_FULL)
- s += mvhist->buckets[i]->ntuples;
+ s += coeff * mvhist->buckets[i]->ntuples;
else if (matches[i] == MVSTATS_MATCH_PARTIAL)
- s += 0.5 * mvhist->buckets[i]->ntuples;
+ /*
+ * TODO If both conditions and clauses match partially, this
+ * will use a 0.25 match - not sure if that's the
+ * right solution, but it seems about right.
+ */
+ s += coeff * 0.5 * mvhist->buckets[i]->ntuples;
}
/* release the allocated bitmap and deserialized histogram */
pfree(matches);
+ pfree(condition_matches);
pfree(mvhist);
- return s;
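+ /*
+ * As with the MCV list, (s / t) estimates P(clauses | conditions),
+ * except that partially matching buckets are weighted by 0.5 on
+ * both the condition and the clause side.
+ */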
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
+ return (s / t);
}
/*
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 1a0d358..71beb2e 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3280,7 +3280,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/*
* Also get the normal inner-join selectivity of the join clauses.
@@ -3303,7 +3304,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
JOIN_INNER,
- &norm_sjinfo);
+ &norm_sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
if (jointype == JOIN_ANTI)
@@ -3470,7 +3472,7 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
Node *qual = (Node *) lfirst(l);
/* Note that clause_selectivity will be able to cache its result */
- selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo);
+ selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo, NIL);
}
/* Apply it to the input relation sizes */
@@ -3506,7 +3508,8 @@ set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
rel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
rel->rows = clamp_row_est(nrows);
@@ -3543,7 +3546,8 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
allclauses,
rel->relid, /* do not use 0! */
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
/* For safety, make sure result is not more than the base estimate */
if (nrows > rel->rows)
@@ -3681,12 +3685,14 @@ calc_joinrel_size_estimate(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = clauselist_selectivity(root,
pushedquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
list_free(joinquals);
@@ -3698,7 +3704,8 @@ calc_joinrel_size_estimate(PlannerInfo *root,
restrictlist,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = 0.0; /* not used, keep compiler quiet */
}
diff --git a/src/backend/optimizer/util/orclauses.c b/src/backend/optimizer/util/orclauses.c
index f0acc14..e41508b 100644
--- a/src/backend/optimizer/util/orclauses.c
+++ b/src/backend/optimizer/util/orclauses.c
@@ -280,7 +280,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
* saving work later.)
*/
or_selec = clause_selectivity(root, (Node *) or_rinfo,
- 0, JOIN_INNER, NULL);
+ 0, JOIN_INNER, NULL, NIL);
/*
* The clause is only worth adding to the query if it rejects a useful
@@ -342,7 +342,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
/* Compute inner-join size */
orig_selec = clause_selectivity(root, (Node *) join_or_rinfo,
- 0, JOIN_INNER, &sjinfo);
+ 0, JOIN_INNER, &sjinfo, NIL);
/* And hack cached selectivity so join size remains the same */
join_or_rinfo->norm_selec = orig_selec / or_selec;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 4dd3f9f..326dd36 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1580,13 +1580,15 @@ booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg,
case IS_NOT_FALSE:
selec = (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
case IS_FALSE:
case IS_NOT_TRUE:
selec = 1.0 - (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
default:
elog(ERROR, "unrecognized booltesttype: %d",
@@ -6196,7 +6198,8 @@ genericcostestimate(PlannerInfo *root,
indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/*
* If caller didn't give us an estimate, estimate the number of index
@@ -6521,7 +6524,8 @@ btcostestimate(PG_FUNCTION_ARGS)
btreeSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
numIndexTuples = btreeSelectivity * index->rel->tuples;
/*
@@ -7264,7 +7268,8 @@ gincostestimate(PG_FUNCTION_ARGS)
*indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/* fetch estimated page cost for tablespace containing index */
get_tablespace_page_costs(index->reltablespace,
@@ -7496,7 +7501,7 @@ brincostestimate(PG_FUNCTION_ARGS)
*indexSelectivity =
clauselist_selectivity(root, indexQuals,
path->indexinfo->rel->relid,
- JOIN_INNER, NULL);
+ JOIN_INNER, NULL, NIL);
*indexCorrelation = 1;
/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index b8a0f9f..5cd2583 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -75,6 +75,7 @@
#include "utils/bytea.h"
#include "utils/guc_tables.h"
#include "utils/memutils.h"
+#include "utils/mvstats.h"
#include "utils/pg_locale.h"
#include "utils/plancache.h"
#include "utils/portal.h"
@@ -396,6 +397,15 @@ static const struct config_enum_entry row_security_options[] = {
};
/*
+ * Search algorithm for multivariate stats.
+ */
+static const struct config_enum_entry mvstat_search_options[] = {
+ {"greedy", MVSTAT_SEARCH_GREEDY, false},
+ {"exhaustive", MVSTAT_SEARCH_EXHAUSTIVE, false},
+ {NULL, 0, false}
+};
+
+/*
* Options for enum values stored in other modules
*/
extern const struct config_enum_entry wal_level_options[];
@@ -3651,6 +3661,16 @@ static struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"mvstat_search", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Sets the algorithm used for combining multivariate stats."),
+ NULL
+ },
+ &mvstat_search_type,
+ MVSTAT_SEARCH_GREEDY, mvstat_search_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
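As a usage sketch, assuming a server built with this patch: the GUC is
USERSET, so the search strategy can be switched per session, with the
option names matching the enum added to mvstats.h below:

SET mvstat_search = 'exhaustive';  -- exhaustive search
SET mvstat_search = 'greedy';      -- greedy search (the default)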
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 9c2000b..7a3835b 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -182,11 +182,13 @@ extern Selectivity clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
extern Selectivity clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
#endif /* COST_H */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 673e546..194bbf8 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -16,6 +16,14 @@
#include "commands/vacuum.h"
+typedef enum MVStatSearchType
+{
+ MVSTAT_SEARCH_EXHAUSTIVE, /* exhaustive search */
+ MVSTAT_SEARCH_GREEDY /* greedy search */
+} MVStatSearchType;
+
+extern int mvstat_search_type;
+
/*
* Basic info about the stats, used when choosing what to use
*/
--
2.0.5
On Mon, Mar 30, 2015 at 5:26 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:
Hello,
attached is a new version of the patch series. Aside from fixing various
issues (crashes, memory leaks), the patches are rebased to current
master, and I also attach a few SQL scripts I used for testing (nothing
fancy, just stress-testing all the parts the patch touches).
Hi Tomas,
I get cascading conflicts in pg_proc.h. It looked easy enough to fix,
except then I get compiler errors:
funcapi.c: In function 'get_func_trftypes':
funcapi.c:890: warning: unused variable 'procStruct'
utils/fmgrtab.o:(.rodata+0x10cf8): undefined reference to `_null_'
utils/fmgrtab.o:(.rodata+0x10d18): undefined reference to `_null_'
utils/fmgrtab.o:(.rodata+0x10d38): undefined reference to `_null_'
utils/fmgrtab.o:(.rodata+0x10d58): undefined reference to `_null_'
collect2: ld returned 1 exit status
make[2]: *** [postgres] Error 1
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2
make: *** Waiting for unfinished jobs....
make: *** [temp-install] Error 2
Cheers,
Jeff
* Jeff Janes (jeff.janes@gmail.com) wrote:
On Mon, Mar 30, 2015 at 5:26 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:
attached is a new version of the patch series. Aside from fixing various
issues (crashes, memory leaks), the patches are rebased to current
master, and I also attach a few SQL scripts I used for testing (nothing
fancy, just stress-testing all the parts the patch touches).
I get cascading conflicts in pg_proc.h. It looked easy enough to fix,
except then I get compiler errors:
Yeah, those are because you didn't address the new column which was
added to pg_proc. You need to add another _null_ in the pg_proc.h lines
in the correct place, apparently on four lines.
Thanks!
Stephen
On Tue, Apr 28, 2015 at 9:13 AM, Stephen Frost <sfrost@snowman.net> wrote:
* Jeff Janes (jeff.janes@gmail.com) wrote:
On Mon, Mar 30, 2015 at 5:26 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:
attached is a new version of the patch series. Aside from fixing various
issues (crashes, memory leaks), the patches are rebased to current
master, and I also attach a few SQL scripts I used for testing (nothing
fancy, just stress-testing all the parts the patch touches).
I get cascading conflicts in pg_proc.h. It looked easy enough to fix,
except then I get compiler errors:
Yeah, those are because you didn't address the new column which was
added to pg_proc. You need to add another _null_ in the pg_proc.h lines
in the correct place, apparently on four lines.
Thanks. I think I tried that, but was still having trouble. But it turns
out that the trouble was for an unrelated reason, and I got it to compile
now.
Some of the FDWs need a patch as well in order to compile; see attached.
Cheers,
Jeff
Attachments:
multivariate_contrib.patch (application/octet-stream)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
new file mode 100644
index 4368897..7b4839b
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
*************** estimate_size(PlannerInfo *root, RelOptI
*** 947,953 ****
baserel->baserestrictinfo,
0,
JOIN_INNER,
! NULL);
nrows = clamp_row_est(nrows);
--- 947,954 ----
baserel->baserestrictinfo,
0,
JOIN_INNER,
! NULL,
! NIL);
nrows = clamp_row_est(nrows);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
new file mode 100644
index 478e124..ff6b438
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
*************** postgresGetForeignRelSize(PlannerInfo *r
*** 478,484 ****
fpinfo->local_conds,
baserel->relid,
JOIN_INNER,
! NULL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
--- 478,485 ----
fpinfo->local_conds,
baserel->relid,
JOIN_INNER,
! NULL,
! NIL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
*************** estimate_path_cost_size(PlannerInfo *roo
*** 1770,1776 ****
local_join_conds,
baserel->relid,
JOIN_INNER,
! NULL);
local_sel *= fpinfo->local_conds_sel;
rows = clamp_row_est(rows * local_sel);
--- 1771,1778 ----
local_join_conds,
baserel->relid,
JOIN_INNER,
! NULL,
! NIL);
local_sel *= fpinfo->local_conds_sel;
rows = clamp_row_est(rows * local_sel);
Hi,
On 04/28/15 19:36, Jeff Janes wrote:
...
Thanks. I think I tried that, but was still having trouble. But it
turns out that the trouble was for an unrelated reason, and I got it
to compile now.
Yeah, a new column was added to pg_proc the day after I submitted the
pacth. Will address that in a new version, hopefully in a few days.
Some of the FDWs need a patch as well in order to compile; see
attached.
Thanks, I forgot to tweak the clauselist_selectivity() calls in contrib :-(
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attached is v6 of the multivariate stats, with a number of improvements:
1) fix of the contrib compile-time errors (reported by Jeff)
2) fix of pg_proc issues (reported by Jeff)
3) rebase to current master
4) fix a bunch of issues in the previous patches, due to referencing
some parts too early (e.g. histograms in the first patch, etc.)
5) remove the explicit DELETEs from pg_mv_statistic (in the regression
tests), this is now handled automatically by DROP TABLE etc.
6) a number of performance optimizations in selectivity estimation:
(a) minimize calls to get_oprrest, significantly reducing
syscache calls
(b) significant reduction of palloc overhead in deserialization of
MCV lists and histograms
(c) use more compact serialized representation of MCV lists and
histograms, often reducing the size by ~50%
(d) use histograms with limited deserialization, which also allows
caching function calls
(e) modified histogram bucket partitioning, resulting in more even
bucket distribution (i.e. producing buckets with more equal
density and about equal size of each dimension)
7) add functions for listing MCV list items and histogram buckets:
- pg_mv_mcvlist_items(oid)
- pg_mv_histogram_buckets(oid, type)
This is quite useful when analyzing the MCV lists / histograms (see
the usage sketch after this list).
8) improved support for OR clauses
9) allow calling pull_varnos() on expression trees containing
RestrictInfo nodes (not sure if this is the right fix, it's being
discussed in another thread)
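A quick usage sketch for item 7 (the OID is that of the pg_mv_statistic
row for the statistics in question; <type> stands for the second
argument of pg_mv_histogram_buckets, and the placeholders need to be
filled in):

SELECT oid, stakeys FROM pg_mv_statistic
 WHERE starelid = 'test'::regclass;

SELECT * FROM pg_mv_mcvlist_items(<oid>);
SELECT * FROM pg_mv_histogram_buckets(<oid>, <type>);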
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-shared-infrastructure-and-functional-dependencies.patch (text/x-patch)
From 62e862b0debfdb44976388a577798179eb7a0727 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 19:51:48 +0100
Subject: [PATCH 1/6] shared infrastructure and functional dependencies
Basic infrastructure shared by all kinds of multivariate
stats, most importantly:
- adds a new system catalog (pg_mv_statistic)
- ALTER TABLE ... ADD STATISTICS syntax
- implementation of functional dependencies (the simplest
type of multivariate statistics)
- building functional dependencies in ANALYZE
- updates regression tests (new catalog etc.)
This does not include any changes to the optimizer, i.e.
it does not influence the query planning (subject to
follow-up patches).
The current implementation requires a valid 'ltopr' for
the columns, so that we can sort the sample rows in various
ways, both in this patch and other kinds of statistics.
Maybe this restriction could be relaxed in the future,
requiring just 'eqopr' in case of stats not sorting the
data (e.g. functional dependencies and MCV lists).
The algorithm detecting the dependencies is rather simple
and probably needs improvements.
The name 'functional dependencies' is more correct (than
'association rules') as it's exactly the name used in
relational theory (esp. Normal Forms) for tracking
column-level dependencies.
The multivariate statistics are automatically removed in
two situations
(a) after a DROP TABLE (obviously)
(b) after ALTER TABLE ... DROP COLUMN, if the statistics
would be defined on less than 2 columns (remaining)
If there are at least 2 columns remaining, we keep
the statistics but perform cleanup on the next ANALYZE.
The dropped columns are removed from stakeys, and the new
statistics is built on the smaller set.
We can't do this at DROP COLUMN, because that'd leave us
with invalid statistics, or we'd have to throw away
statistics we can still use. This lazy approach lets us
keep using the statistics even though some of the columns
are dead.
Dropping the statistics is done using DROP STATISTICS
ALTER TABLE ... DROP STATISTICS ALL;
ALTER TABLE ... DROP STATISTICS (opts) ON (cols);
The bad consequence of this is that 'statistics' becomes
a reserved keyword (it was unreserved before), as it
otherwise conflicts with DROP <columnname> in the grammar.
Not sure if there's a workaround for this.
This also adds a simple list of statistics to \d in psql.
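
For illustration, the commands added by this patch then look like this
(the table and column names are made up):

ALTER TABLE addresses ADD STATISTICS (dependencies true) ON (zip, city);
ANALYZE addresses;

ALTER TABLE addresses DROP STATISTICS (dependencies true) ON (zip, city);
ALTER TABLE addresses DROP STATISTICS ALL;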
---
src/backend/catalog/Makefile | 1 +
src/backend/catalog/heap.c | 102 +++++
src/backend/catalog/system_views.sql | 10 +
src/backend/commands/analyze.c | 20 +-
src/backend/commands/tablecmds.c | 346 +++++++++++++++-
src/backend/nodes/copyfuncs.c | 14 +
src/backend/nodes/outfuncs.c | 18 +
src/backend/optimizer/util/plancat.c | 63 +++
src/backend/parser/gram.y | 84 +++-
src/backend/utils/Makefile | 2 +-
src/backend/utils/cache/relcache.c | 59 +++
src/backend/utils/cache/syscache.c | 12 +
src/backend/utils/mvstats/Makefile | 17 +
src/backend/utils/mvstats/common.c | 356 ++++++++++++++++
src/backend/utils/mvstats/common.h | 75 ++++
src/backend/utils/mvstats/dependencies.c | 638 +++++++++++++++++++++++++++++
src/bin/psql/describe.c | 40 ++
src/include/catalog/heap.h | 1 +
src/include/catalog/indexing.h | 5 +
src/include/catalog/pg_mv_statistic.h | 69 ++++
src/include/catalog/pg_proc.h | 5 +
src/include/catalog/toasting.h | 1 +
src/include/nodes/nodes.h | 2 +
src/include/nodes/parsenodes.h | 12 +-
src/include/nodes/relation.h | 28 ++
src/include/parser/kwlist.h | 2 +-
src/include/utils/mvstats.h | 69 ++++
src/include/utils/rel.h | 4 +
src/include/utils/relcache.h | 1 +
src/include/utils/syscache.h | 1 +
src/test/regress/expected/rules.out | 8 +
src/test/regress/expected/sanity_check.out | 1 +
32 files changed, 2057 insertions(+), 9 deletions(-)
create mode 100644 src/backend/utils/mvstats/Makefile
create mode 100644 src/backend/utils/mvstats/common.c
create mode 100644 src/backend/utils/mvstats/common.h
create mode 100644 src/backend/utils/mvstats/dependencies.c
create mode 100644 src/include/catalog/pg_mv_statistic.h
create mode 100644 src/include/utils/mvstats.h
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 37d05d1..8476489 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -32,6 +32,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
+ pg_mv_statistic.h \
pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index d04e94d..1c28ca3 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -46,6 +46,7 @@
#include "catalog/pg_constraint.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_tablespace.h"
@@ -1611,7 +1612,10 @@ RemoveAttributeById(Oid relid, AttrNumber attnum)
heap_close(attr_rel, RowExclusiveLock);
if (attnum > 0)
+ {
RemoveStatistics(relid, attnum);
+ RemoveMVStatistics(relid, attnum);
+ }
relation_close(rel, NoLock);
}
@@ -1839,6 +1843,11 @@ heap_drop_with_catalog(Oid relid)
RemoveStatistics(relid, 0);
/*
+ * delete multi-variate statistics
+ */
+ RemoveMVStatistics(relid, 0);
+
+ /*
* delete attribute tuples
*/
DeleteAttributeTuples(relid);
@@ -2694,6 +2703,99 @@ RemoveStatistics(Oid relid, AttrNumber attnum)
/*
+ * RemoveMVStatistics --- remove entries in pg_mv_statistic for a rel
+ *
+ * If attnum is zero, remove all entries for rel; else remove only the one(s)
+ * for that column.
+ */
+void
+RemoveMVStatistics(Oid relid, AttrNumber attnum)
+{
+ Relation pgmvstatistic;
+ TupleDesc tupdesc = NULL;
+ SysScanDesc scan;
+ ScanKeyData key;
+ HeapTuple tuple;
+
+ /*
+ * When dropping a column, we'll drop statistics with a single
+ * remaining (undropped column). To do that, we need the tuple
+ * descriptor.
+ *
+ * We already have the relation locked (as we're running ALTER
+ * TABLE ... DROP COLUMN), so we'll just get the descriptor here.
+ */
+ if (attnum != 0)
+ {
+ Relation rel = relation_open(relid, NoLock);
+
+ /* multivariate stats are supported on tables and matviews */
+ if (rel->rd_rel->relkind == RELKIND_RELATION ||
+ rel->rd_rel->relkind == RELKIND_MATVIEW)
+ tupdesc = RelationGetDescr(rel);
+
+ relation_close(rel, NoLock);
+ }
+
+ if (tupdesc == NULL)
+ return;
+
+ pgmvstatistic = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ ScanKeyInit(&key,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ scan = systable_beginscan(pgmvstatistic,
+ MvStatisticRelidIndexId,
+ true, NULL, 1, &key);
+
+ /* we must loop even when attnum != 0, in case of inherited stats */
+ while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+ {
+ bool delete = true;
+
+ if (attnum != 0)
+ {
+ Datum adatum;
+ bool isnull;
+ int i;
+ int ncolumns = 0;
+ ArrayType *arr;
+ int16 *attnums;
+
+ /* get the columns */
+ adatum = SysCacheGetAttr(MVSTATOID, tuple,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+ attnums = (int16*)ARR_DATA_PTR(arr);
+
+ for (i = 0; i < ARR_DIMS(arr)[0]; i++)
+ {
+ /* count the column unless it has been / is being dropped */
+ if ((! tupdesc->attrs[attnums[i]-1]->attisdropped) &&
+ (attnums[i] != attnum))
+ ncolumns += 1;
+ }
+
+ /* delete if there are less than two attributes */
+ delete = (ncolumns < 2);
+ }
+
+ if (delete)
+ simple_heap_delete(pgmvstatistic, &tuple->t_self);
+ }
+
+ systable_endscan(scan);
+
+ heap_close(pgmvstatistic, RowExclusiveLock);
+}
+
+
+/*
* RelationTruncateIndexes - truncate all indexes associated
* with the heap relation to zero tuples.
*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2ad01f4..07586c6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -150,6 +150,16 @@ CREATE VIEW pg_indexes AS
LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
+CREATE VIEW pg_mv_stats AS
+ SELECT
+ N.nspname AS schemaname,
+ C.relname AS tablename,
+ S.stakeys AS attnums,
+ length(S.stadeps) as depsbytes,
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
+ LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
+
CREATE VIEW pg_stats AS
SELECT
nspname AS schemaname,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 15ec0ad..fff27e0 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -27,6 +27,7 @@
#include "catalog/indexing.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "commands/dbcommands.h"
#include "commands/tablecmds.h"
@@ -54,7 +55,11 @@
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "utils/mvstats.h"
+#include "access/sysattr.h"
/* Data structure for Algorithm S from Knuth 3.4.2 */
typedef struct
@@ -111,7 +116,6 @@ static void update_attstats(Oid relid, bool inh,
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
-
/*
* analyze_rel() -- analyze one relation
*/
@@ -474,6 +478,17 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
* all analyzable columns. We use a lower bound of 100 rows to avoid
* possible overflow in Vitter's algorithm. (Note: that will also be the
* target in the corner case where there are no analyzable columns.)
+ *
+ * FIXME This sample sizing is mostly OK when computing stats for
+ * individual columns, but when computing multivariate stats
+ * (histograms, MCV lists, ...) it's rather insufficient.
+ * For stats on multiple columns / complex stats
+ * we need larger sample sizes, and in some cases samples
+ * proportional to the table (say, 0.5% - 1%) instead of a
+ * fixed size might be more appropriate. Also, this should be
+ * bound to the requested statistics size - e.g. number of MCV
+ * items or histogram buckets should require several sample
+ * rows per item/bucket (so the sample should be k*size).
*/
targrows = 100;
for (i = 0; i < attr_cnt; i++)
@@ -576,6 +591,9 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
update_attstats(RelationGetRelid(Irel[ind]), false,
thisdata->attr_cnt, thisdata->vacattrstats);
}
+
+ /* Build multivariate stats (if there are any). */
+ build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
}
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 299d8cc..5c57146 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -35,6 +35,7 @@
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_tablespace.h"
@@ -92,7 +93,7 @@
#include "utils/syscache.h"
#include "utils/tqual.h"
#include "utils/typcache.h"
-
+#include "utils/mvstats.h"
/*
* ON COMMIT action list
@@ -140,8 +141,9 @@ static List *on_commits = NIL;
#define AT_PASS_ADD_COL 5 /* ADD COLUMN */
#define AT_PASS_ADD_INDEX 6 /* ADD indexes */
#define AT_PASS_ADD_CONSTR 7 /* ADD constraints, defaults */
-#define AT_PASS_MISC 8 /* other stuff */
-#define AT_NUM_PASSES 9
+#define AT_PASS_ADD_STATS 8 /* ADD statistics */
+#define AT_PASS_MISC 9 /* other stuff */
+#define AT_NUM_PASSES 10
typedef struct AlteredTableInfo
{
@@ -416,6 +418,10 @@ static void ATExecReplicaIdentity(Relation rel, ReplicaIdentityStmt *stmt, LOCKM
static void ATExecGenericOptions(Relation rel, List *options);
static void ATExecEnableRowSecurity(Relation rel);
static void ATExecDisableRowSecurity(Relation rel);
+static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
+ StatisticsDef *def, LOCKMODE lockmode);
+static void ATExecDropStatistics(AlteredTableInfo *tab, Relation rel,
+ StatisticsDef *def, LOCKMODE lockmode);
static void copy_relation_data(SMgrRelation rel, SMgrRelation dst,
ForkNumber forkNum, char relpersistence);
@@ -3011,6 +3017,8 @@ AlterTableGetLockLevel(List *cmds)
* updates.
*/
case AT_SetStatistics: /* Uses MVCC in getTableAttrs() */
+ case AT_AddStatistics: /* XXX not sure if the right level */
+ case AT_DropStatistics: /* XXX not sure if the right level */
case AT_ClusterOn: /* Uses MVCC in getIndexes() */
case AT_DropCluster: /* Uses MVCC in getIndexes() */
case AT_SetOptions: /* Uses MVCC in getTableAttrs() */
@@ -3167,6 +3175,8 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
pass = AT_PASS_ADD_CONSTR;
break;
case AT_SetStatistics: /* ALTER COLUMN SET STATISTICS */
+ case AT_AddStatistics: /* XXX maybe not the right place */
+ case AT_DropStatistics: /* XXX maybe not the right place */
ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode);
/* Performs own permission checks */
ATPrepSetStatistics(rel, cmd->name, cmd->def, lockmode);
@@ -3469,6 +3479,12 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
case AT_SetStatistics: /* ALTER COLUMN SET STATISTICS */
address = ATExecSetStatistics(rel, cmd->name, cmd->def, lockmode);
break;
+ case AT_AddStatistics: /* ADD STATISTICS */
+ ATExecAddStatistics(tab, rel, (StatisticsDef *) cmd->def, lockmode);
+ break;
+ case AT_DropStatistics: /* DROP STATISTICS */
+ ATExecDropStatistics(tab, rel, (StatisticsDef *) cmd->def, lockmode);
+ break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
address = ATExecSetOptions(rel, cmd->name, cmd->def, false, lockmode);
break;
@@ -11860,3 +11876,327 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,
ReleaseSysCache(tuple);
}
+
+/* used for sorting the attnums in ATExecAddStatistics */
+static int compare_int16(const void *a, const void *b)
+{
+ return memcmp(a, b, sizeof(int16));
+}
+
+/*
+ * Implements the ALTER TABLE ... ADD STATISTICS (options) ON (columns).
+ *
+ * The code is an unholy mix of pieces that really belong to other parts
+ * of the source tree.
+ *
+ * FIXME Check that the types are pass-by-value and support sort,
+ * although maybe we can live without the sort (and only build
+ * MCV list / association rules).
+ *
+ * FIXME This should probably check for duplicate stats (i.e. same
+ * keys, same options). Although maybe it's useful to have
+ * multiple stats on the same columns with different options
+ * (say, a detailed MCV-only stats for some queries, histogram
+ * for others, etc.)
+ */
+static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
+ StatisticsDef *def, LOCKMODE lockmode)
+{
+ int i, j;
+ ListCell *l;
+ int16 attnums[INDEX_MAX_KEYS];
+ int numcols = 0;
+
+ HeapTuple htup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ int2vector *stakeys;
+ Relation mvstatrel;
+
+ /* by default build everything */
+ bool build_dependencies = true;
+
+ Assert(IsA(def, StatisticsDef));
+
+ /* transform the column names to attnum values */
+
+ foreach(l, def->keys)
+ {
+ char *attname = strVal(lfirst(l));
+ HeapTuple atttuple;
+
+ atttuple = SearchSysCacheAttName(RelationGetRelid(rel), attname);
+
+ if (!HeapTupleIsValid(atttuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" referenced in statistics does not exist",
+ attname)));
+
+ /* more than MVSTATS_MAX_DIMENSIONS columns not allowed */
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("cannot have more than %d keys in a statistics",
+ MVSTATS_MAX_DIMENSIONS)));
+
+ attnums[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->attnum;
+ ReleaseSysCache(atttuple);
+ numcols++;
+ }
+
+ /*
+ * Check the lower bound (at least 2 columns), the upper bound was
+ * already checked in the loop.
+ */
+ if (numcols < 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("multivariate stats require 2 or more columns")));
+
+ /* look for duplicate columns */
+ for (i = 0; i < numcols; i++)
+ for (j = 0; j < numcols; j++)
+ if ((i != j) && (attnums[i] == attnums[j]))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("duplicate column name in statistics definition")));
+
+ /* parse the statistics options */
+ foreach (l, def->options)
+ {
+ DefElem *opt = (DefElem*)lfirst(l);
+
+ if (strcmp(opt->defname, "dependencies") == 0)
+ build_dependencies = defGetBoolean(opt);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized STATISTICS option \"%s\"",
+ opt->defname)));
+ }
+
+ /* sort the attnums and build int2vector */
+ qsort(attnums, numcols, sizeof(int16), compare_int16);
+ stakeys = buildint2vector(attnums, numcols);
+
+ /*
+ * Okay, let's create the pg_mv_statistic entry.
+ */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ /* no stats collected yet, so just the keys */
+ values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
+
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+ values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+
+ nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* insert the tuple into pg_mv_statistic */
+ mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ htup = heap_form_tuple(mvstatrel->rd_att, values, nulls);
+
+ simple_heap_insert(mvstatrel, htup);
+
+ CatalogUpdateIndexes(mvstatrel, htup);
+
+ heap_freetuple(htup);
+
+ heap_close(mvstatrel, RowExclusiveLock);
+
+ /*
+ * Invalidate relcache so that others see the new statistics.
+ */
+ CacheInvalidateRelcache(rel);
+
+ return;
+}
+
+/*
+ * Implements the ALTER TABLE ... DROP STATISTICS in two forms:
+ *
+ * ALTER TABLE ... DROP STATISTICS (options) ON (columns)
+ * ALTER TABLE ... DROP STATISTICS ALL;
+ *
+ * The first one requires an exact match, the second one just drops
+ * all the statistics on a table.
+ */
+static void ATExecDropStatistics(AlteredTableInfo *tab, Relation rel,
+ StatisticsDef *def, LOCKMODE lockmode)
+{
+ Relation statrel;
+ SysScanDesc scan;
+ ScanKeyData key;
+ HeapTuple tuple;
+
+ ListCell *l;
+
+ int16 attnums[INDEX_MAX_KEYS];
+ int numcols = 0;
+
+ /* check whether the statistics match / should be dropped */
+ bool build_dependencies = false;
+ bool check_dependencies = false;
+
+ if (def != NULL)
+ {
+ Assert(IsA(def, StatisticsDef));
+
+ /* collect attribute numbers */
+ foreach(l, def->keys)
+ {
+ char *attname = strVal(lfirst(l));
+ HeapTuple atttuple;
+
+ atttuple = SearchSysCacheAttName(RelationGetRelid(rel), attname);
+
+ if (!HeapTupleIsValid(atttuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" referenced in statistics does not exist",
+ attname)));
+
+ /* more than MVSTATS_MAX_DIMENSIONS columns not allowed */
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("cannot have more than %d keys in a statistics",
+ MVSTATS_MAX_DIMENSIONS)));
+
+ attnums[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->attnum;
+ ReleaseSysCache(atttuple);
+ numcols++;
+ }
+
+ /* parse the statistics options */
+ foreach (l, def->options)
+ {
+ DefElem *opt = (DefElem*)lfirst(l);
+
+ if (strcmp(opt->defname, "dependencies") == 0)
+ {
+ check_dependencies = true;
+ build_dependencies = defGetBoolean(opt);
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized STATISTICS option \"%s\"",
+ opt->defname)));
+ }
+
+ }
+
+ statrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ ScanKeyInit(&key,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(RelationGetRelid(rel)));
+
+ scan = systable_beginscan(statrel,
+ MvStatisticRelidIndexId,
+ true, NULL, 1, &key);
+
+ /* loop over all statistics defined on the relation */
+ while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+ {
+ /* by default we delete everything */
+ bool delete = true;
+
+ /* check that the options match (dependencies, mcv, histogram) */
+ if (delete && check_dependencies)
+ {
+ bool isnull;
+ Datum adatum = heap_getattr(tuple,
+ Anum_pg_mv_statistic_deps_enabled,
+ RelationGetDescr(statrel),
+ &isnull);
+
+ delete = (! isnull) &&
+ (DatumGetBool(adatum) == build_dependencies);
+ }
+
+ /* check that the columns match the statistics definition */
+ if (delete && (numcols > 0))
+ {
+ int i, j;
+ ArrayType *arr;
+ bool isnull;
+
+ int16 *stakeys;
+ int nstakeys;
+
+ Datum adatum = SysCacheGetAttr(MVSTATOID, tuple,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ nstakeys = ARR_DIMS(arr)[0];
+ stakeys = (int16 *) ARR_DATA_PTR(arr);
+
+ /* assume match */
+ delete = true;
+
+ /* check that for each column we find a match in stakeys */
+ for (i = 0; i < numcols; i++)
+ {
+ bool found = false;
+ for (j = 0; j < nstakeys; j++)
+ {
+ if (attnums[i] == stakeys[j])
+ {
+ found = true;
+ break;
+ }
+ }
+
+ if (! found)
+ {
+ delete = false;
+ break;
+ }
+ }
+
+ /* check that for each stakeys we find a match in columns */
+ for (j = 0; j < nstakeys; j++)
+ {
+ bool found = false;
+
+ for (i = 0; i < numcols; i++)
+ {
+ if (attnums[i] == stakeys[j])
+ {
+ found = true;
+ break;
+ }
+ }
+
+ if (! found)
+ {
+ delete = false;
+ break;
+ }
+ }
+ }
+
+ /* don't delete, if we've found mismatches */
+ if (delete)
+ simple_heap_delete(statrel, &tuple->t_self);
+ }
+
+ systable_endscan(scan);
+
+ heap_close(statrel, RowExclusiveLock);
+
+ /*
+ * Invalidate relcache so that others forget the dropped statistics.
+ */
+ CacheInvalidateRelcache(rel);
+
+ return;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 805045d..ddc88a3 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3938,6 +3938,17 @@ _copyAlterPolicyStmt(const AlterPolicyStmt *from)
return newnode;
}
+static StatisticsDef *
+_copyStatisticsDef(const StatisticsDef *from)
+{
+ StatisticsDef *newnode = makeNode(StatisticsDef);
+
+ COPY_NODE_FIELD(keys);
+ COPY_NODE_FIELD(options);
+
+ return newnode;
+}
+
/* ****************************************************************
* pg_list.h copy functions
* ****************************************************************
@@ -4755,6 +4766,9 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_StatisticsDef:
+ retval = _copyStatisticsDef(from);
+ break;
case T_FuncWithArgs:
retval = _copyFuncWithArgs(from);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f9f948e..c2d5dc5 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1842,6 +1842,21 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
}
static void
+_outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
+{
+ WRITE_NODE_TYPE("MVSTATISTICINFO");
+
+ /* NB: this isn't a complete set of fields */
+ WRITE_OID_FIELD(mvoid);
+
+ /* enabled statistics */
+ WRITE_BOOL_FIELD(deps_enabled);
+
+ /* built/available statistics */
+ WRITE_BOOL_FIELD(deps_built);
+}
+
+static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
/*
@@ -3220,6 +3235,9 @@ _outNode(StringInfo str, const void *obj)
case T_PlannerParamItem:
_outPlannerParamItem(str, obj);
break;
+ case T_MVStatisticInfo:
+ _outMVStatisticInfo(str, obj);
+ break;
case T_CreateStmt:
_outCreateStmt(str, obj);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 068ab39..1cf64f8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/catalog.h"
#include "catalog/heap.h"
+#include "catalog/pg_mv_statistic.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -38,7 +39,9 @@
#include "parser/parsetree.h"
#include "rewrite/rewriteManip.h"
#include "storage/bufmgr.h"
+#include "utils/builtins.h"
#include "utils/lsyscache.h"
+#include "utils/syscache.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
@@ -89,6 +92,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
Relation relation;
bool hasindex;
List *indexinfos = NIL;
+ List *stainfos = NIL;
/*
* We need not lock the relation since it was already locked, either by
@@ -377,6 +381,65 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
rel->indexlist = indexinfos;
+ if (true)
+ {
+ List *mvstatoidlist;
+ ListCell *l;
+
+ mvstatoidlist = RelationGetMVStatList(relation);
+
+ foreach(l, mvstatoidlist)
+ {
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ Oid mvoid = lfirst_oid(l);
+ Form_pg_mv_statistic mvstat;
+ MVStatisticInfo *info;
+
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (! HeapTupleIsValid(htup))
+ continue;
+
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ /* unavailable stats are not interesting for the planner */
+ if (mvstat->deps_built)
+ {
+ info = makeNode(MVStatisticInfo);
+
+ info->mvoid = mvoid;
+ info->rel = rel;
+
+ /* enabled statistics */
+ info->deps_enabled = mvstat->deps_enabled;
+
+ /* built/available statistics */
+ info->deps_built = mvstat->deps_built;
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ info->stakeys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+
+ stainfos = lcons(info, stainfos);
+ }
+
+ ReleaseSysCache(htup);
+ }
+
+ list_free(mvstatoidlist);
+ }
+
+ rel->mvstatlist = stainfos;
+
/* Grab the fdwroutine info using the relcache, while we have it */
if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
{
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0180530..dbeb3c8 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -369,6 +369,13 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
relation_expr_list dostmt_opt_list
transform_element_list transform_type_list
+%type <list> OptStatsOptions
+%type <str> stats_options_name
+%type <node> stats_options_arg
+%type <defelt> stats_options_elem
+%type <list> stats_options_list
+
+
%type <list> opt_fdw_options fdw_options
%type <defelt> fdw_option
@@ -488,7 +495,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <keyword> unreserved_keyword type_func_name_keyword
%type <keyword> col_name_keyword reserved_keyword
-%type <node> TableConstraint TableLikeClause
+%type <node> TableConstraint TableLikeClause TableStatistics
%type <ival> TableLikeOptionList TableLikeOption
%type <list> ColQualList
%type <node> ColConstraint ColConstraintElem ConstraintAttr
@@ -2315,6 +2322,29 @@ alter_table_cmd:
n->subtype = AT_DisableRowSecurity;
$$ = (Node *)n;
}
+ /* ALTER TABLE <name> ADD STATISTICS (options) ON (columns) */
+ | ADD_P TableStatistics
+ {
+ AlterTableCmd *n = makeNode(AlterTableCmd);
+ n->subtype = AT_AddStatistics;
+ n->def = $2;
+ $$ = (Node *)n;
+ }
+ /* ALTER TABLE <name> DROP STATISTICS (options) ON (columns) */
+ | DROP TableStatistics
+ {
+ AlterTableCmd *n = makeNode(AlterTableCmd);
+ n->subtype = AT_DropStatistics;
+ n->def = $2;
+ $$ = (Node *)n;
+ }
+ /* ALTER TABLE <name> DROP STATISTICS ALL */
+ | DROP STATISTICS ALL
+ {
+ AlterTableCmd *n = makeNode(AlterTableCmd);
+ n->subtype = AT_DropStatistics;
+ $$ = (Node *)n;
+ }
| alter_generic_options
{
AlterTableCmd *n = makeNode(AlterTableCmd);
@@ -3389,6 +3419,56 @@ OptConsTableSpace: USING INDEX TABLESPACE name { $$ = $4; }
ExistingIndex: USING INDEX index_name { $$ = $3; }
;
+/*****************************************************************************
+ *
+ * QUERY :
+ * ALTER TABLE relname ADD STATISTICS (columns) WITH (options)
+ *
+ *****************************************************************************/
+
+TableStatistics:
+ STATISTICS OptStatsOptions ON '(' columnList ')'
+ {
+ StatisticsDef *n = makeNode(StatisticsDef);
+ n->keys = $5;
+ n->options = $2;
+ $$ = (Node *) n;
+ }
+ ;
+
+OptStatsOptions:
+ '(' stats_options_list ')' { $$ = $2; }
+ | /*EMPTY*/ { $$ = NIL; }
+ ;
+
+stats_options_list:
+ stats_options_elem
+ {
+ $$ = list_make1($1);
+ }
+ | stats_options_list ',' stats_options_elem
+ {
+ $$ = lappend($1, $3);
+ }
+ ;
+
+stats_options_elem:
+ stats_options_name stats_options_arg
+ {
+ $$ = makeDefElem($1, $2);
+ }
+ ;
+
+stats_options_name:
+ NonReservedWord { $$ = $1; }
+ ;
+
+stats_options_arg:
+ opt_boolean_or_string { $$ = (Node *) makeString($1); }
+ | NumericOnly { $$ = (Node *) $1; }
+ | /* EMPTY */ { $$ = NULL; }
+ ;
+
/*****************************************************************************
*
@@ -13547,7 +13627,6 @@ unreserved_keyword:
| STANDALONE_P
| START
| STATEMENT
- | STATISTICS
| STDIN
| STDOUT
| STORAGE
@@ -13762,6 +13841,7 @@ reserved_keyword:
| SELECT
| SESSION_USER
| SOME
+ | STATISTICS
| SYMMETRIC
| TABLE
| THEN
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..eba0352 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,7 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr mvstats resowner sort time
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index e745006..855ff05 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -47,6 +47,7 @@
#include "catalog/pg_auth_members.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_database.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_proc.h"
@@ -3906,6 +3907,62 @@ RelationGetIndexList(Relation relation)
return result;
}
+
+List *
+RelationGetMVStatList(Relation relation)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result;
+ List *oldlist;
+ MemoryContext oldcxt;
+
+ /* Quick exit if we already computed the list. */
+ if (relation->rd_mvstatvalid != 0)
+ return list_copy(relation->rd_mvstatlist);
+
+ /*
+ * We build the list we intend to return (in the caller's context) while
+ * doing the scan. After successfully completing the scan, we copy that
+ * list into the relcache entry. This avoids cache-context memory leakage
+ * if we get some sort of error partway through.
+ */
+ result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(RelationGetRelid(relation)));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ /* TODO maybe include only already built statistics? */
+ result = insert_ordered_oid(result, HeapTupleGetOid(htup));
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* Now save a copy of the completed list in the relcache entry. */
+ oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+ oldlist = relation->rd_mvstatlist;
+ relation->rd_mvstatlist = list_copy(result);
+
+ relation->rd_mvstatvalid = true;
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Don't leak the old list, if there is one */
+ list_free(oldlist);
+
+ return result;
+}
+
/*
* insert_ordered_oid
* Insert a new Oid into a sorted list of Oids, preserving ordering
@@ -4875,6 +4932,8 @@ load_relcache_init_file(bool shared)
rel->rd_indexattr = NULL;
rel->rd_keyattr = NULL;
rel->rd_idattr = NULL;
+ rel->rd_mvstatvalid = false;
+ rel->rd_mvstatlist = NIL;
rel->rd_createSubid = InvalidSubTransactionId;
rel->rd_newRelfilenodeSubid = InvalidSubTransactionId;
rel->rd_amcache = NULL;
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index f58e1ce..9aaf68f 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -43,6 +43,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_language.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -501,6 +502,17 @@ static const struct cachedesc cacheinfo[] = {
},
4
},
+ {MvStatisticRelationId, /* MVSTATOID */
+ MvStatisticOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 128
+ },
{NamespaceRelationId, /* NAMESPACENAME */
NamespaceNameIndexId,
1,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
new file mode 100644
index 0000000..099f1ed
--- /dev/null
+++ b/src/backend/utils/mvstats/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/mvstats
+#
+# IDENTIFICATION
+# src/backend/utils/mvstats/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/mvstats
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = common.o dependencies.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
new file mode 100644
index 0000000..a755c49
--- /dev/null
+++ b/src/backend/utils/mvstats/common.c
@@ -0,0 +1,356 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.c
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+static List* list_mv_stats(Oid relid);
+
+
+/*
+ * Compute requested multivariate stats, using the rows sampled for the
+ * plain (single-column) stats.
+ *
+ * This fetches a list of stats from pg_mv_statistic, computes the stats
+ * and serializes them back into the catalog (as bytea values).
+ */
+void
+build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats)
+{
+ ListCell *lc;
+ List *mvstats;
+
+ TupleDesc tupdesc = RelationGetDescr(onerel);
+
+ /*
+ * Fetch defined MV groups from pg_mv_statistic, and then compute
+ * the MV statistics (functional dependencies for now).
+ */
+ mvstats = list_mv_stats(RelationGetRelid(onerel));
+
+ foreach (lc, mvstats)
+ {
+ int j;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
+ MVDependencies deps = NULL;
+
+ VacAttrStats **stats = NULL;
+ int numatts = 0;
+
+ /* int2 vector of attnums the stats should be computed on */
+ int2vector * attrs = stat->stakeys;
+
+ /* see how many of the columns are not dropped */
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ numatts += 1;
+
+ /* if there are dropped attributes, build a filtered int2vector */
+ if (numatts != attrs->dim1)
+ {
+ int16 *tmp = palloc0(numatts * sizeof(int16));
+ int attnum = 0;
+
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ tmp[attnum++] = attrs->values[j];
+
+ pfree(attrs);
+ attrs = buildint2vector(tmp, numatts);
+ }
+
+ /* filter only the interesting vacattrstats records */
+ stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /* check allowed number of dimensions */
+ Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Analyze functional dependencies of columns.
+ */
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
+
+ /* store the functional dependencies in the catalog */
+ update_mv_stats(stat->mvoid, deps, attrs);
+ }
+}
+
+/*
+ * Lookup the VacAttrStats info for the selected columns, with indexes
+ * matching the attrs vector (to make it easy to work with when
+ * computing multivariate stats).
+ */
+static VacAttrStats **
+lookup_var_attr_stats(int2vector *attrs, int natts, VacAttrStats **vacattrstats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ VacAttrStats **stats = (VacAttrStats**)palloc0(numattrs * sizeof(VacAttrStats*));
+
+ /* lookup VacAttrStats info for the requested columns (same attnum) */
+ for (i = 0; i < numattrs; i++)
+ {
+ stats[i] = NULL;
+ for (j = 0; j < natts; j++)
+ {
+ if (attrs->values[i] == vacattrstats[j]->tupattnum)
+ {
+ stats[i] = vacattrstats[j];
+ break;
+ }
+ }
+
+ /*
+ * Check that we found the info, that the attnum matches, that
+ * the requested 'lt' operator is available, and that the type
+ * is 'passed-by-value'.
+ */
+ Assert(stats[i] != NULL);
+ Assert(stats[i]->tupattnum == attrs->values[i]);
+
+ /* FIXME This is a rather ugly way to check for 'ltopr' (which
+ * is defined for 'scalar' attributes).
+ */
+ Assert(((StdAnalyzeData *)stats[i]->extra_data)->ltopr != InvalidOid);
+ }
+
+ return stats;
+}
+
+/*
+ * Fetch list of MV stats defined on a table, without the actual data
+ * for histograms, MCV lists etc.
+ */
+static List*
+list_mv_stats(Oid relid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ Form_pg_mv_statistic stats = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ info->mvoid = HeapTupleGetOid(htup);
+ info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_built = stats->deps_built;
+
+ result = lappend(result, info);
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which served as the inspiration for this one)? */
+
+ return result;
+}
+
+void
+update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+{
+ HeapTuple stup,
+ oldtup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ bool replaces[Natts_pg_mv_statistic];
+
+ Relation sd = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ memset(nulls, 1, Natts_pg_mv_statistic * sizeof(bool));
+ memset(replaces, 0, Natts_pg_mv_statistic * sizeof(bool));
+ memset(values, 0, Natts_pg_mv_statistic * sizeof(Datum));
+
+ /*
+ * Construct a new pg_mv_statistic tuple - replace only the
+ * dependencies value, depending on whether it actually was computed.
+ */
+ if (dependencies != NULL)
+ {
+ nulls[Anum_pg_mv_statistic_stadeps -1] = false;
+ values[Anum_pg_mv_statistic_stadeps - 1]
+ = PointerGetDatum(serialize_mv_dependencies(dependencies));
+ }
+
+ /* always replace the value (either by bytea or NULL) */
+ replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* always change the availability flags */
+ nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_stakeys-1] = false;
+
+ /* use the new attnums, in case we removed some dropped ones */
+ replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_stakeys -1] = true;
+
+ values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
+
+ /* Is there already a pg_mv_statistic tuple for this attribute? */
+ oldtup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ if (HeapTupleIsValid(oldtup))
+ {
+ /* Yes, replace it */
+ stup = heap_modify_tuple(oldtup,
+ RelationGetDescr(sd),
+ values,
+ nulls,
+ replaces);
+ ReleaseSysCache(oldtup);
+ simple_heap_update(sd, &stup->t_self, stup);
+ }
+ else
+ elog(ERROR, "invalid pg_mv_statistic record (oid=%d)", mvoid);
+
+ /* update indexes too */
+ CatalogUpdateIndexes(sd, stup);
+
+ heap_freetuple(stup);
+
+ heap_close(sd, RowExclusiveLock);
+}
+
+/* multi-variate stats comparator */
+
+/*
+ * qsort_arg comparator for sorting Datums (MV stats)
+ *
+ * This does not maintain the tupnoLink array.
+ */
+int
+compare_scalars_simple(const void *a, const void *b, void *arg)
+{
+ Datum da = *(Datum*)a;
+ Datum db = *(Datum*)b;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting data when partitioning a MV bucket
+ */
+int
+compare_scalars_partition(const void *a, const void *b, void *arg)
+{
+ Datum da = ((ScalarItem*)a)->value;
+ Datum db = ((ScalarItem*)b)->value;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/* initialize multi-dimensional sort */
+MultiSortSupport
+multi_sort_init(int ndims)
+{
+ MultiSortSupport mss;
+
+ Assert(ndims >= 2);
+
+ mss = (MultiSortSupport)palloc0(offsetof(MultiSortSupportData, ssup)
+ + sizeof(SortSupportData)*ndims);
+
+ mss->ndims = ndims;
+
+ return mss;
+}
+
+/*
+ * add sort info for dimension 'dim' (index into vacattrstats) to mss,
+ * at the position 'sortdim'
+ */
+void
+multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats)
+{
+ /* first, lookup StdAnalyzeData for the dimension (attribute) */
+ SortSupportData ssup;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)vacattrstats[dim]->extra_data;
+
+ Assert(mss != NULL);
+ Assert(sortdim < mss->ndims);
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup);
+
+ mss->ssup[sortdim] = ssup;
+}
+
+/* compare all the dimensions in the selected order */
+int
+multi_sort_compare(const void *a, const void *b, void *arg)
+{
+ int i;
+ SortItem *ia = (SortItem*)a;
+ SortItem *ib = (SortItem*)b;
+
+ MultiSortSupport mss = (MultiSortSupport)arg;
+
+ for (i = 0; i < mss->ndims; i++)
+ {
+ int compare;
+
+ compare = ApplySortComparator(ia->values[i], ia->isnull[i],
+ ib->values[i], ib->isnull[i],
+ &mss->ssup[i]);
+
+ if (compare != 0)
+ return compare;
+
+ }
+
+ /* equal by default */
+ return 0;
+}
+
+/* compare selected dimension */
+int
+multi_sort_compare_dim(int dim, const SortItem *a, const SortItem *b,
+ MultiSortSupport mss)
+{
+ return ApplySortComparator(a->values[dim], a->isnull[dim],
+ b->values[dim], b->isnull[dim],
+ &mss->ssup[dim]);
+}
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
new file mode 100644
index 0000000..6d5465b
--- /dev/null
+++ b/src/backend/utils/mvstats/common.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.h
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/tuptoaster.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_mv_statistic.h"
+#include "foreign/fdwapi.h"
+#include "postmaster/autovacuum.h"
+#include "storage/lmgr.h"
+#include "utils/datum.h"
+#include "utils/sortsupport.h"
+#include "utils/syscache.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "access/sysattr.h"
+
+#include "utils/mvstats.h"
+
+/* FIXME private structure copied from analyze.c */
+
+typedef struct
+{
+ Oid eqopr; /* '=' operator for datatype, if any */
+ Oid eqfunc; /* and associated function */
+ Oid ltopr; /* '<' operator for datatype, if any */
+} StdAnalyzeData;
+
+typedef struct
+{
+ Datum value; /* a data value */
+ int tupno; /* position index for tuple it came from */
+} ScalarItem;
+
+/* multi-sort */
+typedef struct MultiSortSupportData {
+ int ndims; /* number of dimensions supported by the sort */
+ SortSupportData ssup[1]; /* sort support data for each dimension */
+} MultiSortSupportData;
+
+typedef MultiSortSupportData* MultiSortSupport;
+
+typedef struct SortItem {
+ Datum *values;
+ bool *isnull;
+} SortItem;
+
+MultiSortSupport multi_sort_init(int ndims);
+
+void multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats);
+
+int multi_sort_compare(const void *a, const void *b, void *arg);
+
+int multi_sort_compare_dim(int dim, const SortItem *a,
+ const SortItem *b, MultiSortSupport mss);
+
+/* comparators, used when constructing multivariate stats */
+int compare_scalars_simple(const void *a, const void *b, void *arg);
+int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
new file mode 100644
index 0000000..0ca16a0
--- /dev/null
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -0,0 +1,638 @@
+/*-------------------------------------------------------------------------
+ *
+ * dependencies.c
+ * POSTGRES multivariate functional dependencies
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/dependencies.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Mine functional dependencies between columns, in the form (A => B),
+ * meaning that a value in column 'A' determines value in 'B'. A simple
+ * artificial example may be a table created like this
+ *
+ * CREATE TABLE deptest (a INT, b INT)
+ * AS SELECT i, i/10 FROM generate_series(1,100000) s(i);
+ *
+ * Clearly, once we know the value for 'A' we can easily determine the
+ * value of 'B' by dividing (A/10). A more practical example may be
+ * addresses, where (ZIP code => city name), i.e. once we know the ZIP,
+ * we probably know which city it belongs to. Larger cities usually have
+ * multiple ZIP codes, so the dependency can't be reversed.
+ *
+ * Functional dependencies are a concept well described in relational
+ * theory, especially in definition of normalization and "normal forms".
+ * Wikipedia has a nice definition of a functional dependency [1]:
+ *
+ * In a given table, an attribute Y is said to have a functional
+ * dependency on a set of attributes X (written X -> Y) if and only
+ * if each X value is associated with precisely one Y value. For
+ * example, in an "Employee" table that includes the attributes
+ * "Employee ID" and "Employee Date of Birth", the functional
+ * dependency {Employee ID} -> {Employee Date of Birth} would hold.
+ * It follows from the previous two sentences that each {Employee ID}
+ * is associated with precisely one {Employee Date of Birth}.
+ *
+ * [1] http://en.wikipedia.org/wiki/Database_normalization
+ *
+ * Most datasets might be normalized not to contain any such functional
+ * dependencies, but sometimes it's not practical. In some cases it's
+ * actually a conscious choice to model the dataset in a denormalized way,
+ * either because of performance or to make querying easier.
+ *
+ * The current implementation supports only dependencies between two
+ * columns, but this is merely a simplification of the initial patch.
+ * It's certainly useful to mine for dependencies involving multiple
+ * columns on the 'left' side, i.e. a condition for the dependency.
+ * That is, dependencies [A,B] => C and so on.
+ *
+ * TODO The implementation may/should be smart enough not to mine both
+ * [A => B] and [A,C => B], because the second dependency is a
+ * consequence of the first one (if values of A determine values
+ * of B, adding another column won't change that). The ANALYZE
+ * should first analyze 1:1 dependencies, then 2:1 dependencies
+ * (and skip the already identified ones), etc.
+ *
+ * For example the dependency [city name => zip code] is much weaker
+ * than [city name, state name => zip code], because there may be
+ * multiple cities with the same name in various states. It's not
+ * perfect though - there are probably cities with the same name within
+ * the same state, but that is hopefully a relatively rare occurrence.
+ * More about this in the section about dependency mining.
+ *
+ * Handling multiple columns on the right side is not necessary, as such
+ * dependencies may be decomposed into a set of dependencies with
+ * the same meaning, one for each column on the right side. For example
+ *
+ * A => [B,C]
+ *
+ * is exactly the same as
+ *
+ * (A => B) & (A => C).
+ *
+ * Of course, storing (A => [B, C]) may be more efficient than storing
+ * the two dependencies (A => B) and (A => C) separately.
+ *
+ *
+ * Dependency mining (ANALYZE)
+ * ---------------------------
+ *
+ * The current build algorithm is rather simple - for each pair [A,B] of
+ * columns, the data are sorted lexicographically (first by A, then B),
+ * and then a number of metrics are computed by walking the sorted data.
+ *
+ * In general the algorithm counts distinct values of A (forming groups
+ * thanks to the sorting), supporting or contradicting the hypothesis
+ * that A => B (i.e. that values of B are predetermined by A). If there
+ * are multiple values of B for a single value of A, it's counted as
+ * contradicting.
+ *
+ * A group may be neither supporting nor contradicting. To be counted as
+ * supporting, the group has to have at least min_group_size(=3) rows.
+ * Smaller 'supporting' groups are counted as neutral.
+ *
+ * Finally, the number of rows in supporting and contradicting groups is
+ * compared, and if there is at least 10x more supporting rows, the
+ * dependency is considered valid.
+ *
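+ * A hypothetical mini-example: with min_group_size = 3, a sample of
+ * [A,B] pairs sorted as
+ *
+ * (1,10) (1,10) (1,10) (2,20) (2,21) (3,30) (3,30)
+ *
+ * contains a supporting group A=1 (three rows, a single B value), a
+ * contradicting group A=2 (two B values - size does not matter here)
+ * and a neutral group A=3 (consistent, but below min_group_size).
+ * With 3 supporting vs. 2 contradicting rows the 10x threshold is
+ * not met, so no dependency is stored.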
+ *
+ * Real-world datasets are imperfect - there may be errors (e.g. due to
+ * data-entry mistakes), or factually correct records, yet contradicting
+ * the dependency (e.g. when a city splits into two, but both keep the
+ * same ZIP code). A strict ANALYZE implementation (where the functional
+ * dependencies are identified) would ignore dependencies on such noisy
+ * data, making the approach unusable in practice.
+ *
+ * The proposed implementation attempts to handle such noisy cases
+ * gracefully, by tolerating a small number of contradicting cases.
+ *
+ * In the future this might also perform some sort of test and decide
+ * whether it's worth building any other kind of multivariate stats,
+ * or whether the dependencies sufficiently describe the data. Or at
+ * least not build the MCV list / histogram on the implied columns.
+ * Such reduction would however make the 'verification' (see the next
+ * section) impossible.
+ *
+ *
+ * Clause reduction (planner/optimizer)
+ * ------------------------------------
+ *
+ * Applying the dependencies is quite simple - given a list of clauses,
+ * try to apply all the dependencies. For example given clause list
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d < 100)
+ *
+ * and dependencies [a=>b] and [a=>d], this may be reduced to
+ *
+ * (a = 1) AND (c = 1) AND (d < 100)
+ *
+ * The (d<100) can't be reduced as it's not an equality clause, so the
+ * dependency [a=>d] can't be applied.
+ *
+ * See clauselist_apply_dependencies() for more details.
+ *
+ * The problem with the reduction is that the query may use conditions
+ * that are not redundant, but in fact contradictory - e.g. the user
+ * may search for a ZIP code and a city name not matching the ZIP code.
+ *
+ * In such cases, the condition on the city name is not redundant,
+ * but contradictory (making the result empty), and
+ * removing it while estimating the cardinality will make the estimate
+ * worse.
+ *
+ * The current estimation assuming independence (and multiplying the
+ * selectivities) works better in this case, but only by utter luck.
+ *
+ * In some cases this might be verified using the other multivariate
+ * statistics - MCV lists and histograms. For MCV lists the verification
+ * might be very simple - peek into the list to see if there are any
+ * items matching the clause on the 'A' column (e.g. ZIP code), and if
+ * such an item is found, check that the 'B' column matches the other
+ * clause. If it does not, the clauses are contradictory. We can't
+ * really say anything if no such item is found, except maybe
+ * restricting the selectivity using the MCV data (e.g. using min/max
+ * selectivity, or something).
+ *
+ * With histograms, it might work similarly - we can't check the values
+ * directly (because histograms use buckets, unlike MCV lists, which
+ * store the actual values). So we can only observe the buckets matching the
+ * clauses - if those buckets have very low frequency, it probably means
+ * the two clauses are incompatible.
+ *
+ * It's unclear what 'low frequency' is, but if one of the clauses is
+ * implied (automatically true because of the other clause), then
+ *
+ * selectivity[clause(A)] = selectivity[clause(A) & clause(B)]
+ *
+ * So we might compute selectivity of the first clause (on the column
+ * A in dependency [A=>B]) - for example using regular statistics.
+ * And then check if the selectivity computed from the histogram is
+ * about the same (or significantly lower).
+ *
+ * The problem is that histograms work well only when the data ordering
+ * matches the natural meaning. For values that serve as labels - like
+ * city names or ZIP codes, or even generated IDs, histograms really
+ * don't work all that well. For example sorting cities by name won't
+ * match the sorting of ZIP codes, rendering the histogram unusable.
+ *
+ * MCV lists are probably going to work much better, because they don't
+ * really assume any sort of ordering, and they're probably more
+ * appropriate for label-like data.
+ *
+ * TODO Support dependencies with multiple columns on left/right.
+ *
+ * TODO Investigate using histogram and MCV list to confirm the
+ * functional dependencies.
+ *
+ * TODO Investigate statistical testing of the distribution (to decide
+ * whether it makes sense to build the histogram/MCV list).
+ *
+ * TODO Using a min/max of selectivities would probably make more sense
+ * for the associated columns.
+ *
+ * TODO Consider eliminating the implied columns from the histogram and
+ * MCV lists (but maybe that's not a good idea, because that'd make
+ * it impossible to use these stats for non-equality clauses and
+ * also it wouldn't be possible to use the stats for verification
+ * of the dependencies as proposed in another TODO).
+ *
+ * TODO This builds a complete set of dependencies, i.e. including
+ * transitive dependencies - if we identify [A => B] and [B => C],
+ * we're likely to identify [A => C] too. It might be better to
+ * keep only the minimal set of dependencies, i.e. prune all the
+ * dependencies that we can recreate by transitivity.
+ *
+ * There are two conceptual ways to do that:
+ *
+ * (a) generate all the rules, and then prune the rules that may
+ * be recreated by combining other dependencies, or
+ *
+ * (b) perform the 'is a combination of other dependencies' check
+ * before actually doing the work
+ *
+ * The second option has the advantage that we don't really need
+ * to perform the sort/count. It's not sufficient alone, though,
+ * because we may discover the dependencies in the wrong order.
+ * For example [A => B], [A => C] and then [B => C]. None of those
+ * dependencies is a combination of the already known ones, yet
+ * [A => C] is a combination of [A => B] and [B => C].
+ *
+ * FIXME Not sure the current NULL handling makes much sense. We assume
+ * that NULL is 0, so it's handled like a regular value
+ * (NULL == NULL), so all NULLs in a single column form a single
+ * group. Maybe that's not the right thing to do, especially with
+ * equality conditions - in that case NULLs are irrelevant. So
+ * maybe the right solution would be to just ignore NULL values?
+ *
+ * However simply "ignoring" the NULL values does not seem like
+ * a good idea - imagine columns A and B, where for each value of
+ * A, values in B are constant (same for the whole group) or NULL.
+ * Let's say only 10% of B values in each group are not NULL. Then
+ * ignoring the NULL values will result in 10x misestimate (and
+ * it's trivial to construct arbitrary errors). So maybe handling
+ * NULL values just like a regular value is the right thing here.
+ *
+ * Or maybe NULL values should be treated differently on each side
+ * of the dependency? E.g. as ignored on the left (condition) and
+ * as regular values on the right - this seems consistent with how
+ * equality clauses work, as equality clause means 'NOT NULL'.
+ * So if we say [A => B] then it may also imply "NOT NULL" on the
+ * right side.
+ */
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ /* result */
+ int ndeps = 0;
+ MVDependencies dependencies = NULL;
+ MultiSortSupport mss = multi_sort_init(2); /* 2 dimensions for now */
+
+ /* TODO Maybe this should be somehow related to the number of
+ * distinct values in the two columns we're currently analyzing.
+ * Assuming the distribution is uniform, that gives the average
+ * group size we should expect to observe in the sample - we could
+ * then use that as a threshold, which seems better than a static
+ * value.
+ */
+ int min_group_size = 3;
+
+ /* dimension indexes we'll check for associations [a => b] */
+ int dima, dimb;
+
+ /*
+ * We'll reuse the same array for all the 2-column combinations.
+ *
+ * It's possible to sort the sample rows directly, but this seemed
+ * somewhat simpler / less error prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * 2);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * 2);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * 2];
+ items[i].isnull = &isnull[i * 2];
+ }
+
+ Assert(numattrs >= 2);
+
+ /*
+ * Evaluate all possible combinations of [A => B], using a simple algorithm:
+ *
+ * (a) sort the data by [A,B]
+ * (b) split the data into groups by A (new group whenever a value changes)
+ * (c) count different values in the B column (again, value changes)
+ *
+ * TODO It should be rather simple to merge [A => B] and [A => C] into
+ * [A => B,C]. Just keep A constant, collect all the "implied" columns
+ * and you're done.
+ */
+ for (dima = 0; dima < numattrs; dima++)
+ {
+ /* prepare the sort function for the first dimension */
+ multi_sort_add_dimension(mss, 0, dima, stats);
+
+ for (dimb = 0; dimb < numattrs; dimb++)
+ {
+ SortItem current;
+
+ /* number of groups supporting / contradicting the dependency */
+ int n_supporting = 0;
+ int n_contradicting = 0;
+
+ /* counters valid within a group */
+ int group_size = 0;
+ int n_violations = 0;
+
+ int n_supporting_rows = 0;
+ int n_contradicting_rows = 0;
+
+ /* make sure the columns are different (skip the trivial A => A) */
+ if (dima == dimb)
+ continue;
+
+ /* prepare the sort function for the second dimension */
+ multi_sort_add_dimension(mss, 1, dimb, stats);
+
+ /* reset the values and isnull flags */
+ memset(values, 0, sizeof(Datum) * numrows * 2);
+ memset(isnull, 0, sizeof(bool) * numrows * 2);
+
+ /* accumulate all the data for both columns into an array and sort it */
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values[0]
+ = heap_getattr(rows[i], attrs->values[dima],
+ stats[dima]->tupDesc, &items[i].isnull[0]);
+
+ items[i].values[1]
+ = heap_getattr(rows[i], attrs->values[dimb],
+ stats[dimb]->tupDesc, &items[i].isnull[1]);
+ }
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Walk through the array, split it into groups according to
+ * the A value, and count distinct values of B in each group.
+ * If there's a single B value for the whole group, we count
+ * it as supporting the association, otherwise we count it
+ * as contradicting.
+ *
+ * Furthermore we require a group to have at least a certain
+ * number of rows to be counted as supporting the dependency.
+ * A contradicting group however counts regardless of its size.
+ */
+
+ /* start with values from the first row */
+ current = items[0];
+ group_size = 1;
+
+ for (i = 1; i < numrows; i++)
+ {
+ /* end of the group */
+ if (multi_sort_compare_dim(0, &items[i], &current, mss) != 0)
+ {
+ /*
+ * If there are no contradicting rows, count it as
+ * supporting (otherwise contradicting), but only if
+ * the group is large enough.
+ *
+ * The requirement of a minimum group size makes it
+ * impossible to identify [unique,unique] cases, but
+ * that's probably a different case. This is more
+ * about [zip => city] associations etc.
+ *
+ * If there are violations, count the group/rows as
+ * a violation.
+ *
+ * It may be neither, if the group is too small (does
+ * not contain at least min_group_size rows).
+ */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations > 0)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /* current values start a new group */
+ n_violations = 0;
+ group_size = 0;
+ }
+ /* mismatch of a B value is contradicting */
+ else if (multi_sort_compare_dim(1, &items[i], &current, mss) != 0)
+ {
+ n_violations += 1;
+ }
+
+ current = items[i];
+ group_size += 1;
+ }
+
+ /* handle the last group (just like above) */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /*
+ * See if the number of rows supporting the association is at least
+ * 10x the number of rows violating the hypothetical dependency.
+ *
+ * TODO This is a rather arbitrary limit - I guess it's possible to do
+ * some math to come up with a better rule (e.g. testing a hypothesis
+ * 'this is due to randomness'). We can create a contingency table
+ * from the values and use it for testing. Possibly only when
+ * there are no contradicting rows?
+ *
+ * TODO Also, if (a => b) and (b => a) at the same time, it pretty much
+ * means there's a 1:1 relation (or one is a 'label'), making the
+ * conditions rather redundant. Although it's possible that the
+ * query uses an incompatible combination of values.
+ */
+ if (n_supporting_rows > (n_contradicting_rows * 10))
+ {
+ if (dependencies == NULL)
+ {
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+ dependencies->magic = MVSTAT_DEPS_MAGIC;
+ }
+ else
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData, deps)
+ + sizeof(MVDependency) * (dependencies->ndeps + 1));
+
+ /* add the new dependency to the list */
+ dependencies->deps[ndeps] = (MVDependency)palloc0(sizeof(MVDependencyData));
+ dependencies->deps[ndeps]->a = attrs->values[dima];
+ dependencies->deps[ndeps]->b = attrs->values[dimb];
+
+ dependencies->ndeps = (++ndeps);
+ }
+ }
+ }
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+ pfree(stats);
+ pfree(mss);
+
+ return dependencies;
+}
+
+/*
+ * Store the dependencies into a bytea, so that it can be stored in the
+ * pg_mv_statistic catalog.
+ *
+ * Currently this only supports simple two-column rules, and stores them
+ * as a sequence of attnum pairs. In the future, this needs to be made
+ * more complex to support multiple columns on both sides of the
+ * implication (using AND on left, OR on right).
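+ *
+ * The resulting layout is thus (a sketch of the code below):
+ *
+ * [varlena header] [magic (uint32), ndeps (int32)] [a,b] [a,b] ...
+ *
+ * with each [a,b] pair stored as two int16 attnums.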
+ */
+bytea *
+serialize_mv_dependencies(MVDependencies dependencies)
+{
+ int i;
+
+ /* we need the varlena header, the struct header (incl. ndeps), and 2 * int16 per dependency */
+ Size len = VARHDRSZ + offsetof(MVDependenciesData, deps)
+ + dependencies->ndeps * (sizeof(int16) * 2);
+
+ bytea * output = (bytea*)palloc0(len);
+
+ char * tmp = VARDATA(output);
+
+ SET_VARSIZE(output, len);
+
+ /* first, store the number of dimensions / items */
+ memcpy(tmp, dependencies, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ /* walk through the dependencies and copy both columns into the bytea */
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ memcpy(tmp, &(dependencies->deps[i]->a), sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(tmp, &(dependencies->deps[i]->b), sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return output;
+}
+
+/*
+ * Reads serialized dependencies into MVDependencies structure.
+ */
+MVDependencies
+deserialize_mv_dependencies(bytea * data)
+{
+ int i;
+ Size expected_size;
+ MVDependencies dependencies;
+ char *tmp;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVDependenciesData,deps))
+ elog(ERROR, "invalid MVDependencies size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVDependenciesData,deps));
+
+ /* read the MVDependencies header */
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(dependencies, tmp, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ if (dependencies->magic != MVSTAT_DEPS_MAGIC)
+ {
+ pfree(dependencies);
+ elog(WARNING, "not a MV Dependencies (magic number mismatch)");
+ return NULL;
+ }
+
+ Assert(dependencies->ndeps > 0);
+
+ /* what bytea size do we expect for those parameters */
+ expected_size = offsetof(MVDependenciesData,deps) +
+ dependencies->ndeps * sizeof(int16) * 2;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid dependencies size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* allocate space for the dependency items */
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData,deps)
+ + (dependencies->ndeps * sizeof(MVDependency)));
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ dependencies->deps[i] = (MVDependency)palloc0(sizeof(MVDependencyData));
+
+ memcpy(&(dependencies->deps[i]->a), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(&(dependencies->deps[i]->b), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return dependencies;
+}
+
+/* print some basic info about dependencies (number of dependencies) */
+Datum
+pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ result = palloc0(128);
+ snprintf(result, 128, "dependencies=%d", dependencies->ndeps);
+
+ /* FIXME free the deserialized data (pfree is not enough) */
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* print the dependencies
+ *
+ * TODO Would be nice if this knew the actual column names (instead of
+ * the attnums).
+ *
+ * FIXME This is really ugly and does not really check the lengths and
+ * strcpy/snprintf return values properly. Needs to be fixed.
+ */
+Datum
+pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
+{
+ int i = 0;
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result = NULL;
+ int len = 0;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ MVDependency dependency = dependencies->deps[i];
+ char buffer[128];
+
+ int tmp = snprintf(buffer, 128, "%s%d => %d",
+ ((i == 0) ? "" : ", "), dependency->a, dependency->b);
+
+ if (tmp < 127)
+ {
+ if (result == NULL)
+ result = palloc0(len + tmp + 1);
+ else
+ result = repalloc(result, len + tmp + 1);
+
+ strcpy(result + len, buffer);
+ len += tmp;
+ }
+ }
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 04d769e..0b3518c 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2096,6 +2096,46 @@ describeOneTableDetails(const char *schemaname,
PQclear(result);
}
+ /* print any multivariate statistics */
+ if (pset.sversion >= 90500)
+ {
+ printfPQExpBuffer(&buf,
+ "SELECT oid, stakeys,\n"
+ " deps_enabled,\n"
+ " deps_built,\n"
+ " mcv_max_items, hist_max_buckets,\n"
+ " (SELECT string_agg(attname::text,', ')\n"
+ " FROM ((SELECT unnest(stakeys) AS attnum) s\n"
+ " JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
+ "FROM pg_mv_statistic stat WHERE starelid = '%s' ORDER BY 1;",
+ oid);
+
+ result = PSQLexec(buf.data);
+ if (!result)
+ goto error_return;
+ else
+ tuples = PQntuples(result);
+
+ if (tuples > 0)
+ {
+ printTableAddFooter(&cont, _("Statistics:"));
+ for (i = 0; i < tuples; i++)
+ {
+ printfPQExpBuffer(&buf, " ");
+
+ /* options */
+ if (!strcmp(PQgetvalue(result, i, 2), "t"))
+ appendPQExpBuffer(&buf, "(dependencies)");
+
+ appendPQExpBuffer(&buf, " ON (%s)",
+ PQgetvalue(result, i, 6));
+
+ printTableAddFooter(&cont, buf.data);
+ }
+ }
+ PQclear(result);
+ }
+
/* print rules */
if (tableinfo.hasrules && tableinfo.relkind != 'm')
{
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index e6ac394..36debeb 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -119,6 +119,7 @@ extern void RemoveAttrDefault(Oid relid, AttrNumber attnum,
DropBehavior behavior, bool complain, bool internal);
extern void RemoveAttrDefaultById(Oid attrdefId);
extern void RemoveStatistics(Oid relid, AttrNumber attnum);
+extern void RemoveMVStatistics(Oid relid, AttrNumber attnum);
extern Form_pg_attribute SystemAttributeDefinition(AttrNumber attno,
bool relhasoids);
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 71e0010..e404ae3 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -173,6 +173,11 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
#define LargeObjectMetadataOidIndexId 2996
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_oid_index, 3380, on pg_mv_statistic using btree(oid oid_ops));
+#define MvStatisticOidIndexId 3380
+DECLARE_INDEX(pg_mv_statistic_relid_index, 3379, on pg_mv_statistic using btree(starelid oid_ops));
+#define MvStatisticRelidIndexId 3379
+
DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
#define NamespaceNameIndexId 2684
DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
new file mode 100644
index 0000000..81ec23b
--- /dev/null
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -0,0 +1,69 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_mv_statistic.h
+ * definition of the system "multivariate statistic" relation (pg_mv_statistic)
+ * along with the relation's initial contents.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_mv_statistic.h
+ *
+ * NOTES
+ * the genbki.pl script reads this file and generates .bki
+ * information from the DATA() statements.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_MV_STATISTIC_H
+#define PG_MV_STATISTIC_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_mv_statistic definition. cpp turns this into
+ * typedef struct FormData_pg_mv_statistic
+ * ----------------
+ */
+#define MvStatisticRelationId 3381
+
+CATALOG(pg_mv_statistic,3381)
+{
+ /* These fields form the unique key for the entry: */
+ Oid starelid; /* relation containing attributes */
+
+ /* statistics requested to build */
+ bool deps_enabled; /* analyze dependencies? */
+
+ /* statistics that are available (if requested) */
+ bool deps_built; /* dependencies were built */
+
+ /* variable-length fields start here, but we allow direct access to stakeys */
+ int2vector stakeys; /* array of column keys */
+
+#ifdef CATALOG_VARLEN
+ bytea stadeps; /* dependencies (serialized) */
+#endif
+
+} FormData_pg_mv_statistic;
+
+/* ----------------
+ * Form_pg_mv_statistic corresponds to a pointer to a tuple with
+ * the format of pg_mv_statistic relation.
+ * ----------------
+ */
+typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
+
+/* ----------------
+ * compiler constants for pg_mv_statistic
+ * ----------------
+ */
+#define Natts_pg_mv_statistic 5
+#define Anum_pg_mv_statistic_starelid 1
+#define Anum_pg_mv_statistic_deps_enabled 2
+#define Anum_pg_mv_statistic_deps_built 3
+#define Anum_pg_mv_statistic_stakeys 4
+#define Anum_pg_mv_statistic_stadeps 5
+
+#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index bd67d72..5024a01 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2724,6 +2724,11 @@ DESCR("current user privilege on any column by rel name");
DATA(insert OID = 3029 ( has_any_column_privilege PGNSP PGUID 12 10 0 0 0 f f f f t f s 2 0 16 "26 25" _null_ _null_ _null_ _null_ _null_ has_any_column_privilege_id _null_ _null_ _null_ ));
DESCR("current user privilege on any column by rel oid");
+DATA(insert OID = 3284 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies info");
+DATA(insert OID = 3285 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies show");
+
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
DATA(insert OID = 1929 ( pg_stat_get_tuples_returned PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_tuples_returned _null_ _null_ _null_ ));
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index fb2f035..724a169 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -49,6 +49,7 @@ extern void BootstrapToastTable(char *relName,
DECLARE_TOAST(pg_attrdef, 2830, 2831);
DECLARE_TOAST(pg_constraint, 2832, 2833);
DECLARE_TOAST(pg_description, 2834, 2835);
+DECLARE_TOAST(pg_mv_statistic, 3288, 3289);
DECLARE_TOAST(pg_proc, 2836, 2837);
DECLARE_TOAST(pg_rewrite, 2838, 2839);
DECLARE_TOAST(pg_seclabel, 3598, 3599);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 8991f3f..d60835f 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -243,6 +243,7 @@ typedef enum NodeTag
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
+ T_MVStatisticInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
@@ -415,6 +416,7 @@ typedef enum NodeTag
T_WithClause,
T_CommonTableExpr,
T_RoleSpec,
+ T_StatisticsDef,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 852eb4f..3cd57fd 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -570,6 +570,14 @@ typedef struct ColumnDef
int location; /* parse location, or -1 if none/unknown */
} ColumnDef;
+typedef struct StatisticsDef
+{
+ NodeTag type;
+ List *keys; /* String nodes naming referenced column(s) */
+ List *options; /* list of DefElem nodes */
+} StatisticsDef;
+
+
/*
* TableLikeClause - CREATE TABLE ( ... LIKE ... ) clause
*/
@@ -1372,7 +1380,9 @@ typedef enum AlterTableType
AT_ReplicaIdentity, /* REPLICA IDENTITY */
AT_EnableRowSecurity, /* ENABLE ROW SECURITY */
AT_DisableRowSecurity, /* DISABLE ROW SECURITY */
- AT_GenericOptions /* OPTIONS (...) */
+ AT_GenericOptions, /* OPTIONS (...) */
+ AT_AddStatistics, /* ADD STATISTICS */
+ AT_DropStatistics /* DROP STATISTICS */
} AlterTableType;
typedef struct ReplicaIdentityStmt
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 1713d29..f6c4932 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -453,6 +453,7 @@ typedef struct RelOptInfo
Relids lateral_relids; /* minimum parameterization of rel */
Relids lateral_referencers; /* rels that reference me laterally */
List *indexlist; /* list of IndexOptInfo */
+ List *mvstatlist; /* list of MVStatisticInfo */
BlockNumber pages; /* size estimates derived from pg_class */
double tuples;
double allvisfrac;
@@ -545,6 +546,33 @@ typedef struct IndexOptInfo
bool amhasgetbitmap; /* does AM have amgetbitmap interface? */
} IndexOptInfo;
+/*
+ * MVStatisticInfo
+ * Information about multivariate stats for planning/optimization
+ *
+ * This contains information about which columns are covered by the
+ * statistics (stakeys), which options were requested while adding the
+ * statistics (*_enabled), and which kinds of statistics were actually
+ * built and are available for the optimizer (*_built).
+ */
+typedef struct MVStatisticInfo
+{
+ NodeTag type;
+
+ Oid mvoid; /* OID of the statistics row */
+ RelOptInfo *rel; /* back-link to index's table */
+
+ /* enabled statistics */
+ bool deps_enabled; /* functional dependencies enabled */
+
+ /* built/available statistics */
+ bool deps_built; /* functional dependencies built */
+
+ /* columns in the statistics (attnums) */
+ int2vector *stakeys; /* attnums of the columns covered */
+
+} MVStatisticInfo;
+
/*
* EquivalenceClasses
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 5b1ee15..0d7d758 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -355,7 +355,7 @@ PG_KEYWORD("stable", STABLE, UNRESERVED_KEYWORD)
PG_KEYWORD("standalone", STANDALONE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("start", START, UNRESERVED_KEYWORD)
PG_KEYWORD("statement", STATEMENT, UNRESERVED_KEYWORD)
-PG_KEYWORD("statistics", STATISTICS, UNRESERVED_KEYWORD)
+PG_KEYWORD("statistics", STATISTICS, RESERVED_KEYWORD)
PG_KEYWORD("stdin", STDIN, UNRESERVED_KEYWORD)
PG_KEYWORD("stdout", STDOUT, UNRESERVED_KEYWORD)
PG_KEYWORD("storage", STORAGE, UNRESERVED_KEYWORD)
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
new file mode 100644
index 0000000..411cd16
--- /dev/null
+++ b/src/include/utils/mvstats.h
@@ -0,0 +1,69 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvstats.h
+ * Multivariate statistics and selectivity estimation functions.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/mvstats.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MVSTATS_H
+#define MVSTATS_H
+
+#include "commands/vacuum.h"
+
+
+#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
+
+/* An association rule, tracking an [a => b] dependency.
+ *
+ * TODO Make this work with multiple columns on both sides.
+ */
+typedef struct MVDependencyData {
+ int16 a;
+ int16 b;
+} MVDependencyData;
+
+typedef MVDependencyData* MVDependency;
+
+typedef struct MVDependenciesData {
+ uint32 magic; /* magic constant marker */
+ int32 ndeps; /* number of dependencies */
+ MVDependency deps[1]; /* XXX why not a pointer? */
+} MVDependenciesData;
+
+typedef MVDependenciesData* MVDependencies;
+
+#define MVSTAT_DEPS_MAGIC 0xB4549A2C /* marks serialized bytea */
+#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
+
+/*
+ * TODO Maybe fetching the histogram/MCV list separately is inefficient?
+ * Consider adding a single `fetch_stats` method, fetching all
+ * stats specified using flags (or something like that).
+ */
+
+bytea * serialize_mv_dependencies(MVDependencies dependencies);
+
+/* deserialization of stats (serialization is private to analyze) */
+MVDependencies deserialize_mv_dependencies(bytea * data);
+
+/* FIXME this probably belongs somewhere else (not to operations stats) */
+extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats);
+
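+/*
+ * Typical usage (a sketch, mirroring build_mv_dependencies): initialize
+ * the support with multi_sort_init(ndims), set up each dimension with
+ * multi_sort_add_dimension(), and then sort an array of SortItem using
+ * qsort_arg(items, nitems, sizeof(SortItem), multi_sort_compare, mss).
+ */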
+void update_mv_stats(Oid relid, MVDependencies dependencies, int2vector *attrs);
+
+#endif
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 9e17d87..83ca7fb 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -80,6 +80,7 @@ typedef struct RelationData
bool rd_isvalid; /* relcache entry is valid */
char rd_indexvalid; /* state of rd_indexlist: 0 = not valid, 1 =
* valid, 2 = temporarily forced */
+ bool rd_mvstatvalid; /* state of rd_mvstatlist: true/false */
/*
* rd_createSubid is the ID of the highest subtransaction the rel has
@@ -112,6 +113,9 @@ typedef struct RelationData
List *rd_indexlist; /* list of OIDs of indexes on relation */
Oid rd_oidindex; /* OID of unique index on OID, if any */
Oid rd_replidindex; /* OID of replica identity index, if any */
+
+ /* data managed by RelationGetMVStatList: */
+ List *rd_mvstatlist; /* list of OIDs of multivariate stats */
/* data managed by RelationGetIndexAttrBitmap: */
Bitmapset *rd_indexattr; /* identifies columns used in indexes */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 6953281..77efeff 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -38,6 +38,7 @@ extern void RelationClose(Relation relation);
* Routines to compute/retrieve additional cached information
*/
extern List *RelationGetIndexList(Relation relation);
+extern List *RelationGetMVStatList(Relation relation);
extern Oid RelationGetOidIndex(Relation relation);
extern Oid RelationGetReplicaIndex(Relation relation);
extern List *RelationGetIndexExpressions(Relation relation);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 6634099..ac119b3 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -66,6 +66,7 @@ enum SysCacheIdentifier
INDEXRELID,
LANGNAME,
LANGOID,
+ MVSTATOID,
NAMESPACENAME,
NAMESPACEOID,
OPERNAMENSP,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index f7f016b..2f9758f 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1353,6 +1353,14 @@ pg_matviews| SELECT n.nspname AS schemaname,
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
WHERE (c.relkind = 'm'::"char");
+pg_mv_stats| SELECT n.nspname AS schemaname,
+ c.relname AS tablename,
+ s.stakeys AS attnums,
+ length(s.stadeps) AS depsbytes,
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ FROM ((pg_mv_statistic s
+ JOIN pg_class c ON ((c.oid = s.starelid)))
+ LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
pg_policies| SELECT n.nspname AS schemaname,
c.relname AS tablename,
pol.polname AS policyname,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index eb0bc88..92a0d8a 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -113,6 +113,7 @@ pg_inherits|t
pg_language|t
pg_largeobject|t
pg_largeobject_metadata|t
+pg_mv_statistic|t
pg_namespace|t
pg_opclass|t
pg_operator|t
--
1.9.3
Attachment: 0002-clause-reduction-using-functional-dependencies.patch (text/x-patch)
From 827e5633e4368706a60ea6a949208205fc0928a3 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 19:42:18 +0200
Subject: [PATCH 2/6] clause reduction using functional dependencies
During planning, use functional dependencies to decide
which clauses to skip during cardinality estimation.
Initial and rather simplistic implementation.
This only works with regular WHERE clauses, not clauses
used for joining.
Note: The clause_is_mv_compatible() needs to identify the
relation (so that we can fetch the list of multivariate stats
by OID). planner_rt_fetch() seems like the appropriate way to
get the relation OID, but apparently it only works with simple
vars. Maybe examine_variable() would make this work with more
complex vars too?
Includes regression tests analyzing functional dependencies
(part of ANALYZE) on several datasets (no dependencies, no
transitive dependencies, ...).
Checks that a query with conditions on two columns, where one (B)
is functionally dependent on the other one (A), correctly ignores
the clause on (B) and chooses bitmap index scan instead of plain
index scan (which is what happens otherwise, thanks to assumption
of independence).
Note: Functional dependencies only work with equality clauses,
no inequalities etc.
---
src/backend/commands/analyze.c | 1 +
src/backend/commands/tablecmds.c | 8 +-
src/backend/optimizer/path/clausesel.c | 659 +++++++++++++++++++++++++-
src/backend/utils/mvstats/common.c | 5 +-
src/backend/utils/mvstats/dependencies.c | 24 +
src/include/catalog/pg_proc.h | 4 +-
src/include/utils/mvstats.h | 16 +-
src/test/regress/expected/mv_dependencies.out | 172 +++++++
src/test/regress/parallel_schedule | 3 +
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_dependencies.sql | 150 ++++++
11 files changed, 1035 insertions(+), 8 deletions(-)
create mode 100644 src/test/regress/expected/mv_dependencies.out
create mode 100644 src/test/regress/sql/mv_dependencies.sql
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index fff27e0..8f335f2 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -116,6 +116,7 @@ static void update_attstats(Oid relid, bool inh,
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
+
/*
* analyze_rel() -- analyze one relation
*/
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 5c57146..b372660 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -11914,7 +11914,7 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
Relation mvstatrel;
/* by default build everything */
- bool build_dependencies = true;
+ bool build_dependencies = false;
Assert(IsA(def, StatisticsDef));
@@ -11976,6 +11976,12 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
opt->defname)));
}
+ /* check that at least some statistics were requested */
+ if (! build_dependencies)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies) was requested")));
+
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
stakeys = buildint2vector(attnums, numcols);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index dcac1c1..fb7adf8 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -24,6 +24,14 @@
#include "utils/lsyscache.h"
#include "utils/selfuncs.h"
+#include "utils/mvstats.h"
+#include "catalog/pg_collation.h"
+#include "utils/typcache.h"
+
+#include "parser/parsetree.h"
+
+
+#include <stdio.h>
/*
* Data structure for accumulating info about possible range-query
@@ -43,6 +51,16 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
+static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
+ Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo);
+
+static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
+ Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo);
+
+static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Oid varRelid, List *stats,
+ SpecialJoinInfo *sjinfo);
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -61,7 +79,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
* subclauses. However, that's only right if the subclauses have independent
* probabilities, and in reality they are often NOT independent. So,
* we want to be smarter where we can.
-
+ *
* Currently, the only extra smarts we have is to recognize "range queries",
* such as "x > 34 AND x < 42". Clauses are recognized as possible range
* query components if they are restriction opclauses whose operators have
@@ -88,6 +106,76 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
*
* Of course this is all very dependent on the behavior of
* scalarltsel/scalargtsel; perhaps some day we can generalize the approach.
+ *
+ *
+ * Multivariate statistics
+ * -----------------------
+ * This also uses multivariate stats to estimate combinations of conditions,
+ * in a way attempting to minimize the overhead when there are no suitable
+ * multivariate stats.
+ *
+ * The following checks are performed (in this order), and the optimizer
+ * falls back to regular stats on the first 'false'.
+ *
+ * NOTE: This explains how this works with all the patches applied, not
+ * just the functional dependencies.
+ *
+ * (1) check that at least two columns are referenced from conditions
+ * compatible with multivariate stats
+ *
+ * If there are no conditions that might be handled by multivariate
+ * stats, or if the conditions reference just a single column, it
+ * makes no sense to use multivariate stats.
+ *
+ * What conditions are compatible with multivariate stats is decided
+ * by clause_is_mv_compatible(). At this moment, only simple conditions
+ * of the form "column operator constant" (for simple comparison
+ * operators), and IS NULL / IS NOT NULL are considered compatible
+ * with multivariate statistics.
+ *
+ * (2) reduce the clauses using functional dependencies
+ *
+ * This simply attempts to 'reduce' the clauses by applying functional
+ * dependencies. For example if there are two clauses:
+ *
+ * WHERE (a = 1) AND (b = 2)
+ *
+ * and we know that 'a' determines the value of 'b', we may remove
+ * the second condition (b = 2) when computing the selectivity.
+ * This is of course tricky - see mvstats/dependencies.c for details.
+ *
+ * After the reduction, step (1) is to be repeated.
+ *
+ * (3) check if there are multivariate stats built on the columns
+ *
+ * If there are no multivariate statistics, we have to fall back to
+ * the regular stats. We might perform checks (1) and (2) in reverse
+ * order, i.e. first check if there are multivariate statistics and
+ * then collect the attributes only if needed. The assumption is
+ * that checking the clauses is cheaper than querying the catalog,
+ * so this check is performed first.
+ *
+ * (4) choose the stats matching the most columns (at least two)
+ *
+ * If there are multiple instances of multivariate statistics (e.g.
+ * built on different sets of columns), we choose the stats covering
+ * the most columns from step (1). It may happen that all available
+ * stats match just a single column - for example with conditions
+ *
+ * WHERE a = 1 AND b = 2
+ *
+ * and statistics built on (a,c) and (b,c). In such case just fall
+ * back to the regular stats because it makes no sense to use the
+ * multivariate statistics.
+ *
+ * This selection criterion (the most columns) is certainly very
+ * simple and definitely not optimal - it's simple to come up with
+ * examples where other approaches work better. More about this
+ * at choose_mv_statistics().
+ *
+ * (5) use the multivariate stats to estimate matching clauses
+ *
+ * (6) estimate the remaining clauses using the regular statistics
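+ *
+ * To illustrate with a sketch: given a dependency [a => b] built by
+ * ANALYZE and a query
+ *
+ * WHERE (a = 1) AND (b = 2) AND (c < 10)
+ *
+ * steps (1) and (2) reduce the list to (a = 1) AND (c < 10), and the
+ * remaining clauses are estimated using the regular per-column stats.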
*/
Selectivity
clauselist_selectivity(PlannerInfo *root,
@@ -100,6 +188,12 @@ clauselist_selectivity(PlannerInfo *root,
RangeQueryClause *rqlist = NULL;
ListCell *l;
+ /* processing mv stats */
+ Oid relid = InvalidOid;
+
+ /* attributes in mv-compatible clauses */
+ Bitmapset *mvattnums = NULL;
+
/*
* If there's exactly one clause, then no use in trying to match up pairs,
* so just go directly to clause_selectivity().
@@ -108,6 +202,35 @@ clauselist_selectivity(PlannerInfo *root,
return clause_selectivity(root, (Node *) linitial(clauses),
varRelid, jointype, sjinfo);
+ /* collect attributes referenced by mv-compatible clauses */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo);
+
+ /*
+ * If there are mv-compatible clauses, referencing at least two
+ * different columns (otherwise it makes no sense to use mv stats),
+ * try to reduce the clauses using functional dependencies, and
+ * recollect the attributes from the reduced list.
+ *
+ * We don't need to select a single statistics for this - we can
+ * apply all the functional dependencies we have.
+ */
+ if (bms_num_members(mvattnums) >= 2)
+ {
+ /*
+ * fetch info from the catalog (not the serialized stats yet)
+ *
+ * TODO This is rather ugly - we get the stats as a list from
+ * RelOptInfo (thanks to relcache/syscache), but we transform
+ * it into an array (which the other methods use for now).
+ * This should not be necessary, I guess.
+ */
+ List *stats = root->simple_rel_array[relid]->mvstatlist;
+
+ /* reduce clauses by applying functional dependencies rules */
+ clauses = clauselist_apply_dependencies(root, clauses, varRelid,
+ stats, sjinfo);
+ }
+
/*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
@@ -782,3 +905,537 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ */
+static Bitmapset *
+collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
+ Index *relid, SpecialJoinInfo *sjinfo)
+{
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(l);
+
+ /* ignore the result for now - we only need the info */
+ if (clause_is_mv_compatible(root, clause, varRelid, relid, &attnum, sjinfo))
+ attnums = bms_add_member(attnums, attnum);
+ }
+
+ /*
+ * If there are not at least two attributes referenced by the clause(s),
+ * we can throw everything out (as we'll revert to simple stats).
+ */
+ if (bms_num_members(attnums) <= 1)
+ {
+ if (attnums != NULL)
+ pfree(attnums);
+ attnums = NULL;
+ *relid = InvalidOid;
+ }
+
+ return attnums;
+}
+
+/*
+ * Determines whether the clause is compatible with multivariate stats,
+ * and if it is, returns some additional information - varno (index
+ * into simple_rte_array) and a bitmap of attributes. This is then
+ * used to fetch related multivariate statistics.
+ *
+ * At this moment we only support basic conditions of the form
+ *
+ * variable OP constant
+ *
+ * where OP is one of [=,<,<=,>=,>] (which is however determined by
+ * looking at the associated function for estimating selectivity, just
+ * like with the single-dimensional case).
+ *
+ * TODO Support 'OR clauses' - shouldn't be all that difficult to
+ * evaluate them using multivariate stats.
+ */
+static bool
+clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
+ Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo)
+{
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ /* Pseudoconstants are not really interesting here. */
+ if (rinfo->pseudoconstant)
+ return false;
+
+ /* no support for OR clauses at this point */
+ if (rinfo->orclause)
+ return false;
+
+ /* get the actual clause from the RestrictInfo (it's not an OR clause) */
+ clause = (Node*)rinfo->clause;
+
+ /* only simple opclauses are compatible with multivariate stats */
+ if (! is_opclause(clause))
+ return false;
+
+ /* we don't support join conditions at this moment */
+ if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo))
+ return false;
+
+ /* is it 'variable op constant' ? */
+ if (list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+
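+ /*
+ * Check the shape (Var op Const) or (Const op Var) - the second
+ * is_pseudo_constant_clause_relids() test flips varonleft to
+ * false when the Var turns out to be on the right side.
+ */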
+ ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
+ (is_pseudo_constant_clause_relids(lsecond(expr->args),
+ rinfo->right_relids) ||
+ (varonleft = false,
+ is_pseudo_constant_clause_relids(linitial(expr->args),
+ rinfo->left_relids)));
+
+ if (ok)
+ {
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+ * (return NULL).
+ *
+ * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
+
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not be really necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
+
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
+
+ *relid = var->varno;
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ * This uses the function for estimating selectivity, not the
+ * operator directly (a bit awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_EQSEL:
+ *attnum = var->varattno;
+ return true;
+ }
+ }
+ }
+ }
+
+ return false;
+
+}
+
+/*
+ * Performs reduction of clauses using functional dependencies, i.e.
+ * removes clauses that are considered redundant. It simply walks
+ * through dependencies, and checks whether the dependency 'matches'
+ * the clauses, i.e. if there's a clause matching the condition. If yes,
+ * all clauses matching the implied part of the dependency are removed
+ * from the list.
+ *
+ * This simply looks at attnums referenced by the clauses, not at the
+ * type of the operator (equality, inequality, ...). This may not be the
+ * right way to do it - it certainly works best for equalities, which is
+ * naturally consistent with functional dependencies (implications).
+ * It's not clear that other operators are handled sensibly - for
+ * example for inequalities, like
+ *
+ * WHERE (A >= 10) AND (B <= 20)
+ *
+ * and a trivial case where [A == B], resulting in a symmetric pair of
+ * rules [A => B] and [B => A], it's rather clear we can't remove either of
+ * those clauses.
+ *
+ * That only highlights that functional dependencies are most suitable
+ * for label-like data, where using non-equality operators is very rare.
+ * Using the common city/zipcode example, clauses like
+ *
+ * (zipcode <= 12345)
+ *
+ * or
+ *
+ * (cityname >= 'Washington')
+ *
+ * are rare. So restricting the reduction to equality should not harm
+ * the usefulness / applicability.
+ *
+ * Another limitation is that this assumes 'compatible' clauses. For
+ * example with a mismatching zip code and city name, this is unable
+ * to identify the discrepancy and still eliminates one of the clauses.
+ * The usual approach (multiplying both selectivities) thus produces a
+ * more accurate estimate, although mostly by luck - the multiplication
+ * comes from the assumption of statistical independence of the two
+ * conditions (which is not valid in this case), but moves the
+ * estimate in the right direction (towards 0%).
+ *
+ * This might be somewhat improved by cross-checking the selectivities
+ * against MCV and/or histogram.
+ *
+ * The implementation needs to be careful about cyclic rules, i.e. rules
+ * like [A => B] and [B => A] at the same time. This must not reduce
+ * clauses on both attributes at the same time.
+ *
+ * Technically we might consider selectivities here too, somehow. E.g.
+ * when (A => B) and (B => A), we might use the clauses with minimum
+ * selectivity.
+ *
+ * TODO Consider restricting the reduction to equality clauses. Or maybe
+ * use equality classes somehow?
+ *
+ * TODO Merge this docs to dependencies.c, as it's saying mostly the
+ * same things as the comments there.
+ */
+static List *
+clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Oid varRelid, List *stats,
+ SpecialJoinInfo *sjinfo)
+{
+ int i;
+ ListCell *lc;
+ List * reduced_clauses = NIL;
+ Index relid;
+
+ /*
+ * preallocate space for all clauses, including non-mv-compatible,
+ * so that we don't need to reallocate the arrays repeatedly
+ *
+ * XXX This assumes each clause references exactly one Var, so the
+ * arrays are sized accordingly - for functional dependencies
+ * this is safe, because it only works with Var=Const.
+ */
+ bool *reduced;
+ AttrNumber *mvattnums;
+ Node **mvclauses;
+ int nmvclauses = 0; /* number of clauses in the arrays */
+
+ /*
+ * matrix of (natts x natts), 1 means x=>y
+ *
+ * This serves two purposes - first, it merges dependencies from all
+ * the statistics, second it makes generating all the transitive
+ * dependencies easier.
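+ *
+ * For example, when one statistics yields [A => B] and another
+ * [B => C], the matrix makes it easy to derive the transitive
+ * [A => C] - e.g. via a Warshall-style closure over the matrix
+ * (a sketch):
+ *
+ * for each k, i, j: if m[i][k] && m[k][j] then m[i][j] = true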
+ *
+ * We need to build this only for attributes from the dependencies,
+ * not for all attributes in the table.
+ *
+ * We can't do that only for attributes from the clauses, because we
+ * want to build transitive dependencies (including those going
+ * through attributes not listed in the stats).
+ *
+ * This only works for A=>B dependencies, not sure how to do that
+ * for complex dependencies.
+ */
+ bool *deps_matrix;
+ int deps_natts; /* size of the matrix */
+
+ /* mapping attnum <=> matrix index */
+ int *deps_idx_to_attnum;
+ int *deps_attnum_to_idx;
+
+ /* attnums in dependencies and clauses (and intersection) */
+ Bitmapset *deps_attnums = NULL;
+ Bitmapset *clause_attnums = NULL;
+ Bitmapset *intersect_attnums = NULL;
+
+ int attnum, attidx, attnum_max;
+
+ bool has_deps_built = false;
+
+ /* see if there's at least one statistics with dependencies */
+ foreach (lc, stats)
+ {
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ if (info->deps_built)
+ {
+ has_deps_built = true;
+ break;
+ }
+ }
+
+ /* no dependencies available - return the original clauses */
+ if (! has_deps_built)
+ return clauses;
+
+ mvclauses = (Node**)palloc0(list_length(clauses) * sizeof(Node*));
+ mvattnums = (AttrNumber*)palloc0(list_length(clauses) * sizeof(AttrNumber));
+
+ /*
+ * Walk through the clauses - copy clauses that are not mv-compatible
+ * directly into the result list, and store mv-compatible ones into
+ * an array of clauses (remembering the attnum in another array).
+ */
+ foreach (lc, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(lc);
+ if (! clause_is_mv_compatible(root, clause, varRelid, &relid, &attnum, sjinfo))
+ reduced_clauses = lappend(reduced_clauses, clause);
+ else
+ {
+ mvclauses[nmvclauses] = clause;
+ mvattnums[nmvclauses] = attnum;
+ nmvclauses++;
+
+ clause_attnums = bms_add_member(clause_attnums, attnum);
+ }
+ }
+
+ /*
+ * we need at least two clauses, referencing two different attributes,
+ * to do the reduction
+ */
+ if ((nmvclauses < 2) || (bms_num_members(clause_attnums) < 2))
+ {
+ pfree(mvattnums);
+ pfree(mvclauses);
+
+ bms_free(clause_attnums);
+ list_free(reduced_clauses);
+
+ return clauses;
+ }
+
+ reduced = (bool*)palloc0(list_length(clauses) * sizeof(bool));
+
+ /* build the dependency matrix */
+ attnum_max = -1;
+
+ foreach (lc, stats)
+ {
+ int j;
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ int2vector *stakeys = info->stakeys;
+
+ /* skip stats without functional dependencies built */
+ if (! info->deps_built)
+ continue;
+
+ for (j = 0; j < stakeys->dim1; j++)
+ {
+ int attnum = stakeys->values[j];
+ deps_attnums = bms_add_member(deps_attnums, attnum);
+
+ /* keep the max attnum in the dependencies */
+ attnum_max = (attnum > attnum_max) ? attnum : attnum_max;
+ }
+ }
+
+ /*
+ * We need at least two matching attributes in the clauses and
+ * dependencies, otherwise we can't reduce anything.
+ */
+ intersect_attnums = bms_intersect(clause_attnums, deps_attnums);
+ if (bms_num_members(intersect_attnums) < 2)
+ {
+ pfree(mvattnums);
+ pfree(mvclauses);
+
+ bms_free(clause_attnums);
+ bms_free(deps_attnums);
+ bms_free(intersect_attnums);
+
+ list_free(reduced_clauses);
+
+ return clauses;
+ }
+
+ /* allocate the matrix and mappings */
+ deps_natts = bms_num_members(deps_attnums);
+ deps_matrix = (bool*)palloc0(deps_natts * deps_natts * sizeof(bool));
+ deps_idx_to_attnum = (int*)palloc0(deps_natts * sizeof(int));
+ deps_attnum_to_idx = (int*)palloc0((attnum_max+1) * sizeof(int));
+
+ /* build the (attnum => attidx) and (attidx => attnum) mappings */
+ attidx = 0;
+ attnum = -1;
+
+ while (true)
+ {
+ attnum = bms_next_member(deps_attnums, attnum);
+ if (attnum == -2)
+ break;
+
+ deps_idx_to_attnum[attidx] = attnum;
+ deps_attnum_to_idx[attnum] = attidx;
+
+ attidx += 1;
+ }
+
+ /* do we have all the attributes mapped? */
+ Assert(attidx == deps_natts);
+
+ /* walk through all the mvstats, build the adjacency matrix */
+ foreach (lc, stats)
+ {
+ int j;
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+ MVDependencies dependencies = NULL;
+
+ /* skip stats without functional dependencies built */
+ if (! info->deps_built)
+ continue;
+
+ /* fetch dependencies */
+ dependencies = load_mv_dependencies(info->mvoid);
+ if (dependencies == NULL)
+ continue;
+
+ /* set deps_matrix[a,b] to 'true' if 'a=>b' */
+ for (j = 0; j < dependencies->ndeps; j++)
+ {
+ int aidx = deps_attnum_to_idx[dependencies->deps[j]->a];
+ int bidx = deps_attnum_to_idx[dependencies->deps[j]->b];
+
+ /* a => b */
+ deps_matrix[aidx * deps_natts + bidx] = true;
+ }
+ }
+
+ /*
+ * Multiply the matrix N times (N = size of the matrix), so that we
+ * get all the transitive dependencies. That makes the next step
+ * much easier and faster.
+ *
+ * This is essentially an adjacency matrix from graph theory, and
+ * by multiplying it we get the transitive edges. We don't really
+ * care about the exact number of paths between vertices, so we can
+ * do the multiplication in-place (it doesn't matter whether we
+ * found the dependency in this round or in the previous one).
+ *
+ * Track how many new dependencies were added, and stop when none
+ * were - we never need more than N multiplications (N bounds the
+ * length of the longest path in the graph).
+ */
+ for (i = 0; i < deps_natts; i++)
+ {
+ int k, l, m;
+ int nchanges = 0;
+
+ /* k => l */
+ for (k = 0; k < deps_natts; k++)
+ {
+ for (l = 0; l < deps_natts; l++)
+ {
+ /* we already have this dependency */
+ if (deps_matrix[k * deps_natts + l])
+ continue;
+
+ /* we don't really care about the exact value, just 0/1 */
+ for (m = 0; m < deps_natts; m++)
+ {
+ if (deps_matrix[k * deps_natts + m] * deps_matrix[m * deps_natts + l])
+ {
+ deps_matrix[k * deps_natts + l] = true;
+ nchanges += 1;
+ break;
+ }
+ }
+ }
+ }
+
+ /* no transitive dependency added here, so terminate */
+ if (nchanges == 0)
+ break;
+ }
+
+ /*
+ * Walk through the clauses, and see which other clauses we may
+ * reduce. The matrix contains all transitive dependencies, which
+ * makes this very fast.
+ *
+ * We have to be careful not to reduce a clause using itself, or to
+ * reduce all clauses forming a cycle (so we have to skip already
+ * eliminated clauses).
+ *
+ * I'm not sure whether this guarantees finding the best solution,
+ * i.e. reducing the most clauses, but it probably does (thanks to
+ * having all the transitive dependencies).
+ */
+ for (i = 0; i < nmvclauses; i++)
+ {
+ int j;
+
+ /* not covered by dependencies */
+ if (! bms_is_member(mvattnums[i], deps_attnums))
+ continue;
+
+ /* this clause was already reduced, so let's skip it */
+ if (reduced[i])
+ continue;
+
+ /* walk the potentially 'implied' clauses */
+ for (j = 0; j < nmvclauses; j++)
+ {
+ int aidx, bidx;
+
+ /* not covered by dependencies */
+ if (! bms_is_member(mvattnums[j], deps_attnums))
+ continue;
+
+ aidx = deps_attnum_to_idx[mvattnums[i]];
+ bidx = deps_attnum_to_idx[mvattnums[j]];
+
+ /* can't reduce the clause by itself, or if already reduced */
+ if ((i == j) || reduced[j])
+ continue;
+
+ /* mark the clause as reduced (if aidx => bidx) */
+ reduced[j] = deps_matrix[aidx * deps_natts + bidx];
+ }
+ }
+
+ /* now walk through the clauses, and keep only those not reduced */
+ for (i = 0; i < nmvclauses; i++)
+ {
+ if (! reduced[i])
+ reduced_clauses = lappend(reduced_clauses, mvclauses[i]);
+ }
+
+ pfree(reduced);
+ pfree(mvclauses);
+ pfree(mvattnums);
+
+ pfree(deps_matrix);
+ pfree(deps_idx_to_attnum);
+ pfree(deps_attnum_to_idx);
+
+ bms_free(deps_attnums);
+ bms_free(clause_attnums);
+ bms_free(intersect_attnums);
+
+ return reduced_clauses;
+}
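
To illustrate the transitive-closure step above in isolation, here is
a minimal standalone sketch. The three attributes and the dependencies
[0 => 1] and [1 => 2] are made up, and the fixed-size arrays stand in
for the palloc'd ones - it mirrors the in-place multiplication loop in
clauselist_apply_dependencies:

#include <stdbool.h>
#include <stdio.h>

#define NATTS 3

int
main(void)
{
    bool    matrix[NATTS][NATTS] = {{false}};
    int     i, k, l, m;

    matrix[0][1] = true;    /* 0 => 1 */
    matrix[1][2] = true;    /* 1 => 2 */

    /* at most NATTS rounds (length of the longest path in the graph) */
    for (i = 0; i < NATTS; i++)
    {
        int     nchanges = 0;

        for (k = 0; k < NATTS; k++)
            for (l = 0; l < NATTS; l++)
            {
                /* we already have this dependency */
                if (matrix[k][l])
                    continue;

                /* add (k => l) if (k => m) and (m => l) for some m */
                for (m = 0; m < NATTS; m++)
                    if (matrix[k][m] && matrix[m][l])
                    {
                        matrix[k][l] = true;
                        nchanges++;
                        break;
                    }
            }

        /* nothing added in this round, so we're done */
        if (nchanges == 0)
            break;
    }

    /* prints "0 => 1", "0 => 2" (the transitive one) and "1 => 2" */
    for (k = 0; k < NATTS; k++)
        for (l = 0; l < NATTS; l++)
            if (matrix[k][l])
                printf("%d => %d\n", k, l);

    return 0;
}
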
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index a755c49..bd200bc 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -84,7 +84,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
/*
* Analyze functional dependencies of columns.
*/
- deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (stat->deps_enabled)
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
/* store the histogram / MCV list in the catalog */
update_mv_stats(stat->mvoid, deps, attrs);
@@ -163,6 +164,7 @@ list_mv_stats(Oid relid)
info->mvoid = HeapTupleGetOid(htup);
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
result = lappend(result, info);
@@ -274,6 +276,7 @@ compare_scalars_partition(const void *a, const void *b, void *arg)
return ApplySortComparator(da, false, db, false, ssup);
}
+
/* initialize multi-dimensional sort */
MultiSortSupport
multi_sort_init(int ndims)
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
index 0ca16a0..cf66bc5 100644
--- a/src/backend/utils/mvstats/dependencies.c
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -636,3 +636,27 @@ pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(result));
}
+
+MVDependencies
+load_mv_dependencies(Oid mvoid)
+{
+ bool isnull = false;
+ Datum deps;
+
+ /* Fetch the pg_mv_statistic tuple for the given statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->deps_enabled && mvstat->deps_built);
+#endif
+
+ deps = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stadeps, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_dependencies(DatumGetByteaP(deps));
+}
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 5024a01..2178f6c 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2724,9 +2724,9 @@ DESCR("current user privilege on any column by rel name");
DATA(insert OID = 3029 ( has_any_column_privilege PGNSP PGUID 12 10 0 0 0 f f f f t f s 2 0 16 "26 25" _null_ _null_ _null_ _null_ _null_ has_any_column_privilege_id _null_ _null_ _null_ ));
DESCR("current user privilege on any column by rel oid");
-DATA(insert OID = 3284 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
+DATA(insert OID = 3377 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
DESCR("multivariate stats: functional dependencies info");
-DATA(insert OID = 3285 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
+DATA(insert OID = 3378 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
DESCR("multivariate stats: functional dependencies show");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 411cd16..02a7dda 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -16,12 +16,20 @@
#include "commands/vacuum.h"
+/*
+ * Degree of how much MCV item / histogram bucket matches a clause.
+ * This is then considered when computing the selectivity.
+ */
+#define MVSTATS_MATCH_NONE 0 /* no match at all */
+#define MVSTATS_MATCH_PARTIAL 1 /* partial match */
+#define MVSTATS_MATCH_FULL 2 /* full match */
#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
-/* An associative rule, tracking [a => b] dependency.
- *
- * TODO Make this work with multiple columns on both sides.
+
+/*
+ * Functional dependencies, tracking column-level relationships (values
+ * in one column determine values in another one).
*/
typedef struct MVDependencyData {
int16 a;
@@ -47,6 +55,8 @@ typedef MVDependenciesData* MVDependencies;
* stats specified using flags (or something like that).
*/
+MVDependencies load_mv_dependencies(Oid mvoid);
+
bytea * serialize_mv_dependencies(MVDependencies dependencies);
/* deserialization of stats (serialization is private to analyze) */
diff --git a/src/test/regress/expected/mv_dependencies.out b/src/test/regress/expected/mv_dependencies.out
new file mode 100644
index 0000000..cf986e8
--- /dev/null
+++ b/src/test/regress/expected/mv_dependencies.out
@@ -0,0 +1,172 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (unknown_column);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, a);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, a, b);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+ALTER TABLE functional_dependencies ADD STATISTICS (unknown_option) ON (a, b, c);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- correct command
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+ QUERY PLAN
+---------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE functional_dependencies;
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+DROP TABLE functional_dependencies;
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c, d);
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+----------------------------------------
+ t | t | 2 => 1, 3 => 1, 3 => 2, 4 => 1, 4 => 2
+(1 row)
+
+DROP TABLE functional_dependencies;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 6d3b865..00c6ddf 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -109,3 +109,6 @@ test: event_trigger
# run stats by itself because its delay may be insufficient under heavy load
test: stats
+
+# run tests of multivariate stats
+test: mv_dependencies
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 8326894..b818be9 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -153,3 +153,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: mv_dependencies
diff --git a/src/test/regress/sql/mv_dependencies.sql b/src/test/regress/sql/mv_dependencies.sql
new file mode 100644
index 0000000..2491aca
--- /dev/null
+++ b/src/test/regress/sql/mv_dependencies.sql
@@ -0,0 +1,150 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (unknown_column);
+
+-- single column
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a);
+
+-- single column, duplicated
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, a);
+
+-- two columns, one duplicated
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, a, b);
+
+-- unknown option
+ALTER TABLE functional_dependencies ADD STATISTICS (unknown_option) ON (a, b, c);
+
+-- correct command
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+
+DROP TABLE functional_dependencies;
+
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+
+DROP TABLE functional_dependencies;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c, d);
+
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+DROP TABLE functional_dependencies;
--
1.9.3
0003-multivariate-MCV-lists.patch
From d454055da3025437cbfab0ca772df818fccc3c13 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 16:52:15 +0200
Subject: [PATCH 3/6] multivariate MCV lists
- extends the pg_mv_statistic catalog (add 'mcv' fields)
- building the MCV lists during ANALYZE
- simple estimation while planning the queries
Includes regression tests, mostly equal to regression tests for
functional dependencies.
Conflicts:
src/backend/optimizer/path/clausesel.c
---
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/tablecmds.c | 89 ++-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 1167 ++++++++++++++++++++++++++++--
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/common.c | 104 ++-
src/backend/utils/mvstats/common.h | 11 +-
src/backend/utils/mvstats/mcv.c | 1232 ++++++++++++++++++++++++++++++++
src/bin/psql/describe.c | 24 +-
src/include/catalog/pg_mv_statistic.h | 18 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 69 +-
src/test/regress/expected/mv_mcv.out | 207 ++++++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_mcv.sql | 178 +++++
19 files changed, 3021 insertions(+), 103 deletions(-)
create mode 100644 src/backend/utils/mvstats/mcv.c
create mode 100644 src/test/regress/expected/mv_mcv.out
create mode 100644 src/test/regress/sql/mv_mcv.sql
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 07586c6..74fedf0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -156,7 +156,9 @@ CREATE VIEW pg_mv_stats AS
C.relname AS tablename,
S.stakeys AS attnums,
length(S.stadeps) as depsbytes,
- pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
+ length(S.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b372660..545b595 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -11914,7 +11914,13 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
Relation mvstatrel;
/* by default build everything */
- bool build_dependencies = false;
+ bool build_dependencies = false,
+ build_mcv = false;
+
+ int32 max_mcv_items = -1;
+
+ /* options required because of other options */
+ bool require_mcv = false;
Assert(IsA(def, StatisticsDef));
@@ -11969,6 +11975,29 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "mcv") == 0)
+ build_mcv = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_mcv_items") == 0)
+ {
+ max_mcv_items = defGetInt32(opt);
+
+ /* this option requires 'mcv' to be enabled */
+ require_mcv = true;
+
+ /* sanity check */
+ if (max_mcv_items < MVSTAT_MCVLIST_MIN_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items must be at least %d",
+ MVSTAT_MCVLIST_MIN_ITEMS)));
+
+ else if (max_mcv_items > MVSTAT_MCVLIST_MAX_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items is %d",
+ MVSTAT_MCVLIST_MAX_ITEMS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -11977,10 +12006,16 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
}
/* check that at least some statistics were requested */
- if (! build_dependencies)
+ if (! (build_dependencies || build_mcv))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies, mcv) was requested")));
+
+ /* now do some checking of the options */
+ if (require_mcv && (! build_mcv))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies) was requested")));
+ errmsg("option 'mcv' is required by other options(s)")));
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
@@ -11996,9 +12031,13 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+ values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+ values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
- nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+ nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+ nulls[Anum_pg_mv_statistic_stamcv -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
@@ -12045,7 +12084,13 @@ static void ATExecDropStatistics(AlteredTableInfo *tab, Relation rel,
/* checking whether the statistics matches / should be dropped */
bool build_dependencies = false;
+ bool build_mcv = false;
+
+ int32 max_mcv_items = 0;
+
bool check_dependencies = false;
+ bool check_mcv = false;
+ bool check_mcv_items = false;
if (def != NULL)
{
@@ -12087,6 +12132,18 @@ static void ATExecDropStatistics(AlteredTableInfo *tab, Relation rel,
check_dependencies = true;
build_dependencies = defGetBoolean(opt);
}
+ else if (strcmp(opt->defname, "mcv") == 0)
+ {
+ check_mcv = true;
+ build_mcv = defGetBoolean(opt);
+ }
+ else if (strcmp(opt->defname, "max_mcv_items") == 0)
+ {
+ check_mcv = true;
+ check_mcv_items = true;
+ build_mcv = true;
+ max_mcv_items = defGetInt32(opt);
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -12126,6 +12183,30 @@ static void ATExecDropStatistics(AlteredTableInfo *tab, Relation rel,
(DatumGetBool(adatum) == build_dependencies);
}
+ if (delete && check_mcv)
+ {
+ bool isnull;
+ Datum adatum = heap_getattr(tuple,
+ Anum_pg_mv_statistic_mcv_enabled,
+ RelationGetDescr(statrel),
+ &isnull);
+
+ delete = (! isnull) &&
+ (DatumGetBool(adatum) == build_mcv);
+ }
+
+ if (delete && check_mcv_items)
+ {
+ bool isnull;
+ Datum adatum = heap_getattr(tuple,
+ Anum_pg_mv_statistic_mcv_max_items,
+ RelationGetDescr(statrel),
+ &isnull);
+
+ delete = (! isnull) &&
+ (DatumGetInt32(adatum) == max_mcv_items);
+ }
+
/* check that the columns match the statistics definition */
if (delete && (numcols > 0))
{
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index c2d5dc5..635ccc1 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1851,9 +1851,11 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
+ WRITE_BOOL_FIELD(mcv_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
+ WRITE_BOOL_FIELD(mcv_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index fb7adf8..abffb0a 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -20,6 +20,7 @@
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/plancat.h"
+#include "optimizer/var.h"
#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
#include "utils/selfuncs.h"
@@ -50,17 +51,46 @@ typedef struct RangeQueryClause
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
+#define MV_CLAUSE_TYPE_FDEP 0x01
+#define MV_CLAUSE_TYPE_MCV 0x02
static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
- Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo);
+ Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
+ int types);
static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
- Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo);
+ Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo,
+ int type);
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Oid varRelid, List *stats,
SpecialJoinInfo *sjinfo);
+static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
+
+static List *clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
+ List *clauses, Oid varRelid,
+ List **mvclauses, MVStatisticInfo *mvstats, int types);
+
+static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
+static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats,
+ bool *fullmatch, Selectivity *lowsel);
+
+static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or);
+
+/* used for merging bitmaps - AND (min), OR (max) */
+#define MAX(x, y) (((x) > (y)) ? (x) : (y))
+#define MIN(x, y) (((x) < (y)) ? (x) : (y))
+
+#define UPDATE_RESULT(m,r,isor) \
+ (m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
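+
+/*
+ * For example, when merging per-clause match degrees (see the
+ * MVSTATS_MATCH_* constants in mvstats.h), ANDing a FULL (2) and a
+ * PARTIAL (1) match yields PARTIAL (1), while ORing them yields
+ * FULL (2).
+ */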
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -195,15 +225,19 @@ clauselist_selectivity(PlannerInfo *root,
Bitmapset *mvattnums = NULL;
/*
- * If there's exactly one clause, then no use in trying to match up pairs,
- * so just go directly to clause_selectivity().
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, so just go directly to clause_selectivity().
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
varRelid, jointype, sjinfo);
- /* collect attributes referenced by mv-compatible clauses */
- mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo);
+ /*
+ * Collect attributes referenced by mv-compatible clauses (looking
+ * for clauses compatible with functional dependencies for now).
+ */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
+ MV_CLAUSE_TYPE_FDEP);
/*
* If there are mv-compatible clauses, referencing at least two
@@ -232,6 +266,58 @@ clauselist_selectivity(PlannerInfo *root,
}
/*
+ * Recollect attributes from mv-compatible clauses (maybe we've
+ * removed so many clauses we have a single mv-compatible attnum).
+ * From now on we're only interested in MCV-compatible clauses.
+ */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
+ MV_CLAUSE_TYPE_MCV);
+
+ /*
+ * If there still are at least two columns, we'll try to select
+ * a suitable multivariate stats.
+ */
+ if (bms_num_members(mvattnums) >= 2)
+ {
+ /*
+ * fetch info from the catalog (not the serialized stats yet)
+ *
+ * TODO We may need to repeat this, because the previous load only
+ * happens if there are at least 2 clauses compatible with
+ * functional dependencies.
+ *
+ * TODO This is rather ugly - we get the stats as a list from
+ * RelOptInfo (thanks to relcache/syscache), but we transform
+ * it into an array (which the other methods use for now).
+ * This should not be necessary, I guess.
+ */
+ List *stats = root->simple_rel_array[relid]->mvstatlist;
+
+ /* see choose_mv_statistics() for details */
+ if (stats != NIL)
+ {
+ MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+
+ if (mvstat != NULL) /* we have a matching stats */
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ /* split the clauselist into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, sjinfo, clauses,
+ varRelid, &mvclauses, mvstat,
+ MV_CLAUSE_TYPE_MCV);
+
+ /* we've chosen the histogram to match the clauses */
+ Assert(mvclauses != NIL);
+
+ /* compute the multivariate stats */
+ s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ }
+ }
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -906,12 +992,198 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Estimate selectivity for the list of MV-compatible clauses, using that
+ * particular histogram.
+ *
+ * When we hit a single bucket, we don't know what portion of it actually
+ * matches the clauses (e.g. equality), and we use 1/2 the bucket by
+ * default. However, the MV histograms are usually less detailed than
+ * the per-column ones, meaning the sum of buckets is often quite high
+ * (thanks to combining a lot of "partially hit" buckets).
+ *
+ * There are several ways to improve this, each with cases where it
+ * won't really help. Also, the more complex the process, the worse
+ * the failures (i.e. misestimates).
+ *
+ * (1) Use the MV histogram only as a way to combine multiple
+ * per-column histograms, essentially rewriting
+ *
+ * P(A & B) = P(A) * P(B|A)
+ *
+ * where P(B|A) may be computed using a proper "slice" of the
+ * histogram, by first selecting only buckets where A is true, and
+ * then using the boundaries to 'restrict' the per-column histogram.
+ *
+ * With more clauses, it gets more complicated, of course
+ *
+ * P(A & B & C) = P(A & C) * P(B|A & C)
+ * = P(A) * P(C|A) * P(B|A & C)
+ *
+ * and so on.
+ *
+ * Of course, the question is how well and efficiently we can
+ * compute the conditional probabilities - whether this approach
+ * can improve the estimates (instead of amplifying the errors).
+ *
+ * Also, this does not eliminate the need for histogram on [A,B,C].
+ *
+ * (2) Use multiple smaller (and more accurate) histograms, and combine
+ * them using a process similar to the above. E.g. by assuming that
+ * B and C are independent, we can rewrite
+ *
+ * P(B|A & C) = P(B|A)
+ *
+ * so we can rewrite the whole formula to
+ *
+ * P(A & B & C) = P(A) * P(C|A) * P(B|A)
+ *
+ * and we're OK with two 2D histograms [A,C] and [A,B].
+ *
+ * It'd be nice to perform some sort of statistical independence
+ * test (Fisher's exact test or a chi-squared test) to identify
+ * independent components and automatically separate them into
+ * smaller histograms.
+ *
+ * (3) Using the estimated number of distinct values in a bucket to
+ * decide the selectivity of equality in the bucket (instead of
+ * blindly using 1/2 of the bucket, we may use 1/ndistinct).
+ * Of course, if the ndistinct estimate is way off, or when the
+ * distribution is not uniform (one distict items get much more
+ * items), this will fail. Also, we currently don't have ndistinct
+ * estimate available at this moment (but it shouldn't be that
+ * difficult to compute as ndistinct and ntuples should be available).
+ *
+ * TODO Clamp the selectivity by min of the per-clause selectivities
+ * (i.e. the selectivity of the most restrictive clause), because
+ * that's the maximum we can ever get from ANDed list of clauses.
+ * This may probably prevent issues with hitting too many buckets
+ * and low precision histograms.
+ *
+ * TODO We may support some additional conditions, most importantly
+ * those matching multiple columns (e.g. "a = b" or "a < b").
+ * Ultimately we could track multi-table histograms for join
+ * cardinality estimation.
+ *
+ * TODO Currently this is only estimating all clauses, or clauses
+ * matching varRelid (when it's not 0). I'm not sure what's the
+ * purpose of varRelid, but my assumption is this is used for
+ * join conditions and such. In that case we can use those clauses
+ * to restrict the other (i.e. filter the histogram buckets first,
+ * before estimating the other clauses). This is essentially equal
+ * to computing P(A|B) where "B" are the clauses not matching the
+ * varRelid.
+ *
+ * TODO Further thoughts on processing equality clauses - maybe it'd be
+ * better to look for stats (with MCV) covered by the equality
+ * clauses, because then we have a chance to find an exact match
+ * in the MCV list, which is pretty much the best we can do. We may
+ * also look at the least frequent MCV item, and use it as an upper
+ * boundary for the selectivity (had there been a more frequent
+ * item, it'd be in the MCV list).
+ *
+ * These conditions may then be used as a condition for the other
+ * selectivities, i.e. we may estimate P(A,B) first, and then
+ * compute P(C|A,B) from another histogram. This may be useful when
+ * we can estimate P(A,B) accurately (e.g. because it's a complete
+ * equality match evaluated on MCV list), and then compute the
+ * conditional probability P(C|A,B), giving us the requested stats
+ *
+ * P(A,B,C) = P(A,B) * P(C|A,B)
+ *
+ * TODO There are several options for 'sanity clamping' the estimates.
+ *
+ * First, if we have selectivities for each condition, then
+ *
+ * P(A,B) <= MIN(P(A), P(B))
+ *
+ * Because additional conditions (connected by AND) can only lower
+ * the probability.
+ *
+ * So we can do some basic sanity checks using the single-variate
+ * stats (the ones we have right now).
+ *
+ * Second, when we have multivariate stats with a MCV list, then
+ *
+ * (a) if we have a full equality condition (one equality condition
+ * on each column) and we found a match in the MCV list, this is
+ * the selectivity (and it's supposed to be exact)
+ *
+ * (b) if we have a full equality condition and we haven't found a
+ * match in the MCV list, then the selectivity is below the
+ * lowest selectivity in the MCV list
+ *
+ * (c) if we have an equality condition (not full), we can still
+ * search the MCV for matches and use the sum of probabilities
+ * as a lower boundary for the histogram (if there are no
+ * matches in the MCV list, then we have no boundary)
+ *
+ * Third, if there are multiple multivariate stats for a set of
+ * clauses, we may compute all of them and then somehow aggregate
+ * them - e.g. by choosing the minimum, median or average. The
+ * multi-variate stats are susceptible to overestimation (because
+ * we take 50% of the bucket for partial matches). Some stats may
+ * give better estimates than others, but it's very difficult to
+ * determine in advance which one is the best (it depends on the
+ * number of buckets, number of additional columns not
+ * referenced in the clauses etc.) so we may compute all and then
+ * choose a sane aggregation (minimum seems like a good approach).
+ * Of course, this may result in longer / more expensive estimation
+ * (CPU-wise), but it may be worth it.
+ *
+ * There are ways to address this, though. First, it's possible to
+ * add a GUC choosing between 'simple' estimation (using the single
+ * statistics expected to give the best estimate) and 'full'
+ * estimation (combining all the available estimates).
+ *
+ * multivariate_estimates = (simple|full)
+ *
+ * Also, this might be enabled at a table level, by something like
+ *
+ * ALTER TABLE ... SET STATISTICS (simple|full)
+ *
+ * Which would make it possible to use this only for the tables
+ * where the simple approach does not work.
+ *
+ * Also, there are ways to optimize this algorithmically. E.g. we
+ * may try to get an estimate from a matching MCV list first, and
+ * if we happen to get a "full equality match" we may stop computing
+ * the estimates from other stats (for this condition) because
+ * that's probably the best estimate we can really get.
+ *
+ * TODO When applying the clauses to the histogram/MCV list, we can do
+ * that from the most selective clauses first, because that'll
+ * eliminate the buckets/items sooner (so we'll be able to skip
+ * them without inspection, which is more expensive).
+ */
+static Selectivity
+clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+{
+ bool fullmatch = false;
+
+ /*
+ * Lowest frequency in the MCV list (may be used as an upper bound
+ * for full equality conditions that did not match any MCV item).
+ */
+ Selectivity mcv_low = 0.0;
+
+ /* TODO Evaluate simple 1D selectivities, use the smallest one as
+ * an upper bound and their product as a lower bound, and sort
+ * the clauses in ascending order by selectivity (to optimize
+ * the MCV/histogram evaluation).
+ */
+
+ /* Evaluate the MCV selectivity */
+ return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ &fullmatch, &mcv_low);
+}
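+
+/*
+ * A small worked example of the decomposition described above, with
+ * made-up selectivities: take P(A) = 0.1, P(C|A) = 0.5 and
+ * P(B|A) = 0.2, and assume B and C are independent. Then
+ *
+ * P(A & B & C) = P(A) * P(C|A) * P(B|A)
+ *              = 0.1 * 0.5 * 0.2 = 0.01
+ *
+ * while the plain independence assumption with P(B) = P(C) = 0.05
+ * would give 0.1 * 0.05 * 0.05 = 0.00025, i.e. a 40x underestimate.
+ */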
+
/*
* Collect attributes from mv-compatible clauses.
*/
static Bitmapset *
collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
- Index *relid, SpecialJoinInfo *sjinfo)
+ Index *relid, SpecialJoinInfo *sjinfo, int types)
{
Bitmapset *attnums = NULL;
ListCell *l;
@@ -927,12 +1199,11 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
*/
foreach (l, clauses)
{
- AttrNumber attnum;
Node *clause = (Node *) lfirst(l);
- /* ignore the result for now - we only need the info */
- if (clause_is_mv_compatible(root, clause, varRelid, relid, &attnum, sjinfo))
- attnums = bms_add_member(attnums, attnum);
+ /* ignore the result here - we only need the attnums */
+ clause_is_mv_compatible(root, clause, varRelid, relid, &attnums,
+ sjinfo, types);
}
/*
@@ -951,6 +1222,188 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
}
/*
+ * We're looking for statistics matching at least 2 attributes,
+ * referenced in the clauses compatible with multivariate statistics.
+ * The current selection criterion is very simple - we choose the
+ * statistics referencing the most attributes.
+ *
+ * If there are multiple statistics referencing the same number of
+ * columns (from the clauses), the one with fewer source columns
+ * (as listed in ADD STATISTICS when creating the statistics) wins.
+ * Otherwise the first one wins.
+ *
+ * This is a very simple criterion, and it has several weaknesses:
+ *
+ * (a) does not consider the accuracy of the statistics
+ *
+ * If there are two histograms built on the same set of columns,
+ * but one has 100 buckets and the other one has 1000 buckets (thus
+ * likely providing better estimates), this is not currently
+ * considered.
+ *
+ * (b) does not consider the type of statistics
+ *
+ * If there are three statistics - one containing just a MCV list,
+ * another one with just a histogram and a third one with both,
+ * this is not considered.
+ *
+ * (c) does not consider the number of clauses
+ *
+ * As explained, only the number of referenced attributes counts,
+ * so if there are multiple clauses on a single attribute, this
+ * still counts as a single attribute.
+ *
+ * (d) does not consider type of condition
+ *
+ * Some clauses may work better with some statistics - for example
+ * equality clauses probably work better with MCV lists than with
+ * histograms. But IS [NOT] NULL conditions may often work better
+ * with histograms (thanks to NULL-buckets).
+ *
+ * So for example with five WHERE conditions
+ *
+ * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
+ *
+ * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be
+ * selected as it references the most columns.
+ *
+ * Once we have selected the multivariate statistics, we split the list
+ * of clauses into two parts - conditions that are compatible with the
+ * selected stats, and conditions that are estimated using simple
+ * (per-column) statistics.
+ *
+ * From the example above, conditions
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
+ *
+ * will be estimated using the multivariate statistics (a,b,c,d) while
+ * the last condition (e = 1) will get estimated using the regular ones.
+ *
+ * There are various alternative selection criteria (e.g. counting
+ * conditions instead of just referenced attributes), but eventually
+ * the best option should be to combine multiple statistics. But that's
+ * much harder to do correctly.
+ *
+ * TODO Select multiple statistics and combine them when computing
+ * the estimate.
+ *
+ * TODO This will probably have to consider compatibility of clauses,
+ * because 'dependencies' will probably work only with equality
+ * clauses.
+ */
+static MVStatisticInfo *
+choose_mv_statistics(List *stats, Bitmapset *attnums)
+{
+ int i;
+ ListCell *lc;
+
+ MVStatisticInfo *choice = NULL;
+
+ int current_matches = 1; /* goal #1: maximize */
+ int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+
+ /*
+ * Walk through the list of statistics, and for each one count the
+ * attributes it shares with the clauses (encoded in the 'attnums'
+ * bitmap).
+ */
+ foreach (lc, stats)
+ {
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ /* columns matching this statistics */
+ int matches = 0;
+
+ int2vector * attrs = info->stakeys;
+ int numattrs = attrs->dim1;
+
+ /* skip dependencies-only stats */
+ if (! info->mcv_built)
+ continue;
+
+ /* count columns covered by the histogram */
+ for (i = 0; i < numattrs; i++)
+ if (bms_is_member(attrs->values[i], attnums))
+ matches++;
+
+ /*
+ * Use this statistics when it improves the number of matches or
+ * when it matches the same number of attributes but is smaller.
+ */
+ if ((matches > current_matches) ||
+ ((matches == current_matches) && (current_dims > numattrs)))
+ {
+ choice = info;
+ current_matches = matches;
+ current_dims = numattrs;
+ }
+ }
+
+ return choice;
+}
+
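+/*
+ * A tie-breaking example (with hypothetical statistics): given
+ * clauses on attributes {a, b, c}, statistics on (a,b,c) and on
+ * (a,b,c,d) both match three attributes, so the narrower (a,b,c)
+ * statistics wins - fewer unreferenced dimensions generally means
+ * finer-grained statistics.
+ */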
+
+/*
+ * This splits the clauses list into two parts - one containing clauses
+ * that will be evaluated using the chosen statistics, and the remaining
+ * clauses (either non-mvcompatible, or not related to the histogram).
+ */
+static List *
+clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
+ List *clauses, Oid varRelid, List **mvclauses,
+ MVStatisticInfo *mvstats, int types)
+{
+ int i;
+ ListCell *l;
+ List *non_mvclauses = NIL;
+
+ /* FIXME is there a better way to get info on int2vector? */
+ int2vector * attrs = mvstats->stakeys;
+ int numattrs = mvstats->stakeys->dim1;
+
+ Bitmapset *mvattnums = NULL;
+
+ /* build bitmap of attributes covered by the stats, so we can
+ * do bms_is_subset later */
+ for (i = 0; i < numattrs; i++)
+ mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+
+ /* erase the list of mv-compatible clauses */
+ *mvclauses = NIL;
+
+ foreach (l, clauses)
+ {
+ bool match = false; /* by default not mv-compatible */
+ Bitmapset *attnums = NULL;
+ Node *clause = (Node *) lfirst(l);
+
+ if (clause_is_mv_compatible(root, clause, varRelid, NULL,
+ &attnums, sjinfo, types))
+ {
+ /* are all the attributes part of the selected stats? */
+ if (bms_is_subset(attnums, mvattnums))
+ match = true;
+ }
+
+ /*
+ * The clause matches the selected stats, so put it to the list
+ * of mv-compatible clauses. Otherwise, keep it in the list of
+ * 'regular' clauses (that may be selected later).
+ */
+ if (match)
+ *mvclauses = lappend(*mvclauses, clause);
+ else
+ non_mvclauses = lappend(non_mvclauses, clause);
+ }
+
+ /*
+ * Return the clauses incompatible with the chosen statistics;
+ * the caller estimates those the regular way.
+ */
+ return non_mvclauses;
+}
+
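+/*
+ * For example, with (hypothetical) statistics on (a,b) and clauses
+ * [(a = 1), (b < 2), (c = 3)], the first two clauses end up in
+ * *mvclauses while [(c = 3)] is returned, to be estimated the
+ * regular way.
+ */
+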
+/*
* Determines whether the clause is compatible with multivariate stats,
* and if it is, returns some additional information - varno (index
* into simple_rte_array) and a bitmap of attributes. This is then
@@ -969,93 +1422,197 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
*/
static bool
clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
- Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo)
+ Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
+ int types)
{
+ Relids clause_relids;
+ Relids left_relids;
+ Relids right_relids;
if (IsA(clause, RestrictInfo))
{
RestrictInfo *rinfo = (RestrictInfo *) clause;
- /* Pseudoconstants are not really interesting here. */
- if (rinfo->pseudoconstant)
+ if (! IsA(clause, RestrictInfo))
+ {
+ elog(WARNING, "expected RestrictInfo, got type %d", clause->type);
return false;
+ }
- /* no support for OR clauses at this point */
- if (rinfo->orclause)
+ /* Pseudoconstants are not really interesting here. */
+ if (rinfo->pseudoconstant)
return false;
/* get the actual clause from the RestrictInfo (it's not an OR clause) */
clause = (Node*)rinfo->clause;
- /* only simple opclauses are compatible with multivariate stats */
- if (! is_opclause(clause))
- return false;
-
/* we don't support join conditions at this moment */
if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo))
return false;
+ clause_relids = rinfo->clause_relids;
+ left_relids = rinfo->left_relids;
+ right_relids = rinfo->right_relids;
+ }
+ else if (is_opclause(clause) && list_length(((OpExpr *) clause)->args) == 2)
+ {
+ left_relids = pull_varnos(get_leftop((Expr*)clause));
+ right_relids = pull_varnos(get_rightop((Expr*)clause));
+
+ clause_relids = bms_union(left_relids,
+ right_relids);
+ }
+ else
+ {
+ /* Not a binary opclause, so mark left/right relid sets as empty */
+ left_relids = NULL;
+ right_relids = NULL;
+ /* and get the total relid set the hard way */
+ clause_relids = pull_varnos((Node *) clause);
+ }
+
+ /*
+ * Only simple opclauses and IS NULL tests are compatible with
+ * multivariate stats at this point.
+ */
+ if ((is_opclause(clause))
+ && (list_length(((OpExpr *) clause)->args) == 2))
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+
/* is it 'variable op constant' ? */
- if (list_length(((OpExpr *) clause)->args) == 2)
+
+ ok = (bms_membership(clause_relids) == BMS_SINGLETON) &&
+ (is_pseudo_constant_clause_relids(lsecond(expr->args),
+ right_relids) ||
+ (varonleft = false,
+ is_pseudo_constant_clause_relids(linitial(expr->args),
+ left_relids)));
+
+ if (ok)
{
- OpExpr *expr = (OpExpr *) clause;
- bool varonleft = true;
- bool ok;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
- ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
- (is_pseudo_constant_clause_relids(lsecond(expr->args),
- rinfo->right_relids) ||
- (varonleft = false,
- is_pseudo_constant_clause_relids(linitial(expr->args),
- rinfo->left_relids)));
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+ * (return NULL).
+ *
+ * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
- if (ok)
- {
- Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not be really necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
- /*
- * Simple variables only - otherwise the planner_rt_fetch seems to fail
- * (return NULL).
- *
- * TODO Maybe use examine_variable() would fix that?
- */
- if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
- return false;
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
- /*
- * Only consider this variable if (varRelid == 0) or when the varno
- * matches varRelid (see explanation at clause_selectivity).
- *
- * FIXME I suspect this may not be really necessary. The (varRelid == 0)
- * part seems to be enforced by treat_as_join_clause().
- */
- if (! ((varRelid == 0) || (varRelid == var->varno)))
- return false;
+ /* Lookup info about the base relation (we need to pass the relid out) */
+ if (relid != NULL)
+ *relid = var->varno;
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ * This uses the function for estimating selectivity, not the
+ * operator directly (a bit awkward, but well ...).
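+ *
+ * For example, any operator whose restriction estimator is
+ * scalarltsel gets treated as "<" here, regardless of data type.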
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_SCALARLTSEL:
+ case F_SCALARGTSEL:
+ /* not compatible with functional dependencies */
+ if (types & MV_CLAUSE_TYPE_MCV)
+ {
+ *attnums = bms_add_member(*attnums, var->varattno);
+ return (types & MV_CLAUSE_TYPE_MCV);
+ }
+ return false;
+
+ case F_EQSEL:
+ *attnums = bms_add_member(*attnums, var->varattno);
+ return true;
+ }
+ }
+ }
+ else if (IsA(clause, NullTest)
+ && IsA(((NullTest*)clause)->arg, Var))
+ {
+ Var * var = (Var*)((NullTest*)clause)->arg;
+
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+ * (returns NULL).
+ *
+ * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
+
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not be really necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
- /* Also skip special varno values, and system attributes ... */
- if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
- return false;
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
+ /* Lookup info about the base relation (we need to pass the relid out) */
+ if (relid != NULL)
*relid = var->varno;
- /*
- * If it's not a "<" or ">" or "=" operator, just ignore the
- * clause. Otherwise note the relid and attnum for the variable.
- * This uses the function for estimating selectivity, ont the
- * operator directly (a bit awkward, but well ...).
- */
- switch (get_oprrest(expr->opno))
- {
- case F_EQSEL:
- *attnum = var->varattno;
- return true;
- }
- }
+ *attnums = bms_add_member(*attnums, var->varattno);
+
+ return true;
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /*
+ * AND/OR-clauses are supported if all sub-clauses are supported
+ *
+ * TODO We might support mixed case, where some of the clauses
+ * are supported and some are not, and treat all supported
+ * subclauses as a single clause, compute its selectivity
+ * using mv stats, and compute the total selectivity using
+ * the current algorithm.
+ *
+ * TODO For RestrictInfo above an OR-clause, we might use the
+ * orclause with nested RestrictInfo - we won't have to
+ * call pull_varnos() for each clause, saving time.
+ */
+ Bitmapset *tmp = NULL;
+ ListCell *l;
+ foreach (l, ((BoolExpr*)clause)->args)
+ {
+ if (! clause_is_mv_compatible(root, (Node*)lfirst(l),
+ varRelid, relid, &tmp, sjinfo, types))
+ return false;
}
+
+ /* add the attnums from the AND/OR-clause to the set of attnums */
+ *attnums = bms_join(*attnums, tmp);
+
+ return true;
}
return false;
-
}
/*
@@ -1117,6 +1674,13 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
*
* TODO Merge this docs to dependencies.c, as it's saying mostly the
* same things as the comments there.
+ *
+ * TODO Currently this is applied only to the top-level clauses, but
+ * maybe we could apply it to lists at subtrees too, e.g. to the
+ * two AND-clauses in
+ *
+ * (x=1 AND y=2) OR (z=3 AND q=10)
+ *
*/
static List *
clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
@@ -1200,17 +1764,27 @@ clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
*/
foreach (lc, clauses)
{
- AttrNumber attnum;
+ Bitmapset *attnums = NULL;
Node *clause = (Node *) lfirst(lc);
- if (! clause_is_mv_compatible(root, clause, varRelid, &relid, &attnum, sjinfo))
+ if (! clause_is_mv_compatible(root, clause, varRelid, &relid, &attnums,
+ sjinfo, MV_CLAUSE_TYPE_FDEP))
+ reduced_clauses = lappend(reduced_clauses, clause);
+ else if (bms_num_members(attnums) > 1)
+ /* FIXME This may happen thanks to OR-clauses, which should
+ * really be handled differently for functional
+ * dependencies.
+ */
reduced_clauses = lappend(reduced_clauses, clause);
else
{
+ /* functional dependencies support only [Var = Const] */
+ Assert(bms_num_members(attnums) == 1);
mvclauses[nmvclauses] = clause;
- mvattnums[nmvclauses] = attnum;
+ mvattnums[nmvclauses] = bms_singleton_member(attnums);
nmvclauses++;
- clause_attnums = bms_add_member(clause_attnums, attnum);
+ clause_attnums = bms_add_member(clause_attnums,
+ bms_singleton_member(attnums));
}
}
@@ -1439,3 +2013,454 @@ clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
return reduced_clauses;
}
+
+/*
+ * Estimate selectivity of clauses using a MCV list.
+ *
+ * If there's no MCV list for the stats, the function returns 0.0.
+ *
+ * While computing the estimate, the function checks whether all the
+ * columns were matched with an equality condition. If that's the case,
+ * we can skip processing the histogram, as there can be no rows in
+ * it with the same values - all the rows matching the condition are
+ * represented by the MCV item. This can only happen with equality
+ * on all the attributes.
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all items as 'match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the items
+ * 4) skip items that are already 'no match'
+ * 5) check clause for items that still match
+ * 6) sum frequencies for items to get selectivity
+ *
+ * The function also returns the frequency of the least frequent item
+ * on the MCV list, which may be useful for clamping the estimate
+ * from the histogram (all items not in the MCV list are less frequent).
+ * This however seems useful only for cases with conditions on all
+ * attributes.
+ *
+ * TODO This only handles AND-ed clauses, but it might work for OR-ed
+ * lists too - it just needs to reverse the logic a bit. I.e. start
+ * with 'no match' for all items, and mark the items as a match
+ * as the clauses are processed (and skip items that are 'match').
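+ *
+ * For illustration (hypothetical data): given an MCV list
+ * {(1,1): 0.4, (1,2): 0.3, (2,2): 0.2} and the clauses
+ * (a = 1) AND (b = 2), the first clause rules out item (2,2),
+ * the second rules out (1,1), and summing the frequencies of
+ * the remaining items gives a selectivity of 0.3.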
+ */
+static Selectivity
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats, bool *fullmatch,
+ Selectivity *lowsel)
+{
+ int i;
+ Selectivity s = 0.0;
+ Selectivity u = 0.0;
+
+ MCVList mcvlist = NULL;
+ int nmatches = 0;
+
+ /* match/mismatch bitmap for each MCV item */
+ char * matches = NULL;
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 2);
+
+ /* there's no MCV list built yet */
+ if (! mvstats->mcv_built)
+ return 0.0;
+
+ mcvlist = load_mv_mcvlist(mvstats->mvoid);
+
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* by default all the MCV items match the clauses fully */
+ matches = palloc0(sizeof(char) * mcvlist->nitems);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
+
+ /* number of matching MCV items */
+ nmatches = mcvlist->nitems;
+
+ nmatches = update_match_bitmap_mcvlist(root, clauses,
+ mvstats->stakeys, mcvlist,
+ nmatches, matches,
+ lowsel, fullmatch, false);
+
+ /* sum frequencies for all the matching MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* used to 'scale' for MCV lists not covering all tuples */
+ u += mcvlist->items[i]->frequency;
+
+ if (matches[i] != MVSTATS_MATCH_NONE)
+ s += mcvlist->items[i]->frequency;
+ }
+
+ pfree(matches);
+ pfree(mcvlist);
+
+ return s*u;
+}
+
+/*
+ * Evaluate clauses using the MCV list, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * TODO This works with 'bitmap' where each bit is represented as a char,
+ * which is slightly wasteful. Instead, we could use a regular
+ * bitmap, reducing the size to ~1/8. Another thing is merging the
+ * bitmaps using & and |, which might be faster than min/max.
+ */
+static int
+update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ Bitmapset *eqmatches = NULL; /* attributes with equality matches */
+
+ /* The bitmap may be partially built. */
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mcvlist->nitems);
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* No further matches possible (AND), or everything matches already (OR) */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ return nmatches;
+
+ /* frequency of the lowest MCV item */
+ *lowsel = 1.0;
+
+ /*
+ * Loop through the list of clauses, and for each of them evaluate
+ * all the MCV items not yet eliminated by the preceding clauses.
+ *
+ * FIXME This would probably deserve a refactoring, I guess. Unify
+ * the two loops and put the checks inside, or something like
+ * that.
+ */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* if there are no remaining matches possible, we can stop */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /* it's either an OpExpr, a NullTest, or an AND/OR clause */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ /* operator */
+ FmgrInfo opproc;
+
+ /* get procedure computing operator selectivity */
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+
+ FmgrInfo ltproc, gtproc;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ /*
+ * TODO Fetch only when really needed (probably for equality only)
+ * TODO Technically either lt/gt is sufficient.
+ *
+ * FIXME The code in analyze.c creates histograms only for types
+ * with enough ordering (by calling get_sort_group_operators).
+ * Is this the same assumption, i.e. are we certain that we
+ * get the ltproc/gtproc every time we ask? Or are there types
+ * where get_sort_group_operators returns ltopr and here we
+ * get nothing?
+ */
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype,
+ TYPECACHE_EQ_OPR | TYPECACHE_LT_OPR | TYPECACHE_GT_OPR);
+
+ /* FIXME do proper matching of attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), &ltproc);
+ fmgr_info(get_opcode(typecache->gt_opr), &gtproc);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ bool mismatch = false;
+ MCVItem item = mcvlist->items[i];
+
+ /*
+ * find the lowest selectivity in the MCV
+ * FIXME Maybe not the best place to do this (it runs for all clauses).
+ */
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+
+ /*
+ * If there are no more matches (AND) or no remaining unmatched
+ * items (OR), we can stop processing this clause.
+ */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /*
+ * For AND-lists, we can also mark NULL items as 'no match' (and
+ * then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && item->isnull[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /* skip MCV items that were already ruled out */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* TODO consider bsearch here (list is sorted by values)
+ * TODO handle other operators too (LT, GT)
+ * TODO identify "full match" when the clauses fully
+ * match the whole MCV list (so that checking the
+ * histogram is not needed)
+ */
+ if (oprrest == F_EQSEL)
+ {
+ /*
+ * We don't care about isgt in equality, because it does not
+ * matter whether it's (var = const) or (const = var).
+ */
+ bool match = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ if (match)
+ eqmatches = bms_add_member(eqmatches, idx);
+
+ mismatch = (! match);
+ }
+ else if (oprrest == F_SCALARLTSEL) /* column < constant */
+ {
+
+ if (! isgt) /* (var < const) */
+ {
+ /*
+ * If the constant is below the item's value, the item cannot
+ * match the (var < const) clause.
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ } /* (get_oprrest(expr->opno) == F_SCALARLTSEL) */
+ else /* (const < var) */
+ {
+ /*
+ * If the item's value is below the constant, the item cannot
+ * match the (const < var) clause.
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ item->values[idx],
+ cst->constvalue));
+ }
+ }
+ else if (oprrest == F_SCALARGTSEL) /* column > constant */
+ {
+
+ if (! isgt) /* (var > const) */
+ {
+ /*
+ * If the constant is above the item's value, the item cannot
+ * match the (var > const) clause.
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+ }
+ else /* (const > var) */
+ {
+ /*
+ * If the item's value is above the constant, the item cannot
+ * match the (const > var) clause.
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ item->values[idx],
+ cst->constvalue));
+ }
+
+ } /* (get_oprrest(expr->opno) == F_SCALARGTSEL) */
+
+ /* XXX The conditions on matches[i] are not needed, as we
+ * skip MCV items that can't become true/false, depending
+ * on the current flag. See beginning of the loop over
+ * MCV items.
+ */
+
+ if ((is_or) && (matches[i] == MVSTATS_MATCH_NONE) && (! mismatch))
+ {
+ /* OR - was MATCH_NONE, but will be MATCH_FULL */
+ matches[i] = MVSTATS_MATCH_FULL;
+ ++nmatches;
+ continue;
+ }
+ else if ((! is_or) && (matches[i] == MVSTATS_MATCH_FULL) && mismatch)
+ {
+ /* AND - was MATCH_FULL, but will be MATCH_NONE */
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ continue;
+ }
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME do proper matching of attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = mcvlist->items[i];
+
+ /*
+ * find the lowest selectivity in the MCV
+ * FIXME Maybe not the best place to do this (it runs for all clauses).
+ */
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+
+ /* if there are no more matches, we can stop processing this clause */
+ if (nmatches == 0)
+ break;
+
+ /* skip MCV items that were already ruled out */
+ if (matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /* if the clause mismatches the MCV item, set it as MATCH_NONE */
+ if (((expr->nulltesttype == IS_NULL) && (! mcvlist->items[i]->isnull[idx])) ||
+ ((expr->nulltesttype == IS_NOT_NULL) && (mcvlist->items[i]->isnull[idx])))
+ {
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ }
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each MCV item */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching MCV items */
+ or_nmatches = mcvlist->nitems;
+
+ /* by default none of the MCV items matches the clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the OR-clauses */
+ or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ stakeys, mcvlist,
+ or_nmatches, or_matches,
+ lowsel, fullmatch, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, MIN() semantics is used.
+ * For OR-merge, MAX() is used.
+ *
+ * FIXME this does not update nmatches (the number of matches)
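+ *
+ * E.g. AND-merging an item flagged MVSTATS_MATCH_FULL with
+ * MVSTATS_MATCH_NONE takes the MIN() of the two flags, so the
+ * result is MVSTATS_MATCH_NONE.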
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ {
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+ }
+
+ /*
+ * If all the columns were matched by equality, it's a full match.
+ * In this case there can be at most a single matching MCV item
+ * (two different items cannot both match all the equality clauses).
+ */
+ *fullmatch = (bms_num_members(eqmatches) == mcvlist->ndimensions);
+
+ /* free the allocated pieces */
+ if (eqmatches)
+ pfree(eqmatches);
+
+ return nmatches;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 1cf64f8..c196ca0 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -406,7 +406,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built)
+ if (mvstat->deps_built || mvstat->mcv_built)
{
info = makeNode(MVStatisticInfo);
@@ -415,9 +415,11 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
+ info->mcv_enabled = mvstat->mcv_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
+ info->mcv_built = mvstat->mcv_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 099f1ed..3c0aff4 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o
+OBJS = common.o mcv.o dependencies.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index bd200bc..d1da714 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -16,12 +16,14 @@
#include "common.h"
+#include "utils/array.h"
+
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
- int natts, VacAttrStats **vacattrstats);
+ int natts,
+ VacAttrStats **vacattrstats);
static List* list_mv_stats(Oid relid);
-
/*
* Compute requested multivariate stats, using the rows sampled for the
* plain (single-column) stats.
@@ -49,6 +51,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int j;
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
+ MCVList mcvlist = NULL;
+ int numrows_filtered = 0;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -87,8 +91,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ /* build the MCV list */
+ if (stat->mcv_enabled)
+ mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, attrs);
+ update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
}
}
@@ -166,6 +174,8 @@ list_mv_stats(Oid relid)
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
+ info->mcv_enabled = stats->mcv_enabled;
+ info->mcv_built = stats->mcv_built;
result = lappend(result, info);
}
@@ -180,8 +190,56 @@ list_mv_stats(Oid relid)
return result;
}
+
+/*
+ * Find attnums of MV stats using the mvoid.
+ */
+int2vector*
+find_mv_attnums(Oid mvoid, Oid *relid)
+{
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ HeapTuple htup;
+ int2vector *keys;
+
+ /* Fetch the pg_mv_statistic tuple for the given mvoid. */
+ htup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+ /* starelid */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_starelid, &isnull);
+ Assert(!isnull);
+
+ *relid = DatumGetObjectId(adatum);
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ keys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+ ReleaseSysCache(htup);
+
+ /* TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which was used as inspiration for this function)? */
+
+ return keys;
+}
+
+
void
-update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+update_mv_stats(Oid mvoid,
+ MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -206,18 +264,29 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
= PointerGetDatum(serialize_mv_dependencies(dependencies));
}
+ if (mcvlist != NULL)
+ {
+ bytea * data = serialize_mv_mcvlist(mcvlist, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stamcv -1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+ replaces[Anum_pg_mv_statistic_stamcv -1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
@@ -246,6 +315,21 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
heap_close(sd, RowExclusiveLock);
}
+
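+/*
+ * Translate the attribute number to an index (dimension) within
+ * the stats. The stakeys vector is sorted, so the dimension is
+ * simply the number of keys smaller than the given attnum.
+ */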
+int
+mv_get_index(AttrNumber varattno, int2vector * stakeys)
+{
+ int i, idx = 0;
+ for (i = 0; i < stakeys->dim1; i++)
+ {
+ if (stakeys->values[i] < varattno)
+ idx += 1;
+ else
+ break;
+ }
+ return idx;
+}
+
/* multi-variate stats comparator */
/*
@@ -256,11 +340,15 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
int
compare_scalars_simple(const void *a, const void *b, void *arg)
{
- Datum da = *(Datum*)a;
- Datum db = *(Datum*)b;
- SortSupport ssup= (SortSupport) arg;
+ return compare_datums_simple(*(Datum*)a,
+ *(Datum*)b,
+ (SortSupport)arg);
+}
- return ApplySortComparator(da, false, db, false, ssup);
+int
+compare_datums_simple(Datum a, Datum b, SortSupport ssup)
+{
+ return ApplySortComparator(a, false, b, false, ssup);
}
/*
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
index 6d5465b..f4309f7 100644
--- a/src/backend/utils/mvstats/common.h
+++ b/src/backend/utils/mvstats/common.h
@@ -46,7 +46,15 @@ typedef struct
Datum value; /* a data value */
int tupno; /* position index for tuple it came from */
} ScalarItem;
-
+
+/* (de)serialization info */
+typedef struct DimensionInfo {
+ int nvalues; /* number of deduplicated values */
+ int nbytes; /* number of bytes (serialized) */
+ int typlen; /* pg_type.typlen */
+ bool typbyval; /* pg_type.typbyval */
+} DimensionInfo;
+
/* multi-sort */
typedef struct MultiSortSupportData {
int ndims; /* number of dimensions supported by the */
@@ -71,5 +79,6 @@ int multi_sort_compare_dim(int dim, const SortItem *a,
const SortItem *b, MultiSortSupport mss);
/* comparators, used when constructing multivariate stats */
+int compare_datums_simple(Datum a, Datum b, SortSupport ssup);
int compare_scalars_simple(const void *a, const void *b, void *arg);
int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/mcv.c b/src/backend/utils/mvstats/mcv.c
new file mode 100644
index 0000000..96bdf41
--- /dev/null
+++ b/src/backend/utils/mvstats/mcv.c
@@ -0,0 +1,1232 @@
+/*-------------------------------------------------------------------------
+ *
+ * mcv.c
+ * POSTGRES multivariate MCV lists
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mcv.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+
+/*
+ * Multivariate MCVs (most-common values lists) are a straightforward
+ * extension of the regular (single-column) MCV list - they track
+ * combinations of values for several attributes (columns), including
+ * NULL flags, and the frequency of each combination.
+ *
+ * For columns with a small number of distinct values, this works quite
+ * well and may represent the distribution pretty exactly. For columns
+ * with a large number of distinct values (e.g. stored as FLOAT), this
+ * does not work that well.
+ *
+ * If we can represent the distribution as a MCV list, we can estimate
+ * some clauses (e.g. equality clauses) much more accurately than using
+ * histograms, for example.
+ *
+ * Discrete distributions are also easier to combine into a larger
+ * distribution (but this is not yet implemented).
+ *
+ *
+ * TODO For types that don't reasonably support ordering (either because
+ * the type does not support that or when the user adds some option
+ * to the ADD STATISTICS command - e.g. UNSORTED_STATS), building
+ * the histogram may be pointless and inefficient. This is esp.
+ * true for varlena types that may be quite large and a large MCV
+ * list may be a better choice, because it makes equality estimates
+ * more accurate. Due to the unsorted nature, range queries on those
+ * attributes are rather useless anyway.
+ *
+ * Another thing is that by restricting to MCV list and equality
+ * conditions, we can use hash values instead of long varlena values.
+ * The equality estimation will be very accurate.
+ *
+ * This however complicates matching the columns to available
+ * statistics, as it will require matching clauses (not columns) to
+ * stats. And it may get quite complex - e.g. what if there are
+ * multiple clauses, each compatible with different stats subset?
+ *
+ *
+ * Selectivity estimation
+ * ----------------------
+ * The estimation, implemented in clauselist_mv_selectivity_mcvlist(),
+ * is quite simple in principle - walk through the MCV items and sum
+ * frequencies of all the items that match all the clauses.
+ *
+ * The current implementation uses MCV lists to estimate these types
+ * of clauses (think of WHERE conditions):
+ *
+ * (a) equality clauses WHERE (a = 1) AND (b = 2)
+ *
+ * (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ *
+ * It's possible to add more clauses, for example:
+ *
+ * (a) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ *
+ * (b) multi-var clauses WHERE (a > b)
+ *
+ * and so on. These are tasks for the future, not yet implemented.
+ *
+ *
+ * Estimating equality clauses
+ * ---------------------------
+ * When computing selectivity estimate for equality clauses
+ *
+ * (a = 1) AND (b = 2)
+ *
+ * we can do this estimate pretty exactly assuming that two conditions
+ * are met:
+ *
+ * (1) there's an equality condition on each attribute
+ *
+ * (2) we find a matching item in the MCV list
+ *
+ * In that case we know the MCV item represents all the tuples matching
+ * the clauses, and the selectivity estimate is complete. This is what
+ * we call 'full match'.
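+ *
+ * For example (hypothetical numbers): with stats on (a,b) and
+ * WHERE (a = 1) AND (b = 2), finding the MCV item (1,2) with
+ * frequency 0.04 gives a selectivity of exactly 0.04, as no rows
+ * outside that item can satisfy both equalities.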
+ *
+ * When only (1) holds, but there's no matching MCV item, we don't know
+ * whether there are no such rows at all, or whether they are just not
+ * frequent enough to make the list. We can however use the frequency
+ * of the least frequent MCV item as an upper
+ * bound for the selectivity.
+ *
+ * If the equality conditions match only a subset of the attributes
+ * the MCV list is built on, we can't get a full match - we may get
+ * multiple MCV items matching the clauses, and even if we get a single
+ * match there may be rows that did not get into the MCV list. But in
+ * this case we can still use the frequency of the least frequent MCV
+ * item to clamp the 'additional' selectivity not accounted for by the
+ * matching items.
+ *
+ * If there's no histogram, because the MCV list approximates the
+ * distribution accurately (not because the histogram was disabled),
+ * it does not really matter whether there are equality conditions on
+ * all the columns - we can do pretty accurate estimation using the MCV.
+ *
+ * TODO For a combination of equality conditions (not full-match case)
+ * we probably can clamp the selectivity by the minimum of
+ * selectivities for each condition. For example if we know the
+ * number of distinct values for each column, we can use 1/ndistinct
+ * as a per-column estimate. Or rather 1/ndistinct + selectivity
+ * derived from the MCV list.
+ *
+ * If we know the estimate of number of combinations of the columns
+ * (i.e. ndistinct(A,B)), we may estimate the average frequency of
+ * items in the remaining 10% as [10% / ndistinct(A,B)].
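+ *
+ * For example (hypothetical numbers): if the MCV list covers 90%
+ * of the table and we expect ndistinct(A,B) = 200 combinations in
+ * the remaining part, the average frequency of those items is
+ * 0.10 / 200 = 0.0005.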
+ *
+ *
+ * Bounding estimates
+ * ------------------
+ * In general the MCV lists may not provide estimates as accurate as
+ * for the full-match equality case, but may provide some useful
+ * lower/upper boundaries for the estimation error.
+ *
+ * With equality clauses we can do a few more tricks to narrow this
+ * error range (see the previous section and TODO), but with inequality
+ * clauses (or generally non-equality clauses), it's rather difficult.
+ * There's nothing like a 'full match' - we have to consider both the
+ * MCV items and the remaining part every time. We can't use the minimum
+ * selectivity of MCV items, as the clauses may match multiple items.
+ *
+ * For example with a MCV list on columns (A, B), covering 90% of the
+ * table (computed while building the MCV list), about ~10% of the table
+ * is not represented by the MCV list. So even if the conditions match
+ * all the remaining rows (not represented by the MCV items), we can't
+ * get selectivity higher than those 10%. We may use 1/2 the remaining
+ * selectivity as an estimate (minimizing average error).
+ *
+ * TODO Most of these ideas (error limiting) are not yet implemented.
+ *
+ *
+ * General TODO
+ * ------------
+ *
+ * FIXME Use max_mcv_items from ALTER TABLE ADD STATISTICS command.
+ *
+ * TODO Add support for IS [NOT] NULL clauses, and clauses referencing
+ * multiple columns (a < b).
+ *
+ * TODO It's possible to build a special case of MCV list, storing not
+ * the actual values but only 32/64-bit hash. This is only useful
+ * for estimating equality clauses and for large varlena types,
+ * which are very impractical for plain MCV list because of size.
+ * But for those data types we really want just the equality
+ * clauses, so it's actually a good solution.
+ *
+ * TODO Currently there's no logic to consider building only a MCV list
+ * (and not building the histogram at all), except for doing this
+ * decision manually in ADD STATISTICS.
+ */
+
+/*
+ * Each serialized item needs to store (in this order):
+ *
+ * - indexes (ndim * sizeof(uint16))
+ * - null flags (ndim * sizeof(bool))
+ * - frequency (sizeof(double))
+ *
+ * So in total:
+ *
+ * ndim * (sizeof(uint16) + sizeof(bool)) + sizeof(double)
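+ *
+ * For example, for ndims = 2 that is 2 * (2 + 1) + 8 = 14 bytes.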
+ */
+#define ITEM_SIZE(ndims) \
+ (ndims * (sizeof(uint16) + sizeof(bool)) + sizeof(double))
+
+/* pointers into a flat serialized item of ITEM_SIZE(n) bytes */
+#define ITEM_INDEXES(item) ((uint16*)item)
+#define ITEM_NULLS(item,ndims) ((bool*)(ITEM_INDEXES(item) + ndims))
+#define ITEM_FREQUENCY(item,ndims) ((double*)(ITEM_NULLS(item,ndims) + ndims))
+
+/*
+ * Builds MCV list from sample rows, and removes rows represented by
+ * the MCV list from the sample (the number of remaining sample rows is
+ * returned by the numrows_filtered parameter).
+ *
+ * The method is quite simple - in short, it performs these steps:
+ *
+ * (1) sort the data (default collation, '<' for the data type)
+ *
+ * (2) count distinct groups, decide how many to keep
+ *
+ * (3) build the MCV list using the threshold determined in (2)
+ *
+ * (4) remove rows represented by the MCV from the sample
+ *
+ * For more details, see the comments in the code.
+ *
+ * FIXME Single-dimensional MCV is sorted by frequency (descending). We
+ * should do that too, because when walking through the list we
+ * want to check the most frequent items first.
+ *
+ * TODO We're using Datum (8B), even for smaller data types (e.g.
+ * int4 or float4). Maybe we could save some space here, but the bytea
+ * compression should handle it just fine.
+ *
+ * TODO This probably should not use the ndistinct computed from
+ * the sample directly, but rather an estimate of the number of
+ * distinct values in the table, no?
+ */
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ int ndistinct = 0;
+ int mcv_threshold = 0;
+ int count = 0;
+ int nitems = 0;
+
+ MCVList mcvlist = NULL;
+
+ /* Sort by multiple columns (using array of SortSupport) */
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * Preallocate space for all the items as a single chunk, and point
+ * the items to the appropriate parts of the array.
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * numattrs);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * numattrs);
+
+ /* keep all the rows by default (as if there was no MCV list) */
+ *numrows_filtered = numrows;
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* load the values/null flags from sample rows */
+ for (j = 0; j < numrows; j++)
+ for (i = 0; i < numattrs; i++)
+ items[j].values[i] = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+
+ /* prepare the sort functions for all the attributes */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* do the sort, using the multi-sort */
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Count the number of distinct groups - just walk through the
+ * sorted list and count the number of key changes. We use this to
+ * determine the threshold (125% of the average frequency).
+ */
+ ndistinct = 1;
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ ndistinct += 1;
+
+ /*
+ * Determine how many groups actually exceed the threshold, and then
+ * walk the array again and collect them into an array. We'll always
+ * require at least 4 rows per group.
+ *
+ * But if we can fit all the distinct values in the MCV list (i.e.
+ * if there are fewer distinct groups than MVSTAT_MCVLIST_MAX_ITEMS),
+ * we'll require only 2 rows per group.
+ *
+ * TODO For now the threshold is the same as in the single-column
+ * case (average + 25%), but maybe that's worth revisiting
+ * for the multivariate case.
+ *
+ * TODO We can do this only if we believe we got all the distinct
+ * values of the table.
+ *
+ * FIXME This should really reference mcv_max_items (from catalog)
+ * instead of the constant MVSTAT_MCVLIST_MAX_ITEMS.
+ */
+ mcv_threshold = 1.25 * numrows / ndistinct;
+ mcv_threshold = (mcv_threshold < 4) ? 4 : mcv_threshold;
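+
+ /*
+ * For illustration (hypothetical numbers): with 30000 sampled
+ * rows and 1000 distinct groups the average group size is 30
+ * rows, so the threshold works out to 1.25 * 30 = 37 rows.
+ */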
+
+ if (ndistinct <= MVSTAT_MCVLIST_MAX_ITEMS)
+ mcv_threshold = 2;
+
+ /*
+ * Walk through the sorted data again, and see how many groups
+ * reach the mcv_threshold (and become an item in the MCV list).
+ */
+ count = 1;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or new group, so check if we exceed mcv_threshold */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* group hits the threshold, count the group as MCV item */
+ if (count >= mcv_threshold)
+ nitems += 1;
+
+ count = 1;
+ }
+ else /* within the same group, so increase the counter */
+ count += 1;
+ }
+
+ /* we know the number of MCV list items, so let's build the list */
+ if (nitems > 0)
+ {
+ /* allocate the MCV list structure, set parameters we know */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ mcvlist->magic = MVSTAT_MCV_MAGIC;
+ mcvlist->type = MVSTAT_MCV_TYPE_BASIC;
+ mcvlist->ndimensions = numattrs;
+ mcvlist->nitems = nitems;
+
+ /*
+ * Preallocate Datum/isnull arrays - not as a single chunk, as
+ * we'll pass this outside this method and thus it needs to be
+ * easy to pfree() the data (and we wouldn't know where the
+ * arrays start).
+ *
+ * TODO Maybe the reasoning that we can't allocate a single
+ * piece because we're passing it out is bogus? Who'd
+ * free a single item of the MCV list, anyway?
+ *
+ * TODO Maybe with a proper encoding (stuffing all the values
+ * into a list-level array), this will be untrue?
+ */
+ mcvlist->items = (MCVItem*)palloc0(sizeof(MCVItem)*nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ mcvlist->items[i] = (MCVItem)palloc0(sizeof(MCVItemData));
+ mcvlist->items[i]->values = (Datum*)palloc0(sizeof(Datum)*numattrs);
+ mcvlist->items[i]->isnull = (bool*)palloc0(sizeof(bool)*numattrs);
+ }
+
+ /*
+ * Repeat the same loop as above, but this time copy the data
+ * into the MCV list (for items exceeding the threshold).
+ *
+ * TODO Maybe we could simply remember indexes of the last item
+ * in each group (from the previous loop)?
+ */
+ count = 1;
+ nitems = 0;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or a new group */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* count the MCV item if exceeding the threshold (and copy into the array) */
+ if (count >= mcv_threshold)
+ {
+ /* just a pointer to the proper place in the list */
+ MCVItem item = mcvlist->items[nitems];
+
+ /* copy values from the _previous_ group (its last item) */
+ memcpy(item->values, items[(i-1)].values, sizeof(Datum) * numattrs);
+ memcpy(item->isnull, items[(i-1)].isnull, sizeof(bool) * numattrs);
+
+ /* and finally the group frequency */
+ item->frequency = (double)count / numrows;
+
+ /* next item */
+ nitems += 1;
+ }
+
+ count = 1;
+ }
+ else /* same group, just increase the counter */
+ count += 1;
+ }
+
+ /* make sure the loops are consistent */
+ Assert(nitems == mcvlist->nitems);
+
+ /*
+ * Remove the rows matching the MCV list (i.e. keep only rows
+ * that are not represented by the MCV list).
+ *
+ * FIXME This implementation is rather naive, effectively O(N^2).
+ * As the MCV list grows, the check will take longer and
+ * longer. And as the number of sampled rows increases (by
+ * increasing statistics target), it will take longer and
+ * longer. One option is to sort the MCV items first and
+ * then perform a binary search.
+ *
+ * A better option would be keeping the ID of the row in
+ * the sort item, and then just walk through the items and
+ * mark rows to remove (in a bitmap of the same size).
+ * There's no space for that in SortItem at this moment,
+ * but it's trivial to add a 'private' pointer, or just
+ * use another structure with an extra field (starting with
+ * SortItem, so that the comparators etc. still work).
+ *
+ * Another option is to use the sorted array of items
+ * (because that's how we sorted the source data), and
+ * simply do a bsearch() into it. If we find a matching
+ * item, the row belongs to the MCV list.
+ */
+ if (nitems == ndistinct) /* all rows are covered by MCV items */
+ *numrows_filtered = 0;
+ else /* (nitems < ndistinct) && (nitems > 0) */
+ {
+ int nfiltered = 0;
+ HeapTuple *rows_filtered = (HeapTuple*)palloc0(sizeof(HeapTuple) * numrows);
+
+ /* used for the searches */
+ SortItem item, mcvitem;
+
+ item.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ item.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /*
+ * FIXME we don't need to allocate this, we can reference
+ * the MCV item directly ...
+ */
+ mcvitem.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ mcvitem.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* walk through the tuples, compare the values to MCV items */
+ for (i = 0; i < numrows; i++)
+ {
+ bool match = false;
+
+ /* collect the key values from the row */
+ for (j = 0; j < numattrs; j++)
+ item.values[j] = heap_getattr(rows[i], attrs->values[j],
+ stats[j]->tupDesc, &item.isnull[j]);
+
+ /* scan through the MCV list for matches */
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ /*
+ * TODO Create a SortItem/MCVItem comparator so that
+ * we don't need to do memcpy() like crazy.
+ */
+ memcpy(mcvitem.values, mcvlist->items[j]->values,
+ numattrs * sizeof(Datum));
+ memcpy(mcvitem.isnull, mcvlist->items[j]->isnull,
+ numattrs * sizeof(bool));
+
+ if (multi_sort_compare(&item, &mcvitem, mss) == 0)
+ {
+ match = true;
+ break;
+ }
+ }
+
+ /* if no match in the MCV list, copy the row into the filtered ones */
+ if (! match)
+ memcpy(&rows_filtered[nfiltered++], &rows[i], sizeof(HeapTuple));
+ }
+
+ /* replace the rows and remember how many rows we kept */
+ memcpy(rows, rows_filtered, sizeof(HeapTuple) * nfiltered);
+ *numrows_filtered = nfiltered;
+
+ /* free all the data used here */
+ pfree(rows_filtered);
+ pfree(item.values);
+ pfree(item.isnull);
+ pfree(mcvitem.values);
+ pfree(mcvitem.isnull);
+ }
+ }
+
+ pfree(values);
+ pfree(items);
+ pfree(isnull);
+
+ return mcvlist;
+}
+
+
+/* fetch the MCV list (as a bytea) from the pg_mv_statistic catalog */
+MCVList
+load_mv_mcvlist(Oid mvoid)
+{
+ bool isnull = false;
+ Datum mcvlist;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* Fetch the pg_mv_statistic tuple for the given mvoid. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->mcv_enabled && mvstat->mcv_built);
+#endif
+
+ mcvlist = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stamcv, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_mcvlist(DatumGetByteaP(mcvlist));
+}
+
+/* print some basic info about the MCV list
+ *
+ * TODO Add info about what part of the table this covers.
+ */
+Datum
+pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MCVList mcvlist = deserialize_mv_mcvlist(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nitems=%d", mcvlist->nitems);
+
+ pfree(mcvlist);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* used to pass context into bsearch() */
+static SortSupport ssup_private = NULL;
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Serialize MCV list into a bytea value. The basic algorithm is simple:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ * (a) collect all (non-NULL) attribute values from all MCV items
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all MCV list items
+ * (a) replace values with indexes into the arrays
+ *
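+ * For illustration (hypothetical values): an MCV item (10, 'foo')
+ * may get serialized as indexes (0, 3), i.e. the 0th deduplicated
+ * value of the first dimension and the 3rd one of the second.
+ *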
+ * Each attribute has to be processed separately, because we're mixing
+ * different datatypes, and we don't know what equality means for them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ * We use 16-bit values for the indexes in step (3), which is enough
+ * as we don't allow more than 8k items in the MCV list
+ * (max_mcv_items). Most of the high bytes will be 0x00 anyway, so
+ * the varlena compression should handle the rest nicely.
+ *
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
+bytea *
+serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i, j;
+ int ndims = mcvlist->ndimensions;
+ int itemsize = ITEM_SIZE(ndims);
+
+ Size total_length = 0;
+
+ char *item = palloc0(itemsize);
+
+ /* serialized items (indexes into arrays, etc.) */
+ bytea *output;
+ char *data = NULL;
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /* allocate space for all values, including NULLs (won't use them) */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * mcvlist->nitems);
+
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ if (! mcvlist->items[j]->isnull[i]) /* skip NULL values */
+ {
+ values[i][counts[i]] = mcvlist->items[j]->values[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* do not exceed UINT16_MAX */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typbyval || (info[i].typlen > 0))
+ /* passed by value, or by reference with fixed length */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so strlen + 1 byte for the \0 terminator */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j])) + 1;
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * MCV list, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nitems (4B)
+ * - info (ndim * sizeof(DimensionInfo)
+ * - arrays of values for each dimension
+ * - serialized items (nitems * itemsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data.
+ */
+ total_length = (sizeof(int32) + offsetof(MCVListData, items)
+ + ndims * sizeof(DimensionInfo)
+ + mcvlist->nitems * itemsize);
+
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 1MB */
+ if (total_length > 1024 * 1024)
+ elog(ERROR, "serialized MCV exceeds 1MB (%ld)", total_length);
+
+ /* allocate space for the serialized MCV list, set header fields */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write to */
+ data = VARDATA(output);
+
+ memcpy(data, mcvlist, offsetof(MCVListData, items));
+ data += offsetof(MCVListData, items);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - itemsize);
+
+ /* reset the values for each item */
+ memset(item, 0, itemsize);
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! mcvlist->items[i]->isnull[j])
+ {
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ v = (Datum*)bsearch(&mcvlist->items[i]->values[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ ITEM_INDEXES(item)[j] = (v - values[j]);
+
+ /* check the index is within expected bounds */
+ Assert(ITEM_INDEXES(item)[j] >= 0);
+ Assert(ITEM_INDEXES(item)[j] < info[j].nvalues);
+ }
+ }
+
+ /* copy NULL and frequency flags into the item */
+ memcpy(ITEM_NULLS(item, ndims),
+ mcvlist->items[i]->isnull, sizeof(bool) * ndims);
+ memcpy(ITEM_FREQUENCY(item, ndims),
+ &mcvlist->items[i]->frequency, sizeof(double));
+
+ /* copy the item into the array */
+ memcpy(data, item, itemsize);
+
+ data += itemsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ return output;
+}
+
+/* inverse to serialize_mv_mcvlist() - see the comment there */
+MCVList
+deserialize_mv_mcvlist(bytea *data)
+{
+ int i, j;
+ Size expected_size;
+ MCVList mcvlist;
+ char *tmp;
+
+ int ndims, nitems, itemsize;
+ DimensionInfo *info = NULL;
+
+ uint16 *indexes = NULL;
+ Datum **values = NULL;
+
+ /* local allocation buffer (used only for deserialization) */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ /* buffer used for the result */
+ int rbufflen;
+ char *rbuff;
+ char *rptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MCVListData,items))
+ elog(ERROR, "invalid MCV Size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MCVListData,items));
+
+ /* read the MCV list header */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(mcvlist, tmp, offsetof(MCVListData,items));
+ tmp += offsetof(MCVListData,items);
+
+ if (mcvlist->magic != MVSTAT_MCV_MAGIC)
+ elog(ERROR, "invalid MCV magic %d (expected %dd)",
+ mcvlist->magic, MVSTAT_MCV_MAGIC);
+
+ if (mcvlist->type != MVSTAT_MCV_TYPE_BASIC)
+ elog(ERROR, "invalid MCV type %d (expected %dd)",
+ mcvlist->type, MVSTAT_MCV_TYPE_BASIC);
+
+ nitems = mcvlist->nitems;
+ ndims = mcvlist->ndimensions;
+ itemsize = ITEM_SIZE(ndims);
+
+ Assert(nitems > 0);
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * What size do we expect with those parameters? It's incomplete,
+ * as we have yet to add the array sizes (from the DimensionInfo
+ * records).
+ */
+ expected_size = offsetof(MCVListData,items) +
+ ndims * sizeof(DimensionInfo) +
+ (nitems * itemsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /*
+ * We'll allocate one large chunk of memory for the intermediate
+ * data, needed only for deserializing the MCV list, and we'll use
+ * a local dense allocation to minimize the palloc overhead.
+ *
+ * Let's see how much space we'll actually need, and also include
+ * space for the array with pointers.
+ */
+ bufflen = sizeof(Datum*) * ndims; /* space for pointers */
+
+ for (i = 0; i < ndims; i++)
+ /* for full-size byval types, we reuse the serialized value */
+ if (! (info[i].typbyval && info[i].typlen == sizeof(Datum)))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+
+ buff = palloc(bufflen);
+ ptr = buff;
+
+ values = (Datum**)buff;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mcv_list(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. This needs to
+ * copy the pieces.
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum - simply reuse the array */
+ if (info[i].typlen == sizeof(Datum))
+ {
+ values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value into the array */
+ memcpy(&values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ }
+ else
+ {
+ /* all the varlena data need a chunk from the buffer */
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ /* passed by reference, but fixed length (name, tid, ...) */
+ if (info[i].typlen > 0)
+ {
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ /* we should exhaust the buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ /* allocate space for the MCV items in a single piece */
+ rbufflen = (sizeof(MCVItem) + sizeof(MCVItemData) +
+ sizeof(Datum)*ndims + sizeof(bool)*ndims) * nitems;
+
+ rbuff = palloc(rbufflen);
+ rptr = rbuff;
+
+ mcvlist->items = (MCVItem*)rbuff;
+ rptr += (sizeof(MCVItem) * nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ MCVItem item = (MCVItem)rptr;
+ rptr += (sizeof(MCVItemData));
+
+ item->values = (Datum*)rptr;
+ rptr += (sizeof(Datum)*ndims);
+
+ item->isnull = (bool*)rptr;
+ rptr += (sizeof(bool) *ndims);
+
+ /* just point to the right place */
+ indexes = ITEM_INDEXES(tmp);
+
+ memcpy(item->isnull, ITEM_NULLS(tmp, ndims), sizeof(bool) * ndims);
+ memcpy(&item->frequency, ITEM_FREQUENCY(tmp, ndims), sizeof(double));
+
+#ifdef USE_ASSERT_CHECKING
+ for (j = 0; j < ndims; j++)
+ Assert(indexes[j] <= UINT16_MAX);
+#endif
+
+ /* translate the values */
+ for (j = 0; j < ndims; j++)
+ if (! item->isnull[j])
+ item->values[j] = values[j][indexes[j]];
+
+ mcvlist->items[i] = item;
+
+ tmp += ITEM_SIZE(ndims);
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+ }
+
+ /* check that we processed all the data */
+ Assert(tmp == (char*)data + VARSIZE_ANY(data));
+
+ /* release the temporary buffer */
+ pfree(buff);
+
+ return mcvlist;
+}
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
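+
+/*
+ * A usage sketch: callers are expected to set ssup_private before
+ * searching and reset it afterwards, e.g.:
+ *
+ *     ssup_private = ssup;
+ *     match = bsearch(&value, values, nvalues, sizeof(Datum),
+ *                     bsearch_comparator);
+ *     ssup_private = NULL;
+ */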
+/*
+ * SRF with details about the items of an MCV list:
+ *
+ * - item ID (0...nitems)
+ * - values (string array)
+ * - nulls only (boolean array)
+ * - frequency (double precision)
+ *
+ * The input is the OID of the statistics, and no rows are returned
+ * if the statistics contains no MCV list.
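+ *
+ * Example (a sketch, picking an arbitrary statistics OID):
+ *
+ *     SELECT * FROM pg_mv_mcv_items(
+ *         (SELECT oid FROM pg_mv_statistic LIMIT 1));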
+ */
+PG_FUNCTION_INFO_V1(pg_mv_mcv_items);
+
+Datum
+pg_mv_mcv_items(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MCVList mcvlist;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ mcvlist = load_mv_mcvlist(PG_GETARG_OID(0));
+
+ funcctx->user_fctx = mcvlist;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = mcvlist->nitems;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples
+ * from raw C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MCVList mcvlist;
+ MCVItem item;
+
+ mcvlist = (MCVList)funcctx->user_fctx;
+
+ Assert(call_cntr < mcvlist->nitems);
+
+ item = mcvlist->items[call_cntr];
+
+ stakeys = find_mv_attnums(PG_GETARG_OID(0), &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple.
+ * This should be an array of C strings which will
+ * be processed later by the type input functions.
+ */
+ values = (char **) palloc(4 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+
+ /* frequency */
+ values[3] = (char *) palloc(64 * sizeof(char));
+
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * mcvlist->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * mcvlist->ndimensions);
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* item ID */
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ Datum val, valout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == mcvlist->ndimensions-1)
+ format = "%s, %s}";
+
+ val = item->values[i];
+ valout = FunctionCall1(&fmgrinfo[i], val);
+
+ snprintf(buff, 1024, format, values[1], DatumGetPointer(valout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], item->isnull[i] ? "t" : "f");
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[3], 64, "%f", item->frequency); /* frequency */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 0b3518c..448cf35 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2101,8 +2101,8 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, stakeys,\n"
- " deps_enabled,\n"
- " deps_built,\n"
+ " deps_enabled, mcv_enabled,\n"
+ " deps_built, mcv_built,\n"
" mcv_max_items, hist_max_buckets,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
@@ -2121,14 +2121,28 @@ describeOneTableDetails(const char *schemaname,
printTableAddFooter(&cont, _("Statistics:"));
for (i = 0; i < tuples; i++)
{
+ bool first = true;
+
printfPQExpBuffer(&buf, " ");
/* options */
if (!strcmp(PQgetvalue(result, i, 2), "t"))
- appendPQExpBuffer(&buf, "(dependencies)");
+ {
+ appendPQExpBuffer(&buf, "(dependencies");
+ first = false;
+ }
+
+ if (!strcmp(PQgetvalue(result, i, 3), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", mcv");
+ else
+ appendPQExpBuffer(&buf, "(mcv");
+ first = false;
+ }
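+
+ /* e.g. the footer now reads: "(dependencies, mcv) ON (a, b, c)" */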
- appendPQExpBuffer(&buf, " ON (%s)",
- PQgetvalue(result, i, 6));
+ appendPQExpBuffer(&buf, ") ON (%s)",
+ PQgetvalue(result, i, 8));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index 81ec23b..c6e7d74 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -35,15 +35,21 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
+ bool mcv_enabled; /* build MCV list? */
+
+ /* MCV size */
+ int32 mcv_max_items; /* max MCV items */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
+ bool mcv_built; /* MCV list was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
+ bytea stamcv; /* MCV list (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -59,11 +65,15 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_attrdef
* ----------------
*/
-#define Natts_pg_mv_statistic 5
+#define Natts_pg_mv_statistic 9
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_deps_enabled 2
-#define Anum_pg_mv_statistic_deps_built 3
-#define Anum_pg_mv_statistic_stakeys 4
-#define Anum_pg_mv_statistic_stadeps 5
+#define Anum_pg_mv_statistic_mcv_enabled 3
+#define Anum_pg_mv_statistic_mcv_max_items 4
+#define Anum_pg_mv_statistic_deps_built 5
+#define Anum_pg_mv_statistic_mcv_built 6
+#define Anum_pg_mv_statistic_stakeys 7
+#define Anum_pg_mv_statistic_stadeps 8
+#define Anum_pg_mv_statistic_stamcv 9
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 2178f6c..0d12dd3 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2728,6 +2728,10 @@ DATA(insert OID = 3377 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0
DESCR("multivariate stats: functional dependencies info");
DATA(insert OID = 3378 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
DESCR("multivariate stats: functional dependencies show");
+DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_mcvlist_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: MCV list info");
+DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
+DESCR("details about MCV list items");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index f6c4932..6fab94a 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -564,9 +564,11 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
+ bool mcv_enabled; /* MCV list enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
+ bool mcv_built; /* MCV list built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 02a7dda..b028192 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -50,30 +50,89 @@ typedef MVDependenciesData* MVDependencies;
#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
/*
+ * Multivariate MCV (most-common value) lists
+ *
+ * A straightforward extension of MCV items - i.e. a list (array) of
+ * combinations of attribute values, together with a frequency and
+ * null flags.
+ */
+typedef struct MCVItemData {
+ double frequency; /* frequency of this combination */
+ bool *isnull; /* lags of NULL values (up to 32 columns) */
+ Datum *values; /* variable-length (ndimensions) */
+} MCVItemData;
+
+typedef MCVItemData *MCVItem;
+
+/* multivariate MCV list - essentially an array of MCV items */
+typedef struct MCVListData {
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of MCV list (BASIC) */
+ uint32 ndimensions; /* number of dimensions */
+ uint32 nitems; /* number of MCV items in the array */
+ MCVItem *items; /* array of MCV items */
+} MCVListData;
+
+typedef MCVListData *MCVList;
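+
+/*
+ * A minimal sketch of walking a deserialized MCV list (using only
+ * the fields defined above):
+ *
+ *     MCVList mcvlist = deserialize_mv_mcvlist(data);
+ *
+ *     for (i = 0; i < mcvlist->nitems; i++)
+ *     {
+ *         MCVItem item = mcvlist->items[i];
+ *
+ *         ... inspect item->values[j] / item->isnull[j] for each
+ *         dimension (j < ndimensions), and item->frequency ...
+ *     }
+ */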
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_MCV_MAGIC 0xE1A651C2 /* marks serialized bytea */
+#define MVSTAT_MCV_TYPE_BASIC 1 /* basic MCV list type */
+
+/*
+ * Limits used for the max_mcv_items option, i.e. we're always
+ * guaranteed to have space for at least MVSTAT_MCVLIST_MIN_ITEMS
+ * items, and we cannot have more than MVSTAT_MCVLIST_MAX_ITEMS items.
+ *
+ * This is just a boundary for the 'max' threshold - the actual list
+ * may of course contain fewer items than MVSTAT_MCVLIST_MIN_ITEMS.
+ */
+#define MVSTAT_MCVLIST_MIN_ITEMS 128 /* min items in MCV list */
+#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
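+
+/*
+ * For example (a sketch using the ALTER TABLE syntax from this patch):
+ *
+ *     ALTER TABLE t ADD STATISTICS (mcv, max_mcv_items 1000) ON (a, b);
+ *
+ * is accepted, while values outside [128, 8192] raise an error.
+ */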
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
*/
MVDependencies load_mv_dependencies(Oid mvoid);
+MCVList load_mv_mcvlist(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
+bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
+MCVList deserialize_mv_mcvlist(bytea * data);
+
+/*
+ * Returns index of the attribute number within the vector (i.e. a
+ * dimension within the stats).
+ */
+int mv_get_index(AttrNumber varattno, int2vector * stakeys);
+
+int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
/* FIXME this probably belongs somewhere else (not to operations stats) */
extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_mcv_items(PG_FUNCTION_ARGS);
MVDependencies
-build_mv_dependencies(int numrows, HeapTuple *rows,
- int2vector *attrs,
- VacAttrStats **stats);
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats);
+
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered);
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
- int natts, VacAttrStats **vacattrstats);
+ int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, int2vector *attrs);
+void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats);
#endif
diff --git a/src/test/regress/expected/mv_mcv.out b/src/test/regress/expected/mv_mcv.out
new file mode 100644
index 0000000..85e8499
--- /dev/null
+++ b/src/test/regress/expected/mv_mcv.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (unknown_column);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, a);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, a, b);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+ALTER TABLE mcv_list ADD STATISTICS (unknown_option) ON (a, b, c);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing MCV statistics
+ALTER TABLE mcv_list ADD STATISTICS (dependencies, max_mcv_items 200) ON (a, b, c);
+ERROR: option 'mcv' is required by other options(s)
+-- invalid max_mcv_items value / too low
+ALTER TABLE mcv_list ADD STATISTICS (mcv, max_mcv_items 10) ON (a, b, c);
+ERROR: max number of MCV items must be at least 128
+-- invalid max_mcv_items value / too high
+ALTER TABLE mcv_list ADD STATISTICS (mcv, max_mcv_items 10000) ON (a, b, c);
+ERROR: max number of MCV items is 8192
+-- correct command
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c, d);
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1200
+(1 row)
+
+DROP TABLE mcv_list;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2f9758f..fc27d34 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1357,7 +1357,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
c.relname AS tablename,
s.stakeys AS attnums,
length(s.stadeps) AS depsbytes,
- pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
+ length(s.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 00c6ddf..63727a4 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -111,4 +111,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies
+test: mv_dependencies mv_mcv
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index b818be9..5b07b3b 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -154,3 +154,4 @@ test: xml
test: event_trigger
test: stats
test: mv_dependencies
+test: mv_mcv
diff --git a/src/test/regress/sql/mv_mcv.sql b/src/test/regress/sql/mv_mcv.sql
new file mode 100644
index 0000000..5de3d29
--- /dev/null
+++ b/src/test/regress/sql/mv_mcv.sql
@@ -0,0 +1,178 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (unknown_column);
+
+-- single column
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a);
+
+-- single column, duplicated
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, a);
+
+-- two columns, one duplicated
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, a, b);
+
+-- unknown option
+ALTER TABLE mcv_list ADD STATISTICS (unknown_option) ON (a, b, c);
+
+-- missing MCV statistics
+ALTER TABLE mcv_list ADD STATISTICS (dependencies, max_mcv_items 200) ON (a, b, c);
+
+-- invalid max_mcv_items value / too low
+ALTER TABLE mcv_list ADD STATISTICS (mcv, max_mcv_items 10) ON (a, b, c);
+
+-- invalid max_mcv_items value / too high
+ALTER TABLE mcv_list ADD STATISTICS (mcv, max_mcv_items 10000) ON (a, b, c);
+
+-- correct command
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+
+DROP TABLE mcv_list;
+
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mcv_list;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c, d);
+
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+DROP TABLE mcv_list;
--
1.9.3
Attachment: 0004-multivariate-histograms.patch (text/x-patch)
From 1ba83086a428bac548adba934bbb0c3909983978 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 20:18:24 +0100
Subject: [PATCH 4/6] multivariate histograms
- extends the pg_mv_statistic catalog (add 'hist' fields)
- building the histograms during ANALYZE
- simple estimation while planning the queries
Includes regression tests mostly equal to those for functional
dependencies / MCV lists.
---
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/tablecmds.c | 108 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 751 ++++++++-
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/common.c | 41 +-
src/backend/utils/mvstats/histogram.c | 2486 ++++++++++++++++++++++++++++
src/bin/psql/describe.c | 15 +-
src/include/catalog/pg_mv_statistic.h | 24 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 133 +-
src/test/regress/expected/mv_histogram.out | 207 +++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_histogram.sql | 176 ++
18 files changed, 3921 insertions(+), 45 deletions(-)
create mode 100644 src/backend/utils/mvstats/histogram.c
create mode 100644 src/test/regress/expected/mv_histogram.out
create mode 100644 src/test/regress/sql/mv_histogram.sql
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 74fedf0..a9e761e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -158,7 +158,9 @@ CREATE VIEW pg_mv_stats AS
length(S.stadeps) as depsbytes,
pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
length(S.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo,
+ length(S.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 545b595..831bd2f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -11889,15 +11889,19 @@ static int compare_int16(const void *a, const void *b)
* The code is an unholy mix of pieces that really belong to other parts
* of the source tree.
*
- * FIXME Check that the types are pass-by-value and support sort,
- * although maybe we can live without the sort (and only build
- * MCV list / association rules).
- *
- * FIXME This should probably check for duplicate stats (i.e. same
- * keys, same options). Although maybe it's useful to have
- * multiple stats on the same columns with different options
- * (say, a detailed MCV-only stats for some queries, histogram
- * for others, etc.)
+ * TODO Check that the types support sort, although maybe we can live
+ * without it (and only build MCV list / association rules).
+ *
+ * TODO This should probably check for duplicate stats (i.e. same
+ * keys, same options). Although maybe it's useful to have
+ * multiple stats on the same columns with different options
+ * (say, a detailed MCV-only stats for some queries, histogram
+ * for others, etc.)
+ *
+ * TODO It might be useful to have ALTER TABLE DROP STATISTICS too, but
+ * it's tricky because there may be multiple kinds of stats for the
+ * same list of columns, with different options (e.g. one just MCV
+ * list, another with histogram, etc.).
*/
static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
StatisticsDef *def, LOCKMODE lockmode)
@@ -11915,12 +11919,15 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
/* by default build everything */
bool build_dependencies = false,
- build_mcv = false;
+ build_mcv = false,
+ build_histogram = false;
- int32 max_mcv_items = -1;
+ int32 max_buckets = -1,
+ max_mcv_items = -1;
/* options required because of other options */
- bool require_mcv = false;
+ bool require_mcv = false,
+ require_histogram = false;
Assert(IsA(def, StatisticsDef));
@@ -11998,6 +12005,29 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
MVSTAT_MCVLIST_MAX_ITEMS)));
}
+ else if (strcmp(opt->defname, "histogram") == 0)
+ build_histogram = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_buckets") == 0)
+ {
+ max_buckets = defGetInt32(opt);
+
+ /* this option requires 'histogram' to be enabled */
+ require_histogram = true;
+
+ /* sanity check */
+ if (max_buckets < MVSTAT_HIST_MIN_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is %d",
+ MVSTAT_HIST_MIN_BUCKETS)));
+
+ else if (max_buckets > MVSTAT_HIST_MAX_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is %d",
+ MVSTAT_HIST_MAX_BUCKETS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -12006,10 +12036,10 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv))
+ if (! (build_dependencies || build_mcv || build_histogram))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -12017,6 +12047,11 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("option 'mcv' is required by other options(s)")));
+ if (require_histogram && (! build_histogram))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option 'histogram' is required by other options(s)")));
+
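+ /*
+ * For example (a sketch):
+ *
+ *   ALTER TABLE t ADD STATISTICS (histogram, max_buckets 1000) ON (a, b);
+ *
+ * passes these checks, while specifying 'max_buckets' without
+ * 'histogram' is rejected.
+ */
+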
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
stakeys = buildint2vector(attnums, numcols);
@@ -12034,10 +12069,14 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+ values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
+
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
+ values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
+ nulls[Anum_pg_mv_statistic_stahist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
@@ -12060,6 +12099,7 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
return;
}
+
/*
* Implements the ALTER TABLE ... DROP STATISTICS in two forms:
*
@@ -12085,12 +12125,16 @@ static void ATExecDropStatistics(AlteredTableInfo *tab, Relation rel,
/* checking whether the statistics matches / should be dropped */
bool build_dependencies = false;
bool build_mcv = false;
+ bool build_histogram = false;
bool max_mcv_items = 0;
+ int32 max_buckets = 0;
bool check_dependencies = false;
bool check_mcv = false;
bool check_mcv_items = false;
+ bool check_histogram = false;
+ bool check_buckets = false;
if (def != NULL)
{
@@ -12144,6 +12188,18 @@ static void ATExecDropStatistics(AlteredTableInfo *tab, Relation rel,
build_mcv = true;
max_mcv_items = defGetInt32(opt);
}
+ else if (strcmp(opt->defname, "histogram") == 0)
+ {
+ check_histogram = true;
+ build_histogram = defGetBoolean(opt);
+ }
+ else if (strcmp(opt->defname, "max_buckets") == 0)
+ {
+ check_histogram = true;
+ check_buckets = true;
+ max_buckets = defGetInt32(opt);
+ build_histogram = true;
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -12207,6 +12263,30 @@ static void ATExecDropStatistics(AlteredTableInfo *tab, Relation rel,
(DatumGetInt32(adatum) == max_mcv_items);
}
+ if (delete && check_histogram)
+ {
+ bool isnull;
+ Datum adatum = heap_getattr(tuple,
+ Anum_pg_mv_statistic_hist_enabled,
+ RelationGetDescr(statrel),
+ &isnull);
+
+ delete = (! isnull) &&
+ (DatumGetBool(adatum) == build_histogram);
+ }
+
+ if (delete && check_buckets)
+ {
+ bool isnull;
+ Datum adatum = heap_getattr(tuple,
+ Anum_pg_mv_statistic_hist_max_buckets,
+ RelationGetDescr(statrel),
+ &isnull);
+
+ delete = (! isnull) &&
+ (DatumGetInt32(adatum) == max_buckets);
+ }
+
/* check that the columns match the statistics definition */
if (delete && (numcols > 0))
{
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 635ccc1..162b1be 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1852,10 +1852,12 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
WRITE_BOOL_FIELD(mcv_enabled);
+ WRITE_BOOL_FIELD(hist_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
WRITE_BOOL_FIELD(mcv_built);
+ WRITE_BOOL_FIELD(hist_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index abffb0a..2d3cf09 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -53,6 +53,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
+#define MV_CLAUSE_TYPE_HIST 0x04
static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
@@ -77,6 +78,8 @@ static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
List *clauses, MVStatisticInfo *mvstats,
bool *fullmatch, Selectivity *lowsel);
+static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -84,6 +87,12 @@ static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
Selectivity *lowsel, bool *fullmatch,
bool is_or);
+static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or);
+
/* used for merging bitmaps - AND (min), OR (max) */
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
#define MIN(x, y) (((x) < (y)) ? (x) : (y))
@@ -271,7 +280,7 @@ clauselist_selectivity(PlannerInfo *root,
* From now on we're only interested in MCV-compatible clauses.
*/
mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
- MV_CLAUSE_TYPE_MCV);
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
/*
* If there still are at least two columns, we'll try to select
@@ -306,7 +315,7 @@ clauselist_selectivity(PlannerInfo *root,
/* split the clauselist into regular and mv-clauses */
clauses = clauselist_mv_split(root, sjinfo, clauses,
varRelid, &mvclauses, mvstat,
- MV_CLAUSE_TYPE_MCV);
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
/* we've chosen the histogram to match the clauses */
Assert(mvclauses != NIL);
@@ -1160,6 +1169,7 @@ static Selectivity
clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
{
bool fullmatch = false;
+ Selectivity s1 = 0.0, s2 = 0.0;
/*
* Lowest frequency in the MCV list (may be used as an upper bound
@@ -1173,9 +1183,24 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
* MCV/histogram evaluation).
*/
- /* Evaluate the MCV selectivity */
- return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ /* Evaluate the MCV first. */
+ s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
&fullmatch, &mcv_low);
+
+ /*
+ * If we got a full equality match on the MCV list, we're done (and
+ * the estimate is pretty good).
+ */
+ if (fullmatch && (s1 > 0.0))
+ return s1;
+
+ /* FIXME if (fullmatch) without matching MCV item, use the mcv_low
+ * selectivity as upper bound */
+
+ s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
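+
+ /*
+ * A sketch of how the two parts are meant to combine: they should
+ * cover different subsets of the data (the MCV list the common
+ * values, the histogram the rest), so e.g. s1 = 0.30 and s2 = 0.05
+ * yield an estimate of 0.35.
+ */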
+
+ /* TODO clamp to <= 1.0 (or more strictly, when possible) */
+ return s1 + s2;
}
/*
@@ -1317,7 +1342,7 @@ choose_mv_statistics(List *stats, Bitmapset *attnums)
int numattrs = attrs->dim1;
/* skip dependencies-only stats */
- if (! info->mcv_built)
+ if (! (info->mcv_built || info->hist_built))
continue;
/* count columns covered by the histogram */
@@ -1483,7 +1508,6 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
bool ok;
/* is it 'variable op constant' ? */
-
ok = (bms_membership(clause_relids) == BMS_SINGLETON) &&
(is_pseudo_constant_clause_relids(lsecond(expr->args),
right_relids) ||
@@ -1533,10 +1557,10 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
case F_SCALARLTSEL:
case F_SCALARGTSEL:
/* not compatible with functional dependencies */
- if (types & MV_CLAUSE_TYPE_MCV)
+ if (types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
{
*attnums = bms_add_member(*attnums, var->varattno);
- return (types & MV_CLAUSE_TYPE_MCV);
+ return (types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
}
return false;
@@ -2464,3 +2488,714 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
return nmatches;
}
+
+/*
+ * Estimate selectivity of clauses using a histogram.
+ *
+ * If there's no histogram for the stats, the function returns 0.0.
+ *
+ * The general idea of this method is similar to how MCV lists are
+ * processed, except that this introduces the concept of a partial
+ * match (MCV only works with full match / mismatch).
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all buckets as 'full match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the buckets
+ * 4) skip buckets that are already 'no match'
+ * 5) check clause for buckets that still match (at least partially)
+ * 6) sum frequencies for buckets to get selectivity
+ *
+ * TODO This only handles AND-ed clauses, but it might work for OR-ed
+ * lists too - it just needs to reverse the logic a bit. I.e. start
+ * with 'no match' for all buckets, and increase the match level
+ * for the clauses (and skip buckets that are 'full match').
+ *
+ * TODO This might use a similar shortcut to MCV lists - count buckets
+ * marked as partial/full match, and terminate once this drops to 0.
+ * Not sure if it's really worth it - for MCV lists a situation like
+ * this is not uncommon, but for histograms it's not that clear.
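+ *
+ * A worked example (a sketch): with four buckets of frequency 0.25
+ * each, one FULL match and one PARTIAL match yield a selectivity of
+ * 0.25 + 0.5 * 0.25 = 0.375, as partial matches count as 50%.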
+ */
+static Selectivity
+clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats)
+{
+ int i;
+ Selectivity s = 0.0;
+ int nmatches = 0;
+ char *matches = NULL;
+
+ MVSerializedHistogram mvhist = NULL;
+
+ /* there's no histogram */
+ if (! mvstats->hist_built)
+ return 0.0;
+
+ /* fetch the histogram (we know hist_built is set at this point) */
+ mvhist = load_mv_histogram2(mvstats->mvoid);
+
+ Assert (mvhist != NULL);
+ Assert (clauses != NIL);
+ Assert (list_length(clauses) >= 2);
+
+ /*
+ * Bitmap of bucket matches (mismatch, partial, full). By default
+ * all buckets fully match, and the clauses can only eliminate them.
+ */
+ matches = palloc0(sizeof(char) * mvhist->nbuckets);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+
+ nmatches = mvhist->nbuckets;
+
+ /* build the match bitmap */
+ update_match_bitmap_histogram(root, clauses,
+ mvstats->stakeys, mvhist,
+ nmatches, matches, false);
+
+ /* now, walk through the buckets and sum the selectivities */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ if (matches[i] == MVSTATS_MATCH_FULL)
+ s += mvhist->buckets[i]->ntuples;
+ else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+ s += 0.5 * mvhist->buckets[i]->ntuples;
+ }
+
+ /* release the allocated bitmap and deserialized histogram */
+ pfree(matches);
+ pfree(mvhist);
+
+ return s;
+}
+
+/*
+ * Evaluate clauses using the histogram, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * Note: This is not a simple bitmap in the sense that there are more
+ * than two possible values for each item - no match, partial
+ * match and full match. So we need 2 bits per item.
+ *
+ * TODO This works with 'bitmap' where each item is represented as a
+ * char, which is slightly wasteful. Instead, we could use a bitmap
+ * with 2 bits per item, reducing the size to ~1/4. By using values
+ * 0, 1 and 3 (instead of 0, 1 and 2), the operations (merging etc.)
+ * might be performed just like for simple bitmap by using & and |,
+ * which might be faster than min/max.
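+ *
+ * E.g. with the 0/1/3 encoding, AND-merging a partial match (1)
+ * with a full match (3) gives (1 & 3) = 1, i.e. partial - the
+ * same result as MIN(partial, full) with the current encoding.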
+ */
+static int
+update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ /*
+ * Used for caching function calls, so that each deduplicated value
+ * is evaluated only once.
+ *
+ * We may have up to (2 * nbuckets) values per dimension. It's
+ * probably overkill, but let's allocate that once for all clauses,
+ * to minimize overhead.
+ *
+ * Also, we only need two bits per value, but this allocates byte
+ * per value. Might be worth optimizing.
+ *
+ * 0x00 - not yet called
+ * 0x01 - called, result is 'false'
+ * 0x03 - called, result is 'true'
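+ *
+ * The low bit records 'already called', the second bit stores the
+ * result, so (value & 0x02) extracts the cached boolean.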
+ */
+ char *callcache = palloc(mvhist->nbuckets);
+
+ int calls = 0, hits = 0;
+
+ Assert (mvhist != NULL);
+ Assert(mvhist->nbuckets > 0);
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mvhist->nbuckets);
+
+ Assert (clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+
+ /* loop through the clauses and do the estimation */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* it's either OpClause, or NullTest */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ FmgrInfo opproc; /* operator */
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ /* reset the cache (per clause) */
+ memset(callcache, 0, mvhist->nbuckets);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+ FmgrInfo ltproc;
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ /*
+ * TODO Fetch only when really needed (probably for equality only)
+ *
+ * TODO Technically either lt/gt is sufficient.
+ *
+ * FIXME The code in analyze.c creates histograms only for types
+ * with enough ordering (by calling get_sort_group_operators).
+ * Is this the same assumption, i.e. are we certain that we
+ * get the ltproc/gtproc every time we ask? Or are there types
+ * where get_sort_group_operators returns ltopr and here we
+ * get nothing?
+ */
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_EQ_OPR | TYPECACHE_LT_OPR
+ | TYPECACHE_GT_OPR);
+
+ /* lookup dimension for the attribute */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), <proc);
+
+ /*
+ * Check this for all buckets that still have "true" in the bitmap
+ *
+ * We already know the clauses use suitable operators (because that's
+ * how we filtered them).
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ bool tmp;
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /* histogram boundaries */
+ Datum minval, maxval;
+
+ /* values from the call cache */
+ char mincached, maxcached;
+
+ /*
+ * For AND-lists, we can also mark NULL buckets as 'no match'
+ * (and then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && bucket->nullsonly[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match).
+ * We can't really do anything about the MATCH_PARTIAL buckets.
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* lookup the values and cache of function calls */
+ minval = mvhist->values[idx][bucket->min[idx]];
+ maxval = mvhist->values[idx][bucket->max[idx]];
+
+ mincached = callcache[bucket->min[idx]];
+ maxcached = callcache[bucket->max[idx]];
+
+ /*
+ * TODO Maybe it's possible to add here a similar optimization
+ * as for the MCV lists:
+ *
+ * (nmatches == 0) && AND-list => all eliminated (FALSE)
+ * (nmatches == N) && OR-list => all eliminated (TRUE)
+ *
+ * But it's more complex because of the partial matches.
+ */
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ *
+ * TODO I'm really unsure whether the handling of the 'isgt' flag (that is, clauses
+ * with reverse order of variable/constant) is correct. I wouldn't
+ * be surprised if there was some mixup. Using the lt/gt operators
+ * instead of messing with the opproc could make it simpler.
+ * It would however be using a different operator than the query,
+ * although it's not any shadier than using the selectivity function
+ * as is done currently.
+ *
+ * FIXME Once the min/max values are deduplicated, we can easily minimize
+ * the number of calls to the comparator (assuming we keep the
+ * deduplicated structure). See the note on compression at MVBucket
+ * serialize/deserialize methods.
+ */
+ switch (oprrest)
+ {
+ case F_SCALARLTSEL: /* column < constant */
+
+ if (! isgt) /* (var < const) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ ++calls;
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ minval));
+
+ /*
+ * Update the cache (but in reverse, because we keep the
+ * cache keyed as (minval, constvalue)).
+ */
+ if (tmp)
+ callcache[bucket->min[idx]] = 0x01; /* cached, false */
+ else
+ callcache[bucket->min[idx]] = 0x03; /* cached, true */
+ }
+ else
+ {
+ ++hits;
+ tmp = !(mincached & 0x02); /* extract the result (reverse) */
+ }
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the constant is below the upper boundary (in that
+ * case it's a partial match).
+ */
+ ++calls;
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ maxval));
+
+ /* update the cache */
+ if (tmp)
+ callcache[bucket->max[idx]] = 0x01; /* cached, false */
+ else
+ callcache[bucket->max[idx]] = 0x03; /* cached, true */
+ }
+ else
+ {
+ ++hits;
+ tmp = !(maxcached & 0x02); /* extract the result (reverse) */
+ }
+
+ if (tmp)
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+
+ }
+ else /* (const < var) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ ++calls;
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ maxval,
+ cst->constvalue));
+
+ /* update the cache */
+ if (tmp)
+ callcache[bucket->max[idx]] = 0x03; /* cached, true */
+ else
+ callcache[bucket->max[idx]] = 0x01; /* cached, false */
+ }
+ else
+ {
+ ++hits;
+ tmp = (maxcached & 0x02); /* extract the result */
+ }
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the lower boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ ++calls;
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ minval,
+ cst->constvalue));
+
+ /* update the cache */
+ if (tmp)
+ callcache[bucket->min[idx]] = 0x03; /* cached, true */
+ else
+ callcache[bucket->min[idx]] = 0x01; /* cached, false */
+ }
+ else
+ {
+ ++hits;
+ tmp = (mincached & 0x02); /* extract the result */
+ }
+
+ if (tmp)
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ }
+ break;
+
+ case F_SCALARGTSEL: /* column > constant */
+
+ if (! isgt) /* (var > const) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ ++calls;
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ maxval));
+
+ /* update the cache */
+ if (tmp)
+ callcache[bucket->max[idx]] = 0x01; /* cached, false */
+ else
+ callcache[bucket->max[idx]] = 0x03; /* cached, true */
+ }
+ else
+ {
+ ++hits;
+ tmp = !(maxcached & 0x02); /* extract the result */
+ }
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the lower boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ ++calls;
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ minval));
+
+ /* update the cache */
+ if (tmp)
+ callcache[bucket->min[idx]] = 0x01; /* cached, false */
+ else
+ callcache[bucket->min[idx]] = 0x03; /* cached, true */
+ }
+ else
+ {
+ ++hits;
+ tmp = !(mincached & 0x02); /* extract the result */
+ }
+
+ if (tmp)
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ }
+ else /* (const > var) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in
+ * that case we can skip the bucket, because there's no overlap).
+ */
+ ++calls;
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ minval,
+ cst->constvalue));
+
+ /* update the cache */
+ if (tmp)
+ callcache[bucket->min[idx]] = 0x03; /* cached, true */
+ else
+ callcache[bucket->min[idx]] = 0x01; /* cached, false */
+ }
+ else
+ {
+ ++hits;
+ tmp = (mincached & 0x02); /* extract the result */
+ }
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the constant is below the upper boundary (in that
+ * case it's a partial match).
+ */
+ ++calls;
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ maxval,
+ cst->constvalue));
+
+ /* update the cache */
+ if (tmp)
+ callcache[bucket->max[idx]] = 0x03; /* cached, true */
+ else
+ callcache[bucket->max[idx]] = 0x01; /* cached, false */
+ }
+ else
+ {
+ ++hits;
+ tmp = (maxcached & 0x02); /* extract the result */
+ }
+
+ if (tmp)
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ }
+ break;
+
+ case F_EQSEL:
+
+ /*
+ * We only check whether the value is within the bucket, using the lt/gt
+ * operators fetched from type cache.
+ *
+ * TODO We'll use the default 50% estimate, but that's probably way off
+ * if there are multiple distinct values. Consider tweaking this
+ * somehow, e.g. using only a fraction inversely proportional to the
+ * estimated number of distinct values in the bucket.
+ *
+ * TODO This does not handle inclusion flags at the moment, thus counting
+ * some buckets twice (when hitting the boundary).
+ *
+ * TODO Optimization is that if max[i] == min[i], it's effectively a MCV
+ * item and we can count the whole bucket as a complete match (thus
+ * using 100% bucket selectivity and not just 50%).
+ *
+ * TODO Technically some buckets may "degenerate" into single-value
+ * buckets (not necessarily for all the dimensions) - maybe this
+ * is better than keeping a separate MCV list (multi-dimensional).
+ * Update: Actually, that's unlikely to be better than a separate
+ * MCV list for two reasons - first, it requires ~2x the space
+ * (because of storing lower/upper boundaries) and second because
+ * the buckets are ranges - depending on the partitioning algorithm
+ * it may not even degenerate into (min=max) bucket. For example the
+ * the current partitioning algorithm never does that.
+ */
+ ++calls;
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&ltproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ minval));
+
+ /* update the cache */
+ if (tmp)
+ callcache[bucket->min[idx]] = 0x03; /* cached, true */
+ else
+ callcache[bucket->min[idx]] = 0x01; /* cached, false */
+ }
+ else
+ {
+ ++hits;
+ tmp = (mincached & 0x02); /* extract the result */
+ }
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ ++calls;
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&ltproc,
+ DEFAULT_COLLATION_OID,
+ maxval,
+ cst->constvalue));
+
+ /* update the cache */
+ if (tmp)
+ callcache[bucket->max[idx]] = 0x03; /* cached, true */
+ else
+ callcache[bucket->max[idx]] = 0x01; /* cached, false */
+ }
+ else
+ {
+ ++hits;
+ tmp = (maxcached & 0x02); /* extract the result */
+ }
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+
+ break;
+ }
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the buckets and evaluate the current clause. We can
+ * skip buckets that were already ruled out, and terminate if there
+ * are no remaining buckets that might possibly match.
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match)
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* if the bucket mismatches the clause, set it to MATCH_NONE */
+ if ((expr->nulltesttype == IS_NULL)
+ && (! bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+
+ else if ((expr->nulltesttype == IS_NOT_NULL) &&
+ (bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each bucket */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching buckets */
+ or_nmatches = mvhist->nbuckets;
+
+ /* by default none of the buckets matches the clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the nested AND/OR clauses */
+ or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ stakeys, mvhist,
+ or_nmatches, or_matches, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ {
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+ }
+
+ elog(WARNING, "calls=%d hits=%d hit ratio %.2f",
+ calls, hits, hits * 100.0 / calls);
+
+ /* free the call cache */
+ pfree(callcache);
+
+ return nmatches;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index c196ca0..a05c811 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -406,7 +406,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built || mvstat->mcv_built)
+ if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built)
{
info = makeNode(MVStatisticInfo);
@@ -416,10 +416,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
info->mcv_enabled = mvstat->mcv_enabled;
+ info->hist_enabled = mvstat->hist_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
info->mcv_built = mvstat->mcv_built;
+ info->hist_built = mvstat->hist_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 3c0aff4..9dbb3b6 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o mcv.o dependencies.o
+OBJS = common.o dependencies.o histogram.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index d1da714..6499357 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -13,11 +13,11 @@
*
*-------------------------------------------------------------------------
*/
+#include "postgres.h"
+#include "utils/array.h"
#include "common.h"
-#include "utils/array.h"
-
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
int natts,
VacAttrStats **vacattrstats);
@@ -52,7 +52,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
- int numrows_filtered = 0;
+ MVHistogram histogram = NULL;
+ int numrows_filtered = numrows;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -95,8 +96,16 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+ /* build a multivariate histogram on the columns */
+ if ((numrows_filtered > 0) && (stat->hist_enabled))
+ histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
+ update_mv_stats(stat->mvoid, deps, mcvlist, histogram, attrs, stats);
+
+#ifdef MVSTATS_DEBUG
+ if (histogram != NULL)
+ print_mv_histogram_info(histogram);
+#endif
}
}
@@ -176,6 +185,8 @@ list_mv_stats(Oid relid)
info->deps_built = stats->deps_built;
info->mcv_enabled = stats->mcv_enabled;
info->mcv_built = stats->mcv_built;
+ info->hist_enabled = stats->hist_enabled;
+ info->hist_built = stats->hist_built;
result = lappend(result, info);
}
@@ -190,7 +201,6 @@ list_mv_stats(Oid relid)
return result;
}
-
/*
* Find attnims of MV stats using the mvoid.
*/
@@ -236,9 +246,16 @@ find_mv_attnums(Oid mvoid, Oid *relid)
}
+/*
+ * FIXME This adds statistics, but we need to drop statistics when the
+ * table is dropped. Not sure what to do when a column is dropped.
+ * Either we can (a) remove all stats on that column, (b) remove
+ * the column from defined stats and force rebuild, (c) remove the
+ * column on next ANALYZE. Or maybe something else?
+ */
void
update_mv_stats(Oid mvoid,
- MVDependencies dependencies, MCVList mcvlist,
+ MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
@@ -271,22 +288,34 @@ update_mv_stats(Oid mvoid,
values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
}
+ if (histogram != NULL)
+ {
+ bytea * data = serialize_mv_histogram(histogram, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stahist-1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stahist - 1]
+ = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
+ replaces[Anum_pg_mv_statistic_stahist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
+ nulls[Anum_pg_mv_statistic_hist_built-1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_hist_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
+ values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
diff --git a/src/backend/utils/mvstats/histogram.c b/src/backend/utils/mvstats/histogram.c
new file mode 100644
index 0000000..4a7f4b2
--- /dev/null
+++ b/src/backend/utils/mvstats/histogram.c
@@ -0,0 +1,2486 @@
+/*-------------------------------------------------------------------------
+ *
+ * histogram.c
+ * POSTGRES multivariate histograms
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/histogram.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+#include <math.h>
+
+/*
+ * Multivariate histograms
+ *
+ * Histograms are a collection of buckets, represented by n-dimensional
+ * rectangles. Each rectangle is delimited by a min/max value in each
+ * dimension, stored in an array, so that the bucket includes values
+ * fulfilling condition
+ *
+ * min[i] <= value[i] <= max[i]
+ *
+ * where 'i' is the dimension. In 1D this corresponds to a simple
+ * interval, in 2D to a rectangle, and in 3D to a block. If you can
+ * imagine this in 4D, congrats!
+ *
+ * In addition to the boundaries, each bucket tracks additional details:
+ *
+ * * frequency (fraction of tuples it matches)
+ * * whether the boundaries are inclusive or exclusive
+ * * whether the dimension contains only NULL values
+ * * number of distinct values in each dimension (for building)
+ *
+ * and possibly some additional information.
+ *
+ * We do expect to support multiple histogram types, with different
+ * features etc. The 'type' field is used to identify those types.
+ * Technically some histogram types might use completely different
+ * bucket representation, but that's not expected at the moment.
+ *
+ * Although the current implementation builds non-overlapping buckets,
+ * the code does not rely on the non-overlapping nature - there are
+ * interesting types of histograms / histogram building algorithms
+ * producing overlapping buckets.
+ *
+ * TODO Currently the histogram does not include information about what
+ * part of the table it covers (because the frequencies are
+ * computed from the rows that may be filtered by MCV list). Seems
+ * wrong, possibly causing misestimates (when not matching the MCV
+ * list, we'll probably get much higher selectivity).
+ *
+ *
+ * Estimating selectivity
+ * ----------------------
+ * With histograms, we always "match" a whole bucket, not individual
+ * rows (or values), irrespective of the type of clause. Therefore we
+ * can't use the optimizations for equality clauses, as in MCV lists.
+ *
+ * The current implementation uses histograms to estimate these types
+ * of clauses (think of WHERE conditions):
+ *
+ * (a) equality clauses WHERE (a = 1) AND (b = 2)
+ * (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ *
+ * It's possible to add more clauses, for example:
+ *
+ * (a) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ * (b) multi-var clauses WHERE (a > b)
+ *
+ * and so on. These are tasks for the future, not yet implemented.
+ *
+ * When used on low-cardinality data, histograms usually perform
+ * considerably worse than MCV lists (which are a good fit for this
+ * kind of data). This is especially true on categorical data, where
+ * the ordering of values is only loosely related to the meaning of
+ * the data, as proper ordering is crucial for histograms.
+ *
+ * On high-cardinality data the histograms are usually a better choice,
+ * because MCV lists can't accurately represent the distribution.
+ *
+ * By evaluating a clause on a bucket, we may get one of three results:
+ *
+ * (a) FULL_MATCH - The bucket definitely matches the clause.
+ *
+ * (b) PARTIAL_MATCH - The bucket matches the clause, but not
+ * necessarily all the tuples it represents.
+ *
+ * (c) NO_MATCH - The bucket definitely does not match the clause.
+ *
+ * This may be illustrated using a range [1, 5], which is essentially
+ * a 1D bucket. With clause
+ *
+ * WHERE (a < 10) => FULL_MATCH (all range values are below
+ * 10, so the whole bucket matches)
+ *
+ * WHERE (a < 3) => PARTIAL_MATCH (there may be values matching
+ * the clause, but we don't know how many)
+ *
+ * WHERE (a < 0) => NO_MATCH (all range values are above 1, so
+ * no values from the bucket match)
+ *
+ * Some clauses may produce only some of those results - for example
+ * equality clauses never produce FULL_MATCH, as we always hit only
+ * part of the bucket, not all the values. This results in less accurate
+ * estimates compared to MCV lists, where we can hit an MCV item exactly
+ * (an extreme case of that is a 'full match').
+ *
+ * There are also clauses that never produce PARTIAL_MATCH results.
+ * A nice example of that is the 'IS [NOT] NULL' clause, which either
+ * matches the bucket completely (FULL_MATCH) or not at all (NO_MATCH),
+ * thanks to how the NULL-buckets are constructed.
+ *
+ * TODO The IS [NOT] NULL clause is not yet implemented, but should be
+ * rather trivial to add.
+ *
+ * Computing the total selectivity estimate is trivial - simply sum the
+ * frequencies of all the FULL_MATCH and PARTIAL_MATCH buckets, but
+ * multiply the PARTIAL_MATCH frequencies by 0.5 to minimize the
+ * average error.
+ *
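+ * For example, if the matching buckets have frequencies 0.1 (full
+ * match), 0.2 and 0.2 (both partial matches), the estimate is
+ *
+ *     0.1 + 0.5 * (0.2 + 0.2) = 0.3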
+ *
+ * NULL handling
+ * -------------
+ * Buckets may not contain tuples with NULL and non-NULL values in
+ * a single dimension (attribute). To handle this, the histogram may
+ * contain NULL-buckets, i.e. buckets with one or more NULL-only
+ * dimensions.
+ *
+ * The maximum number of NULL-buckets is determined by the number of
+ * attributes the histogram is built on. For N-dimensional histogram,
+ * the maximum number of NULL-buckets is 2^N. So for 8 attributes
+ * (which is the current value of MVSTATS_MAX_DIMENSIONS), there may be
+ * up to 256 NULL-buckets.
+ *
+ * Those buckets are only built if needed - if there are no NULL values
+ * in the data, no such buckets are built.
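+ *
+ * For example, with a histogram on two columns (a,b), there may be
+ * buckets with (a NULL, b non-NULL), (a non-NULL, b NULL) and (a,b)
+ * both NULL, in addition to the regular non-NULL buckets.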
+ *
+ *
+ * Serialization
+ * -------------
+ * After building, the histogram is serialized into a more efficient
+ * form (dedup boundary values etc.). See serialize_mv_histogram() for
+ * more details about how it's done.
+ *
+ * Serialized histograms are marked with 'magic' constant, to make it
+ * easier to check the bytea really is a histogram in serialized form.
+ *
+ *
+ * TODO This structure is used both when building the histogram and
+ * then when using it to compute estimates. That's why the last
+ * few elements are not used once the histogram is built.
+ *
+ * Add a pointer to 'private' data, meant for data specific to
+ * other histogram-building algorithms. That would also remove
+ * the bogus / unnecessary fields.
+ *
+ * TODO The limit on number of buckets is quite arbitrary, aiming for
+ * sufficient accuracy while still being fast. Probably should be
+ * replaced with a dynamic limit dependent on statistics target,
+ * number of attributes (dimensions) and statistics target
+ * associated with the attributes. Also, this needs to be related
+ * to the number of sampled rows, by either clamping it to a
+ * reasonable number (after seeing the number of rows) or using
+ * it when computing the number of rows to sample. Something like
+ * 10 rows per bucket seems reasonable.
+ *
+ * TODO Add MVSTAT_HIST_ROWS_PER_BUCKET tracking minimal number of
+ * tuples per bucket (also, see the previous TODO).
+ *
+ * TODO We may replace the bool arrays with a suitably large data type
+ * (say, uint16 or uint32) and get rid of the allocations. It's
+ * unlikely we'll ever support more than 32 columns as that'd
+ * result in poor precision, huge histograms (splitting each
+ * dimension once would mean 2^32 buckets), and very expensive
+ * estimation. MCVItem already does it this way.
+ *
+ * Update: Actually, this is not 100% true, because we're splitting
+ * a single bucket, not all the buckets at the same time. So each
+ * split simply adds one new bucket, and we choose the bucket that
+ * is most in need of a split. So even with 32 columns this might
+ * give reasonable accuracy, maybe? After 1000 splits we'll get
+ * 1001 buckets, and some may be quite large (if that area of the
+ * value space has a low frequency of tuples).
+ *
+ * There are other challenges though - e.g. with this many columns
+ * it's more likely to reference both label/non-label columns,
+ * which is rather quirky (especially with histograms).
+ *
+ * However, while this would save some space for histograms built
+ * on many columns, it won't save anything for up to 4 columns
+ * (actually, on less than 3 columns it's probably wasteful).
+ *
+ * TODO Maybe the distinct stats (both for combination of all columns
+ * and for combinations of various subsets of columns) should be
+ * moved to a separate structure (next to histogram/MCV/...) to
+ * make it useful even without a histogram computed etc.
+ */
+
+static MVBucket create_initial_mv_bucket(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+static MVBucket select_bucket_to_partition(int nbuckets, MVBucket * buckets);
+
+static MVBucket partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues);
+
+static MVBucket copy_mv_bucket(MVBucket bucket, uint32 ndimensions);
+
+static void update_bucket_ndistinct(MVBucket bucket, int2vector *attrs,
+ VacAttrStats ** stats);
+
+static void update_dimension_ndistinct(MVBucket bucket, int dimension,
+ int2vector *attrs,
+ VacAttrStats ** stats,
+ bool update_boundaries);
+
+static void create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats);
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Each serialized bucket needs to store (in this order):
+ *
+ * - number of tuples (float)
+ * - min inclusive flags (ndim * sizeof(bool))
+ * - max inclusive flags (ndim * sizeof(bool))
+ * - null dimension flags (ndim * sizeof(bool))
+ * - min boundary indexes (ndim * sizeof(uint16))
+ * - max boundary indexes (ndim * sizeof(uint16))
+ *
+ * So in total:
+ *
+ * ndim * (2 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float)
+ */
+#define BUCKET_SIZE(ndims) \
+ (ndims * (2 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float))
+
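+/*
+ * For example, with the usual type sizes (2B uint16, 1B bool, 4B float),
+ * a bucket in a two-dimensional histogram takes BUCKET_SIZE(2)
+ * = 2 * (2*2 + 3*1) + 4 = 18 bytes.
+ */
+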
+/* pointers into a flat serialized bucket of BUCKET_SIZE(n) bytes */
+#define BUCKET_NTUPLES(b) ((float*)b)
+#define BUCKET_MIN_INCL(b,n) ((bool*)(b + sizeof(float)))
+#define BUCKET_MAX_INCL(b,n) (BUCKET_MIN_INCL(b,n) + n)
+#define BUCKET_NULLS_ONLY(b,n) (BUCKET_MAX_INCL(b,n) + n)
+#define BUCKET_MIN_INDEXES(b,n) ((uint16*)(BUCKET_NULLS_ONLY(b,n) + n))
+#define BUCKET_MAX_INDEXES(b,n) ((BUCKET_MIN_INDEXES(b,n) + n))
+
+/* can't split bucket with less than 10 rows */
+#define MIN_BUCKET_ROWS 10
+
+/* some debugging methods */
+#ifdef MVSTATS_DEBUG
+static void print_mv_histogram_info(MVHistogram histogram);
+#endif
+
+/*
+ * Data used while building the histogram.
+ */
+typedef struct HistogramBuildData {
+
+ float ndistinct; /* frequency of distinct values */
+
+ HeapTuple *rows; /* array of sample rows */
+ uint32 numrows; /* number of sample rows (array size) */
+
+ /* index of the dimension by which the bucket was previously split */
+ int last_split_dimension;
+
+ /*
+ * Number of distinct values in each dimension. This is used when
+ * building the histogram (and is not serialized/deserialized).
+ *
+ * XXX Maybe it could be useful for improving ndistinct estimates for
+ * combinations of columns (e.g. in GROUP BY queries). It would
+ * probably mean tracking 2^N values for each bucket, and even if
+ * those values might be stored in 1B (which is unlikely), it's
+ * still a lot of space (considering the expected number of
+ * buckets). So maybe that might be tracked just at the top level.
+ *
+ * TODO Consider tracking ndistincts for all attribute combinations.
+ */
+ uint32 *ndistincts;
+
+} HistogramBuildData;
+
+typedef HistogramBuildData *HistogramBuild;
+
+/*
+ * Building a multivariate histogram. In short, this first creates a
+ * single bucket containing all the rows, and then repeatedly splits it,
+ * each time searching for the bucket / dimension most in need of a split.
+ *
+ * The current criterion is rather simple, looking at the number of
+ * distinct values (combinations of column values for a bucket, column
+ * values for a dimension). This is somewhat naive, but seems to work
+ * quite well. See the discussion at select_bucket_to_partition and
+ * partition_bucket for more details about alternative algorithms.
+ *
+ * So the current algorithm looks like this:
+ *
+ * build NULL-buckets (create_null_buckets)
+ *
+ * while [not reaching maximum number of buckets]
+ *
+ * choose bucket to partition (max distinct combinations)
+ * if no bucket to partition
+ * terminate the algorithm
+ *
+ * choose bucket dimension to partition (max distinct values)
+ * split the bucket into two buckets
+ */
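+
+/*
+ * Note that the histogram starts with a single bucket, and each split
+ * adds exactly one more bucket, so building a histogram with N buckets
+ * takes (N-1) splits.
+ */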
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ int *ndistvalues;
+ Datum **distvalues;
+
+ MVHistogram histogram = (MVHistogram)palloc0(sizeof(MVHistogramData));
+
+ HeapTuple * rows_copy = (HeapTuple*)palloc0(numrows * sizeof(HeapTuple));
+ memcpy(rows_copy, rows, sizeof(HeapTuple) * numrows);
+
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ histogram->ndimensions = numattrs;
+
+ histogram->magic = MVSTAT_HIST_MAGIC;
+ histogram->type = MVSTAT_HIST_TYPE_BASIC;
+ histogram->nbuckets = 1;
+
+ /* create max buckets (better than repalloc for short-lived objects) */
+ histogram->buckets
+ = (MVBucket*)palloc0(MVSTAT_HIST_MAX_BUCKETS * sizeof(MVBucket));
+
+ /* create the initial bucket, covering the whole sample set */
+ histogram->buckets[0]
+ = create_initial_mv_bucket(numrows, rows_copy, attrs, stats);
+
+ /*
+ * Collect info on distinct values in each dimension (used later
+ * to select dimension to partition).
+ */
+ ndistvalues = (int*)palloc0(sizeof(int) * numattrs);
+ distvalues = (Datum**)palloc0(sizeof(Datum*) * numattrs);
+
+ for (i = 0; i < numattrs; i++)
+ {
+ int j;
+ int nvals;
+ Datum *tmp;
+
+ SortSupportData ssup;
+ StdAnalyzeData *mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ nvals = 0;
+ tmp = (Datum*)palloc0(sizeof(Datum) * numrows);
+
+ for (j = 0; j < numrows; j++)
+ {
+ bool isnull;
+
+ /* fetch the value of the attribute for this sample row */
+ Datum value = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &isnull);
+
+ if (isnull)
+ continue;
+
+ tmp[nvals++] = value;
+ }
+
+ /* do the sort and stuff only if there are non-NULL values */
+ if (nvals > 0)
+ {
+ /* sort the array of values */
+ qsort_arg((void *) tmp, nvals, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /* count distinct values */
+ ndistvalues[i] = 1;
+ for (j = 1; j < nvals; j++)
+ if (compare_scalars_simple(&tmp[j], &tmp[j-1], &ssup) != 0)
+ ndistvalues[i] += 1;
+
+ /* allocate space for the distinct values (counted above) */
+ distvalues[i] = (Datum*)palloc0(sizeof(Datum) * ndistvalues[i]);
+
+ /* now collect distinct values into the array */
+ distvalues[i][0] = tmp[0];
+ ndistvalues[i] = 1;
+
+ for (j = 1; j < nvals; j++)
+ {
+ if (compare_scalars_simple(&tmp[j], &tmp[j-1], &ssup) != 0)
+ {
+ distvalues[i][ndistvalues[i]] = tmp[j];
+ ndistvalues[i] += 1;
+ }
+ }
+ }
+
+ pfree(tmp);
+ }
+
+ /*
+ * The initial bucket may contain NULL values, so we have to create
+ * buckets with NULL-only dimensions.
+ *
+ * FIXME We may need up to 2^ndims buckets - check that there are
+ * enough buckets (MVSTAT_HIST_MAX_BUCKETS >= 2^ndims).
+ */
+ create_null_buckets(histogram, 0, attrs, stats);
+
+ while (histogram->nbuckets < MVSTAT_HIST_MAX_BUCKETS)
+ {
+ MVBucket bucket = select_bucket_to_partition(histogram->nbuckets,
+ histogram->buckets);
+
+ /* no more buckets to partition */
+ if (bucket == NULL)
+ break;
+
+ histogram->buckets[histogram->nbuckets]
+ = partition_bucket(bucket, attrs, stats,
+ ndistvalues, distvalues);
+
+ histogram->nbuckets += 1;
+ }
+
+ /* finalize the frequencies etc. */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ HistogramBuild build_data = ((HistogramBuild)histogram->buckets[i]->build_data);
+ histogram->buckets[i]->ntuples
+ = (build_data->numrows * 1.0) / numrows_total;
+ }
+
+ return histogram;
+}
+
+/* fetch the histogram from the pg_mv_statistic catalog and deserialize it */
+MVHistogram
+load_mv_histogram(Oid mvoid)
+{
+ bool isnull = false;
+ Datum histogram;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* Fetch the pg_mv_statistic tuple for this statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->hist_enabled && mvstat->hist_built);
+#endif
+
+ histogram = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stahist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_histogram(DatumGetByteaP(histogram));
+}
+
+/* fetch the histogram from the catalog, in a partially-serialized form */
+MVSerializedHistogram
+load_mv_histogram2(Oid mvoid)
+{
+ bool isnull = false;
+ Datum histogram;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* Fetch the pg_mv_statistic tuple for this statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->hist_enabled && mvstat->hist_built);
+#endif
+
+ histogram = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stahist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_histogram_2(DatumGetByteaP(histogram));
+}
+
+/* print some basic info about the histogram */
+Datum
+pg_mv_stats_histogram_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVHistogram hist = deserialize_mv_histogram(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nbuckets=%d", hist->nbuckets);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+
+/*
+ * Used to pass the sort-support context into bsearch_comparator, as
+ * bsearch() does not accept a context argument - this has to be set
+ * before each bsearch() call.
+ */
+static SortSupport ssup_private = NULL;
+
+/*
+ * Serialize the MV histogram into a bytea value. The basic algorithm
+ * is simple, and mostly mimics the MCV serialization:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ * (a) collect all (non-NULL) attribute values from all buckets
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all buckets
+ * (a) replace min/max values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, because we're mixing
+ * different datatypes, and we don't know what equality means for them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
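+ * For example, boundary values (3, 7, 7, 11, 3, 11) in one dimension
+ * deduplicate into the sorted array (3, 7, 11), and the bucket
+ * boundaries are then stored as indexes 0, 1 and 2 into this array.
+ *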
+ * We use 16-bit (uint16) values for the indexes in step (3), which is
+ * sufficient as we don't allow more than 8k buckets in the histogram
+ * (and even 16k buckets would still fit into a signed 16-bit value).
+ *
+ *
+ * Deduplication in serialization
+ * ------------------------------
+ * The deduplication is very effective and important here, because every
+ * time we split a bucket, we keep all the boundary values, except for
+ * the dimension that was used for the split. Another way to look at
+ * this is that each split introduces 1 new value (the value used to do
+ * the split). A histogram with M buckets was created by (M-1) splits
+ * of the initial bucket, and each bucket has 2*N boundary values. So
+ * assuming the initial bucket does not have any 'collapsed' dimensions,
+ * the number of distinct values is
+ *
+ * (2*N + (M-1))
+ *
+ * but the total number of boundary values is
+ *
+ * 2*N*M
+ *
+ * which is clearly much higher. For a histogram on two columns, with
+ * 1024 buckets, it's 1027 vs. 4096. Of course, we're not saving all
+ * the difference (because we still store 16-bit indexes into the values).
+ * But with large values (e.g. stored as varlena), this saves a lot.
+ *
+ * An interesting feature is that the total number of distinct values
+ * does not really grow with the number of dimensions, except for the
+ * size of the initial bucket. After that it only depends on number of
+ * buckets (i.e. number of splits).
+ *
+ * XXX Of course this only holds for the current histogram building
+ * algorithm. Algorithms doing the splits differently (e.g.
+ * producing overlapping buckets) may behave differently.
+ *
+ * TODO This only confirms we can use the uint16 indexes. The worst
+ * that could happen is if all the splits happened by a single
+ * dimension. To exhaust the uint16 this would require ~64k
+ * splits (needs to be reflected in MVSTAT_HIST_MAX_BUCKETS).
+ *
+ * TODO We don't need to use a separate boolean for each flag, instead
+ * use a single char and set bits.
+ *
+ * TODO We might get a bit better compression by considering the actual
+ * data type length. The current implementation treats all data
+ * types passed by value as requiring 8B, but for INT it's actually
+ * just 4B etc.
+ *
+ * OTOH this is only related to the lookup table, and most of the
+ * space is occupied by the buckets (with int16 indexes).
+ *
+ *
+ * Varlena compression
+ * -------------------
+ * This encoding may prevent automatic varlena compression (similarly
+ * to JSONB), because the first part of the serialized bytea will be an
+ * array of unique values (although sorted), and pglz decides whether
+ * to compress by trying to compress the first part (~1kB or so), which
+ * is likely to compress poorly, due to the lack of repetition.
+ *
+ * One possible cure to that might be storing the buckets first, and
+ * then the deduplicated arrays. The buckets might be better suited
+ * for compression.
+ *
+ * On the other hand the encoding scheme is a context-aware compression,
+ * usually compressing to ~30% (or less, with large data types). So the
+ * lack of pglz compression may be OK.
+ *
+ * XXX But maybe we don't really want to compress this, to save on
+ * planning time?
+ *
+ * TODO Try storing the buckets / deduplicated arrays in reverse order,
+ * measure impact on compression.
+ *
+ *
+ * Deserialization
+ * ---------------
+ * The deserialization is currently implemented so that it reconstructs
+ * the histogram back into the same structures - this involves quite
+ * a few memcpy() and palloc() calls, but maybe we could create a
+ * special structure for the serialized histogram, and access the data
+ * directly, without the unpacking.
+ *
+ * Not only would it save some memory and CPU time, it might actually
+ * work better with CPU caches (by not polluting the caches).
+ *
+ * TODO Try to keep the compressed form, instead of deserializing it to
+ * MVHistogram/MVBucket.
+ *
+ *
+ * General TODOs
+ * -------------
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
+bytea *
+serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i = 0, j = 0;
+ Size total_length = 0;
+
+ bytea *output = NULL;
+ char *data = NULL;
+
+ int nbuckets = histogram->nbuckets;
+ int ndims = histogram->ndimensions;
+
+ /* allocated for serialized bucket data */
+ int bucketsize = BUCKET_SIZE(ndims);
+ char *bucket = palloc0(bucketsize);
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension separately */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /*
+ * Allocate space for all min/max values, including NULLs
+ * (we won't use them, but we don't know how many there are),
+ * and then collect all non-NULL values.
+ */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * nbuckets * 2);
+
+ for (j = 0; j < histogram->nbuckets; j++)
+ {
+ /* skip buckets where this dimension is NULL-only */
+ if (! histogram->buckets[j]->nullsonly[i])
+ {
+ values[i][counts[i]] = histogram->buckets[j]->min[i];
+ counts[i] += 1;
+
+ values[i][counts[i]] = histogram->buckets[j]->max[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* make sure we fit into uint16 */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typlen > 0)
+ /* byval or byref, but with fixed length (name, tid, ...) */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so simply strlen */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j]));
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * histogram, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nbuckets (4B)
+ * - info (ndim * sizeof(DimensionInfo)
+ * - arrays of values for each dimension
+ * - serialized buckets (nbuckets * bucketsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data (and buckets).
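+ *
+ * For a 2-column histogram this means 20B + 2 * sizeof(DimensionInfo)
+ * of header, followed by the two deduplicated value arrays and then
+ * nbuckets * BUCKET_SIZE(2) bytes of serialized buckets.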
+ */
+ total_length = (sizeof(int32) + offsetof(MVHistogramData, buckets)
+ + ndims * sizeof(DimensionInfo)
+ + nbuckets * bucketsize);
+
+ /* account for the deduplicated data */
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce an arbitrary limit of 10MB */
+ if (total_length > (10 * 1024 * 1024))
+ elog(ERROR, "serialized histogram exceeds 10MB (%ld > %d)",
+ total_length, (10 * 1024 * 1024));
+
+ /* allocate space for the serialized histogram list, set header */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write data */
+ data = VARDATA(output);
+
+ memcpy(data, histogram, offsetof(MVHistogramData, buckets));
+ data += offsetof(MVHistogramData, buckets);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typlen > 0)
+ {
+ /* passed by value or by reference, but fixed length */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the histogram buckets */
+ for (i = 0; i < nbuckets; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - bucketsize);
+
+ /* reset the values for each item */
+ memset(bucket, 0, bucketsize);
+
+ *BUCKET_NTUPLES(bucket) = histogram->buckets[i]->ntuples;
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! histogram->buckets[i]->nullsonly[j])
+ {
+ uint16 idx;
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ /* min boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->min[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MIN_INDEXES(bucket, ndims)[j] = idx;
+
+ /* max boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->max[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MAX_INDEXES(bucket, ndims)[j] = idx;
+ }
+ }
+
+ /* copy flags (nulls, min/max inclusive) */
+ memcpy(BUCKET_NULLS_ONLY(bucket, ndims),
+ histogram->buckets[i]->nullsonly, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MIN_INCL(bucket, ndims),
+ histogram->buckets[i]->min_inclusive, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MAX_INCL(bucket, ndims),
+ histogram->buckets[i]->max_inclusive, sizeof(bool) * ndims);
+
+ /* copy the item into the array */
+ memcpy(data, bucket, bucketsize);
+
+ data += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ /* FIXME free the values/counts arrays here */
+
+ return output;
+}
+
+/*
+ * Reverse of serialize_mv_histogram() - expands the serialized form
+ * back into MVHistogram / MVBucket structures.
+ */
+MVHistogram
+deserialize_mv_histogram(bytea * data)
+{
+ int i = 0, j = 0;
+
+ Size expected_size;
+ char *tmp = NULL;
+ Datum **values = NULL;
+
+ MVHistogram histogram;
+ DimensionInfo *info;
+
+ int nbuckets;
+ int ndims;
+ int bucketsize;
+
+ /* temporary deserialization buffer */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ /* temporary deserialization buffer */
+ int rbufflen;
+ char *rbuff;
+ char *rptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVHistogramData,buckets))
+ elog(ERROR, "invalid histogram size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVHistogramData,buckets));
+
+ /* read the histogram header */
+ histogram = (MVHistogram)palloc(sizeof(MVHistogramData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(histogram, tmp, offsetof(MVHistogramData, buckets));
+ tmp += offsetof(MVHistogramData, buckets);
+
+ if (histogram->magic != MVSTAT_HIST_MAGIC)
+ elog(ERROR, "invalid histogram magic %d (expected %dd)",
+ histogram->magic, MVSTAT_HIST_MAGIC);
+
+ if (histogram->type != MVSTAT_HIST_TYPE_BASIC)
+ elog(ERROR, "invalid histogram type %d (expected %dd)",
+ histogram->type, MVSTAT_HIST_TYPE_BASIC);
+
+ nbuckets = histogram->nbuckets;
+ ndims = histogram->ndimensions;
+ bucketsize = BUCKET_SIZE(ndims);
+
+ Assert((nbuckets > 0) && (nbuckets <= MVSTAT_HIST_MAX_BUCKETS));
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Compute the size we expect with these parameters. It's incomplete
+ * for now, as we have yet to add the array sizes (from the
+ * DimensionInfo records).
+ */
+ expected_size = offsetof(MVHistogramData,buckets) +
+ ndims * sizeof(DimensionInfo) +
+ (nbuckets * bucketsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /*
+ * We'll allocate one large chunk of memory for the intermediate
+ * data, needed only for deserializing the histogram, and we'll use
+ * a local dense allocation to minimize the palloc overhead.
+ *
+ * Let's see how much space we'll actually need, and also include
+ * space for the array with pointers.
+ */
+ bufflen = sizeof(Datum*) * ndims; /* space for pointers */
+
+ for (i = 0; i < ndims; i++)
+ /* don't allocate space for byval types, matching Datum */
+ if (! (info[i].typbyval && (info[i].typlen == sizeof(Datum))))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+
+ buff = palloc(bufflen);
+ ptr = buff;
+
+ values = (Datum**)buff;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MVHistogram histogram = deserialize_mv_histogram(data);
+ * pfree(data);
+ *
+ * then 'histogram' references the freed memory. This needs to
+ * copy the pieces.
+ *
+ * TODO same as in MCV deserialization / consider moving to common.c
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ if (info[i].typbyval && (info[i].typlen == sizeof(Datum)))
+ {
+ /* passed by value and Datum-sized - simply reuse the array */
+ values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ /* everything else needs a chunk from the local buffer */
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ if (info[i].typbyval)
+ {
+ /* passed by value, but smaller than Datum */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value into a full Datum */
+ memcpy(&values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ /* we should exhaust the buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ /* allocate space for the histogram buckets in a single piece */
+ rbufflen = (sizeof(MVBucket) + sizeof(MVBucketData) +
+ (2 * sizeof(Datum) + 3 * sizeof(bool)) * ndims) * nbuckets;
+
+ rbuff = palloc(rbufflen);
+ rptr = rbuff;
+
+ histogram->buckets = (MVBucket*)rbuff;
+ rptr += (sizeof(MVBucket) * nbuckets);
+
+ for (i = 0; i < nbuckets; i++)
+ {
+
+ MVBucket bucket = (MVBucket)rptr;
+ rptr += sizeof(MVBucketData);
+
+ bucket->nullsonly = (bool*)rptr;
+ rptr += (sizeof(bool) * ndims);
+
+ bucket->min_inclusive = (bool*)rptr;
+ rptr += (sizeof(bool) * ndims);
+
+ bucket->max_inclusive = (bool*)rptr;
+ rptr += (sizeof(bool) * ndims);
+
+ bucket->min = (Datum*) rptr;
+ rptr += (sizeof(Datum) * ndims);
+
+ bucket->max = (Datum*) rptr;
+ rptr += (sizeof(Datum) * ndims);
+
+ bucket->ntuples = *BUCKET_NTUPLES(tmp);
+
+ memcpy(bucket->nullsonly, BUCKET_NULLS_ONLY(tmp, ndims),
+ sizeof(bool) * ndims);
+
+ memcpy(bucket->min_inclusive, BUCKET_MIN_INCL(tmp, ndims),
+ sizeof(bool) * ndims);
+
+ memcpy(bucket->max_inclusive, BUCKET_MAX_INCL(tmp, ndims),
+ sizeof(bool) * ndims);
+
+ /* translate the indexes to values */
+ for (j = 0; j < ndims; j++)
+ {
+ if (! bucket->nullsonly[j])
+ {
+ bucket->min[j] = values[j][BUCKET_MIN_INDEXES(tmp, ndims)[j]];
+ bucket->max[j] = values[j][BUCKET_MAX_INDEXES(tmp, ndims)[j]];
+ }
+ }
+
+ histogram->buckets[i] = bucket;
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+
+ tmp += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((tmp - VARDATA(data)) == expected_size);
+
+ pfree(buff);
+
+ return histogram;
+}
+
+
+
+/*
+ * Returns histogram in a partially-serialized form (keeps the boundary
+ * values deduplicated, so that it's possible to optimize the estimation
+ * part by caching function call results between buckets etc.).
+ */
+MVSerializedHistogram
+deserialize_mv_histogram_2(bytea * data)
+{
+ int i = 0, j = 0;
+
+ Size expected_size;
+ char *tmp = NULL;
+
+ MVSerializedHistogram histogram;
+ DimensionInfo *info;
+
+ int nbuckets;
+ int ndims;
+ int bucketsize;
+
+ /* temporary deserialization buffer */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVSerializedHistogramData,buckets))
+ elog(ERROR, "invalid histogram size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVSerializedHistogramData,buckets));
+
+ /* read the histogram header */
+ histogram
+ = (MVSerializedHistogram)palloc(sizeof(MVSerializedHistogramData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(histogram, tmp, offsetof(MVSerializedHistogramData, buckets));
+ tmp += offsetof(MVSerializedHistogramData, buckets);
+
+ if (histogram->magic != MVSTAT_HIST_MAGIC)
+ elog(ERROR, "invalid histogram magic %d (expected %dd)",
+ histogram->magic, MVSTAT_HIST_MAGIC);
+
+ if (histogram->type != MVSTAT_HIST_TYPE_BASIC)
+ elog(ERROR, "invalid histogram type %d (expected %dd)",
+ histogram->type, MVSTAT_HIST_TYPE_BASIC);
+
+ nbuckets = histogram->nbuckets;
+ ndims = histogram->ndimensions;
+ bucketsize = BUCKET_SIZE(ndims);
+
+ Assert((nbuckets > 0) && (nbuckets <= MVSTAT_HIST_MAX_BUCKETS));
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Compute the size we expect with these parameters. It's incomplete
+ * for now, as we have yet to add the array sizes (from the
+ * DimensionInfo records).
+ */
+ expected_size = offsetof(MVSerializedHistogramData,buckets) +
+ ndims * sizeof(DimensionInfo) +
+ (nbuckets * bucketsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /* now let's allocate a single buffer for all the values and counts */
+
+ bufflen = (sizeof(int) + sizeof(Datum*)) * ndims;
+ for (i = 0; i < ndims; i++)
+ {
+ /* don't allocate space for byval types, matching Datum */
+ if (! (info[i].typbyval && (info[i].typlen == sizeof(Datum))))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+ }
+
+ /* also, include space for the result, tracking the buckets */
+ bufflen += nbuckets * (
+ sizeof(MVSerializedBucket) + /* bucket pointer */
+ sizeof(MVSerializedBucketData)); /* bucket data */
+
+ buff = palloc(bufflen);
+ ptr = buff;
+
+ histogram->nvalues = (int*)ptr;
+ ptr += (sizeof(int) * ndims);
+
+ histogram->values = (Datum**)ptr;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MVSerializedHistogram hist = deserialize_mv_histogram_2(data);
+ * pfree(data);
+ *
+ * then 'hist' references the freed memory. This needs to
+ * copy the pieces.
+ *
+ * TODO same as in MCV deserialization / consider moving to common.c
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ histogram->nvalues[i] = info[i].nvalues;
+
+ if (info[i].typbyval && info[i].typlen == sizeof(Datum))
+ {
+ /* passed by value / Datum - simply reuse the array */
+ histogram->values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ /* everything else needs a chunk from the local buffer */
+ histogram->values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ if (info[i].typbyval)
+ {
+ /* passed by value, but smaller than Datum */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value into a full Datum */
+ memcpy(&histogram->values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ histogram->buckets = (MVSerializedBucket*)ptr;
+ ptr += (sizeof(MVSerializedBucket) * nbuckets);
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ MVSerializedBucket bucket = (MVSerializedBucket)ptr;
+ ptr += sizeof(MVSerializedBucketData);
+
+ bucket->ntuples = *BUCKET_NTUPLES(tmp);
+ bucket->nullsonly = BUCKET_NULLS_ONLY(tmp, ndims);
+ bucket->min_inclusive = BUCKET_MIN_INCL(tmp, ndims);
+ bucket->max_inclusive = BUCKET_MAX_INCL(tmp, ndims);
+
+ bucket->min = BUCKET_MIN_INDEXES(tmp, ndims);
+ bucket->max = BUCKET_MAX_INDEXES(tmp, ndims);
+
+ histogram->buckets[i] = bucket;
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+
+ tmp += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((tmp - VARDATA(data)) == expected_size);
+
+ /* we should exhaust the output buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ return histogram;
+}
+
+/*
+ * Build the initial bucket, which will then be split into smaller
+ * buckets.
+ *
+ * TODO Add ndistinct estimation, probably the one described in "Towards
+ * Estimation Error Guarantees for Distinct Values, PODS 2000,
+ * p. 268-279" (the ones called GEE, or maybe AE).
+ *
+ * TODO The "combined" ndistinct is more likely to scale with the number
+ * of rows (in the table), because a single column behaving this
+ * way is sufficient for such behavior.
+ */
+static MVBucket
+create_initial_mv_bucket(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+ HistogramBuild data = NULL;
+
+ /* TODO allocate bucket as a single piece, including all the fields. */
+ MVBucket bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ Assert(numrows > 0);
+ Assert(rows != NULL);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /* allocate the per-dimension arrays */
+
+ /* flags for null-only dimensions */
+ bucket->nullsonly = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ bucket->min_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+ bucket->max_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* lower/upper boundaries */
+ bucket->min = (Datum*)palloc0(numattrs * sizeof(Datum));
+ bucket->max = (Datum*)palloc0(numattrs * sizeof(Datum));
+
+ /* build-data */
+ data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /* number of distinct values (per dimension) */
+ data->ndistincts = (uint32*)palloc0(numattrs * sizeof(uint32));
+
+ /* all the sample rows fall into the initial bucket */
+ data->numrows = numrows;
+ data->rows = rows;
+
+ /*
+ * The initial bucket was not split at all, so we'll start with the
+ * first dimension in the next round (index = 0).
+ */
+ data->last_split_dimension = -1;
+
+ bucket->build_data = data;
+
+ /*
+ * Update the number of distinct combinations in the bucket (which
+ * we use when selecting a bucket to partition), and then the number
+ * of distinct values for each dimension (which we use when choosing
+ * which dimension to split).
+ */
+ update_bucket_ndistinct(bucket, attrs, stats);
+
+ /* Update ndistinct (and also set min/max) for all dimensions. */
+ for (i = 0; i < numattrs; i++)
+ update_dimension_ndistinct(bucket, i, attrs, stats, true);
+
+ return bucket;
+}
+
+/*
+ * TODO Fix to handle arbitrarily-sized histograms (not just 2D ones)
+ * and call the right output procedures (for the particular type).
+ *
+ * TODO This should somehow fetch info about the data types, and use
+ * the appropriate output functions to print the boundary values.
+ * Right now this prints the 8B value as an integer.
+ *
+ * TODO Also, provide a special function for 2D histogram, printing
+ * a gnuplot script (with rectangles).
+ *
+ * TODO For string types (once supported) we can sort the strings first,
+ * assign them a sequence of integers and use the original values
+ * as labels.
+ */
+#ifdef MVSTATS_DEBUG
+static void
+print_mv_histogram_info(MVHistogram histogram)
+{
+ int i = 0;
+
+ elog(WARNING, "histogram nbuckets=%d", histogram->nbuckets);
+
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ MVBucket bucket = histogram->buckets[i];
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+
+ elog(WARNING, " bucket %d : ndistinct=%f numrows=%d min=[%ld, %ld], max=[%ld, %ld] distinct=[%d,%d]",
+ i, data->ndistinct, data->numrows,
+ bucket->min[0], bucket->min[1], bucket->max[0], bucket->max[1],
+ data->ndistincts[0], data->ndistincts[1]);
+ }
+}
+#endif
+
+/*
+ * A very simple partitioning selection criterion - choose the bucket
+ * with the most sample rows, among the buckets that may still be
+ * split (i.e. have more than two distinct combinations and at least
+ * MIN_BUCKET_ROWS sample rows).
+ *
+ * Returns either a pointer to the bucket selected to be partitioned,
+ * or NULL if there are no such buckets.
+ *
+ * TODO Consider other partitioning criteria (v-optimal, maxdiff etc.).
+ *
+ * TODO Allowing the bucket to degenerate to a single combination of
+ * values makes it a rather strange MCV list. Maybe we should use
+ * a higher lower bound, or make the selection criterion more
+ * complex (e.g. consider the number of rows in the bucket, etc.).
+ *
+ * That however is different from buckets 'degenerate' only in
+ * some dimensions (e.g. half of them), which is perfectly
+ * appropriate for statistics on a combination of low- and
+ * high-cardinality columns.
+ */
+static MVBucket
+select_bucket_to_partition(int nbuckets, MVBucket * buckets)
+{
+ int i;
+ int numrows = 0;
+ MVBucket bucket = NULL;
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ HistogramBuild data = (HistogramBuild)buckets[i]->build_data;
+ /* if this splittable bucket has more rows, use it instead */
+ if ((data->ndistinct > 2) &&
+ (data->numrows > numrows) &&
+ (data->numrows >= MIN_BUCKET_ROWS))
+ {
+ bucket = buckets[i];
+ numrows = data->numrows;
+ }
+ }
+
+ /* may be NULL if there are no buckets that can be split */
+ return bucket;
+}
+
+/*
+ * A simple bucket partitioning implementation - choose the dimension
+ * in which the bucket covers the largest fraction of the distinct
+ * values (considering only dimensions with ndistinct > 1), and split
+ * the bucket in that dimension. NULL-only dimensions and dimensions
+ * with a single distinct value are of course skipped.
+ *
+ * This is similar to the equi-depth approach Muralikrishna/DeWitt
+ * described in their SIGMOD article (M. Muralikrishna, David J.
+ * DeWitt: Equi-Depth Histograms For Estimating Selectivity Factors
+ * For Multi-Dimensional Queries. SIGMOD Conference 1988: 28-36).
+ *
+ * There are multiple histogram options, centered around the partitioning
+ * criteria, specifying both how to choose a bucket and the dimension
+ * most in need of a split. For a nice summary and general overview, see
+ * "rK-Hist : an R-Tree based histogram for multi-dimensional selectivity
+ * estimation" thesis by J. A. Lopez, Concordia University, p.34-37 (and
+ * possibly p. 32-34 for explanation of the terms).
+ *
+ * This splits the bucket by tweaking the existing one, and returning the
+ * new bucket (essentially shrinking the existing one in-place and returning
+ * the other "half" as a new bucket). The caller is responsible for adding
+ * the new bucket into the list of buckets.
+ *
+ * TODO It requires care to prevent splitting only one dimension and not
+ * splitting another one at all (which might happen easily in case of
+ * strongly dependent columns - e.g. y=x).
+ *
+ * TODO Should probably consider statistics target for the columns (e.g. to
+ * split dimensions with higher statistics target more frequently).
+ */
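+/*
+ * For illustration: if the chosen dimension contains the sorted sample
+ * values {1, 1, 2, 2, 2, 3, 4, 4}, the boundaries between distinct
+ * values are at indexes 2, 5 and 6, and index 5 is closest to the
+ * middle row (index 4). So '3' becomes the split value - rows with
+ * values below 3 stay in the original bucket, the remaining rows move
+ * to the new one.
+ */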
+static MVBucket
+partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues)
+{
+ int i;
+ int dimension;
+ int numattrs = attrs->dim1;
+
+ Datum split_value;
+ MVBucket new_bucket;
+ HistogramBuild new_data;
+
+ /* needed for sort, when looking for the split value */
+ bool isNull;
+ int nvalues = 0;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ StdAnalyzeData * mystats = NULL;
+ ScalarItem * values = (ScalarItem*)palloc0(data->numrows * sizeof(ScalarItem));
+ SortSupportData ssup;
+
+ /* looking for the split value */
+ int nrows = 1; /* number of rows below current value */
+ double delta;
+
+ /* needed when splitting the values */
+ HeapTuple * oldrows = data->rows;
+ int oldnrows = data->numrows;
+
+ /*
+ * We can't split buckets with a single distinct value (this also
+ * disqualifies NULL-only dimensions). Also, there have to be multiple
+ * sample rows (otherwise, how could there be more distinct values?).
+ */
+ Assert(data->ndistinct > 1);
+ Assert(data->numrows > 1);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Look for the dimension to split - the one where this bucket covers
+ * the largest fraction of the deduplicated distinct values.
+ */
+ delta = 0.0;
+ dimension = -1;
+
+ for (i = 0; i < numattrs; i++)
+ {
+ Datum *a, *b;
+
+ mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ /* can't split NULL-only dimension */
+ if (bucket->nullsonly[i])
+ continue;
+
+ /* can't split dimension with a single ndistinct value */
+ if (data->ndistincts[i] <= 1)
+ continue;
+
+ /* sort support for the bsearch_comparator */
+ ssup_private = &ssup;
+
+ /* search for the min/max boundaries in the distinct list */
+ a = (Datum*)bsearch(&bucket->min[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), bsearch_comparator);
+
+ b = (Datum*)bsearch(&bucket->max[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), bsearch_comparator);
+
+ /* if this dimension is 'larger', partition by it */
+ if (((b-a)*1.0 / ndistvalues[i]) > delta)
+ {
+ delta = ((b-a)*1.0 / ndistvalues[i]);
+ dimension = i;
+ }
+ }
+
+ /*
+ * If we haven't found a dimension here, we've done something
+ * wrong in select_bucket_to_partition.
+ */
+ Assert(dimension != -1);
+
+ /* Remember the dimension for the next split of this bucket. */
+ data->last_split_dimension = dimension;
+
+ /*
+ * Walk through the selected dimension, collect and sort the values
+ * and then choose the value to use as the new boundary.
+ */
+ mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (i = 0; i < data->numrows; i++)
+ {
+ /* remember the index of the sample row, to make the partitioning simpler */
+ values[nvalues].value = heap_getattr(data->rows[i], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+ values[nvalues].tupno = i;
+
+ /* no NULL values allowed here (we don't do splits by null-only dimensions) */
+ Assert(!isNull);
+
+ nvalues++;
+ }
+
+ /* sort the array of values */
+ qsort_arg((void *) values, nvalues, sizeof(ScalarItem),
+ compare_scalars_partition, (void *) &ssup);
+
+ /*
+ * Walk through the sorted array and pick the distinct-value boundary
+ * closest to the middle row, i.e. a split giving about 50% of the
+ * tuples in each partition (we can only split between two distinct
+ * values, so the actual fractions may differ). The chosen value is
+ * then used as an exclusive upper boundary of the existing bucket
+ * (and an inclusive lower boundary of the new one).
+ *
+ * TODO Maybe we should split by distinct values instead, e.g. use
+ * the (ndistinct/2+1)-th distinct value, or the "average" of
+ * the two middle distinct values - but the latter requires
+ * being able to do an average (which does not work for
+ * non-arithmetic types).
+ */
+ delta = fabs(data->numrows);
+ split_value = values[0].value;
+
+ for (i = 1; i < data->numrows; i++)
+ {
+ if (values[i].value != values[i-1].value)
+ {
+ /* are we closer to splitting the bucket in half? */
+ if (fabs(i - data->numrows/2.0) < delta)
+ {
+ /* let's assume we'll use this value for the split */
+ split_value = values[i].value;
+ delta = fabs(i - data->numrows/2.0);
+ nrows = i;
+ }
+ }
+ }
+
+ Assert(nrows > 0);
+ Assert(nrows < data->numrows);
+
+ /* create the new bucket as an (incomplete) copy of the one being partitioned. */
+ new_bucket = copy_mv_bucket(bucket, numattrs);
+ new_data = (HistogramBuild)new_bucket->build_data;
+
+ /*
+ * Do the actual split of the chosen dimension, using the split value as the
+ * upper bound for the existing bucket, and lower bound for the new one.
+ */
+ bucket->max[dimension] = split_value;
+ new_bucket->min[dimension] = split_value;
+
+ bucket->max_inclusive[dimension] = false;
+ new_bucket->min_inclusive[dimension] = true;
+
+ /*
+ * Redistribute the sample tuples using the 'ScalarItem->tupno'
+ * index. We know 'nrows' rows should remain in the original
+ * bucket and the rest goes to the new one.
+ */
+
+ data->rows = (HeapTuple*)palloc0(nrows * sizeof(HeapTuple));
+ new_data->rows = (HeapTuple*)palloc0((oldnrows - nrows) * sizeof(HeapTuple));
+
+ data->numrows = nrows;
+ new_data->numrows = (oldnrows - nrows);
+
+ /*
+ * The first nrows should go to the first bucket, the rest should
+ * go to the new one. Use the tupno field to get the actual HeapTuple
+ * row from the original array of sample rows.
+ */
+ for (i = 0; i < nrows; i++)
+ memcpy(&data->rows[i], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ for (i = nrows; i < oldnrows; i++)
+ memcpy(&new_data->rows[i-nrows], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(new_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each partition.
+ */
+ for (i = 0; i < numattrs; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(new_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+ pfree(values);
+
+ return new_bucket;
+}
+
+/*
+ * Copy a histogram bucket. The copy does not include the build-time
+ * data, i.e. sampled rows etc.
+ */
+static MVBucket
+copy_mv_bucket(MVBucket bucket, uint32 ndimensions)
+{
+ /* TODO allocate as a single piece (including all the fields) */
+ MVBucket new_bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+ HistogramBuild data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /*
+ * Copy only the attributes that will stay the same after the split;
+ * the rest will be recomputed once the split is done.
+ */
+
+ /* allocate the per-dimension arrays */
+ new_bucket->nullsonly = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ new_bucket->min_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+ new_bucket->max_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* lower/upper boundaries */
+ new_bucket->min = (Datum*)palloc0(ndimensions * sizeof(Datum));
+ new_bucket->max = (Datum*)palloc0(ndimensions * sizeof(Datum));
+
+ /* copy data */
+ memcpy(new_bucket->nullsonly, bucket->nullsonly, ndimensions * sizeof(bool));
+
+ memcpy(new_bucket->min_inclusive, bucket->min_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->min, bucket->min, ndimensions*sizeof(Datum));
+
+ memcpy(new_bucket->max_inclusive, bucket->max_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->max, bucket->max, ndimensions*sizeof(Datum));
+
+ /* allocate and copy the interesting part of the build data */
+ data->last_split_dimension = ((HistogramBuild)bucket->build_data)->last_split_dimension;
+ data->ndistincts = (uint32*)palloc0(ndimensions * sizeof(uint32));
+
+ new_bucket->build_data = data;
+
+ return new_bucket;
+}
+
+/*
+ * Counts the number of distinct combinations of values in the bucket.
+ * The combinations are collected into an array of sort items, sorted
+ * using the per-dimension comparators (multi_sort_compare), and then
+ * compared pairwise to count the distinct ones.
+ *
+ * TODO This might evaluate and store the distinct counts for all
+ * possible attribute combinations. The assumption is this might be
+ * useful for estimating things like GROUP BY cardinalities (e.g.
+ * in cases when some buckets contain a lot of low-frequency
+ * combinations, and other buckets contain few high-frequency ones).
+ *
+ * But it's unclear whether it's worth the price. Computing this
+ * is actually quite cheap, because it may be evaluated at the very
+ * end, when the buckets are rather small (so sorting it in 2^N ways
+ * is not a big deal). Assuming the partitioning algorithm does not
+ * use these values to make its decisions, of course (the current
+ * algorithm does not).
+ *
+ * The overhead with storing, fetching and parsing the data is more
+ * concerning - adding 2^N values per bucket (even if it's just
+ * a 1B or 2B value) would significantly bloat the histogram, and
+ * thus its impact on the optimizer. Which is not really desirable.
+ *
+ * TODO This only updates the ndistinct for the sample (or bucket), but
+ * we eventually need an estimate of the total number of distinct
+ * values in the dataset. It's possible to either use the current
+ * 1D approach (i.e., if it's more than 10% of the sample, assume
+ * it's proportional to the number of rows), or to implement the
+ * estimator suggested in the article, supposedly giving 'optimal'
+ * estimates (w.r.t. probability of error).
+ */
+static void
+update_bucket_ndistinct(MVBucket bucket, int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ int numrows = data->numrows;
+
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * We could collect this while already walking through the attributes
+ * elsewhere; as it is, we call heap_getattr a second time here.
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(numrows * sizeof(Datum) * numattrs);
+ bool *isnull = (bool*)palloc0(numrows * sizeof(bool) * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* prepare the sort functions for all the dimensions */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* collect the values */
+ for (i = 0; i < numrows; i++)
+ for (j = 0; j < numattrs; j++)
+ items[i].values[j]
+ = heap_getattr(data->rows[i], attrs->values[j],
+ stats[j]->tupDesc, &items[i].isnull[j]);
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ data->ndistinct = 1;
+
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ data->ndistinct += 1;
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+}
+
+/*
+ * Count distinct values per bucket dimension.
+ */
+static void
+update_dimension_ndistinct(MVBucket bucket, int dimension, int2vector *attrs,
+ VacAttrStats ** stats, bool update_boundaries)
+{
+ int j;
+ int nvalues = 0;
+ bool isNull;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ Datum * values = (Datum*)palloc0(data->numrows * sizeof(Datum));
+ SortSupportData ssup;
+
+ StdAnalyzeData * mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* we may already know this is a NULL-only dimension */
+ if (bucket->nullsonly[dimension])
+ data->ndistincts[dimension] = 1;
+
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (j = 0; j < data->numrows; j++)
+ {
+ values[nvalues] = heap_getattr(data->rows[j], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+
+ /* ignore NULL values */
+ if (! isNull)
+ nvalues++;
+ }
+
+ /* there's always at least 1 distinct value (may be NULL) */
+ data->ndistincts[dimension] = 1;
+
+ /*
+ * If there are only NULL values in the column, mark the dimension
+ * as NULL-only and bail out - there's nothing more to count.
+ */
+ if (nvalues == 0)
+ {
+ pfree(values);
+ bucket->nullsonly[dimension] = true;
+ return;
+ }
+
+ /* sort the array of (pass-by-value) datums */
+ qsort_arg((void *) values, nvalues, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /*
+ * Update min/max boundaries to the smallest bounding box. Generally, this
+ * needs to be done only when constructing the initial bucket.
+ */
+ if (update_boundaries)
+ {
+ /* store the min/max values */
+ bucket->min[dimension] = values[0];
+ bucket->min_inclusive[dimension] = true;
+
+ bucket->max[dimension] = values[nvalues-1];
+ bucket->max_inclusive[dimension] = true;
+ }
+
+ /*
+ * Walk through the array and count distinct values by comparing
+ * succeeding values.
+ *
+ * FIXME This only works for pass-by-value types (i.e. not VARCHARs
+ * etc.). Although thanks to the deduplication it might work
+ * even for those types (equal values will get the same item
+ * in the deduplicated array).
+ */
+ for (j = 1; j < nvalues; j++) {
+ if (values[j] != values[j-1])
+ data->ndistincts[dimension] += 1;
+ }
+
+ pfree(values);
+}
+
+/*
+ * A properly built histogram must not contain buckets mixing NULL and
+ * non-NULL values in a single dimension. Each dimension either is
+ * marked as 'nulls only' (and thus contains only NULL values), or
+ * it contains no NULL values at all.
+ *
+ * Therefore, if the sample contains NULL values in any of the columns,
+ * it's necessary to build those NULL-buckets. This is done in an
+ * iterative way using this algorithm, operating on a single bucket:
+ *
+ * (1) Check that all dimensions are well-formed (not mixing NULL
+ * and non-NULL values).
+ *
+ * (2) If all dimensions are well-formed, terminate.
+ *
+ * (3) If the dimension contains only NULL values, but is not
+ * marked as NULL-only, mark it as NULL-only and run the
+ * algorithm again (on this bucket).
+ *
+ * (4) If the dimension mixes NULL and non-NULL values, split the
+ * bucket into two parts - one with NULL values, one with
+ * non-NULL values (replacing the current one). Then run
+ * the algorithm on both buckets.
+ *
+ * This is executed in a recursive manner, but the number of executions
+ * should be quite low - limited by the number of NULL-buckets. Also,
+ * in each branch the number of nested calls is limited by the number
+ * of dimensions (attributes) of the histogram.
+ *
+ * At the end, there should be buckets with no mixed dimensions. The
+ * number of buckets produced by this algorithm is rather limited - with
+ * N dimensions, there may be only 2^N such buckets (each dimension may
+ * be either NULL or non-NULL). So with 8 dimensions (current value of
+ * MVSTATS_MAX_DIMENSIONS) there may be only 256 such buckets.
+ *
+ * After this, a 'regular' bucket-split algorithm shall run, further
+ * optimizing the histogram.
+ */
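+/*
+ * For example, a 2-D bucket where column "a" mixes NULL and non-NULL
+ * values and column "b" contains no NULLs gets split by step (4) into
+ * a bucket with a=NULL (dimension marked nulls-only) and a bucket with
+ * the remaining rows; the algorithm then recurses into both halves.
+ */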
+static void
+create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int null_dim = -1;
+ int null_count = 0;
+ bool null_found = false;
+ MVBucket bucket, null_bucket;
+ int null_idx, curr_idx;
+ HistogramBuild data, null_data;
+
+ /* remember original values from the bucket */
+ int numrows;
+ HeapTuple *oldrows = NULL;
+
+ Assert(bucket_idx < histogram->nbuckets);
+ Assert(histogram->ndimensions == attrs->dim1);
+
+ bucket = histogram->buckets[bucket_idx];
+ data = (HistogramBuild)bucket->build_data;
+
+ numrows = data->numrows;
+ oldrows = data->rows;
+
+ /*
+ * Walk through all rows / dimensions, and stop once we find NULL
+ * in a dimension not yet marked as NULL-only.
+ */
+ for (i = 0; i < data->numrows; i++)
+ {
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ /* Is this a NULL-only dimension? If yes, skip. */
+ if (bucket->nullsonly[j])
+ continue;
+
+ /* found a NULL in that dimension? */
+ if (heap_attisnull(data->rows[i], attrs->values[j]))
+ {
+ null_found = true;
+ null_dim = j;
+ break;
+ }
+ }
+
+ /* terminate if we found attribute with NULL values */
+ if (null_found)
+ break;
+ }
+
+ /* no regular dimension contains NULL values => we're done */
+ if (! null_found)
+ return;
+
+ /* walk through the rows again, count NULL values in 'null_dim' */
+ for (i = 0; i < data->numrows; i++)
+ {
+ if (heap_attisnull(data->rows[i], attrs->values[null_dim]))
+ null_count += 1;
+ }
+
+ Assert(null_count <= data->numrows);
+
+ /*
+ * If (null_count == numrows) the dimension already is NULL-only,
+ * but is not yet marked like that. It's enough to mark it and
+ * repeat the process recursively (until we run out of dimensions).
+ */
+ if (null_count == data->numrows)
+ {
+ bucket->nullsonly[null_dim] = true;
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+ return;
+ }
+
+ /*
+ * We have to split the bucket into two - one with NULL values in
+ * the dimension, one with non-NULL values. We don't need to sort
+ * the data or anything, but otherwise it's similar to what's done
+ * in partition_bucket().
+ */
+
+ /* create bucket with NULL-only dimension 'dim' */
+ null_bucket = copy_mv_bucket(bucket, histogram->ndimensions);
+ null_data = (HistogramBuild)null_bucket->build_data;
+
+ /* remember the current array info */
+ oldrows = data->rows;
+ numrows = data->numrows;
+
+ /* we'll keep non-NULL values in the current bucket */
+ data->numrows = (numrows - null_count);
+ data->rows
+ = (HeapTuple*)palloc0(data->numrows * sizeof(HeapTuple));
+
+ /* and the NULL values will go to the new one */
+ null_data->numrows = null_count;
+ null_data->rows
+ = (HeapTuple*)palloc0(null_data->numrows * sizeof(HeapTuple));
+
+ /* mark the dimension as NULL-only (in the new bucket) */
+ null_bucket->nullsonly[null_dim] = true;
+
+ /* walk through the sample rows and distribute them accordingly */
+ null_idx = 0;
+ curr_idx = 0;
+ for (i = 0; i < numrows; i++)
+ {
+ if (heap_attisnull(oldrows[i], attrs->values[null_dim]))
+ /* NULL => copy to the new bucket */
+ memcpy(&null_data->rows[null_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ else
+ memcpy(&data->rows[curr_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ }
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(null_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each
+ * bucket (NULL is not a value, so 0, and the other bucket got
+ * all the ndistinct values).
+ */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(null_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+
+ /* add the NULL bucket to the histogram */
+ histogram->buckets[histogram->nbuckets++] = null_bucket;
+
+ /*
+ * And now run the function recursively on both buckets (the new
+ * one first, because the call may change number of buckets, and
+ * it's used as an index).
+ */
+ create_null_buckets(histogram, (histogram->nbuckets-1), attrs, stats);
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+
+}
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
+
+/*
+ * SRF with details about buckets of a histogram:
+ *
+ * - bucket ID (0...nbuckets)
+ * - min values (string array)
+ * - max values (string array)
+ * - nulls only (boolean array)
+ * - min inclusive flags (boolean array)
+ * - max inclusive flags (boolean array)
+ * - frequency (double precision)
+ * - density (double precision)
+ * - bucket size (double precision)
+ *
+ * The inputs are the OID of the statistics and the output type
+ * (otype) for the boundary values; no rows are returned if the
+ * statistics contain no histogram.
+ */
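+/*
+ * Example (assuming a histogram was built and its OID looked up in
+ * pg_mv_statistic):
+ *
+ * SELECT * FROM pg_mv_histogram_buckets(
+ * (SELECT oid FROM pg_mv_statistic LIMIT 1), 0);
+ *
+ * where otype=0 prints the actual boundary values, otype=1 the indexes
+ * into the deduplicated arrays, and otype=2 the indexes normalized
+ * into [0, 1].
+ */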
+PG_FUNCTION_INFO_V1(pg_mv_histogram_buckets);
+
+Datum
+pg_mv_histogram_buckets(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ Oid mvoid = PG_GETARG_OID(0);
+ int otype = PG_GETARG_INT32(1);
+
+ if ((otype < 0) || (otype > 2))
+ elog(ERROR, "invalid output type specified");
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MVSerializedHistogram histogram;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ histogram = load_mv_histogram2(mvoid);
+
+ funcctx->user_fctx = histogram;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = histogram->nbuckets;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples
+ * from raw C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+ double bucket_size = 1.0;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MVSerializedHistogram histogram;
+ MVSerializedBucket bucket;
+
+ histogram = (MVSerializedHistogram)funcctx->user_fctx;
+
+ Assert(call_cntr < histogram->nbuckets);
+
+ bucket = histogram->buckets[call_cntr];
+
+ stakeys = find_mv_attnums(mvoid, &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple.
+ * This should be an array of C strings which will
+ * be processed later by the type input functions.
+ */
+ values = (char **) palloc(9 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+ values[3] = (char *) palloc0(1024 * sizeof(char));
+ values[4] = (char *) palloc0(1024 * sizeof(char));
+ values[5] = (char *) palloc0(1024 * sizeof(char));
+
+ values[6] = (char *) palloc(64 * sizeof(char));
+ values[7] = (char *) palloc(64 * sizeof(char));
+ values[8] = (char *) palloc(64 * sizeof(char));
+
+ /* we need to do this only when printing the actual values */
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * histogram->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * histogram->ndimensions);
+
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* bucket ID */
+
+ /*
+ * The output format depends on otype: 0 prints the actual min/max
+ * values (using the output function of the attribute type), 1 prints
+ * indexes into the deduplicated arrays (which are sorted, so even
+ * the indexes are quite useful), and 2 prints the indexes normalized
+ * into [0, 1].
+ */
+
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bucket_size *= (bucket->max[i] - bucket->min[i]) * 1.0
+ / (histogram->nvalues[i]-1);
+
+ /* print the actual values, i.e. use output function etc. */
+ if (otype == 0)
+ {
+ Datum minval, maxval;
+ Datum minout, maxout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %s}";
+
+ minval = histogram->values[i][bucket->min[i]];
+ minout = FunctionCall1(&fmgrinfo[i], minval);
+
+ maxval = histogram->values[i][bucket->max[i]];
+ maxout = FunctionCall1(&fmgrinfo[i], maxval);
+
+ snprintf(buff, 1024, format, values[1], DatumGetCString(minout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], DatumGetCString(maxout));
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+ else if (otype == 1)
+ {
+ format = "%s, %d";
+ if (i == 0)
+ format = "{%s%d";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %d}";
+
+ snprintf(buff, 1024, format, values[1], bucket->min[i]);
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], bucket->max[i]);
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+ else
+ {
+ format = "%s, %f";
+ if (i == 0)
+ format = "{%s%f";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %f}";
+
+ snprintf(buff, 1024, format, values[1],
+ bucket->min[i] * 1.0 / (histogram->nvalues[i]-1));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2],
+ bucket->max[i] * 1.0 / (histogram->nvalues[i]-1));
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %s}";
+
+ snprintf(buff, 1024, format, values[3], bucket->nullsonly[i] ? "t" : "f");
+ strncpy(values[3], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[4], bucket->min_inclusive[i] ? "t" : "f");
+ strncpy(values[4], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[5], bucket->max_inclusive[i] ? "t" : "f");
+ strncpy(values[5], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[6], 64, "%f", bucket->ntuples); /* frequency */
+ snprintf(values[7], 64, "%f", bucket->ntuples / bucket_size); /* density */
+ snprintf(values[8], 64, "%f", bucket_size); /* bucket_size */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+ pfree(values[4]);
+ pfree(values[5]);
+ pfree(values[6]);
+ pfree(values[7]);
+ pfree(values[8]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 448cf35..0699d6c 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2101,8 +2101,8 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, stakeys,\n"
- " deps_enabled, mcv_enabled,\n"
- " deps_built, mcv_built,\n"
+ " deps_enabled, mcv_enabled, hist_enabled,\n"
+ " deps_built, mcv_built, hist_built,\n"
" mcv_max_items, hist_max_buckets,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
@@ -2141,8 +2141,17 @@ describeOneTableDetails(const char *schemaname,
first = false;
}
+ if (!strcmp(PQgetvalue(result, i, 4), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", histogram");
+ else
+ appendPQExpBuffer(&buf, "(histogram");
+ first = false;
+ }
+
appendPQExpBuffer(&buf, ") ON (%s)",
- PQgetvalue(result, i, 8));
+ PQgetvalue(result, i, 10));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index c6e7d74..84579da 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -36,13 +36,16 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
+ bool hist_enabled; /* build histogram? */
- /* MCV size */
+ /* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
+ int32 hist_max_buckets; /* max histogram buckets */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
+ bool hist_built; /* histogram was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -50,6 +53,7 @@ CATALOG(pg_mv_statistic,3381)
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
+ bytea stahist; /* MV histogram (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -65,15 +69,19 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_attrdef
* ----------------
*/
-#define Natts_pg_mv_statistic 9
+#define Natts_pg_mv_statistic 13
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_deps_enabled 2
#define Anum_pg_mv_statistic_mcv_enabled 3
-#define Anum_pg_mv_statistic_mcv_max_items 4
-#define Anum_pg_mv_statistic_deps_built 5
-#define Anum_pg_mv_statistic_mcv_built 6
-#define Anum_pg_mv_statistic_stakeys 7
-#define Anum_pg_mv_statistic_stadeps 8
-#define Anum_pg_mv_statistic_stamcv 9
+#define Anum_pg_mv_statistic_hist_enabled 4
+#define Anum_pg_mv_statistic_mcv_max_items 5
+#define Anum_pg_mv_statistic_hist_max_buckets 6
+#define Anum_pg_mv_statistic_deps_built 7
+#define Anum_pg_mv_statistic_mcv_built 8
+#define Anum_pg_mv_statistic_hist_built 9
+#define Anum_pg_mv_statistic_stakeys 10
+#define Anum_pg_mv_statistic_stadeps 11
+#define Anum_pg_mv_statistic_stamcv 12
+#define Anum_pg_mv_statistic_stahist 13
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 0d12dd3..9cd3e5a 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2732,6 +2732,10 @@ DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f
DESCR("multi-variate statistics: MCV list info");
DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
DESCR("details about MCV list items");
+DATA(insert OID = 3375 ( pg_mv_stats_histogram_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_histogram_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: histogram info");
+DATA(insert OID = 3374 ( pg_mv_histogram_buckets PGNSP PGUID 12 1 1000 0 0 f f f f t t i 2 0 2249 "26 23" "{26,23,23,1009,1009,1000,1000,1000,701,701,701}" "{i,i,o,o,o,o,o,o,o,o,o}" "{oid,otype,index,minvals,maxvals,nullsonly,mininclusive,maxinclusive,frequency,density,bucket_size}" _null_ _null_ pg_mv_histogram_buckets _null_ _null_ _null_ ));
+DESCR("details about histogram buckets");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6fab94a..b776962 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -565,10 +565,12 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
bool mcv_enabled; /* MCV list enabled */
+ bool hist_enabled; /* histogram enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
bool mcv_built; /* MCV list built */
+ bool hist_built; /* histogram built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index b028192..1cb9400 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -91,6 +91,123 @@ typedef MCVListData *MCVList;
#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
/*
+ * Multivariate histograms
+ */
+typedef struct MVBucketData {
+
+ /* Frequencies of this bucket. */
+ float ntuples; /* frequency of tuples in this bucket */
+
+ /*
+ * Information about dimensions being NULL-only (not yet used
+ * during estimation).
+ */
+ bool *nullsonly;
+
+ /* lower boundaries - values and information about the inequalities */
+ Datum *min;
+ bool *min_inclusive;
+
+ /* upper boundaries - values and information about the inequalities */
+ Datum *max;
+ bool *max_inclusive;
+
+ /* used when building the histogram (not serialized/deserialized) */
+ void *build_data;
+
+} MVBucketData;
+
+typedef MVBucketData *MVBucket;
+
+
+typedef struct MVHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ MVBucket *buckets; /* array of buckets */
+
+} MVHistogramData;
+
+typedef MVHistogramData *MVHistogram;
+
+/*
+ * Histogram in a partially serialized form, with deduplicated boundary
+ * values etc.
+ *
+ * TODO add more detailed description here
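+ *
+ * In short, the boundary values are stored sorted and deduplicated,
+ * one array per dimension, and each bucket stores uint16 indexes
+ * into those arrays instead of the Datum values themselves.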
+ */
+
+typedef struct MVSerializedBucketData {
+
+ /* Frequencies of this bucket. */
+ float ntuples; /* frequency of tuples in this bucket */
+
+ /*
+ * Information about dimensions being NULL-only (not yet used
+ * during estimation).
+ */
+ bool *nullsonly;
+
+ /* lower boundaries - values and information about the inequalities */
+ uint16 *min;
+ bool *min_inclusive;
+
+ /* indexes of upper boundaries - values and information about the
+ * inequalities (exclusive vs. inclusive) */
+ uint16 *max;
+ bool *max_inclusive;
+
+} MVSerializedBucketData;
+
+typedef MVSerializedBucketData *MVSerializedBucket;
+
+typedef struct MVSerializedHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ /*
+ * keep this the same as in MVHistogramData, because of
+ * deserialization (same offset)
+ */
+ MVSerializedBucket *buckets; /* array of buckets */
+
+ /*
+ * serialized boundary values, one array per dimension, deduplicated
+ * (the min/max indexes point into these arrays)
+ */
+ int *nvalues;
+ Datum **values;
+
+} MVSerializedHistogramData;
+
+typedef MVSerializedHistogramData *MVSerializedHistogram;
+
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_HIST_MAGIC 0x7F8C5670 /* marks serialized bytea */
+#define MVSTAT_HIST_TYPE_BASIC 1 /* basic histogram type */
+
+/*
+ * Limits used for max_buckets option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_HIST_MIN_BUCKETS, and we cannot
+ * have more than MVSTAT_HIST_MAX_BUCKETS buckets.
+ *
+ * This is just a boundary for the 'max' threshold - the actual
+ * histogram may use fewer buckets than MVSTAT_HIST_MAX_BUCKETS.
+ *
+ * TODO The MVSTAT_HIST_MIN_BUCKETS should be related to the number of
+ * attributes (MVSTATS_MAX_DIMENSIONS) because of NULL-buckets.
+ * There should be at least 2^N buckets, otherwise we may be unable
+ * to build the NULL buckets.
+ */
+#define MVSTAT_HIST_MIN_BUCKETS 128 /* min number of buckets */
+#define MVSTAT_HIST_MAX_BUCKETS 16384 /* max number of buckets */
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
@@ -98,20 +215,27 @@ typedef MCVListData *MCVList;
MVDependencies load_mv_dependencies(Oid mvoid);
MCVList load_mv_mcvlist(Oid mvoid);
+MVHistogram load_mv_histogram(Oid mvoid);
+MVSerializedHistogram load_mv_histogram2(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
VacAttrStats **stats);
+bytea * serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
MCVList deserialize_mv_mcvlist(bytea * data);
+MVHistogram deserialize_mv_histogram(bytea * data);
+MVSerializedHistogram deserialize_mv_histogram_2(bytea * data);
/*
* Returns index of the attribute number within the vector (i.e. a
* dimension within the stats).
*/
int mv_get_index(AttrNumber varattno, int2vector * stakeys);
int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
@@ -120,6 +244,8 @@ extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_mcvlist_items(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_histogram_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_histogram_buckets(PG_FUNCTION_ARGS);
MVDependencies
build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
@@ -129,10 +255,15 @@ MCVList
build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int *numrows_filtered);
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total);
+
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+void update_mv_stats(Oid relid, MVDependencies dependencies,
+ MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats);
#endif
diff --git a/src/test/regress/expected/mv_histogram.out b/src/test/regress/expected/mv_histogram.out
new file mode 100644
index 0000000..a3d3fd8
--- /dev/null
+++ b/src/test/regress/expected/mv_histogram.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (unknown_column);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, a);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, a, b);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+ALTER TABLE mv_histogram ADD STATISTICS (unknown_option) ON (a, b, c);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing histogram statistics
+ALTER TABLE mv_histogram ADD STATISTICS (dependencies, max_buckets 200) ON (a, b, c);
+ERROR: option 'histogram' is required by other option(s)
+-- invalid max_buckets value / too low
+ALTER TABLE mv_histogram ADD STATISTICS (mcv, max_buckets 10) ON (a, b, c);
+ERROR: minimum number of buckets is 128
+-- invalid max_buckets value / too high
+ALTER TABLE mv_histogram ADD STATISTICS (mcv, max_buckets 100000) ON (a, b, c);
+ERROR: maximum number of buckets is 16384
+-- correct command
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c, d);
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+DROP TABLE mv_histogram;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index fc27d34..b02d06e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1359,7 +1359,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
length(s.stadeps) AS depsbytes,
pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
length(s.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo,
+ length(s.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(s.stahist) AS histinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 63727a4..aeb89f8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -111,4 +111,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies mv_mcv
+test: mv_dependencies mv_mcv mv_histogram
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 5b07b3b..ee1468d 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -155,3 +155,4 @@ test: event_trigger
test: stats
test: mv_dependencies
test: mv_mcv
+test: mv_histogram
diff --git a/src/test/regress/sql/mv_histogram.sql b/src/test/regress/sql/mv_histogram.sql
new file mode 100644
index 0000000..31c627a
--- /dev/null
+++ b/src/test/regress/sql/mv_histogram.sql
@@ -0,0 +1,176 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (unknown_column);
+
+-- single column
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a);
+
+-- single column, duplicated
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, a);
+
+-- two columns, one duplicated
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, a, b);
+
+-- unknown option
+ALTER TABLE mv_histogram ADD STATISTICS (unknown_option) ON (a, b, c);
+
+-- missing histogram statistics
+ALTER TABLE mv_histogram ADD STATISTICS (dependencies, max_buckets 200) ON (a, b, c);
+
+-- invalid max_buckets value / too low
+ALTER TABLE mv_histogram ADD STATISTICS (mcv, max_buckets 10) ON (a, b, c);
+
+-- invalid max_buckets value / too high
+ALTER TABLE mv_histogram ADD STATISTICS (mcv, max_buckets 100000) ON (a, b, c);
+
+-- correct command
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+
+DROP TABLE mv_histogram;
+
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mv_histogram;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c, d);
+
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+DROP TABLE mv_histogram;
--
1.9.3
Attachment: 0005-multi-statistics-estimation.patch (text/x-patch)
From fb6240254c3fb2311c3ae91597ae29bcbf18f20b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Fri, 6 Feb 2015 01:42:38 +0100
Subject: [PATCH 5/6] multi-statistics estimation
The general idea is that a probability (which
is what selectivity is) can be split into a product of
conditional probabilities like this:
P(A & B & C) = P(A & B) * P(C|A & B)
If we assume that C and B are conditionally independent
(given A), the last term may be simplified like this:
P(A & B & C) = P(A & B) * P(C|A)
so we only need probabilities on [A,B] and [C,A] to compute
the original probability.
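As a made-up example: if the statistics on [A,B] give
P(A & B) = 0.01 and the statistics on [C,A] give P(C|A) = 0.5,
the estimate becomes 0.01 * 0.5 = 0.005, instead of the product
of three selectivities computed independently.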
The implementation works in the other direction, though.
We know what probability P(A & B & C) we need to compute,
and also what statistics are available.
So we search for a combination of statistics, covering
the clauses in an optimal way (most clauses covered, most
dependencies exploited).
There are two possible approaches - exhaustive and greedy.
The exhaustive one walks through all permutations of
stats using dynamic programming, so it's guaranteed to
find the optimal solution, but it soon gets very slow as
it's roughly O(N!). The dynamic programming may improve
that a bit, but it's still far too expensive for large
numbers of statistics (on a single table).
The greedy algorithm is very simple - in every step it
picks the locally best statistics. That may not guarantee
the globally optimal solution (but maybe it does?), but it
only needs N steps to find a solution, so it's very fast
(processing the selected stats is usually way more
expensive).
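(To illustrate the difference: with 8 statistics on a table
the exhaustive search may have to explore up to 8! = 40320
orderings, while the greedy one does at most 8 passes.)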
There's a GUC for selecting the search algorithm
mvstat_search = {'greedy', 'exhaustive'}
The default value is 'greedy' as that's much safer (with
respect to runtime). See choose_mv_statistics().
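Assuming the GUC ends up user-settable, switching the
algorithm for experiments should presumably be as simple as
SET mvstat_search = 'exhaustive';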
Once we have found a sequence of statistics, we apply
them to the clauses using the conditional probabilities.
We process the selected stats one by one, and for each
we select the estimated clauses and conditions. See
clauselist_selectivity() for more details.
Limitations
-----------
It's still true that each clause at a given level has to
be covered by a single MV statistics. So with this query
WHERE (clause1) AND (clause2) AND (clause3 OR clause4)
each parenthesized clause has to be covered by a single
multivariate statistics.
Clauses not covered by a single statistics at this level
will be passed to clause_selectivity(), which will treat
them as a collection of simpler clauses (connected by AND
or OR), with the clauses from the previous level used as
conditions.
So using the same example, the last clause will be passed
to clause_selectivity() with 'clause1' and 'clause2' as
conditions, and it will be processed using multivariate
stats if possible.
The other limitation is that all the expressions have to
be mv-compatible, i.e. there can't be a mix of
mv-compatible and mv-incompatible expressions.
Fixing this should be relatively simple - just split the
list into two parts (mv-compatible/incompatible), as at
the top level.
---
contrib/file_fdw/file_fdw.c | 3 +-
contrib/postgres_fdw/postgres_fdw.c | 6 +-
src/backend/optimizer/path/clausesel.c | 2182 +++++++++++++++++++++++++++++---
src/backend/optimizer/path/costsize.c | 23 +-
src/backend/optimizer/util/orclauses.c | 4 +-
src/backend/utils/adt/selfuncs.c | 17 +-
src/backend/utils/misc/guc.c | 20 +
src/include/optimizer/cost.h | 6 +-
src/include/utils/mvstats.h | 8 +
9 files changed, 2068 insertions(+), 201 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 4368897..7b4839b 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -947,7 +947,8 @@ estimate_size(PlannerInfo *root, RelOptInfo *baserel,
baserel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 478e124..ff6b438 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -478,7 +478,8 @@ postgresGetForeignRelSize(PlannerInfo *root,
fpinfo->local_conds,
baserel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
@@ -1770,7 +1771,8 @@ estimate_path_cost_size(PlannerInfo *root,
local_join_conds,
baserel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
local_sel *= fpinfo->local_conds_sel;
rows = clamp_row_est(rows * local_sel);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 2d3cf09..7eb53b9 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -30,7 +30,8 @@
#include "utils/typcache.h"
#include "parser/parsetree.h"
-
+#include "access/sysattr.h"
+#include "miscadmin.h"
#include <stdio.h>
@@ -48,6 +49,13 @@ typedef struct RangeQueryClause
Selectivity hibound; /* Selectivity of a var < something clause */
} RangeQueryClause;
+static Selectivity clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
+
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
@@ -63,23 +71,29 @@ static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo,
int type);
+static Bitmapset *clause_mv_get_attnums(PlannerInfo *root, Node *clause);
+
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Oid varRelid, List *stats,
SpecialJoinInfo *sjinfo);
-static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
-
static List *clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
List *clauses, Oid varRelid,
List **mvclauses, MVStatisticInfo *mvstats, int types);
static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats, List *clauses,
+ List *conditions, bool is_or);
+
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats,
- bool *fullmatch, Selectivity *lowsel);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or, bool *fullmatch,
+ Selectivity *lowsel);
static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -93,6 +107,33 @@ static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
int nmatches, char * matches,
bool is_or);
+/*
+ * Describes a combination of multiple statistics covering the attributes
+ * referenced by the clauses. The array 'stats' (with nstats elements)
+ * lists the chosen statistics in the order they are applied, and the
+ * counters track how many clauses and conditions the solution covers.
+ *
+ * choose_mv_statistics_exhaustive() uses this to track both the current
+ * and the best solution, while walking through the space of possible
+ * combinations.
+ */
+typedef struct mv_solution_t {
+ int nclauses; /* number of clauses covered */
+ int nconditions; /* number of conditions covered */
+ int nstats; /* number of stats applied */
+ int *stats; /* stats (in the apply order) */
+} mv_solution_t;
+
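+/*
+ * For illustration: if a solution applies the statistics with indexes
+ * 0 and 1 (in the array built by choose_mv_statistics), in that order,
+ * it has nstats = 2 and stats = {0, 1}.
+ */
+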
+static List *choose_mv_statistics(PlannerInfo *root,
+ List *mvstats,
+ List *clauses, List *conditions,
+ Oid varRelid,
+ SpecialJoinInfo *sjinfo, int type);
+
+static Bitmapset * get_varattnos(Node * node, Index relid);
+
+int mvstat_search_type = MVSTAT_SEARCH_GREEDY;
+
/* used for merging bitmaps - AND (min), OR (max) */
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
#define MIN(x, y) (((x) < (y)) ? (x) : (y))
@@ -221,112 +262,296 @@ clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 1.0;
RangeQueryClause *rqlist = NULL;
ListCell *l;
/* processing mv stats */
- Oid relid = InvalidOid;
+ bool has_mv_stats;
+ Index relid = InvalidOid;
/* attributes in mv-compatible clauses */
Bitmapset *mvattnums = NULL;
- /*
- * If there's exactly one clause, then no use in trying to match up
- * pairs, so just go directly to clause_selectivity().
- */
- if (list_length(clauses) == 1)
- return clause_selectivity(root, (Node *) linitial(clauses),
- varRelid, jointype, sjinfo);
-
- /*
- * Collect attributes referenced by mv-compatible clauses (looking
- * for clauses compatible with functional dependencies for now).
- */
- mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
- MV_CLAUSE_TYPE_FDEP);
+ /* local conditions, accumulated and passed to clauses in this list */
+ List *conditions_local = NIL;
/*
- * If there are mv-compatible clauses, referencing at least two
- * different columns (otherwise it makes no sense to use mv stats),
- * try to reduce the clauses using functional dependencies, and
- * recollect the attributes from the reduced list.
+ * Check whether there are multivariate stats on the table.
*
- * We don't need to select a single statistics for this - we can
- * apply all the functional dependencies we have.
- */
- if (bms_num_members(mvattnums) >= 2)
+ * FIXME This seems not to be working as expected. Sometimes there
+ * are multiple relids even when (varRelid==0).
+ * */
+ if (varRelid == 0)
{
- /*
- * fetch info from the catalog (not the serialized stats yet)
- *
- * TODO This is rather ugly - we get the stats as a list from
- * RelOptInfo (thanks to relcache/syscache), but we transform
- * it into an array (which the other methods use for now).
- * This should not be necessary, I guess.
- * */
- List *stats = root->simple_rel_array[relid]->mvstatlist;
+ /* find the (single) relid */
+ Index relidx;
+ Relids relids = pull_varnos((Node*)clauses);
- /* reduce clauses by applying functional dependencies rules */
- clauses = clauselist_apply_dependencies(root, clauses, varRelid,
- stats, sjinfo);
+ if (bms_num_members(relids) == 1)
+ {
+ relidx = bms_singleton_member(relids);
+ has_mv_stats
+ = (root->simple_rel_array[relidx]->mvstatlist != NIL);
+ }
+ else
+ has_mv_stats = false;
}
+ else
+ has_mv_stats
+ = (root->simple_rel_array[varRelid]->mvstatlist != NIL);
- /*
- * Recollect attributes from mv-compatible clauses (maybe we've
- * removed so many clauses we have a single mv-compatible attnum).
- * From now on we're only interested in MCV-compatible clauses.
- */
- mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
- (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
-
- /*
- * If there still are at least two columns, we'll try to select
- * a suitable multivariate stats.
- */
- if (bms_num_members(mvattnums) >= 2)
+ /* skip the processing if there are no mv stats */
+ if (has_mv_stats)
{
+ conditions_local = list_copy(conditions);
+
/*
- * fetch info from the catalog (not the serialized stats yet)
- *
- * TODO We may need to repeat this, because the previous load only
- * happens if there are at least 2 clauses compatible with
- * functional dependencies.
+ * Collect attributes referenced by mv-compatible clauses (looking
+ * for clauses compatible with functional dependencies for now).
+ */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
+ MV_CLAUSE_TYPE_FDEP);
+
+ /*
+ * If there are mv-compatible clauses, referencing at least two
+ * different columns (otherwise it makes no sense to use mv stats),
+ * try to reduce the clauses using functional dependencies, and
+ * recollect the attributes from the reduced list.
*
- * TODO This is rather ugly - we get the stats as a list from
- * RelOptInfo (thanks to relcache/syscache), but we transform
- * it into an array (which the other methods use for now).
- * This should not be necessary, I guess.
- * */
- List *stats = root->simple_rel_array[relid]->mvstatlist;
-
- /* see choose_mv_statistics() for details */
- if (stats != NIL)
+ * We don't need to select a single statistics for this - we can
+ * apply all the functional dependencies we have.
+ */
+ if (bms_num_members(mvattnums) >= 2)
{
- MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+ /*
+ * fetch info from the catalog (not the serialized stats yet)
+ *
+ * TODO This is rather ugly - we get the stats as a list from
+ * RelOptInfo (thanks to relcache/syscache), but we transform
+ * it into an array (which the other methods use for now).
+ * This should not be necessary, I guess.
+ * */
+ List *stats = root->simple_rel_array[relid]->mvstatlist;
+
+ /* reduce clauses by applying functional dependencies rules */
+ clauses = clauselist_apply_dependencies(root, clauses, varRelid,
+ stats, sjinfo);
+ }
- if (mvstat != NULL) /* we have a matching stats */
+ /*
+ * Recollect attributes from mv-compatible clauses (maybe we've
+ * removed so many clauses we have a single mv-compatible attnum).
+ * From now on we're only interested in MCV-compatible clauses.
+ */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+
+ /*
+ * If there still are at least two columns, we'll try to select
+ * a suitable multivariate stats.
+ */
+ if (bms_num_members(mvattnums) >= 2)
+ {
+ /*
+ * fetch info from the catalog (not the serialized stats yet)
+ *
+ * TODO We may need to repeat this, because the previous load only
+ * happens if there are at least 2 clauses compatible with
+ * functional dependencies.
+ *
+ * TODO This is rather ugly - we get the stats as a list from
+ * RelOptInfo (thanks to relcache/syscache), but we transform
+ * it into an array (which the other methods use for now).
+ * This should not be necessary, I guess.
+ * */
+ List *stats = root->simple_rel_array[relid]->mvstatlist;
+
+ /* see choose_mv_statistics() for details */
+ if (stats != NIL)
{
- /* clauses compatible with multi-variate stats */
- List *mvclauses = NIL;
+ int k;
+ ListCell *s;
- /* split the clauselist into regular and mv-clauses */
- clauses = clauselist_mv_split(root, sjinfo, clauses,
- varRelid, &mvclauses, mvstat,
- (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+ List *solution
+ = choose_mv_statistics(root, stats,
+ clauses, conditions,
+ varRelid, sjinfo,
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+
+ /* apply the statistics from the solution, one by one */
+ foreach (s, solution)
+ {
+ MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
- /* we've chosen the histogram to match the clauses */
- Assert(mvclauses != NIL);
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+ List *mvclauses_new = NIL;
+ List *mvclauses_conditions = NIL;
+ Bitmapset *stat_attnums = NULL;
- /* compute the multivariate stats */
- s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ /* build attnum bitmapset for this statistics */
+ for (k = 0; k < mvstat->stakeys->dim1; k++)
+ stat_attnums = bms_add_member(stat_attnums,
+ mvstat->stakeys->values[k]);
+
+ /*
+ * Append the compatible conditions (passed from above)
+ * to mvclauses_conditions.
+ */
+ foreach (l, conditions)
+ {
+ Node *c = (Node*)lfirst(l);
+ Bitmapset *tmp = clause_mv_get_attnums(root, c);
+
+ if (bms_is_subset(tmp, stat_attnums))
+ mvclauses_conditions
+ = lappend(mvclauses_conditions, c);
+
+ bms_free(tmp);
+ }
+
+ /* split the clauselist into regular and mv-clauses
+ *
+ * We keep the list of clauses (we don't remove the
+ * clauses yet, because we want to use the clauses
+ * as conditions of other clauses).
+ *
+ * FIXME Do this only once, i.e. filter the clauses
+ * once (selecting clauses covered by at least
+ * one statistics) and then convert them into
+ * smaller per-statistics lists of conditions
+ * and estimated clauses.
+ */
+ clauselist_mv_split(root, sjinfo, clauses,
+ varRelid, &mvclauses, mvstat,
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+
+ /*
+ * We've chosen the statistics to match the clauses, so
+ * each statistics from the solution should have at least
+ * one new clause (not covered by the previous stats).
+ */
+ Assert(mvclauses != NIL);
+
+ /*
+ * Mvclauses now contains only clauses compatible
+ * with the currently selected stats, but we have to
+ * split that into conditions (already matched by
+ * the previous stats), and the new clauses we need
+ * to estimate using this stats.
+ */
+ foreach (l, mvclauses)
+ {
+ ListCell *p;
+ bool covered = false;
+ Node *clause = (Node *) lfirst(l);
+ Bitmapset *clause_attnums = clause_mv_get_attnums(root, clause);
+
+ /*
+ * If already covered by previous stats, add it to
+ * conditions.
+ *
+ * TODO Maybe this could be relaxed a bit? Because
+ * with complex and/or clauses, this might
+ * mean no statistics actually covers such a
+ * complex clause.
+ */
+ foreach (p, solution)
+ {
+ int k;
+ Bitmapset *stat_attnums = NULL;
+
+ MVStatisticInfo *prev_stat
+ = (MVStatisticInfo *)lfirst(p);
+
+ /* break once we've reached the current statistics */
+ if (prev_stat == mvstat)
+ break;
+
+ for (k = 0; k < prev_stat->stakeys->dim1; k++)
+ stat_attnums = bms_add_member(stat_attnums,
+ prev_stat->stakeys->values[k]);
+
+ covered = bms_is_subset(clause_attnums, stat_attnums);
+
+ bms_free(stat_attnums);
+
+ if (covered)
+ break;
+ }
+
+ if (covered)
+ mvclauses_conditions
+ = lappend(mvclauses_conditions, clause);
+ else
+ mvclauses_new
+ = lappend(mvclauses_new, clause);
+ }
+
+ /*
+ * We need at least one new clause (not just conditions).
+ */
+ Assert(mvclauses_new != NIL);
+
+ /* compute the multivariate stats */
+ s1 *= clauselist_mv_selectivity(root, mvstat,
+ mvclauses_new,
+ mvclauses_conditions,
+ false); /* AND */
+ }
+
+ /*
+ * And now finally remove all the mv-compatible clauses.
+ *
+ * This only repeats the same split as above, but this
+ * time we actually use the result list (and feed it to
+ * the next call).
+ */
+ foreach (s, solution)
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
+
+ /* split the list into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, sjinfo, clauses,
+ varRelid, &mvclauses, mvstat,
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+
+ /*
+ * Add the clauses to the conditions (to be passed
+ * to regular clauses), irrespective of whether they
+ * will be used as conditions or clauses here.
+ *
+ * We only keep the remaining conditions in the
+ * clauses (we keep what clauselist_mv_split returns)
+ * so we add each MV condition exactly once.
+ */
+ conditions_local = list_concat(conditions_local, mvclauses);
+ }
}
}
}
/*
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, so just go directly to clause_selectivity().
+ */
+ if (list_length(clauses) == 1)
+ {
+ Selectivity s = clause_selectivity(root, (Node *) linitial(clauses),
+ varRelid, jointype, sjinfo,
+ conditions_local);
+ list_free(conditions_local);
+ return s;
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -338,7 +563,8 @@ clauselist_selectivity(PlannerInfo *root,
Selectivity s2;
/* Always compute the selectivity using clause_selectivity */
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo);
+ s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo,
+ conditions_local);
/*
* Check for being passed a RestrictInfo.
@@ -493,6 +719,293 @@ clauselist_selectivity(PlannerInfo *root,
rqlist = rqnext;
}
+ /* free the local conditions */
+ list_free(conditions_local);
+
+ return s1;
+}
+
+/*
+ * Similar to clauselist_selectivity(), but for clauses connected by OR.
+ *
+ * That means a few differences:
+ *
+ * - functional dependencies don't apply to OR-clauses
+ *
+ * - we can't add the previous clauses to conditions
+ *
+ * - combined selectivity is computed as (s1+s2 - s1*s2) and not as
+ * a multiplication (s1*s2)
+ *
+ * Another way to evaluate this might be turning
+ *
+ * (a OR b OR c)
+ *
+ * into
+ *
+ * NOT ((NOT a) AND (NOT b) AND (NOT c))
+ *
+ * and computing selectivity of that using clauselist_selectivity().
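+ *
+ * For example (made-up numbers): with s1 = 0.2 and s2 = 0.3 the
+ * combined estimate is 0.2 + 0.3 - 0.2 * 0.3 = 0.44, which matches
+ * 1 - (1 - 0.2) * (1 - 0.3) computed through the negated form.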
+ */
+static Selectivity
+clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
+{
+ Selectivity s1 = 0.0;
+ ListCell *l;
+
+ /* processing mv stats */
+ Index relid = InvalidOid;
+
+ /* attributes in mv-compatible clauses */
+ Bitmapset *mvattnums = NULL;
+ bool has_mv_stats;
+
+ /*
+ * Check whether there are multivariate stats on the table.
+ *
+ * FIXME This seems not to be working as expected. Sometimes there
+ * are multiple relids even when (varRelid==0).
+ * */
+ if (varRelid == 0)
+ {
+ /* find the (single) relid */
+ Index relidx;
+ Relids relids = pull_varnos((Node*)clauses);
+
+ if (bms_num_members(relids) == 1)
+ {
+ relidx = bms_singleton_member(relids);
+ has_mv_stats
+ = (root->simple_rel_array[relidx]->mvstatlist != NIL);
+ }
+ else
+ has_mv_stats = false;
+ }
+ else
+ has_mv_stats
+ = (root->simple_rel_array[varRelid]->mvstatlist != NIL);
+
+ if (has_mv_stats)
+ {
+ /*
+ * Collect attributes from mv-compatible clauses (there's no
+ * reduction step here, as functional dependencies don't apply
+ * to OR-clauses). We're only interested in MCV- and
+ * histogram-compatible clauses.
+ */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+
+ /*
+ * If there still are at least two columns, we'll try to select
+ * a suitable multivariate stats.
+ */
+ if (bms_num_members(mvattnums) >= 2)
+ {
+ /*
+ * fetch info from the catalog (not the serialized stats yet)
+ *
+ * TODO We may need to repeat this, because the previous load only
+ * happens if there are at least 2 clauses compatible with
+ * functional dependencies.
+ *
+ * TODO This is rather ugly - we get the stats as a list from
+ * RelOptInfo (thanks to relcache/syscache), but we transform
+ * it into an array (which the other methods use for now).
+ * This should not be necessary, I guess.
+ * */
+ List *stats = root->simple_rel_array[relid]->mvstatlist;
+
+ /* see choose_mv_statistics() for details */
+ if (stats != NIL)
+ {
+ int k;
+ ListCell *s;
+
+ List *solution
+ = choose_mv_statistics(root, stats,
+ clauses, conditions,
+ varRelid, sjinfo,
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+
+ /* apply the statistics from the solution, one by one */
+ foreach (s, solution)
+ {
+ Selectivity s2;
+ MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
+
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+ List *mvclauses_new = NIL;
+ List *mvclauses_conditions = NIL;
+ Bitmapset *stat_attnums = NULL;
+
+ /* build attnum bitmapset for this statistics */
+ for (k = 0; k < mvstat->stakeys->dim1; k++)
+ stat_attnums = bms_add_member(stat_attnums,
+ mvstat->stakeys->values[k]);
+
+ /*
+ * Append the compatible conditions (passed from above)
+ * to mvclauses_conditions.
+ */
+ foreach (l, conditions)
+ {
+ Node *c = (Node*)lfirst(l);
+ Bitmapset *tmp = clause_mv_get_attnums(root, c);
+
+ if (bms_is_subset(tmp, stat_attnums))
+ mvclauses_conditions
+ = lappend(mvclauses_conditions, c);
+
+ bms_free(tmp);
+ }
+
+ /* split the clauselist into regular and mv-clauses
+ *
+ * We keep the list of clauses (we don't remove the
+ * clauses yet, because we want to use the clauses
+ * as conditions of other clauses).
+ *
+ * FIXME Do this only once, i.e. filter the clauses
+ * once (selecting clauses covered by at least
+ * one statistics) and then convert them into
+ * smaller per-statistics lists of conditions
+ * and estimated clauses.
+ */
+ clauselist_mv_split(root, sjinfo, clauses,
+ varRelid, &mvclauses, mvstat,
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+
+ /*
+ * We've chosen the statistics to match the clauses, so
+ * each statistics from the solution should have at least
+ * one new clause (not covered by the previous stats).
+ */
+ Assert(mvclauses != NIL);
+
+ /*
+ * Mvclauses now contains only clauses compatible
+ * with the currently selected stats, but we have to
+ * split that into conditions (already matched by
+ * the previous stats), and the new clauses we need
+ * to estimate using this stats.
+ *
+ * XXX We'll only use the new clauses, but maybe we
+ * should use the conditions too, somehow. We can't
+ * use that directly in conditional probability, but
+ * maybe we might use them in a different way?
+ *
+ * If we have a clause (a OR b OR c), then knowing
+ * that 'a' is TRUE means (b OR c) can't make the
+ * whole clause FALSE.
+ *
+ * This is pretty much what
+ *
+ * (a OR b) == NOT ((NOT a) AND (NOT b))
+ *
+ * implies.
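+ *
+ * (Under independence this is the familiar identity
+ * P(a OR b) = P(a) + P(b) - P(a) * P(b)
+ * = 1 - (1 - P(a)) * (1 - P(b)),
+ * which is also how the per-statistics results get combined
+ * into s1 below.)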
+ */
+ foreach (l, mvclauses)
+ {
+ ListCell *p;
+ bool covered = false;
+ Node *clause = (Node *) lfirst(l);
+ Bitmapset *clause_attnums = clause_mv_get_attnums(root, clause);
+
+ /*
+ * If already covered by previous stats, add it to
+ * conditions.
+ *
+ * TODO Maybe this could be relaxed a bit? Because
+ * with complex and/or clauses, this might
+ * mean no statistics actually covers such a
+ * complex clause.
+ */
+ foreach (p, solution)
+ {
+ int k;
+ Bitmapset *stat_attnums = NULL;
+
+ MVStatisticInfo *prev_stat
+ = (MVStatisticInfo *)lfirst(p);
+
+ /* break once we've reached the current statistics */
+ if (prev_stat == mvstat)
+ break;
+
+ for (k = 0; k < prev_stat->stakeys->dim1; k++)
+ stat_attnums = bms_add_member(stat_attnums,
+ prev_stat->stakeys->values[k]);
+
+ covered = bms_is_subset(clause_attnums, stat_attnums);
+
+ bms_free(stat_attnums);
+
+ if (covered)
+ break;
+ }
+
+ if (! covered)
+ mvclauses_new = lappend(mvclauses_new, clause);
+ }
+
+ /*
+ * We need at least one new clause (not just conditions).
+ */
+ Assert(mvclauses_new != NIL);
+
+ /* compute the multivariate stats */
+ s2 = clauselist_mv_selectivity(root, mvstat,
+ mvclauses_new,
+ mvclauses_conditions,
+ true); /* OR */
+
+ s1 = s1 + s2 - s1 * s2;
+ }
+
+ /*
+ * And now finally remove all the mv-compatible clauses.
+ *
+ * This only repeats the same split as above, but this
+ * time we actually use the result list (and feed it to
+ * the next call).
+ */
+ foreach (s, solution)
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
+
+ /* split the list into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, sjinfo, clauses,
+ varRelid, &mvclauses, mvstat,
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+ }
+ }
+ }
+ }
+
+ /*
+ * Handle the remaining clauses (either using regular statistics,
+ * or by multivariate stats at the next level).
+ */
+ foreach(l, clauses)
+ {
+ Selectivity s2 = clause_selectivity(root,
+ (Node *) lfirst(l),
+ varRelid,
+ jointype,
+ sjinfo,
+ conditions);
+ s1 = s1 + s2 - s1 * s2;
+ }
+
return s1;
}
@@ -703,7 +1216,8 @@ clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 0.5; /* default for any unhandled clause type */
RestrictInfo *rinfo = NULL;
@@ -833,7 +1347,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) get_notclausearg((Expr *) clause),
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (and_clause(clause))
{
@@ -842,29 +1357,18 @@ clause_selectivity(PlannerInfo *root,
((BoolExpr *) clause)->args,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (or_clause(clause))
{
- /*
- * Selectivities for an OR clause are computed as s1+s2 - s1*s2 to
- * account for the probable overlap of selected tuple sets.
- *
- * XXX is this too conservative?
- */
- ListCell *arg;
-
- s1 = 0.0;
- foreach(arg, ((BoolExpr *) clause)->args)
- {
- Selectivity s2 = clause_selectivity(root,
- (Node *) lfirst(arg),
- varRelid,
- jointype,
- sjinfo);
-
- s1 = s1 + s2 - s1 * s2;
- }
+ /* just call to clauselist_selectivity_or() */
+ s1 = clauselist_selectivity_or(root,
+ ((BoolExpr *) clause)->args,
+ varRelid,
+ jointype,
+ sjinfo,
+ conditions);
}
else if (is_opclause(clause) || IsA(clause, DistinctExpr))
{
@@ -973,7 +1477,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((RelabelType *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (IsA(clause, CoerceToDomain))
{
@@ -982,7 +1487,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((CoerceToDomain *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
/* Cache the result if possible */
@@ -1164,9 +1670,67 @@ clause_selectivity(PlannerInfo *root,
* that from the most selective clauses first, because that'll
* eliminate the buckets/items sooner (so we'll be able to skip
* them without inspection, which is more expensive).
+ *
+ * TODO All this is based on the assumption that the statistics represent
+ * the necessary dependencies, i.e. that if two columns are not in
+ * the same statistics, there's no dependency. If that's not the
+ * case, we may get misestimates, just like before. For example
+ * assume we have a table with three columns [a,b,c] with exactly
+ * the same values, and statistics on [a,b] and [b,c]. So something
+ * like this:
+ *
+ * CREATE TABLE test (a, b, c) AS SELECT i, i, i
+ * FROM generate_series(1,1000) s(i);
+ *
+ * ALTER TABLE test ADD STATISTICS (mcv) ON (a,b);
+ * ALTER TABLE test ADD STATISTICS (mcv) ON (b,c);
+ *
+ * ANALYZE test;
+ *
+ * EXPLAIN ANALYZE SELECT * FROM test
+ * WHERE (a < 10) AND (b < 20) AND (c < 10);
+ *
+ * The problem here is that the only shared column between the two
+ * statistics is 'b' so the probability will be computed like this
+ *
+ * P[(a < 10) & (b < 20) & (c < 10)]
+ * = P[(a < 10) & (b < 20)] * P[(c < 10) | (a < 10) & (b < 20)]
+ * = P[(a < 10) & (b < 20)] * P[(c < 10) | (b < 20)]
+ *
+ * or like this
+ *
+ * P[(a < 10) & (b < 20) & (c < 10)]
+ * = P[(b < 20) & (c < 10)] * P[(a < 10) | (b < 20) & (c < 10)]
+ * = P[(b < 20) & (c < 10)] * P[(a < 10) | (b < 20)]
+ *
+ * In both cases the conditional probabilities will be evaluated as
+ * 0.5, because they lack the other column (which would make it 1.0).
+ *
+ * Theoretically it might be possible to transfer the dependency,
+ * e.g. by building bitmap for [a,b] and then combine it with [b,c]
+ * by doing something like this:
+ *
+ * 1) build bitmap on [a,b] using [(a<10) & (b < 20)]
+ * 2) for each element in [b,c] check the bitmap
+ *
+ * But that's certainly nontrivial - for example the statistics may
+ * be different (MCV list vs. histogram) and/or the items may not
+ * match (e.g. MCV items or histogram buckets will be built
+ * differently). Also, for one value of 'b' there might be multiple
+ * MCV items (because of the other column values) with different
+ * bitmap values (some will match, some won't) - so it's not exactly
+ * bitmap but a partial match.
+ *
+ * Maybe a hash table with number of matches and mismatches (or
+ * maybe sums of frequencies) would work? The step (2) would then
+ * lookup the values and use that to weight the item somehow.
+ *
+ * Currently the only solution is to build statistics on all three
+ * columns.
*/
static Selectivity
-clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+clauselist_mv_selectivity(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
bool fullmatch = false;
Selectivity s1 = 0.0, s2 = 0.0;
@@ -1184,7 +1748,8 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
*/
/* Evaluate the MCV first. */
- s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ s1 = clauselist_mv_selectivity_mcvlist(root, mvstats,
+ clauses, conditions, is_or,
&fullmatch, &mcv_low);
/*
@@ -1197,7 +1762,8 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
/* FIXME if (fullmatch) without matching MCV item, use the mcv_low
* selectivity as upper bound */
- s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+ s2 = clauselist_mv_selectivity_histogram(root, mvstats,
+ clauses, conditions, is_or);
/* TODO clamp to <= 1.0 (or more strictly, when possible) */
return s1 + s2;
@@ -1226,24 +1792,683 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
{
Node *clause = (Node *) lfirst(l);
- /* ignore the result here - we only need the attnums */
- clause_is_mv_compatible(root, clause, varRelid, relid, &attnums,
- sjinfo, types);
- }
+ /* ignore the result here - we only need the attnums */
+ clause_is_mv_compatible(root, clause, varRelid, relid, &attnums,
+ sjinfo, types);
+ }
+
+ /*
+ * If there are not at least two attributes referenced by the clause(s),
+ * we can throw everything out (as we'll revert to simple stats).
+ */
+ if (bms_num_members(attnums) <= 1)
+ {
+ if (attnums != NULL)
+ pfree(attnums);
+ attnums = NULL;
+ *relid = InvalidOid;
+ }
+
+ return attnums;
+}
+
+/*
+ * Selects the best combination of multivariate statistics, where
+ * 'best' means:
+ *
+ * (a) covering the most attributes (referenced by clauses)
+ * (b) using the least number of multivariate stats
+ *
+ * There may be other optimality criteria, not considered in the initial
+ * implementation (more on that in the 'Weaknesses' section).
+ *
+ * This is pretty much equal to splitting the probability of clauses
+ * (aka selectivity) into a sequence of conditional probabilities, like
+ * this
+ *
+ * P(A,B,C,D) = P(A,B) * P(C|A,B) * P(D|A,B,C)
+ *
+ * and removing the attributes not referenced by the existing stats,
+ * under the assumption that there's no dependency (otherwise the DBA
+ * would create the stats).
+ *
+ *
+ * Algorithm
+ * ---------
+ * The algorithm is a recursive implementation of backtracking, with
+ * maximum 'depth' equal to the number of multi-variate statistics
+ * available on the table.
+ *
+ * It explores all the possible permutations of the stats.
+ *
+ * Whenever it considers adding the next statistics, the clauses it
+ * matches are divided into 'conditions' (clauses already matched by at
+ * least one previous statistics) and clauses that are estimated.
+ *
+ * Then several checks are performed:
+ *
+ * (a) The statistics covers at least 2 columns, referenced in the
+ * estimated clauses (otherwise multi-variate stats are useless).
+ *
+ * (b) The statistics covers at least 1 new column, i.e. a column not
+ * referenced by the already used stats (and the new column has
+ * to be referenced by the clauses, of course). Otherwise the
+ * statistics would not add any new information.
+ *
+ * There are some other sanity checks (e.g. that the stats must not be
+ * used twice etc.).
+ *
+ * Finally the new solution is compared to the currently best one, and
+ * if it's considered better, it's used instead.
+ *
+ *
+ * Weaknesses
+ * ----------
+ * The current implementation uses somewhat simplistic optimality criteria,
+ * suffering from the following weaknesses.
+ *
+ * (a) There may be multiple solutions with the same number of covered
+ * attributes and number of statistics (e.g. the same solution but
+ * with statistics in a different order). It's unclear which solution
+ * is the best one - in a sense all of them are equal.
+ *
+ * TODO It might be possible to compute estimate for each of those
+ * solutions, and then combine them to get the final estimate
+ * (e.g. by using average or median).
+ *
+ * (b) Does not consider that some types of stats are a better match for
+ * some types of clauses (e.g. an MCV list is a better match for
+ * equality clauses than a histogram).
+ *
+ * XXX Maybe MCV is almost always better / more accurate?
+ *
+ * But maybe this is pointless - generally, each column is either
+ * a label (it's not important whether because of the data type or
+ * how it's used), or a value with ordering that makes sense. So
+ * either a MCV list is more appropriate (labels) or a histogram
+ * (values with orderings).
+ *
+ * Not sure what to do with statistics mixing columns of both
+ * types - maybe it'd be better to invent a new type of stats
+ * combining MCV list and histogram (keeping a small histogram for
+ * each MCV item, and a separate histogram for values not on the
+ * MCV list). But that's not implemented at this moment.
+ *
+ * (c) Does not consider that some solutions may better exploit the
+ * dependencies. For example with clauses on columns [A,B,C,D] and
+ * statistics on [A,B,C] and [C,D] cover all the columns just like
+ * [A,B,C] and [B,C,D], but the latter probably exploits additional
+ * dependencies thanks to having 'B' in both stats (thus allowing
+ * using it as a condition for the second stats). Of course, if
+ * B and [C,D] are independent, this is untrue - but if we have that
+ * statistics created, it's a sign that the DBA/developer believes
+ * there's a dependency.
+ *
+ * (d) Does not consider the order of clauses, which may be significant.
+ * For example, when there's a mix of simple and complex clauses,
+ * i.e. something like
+ *
+ * (a=2) AND (b=3 OR (c=3 AND d=4)) AND (c=3)
+ *
+ * It may be better to evaluate the simple clauses first, and then
+ * use them as conditions for the complex clause.
+ *
+ * We can for example count number of different attributes
+ * referenced in the clause, and use that as a metric of complexity
+ * (lower number -> simpler). Maybe use ratio (#vars/#atts) or
+ * (#clauses/#atts) as secondary metrics? Also the general complexity
+ * of the clause (levels of nesting etc.) might be useful.
+ *
+ * Hopefully most clauses will be reasonably simple, though.
+ *
+ * Update: On second thought, I believe the order of clauses is
+ * determined by choosing the order of statistics, and therefore
+ * optimized by the current algorithm.
+ *
+ * TODO Consider adding a counter of attributes covered by previous
+ * stats (possibly tracking the number of how many stats reference
+ * it too), and use this 'dependency_count' when selecting the best
+ * solution (not sure how). Similarly to (a) it might be possible
+ * to build estimate for each solution (different criteria) and then
+ * combine them somehow.
+ *
+ * TODO The current implementation repeatedly walks through the previous
+ * stats, just to compute the number of covered attributes over and
+ * over. With non-trivial number of statistics this might be an
+ * issue, so maybe we should keep track of 'covered' attributes by
+ * each step, so that we can get rid of this. We'll need this
+ * information anyway (when splitting clauses into condition and
+ * the estimated part).
+ *
+ * TODO This needs to consider the conditions passed from the preceding
+ * and upper clauses (in complex cases), but only as conditions
+ * and not as estimated clauses. So it needs to somehow affect the
+ * score (the more conditions we use the better).
+ *
+ * TODO The algorithm should probably count number of Vars (not just
+ * attnums) when computing the 'score' of each solution. Computing
+ * the ratio of (num of all vars) / (num of condition vars) as a
+ * measure of how well the solution uses conditions might be
+ * useful.
+ *
+ * TODO This might be much easier if we kept Bitmapset of attributes
+ * covered by the stats up to that step.
+ *
+ * FIXME When comparing the solutions, we currently use this condition:
+ *
+ * ((current->nstats > (*best)->nstats))
+ *
+ * i.e. we're choosing solution with more stats, because with
+ * clauses
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
+ *
+ * and stats on [a,b], [b,c], [c,d] we want to choose the solution
+ * with all three stats, and not just [a,b], [c,d]. Otherwise we'd
+ * fail to exploit one of the dependencies.
+ *
+ * This is however a workaround for another issue - we're not
+ * tracking number of 'dependencies' covered by the solution, only
+ * number of clauses, and that's the same for both solutions.
+ * ([a,b], [c,d]) and ([a,b], [b,c], [c,d]) both cover all 4 clauses.
+ *
+ * Once a suitable metric is added, we want to choose the solution
+ * with less stats, assuming it covers the same number of clauses
+ * and exploits the same number of dependencies.
+ */
+static void
+choose_mv_statistics_exhaustive(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ CHECK_FOR_INTERRUPTS();
+
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ /*
+ * Now try to apply each statistics, matching at least two attributes,
+ * unless it's already used in one of the previous steps.
+ */
+ for (i = 0; i < nmvstats; i++)
+ {
+ int c;
+
+ int ncovered_clauses = 0; /* number of covered clauses */
+ int ncovered_conditions = 0; /* number of covered conditions */
+ int nattnums = 0; /* number of covered attributes */
+
+ Bitmapset *all_attnums = NULL;
+ Bitmapset *new_attnums = NULL;
+
+ /* skip statistics that were already used or eliminated */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /*
+ * See if we have clauses covered by this statistics, but not
+ * yet covered by any of the preceding ones.
+ */
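+
+ /*
+ * Note: cover_map is laid out as a row-major boolean matrix with
+ * nmvstats rows and nclauses columns, so cover_map[i * nclauses + c]
+ * says whether statistics 'i' covers clause 'c' (condition_map works
+ * the same way, with nconditions columns).
+ */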
+ for (c = 0; c < nclauses; c++)
+ {
+ bool covered = false;
+ Bitmapset *clause_attnums = clauses_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this clause is not covered by this stats, we can't
+ * use the stats to estimate that at all.
+ */
+ if (! cover_map[i * nclauses + c])
+ continue;
+
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add the
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
+
+ /* let's see if it's covered by any of the previous stats */
+ for (j = 0; j < step; j++)
+ {
+ /* already covered by the previous stats */
+ if (cover_map[current->stats[j] * nclauses + c])
+ covered = true;
+
+ if (covered)
+ break;
+ }
+
+ /* if already covered, continue with the next clause */
+ if (covered)
+ {
+ ncovered_conditions += 1;
+ continue;
+ }
+
+ /*
+ * OK, this clause is covered by this statistics (and not by
+ * any of the previous ones)
+ */
+ ncovered_clauses += 1;
+
+ /* add the attnums into attnums from 'new clauses'
+ * (currently disabled - new_attnums is unused) */
+ /* new_attnums = bms_union(new_attnums, clause_attnums); */
+ }
+
+ /* can't have more new clauses than original clauses */
+ Assert(nclauses >= ncovered_clauses);
+ Assert(ncovered_clauses >= 0); /* mostly paranoia */
+
+ nattnums = bms_num_members(all_attnums);
+
+ /* free all the bitmapsets - we don't need them anymore */
+ bms_free(all_attnums);
+ bms_free(new_attnums);
+
+ all_attnums = NULL;
+ new_attnums = NULL;
+
+ /*
+ * Now walk through the conditions (passed from above) and count
+ * those covered by this statistics.
+ */
+ for (c = 0; c < nconditions; c++)
+ {
+ Bitmapset *clause_attnums = conditions_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this condition is not covered by this stats, it's of
+ * no use here.
+ */
+ if (! condition_map[i * nconditions + c])
+ continue;
+
+ /* count this as a condition */
+ ncovered_conditions += 1;
+
+ /*
+ * Now we know we'll use this condition, so let's add its
+ * attributes to the set of attnums usable with this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
+ }
+
+ /*
+ * Let's mark the statistics as 'ruled out' - either we'll use
+ * it (and proceed to the next step), or it's incompatible.
+ */
+ ruled_out[i] = step;
+
+ /*
+ * There are no clauses usable with this statistics (not already
+ * covered by some of the previous stats).
+ *
+ * Similarly, if the clauses only use a single attribute, we
+ * can't really use that.
+ */
+ if ((ncovered_clauses == 0) || (nattnums < 2))
+ continue;
+
+ /*
+ * TODO Not sure if it's possible to add a clause referencing
+ * only attributes already covered by previous stats?
+ * Introducing only some new dependency, not a new
+ * attribute. Couldn't come up with an example, though.
+ * Might be worth adding some assert.
+ */
+
+ /*
+ * got a suitable statistics - let's update the current solution,
+ * maybe use it as the best solution
+ */
+ current->nclauses += ncovered_clauses;
+ current->nconditions += ncovered_conditions;
+ current->nstats += 1;
+ current->stats[step] = i;
+
+ /*
+ * We can never cover more clauses, or use more stats than we
+ * actually have at the beginning.
+ */
+ Assert(nclauses >= current->nclauses);
+ Assert(nmvstats >= current->nstats);
+ Assert(step < nmvstats);
+
+ /* we can't get more conditions than clauses and conditions combined
+ *
+ * FIXME This assert does not work because we count the conditions
+ * repeatedly (once for each statistics covering it).
+ */
+ /* Assert((nconditions + nclauses) >= current->nconditions); */
+
+ if (*best == NULL)
+ {
+ *best = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ (*best)->nstats = 0;
+ (*best)->nclauses = 0;
+ (*best)->nconditions = 0;
+ }
+
+ /* see if it's better than the current 'best' solution */
+ if ((current->nclauses > (*best)->nclauses) ||
+ ((current->nclauses == (*best)->nclauses) &&
+ ((current->nstats > (*best)->nstats))))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+
+ /*
+ * The recursion only makes sense if we haven't covered all the
+ * attributes (then adding stats is not really possible).
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_exhaustive(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= ncovered_clauses;
+ current->nconditions -= ncovered_conditions;
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[i] = -1;
+
+ Assert(current->nclauses >= 0);
+ Assert(current->nstats >= 0);
+ }
+
+ /* reset all statistics as 'incompatible' in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+}
+
+/*
+ * Greedy search for a multivariate solution - a sequence of statistics
+ * covering the clauses. This chooses the "best" statistics at each step,
+ * so the resulting solution may not be the best solution globally, but
+ * this produces the solution in only N steps (where N is the number of
+ * statistics), while the exhaustive approach may have to walk through
+ * ~N! combinations (although some of those are terminated early).
+ *
+ * TODO There are probably other metrics we might use - e.g. using
+ * number of columns (num_cond_columns / num_cov_columns), which
+ * might work better with a mix of simple and complex clauses.
+ *
+ * TODO Also the choice at the very first step should be handled
+ * in a special way, because there will be 0 conditions at that
+ * moment, so there needs to be some other criteria - e.g. using
+ * the simplest (or most complex?) clause might be a good idea.
+ *
+ * TODO We might also select multiple stats using different criteria,
+ * and branch the search. This is however tricky, because if we
+ * choose k statistics at each step, we get k^N branches to
+ * walk through (with N steps). That's not really good with
+ * large number of stats (yet better than exhaustive search).
+ */
+static void
+choose_mv_statistics_greedy(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
+ int best_stat = -1;
+ double gain, max_gain = -1.0;
+
+ /*
+ * Bitmap tracking which clauses are already covered (by the previous
+ * statistics) and may thus serve only as a condition in this step.
+ */
+ bool *covered_clauses = (bool*)palloc0(nclauses);
+
+ /*
+ * Number of clauses and columns covered by each statistics - this
+ * includes both conditions and clauses covered by the statistics for
+ * the first time. The number of columns may count some columns
+ * repeatedly - if a column is shared by multiple clauses, it will
+ * be counted once for each clause (covered by the statistics).
+ * So with two clauses [(a=1 OR b=2),(a<2 OR c>1)] the column "a"
+ * will be counted twice (if both clauses are covered).
+ *
+ * The values for ruled-out statistics (that can't be applied) are
+ * not computed, because that'd be pointless.
+ */
+ int *num_cov_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cov_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Same as above, but this only includes clauses that are already
+ * covered by the previous stats (and the current one).
+ */
+ int *num_cond_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cond_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Number of attributes for each clause.
+ *
+ * TODO Might be computed in choose_mv_statistics() and then passed
+ * here, but then the function would not have the same signature
+ * as _exhaustive().
+ */
+ int *attnum_counts = (int*)palloc0(sizeof(int) * nclauses);
+ int *attnum_cond_counts = (int*)palloc0(sizeof(int) * nconditions);
+
+ CHECK_FOR_INTERRUPTS();
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ /* compute attributes (columns) for each clause */
+ for (i = 0; i < nclauses; i++)
+ attnum_counts[i] = bms_num_members(clauses_attnums[i]);
+
+ /* compute attributes (columns) for each condition */
+ for (i = 0; i < nconditions; i++)
+ attnum_cond_counts[i] = bms_num_members(conditions_attnums[i]);
+
+ /* see which clauses are already covered at this point (by previous stats) */
+ for (i = 0; i < step; i++)
+ for (j = 0; j < nclauses; j++)
+ covered_clauses[j] |= (cover_map[current->stats[i] * nclauses + j]);
+
+ /* which remaining statistics covers most clauses / uses most conditions? */
+ for (i = 0; i < nmvstats; i++)
+ {
+ Bitmapset *attnums_covered = NULL;
+ Bitmapset *attnums_conditions = NULL;
+
+ /* skip stats that are already ruled out (either used or inapplicable) */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* count covered clauses and conditions (for the statistics) */
+ for (j = 0; j < nclauses; j++)
+ {
+ if (cover_map[i * nclauses + j])
+ {
+ Bitmapset *new = bms_union(attnums_covered, clauses_attnums[j]);
+
+ /* get rid of the old bitmap and keep the unified result */
+ bms_free(attnums_covered);
+ attnums_covered = new;
+
+ num_cov_clauses[i] += 1;
+ num_cov_columns[i] += attnum_counts[j];
+
+ /* is the clause already covered (i.e. a condition)? */
+ if (covered_clauses[j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_counts[j];
+ new = bms_union(attnums_conditions,
+ clauses_attnums[j]);
+
+ bms_free(attnums_conditions);
+ attnums_conditions = new;
+ }
+ }
+ }
+
+ /* if all covered clauses are covered by prev stats (thus conditions) */
+ if (num_cov_clauses[i] == num_cond_clauses[i])
+ ruled_out[i] = step;
+
+ /* same if there are no new attributes */
+ else if (bms_num_members(attnums_conditions) == bms_num_members(attnums_covered))
+ ruled_out[i] = step;
+
+ bms_free(attnums_covered);
+ bms_free(attnums_conditions);
+
+ /* if the statistics is inapplicable, try the next one */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* now let's walk through conditions and count the covered */
+ for (j = 0; j < nconditions; j++)
+ {
+ if (condition_map[i * nconditions + j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_cond_counts[j];
+ }
+ }
+
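+ /*
+ * Illustration (made-up numbers): if this statistics covers clauses
+ * referencing 6 columns in total (num_cov_columns) and 4 of those
+ * columns come from clauses/conditions that are already covered
+ * (num_cond_columns), the gain is 4/6 = 0.67 - i.e. two thirds of
+ * the columns act as conditions, which is what we maximize.
+ */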
+ /* otherwise see if this improves the interesting metrics */
+ gain = num_cond_columns[i] / (double)num_cov_columns[i];
+
+ if (gain > max_gain)
+ {
+ max_gain = gain;
+ best_stat = i;
+ }
+ }
+
+ /*
+ * Have we found a suitable statistics? Add it to the solution and
+ * try next step.
+ */
+ if (best_stat != -1)
+ {
+ /* mark the statistics, so that we skip it in next steps */
+ ruled_out[best_stat] = step;
+
+ /* allocate current solution if necessary */
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ current->nclauses += num_cov_clauses[best_stat];
+ current->nconditions += num_cond_clauses[best_stat];
+ current->stats[step] = best_stat;
+ current->nstats++;
+
+ if (*best == NULL)
+ {
+ (*best) = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ else
+ {
+ /* see if this is a better solution */
+ double current_gain = (double)current->nconditions / current->nclauses;
+ double best_gain = (double)(*best)->nconditions / (*best)->nclauses;
+
+ if ((current_gain > best_gain) ||
+ ((current_gain == best_gain) && (current->nstats < (*best)->nstats)))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ }
- /*
- * If there are not at least two attributes referenced by the clause(s),
- * we can throw everything out (as we'll revert to simple stats).
- */
- if (bms_num_members(attnums) <= 1)
- {
- if (attnums != NULL)
- pfree(attnums);
- attnums = NULL;
- *relid = InvalidOid;
+ /*
+ * The recursion only makes sense if we haven't covered all the
+ * attributes (then adding stats is not really possible).
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_greedy(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= num_cov_clauses[best_stat];
+ current->nconditions -= num_cond_clauses[best_stat];
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[best_stat] = -1;
}
- return attnums;
+ /* reset all statistics eliminated in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+ /* free everything allocated in this step */
+ pfree(covered_clauses);
+ pfree(attnum_counts);
+ pfree(attnum_cond_counts);
+ pfree(num_cov_clauses);
+ pfree(num_cov_columns);
+ pfree(num_cond_clauses);
+ pfree(num_cond_columns);
}
/*
@@ -1314,56 +2539,498 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
* TODO This will probably have to consider compatibility of clauses,
* because 'dependencies' will probably work only with equality
* clauses.
+ *
+ * TODO Another way to make the optimization problems smaller might
+ * be splitting the statistics into several disjoint subsets, i.e.
+ * if we can split the graph of statistics (after the elimination)
+ * into multiple components (so that stats in different components
+ * share no attributes), we can do the optimization for each
+ * component separately.
+ *
+ * TODO Another possible optimization might be removing redundant
+ * statistics - if statistics S1 covers S2 (covers S2 attributes
+ * and possibly some more), we can probably remove S2. What
+ * actually matters are attributes from covered clauses (not all
+ * the original attributes). This might however prefer larger,
+ * and thus less accurate, statistics.
+ *
+ * TODO If we could compute what is a "perfect solution" maybe we could
+ * terminate the search after reaching ~90% of it? Say, if we knew
+ * that we can cover 10 clauses and reuse 8 dependencies, maybe
+ * covering 9 clauses and 7 dependencies would be OK?
*/
-static MVStatisticInfo *
-choose_mv_statistics(List *stats, Bitmapset *attnums)
+static List*
+choose_mv_statistics(PlannerInfo *root, List *stats,
+ List *clauses, List *conditions,
+ Oid varRelid, SpecialJoinInfo *sjinfo, int type)
{
- int i;
- ListCell *lc;
+ int i, j;
+ mv_solution_t *best = NULL;
+ List *result = NIL;
+ ListCell *l;
+
+ int nmvstats = list_length(stats);
+ MVStatisticInfo *mvstats
+ = (MVStatisticInfo *)palloc0(nmvstats * sizeof(MVStatisticInfo));
+
+ /* pass only stats matching at least two attributes (from clauses) */
+ MVStatisticInfo *mvstats_filtered
+ = (MVStatisticInfo*)palloc0(nmvstats * sizeof(MVStatisticInfo));
- MVStatisticInfo *choice = NULL;
+ int nmvstats_filtered;
+ bool repeat = true;
+ bool *clause_cover_map = NULL,
+ *condition_cover_map = NULL;
+ int *ruled_out = NULL;
+
+ /* build bitmapsets for all stats and clauses */
+ Bitmapset **stats_attnums;
+ Bitmapset **clauses_attnums;
+ Bitmapset **conditions_attnums;
+
+ int nclauses, nconditions;
+ Node ** clauses_array;
+ Node ** conditions_array;
+
+ /* copy lists, so that we can free them during elimination easily */
+ clauses = list_copy(clauses);
+ conditions = list_copy(conditions);
+
+ /* convert the list of stats into array, to make it easier/faster */
+ nmvstats = 0;
+ foreach (l, stats)
+ {
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(l);
+
+ /* we only care about stats with MCV/histogram in this part */
+ if (! (info->mcv_built || info->hist_built))
+ continue;
- int current_matches = 1; /* goal #1: maximize */
- int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+ memcpy(&mvstats[nmvstats], info, sizeof(MVStatisticInfo));
+ nmvstats++;
+ }
/*
- * Walk through the statistics (simple array with nmvstats elements)
- * and for each one count the referenced attributes (encoded in
- * the 'attnums' bitmap).
+ * Reduce the optimization problem size as much as possible.
+ *
+ * Eliminate clauses and conditions not covered by any statistics,
+ * or statistics not matching at least two attributes (one of them
+ * has to be in a regular clause).
+ *
+ * It's possible that removing a statistics in one iteration
+ * eliminates a clause in the next one, so we'll repeat this until
+ * no clauses/stats get eliminated in an iteration.
+ *
+ * This can only happen after eliminating a statistics - clauses are
+ * eliminated first, so statistics always reflect that.
*/
- foreach (lc, stats)
+ while (repeat)
{
- MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+ /* pass only mv-compatible clauses covered by at least one statistics */
+ List *compatible_clauses = NIL;
+ List *compatible_conditions = NIL;
- /* columns matching this statistics */
- int matches = 0;
+ Bitmapset *compatible_attnums = NULL;
+ Bitmapset *condition_attnums = NULL;
+ Bitmapset *all_attnums = NULL;
+
+ /*
+ * Clauses
+ *
+ * Walk through clauses and keep only those covered by at least
+ * one of the statistics we still have. Also, collect bitmap of
+ * attributes so that we can make sure we add at least one new.
+ */
+ foreach (l, clauses)
+ {
+ Node *clause = (Node*)lfirst(l);
+ Bitmapset *clause_attnums = NULL;
+ Index relid;
- int2vector * attrs = info->stakeys;
- int numattrs = attrs->dim1;
+ /*
+ * The clause has to be mv-compatible (suitable operators etc.).
+ */
+ if (! clause_is_mv_compatible(root, clause, varRelid,
+ &relid, &clause_attnums, sjinfo, type))
+ continue;
- /* skip dependencies-only stats */
- if (! (info->mcv_built || info->hist_built))
- continue;
+ /* is there a statistics covering this clause? */
+ for (i = 0; i < nmvstats; i++)
+ {
+ int k, matches = 0;
+ for (k = 0; k < mvstats[i].stakeys->dim1; k++)
+ {
+ if (bms_is_member(mvstats[i].stakeys->values[k],
+ clause_attnums))
+ matches += 1;
+ }
+
+ /*
+ * The clause is compatible if all attributes it references
+ * are covered by the statistics.
+ */
+ if (bms_num_members(clause_attnums) == matches)
+ {
+ compatible_attnums = bms_union(compatible_attnums,
+ clause_attnums);
+ compatible_clauses = lappend(compatible_clauses,
+ clause);
+ break;
+ }
+ }
+
+ bms_free(clause_attnums);
+ }
+
+ /* we can't have more compatible clauses than source clauses */
+ Assert(list_length(clauses) >= list_length(compatible_clauses));
+
+ /* work with only compatible clauses from now */
+ list_free(clauses);
+ clauses = compatible_clauses;
+
+ /*
+ * Conditions
+ *
+ * Walk through conditions and keep only those covered by at least
+ * one of the statistics we still have. Also, collect a bitmap of
+ * the attributes they reference.
+ */
+
+ /* next, generate bitmap of attnums from all mv_compatible conditions */
+ foreach (l, conditions)
+ {
+ Node *clause = (Node*)lfirst(l);
+ Bitmapset *clause_attnums = NULL;
+ Index relid;
+
+ /*
+ * The clause has to be mv-compatible (suitable operators etc.).
+ */
+ if (! clause_is_mv_compatible(root, clause, varRelid,
+ &relid, &clause_attnums, sjinfo, type))
+ continue;
+
+ /* is there a statistics covering this clause? */
+ for (i = 0; i < nmvstats; i++)
+ {
+ int k, matches = 0;
+ for (k = 0; k < mvstats[i].stakeys->dim1; k++)
+ {
+ if (bms_is_member(mvstats[i].stakeys->values[k],
+ clause_attnums))
+ matches += 1;
+ }
+
+ if (bms_num_members(clause_attnums) == matches)
+ {
+ condition_attnums = bms_union(condition_attnums,
+ clause_attnums);
+ compatible_conditions = lappend(compatible_conditions,
+ clause);
+ break;
+ }
+ }
+
+ bms_free(clause_attnums);
+ }
+
+ /* we can't have more compatible conditions than source conditions */
+ Assert(list_length(conditions) >= list_length(compatible_conditions));
+
+ /* keep only compatible conditions */
+ list_free(conditions);
+ conditions = compatible_conditions;
+
+ /* get a union of attnums (from conditions and clauses) */
+ all_attnums = bms_union(compatible_attnums, condition_attnums);
+
+ /*
+ * Statistics
+ *
+ * Walk through statistics and only keep those covering at least
+ * one new attribute (i.e. one from the clauses, not just the
+ * conditions) and at least two attributes from clauses and
+ * conditions combined.
+ */
+ nmvstats_filtered = 0;
+
+ for (i = 0; i < nmvstats; i++)
+ {
+ int k;
+ int matches_new = 0,
+ matches_all = 0;
+
+ for (k = 0; k < mvstats[i].stakeys->dim1; k++)
+ {
+ /* attribute covered by new clause(s) */
+ if (bms_is_member(mvstats[i].stakeys->values[k],
+ compatible_attnums))
+ matches_new += 1;
+
+ /* attribute covered by clause(s) or condition(s) */
+ if (bms_is_member(mvstats[i].stakeys->values[k],
+ all_attnums))
+ matches_all += 1;
+ }
+
+ /* check we have enough attributes for this statistics */
+ if ((matches_new >= 1) && (matches_all >= 2))
+ {
+ mvstats_filtered[nmvstats_filtered] = mvstats[i];
+ nmvstats_filtered += 1;
+ }
+ }
+
+ /* we can't have more useful stats than we had originally */
+ Assert(nmvstats >= nmvstats_filtered);
+
+ /* if we've eliminated a statistics, trigger another round */
+ repeat = (nmvstats > nmvstats_filtered);
+
+ /* work only with filtered statistics from now */
+ if (nmvstats_filtered < nmvstats)
+ {
+ nmvstats = nmvstats_filtered;
+ memcpy(mvstats, mvstats_filtered, sizeof(MVStatisticInfo)*nmvstats);
+ nmvstats_filtered = 0;
+ }
+ }
+
+ /* only do the optimization if we have clauses/statistics */
+ if ((nmvstats == 0) || (list_length(clauses) == 0))
+ return NIL;
+
+ stats_attnums
+ = (Bitmapset **)palloc0(nmvstats * sizeof(Bitmapset *));
+
+ /*
+ * TODO We should sort the stats to make the order deterministic,
+ * otherwise we may get different estimates on different
+ * executions - if there are multiple "equally good" solutions,
+ * we'll keep the first solution we see.
+ *
+ * Sorting by OID probably is not the right solution though,
+ * because we'd like it to be somehow reproducible,
+ * irrespective of the order of ADD STATISTICS commands.
+ * So maybe statkeys?
+ */
+
+ for (i = 0; i < nmvstats; i++)
+ {
+ for (j = 0; j < mvstats[i].stakeys->dim1; j++)
+ stats_attnums[i] = bms_add_member(stats_attnums[i],
+ mvstats[i].stakeys->values[j]);
+ }
+
+ /*
+ * Now let's remove redundant statistics, covering the same attributes
+ * as some other stats, when restricted to the attributes from
+ * remaining clauses.
+ *
+ * When a redundancy is detected, we simply keep the smaller
+ * statistics (fewer columns), on the assumption that it's
+ * more accurate and faster to process. That might be incorrect for
+ * two reasons - first, the accuracy really depends on number of
+ * buckets/MCV items, not the number of columns. Second, we might
+ * prefer MCV lists over histograms or something like that.
+ *
+ * XXX This might be done in the while loop above, but it does not
+ * change the result at all (or is not supposed to), so let's do
+ * that only once.
+ */
+ {
+ /* by default, none of the stats is redundant */
+ bool *redundant = palloc0(nmvstats * sizeof(bool));
+
+ /* we only expect a single varno here */
+ Relids varnos = pull_varnos((Node*)clauses);
+
+ /* get the varattnos (skip system attributes, although that
+ * should be impossible thanks to previous filtering out of
+ * incompatible clauses) */
+ Bitmapset *varattnos = get_varattnos((Node*)clauses,
+ bms_singleton_member(varnos));
+
+ for (i = 1; i < nmvstats; i++)
+ {
+ /* intersect with current statistics */
+ Bitmapset *curr = bms_intersect(stats_attnums[i], varattnos);
+
+ /* walk through 'previous' stats and check redundancy */
+ for (j = 0; j < i; j++)
+ {
+ /* intersect with current statistics */
+ Bitmapset *prev;
+
+ /* skip stats already identified as redundant */
+ if (redundant[j])
+ continue;
- /* count columns covered by the histogram */
- for (i = 0; i < numattrs; i++)
- if (bms_is_member(attrs->values[i], attnums))
- matches++;
+ prev = bms_intersect(stats_attnums[j], varattnos);
+
+ switch (bms_subset_compare(curr, prev))
+ {
+ case BMS_EQUAL:
+ /*
+ * Use the smaller one (hopefully more accurate).
+ * If both have the same size, use the first one.
+ */
+ if (mvstats[i].stakeys->dim1 >= mvstats[j].stakeys->dim1)
+ redundant[i] = TRUE;
+ else
+ redundant[j] = TRUE;
+
+ break;
+
+ case BMS_SUBSET1: /* curr is subset of prev */
+ redundant[i] = TRUE;
+ break;
+
+ case BMS_SUBSET2: /* prev is subset of curr */
+ redundant[j] = TRUE;
+ break;
+
+ case BMS_DIFFERENT:
+ /* do nothing - keep both stats */
+ break;
+ }
+
+ bms_free(prev);
+ }
+
+ bms_free(curr);
+ }
+
+ /* now, let's remove the redundant statistics from the arrays */
+ j = 0;
+ for (i = 0; i < nmvstats; i++)
+ {
+ if (redundant[i])
+ continue;
+
+ stats_attnums[j] = stats_attnums[i];
+ mvstats[j] = mvstats[i];
+
+ j++;
+ }
+
+ nmvstats = j;
+ }
+
+ /* collect clauses and a bitmap of attnums for each */
+ nclauses = 0;
+ clauses_attnums = (Bitmapset **)palloc0(list_length(clauses)
+ * sizeof(Bitmapset *));
+ clauses_array = (Node **)palloc0(list_length(clauses)
+ * sizeof(Node *));
+
+ foreach (l, clauses)
+ {
+ Index relid;
+ Bitmapset * attnums = NULL;
/*
- * Use this statistics when it improves the number of matches or
- * when it matches the same number of attributes but is smaller.
+ * The clause has to be mv-compatible (suitable operators etc.).
*/
- if ((matches > current_matches) ||
- ((matches == current_matches) && (current_dims > numattrs)))
+ if (! clause_is_mv_compatible(root, (Node *)lfirst(l), varRelid,
+ &relid, &attnums, sjinfo, type))
+ elog(ERROR, "should not get non-mv-compatible cluase");
+
+ clauses_attnums[nclauses] = attnums;
+ clauses_array[nclauses] = (Node *)lfirst(l);
+ nclauses += 1;
+ }
+
+ /* collect conditions and bitmap of attnums */
+ nconditions = 0;
+ conditions_attnums = (Bitmapset **)palloc0(list_length(conditions)
+ * sizeof(Bitmapset *));
+ conditions_array = (Node **)palloc0(list_length(conditions)
+ * sizeof(Node *));
+
+ foreach (l, conditions)
+ {
+ Index relid;
+ Bitmapset * attnums = NULL;
+
+ /* conditions are mv-compatible (thanks to the reduction) */
+ if (! clause_is_mv_compatible(root, (Node *)lfirst(l), varRelid,
+ &relid, &attnums, sjinfo, type))
+ elog(ERROR, "should not get non-mv-compatible cluase");
+
+ conditions_attnums[nconditions] = attnums;
+ conditions_array[nconditions] = (Node *)lfirst(l);
+ nconditions += 1;
+ }
+
+ /*
+ * Build bitmaps with info about which clauses/conditions are
+ * covered by each statistics (so that we don't need to call
+ * bms_is_subset over and over again).
+ */
+ clause_cover_map = (bool*)palloc0(nclauses * nmvstats * sizeof(bool));
+ condition_cover_map = (bool*)palloc0(nconditions * nmvstats * sizeof(bool));
+ ruled_out = (int*)palloc0(nmvstats * sizeof(int));
+
+ for (i = 0; i < nmvstats; i++)
+ {
+ ruled_out[i] = -1; /* not ruled out by default */
+ for (j = 0; j < nclauses; j++)
+ {
+ clause_cover_map[i * nclauses + j]
+ = bms_is_subset(clauses_attnums[j],
+ stats_attnums[i]);
+ }
+
+ for (j = 0; j < nconditions; j++)
+ {
+ condition_cover_map[i * nconditions + j]
+ = bms_is_subset(conditions_attnums[j],
+ stats_attnums[i]);
+ }
+ }
+
+ /* do the optimization itself */
+ if (mvstat_search_type == MVSTAT_SEARCH_EXHAUSTIVE)
+ choose_mv_statistics_exhaustive(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+ else
+ choose_mv_statistics_greedy(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+
+ /* maybe we should leave the cleanup up to the memory context */
+ pfree(mvstats_filtered);
+ pfree(stats_attnums);
+ pfree(clauses_attnums);
+ pfree(clauses_array);
+ pfree(conditions_attnums);
+ pfree(conditions_array);
+ pfree(clause_cover_map);
+ pfree(condition_cover_map);
+ pfree(ruled_out);
+
+ if (best != NULL)
+ {
+ for (i = 0; i < best->nstats; i++)
{
- choice = info;
- current_matches = matches;
- current_dims = numattrs;
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[best->stats[i]], sizeof(MVStatisticInfo));
+ result = lappend(result, info);
}
+ pfree(best);
}
- return choice;
+ pfree(mvstats);
+
+ return result;
}
@@ -1639,6 +3306,51 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
return false;
}
+
+static Bitmapset *
+clause_mv_get_attnums(PlannerInfo *root, Node *clause)
+{
+ Bitmapset * attnums = NULL;
+
+ /* Extract clause from restrict info, if needed. */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /*
+ * Only simple opclauses and IS NULL tests are compatible with
+ * multivariate stats at this point.
+ */
+ if ((is_opclause(clause))
+ && (list_length(((OpExpr *) clause)->args) == 2))
+ {
+ OpExpr *expr = (OpExpr *) clause;
+
+ if (IsA(linitial(expr->args), Var))
+ attnums = bms_add_member(attnums,
+ ((Var*)linitial(expr->args))->varattno);
+ else
+ attnums = bms_add_member(attnums,
+ ((Var*)lsecond(expr->args))->varattno);
+ }
+ else if (IsA(clause, NullTest)
+ && IsA(((NullTest*)clause)->arg, Var))
+ {
+ attnums = bms_add_member(attnums,
+ ((Var*)((NullTest*)clause)->arg)->varattno);
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ ListCell *l;
+ foreach (l, ((BoolExpr*)clause)->args)
+ {
+ attnums = bms_join(attnums,
+ clause_mv_get_attnums(root, (Node*)lfirst(l)));
+ }
+ }
+
+ return attnums;
+}
+
/*
* Performs reduction of clauses using functional dependencies, i.e.
* removes clauses that are considered redundant. It simply walks
@@ -2071,22 +3783,26 @@ clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
* as the clauses are processed (and skip items that are 'match').
*/
static Selectivity
-clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats, bool *fullmatch,
- Selectivity *lowsel)
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or,
+ bool *fullmatch, Selectivity *lowsel)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
MCVList mcvlist = NULL;
+
int nmatches = 0;
+ int nconditions = 0;
/* match/mismatch bitmap for each MCV item */
char * matches = NULL;
+ char * condition_matches = NULL;
Assert(clauses != NIL);
- Assert(list_length(clauses) >= 2);
+ Assert(list_length(clauses) >= 1);
/* there's no MCV list built yet */
if (! mvstats->mcv_built)
@@ -2097,17 +3813,44 @@ clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
Assert(mcvlist != NULL);
Assert(mcvlist->nitems > 0);
- /* by default all the MCV items match the clauses fully */
- matches = palloc0(sizeof(char) * mcvlist->nitems);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
-
/* number of matching MCV items */
nmatches = mcvlist->nitems;
+ nconditions = mcvlist->nitems;
+
+ /* conditions (always AND-connected) */
+ condition_matches = palloc0(sizeof(char) * nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+ /* by default all the MCV items match the clauses fully (AND) or
+ * not at all (OR) */
+ matches = palloc0(sizeof(char) * nmatches);
+
+ if (is_or)
+ memset(matches, MVSTATS_MATCH_NONE, sizeof(char)*nmatches);
+ else
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+
+ /*
+ * build the match bitmap for the conditions (conditions are always
+ * connected by AND)
+ */
+ if (conditions != NIL)
+ nconditions = update_match_bitmap_mcvlist(root, conditions,
+ mvstats->stakeys, mcvlist,
+ nconditions, condition_matches,
+ lowsel, fullmatch, false);
+
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all MCV items, even those
+ * ruled out by the conditions. The final result would be the
+ * same, but skipping them might be faster.
+ */
nmatches = update_match_bitmap_mcvlist(root, clauses,
mvstats->stakeys, mcvlist,
- nmatches, matches,
- lowsel, fullmatch, false);
+ ((is_or) ? 0 : nmatches), matches,
+ lowsel, fullmatch, is_or);
/* sum frequencies for all the matching MCV items */
for (i = 0; i < mcvlist->nitems; i++)
@@ -2115,14 +3858,25 @@ clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
/* used to 'scale' for MCV lists not covering all tuples */
u += mcvlist->items[i]->frequency;
+ /* skip MCV items not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
if (matches[i] != MVSTATS_MATCH_NONE)
s += mcvlist->items[i]->frequency;
+
+ t += mcvlist->items[i]->frequency;
}
pfree(matches);
+ pfree(condition_matches);
pfree(mcvlist);
- return s*u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
+ return (s / t) * u;
}
/*
@@ -2520,13 +4274,16 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
* this is not uncommon, but for histograms it's not that clear.
*/
static Selectivity
-clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats)
+clauselist_mv_selectivity_histogram(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
int nmatches = 0;
+ int nconditions = 0;
char *matches = NULL;
+ char *condition_matches = NULL;
MVSerializedHistogram mvhist = NULL;
@@ -2539,36 +4296,77 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
Assert (mvhist != NULL);
Assert (clauses != NIL);
- Assert (list_length(clauses) >= 2);
+ Assert (list_length(clauses) >= 1);
+
+ nmatches = mvhist->nbuckets;
+ nconditions = mvhist->nbuckets;
/*
* Bitmap of bucket matches (mismatch, partial, full). By default
* all buckets fully match (AND) or not at all (OR), and the update
* below adjusts that.
*/
- matches = palloc0(sizeof(char) * mvhist->nbuckets);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+ matches = palloc0(sizeof(char) * nmatches);
- nmatches = mvhist->nbuckets;
+ if (is_or)
+ memset(matches, MVSTATS_MATCH_NONE, sizeof(char)*nmatches);
+ else
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+
+ condition_matches = palloc0(sizeof(char)*nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+
+ /* build the match bitmap for the conditions (always AND-connected) */
+ if (conditions != NIL)
+ update_match_bitmap_histogram(root, conditions,
+ mvstats->stakeys, mvhist,
+ nconditions, condition_matches, false);
- /* build the match bitmap */
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all buckets, even those
+ * ruled out by the conditions. The final result should be
+ * the same, but it might be faster.
+ */
update_match_bitmap_histogram(root, clauses,
mvstats->stakeys, mvhist,
- nmatches, matches, false);
+ ((is_or) ? 0 : nmatches), matches,
+ is_or);
/* now, walk through the buckets and sum the selectivities */
for (i = 0; i < mvhist->nbuckets; i++)
{
+ float coeff = 1.0;
+
+ /* skip buckets not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+ else if (condition_matches[i] == MVSTATS_MATCH_PARTIAL)
+ coeff = 0.5;
+
+ t += coeff * mvhist->buckets[i]->ntuples;
+
if (matches[i] == MVSTATS_MATCH_FULL)
- s += mvhist->buckets[i]->ntuples;
+ s += coeff * mvhist->buckets[i]->ntuples;
else if (matches[i] == MVSTATS_MATCH_PARTIAL)
- s += 0.5 * mvhist->buckets[i]->ntuples;
+ /*
+ * TODO If both conditions and clauses match partially, this
+ * will use a 0.25 match - not sure if that's the right
+ * solution, but it seems reasonable.
+ */
+ s += coeff * 0.5 * mvhist->buckets[i]->ntuples;
}
/* release the allocated bitmap and deserialized histogram */
pfree(matches);
+ pfree(condition_matches);
pfree(mvhist);
- return s;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
+ return (s / t);
}
/*
@@ -3191,11 +4989,35 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
}
}
- elog(WARNING, "calls=%d hits=%d hit ratio %.2f",
- calls, hits, hits * 100.0 / calls);
+// elog(WARNING, "calls=%d hits=%d hit ratio %.2f",
+// calls, hits, hits * 100.0 / calls);
/* free the call cache */
pfree(callcache);
return nmatches;
}
+
+static Bitmapset *
+get_varattnos(Node * node, Index relid)
+{
+ int k;
+ Bitmapset *varattnos = NULL;
+ Bitmapset *result = NULL;
+
+ /* get the varattnos */
+ pull_varattnos(node, relid, &varattnos);
+
+ k = -1;
+ while ((k = bms_next_member(varattnos, k)) >= 0)
+ {
+ if (k + FirstLowInvalidHeapAttributeNumber > 0)
+ result
+ = bms_add_member(result,
+ k + FirstLowInvalidHeapAttributeNumber);
+ }
+
+ bms_free(varattnos);
+
+ return result;
+}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 1a0d358..71beb2e 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3280,7 +3280,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/*
* Also get the normal inner-join selectivity of the join clauses.
@@ -3303,7 +3304,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
JOIN_INNER,
- &norm_sjinfo);
+ &norm_sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
if (jointype == JOIN_ANTI)
@@ -3470,7 +3472,7 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
Node *qual = (Node *) lfirst(l);
/* Note that clause_selectivity will be able to cache its result */
- selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo);
+ selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo, NIL);
}
/* Apply it to the input relation sizes */
@@ -3506,7 +3508,8 @@ set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
rel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
rel->rows = clamp_row_est(nrows);
@@ -3543,7 +3546,8 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
allclauses,
rel->relid, /* do not use 0! */
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
/* For safety, make sure result is not more than the base estimate */
if (nrows > rel->rows)
@@ -3681,12 +3685,14 @@ calc_joinrel_size_estimate(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = clauselist_selectivity(root,
pushedquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
list_free(joinquals);
@@ -3698,7 +3704,8 @@ calc_joinrel_size_estimate(PlannerInfo *root,
restrictlist,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = 0.0; /* not used, keep compiler quiet */
}
diff --git a/src/backend/optimizer/util/orclauses.c b/src/backend/optimizer/util/orclauses.c
index f0acc14..e41508b 100644
--- a/src/backend/optimizer/util/orclauses.c
+++ b/src/backend/optimizer/util/orclauses.c
@@ -280,7 +280,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
* saving work later.)
*/
or_selec = clause_selectivity(root, (Node *) or_rinfo,
- 0, JOIN_INNER, NULL);
+ 0, JOIN_INNER, NULL, NIL);
/*
* The clause is only worth adding to the query if it rejects a useful
@@ -342,7 +342,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
/* Compute inner-join size */
orig_selec = clause_selectivity(root, (Node *) join_or_rinfo,
- 0, JOIN_INNER, &sjinfo);
+ 0, JOIN_INNER, &sjinfo, NIL);
/* And hack cached selectivity so join size remains the same */
join_or_rinfo->norm_selec = orig_selec / or_selec;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 4dd3f9f..326dd36 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1580,13 +1580,15 @@ booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg,
case IS_NOT_FALSE:
selec = (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
case IS_FALSE:
case IS_NOT_TRUE:
selec = 1.0 - (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
default:
elog(ERROR, "unrecognized booltesttype: %d",
@@ -6196,7 +6198,8 @@ genericcostestimate(PlannerInfo *root,
indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/*
* If caller didn't give us an estimate, estimate the number of index
@@ -6521,7 +6524,8 @@ btcostestimate(PG_FUNCTION_ARGS)
btreeSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
numIndexTuples = btreeSelectivity * index->rel->tuples;
/*
@@ -7264,7 +7268,8 @@ gincostestimate(PG_FUNCTION_ARGS)
*indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/* fetch estimated page cost for tablespace containing index */
get_tablespace_page_costs(index->reltablespace,
@@ -7496,7 +7501,7 @@ brincostestimate(PG_FUNCTION_ARGS)
*indexSelectivity =
clauselist_selectivity(root, indexQuals,
path->indexinfo->rel->relid,
- JOIN_INNER, NULL);
+ JOIN_INNER, NULL, NIL);
*indexCorrelation = 1;
/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 8727ee3..bd2c7a9 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -75,6 +75,7 @@
#include "utils/bytea.h"
#include "utils/guc_tables.h"
#include "utils/memutils.h"
+#include "utils/mvstats.h"
#include "utils/pg_locale.h"
#include "utils/plancache.h"
#include "utils/portal.h"
@@ -393,6 +394,15 @@ static const struct config_enum_entry row_security_options[] = {
};
/*
+ * Search algorithm for multivariate stats.
+ */
+static const struct config_enum_entry mvstat_search_options[] = {
+ {"greedy", MVSTAT_SEARCH_GREEDY, false},
+ {"exhaustive", MVSTAT_SEARCH_EXHAUSTIVE, false},
+ {NULL, 0, false}
+};
+
+/*
* Options for enum values stored in other modules
*/
extern const struct config_enum_entry wal_level_options[];
@@ -3648,6 +3658,16 @@ static struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"mvstat_search", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Sets the algorithm used for combining multivariate stats."),
+ NULL
+ },
+ &mvstat_search_type,
+ MVSTAT_SEARCH_GREEDY, mvstat_search_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 9c2000b..7a3835b 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -182,11 +182,13 @@ extern Selectivity clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
extern Selectivity clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
#endif /* COST_H */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 1cb9400..6909294 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -16,6 +16,14 @@
#include "commands/vacuum.h"
+typedef enum MVStatSearchType
+{
+ MVSTAT_SEARCH_EXHAUSTIVE, /* exhaustive search */
+ MVSTAT_SEARCH_GREEDY /* greedy search */
+} MVStatSearchType;
+
+extern int mvstat_search_type;
+
/*
* Degree of how much MCV item / histogram bucket matches a clause.
* This is then considered when computing the selectivity.
--
1.9.3
Attachment: 0006-teach-expression-walker-about-RestrictInfo-because-o.patch (text/x-patch)
From 08f19b674c35127d9c8a8f2cfa371fbf3c80ff00 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Tue, 28 Apr 2015 19:56:33 +0200
Subject: [PATCH 6/6] teach expression walker about RestrictInfo (because of
pull_varnos)
---
src/backend/nodes/nodeFuncs.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index d6f1f5b..843f06d 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -1933,6 +1933,8 @@ expression_tree_walker(Node *node,
return walker(((PlaceHolderInfo *) node)->ph_var, context);
case T_RangeTblFunction:
return walker(((RangeTblFunction *) node)->funcexpr, context);
+ case T_RestrictInfo:
+ return walker(((RestrictInfo *) node)->clause, context);
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
--
1.9.3
Hello, this might be somewhat out of place but strongly related
to this patch, so I'll propose it here.
This is a proposal of a new feature for this patch, or a request
for your approval to pursue it as a separate (but closely related)
project.
===
Attached is v6 of the multivariate stats, with a number of
improvements:
...
2) fix of pg_proc issues (reported by Jeff)
3) rebase to current master
Unfortunately, the v6 patch suffers from system OID conflicts
with recently added ones. And what is more unfortunate for me is
that the code for functional dependencies looks unfinished :)
I mention this because I recently hit an issue caused by strong
correlation between two columns in the dbt3 benchmark. Two columns
in one of the tables are strongly correlated but not functionally
dependent; there are too many values and their distribution is very
uniform, so an MCV list is of no use for the table (and a histogram
does not help with equality conditions). As a result, the planner
estimates the number of rows badly wrong, as expected, especially
for joins.
I then tried calculating the ratio between the product of the
distinctness of every column and the distinctness of the set of
the columns - call it the multivariate coefficient here - and found
it very useful for its small storage space, cheap calculation, and
simple code.
The attached first is a script to generate problematic tables.
And the second is a patch to make use of the mv coef on current
master. The patch is a very primitive POC so no syntactical
interfaces involved.
For the case of your first example,
=# create table t (a int, b int, c int);
=# insert into t (select a/10000, a/10000, a/10000
from generate_series(0, 999999) a);
=# analyze t;
=# explain analyze select * from t where a = 1 and b = 1 and c = 1;
Seq Scan on t (cost=0.00..22906.00 rows=1 width=12)
(actual time=3.878..250.628 rows=10000 loops=1)
Make use of mv coefficient.
=# insert into pg_mvcoefficient values ('t'::regclass, 1, 2, 3, 0);
=# analyze t;
=# explain analyze select * from t where a = 1 and b = 1 and c = 1;
Seq Scan on t (cost=0.00..22906.00 rows=9221 width=12)
(actual time=3.740..242.330 rows=10000 loops=1)
Row number estimation was largely improved.
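(A sketch of the arithmetic: the stored coefficient is
ndistinct(a,b,c) divided by the product of the per-column ndistincts,
and clauselist_selectivity() divides the combined selectivity by it.
Here every ndistinct is 100, so

    1/100 * 1/100 * 1/100
   ----------------------- = 1/100
   100 / (100 * 100 * 100)

i.e. 1/100 of the 1,000,000 rows = 10,000 rows; the 9221 estimate
deviates from that only because of ANALYZE sampling.)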
Well, my example,
$ perl gentbl.pl 10000 | psql postgres
$ psql postgres
=# explain analyze select * from t1 where a = 1 and b = 2501;
Seq Scan on t1 (cost=0.00..6216.00 rows=1 width=8)
(actual time=0.030..66.005 rows=8 loops=1)

=# explain analyze select * from t1 join t2 on (t1.a = t2.a and t1.b = t2.b);
Hash Join (cost=1177.00..11393.76 rows=76 width=16)
(actual time=29.811..322.271 rows=320000 loops=1)
A very bad estimate for the join.
=# insert into pg_mvcoefficient values ('t1'::regclass, 1, 2, 0, 0);
=# analyze t1;
=# explain analyze select * from t1 where a = 1 and b = 2501;
Seq Scan on t1 (cost=0.00..6216.00 rows=8 width=8)
(actual time=0.032..104.144 rows=8 loops=1)

=# explain analyze select * from t1 join t2 on (t1.a = t2.a and t1.b = t2.b);
Hash Join (cost=1177.00..11393.76 rows=305652 width=16)
(actual time=40.642..325.679 rows=320000 loops=1)
It now gives almost correct estimates.
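(The join estimate improves for the same reason: assuming both tables
have roughly the same per-column and combined ndistincts, dividing the
product of the per-clause join selectivities by the stored coefficient
moves the combined selectivity toward 1/ndistinct(a,b), as if (a,b)
were a single join key.)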
I think the results above show that the multivariate coefficient
significantly improves estimates when correlated columns are
involved.
Would you consider this for your patch? Otherwise, if you don't
mind, I'll pursue it as a project separate from yours. Except for
the user interface it shouldn't conflict with yours, I suppose, but
eventually they would need some consolidation work.
regards,
1) fix of the contrib compile-time errors (reported by Jeff)
2) fix of pg_proc issues (reported by Jeff)
3) rebase to current master
4) fix a bunch of issues in the previous patches, due to referencing
   some parts too early (e.g. histograms in the first patch, etc.)
5) remove the explicit DELETEs from pg_mv_statistic (in the regression
   tests), this is now handled automatically by DROP TABLE etc.
6) number of performance optimizations in selectivity estimations:
   (a) minimize calls to get_oprrest, significantly reducing
       syscache calls
   (b) significant reduction of palloc overhead in deserialization of
       MCV lists and histograms
   (c) use more compact serialized representation of MCV lists and
       histograms, often removing ~50% of the size
   (d) use histograms with limited deserialization, which also allows
       caching function calls
   (e) modified histogram bucket partitioning, resulting in more even
       bucket distribution (i.e. producing buckets with more equal
       density and about equal size of each dimension)
7) add functions for listing MCV list items and histogram buckets:
   - pg_mv_mcvlist_items(oid)
   - pg_mv_histogram_buckets(oid, type)
   This is quite useful when analyzing the MCV lists / histograms.
8) improved support for OR clauses
9) allow calling pull_varnos() on expression trees containing
RestrictInfo nodes (not sure if this is the right fix, it's being
discussed in another thread)
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
mvcoef-poc-20150513.patch (text/x-patch; charset=us-ascii)
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 37d05d1..d00835e 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -33,7 +33,8 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
- pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
+ pg_cast.h pg_enum.h pg_mvcoefficient.h pg_namespace.h pg_conversion.h \
+ pg_depend.h \
pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
pg_authid.h pg_auth_members.h pg_shdepend.h pg_shdescription.h \
pg_ts_config.h pg_ts_config_map.h pg_ts_dict.h \
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 15ec0ad..9edaa0f 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -27,6 +27,7 @@
#include "catalog/indexing.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mvcoefficient.h"
#include "catalog/pg_namespace.h"
#include "commands/dbcommands.h"
#include "commands/tablecmds.h"
@@ -45,7 +46,9 @@
#include "storage/procarray.h"
#include "utils/acl.h"
#include "utils/attoptcache.h"
+#include "utils/catcache.h"
#include "utils/datum.h"
+#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -110,6 +113,12 @@ static void update_attstats(Oid relid, bool inh,
int natts, VacAttrStats **vacattrstats);
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
+static float4 compute_mv_distinct(int nattrs,
+ int *stacolnums,
+ VacAttrStats **stats,
+ AnalyzeAttrFetchFunc fetchfunc,
+ int samplerows,
+ double totalrows);
/*
@@ -552,6 +561,92 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
MemoryContextResetAndDeleteChildren(col_context);
}
+ /* Compute multivariate distinctness if requested in pg_mvcoefficient */
+ {
+ ScanKeyData scankey;
+ SysScanDesc sysscan;
+ Relation mvcrel;
+ HeapTuple oldtup, newtup;
+ int i;
+
+ mvcrel = heap_open(MvCoefficientRelationId, RowExclusiveLock);
+
+ ScanKeyInit(&scankey,
+ Anum_pg_mvcoefficient_mvcreloid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(onerel->rd_id));
+ sysscan = systable_beginscan(mvcrel, MvCoefficientIndexId, true,
+ NULL, 1, &scankey);
+ oldtup = systable_getnext(sysscan);
+
+ while (HeapTupleIsValid(oldtup))
+ {
+ int colnums[3];
+ int ncols = 0;
+ float4 nd;
+ Datum values[Natts_pg_mvcoefficient];
+ bool nulls[Natts_pg_mvcoefficient];
+ bool replaces[Natts_pg_mvcoefficient];
+ float4 simple_mv_distinct;
+
+ Form_pg_mvcoefficient mvc =
+ (Form_pg_mvcoefficient) GETSTRUCT (oldtup);
+
+ if (mvc->mvcattr1 > 0)
+ colnums[ncols++] = mvc->mvcattr1 - 1;
+ if (mvc->mvcattr2 > 0)
+ colnums[ncols++] = mvc->mvcattr2 - 1;
+ if (mvc->mvcattr3 > 0)
+ colnums[ncols++] = mvc->mvcattr3 - 1;
+
+ if (ncols > 0)
+ {
+ int j;
+ float4 nd_coef;
+
+ simple_mv_distinct =
+ vacattrstats[colnums[0]]->stadistinct;
+ if (simple_mv_distinct < 0)
+ simple_mv_distinct = -simple_mv_distinct * totalrows;
+ for (j = 1 ; j < ncols ; j++)
+ {
+ float4 t = vacattrstats[colnums[j]]->stadistinct;
+
+ if (t < 0)
+ t = -t * totalrows;
+ simple_mv_distinct *= t;
+ }
+
+ nd = compute_mv_distinct(ncols, colnums, vacattrstats,
+ std_fetch_func, numrows, totalrows);
+
+ nd_coef = nd / simple_mv_distinct;
+
+ for (i = 0; i < Natts_pg_mvcoefficient ; ++i)
+ {
+ nulls[i] = false;
+ replaces[i] = false;
+ }
+ values[Anum_pg_mvcoefficient_mvccoefficient - 1] =
+ Float4GetDatum(nd_coef);
+ replaces[Anum_pg_mvcoefficient_mvccoefficient - 1] = true;
+ newtup = heap_modify_tuple(oldtup,
+ RelationGetDescr(mvcrel),
+ values,
+ nulls,
+ replaces);
+ simple_heap_update(mvcrel, &oldtup->t_self, newtup);
+
+ CatalogUpdateIndexes(mvcrel, newtup);
+
+ oldtup = systable_getnext(sysscan);
+ }
+ }
+
+ systable_endscan(sysscan);
+ heap_close(mvcrel, RowExclusiveLock);
+ }
+
if (hasindex)
compute_index_stats(onerel, totalrows,
indexdata, nindexes,
@@ -1911,6 +2006,7 @@ static void compute_scalar_stats(VacAttrStatsP stats,
int samplerows,
double totalrows);
static int compare_scalars(const void *a, const void *b, void *arg);
+static int compare_mv_scalars(const void *a, const void *b, void *arg);
static int compare_mcvs(const void *a, const void *b);
@@ -2840,6 +2936,207 @@ compute_scalar_stats(VacAttrStatsP stats,
}
/*
+ * compute_mv_distinct() -- compute multicolumn distinctness
+ */
+
+static float4
+compute_mv_distinct(int nattrs,
+ int *stacolnums,
+ VacAttrStats **stats,
+ AnalyzeAttrFetchFunc fetchfunc,
+ int samplerows,
+ double totalrows)
+{
+ int i, j;
+ int null_cnt = 0;
+ int nonnull_cnt = 0;
+ int toowide_cnt = 0;
+ double total_width = 0;
+ bool is_varlena[3];
+ SortSupportData ssup[3];
+ ScalarItem **values, *values2;
+ int values_cnt = 0;
+ int *tupnoLink;
+ StdAnalyzeData *mystats[3];
+ float4 fndistinct;
+
+ Assert (nattrs <= 3);
+ for (i = 0 ; i < nattrs ; i++)
+ {
+ VacAttrStats *vas = stats[stacolnums[i]];
+ is_varlena[i] =
+ !vas->attrtype->typbyval && vas->attrtype->typlen == -1;
+ mystats[i] =
+ (StdAnalyzeData*) vas->extra_data;
+ }
+
+ values2 = (ScalarItem *) palloc(nattrs * samplerows * sizeof(ScalarItem));
+ values = (ScalarItem **) palloc(samplerows * sizeof(ScalarItem*));
+ tupnoLink = (int *) palloc(samplerows * sizeof(int));
+
+ for (i = 0 ; i < samplerows ; i++)
+ values[i] = &values2[i * nattrs];
+
+ memset(ssup, 0, sizeof(ssup));
+ for (i = 0 ; i < nattrs ; i++)
+ {
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ /* We always use the default collation for statistics */
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+ ssup[i].abbreviate = true;
+ PrepareSortSupportFromOrderingOp(mystats[i]->ltopr, &ssup[i]);
+ }
+ ssup[nattrs].ssup_cxt = NULL;
+
+ /* Initial scan to find sortable values */
+ for (i = 0; i < samplerows; i++)
+ {
+ Datum value[3]; /* room for up to 3 columns, per the Assert above */
+ bool isnull = false;
+ bool toowide = false;
+
+ vacuum_delay_point();
+
+ for (j = 0 ; j < nattrs ; j++)
+ {
+
+ value[j] = fetchfunc(stats[stacolnums[j]], i, &isnull);
+
+ /* Check for null/nonnull */
+ if (isnull)
+ break;
+
+ if (is_varlena[j])
+ {
+ total_width += VARSIZE_ANY(DatumGetPointer(value[j]));
+ if (toast_raw_datum_size(value[j]) > WIDTH_THRESHOLD)
+ {
+ toowide = true;
+ break;
+ }
+ value[j] = PointerGetDatum(PG_DETOAST_DATUM(value[j]));
+ }
+ }
+ if (isnull)
+ {
+ null_cnt++;
+ continue;
+ }
+ else if (toowide)
+ {
+ toowide_cnt++;
+ continue;
+ }
+ nonnull_cnt++;
+
+ /* Add it to the list to be sorted */
+ for (j = 0 ; j < nattrs ; j++)
+ values[values_cnt][j].value = value[j];
+
+ values[values_cnt][0].tupno = values_cnt;
+ tupnoLink[values_cnt] = values_cnt;
+ values_cnt++;
+ }
+
+ /* We can only compute real stats if we found some sortable values. */
+ if (values_cnt > 0)
+ {
+ int ndistinct, /* # distinct values in sample */
+ nmultiple, /* # that appear multiple times */
+ dups_cnt;
+ CompareScalarsContext cxt;
+
+ /* Sort the collected values */
+ cxt.ssup = ssup;
+ cxt.tupnoLink = tupnoLink;
+ qsort_arg((void *) values, values_cnt, sizeof(ScalarItem*),
+ compare_mv_scalars, (void *) &cxt);
+
+ ndistinct = 0;
+ nmultiple = 0;
+ dups_cnt = 0;
+ for (i = 0; i < values_cnt; i++)
+ {
+ int tupno = values[i][0].tupno;
+
+ dups_cnt++;
+ if (tupnoLink[tupno] == tupno)
+ {
+ /* Reached end of duplicates of this value */
+ ndistinct++;
+ if (dups_cnt > 1)
+ nmultiple++;
+
+ dups_cnt = 0;
+ }
+ }
+
+ if (nmultiple == 0)
+ {
+ /* If we found no repeated values, assume it's a unique column */
+ fndistinct = totalrows;
+ }
+ else if (toowide_cnt == 0 && nmultiple == ndistinct)
+ {
+ /*
+ * Every value in the sample appeared more than once. Assume the
+ * column has just these values.
+ */
+ fndistinct = (float4)ndistinct;
+ }
+ else
+ {
+ /*----------
+ * Estimate the number of distinct values using the estimator
+ * proposed by Haas and Stokes in IBM Research Report RJ 10025:
+ * n*d / (n - f1 + f1*n/N)
+ * where f1 is the number of distinct values that occurred
+ * exactly once in our sample of n rows (from a total of N),
+ * and d is the total number of distinct values in the sample.
+ * This is their Duj1 estimator; the other estimators they
+ * recommend are considerably more complex, and are numerically
+ * very unstable when n is much smaller than N.
+ *
+ * Overwidth values are assumed to have been distinct.
+ *----------
+ */
+ int f1 = ndistinct - nmultiple + toowide_cnt;
+ int d = f1 + nmultiple;
+ double numer,
+ denom,
+ stadistinct;
+
+ numer = (double) samplerows *(double) d;
+
+ denom = (double) (samplerows - f1) +
+ (double) f1 *(double) samplerows / totalrows;
+
+ stadistinct = numer / denom;
+ /* Clamp to sane range in case of roundoff error */
+ if (stadistinct < (double) d)
+ stadistinct = (double) d;
+ if (stadistinct > totalrows)
+ stadistinct = totalrows;
+ fndistinct = floor(stadistinct + 0.5);
+ }
+ }
+ else if (nonnull_cnt > 0)
+ {
+ /* Assume all too-wide values are distinct, so it's a unique column */
+ fndistinct = totalrows;
+ }
+ else if (null_cnt > 0)
+ {
+ fndistinct = 0.0; /* "unknown" */
+ }
+
+ /* We don't need to bother cleaning up any of our temporary palloc's */
+ return fndistinct;
+}
+
+
+/*
* qsort_arg comparator for sorting ScalarItems
*
* Aside from sorting the items, we update the tupnoLink[] array
@@ -2876,6 +3173,43 @@ compare_scalars(const void *a, const void *b, void *arg)
return ta - tb;
}
+static int
+compare_mv_scalars(const void *a, const void *b, void *arg)
+{
+ CompareScalarsContext *cxt = (CompareScalarsContext *) arg;
+ ScalarItem *va = *(ScalarItem**)a;
+ ScalarItem *vb = *(ScalarItem**)b;
+ Datum da, db;
+ int ta, tb;
+ int compare;
+ int i;
+
+ for (i = 0 ; cxt->ssup[i].ssup_cxt ; i++)
+ {
+ da = va[i].value;
+ db = vb[i].value;
+
+ compare = ApplySortComparator(da, false, db, false, &cxt->ssup[i]);
+ if (compare != 0)
+ return compare;
+ }
+
+ /*
+ * The two datums are equal, so update cxt->tupnoLink[].
+ */
+ ta = va[0].tupno;
+ tb = vb[0].tupno;
+ if (cxt->tupnoLink[ta] < tb)
+ cxt->tupnoLink[ta] = tb;
+ if (cxt->tupnoLink[tb] < ta)
+ cxt->tupnoLink[tb] = ta;
+
+ /*
+ * For equal datums, sort by tupno
+ */
+ return ta - tb;
+}
+
/*
* qsort comparator for sorting ScalarMCVItems by position
*/
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index dcac1c1..43712ba 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -14,8 +14,14 @@
*/
#include "postgres.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "catalog/indexing.h"
#include "catalog/pg_operator.h"
+#include "catalog/pg_mvcoefficient.h"
#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -43,6 +49,93 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
+static bool
+collect_collist_walker(Node *node, Bitmapset **colsetlist)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var*)node;
+
+ if (AttrNumberIsForUserDefinedAttr(var->varattno))
+ colsetlist[var->varno] =
+ bms_add_member(colsetlist[var->varno], var->varattno);
+ }
+ return expression_tree_walker(node, collect_collist_walker,
+ (void*)colsetlist);
+}
+
+/* Find multivariate distinctness coefficient for clauselist */
+static double
+find_mv_join_coeffeicient(PlannerInfo *root, List *clauses)
+{
+ int relid;
+ ListCell *l;
+ Bitmapset **colsetlist = NULL;
+ double mv_coef = 1.0;
+
+ /* Collect the columns this clauselist references */
+ colsetlist = (Bitmapset**)
+ palloc0(root->simple_rel_array_size * sizeof(Bitmapset*));
+
+ foreach(l, clauses)
+ {
+ RestrictInfo *rti = (RestrictInfo *) lfirst(l);
+
+ /* Consider only EC-derived clauses between the joinrels */
+ if (IsA(rti, RestrictInfo) &&
+ rti->left_ec && rti->left_ec == rti->right_ec)
+ collect_collist_walker((Node*)rti->clause, colsetlist);
+ }
+
+ /* Find pg_mvcoefficient entries matching this column list */
+ for (relid = 1 ; relid < root->simple_rel_array_size ; relid++)
+ {
+ Relation mvcrel;
+ SysScanDesc sscan;
+ ScanKeyData skeys[1];
+ HeapTuple tuple;
+
+ if (bms_is_empty(colsetlist[relid])) continue;
+
+ if (root->simple_rte_array[relid]->rtekind != RTE_RELATION) continue;
+
+ ScanKeyInit(&skeys[0],
+ Anum_pg_mvcoefficient_mvcreloid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(root->simple_rte_array[relid]->relid));
+
+ mvcrel = heap_open(MvCoefficientRelationId, AccessShareLock);
+ sscan = systable_beginscan(mvcrel, MvCoefficientIndexId, true,
+ NULL, 1, skeys);
+ while (HeapTupleIsValid(tuple = systable_getnext(sscan)))
+ {
+ Bitmapset *mvccols = NULL;
+ Form_pg_mvcoefficient mvc =
+ (Form_pg_mvcoefficient) GETSTRUCT (tuple);
+
+ mvccols = bms_add_member(mvccols, mvc->mvcattr1);
+ mvccols = bms_add_member(mvccols, mvc->mvcattr2);
+ if (mvc->mvcattr3 > 0)
+ mvccols = bms_add_member(mvccols, mvc->mvcattr3);
+
+ if (!bms_is_subset(mvccols, colsetlist[relid]))
+ continue;
+
+ /* Prefer smaller one */
+ if (mvc->mvccoefficient > 0 && mvc->mvccoefficient < mv_coef)
+ mv_coef = mvc->mvccoefficient;
+ }
+ systable_endscan(sscan);
+ heap_close(mvcrel, AccessShareLock);
+ }
+
+ return mv_coef;
+}
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -200,6 +293,9 @@ clauselist_selectivity(PlannerInfo *root,
s1 = s1 * s2;
}
+ /* Try multivariate distinctness correction for clauses */
+ s1 /= find_mv_join_coeffeicient(root, clauses);
+
/*
* Now scan the rangequery pair list.
*/
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index f58e1ce..f4c1001 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -43,6 +43,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_language.h"
+#include "catalog/pg_mvcoefficient.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -501,6 +502,17 @@ static const struct cachedesc cacheinfo[] = {
},
4
},
+ {MvCoefficientRelationId, /* MVCOEFFICIENT */
+ MvCoefficientIndexId,
+ 4,
+ {
+ Anum_pg_mvcoefficient_mvcreloid,
+ Anum_pg_mvcoefficient_mvcattr1,
+ Anum_pg_mvcoefficient_mvcattr2,
+ Anum_pg_mvcoefficient_mvcattr3
+ },
+ 4
+ },
{NamespaceRelationId, /* NAMESPACENAME */
NamespaceNameIndexId,
1,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 71e0010..0c76f93 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -173,6 +173,9 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
#define LargeObjectMetadataOidIndexId 2996
+DECLARE_UNIQUE_INDEX(pg_mvcoefficient_index, 3578, on pg_mvcoefficient using btree(mvcreloid oid_ops, mvcattr1 int2_ops, mvcattr2 int2_ops, mvcattr3 int2_ops));
+#define MvCoefficientIndexId 3578
+
DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
#define NamespaceNameIndexId 2684
DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
diff --git a/src/include/catalog/pg_mvcoefficient.h b/src/include/catalog/pg_mvcoefficient.h
new file mode 100644
index 0000000..56259fd
--- /dev/null
+++ b/src/include/catalog/pg_mvcoefficient.h
@@ -0,0 +1,68 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_mvcoefficient.h
+ * definition of the system multivariate coefficient relation
+ * (pg_mvcoefficient) along with the relation's initial contents.
+ *
+ * Copyright (c) 2015, PostgreSQL Global Development Group
+ *
+ * src/include/catalog/pg_mvcoefficient.h
+ *
+ * NOTES
+ * the genbki.pl script reads this file and generates .bki
+ * information from the DATA() statements.
+ *
+ * XXX do NOT break up DATA() statements into multiple lines!
+ * the scripts are not as smart as you might think...
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_MVCOEFFICIENT_H
+#define PG_MVCOEFFICIENT_H
+
+#include "catalog/genbki.h"
+#include "nodes/pg_list.h"
+
+/* ----------------
+ * pg_mvcoefficient definition. cpp turns this into
+ * typedef struct FormData_pg_mvcoefficient
+ * ----------------
+ */
+#define MvCoefficientRelationId 3577
+
+CATALOG(pg_mvcoefficient,3577) BKI_WITHOUT_OIDS
+{
+ Oid mvcreloid; /* OID of target relation */
+ int16 mvcattr1; /* Column numbers */
+ int16 mvcattr2;
+ int16 mvcattr3;
+ float4 mvccoefficient; /* multivariate distinctness coefficient */
+} FormData_pg_mvcoefficient;
+
+/* ----------------
+ * Form_pg_mvcoefficient corresponds to a pointer to a tuple with the
+ * format of pg_mvcoefficient relation.
+ * ----------------
+ */
+typedef FormData_pg_mvcoefficient *Form_pg_mvcoefficient;
+
+/* ----------------
+ * compiler constants for pg_mvcoefficient
+ * ----------------
+ */
+#define Natts_pg_mvcoefficient 5
+#define Anum_pg_mvcoefficient_mvcreloid 1
+#define Anum_pg_mvcoefficient_mvcattr1 2
+#define Anum_pg_mvcoefficient_mvcattr2 3
+#define Anum_pg_mvcoefficient_mvcattr3 4
+#define Anum_pg_mvcoefficient_mvccoefficient 5
+
+/* ----------------
+ * pg_mvcoefficient has no initial contents
+ * ----------------
+ */
+
+/*
+ * prototypes for functions operating on pg_mvcoefficient (none yet)
+ */
+#endif /* PG_MVCOEFFICIENT_H */
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 6634099..db8454c 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -66,6 +66,7 @@ enum SysCacheIdentifier
INDEXRELID,
LANGNAME,
LANGOID,
+ MVCOEFFICIENT,
NAMESPACENAME,
NAMESPACEOID,
OPERNAMENSP,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index eb0bc88..7c77796 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -113,6 +113,7 @@ pg_inherits|t
pg_language|t
pg_largeobject|t
pg_largeobject_metadata|t
+pg_mvcoefficient|t
pg_namespace|t
pg_opclass|t
pg_operator|t
On 05/13/15 10:31, Kyotaro HORIGUCHI wrote:
Hello, this might be somewhat out of place but strongly related
to this patch, so I'll propose it here.

This is a proposal of a new feature for this patch, or a request
for your approval to pursue it as a separate (but closely related)
project.

===
Attached is v6 of the multivariate stats, with a number of
improvements:
...
2) fix of pg_proc issues (reported by Jeff)
3) rebase to current master

Unfortunately, the v6 patch suffers from system OID conflicts
with recently added ones. And what is more unfortunate for me is
that the code for functional dependencies looks unfinished :)
I'll fix the OID conflicts once the CF completes, which should be in a
few days I guess. Until then you can apply it on top of master from
about May 6 (that's when the v6 was created, and there should be no
conflicts).
Regarding the functional dependencies - you're right there's room for
improvement. For example it only works with dependencies between pairs
of columns, not multi-column dependencies. Is this what you mean by
incomplete?
I mention this because I recently hit an issue caused by strong
correlation between two columns in the dbt3 benchmark. Two columns
in one of the tables are strongly correlated but not functionally
dependent; there are too many values and their distribution is very
uniform, so an MCV list is of no use for the table (and a histogram
does not help with equality conditions). As a result, the planner
estimates the number of rows badly wrong, as expected, especially
for joins.
I think the other statistics types (esp. histograms) might be more
useful here, but I assume you haven't tried that because of the conflicts.
The current patch does not handle joins at all, though.
I then tried calculating the ratio between the product of the
distinctness of every column and the distinctness of the set of
the columns - call it the multivariate coefficient here - and found
it very useful for its small storage space, cheap calculation, and
simple code.
So when you have two columns A and B, you compute this:
ndistinct(A) * ndistinct(B)
---------------------------
ndistinct(A,B)
where ndistinct(...) means the number of distinct values in the column(s)?
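For a quick sanity check, that ratio can be computed by hand - e.g.
for your t1 example, using exact counts rather than the sampled
estimates ANALYZE would store (just a sketch):

    SELECT (count(DISTINCT a)::float8 * count(DISTINCT b))
           / (SELECT count(*) FROM (SELECT DISTINCT a, b FROM t1) s)
           AS mv_coefficient
    FROM t1;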
The attached first is a script to generate problematic tables.
And the second is a patch to make use of the mv coef on current
master. The patch is a very primitive POC so no syntactical
interfaces involved.

For the case of your first example,
=# create table t (a int, b int, c int);
=# insert into t (select a/10000, a/10000, a/10000
from generate_series(0, 999999) a);
=# analyze t;
=# explain analyze select * from t where a = 1 and b = 1 and c = 1;
Seq Scan on t (cost=0.00..22906.00 rows=1 width=12)
(actual time=3.878..250.628 rows=10000 loops=1)

Make use of mv coefficient.
=# insert into pg_mvcoefficient values ('t'::regclass, 1, 2, 3, 0);
=# analyze t;
=# explain analyze select * from t where a = 1 and b = 1 and c = 1;
Seq Scan on t (cost=0.00..22906.00 rows=9221 width=12)
(actual time=3.740..242.330 rows=10000 loops=1)

Row number estimation was largely improved.
With my patch:
alter table t add statistics (mcv) on (a,b,c);
analyze t;
select * from pg_mv_stats;
tablename | attnums | mcvbytes | mcvinfo
-----------+---------+----------+------------
t | 1 2 3 | 2964 | nitems=100
explain (analyze,timing off)
select * from t where a = 1 and b = 1 and c = 1;
QUERY PLAN
------------------------------------------------------------
Seq Scan on t (cost=0.00..22906.00 rows=9533 width=12)
(actual rows=10000 loops=1)
Filter: ((a = 1) AND (b = 1) AND (c = 1))
Rows Removed by Filter: 990000
Planning time: 0.233 ms
Execution time: 93.212 ms
(5 rows)
alter table t drop statistics all;
alter table t add statistics (histogram) on (a,b,c);
analyze t;
explain (analyze,timing off)
select * from t where a = 1 and b = 1 and c = 1;
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on t (cost=0.00..22906.00 rows=9667 width=12)
(actual rows=10000 loops=1)
Filter: ((a = 1) AND (b = 1) AND (c = 1))
Rows Removed by Filter: 990000
Planning time: 0.594 ms
Execution time: 109.917 ms
(5 rows)
So both the MCV list and the histogram do quite a good job here, but
there are certainly cases where they do not work and the mv
coefficient works better.
Well, my example,
$ perl gentbl.pl 10000 | psql postgres
$ psql postgres
=# explain analyze select * from t1 where a = 1 and b = 2501;
Seq Scan on t1 (cost=0.00..6216.00 rows=1 width=8)
(actual time=0.030..66.005 rows=8 loops=1)

=# explain analyze select * from t1 join t2 on (t1.a = t2.a and t1.b = t2.b);
Hash Join (cost=1177.00..11393.76 rows=76 width=16)
(actual time=29.811..322.271 rows=320000 loops=1)

A very bad estimate for the join.
=# insert into pg_mvcoefficient values ('t1'::regclass, 1, 2, 0, 0);
=# analyze t1;
=# explain analyze select * from t1 where a = 1 and b = 2501;
Seq Scan on t1 (cost=0.00..6216.00 rows=8 width=8)
(actual time=0.032..104.144 rows=8 loops=1)
=# explain analyze select * from t1 join t2 on (t1.a = t2.a and t1.b = t2.b);
Hash Join (cost=1177.00..11393.76 rows=305652 width=16)
(actual time=40.642..325.679 rows=320000 loops=1)
It gives almost correct estimates.
The current patch does not handle joins, but it's one of the TODO items.
I think the results above show that the multivariate coefficient
significantly improves estimates when correlated columns are
involved.
Yes, it looks interesting. I'm wondering what the "failure cases" are,
i.e. when the coefficient approach does not work. It seems to me it
relies on an assumption of consistency for all the ndistinct values.
For example, let's assume you have two columns - A and B, each with
1000 distinct values, and that each value in A has 100 matching values
in B, so the coefficient is ~10:
1,000 * 1,000 / 100,000 = 10
Now, let's assume the distribution looks different - with the first
100 values in A matching all 1000 values of B, and the remaining 900
values just a single B value. Then
1,000 * 1,000 / (100,000 + 900) = ~9.9
So a very different distribution, but almost the same coefficient.
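Those two distributions are easy to reproduce in SQL, if anyone wants
to verify the arithmetic (a sketch with made-up table names):
-- uniform: 1000 values in a, each paired with 100 of the 1000 b values
create table uniform as
  select i/100 as a, i % 1000 as b from generate_series(0, 99999) i;
-- skewed: first 100 a values match all 1000 b values, the rest just one
create table skewed as
  select a, b from generate_series(0, 99) a, generate_series(0, 999) b
  union all
  select a, 0 as b from generate_series(100, 999) a;
-- both yield nearly the same coefficient (~10 vs ~9.9)
select count(distinct a)::numeric * count(distinct b)
       / count(distinct (a, b)) from uniform;
select count(distinct a)::numeric * count(distinct b)
       / count(distinct (a, b)) from skewed;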
Are there any other assumptions like this?
Also, does the coefficient work for equality conditions only?
Would you consider this in your patch? Otherwise, if you don't mind,
I'll move on with this as a project separate from yours. Except for
the user interface it won't conflict with yours, I suppose, but
eventually they would need some consolidation work.
I think it's a neat idea, and I think it might be added to the patch.
It would fit in quite nicely, actually - I already have other kinds of
stats planned for addition, but I'm not going to work on that in the
near future. It will require changes in some parts of the patch
(selecting the stats for a list of clauses), and I'd like to complete
the current patch first, and then add features in follow-up patches.
regards
Tomas
Hello,
At Thu, 14 May 2015 12:35:50 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <55547A86.8020400@2ndquadrant.com>
On 05/13/15 10:31, Kyotaro HORIGUCHI wrote:
Hello, this might be somewhat out of place but it is strongly related
to this patch, so I'll propose it here.
This is a proposal of a new feature for this patch, or a request for
your approval of my pursuing it as a different (but very
close) project.
===
Attached is v6 of the multivariate stats, with a number of
improvements:...
2) fix of pg_proc issues (reported by Jeff)
3) rebase to current master
Unfortunately, the v6 patch suffers some system OID conflicts
with recently added ones. And what is more unfortunate for me is
that the code for functional dependencies looks undone :)
I'll fix the OID conflicts once the CF completes, which should be in a
few days I guess. Until then you can apply it on top of master from
about May 6 (that's when the v6 was created, and there should be no
conflicts).
I applied it with some further fixing; it wasn't a problem :)
Regarding the functional dependencies - you're right there's room for
improvement. For example it only works with dependencies between pairs
of columns, not multi-column dependencies. Is this what you mean by
incomplete?
No, it overruns dependencies->deps, because build_mv_dependencies
stores many elements into dependencies->deps[n] although it
really has room for only one element. I suppose that you paused
writing it when you noticed that the number of required elements
is unknown before finishing the walk through all pairs of
values. palloc'ing numattrs^2 elements is reasonable enough as POC
code for now. Am I looking at the wrong version of the patch?
- dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData))
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData) +
+ sizeof(MVDependency) * numattrs * numattrs);
I mention this because I recently had an issue caused by strong
correlation between two columns in the dbt3 benchmark. Two columns in
some table are strongly correlated but not functionally
dependent, there are too many distinct values and their distribution
is very uniform, so the MCV list is of no use for the table (and a
histogram does not help with equality conditions). As a result, the
planner gets the row estimates badly wrong, especially for joins.
I think the other statistics types (esp. histograms) might be more
useful here, but I assume you haven't tried that because of the
conflicts. The current patch does not handle joins at all, though.
Well, that's one of the reasons. But I understood that no
deterministic estimation can be applied to such a distribution
when I saw what caused the wrong estimate. eqsel and eqjoinsel
ultimately rely on the random-match assumption over a uniform
distribution when the value is not found in the MCV list. And the
functional dependencies stuff in your old patch (which works)
(rightfully) failed to find such a relationship between the
problematic columns. So I tried ndistinct, which is not contained
in your patch, to see how well it works.
I then tried calculating the ratio between the product of the
distinctness of every column and the distinctness of the set of
the columns (call it the multivariate coefficient here), and found
that it looks greatly useful given its small storage space, cheap
calculation, and simple code.
So when you have two columns A and B, you compute this:
ndistinct(A) * ndistinct(B)
---------------------------
ndistinct(A,B)
Yes, I used the reciprocal of that, though.
where ndistinct(...) means the number of distinct values in the column(s)?
Yes.
The first attachment is a script to generate problematic tables,
and the second is a patch to make use of the mv coefficient on
current master. The patch is a very primitive POC, so no syntactic
interface is involved.
...
Make use of the mv coefficient:
=# insert into pg_mvcoefficient values ('t'::regclass, 1, 2, 3, 0);
=# analyze t;
=# explain analyze select * from t where a = 1 and b = 1 and c = 1;
Seq Scan on t (cost=0.00..22906.00 rows=9221 width=12)
(actual time=3.740..242.330 rows=10000 loops=1)
The row number estimation was largely improved.
With my patch:
alter table t add statistics (mcv) on (a,b,c);
...
Seq Scan on t (cost=0.00..22906.00 rows=9533 width=12)
Yes, your MV-MCV list presumably holds one third of all possible (sets
of) values, so it works fine, I guess. But my original problem
occurred under the condition that the (single-column) MCVs contain
under 1% of the possible values. MCV lists would not work for such
cases, but the very uniform distribution helps the random-match
assumption to work.
$ perl gentbl.pl 200000 | psql postgres
<takes a while..>
postgres=# alter table t1 add statistics (mcv true) on (a, b);
postgres=# analyze t1;
postgres=# explain analyze select * from t1 where a = 1 and b = 2501;
Seq Scan on t1 (cost=0.00..124319.00 rows=1 width=8)
(actual time=0.051..1250.773 rows=8 loops=1)
The estimate "rows=1" is internally 2.4e-11, which is 3.33e+11 times
smaller than the real number. This will result in roughly the
same order of error for joins. This is because the MV-MCV list holds
too small a part of the domain, and the rest is then calculated using
the random-match assumption. This won't be fixed by increasing
statistics_target to any sane amount.
alter table t drop statistics all;
alter table t add statistics (histogram) on (a,b,c);
...
Seq Scan on t (cost=0.00..22906.00 rows=9667 width=12)
So both the MCV list and the histogram do quite a good job here,
I understand how you calculate selectivity for equality clauses
using the histogram. It calculates the result rows as 2.3e-11,
which is almost the same as with MV-MCV; this comes from the same
cause and thus yields the same result for joins.
but there are certainly cases where that does not work and the
mv coefficient works better.
The mv coefficient is effective where, as mentioned above, the
MV-MCV list or MV-histogram cannot hold a sufficient part of the
domain. The appropriate combination of MV-MCV and the mv coefficient
would be the same as var_eq_(non_)const/eqjoinsel_inner for a single
column, that is, applying the mv coefficient to the part of the
selectivity corresponding to values not in the MV-MCV list. I have
no idea how to combine it with the MV-histogram right now.
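To illustrate that idea with made-up numbers (this is just my reading
of it, not what either patch implements): if the MV-MCV list covers
60% of the rows and the queried combination is not in the list, the
selectivity of the remainder might be estimated as
(1 - 0.6) / (ndistinct(A,B) - nitems(MV-MCV))
i.e. the non-MCV fraction spread uniformly over the distinct
combinations not covered by the list, analogous to what var_eq_const
does for a single column.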
The current patch does not handle joins, but it's one of the TODO
items.
Yes, but the results on very large tables can be deduced from
the discussion above.
I think the results above show that the multivariate coefficient
significantly improves estimates when correlated columns are
involved.
Yes, it looks interesting. I'm wondering what the "failure cases" are,
i.e. when the coefficient approach does not work. It seems to me it
relies on an assumption of consistency for all the ndistinct values.
For example, let's assume you have two columns - A and B, each with
1000 distinct values, and that each value in A has 100 matching values
in B, so the coefficient is ~10:
1,000 * 1,000 / 100,000 = 10
Now, let's assume the distribution looks different - with the first
100 values in A matching all 1000 values of B, and the remaining 900
values just a single B value. Then
1,000 * 1,000 / (100,000 + 900) = ~9.9
So a very different distribution, but almost the same coefficient.
Are there any other assumptions like this?
I think no, for now. Just like the current var_eq_(non_)const and
eqjoinsel_inner do: since no clue about *the true* distribution is
available, we have no choice other than to stand on the random-match
(uniform distribution) assumption. And it gives not-so-bad estimates
for not-so-extreme distributions. It's of course not perfect, but
good enough.
Also, does the coefficient work for equality conditions only?
The mv coefficient is a parallel of ndistinct (it is a bit of a weird
expression, though). So I guess it is applicable in the current
estimation code wherever ndistinct is used; almost all of those
places look related to equality comparison.
Would you consider this in your patch? Otherwise, if you don't mind,
I'll move on with this as a project separate from yours. Except for
the user interface it won't conflict with yours, I suppose, but
eventually they would need some consolidation work.
I think it's a neat idea, and I think it might be added to the
patch. It would fit in quite nicely, actually - I already have
other kinds of stats planned for addition, but I'm not going to work
on that in the near future. It will require changes in some parts of
the patch (selecting the stats for a list of clauses), and I'd like
to complete the current patch first, and then add features in
follow-up patches.
I see. Let's work on this for now.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello,
On 05/15/15 08:29, Kyotaro HORIGUCHI wrote:
Hello,
Regarding the functional dependencies - you're right there's room
for improvement. For example it only works with dependencies
between pairs of columns, not multi-column dependencies. Is this
what you mean by incomplete?
No, it overruns dependencies->deps, because build_mv_dependencies
stores many elements into dependencies->deps[n] although it
really has room for only one element. I suppose that you paused
writing it when you noticed that the number of required elements
is unknown before finishing the walk through all pairs of
values. palloc'ing numattrs^2 elements is reasonable enough as POC
code for now. Am I looking at the wrong version of the patch?
- dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData))
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData) +
+ sizeof(MVDependency) * numattrs * numattrs);
Ah! That's clearly a bug. Thanks for noticing that, will fix in the next
version of the patch.
I mention this because I recently had an issue caused by strong
correlation between two columns in the dbt3 benchmark. Two columns
in some table are strongly correlated but not functionally
dependent, there are too many distinct values and their distribution
is very uniform, so the MCV list is of no use for the table (and a
histogram does not help with equality conditions). As a result, the
planner gets the row estimates badly wrong, especially for joins.
I think the other statistics types (esp. histograms) might be more
useful here, but I assume you haven't tried that because of the
conflicts. The current patch does not handle joins at all, though.
Well, that's one of the reasons. But I understood that no
deterministic estimation can be applied to such a distribution
when I saw what caused the wrong estimate. eqsel and eqjoinsel
ultimately rely on the random-match assumption over a uniform
distribution when the value is not found in the MCV list. And the
functional dependencies stuff in your old patch (which works)
(rightfully) failed to find such a relationship between the
problematic columns. So I tried ndistinct, which is not contained
in your patch, to see how well it works.
Yes, that's certainly true. I think you're right that the mv
coefficient might be quite useful in some cases.
With my patch:
alter table t add statistics (mcv) on (a,b,c);
...
Seq Scan on t (cost=0.00..22906.00 rows=9533 width=12)
Yes, your MV-MCV list presumably holds one third of all possible
(sets of) values, so it works fine, I guess. But my original problem
occurred under the condition that the (single-column) MCVs contain
under 1% of the possible values. MCV lists would not work for such
cases, but the very uniform distribution helps the random-match
assumption to work.
Actually, I think the MCV list should contain all the items, as it
decides the sample contains all the values from the data. The usual
1-D MCV list uses the same logic. But you're right that on a data set
with more MCV items and a mostly uniform distribution, this won't
work.
$ perl gentbl.pl 200000 | psql postgres
<takes a while..>
postgres=# alter table t1 add statistics (mcv true) on (a, b);
postgres=# analyze t1;
postgres=# explain analyze select * from t1 where a = 1 and b = 2501;
Seq Scan on t1 (cost=0.00..124319.00 rows=1 width=8)
(actual time=0.051..1250.773 rows=8 loops=1)
The estimate "rows=1" is internally 2.4e-11, which is 3.33e+11 times
smaller than the real number. This will result in roughly the
same order of error for joins. This is because the MV-MCV list holds
too small a part of the domain, and the rest is then calculated using
the random-match assumption. This won't be fixed by increasing
statistics_target to any sane amount.
Yes, the MCV lists don't work well with data sets like this.
alter table t drop statistics all;
alter table t add statistics (histogram) on (a,b,c);
...
Seq Scan on t (cost=0.00..22906.00 rows=9667 width=12)
So both the MCV list and the histogram do quite a good job here,
I understand how you calculate selectivity for equality clauses
using the histogram. It calculates the result rows as 2.3e-11,
which is almost the same as with MV-MCV; this comes from the same
cause and thus yields the same result for joins.
but there are certainly cases where that does not work and the
mv coefficient works better.
+1
The mv coefficient is effective where, as mentioned above, the
MV-MCV list or MV-histogram cannot hold a sufficient part of the
domain. The appropriate combination of MV-MCV and the mv coefficient
would be the same as var_eq_(non_)const/eqjoinsel_inner for a single
column, that is, applying the mv coefficient to the part of the
selectivity corresponding to values not in the MV-MCV list. I have
no idea how to combine it with the MV-histogram right now.
The current patch does not handle joins, but it's one of the TODO
items.
Yes, but the results on very large tables can be deduced from
the discussion above.
I think the results above show that the multivariate coefficient
significantly improves estimates when correlated columns are
involved.
Yes, it looks interesting. I'm wondering what the "failure cases" are,
i.e. when the coefficient approach does not work. It seems to me it
relies on an assumption of consistency for all the ndistinct values.
For example, let's assume you have two columns - A and B, each with
1000 distinct values, and that each value in A has 100 matching values
in B, so the coefficient is ~10:
1,000 * 1,000 / 100,000 = 10
Now, let's assume the distribution looks different - with the first
100 values in A matching all 1000 values of B, and the remaining 900
values just a single B value. Then
1,000 * 1,000 / (100,000 + 900) = ~9.9
So a very different distribution, but almost the same coefficient.
Are there any other assumptions like this?
I think no, for now. Just like the current var_eq_(non_)const and
eqjoinsel_inner do: since no clue about *the true* distribution is
available, we have no choice other than to stand on the random-match
(uniform distribution) assumption. And it gives not-so-bad estimates
for not-so-extreme distributions. It's of course not perfect, but
good enough.
Also, does the coefficient work for equality conditions only?
The mv coefficient is a parallel of ndistinct (it is a bit of a weird
expression, though). So I guess it is applicable in the current
estimation code wherever ndistinct is used; almost all of those
places look related to equality comparison.
ISTM the estimation of GROUP BY might benefit tremendously from these
statistics - that is, helping with cardinality estimation of
analytical queries, etc.
Also, we've only discussed 2-column coefficients. Would it be useful
to track those coefficients for larger groups of columns? For example
ndistinct(A,B,C)
--------------------------------------------
ndistinct(A) * ndistinct(B) * ndistinct(C)
which might work better for queries like
SELECT a,b,c FROM t GROUP BY a,b,c;
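For example, on the test table from the original post, a manual check
would look like this (hypothetical - the planner does not compute this
today):
select count(distinct a) * count(distinct b) * count(distinct c)
         as independence_estimate,
       count(distinct (a, b, c)) as actual_groups
  from t;
Here independence_estimate is 1,000,000 while actual_groups is 100, so
the coefficient above (100 / 1,000,000) is exactly the correction
factor the GROUP BY estimate would need.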
Would you consider this in your patch? Otherwise, if you don't mind,
I'll move on with this as a project separate from yours. Except for
the user interface it won't conflict with yours, I suppose, but
eventually they would need some consolidation work.
I think it's a neat idea, and I think it might be added to the
patch. It would fit in quite nicely, actually - I already have
other kinds of stats planned for addition, but I'm not going to work
on that in the near future. It will require changes in some parts of
the patch (selecting the stats for a list of clauses), and I'd like
to complete the current patch first, and then add features in
follow-up patches.
I see. Let's work on this for now.
Thanks!
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello,
On 05/15/15 08:29, Kyotaro HORIGUCHI wrote:
Hello,
At Thu, 14 May 2015 12:35:50 +0200, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote in <55547A86.8020400@2ndquadrant.com>
...
Regarding the functional dependencies - you're right there's room for
improvement. For example it only works with dependencies between pairs
of columns, not multi-column dependencies. Is this what you mean by
incomplete?
No, it overruns dependencies->deps, because build_mv_dependencies
stores many elements into dependencies->deps[n] although it
really has room for only one element. I suppose that you paused
writing it when you noticed that the number of required elements
is unknown before finishing the walk through all pairs of
values. palloc'ing numattrs^2 elements is reasonable enough as POC
code for now. Am I looking at the wrong version of the patch?
- dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData))
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData) +
+ sizeof(MVDependency) * numattrs * numattrs);
Actually, looking at this a bit more, I think the current behavior is
correct. I assume the line is from build_mv_dependencies(), but the
whole block looks like this:
if (dependencies == NULL)
{
    /* first dependency - the initial allocation already includes
     * room for a single deps[] element */
    dependencies = (MVDependencies) palloc0(sizeof(MVDependenciesData));
    dependencies->magic = MVSTAT_DEPS_MAGIC;
}
else
    /* additional dependency - grow deps[] by one element */
    dependencies = repalloc(dependencies,
                            offsetof(MVDependenciesData, deps) +
                            sizeof(MVDependency) * (dependencies->ndeps + 1));
which allocates space for a single element initially, and then extends
that when other dependencies are added.
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello,
attached is v7 of the multivariate stats patch. The main improvement is
major refactoring of the clausesel.c portion - splitting the awfully
long spaghetti-style functions into smaller pieces, making it much more
understandable etc.
I assume some of those pieces are unnecessary because there already
is a helper function with the same purpose (that I'm simply not aware
of). But IMHO this piece of code begins to look reasonable (especially
when compared to the previous state).
The other major improvement is a review of the comments (including
FIXMEs and TODOs), and removal of the obsolete / misplaced ones. And
there were plenty of those ...
These changes made this version ~20k smaller than v6.
The patch is also rebased to current master, which I assume shall be
quite stable - so hopefully no more duplicate OIDs for a while.
There are 6 files attached, but only 0002-0006 are actually part of the
multivariate statistics patch itself. The first part makes it possible
to use pull_varnos() with expression trees containing RestrictInfo
nodes, but maybe this is not the right way to fix this (there's another
thread where this was discussed).
Also, the regression tests testing plan choice with multivariate stats
(e.g. that a bitmap index scan is chosen instead of an index scan)
fail from time to time. I suppose this happens because the
invalidation after ANALYZE is not processed before executing the
query, so the optimizer does not see the stats, or something like that.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-teach-expression-walker-about-RestrictInfo-v7.patch (text/x-patch)
From 886edce86cbe571283ebe49177288e9978b10c81 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Tue, 28 Apr 2015 19:56:33 +0200
Subject: [PATCH 1/6] teach expression walker about RestrictInfo
otherwise pull_varnos fails when processing OR clauses
---
src/backend/nodes/nodeFuncs.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index a2bcca5..7dcc1c1 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -1995,6 +1995,8 @@ expression_tree_walker(Node *node,
return walker(((PlaceHolderInfo *) node)->ph_var, context);
case T_RangeTblFunction:
return walker(((RangeTblFunction *) node)->funcexpr, context);
+ case T_RestrictInfo:
+ return walker(((RestrictInfo *) node)->clause, context);
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
--
1.9.3
0002-shared-infrastructure-and-functional-dependencies-v7.patch (text/x-patch)
From 9e4e4141af44c03ccec77490c84f6c70e68e4449 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 19:51:48 +0100
Subject: [PATCH 2/6] shared infrastructure and functional dependencies
Basic infrastructure shared by all kinds of multivariate
stats, most importantly:
- adds a new system catalog (pg_mv_statistic)
- ALTER TABLE ... ADD STATISTICS
- ALTER TABLE ... DROP STATISTICS
- implementation of functional dependencies (the simplest
type of multivariate statistics)
- building functional dependencies in ANALYZE
- updates regression tests (new catalog etc.)
This does not include any changes to the optimizer, i.e.
it does not influence the query planning (subject to
follow-up patches).
The current implementation requires a valid 'ltopr' for
the columns, so that we can sort the sample rows in various
ways, both in this patch and other kinds of statistics.
Maybe this restriction could be relaxed in the future,
requiring just 'eqopr' in case of stats not sorting the
data (e.g. functional dependencies and MCV lists).
Maybe some of the stats (functional dependencies and MCV
list with limited functionality) might be made to work
with hashes of the values, which is sufficient for equality
comparisons. But the queries would require the equality
operator anyway, so it's not really a weaker requirement.
The hashes might reduce space requirements, though.
The algorithm detecting the dependencies is rather simple
and probably needs improvements, so that it detects more
complicated dependencies, and also validation of the math.
The name 'functional dependencies' is more correct (than
'association rules') as it's exactly the name used in
relational theory (esp. Normal Forms) for tracking
column-level dependencies.
The multivariate statistics are automatically removed in
two situations
(a) after a DROP TABLE (obviously)
(b) after ALTER TABLE ... DROP COLUMN, if the statistics
would be defined on less than 2 columns (remaining)
If there are at least 2 columns remaining, we keep
the statistics but perform cleanup on the next ANALYZE.
The dropped columns are removed from stakeys, and the new
statistics is built on the smaller set.
We can't do this at DROP COLUMN, because that'd leave us
with invalid statistics, or we'd have to throw it away
although we can still use it. This lazy approach lets us
use the statistics although some of the columns are dead.
Dropping the statistics is done using DROP STATISTICS
ALTER TABLE ... DROP STATISTICS ALL;
ALTER TABLE ... DROP STATISTICS (opts) ON (cols);
The bad consequence of this is that 'statistics' becomes
a reserved keyword (was unreserved before), otherwise it
conflicts with DROP <columnname> in the grammar. Not sure
if there's a workaround to this.
This also adds a simple list of statistics to \d in psql.
---
src/backend/catalog/Makefile | 1 +
src/backend/catalog/heap.c | 102 +++++
src/backend/catalog/system_views.sql | 10 +
src/backend/commands/analyze.c | 21 +
src/backend/commands/tablecmds.c | 342 +++++++++++++++-
src/backend/nodes/copyfuncs.c | 13 +
src/backend/nodes/outfuncs.c | 18 +
src/backend/optimizer/util/plancat.c | 63 +++
src/backend/parser/gram.y | 83 +++-
src/backend/utils/Makefile | 2 +-
src/backend/utils/cache/relcache.c | 59 +++
src/backend/utils/cache/syscache.c | 12 +
src/backend/utils/mvstats/Makefile | 17 +
src/backend/utils/mvstats/common.c | 356 ++++++++++++++++
src/backend/utils/mvstats/common.h | 75 ++++
src/backend/utils/mvstats/dependencies.c | 638 +++++++++++++++++++++++++++++
src/bin/psql/describe.c | 40 ++
src/include/catalog/heap.h | 1 +
src/include/catalog/indexing.h | 5 +
src/include/catalog/pg_mv_statistic.h | 69 ++++
src/include/catalog/pg_proc.h | 5 +
src/include/catalog/toasting.h | 1 +
src/include/nodes/nodes.h | 2 +
src/include/nodes/parsenodes.h | 12 +-
src/include/nodes/relation.h | 28 ++
src/include/parser/kwlist.h | 2 +-
src/include/utils/mvstats.h | 69 ++++
src/include/utils/rel.h | 4 +
src/include/utils/relcache.h | 1 +
src/include/utils/syscache.h | 1 +
src/test/regress/expected/rules.out | 8 +
src/test/regress/expected/sanity_check.out | 1 +
32 files changed, 2053 insertions(+), 8 deletions(-)
create mode 100644 src/backend/utils/mvstats/Makefile
create mode 100644 src/backend/utils/mvstats/common.c
create mode 100644 src/backend/utils/mvstats/common.h
create mode 100644 src/backend/utils/mvstats/dependencies.c
create mode 100644 src/include/catalog/pg_mv_statistic.h
create mode 100644 src/include/utils/mvstats.h
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 3d1139b..c6de23c 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -32,6 +32,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
+ pg_mv_statistic.h \
pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index d04e94d..1c28ca3 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -46,6 +46,7 @@
#include "catalog/pg_constraint.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_tablespace.h"
@@ -1611,7 +1612,10 @@ RemoveAttributeById(Oid relid, AttrNumber attnum)
heap_close(attr_rel, RowExclusiveLock);
if (attnum > 0)
+ {
RemoveStatistics(relid, attnum);
+ RemoveMVStatistics(relid, attnum);
+ }
relation_close(rel, NoLock);
}
@@ -1839,6 +1843,11 @@ heap_drop_with_catalog(Oid relid)
RemoveStatistics(relid, 0);
/*
+ * delete multi-variate statistics
+ */
+ RemoveMVStatistics(relid, 0);
+
+ /*
* delete attribute tuples
*/
DeleteAttributeTuples(relid);
@@ -2694,6 +2703,99 @@ RemoveStatistics(Oid relid, AttrNumber attnum)
/*
+ * RemoveMVStatistics --- remove entries in pg_mv_statistic for a rel
+ *
+ * If attnum is zero, remove all entries for rel; else remove only the one(s)
+ * for that column.
+ */
+void
+RemoveMVStatistics(Oid relid, AttrNumber attnum)
+{
+ Relation pgmvstatistic;
+ TupleDesc tupdesc = NULL;
+ SysScanDesc scan;
+ ScanKeyData key;
+ HeapTuple tuple;
+
+ /*
+ * When dropping a column, we'll drop statistics with a single
+ * remaining (undropped column). To do that, we need the tuple
+ * descriptor.
+ *
+ * We already have the relation locked (as we're running ALTER
+ * TABLE ... DROP COLUMN), so we'll just get the descriptor here.
+ */
+ if (attnum != 0)
+ {
+ Relation rel = relation_open(relid, NoLock);
+
+ /* multivariate stats are supported on tables and matviews */
+ if (rel->rd_rel->relkind == RELKIND_RELATION ||
+ rel->rd_rel->relkind == RELKIND_MATVIEW)
+ tupdesc = RelationGetDescr(rel);
+
+ relation_close(rel, NoLock);
+ }
+
+ if (tupdesc == NULL)
+ return;
+
+ pgmvstatistic = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ ScanKeyInit(&key,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ scan = systable_beginscan(pgmvstatistic,
+ MvStatisticRelidIndexId,
+ true, NULL, 1, &key);
+
+ /* we must loop even when attnum != 0, in case of inherited stats */
+ while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+ {
+ bool delete = true;
+
+ if (attnum != 0)
+ {
+ Datum adatum;
+ bool isnull;
+ int i;
+ int ncolumns = 0;
+ ArrayType *arr;
+ int16 *attnums;
+
+ /* get the columns */
+ adatum = SysCacheGetAttr(MVSTATOID, tuple,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+ attnums = (int16*)ARR_DATA_PTR(arr);
+
+ for (i = 0; i < ARR_DIMS(arr)[0]; i++)
+ {
+ /* count the column unless it has been / is being dropped */
+ if ((! tupdesc->attrs[attnums[i]-1]->attisdropped) &&
+ (attnums[i] != attnum))
+ ncolumns += 1;
+ }
+
+ /* delete if there are less than two attributes */
+ delete = (ncolumns < 2);
+ }
+
+ if (delete)
+ simple_heap_delete(pgmvstatistic, &tuple->t_self);
+ }
+
+ systable_endscan(scan);
+
+ heap_close(pgmvstatistic, RowExclusiveLock);
+}
+
+
+/*
* RelationTruncateIndexes - truncate all indexes associated
* with the heap relation to zero tuples.
*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 18921c4..0dedaba 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -150,6 +150,16 @@ CREATE VIEW pg_indexes AS
LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
+CREATE VIEW pg_mv_stats AS
+ SELECT
+ N.nspname AS schemaname,
+ C.relname AS tablename,
+ S.stakeys AS attnums,
+ length(S.stadeps) as depsbytes,
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
+ LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
+
CREATE VIEW pg_stats AS
SELECT
nspname AS schemaname,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 861048f..1f50036 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -27,6 +27,7 @@
#include "catalog/indexing.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "commands/dbcommands.h"
#include "commands/tablecmds.h"
@@ -55,7 +56,11 @@
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "utils/mvstats.h"
+#include "access/sysattr.h"
/* Per-index data for ANALYZE */
typedef struct AnlIndexData
@@ -460,6 +465,19 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
* all analyzable columns. We use a lower bound of 100 rows to avoid
* possible overflow in Vitter's algorithm. (Note: that will also be the
* target in the corner case where there are no analyzable columns.)
+ *
+ * FIXME This sample sizing is mostly OK when computing stats for
+ * individual columns, but when computing multivariate stats
+ * (histograms, mcv, ...) it's rather
+ * insufficient. For stats on multiple columns / complex stats
+ * we need larger sample sizes, because we need to build more
+ * detailed stats (more MCV items / histogram buckets) to get
+ * good accuracy. Maybe it'd be appropriate to use samples
+ * proportional to the table size (say, 0.5% - 1%) instead of
+ * a fixed size. Also, this should be
+ * bound to the requested statistics size - e.g. number of MCV
+ * items or histogram buckets should require several sample
+ * rows per item/bucket (so the sample should be k*size).
*/
targrows = 100;
for (i = 0; i < attr_cnt; i++)
@@ -562,6 +580,9 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
update_attstats(RelationGetRelid(Irel[ind]), false,
thisdata->attr_cnt, thisdata->vacattrstats);
}
+
+ /* Build multivariate stats (if there are any). */
+ build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
}
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 84dbee0..d6c6f8e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -35,6 +35,7 @@
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_tablespace.h"
@@ -92,7 +93,7 @@
#include "utils/syscache.h"
#include "utils/tqual.h"
#include "utils/typcache.h"
-
+#include "utils/mvstats.h"
/*
* ON COMMIT action list
@@ -140,8 +141,9 @@ static List *on_commits = NIL;
#define AT_PASS_ADD_COL 5 /* ADD COLUMN */
#define AT_PASS_ADD_INDEX 6 /* ADD indexes */
#define AT_PASS_ADD_CONSTR 7 /* ADD constraints, defaults */
-#define AT_PASS_MISC 8 /* other stuff */
-#define AT_NUM_PASSES 9
+#define AT_PASS_ADD_STATS 8 /* ADD statistics */
+#define AT_PASS_MISC 9 /* other stuff */
+#define AT_NUM_PASSES 10
typedef struct AlteredTableInfo
{
@@ -416,6 +418,10 @@ static void ATExecReplicaIdentity(Relation rel, ReplicaIdentityStmt *stmt, LOCKM
static void ATExecGenericOptions(Relation rel, List *options);
static void ATExecEnableRowSecurity(Relation rel);
static void ATExecDisableRowSecurity(Relation rel);
+static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
+ StatisticsDef *def, LOCKMODE lockmode);
+static void ATExecDropStatistics(AlteredTableInfo *tab, Relation rel,
+ StatisticsDef *def, LOCKMODE lockmode);
static void copy_relation_data(SMgrRelation rel, SMgrRelation dst,
ForkNumber forkNum, char relpersistence);
@@ -3013,6 +3019,8 @@ AlterTableGetLockLevel(List *cmds)
* updates.
*/
case AT_SetStatistics: /* Uses MVCC in getTableAttrs() */
+ case AT_AddStatistics: /* XXX not sure if the right level */
+ case AT_DropStatistics: /* XXX not sure if the right level */
case AT_ClusterOn: /* Uses MVCC in getIndexes() */
case AT_DropCluster: /* Uses MVCC in getIndexes() */
case AT_SetOptions: /* Uses MVCC in getTableAttrs() */
@@ -3169,6 +3177,8 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
pass = AT_PASS_ADD_CONSTR;
break;
case AT_SetStatistics: /* ALTER COLUMN SET STATISTICS */
+ case AT_AddStatistics: /* XXX maybe not the right place */
+ case AT_DropStatistics: /* XXX maybe not the right place */
ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode);
/* Performs own permission checks */
ATPrepSetStatistics(rel, cmd->name, cmd->def, lockmode);
@@ -3471,6 +3481,12 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
case AT_SetStatistics: /* ALTER COLUMN SET STATISTICS */
address = ATExecSetStatistics(rel, cmd->name, cmd->def, lockmode);
break;
+ case AT_AddStatistics: /* ADD STATISTICS */
+ ATExecAddStatistics(tab, rel, (StatisticsDef *) cmd->def, lockmode);
+ break;
+ case AT_DropStatistics: /* DROP STATISTICS */
+ ATExecDropStatistics(tab, rel, (StatisticsDef *) cmd->def, lockmode);
+ break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
address = ATExecSetOptions(rel, cmd->name, cmd->def, false, lockmode);
break;
@@ -11868,3 +11884,323 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,
ReleaseSysCache(tuple);
}
+
+/* used for sorting the attnums in ATExecAddStatistics */
+static int compare_int16(const void *a, const void *b)
+{
+ return memcmp(a, b, sizeof(int16));
+}
+
+/*
+ * Implements the ALTER TABLE ... ADD STATISTICS (options) ON (columns).
+ *
+ * TODO Check that the types support sort, although maybe we can live
+ * without it (and only build MCV list / association rules).
+ *
+ * TODO This should probably check for duplicate stats (i.e. same
+ * keys, same options). Although maybe it's useful to have
+ * multiple stats on the same columns with different options
+ * (say, a detailed MCV-only stats for some queries, histogram
+ * for others, etc.)
+ */
+static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
+ StatisticsDef *def, LOCKMODE lockmode)
+{
+ int i, j;
+ ListCell *l;
+ int16 attnums[INDEX_MAX_KEYS];
+ int numcols = 0;
+
+ HeapTuple htup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ int2vector *stakeys;
+ Relation mvstatrel;
+
+ /* by default build nothing */
+ bool build_dependencies = false;
+
+ Assert(IsA(def, StatisticsDef));
+
+ /* transform the column names to attnum values */
+
+ foreach(l, def->keys)
+ {
+ char *attname = strVal(lfirst(l));
+ HeapTuple atttuple;
+
+ atttuple = SearchSysCacheAttName(RelationGetRelid(rel), attname);
+
+ if (!HeapTupleIsValid(atttuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" referenced in statistics does not exist",
+ attname)));
+
+ /* more than MVHIST_MAX_DIMENSIONS columns not allowed */
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("cannot have more than %d keys in a statistics",
+ MVSTATS_MAX_DIMENSIONS)));
+
+ attnums[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->attnum;
+ ReleaseSysCache(atttuple);
+ numcols++;
+ }
+
+ /*
+ * Check the lower bound (at least 2 columns), the upper bound was
+ * already checked in the loop.
+ */
+ if (numcols < 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("multivariate stats require 2 or more columns")));
+
+ /* look for duplicities */
+ for (i = 0; i < numcols; i++)
+ for (j = 0; j < numcols; j++)
+ if ((i != j) && (attnums[i] == attnums[j]))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("duplicate column name in statistics definition")));
+
+ /* parse the statistics options */
+ foreach (l, def->options)
+ {
+ DefElem *opt = (DefElem*)lfirst(l);
+
+ if (strcmp(opt->defname, "dependencies") == 0)
+ build_dependencies = defGetBoolean(opt);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized STATISTICS option \"%s\"",
+ opt->defname)));
+ }
+
+ /* sort the attnums and build int2vector */
+ qsort(attnums, numcols, sizeof(int16), compare_int16);
+ stakeys = buildint2vector(attnums, numcols);
+
+ /*
+ * Okay, let's create the pg_mv_statistic entry.
+ */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ /* no stats collected yet, so just the keys */
+ values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
+
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+ values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+
+ nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* insert the tuple into pg_mv_statistic */
+ mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ htup = heap_form_tuple(mvstatrel->rd_att, values, nulls);
+
+ simple_heap_insert(mvstatrel, htup);
+
+ CatalogUpdateIndexes(mvstatrel, htup);
+
+ heap_freetuple(htup);
+
+ heap_close(mvstatrel, RowExclusiveLock);
+
+ /*
+ * Invalidate relcache so that others see the new statistics.
+ */
+ CacheInvalidateRelcache(rel);
+
+ return;
+}
+
+/*
+ * Implements the ALTER TABLE ... DROP STATISTICS in two forms:
+ *
+ * ALTER TABLE ... DROP STATISTICS (options) ON (columns)
+ * ALTER TABLE ... DROP STATISTICS ALL;
+ *
+ * The first one requires an exact match, the second one just drops
+ * all the statistics on a table.
+ */
+static void ATExecDropStatistics(AlteredTableInfo *tab, Relation rel,
+ StatisticsDef *def, LOCKMODE lockmode)
+{
+ Relation statrel;
+ SysScanDesc scan;
+ ScanKeyData key;
+ HeapTuple tuple;
+
+ ListCell *l;
+
+ int16 attnums[INDEX_MAX_KEYS];
+ int numcols = 0;
+
+ /* checking whether the statistics matches / should be dropped */
+ bool build_dependencies = false;
+ bool check_dependencies = false;
+
+ if (def != NULL)
+ {
+ Assert(IsA(def, StatisticsDef));
+
+ /* collect attribute numbers */
+ foreach(l, def->keys)
+ {
+ char *attname = strVal(lfirst(l));
+ HeapTuple atttuple;
+
+ atttuple = SearchSysCacheAttName(RelationGetRelid(rel), attname);
+
+ if (!HeapTupleIsValid(atttuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" referenced in statistics does not exist",
+ attname)));
+
+ /* more than MVHIST_MAX_DIMENSIONS columns not allowed */
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("cannot have more than %d keys in a statistics",
+ MVSTATS_MAX_DIMENSIONS)));
+
+ attnums[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->attnum;
+ ReleaseSysCache(atttuple);
+ numcols++;
+ }
+
+ /* parse the statistics options */
+ foreach (l, def->options)
+ {
+ DefElem *opt = (DefElem*)lfirst(l);
+
+ if (strcmp(opt->defname, "dependencies") == 0)
+ {
+ check_dependencies = true;
+ build_dependencies = defGetBoolean(opt);
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized STATISTICS option \"%s\"",
+ opt->defname)));
+ }
+
+ }
+
+ statrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ ScanKeyInit(&key,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(RelationGetRelid(rel)));
+
+ scan = systable_beginscan(statrel,
+ MvStatisticRelidIndexId,
+ true, NULL, 1, &key);
+
+ /* we must loop even when attnum != 0, in case of inherited stats */
+ while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+ {
+ /* by default we delete everything */
+ bool delete = true;
+
+ /* check that the options match (dependencies, mcv, histogram) */
+ if (delete && check_dependencies)
+ {
+ bool isnull;
+ Datum adatum = heap_getattr(tuple,
+ Anum_pg_mv_statistic_deps_enabled,
+ RelationGetDescr(statrel),
+ &isnull);
+
+ delete = (! isnull) &&
+ (DatumGetBool(adatum) == build_dependencies);
+ }
+
+ /* check that the columns match the statistics definition */
+ if (delete && (numcols > 0))
+ {
+ int i, j;
+ ArrayType *arr;
+ bool isnull;
+
+ int16 *stakeys;
+ int nstakeys;
+
+ Datum adatum = SysCacheGetAttr(MVSTATOID, tuple,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ nstakeys = ARR_DIMS(arr)[0];
+ stakeys = (int16 *) ARR_DATA_PTR(arr);
+
+ /* assume match */
+ delete = true;
+
+ /* check that for each column we find a match in stakeys */
+ for (i = 0; i < numcols; i++)
+ {
+ bool found = false;
+ for (j = 0; j < nstakeys; j++)
+ {
+ if (attnums[i] == stakeys[j])
+ {
+ found = true;
+ break;
+ }
+ }
+
+ if (! found)
+ {
+ delete = false;
+ break;
+ }
+ }
+
+ /* check that for each stakeys we find a match in columns */
+ for (j = 0; j < nstakeys; j++)
+ {
+ bool found = false;
+
+ for (i = 0; i < numcols; i++)
+ {
+ if (attnums[i] == stakeys[j])
+ {
+ found = true;
+ break;
+ }
+ }
+
+ if (! found)
+ {
+ delete = false;
+ break;
+ }
+ }
+ }
+
+ /* don't delete, if we've found mismatches */
+ if (delete)
+ simple_heap_delete(statrel, &tuple->t_self);
+ }
+
+ systable_endscan(scan);
+
+ heap_close(statrel, RowExclusiveLock);
+
+ /*
+ * Invalidate relcache so that others forget the dropped statistics.
+ */
+ CacheInvalidateRelcache(rel);
+
+ return;
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 4c363d3..e5a3d96 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4095,6 +4095,17 @@ _copyAlterPolicyStmt(const AlterPolicyStmt *from)
return newnode;
}
+static StatisticsDef *
+_copyStatisticsDef(const StatisticsDef *from)
+{
+ StatisticsDef *newnode = makeNode(StatisticsDef);
+
+ COPY_NODE_FIELD(keys);
+ COPY_NODE_FIELD(options);
+
+ return newnode;
+}
+
/* ****************************************************************
* pg_list.h copy functions
* ****************************************************************
@@ -4938,6 +4949,9 @@ copyObject(const void *from)
break;
case T_TableSampleClause:
retval = _copyTableSampleClause(from);
+ break;
+ case T_StatisticsDef:
+ retval = _copyStatisticsDef(from);
break;
case T_FuncWithArgs:
retval = _copyFuncWithArgs(from);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 4775acf..93a6f04 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1898,6 +1898,21 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
}
static void
+_outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
+{
+ WRITE_NODE_TYPE("MVSTATISTICINFO");
+
+ /* NB: this isn't a complete set of fields */
+ WRITE_OID_FIELD(mvoid);
+
+ /* enabled statistics */
+ WRITE_BOOL_FIELD(deps_enabled);
+
+ /* built/available statistics */
+ WRITE_BOOL_FIELD(deps_built);
+}
+
+static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
/*
@@ -3331,6 +3346,9 @@ _outNode(StringInfo str, const void *obj)
case T_PlannerParamItem:
_outPlannerParamItem(str, obj);
break;
+ case T_MVStatisticInfo:
+ _outMVStatisticInfo(str, obj);
+ break;
case T_CreateStmt:
_outCreateStmt(str, obj);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index b04dc2e..c397773 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -27,6 +27,7 @@
#include "catalog/catalog.h"
#include "catalog/dependency.h"
#include "catalog/heap.h"
+#include "catalog/pg_mv_statistic.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -39,7 +40,9 @@
#include "parser/parsetree.h"
#include "rewrite/rewriteManip.h"
#include "storage/bufmgr.h"
+#include "utils/builtins.h"
#include "utils/lsyscache.h"
+#include "utils/syscache.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
@@ -92,6 +95,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
Relation relation;
bool hasindex;
List *indexinfos = NIL;
+ List *stainfos = NIL;
/*
* We need not lock the relation since it was already locked, either by
@@ -380,6 +384,65 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
rel->indexlist = indexinfos;
+ if (true)
+ {
+ List *mvstatoidlist;
+ ListCell *l;
+
+ mvstatoidlist = RelationGetMVStatList(relation);
+
+ foreach(l, mvstatoidlist)
+ {
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ Oid mvoid = lfirst_oid(l);
+ Form_pg_mv_statistic mvstat;
+ MVStatisticInfo *info;
+
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (! HeapTupleIsValid(htup))
+ continue;
+
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ /* unavailable stats are not interesting for the planner */
+ if (mvstat->deps_built)
+ {
+ info = makeNode(MVStatisticInfo);
+
+ info->mvoid = mvoid;
+ info->rel = rel;
+
+ /* enabled statistics */
+ info->deps_enabled = mvstat->deps_enabled;
+
+ /* built/available statistics */
+ info->deps_built = mvstat->deps_built;
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ info->stakeys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+
+ stainfos = lcons(info, stainfos);
+ }
+
+ ReleaseSysCache(htup);
+ }
+
+ list_free(mvstatoidlist);
+ }
+
+ rel->mvstatlist = stainfos;
+
/* Grab foreign-table info using the relcache, while we have it */
if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
{
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index e0ff6f1..d81bab6 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -375,6 +375,12 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <node> group_by_item empty_grouping_set rollup_clause cube_clause
%type <node> grouping_sets_clause
+%type <list> OptStatsOptions
+%type <str> stats_options_name
+%type <node> stats_options_arg
+%type <defelt> stats_options_elem
+%type <list> stats_options_list
+
%type <list> opt_fdw_options fdw_options
%type <defelt> fdw_option
@@ -501,7 +507,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <keyword> unreserved_keyword type_func_name_keyword
%type <keyword> col_name_keyword reserved_keyword
-%type <node> TableConstraint TableLikeClause
+%type <node> TableConstraint TableLikeClause TableStatistics
%type <ival> TableLikeOptionList TableLikeOption
%type <list> ColQualList
%type <node> ColConstraint ColConstraintElem ConstraintAttr
@@ -2333,6 +2339,29 @@ alter_table_cmd:
n->subtype = AT_DisableRowSecurity;
$$ = (Node *)n;
}
+ /* ALTER TABLE <name> ADD STATISTICS (options) ON (columns) */
+ | ADD_P TableStatistics
+ {
+ AlterTableCmd *n = makeNode(AlterTableCmd);
+ n->subtype = AT_AddStatistics;
+ n->def = $2;
+ $$ = (Node *)n;
+ }
+ /* ALTER TABLE <name> DROP STATISTICS (options) ON (columns) */
+ | DROP TableStatistics
+ {
+ AlterTableCmd *n = makeNode(AlterTableCmd);
+ n->subtype = AT_DropStatistics;
+ n->def = $2;
+ $$ = (Node *)n;
+ }
+ /* ALTER TABLE <name> DROP STATISTICS ALL */
+ | DROP STATISTICS ALL
+ {
+ AlterTableCmd *n = makeNode(AlterTableCmd);
+ n->subtype = AT_DropStatistics;
+ $$ = (Node *)n;
+ }
| alter_generic_options
{
AlterTableCmd *n = makeNode(AlterTableCmd);
@@ -3407,6 +3436,56 @@ OptConsTableSpace: USING INDEX TABLESPACE name { $$ = $4; }
ExistingIndex: USING INDEX index_name { $$ = $3; }
;
+/*****************************************************************************
+ *
+ * QUERY :
+ * ALTER TABLE relname ADD STATISTICS (columns) WITH (options)
+ *
+ *****************************************************************************/
+
+TableStatistics:
+ STATISTICS OptStatsOptions ON '(' columnList ')'
+ {
+ StatisticsDef *n = makeNode(StatisticsDef);
+ n->keys = $5;
+ n->options = $2;
+ $$ = (Node *) n;
+ }
+ ;
+
+OptStatsOptions:
+ '(' stats_options_list ')' { $$ = $2; }
+ | /*EMPTY*/ { $$ = NIL; }
+ ;
+
+stats_options_list:
+ stats_options_elem
+ {
+ $$ = list_make1($1);
+ }
+ | stats_options_list ',' stats_options_elem
+ {
+ $$ = lappend($1, $3);
+ }
+ ;
+
+stats_options_elem:
+ stats_options_name stats_options_arg
+ {
+ $$ = makeDefElem($1, $2);
+ }
+ ;
+
+stats_options_name:
+ NonReservedWord { $$ = $1; }
+ ;
+
+stats_options_arg:
+ opt_boolean_or_string { $$ = (Node *) makeString($1); }
+ | NumericOnly { $$ = (Node *) $1; }
+ | /* EMPTY */ { $$ = NULL; }
+ ;
+
/*****************************************************************************
*
@@ -13796,7 +13875,6 @@ unreserved_keyword:
| STANDALONE_P
| START
| STATEMENT
- | STATISTICS
| STDIN
| STDOUT
| STORAGE
@@ -14013,6 +14091,7 @@ reserved_keyword:
| SELECT
| SESSION_USER
| SOME
+ | STATISTICS
| SYMMETRIC
| TABLE
| THEN
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..eba0352 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,7 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr mvstats resowner sort time
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index f60f3cb..8e17872 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -47,6 +47,7 @@
#include "catalog/pg_auth_members.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_database.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_proc.h"
@@ -3906,6 +3907,62 @@ RelationGetIndexList(Relation relation)
return result;
}
+
+List *
+RelationGetMVStatList(Relation relation)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result;
+ List *oldlist;
+ MemoryContext oldcxt;
+
+ /* Quick exit if we already computed the list. */
+ if (relation->rd_mvstatvalid != 0)
+ return list_copy(relation->rd_mvstatlist);
+
+ /*
+ * We build the list we intend to return (in the caller's context) while
+ * doing the scan. After successfully completing the scan, we copy that
+ * list into the relcache entry. This avoids cache-context memory leakage
+ * if we get some sort of error partway through.
+ */
+ result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(RelationGetRelid(relation)));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ /* TODO maybe include only already built statistics? */
+ result = insert_ordered_oid(result, HeapTupleGetOid(htup));
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* Now save a copy of the completed list in the relcache entry. */
+ oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+ oldlist = relation->rd_mvstatlist;
+ relation->rd_mvstatlist = list_copy(result);
+
+ relation->rd_mvstatvalid = true;
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Don't leak the old list, if there is one */
+ list_free(oldlist);
+
+ return result;
+}
+
/*
* insert_ordered_oid
* Insert a new Oid into a sorted list of Oids, preserving ordering
@@ -4875,6 +4932,8 @@ load_relcache_init_file(bool shared)
rel->rd_indexattr = NULL;
rel->rd_keyattr = NULL;
rel->rd_idattr = NULL;
+ rel->rd_mvstatvalid = false;
+ rel->rd_mvstatlist = NIL;
rel->rd_createSubid = InvalidSubTransactionId;
rel->rd_newRelfilenodeSubid = InvalidSubTransactionId;
rel->rd_amcache = NULL;
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 58f90f6..89173d6 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -43,6 +43,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_language.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -502,6 +503,17 @@ static const struct cachedesc cacheinfo[] = {
},
4
},
+ {MvStatisticRelationId, /* MVSTATOID */
+ MvStatisticOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 128
+ },
{NamespaceRelationId, /* NAMESPACENAME */
NamespaceNameIndexId,
1,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
new file mode 100644
index 0000000..099f1ed
--- /dev/null
+++ b/src/backend/utils/mvstats/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/mvstats
+#
+# IDENTIFICATION
+# src/backend/utils/mvstats/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/mvstats
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = common.o dependencies.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
new file mode 100644
index 0000000..a755c49
--- /dev/null
+++ b/src/backend/utils/mvstats/common.c
@@ -0,0 +1,356 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.c
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+static List* list_mv_stats(Oid relid);
+
+
+/*
+ * Compute requested multivariate stats, using the rows sampled for the
+ * plain (single-column) stats.
+ *
+ * This fetches a list of stats from pg_mv_statistic, computes the stats
+ * and serializes them back into the catalog (as bytea values).
+ */
+void
+build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats)
+{
+ ListCell *lc;
+ List *mvstats;
+
+ TupleDesc tupdesc = RelationGetDescr(onerel);
+
+ /*
+ * Fetch defined MV groups from pg_mv_statistic, and then compute
+ * the MV statistics (functional dependencies for now).
+ */
+ mvstats = list_mv_stats(RelationGetRelid(onerel));
+
+ foreach (lc, mvstats)
+ {
+ int j;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
+ MVDependencies deps = NULL;
+
+ VacAttrStats **stats = NULL;
+ int numatts = 0;
+
+ /* int2 vector of attnums the stats should be computed on */
+ int2vector * attrs = stat->stakeys;
+
+ /* see how many of the columns are not dropped */
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ numatts += 1;
+
+ /* if there are dropped attributes, build a filtered int2vector */
+ if (numatts != attrs->dim1)
+ {
+ int16 *tmp = palloc0(numatts * sizeof(int16));
+ int attnum = 0;
+
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ tmp[attnum++] = attrs->values[j];
+
+ pfree(attrs);
+ attrs = buildint2vector(tmp, numatts);
+ }
+
+ /* filter only the interesting vacattrstats records */
+ stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /* check allowed number of dimensions */
+ Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Analyze functional dependencies of columns.
+ */
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
+
+ /* store the computed stats (functional dependencies for now) in the catalog */
+ update_mv_stats(stat->mvoid, deps, attrs);
+ }
+}
+
+/*
+ * Lookup the VacAttrStats info for the selected columns, with indexes
+ * matching the attrs vector (to make it easy to work with when
+ * computing multivariate stats).
+ */
+static VacAttrStats **
+lookup_var_attr_stats(int2vector *attrs, int natts, VacAttrStats **vacattrstats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ VacAttrStats **stats = (VacAttrStats**)palloc0(numattrs * sizeof(VacAttrStats*));
+
+ /* lookup VacAttrStats info for the requested columns (same attnum) */
+ for (i = 0; i < numattrs; i++)
+ {
+ stats[i] = NULL;
+ for (j = 0; j < natts; j++)
+ {
+ if (attrs->values[i] == vacattrstats[j]->tupattnum)
+ {
+ stats[i] = vacattrstats[j];
+ break;
+ }
+ }
+
+ /*
+ * Check that we found the info, that the attnum matches, and
+ * that the requested 'lt' operator is available.
+ */
+ Assert(stats[i] != NULL);
+ Assert(stats[i]->tupattnum == attrs->values[i]);
+
+ /* FIXME This is a rather ugly way to check for 'ltopr' (which
+ * is defined only for 'scalar' attributes).
+ */
+ Assert(((StdAnalyzeData *)stats[i]->extra_data)->ltopr != InvalidOid);
+ }
+
+ return stats;
+}
+
+/*
+ * Fetch list of MV stats defined on a table, without the actual data
+ * for histograms, MCV lists etc.
+ */
+static List*
+list_mv_stats(Oid relid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having indrelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ Form_pg_mv_statistic stats = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ info->mvoid = HeapTupleGetOid(htup);
+ info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_built = stats->deps_built;
+
+ result = lappend(result, info);
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO Maybe save the list into the relcache, as we do in
+ * RelationGetIndexList (which served as inspiration for this one)? */
+
+ return result;
+}
+
+void
+update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+{
+ HeapTuple stup,
+ oldtup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ bool replaces[Natts_pg_mv_statistic];
+
+ Relation sd = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ memset(nulls, 1, Natts_pg_mv_statistic * sizeof(bool));
+ memset(replaces, 0, Natts_pg_mv_statistic * sizeof(bool));
+ memset(values, 0, Natts_pg_mv_statistic * sizeof(Datum));
+
+ /*
+ * Construct a new pg_mv_statistic tuple - replace only the
+ * dependencies, depending on whether they were actually computed.
+ */
+ if (dependencies != NULL)
+ {
+ nulls[Anum_pg_mv_statistic_stadeps - 1] = false;
+ values[Anum_pg_mv_statistic_stadeps - 1]
+ = PointerGetDatum(serialize_mv_dependencies(dependencies));
+ }
+
+ /* always replace the value (either with a bytea or NULL) */
+ replaces[Anum_pg_mv_statistic_stadeps - 1] = true;
+
+ /* always change the availability flags */
+ nulls[Anum_pg_mv_statistic_deps_built - 1] = false;
+ nulls[Anum_pg_mv_statistic_stakeys - 1] = false;
+
+ /* use the new attnums, in case we removed some dropped ones */
+ replaces[Anum_pg_mv_statistic_deps_built - 1] = true;
+ replaces[Anum_pg_mv_statistic_stakeys - 1] = true;
+
+ values[Anum_pg_mv_statistic_deps_built - 1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_stakeys - 1] = PointerGetDatum(attrs);
+
+ /* Is there already a pg_mv_statistic tuple for these statistics? */
+ oldtup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ if (HeapTupleIsValid(oldtup))
+ {
+ /* Yes, replace it */
+ stup = heap_modify_tuple(oldtup,
+ RelationGetDescr(sd),
+ values,
+ nulls,
+ replaces);
+ ReleaseSysCache(oldtup);
+ simple_heap_update(sd, &stup->t_self, stup);
+ }
+ else
+ elog(ERROR, "invalid pg_mv_statistic record (oid=%d)", mvoid);
+
+ /* update indexes too */
+ CatalogUpdateIndexes(sd, stup);
+
+ heap_freetuple(stup);
+
+ heap_close(sd, RowExclusiveLock);
+}
+
+/* multi-variate stats comparator */
+
+/*
+ * qsort_arg comparator for sorting Datums (MV stats)
+ *
+ * Unlike compare_scalars() in analyze.c, this does not maintain
+ * the tupnoLink array.
+ */
+int
+compare_scalars_simple(const void *a, const void *b, void *arg)
+{
+ Datum da = *(Datum*)a;
+ Datum db = *(Datum*)b;
+ SortSupport ssup = (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting data when partitioning a MV bucket
+ */
+int
+compare_scalars_partition(const void *a, const void *b, void *arg)
+{
+ Datum da = ((ScalarItem*)a)->value;
+ Datum db = ((ScalarItem*)b)->value;
+ SortSupport ssup = (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/* initialize multi-dimensional sort */
+MultiSortSupport
+multi_sort_init(int ndims)
+{
+ MultiSortSupport mss;
+
+ Assert(ndims >= 2);
+
+ mss = (MultiSortSupport)palloc0(offsetof(MultiSortSupportData, ssup)
+ + sizeof(SortSupportData)*ndims);
+
+ mss->ndims = ndims;
+
+ return mss;
+}
+
+/*
+ * add sort info for dimension 'dim' (index into vacattrstats) to mss,
+ * at position 'sortdim'
+ */
+void
+multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats)
+{
+ /* first, lookup StdAnalyzeData for the dimension (attribute) */
+ SortSupportData ssup;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)vacattrstats[dim]->extra_data;
+
+ Assert(mss != NULL);
+ Assert(sortdim < mss->ndims);
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup);
+
+ mss->ssup[sortdim] = ssup;
+}
+
+/* compare all the dimensions in the selected order */
+int
+multi_sort_compare(const void *a, const void *b, void *arg)
+{
+ int i;
+ SortItem *ia = (SortItem*)a;
+ SortItem *ib = (SortItem*)b;
+
+ MultiSortSupport mss = (MultiSortSupport)arg;
+
+ for (i = 0; i < mss->ndims; i++)
+ {
+ int compare;
+
+ compare = ApplySortComparator(ia->values[i], ia->isnull[i],
+ ib->values[i], ib->isnull[i],
+ &mss->ssup[i]);
+
+ if (compare != 0)
+ return compare;
+
+ }
+
+ /* equal by default */
+ return 0;
+}
+
+/* compare selected dimension */
+int
+multi_sort_compare_dim(int dim, const SortItem *a, const SortItem *b,
+ MultiSortSupport mss)
+{
+ return ApplySortComparator(a->values[dim], a->isnull[dim],
+ b->values[dim], b->isnull[dim],
+ &mss->ssup[dim]);
+}
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
new file mode 100644
index 0000000..6d5465b
--- /dev/null
+++ b/src/backend/utils/mvstats/common.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.h
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/tuptoaster.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_mv_statistic.h"
+#include "foreign/fdwapi.h"
+#include "postmaster/autovacuum.h"
+#include "storage/lmgr.h"
+#include "utils/datum.h"
+#include "utils/sortsupport.h"
+#include "utils/syscache.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "access/sysattr.h"
+
+#include "utils/mvstats.h"
+
+/* FIXME private structure copied from analyze.c */
+
+typedef struct
+{
+ Oid eqopr; /* '=' operator for datatype, if any */
+ Oid eqfunc; /* and associated function */
+ Oid ltopr; /* '<' operator for datatype, if any */
+} StdAnalyzeData;
+
+typedef struct
+{
+ Datum value; /* a data value */
+ int tupno; /* position index for tuple it came from */
+} ScalarItem;
+
+/* multi-sort */
+typedef struct MultiSortSupportData {
+ int ndims; /* number of dimensions supported by the sort */
+ SortSupportData ssup[1]; /* sort support data for each dimension */
+} MultiSortSupportData;
+
+typedef MultiSortSupportData* MultiSortSupport;
+
+typedef struct SortItem {
+ Datum *values;
+ bool *isnull;
+} SortItem;
+
+MultiSortSupport multi_sort_init(int ndims);
+
+void multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats);
+
+int multi_sort_compare(const void *a, const void *b, void *arg);
+
+int multi_sort_compare_dim(int dim, const SortItem *a,
+ const SortItem *b, MultiSortSupport mss);
+
+/* comparators, used when constructing multivariate stats */
+int compare_scalars_simple(const void *a, const void *b, void *arg);
+int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
new file mode 100644
index 0000000..84b6561
--- /dev/null
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -0,0 +1,638 @@
+/*-------------------------------------------------------------------------
+ *
+ * dependencies.c
+ * POSTGRES multivariate functional dependencies
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/dependencies.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Mine functional dependencies between columns, in the form (A => B),
+ * meaning that a value in column 'A' determines value in 'B'. A simple
+ * artificial example may be a table created like this
+ *
+ * CREATE TABLE deptest (a INT, b INT)
+ * AS SELECT i, i/10 FROM generate_series(1,100000) s(i);
+ *
+ * Clearly, once we know the value for 'A' we can easily determine the
+ * value of 'B' by dividing (A/10). A more practical example may be
+ * addresses, where (ZIP code => city name), i.e. once we know the ZIP,
+ * we probably know which city it belongs to. Larger cities usually have
+ * multiple ZIP codes, so the dependency can't be reversed.
+ *
+ * Functional dependencies are a concept well described in relational
+ * theory, especially in definition of normalization and "normal forms".
+ * Wikipedia has a nice definition of a functional dependency [1]:
+ *
+ * In a given table, an attribute Y is said to have a functional
+ * dependency on a set of attributes X (written X -> Y) if and only
+ * if each X value is associated with precisely one Y value. For
+ * example, in an "Employee" table that includes the attributes
+ * "Employee ID" and "Employee Date of Birth", the functional
+ * dependency {Employee ID} -> {Employee Date of Birth} would hold.
+ * It follows from the previous two sentences that each {Employee ID}
+ * is associated with precisely one {Employee Date of Birth}.
+ *
+ * [1] http://en.wikipedia.org/wiki/Database_normalization
+ *
+ * Ideally, datasets would be normalized not to contain any such
+ * functional dependencies, but sometimes that's not practical. In some
+ * cases it's actually a conscious choice to model the dataset in a
+ * denormalized way, either because of performance or to make querying
+ * easier.
+ *
+ * The current implementation supports only dependencies between two
+ * columns, but this is merely a simplification for the initial patch.
+ * It's certainly useful to mine for dependencies involving multiple
+ * columns on the 'left' side, i.e. the condition of the dependency -
+ * that is, dependencies such as [A,B] => C and so on.
+ *
+ * TODO The implementation may/should be smart enough not to mine both
+ * [A => B] and [A,C => B], because the second dependency is a
+ * consequence of the first one (if values of A determine values
+ * of B, adding another column won't change that). The ANALYZE
+ * should first analyze 1:1 dependencies, then 2:1 dependencies
+ * (and skip the already identified ones), etc.
+ *
+ * For example the dependency [city name => zip code] is much weaker
+ * than [city name, state name => zip code], because there may be
+ * multiple cities with the same name in various states. It's not
+ * perfect though - there are probably cities with the same name within
+ * the same state, but that is hopefully a relatively rare occurrence.
+ * More about this in the section about dependency mining.
+ *
+ * Handling multiple columns on the right side is not necessary, as such
+ * dependencies may be decomposed into a set of dependencies with
+ * the same meaning, one for each column on the right side. For example
+ *
+ * A => [B,C]
+ *
+ * is exactly the same as
+ *
+ * (A => B) & (A => C).
+ *
+ * Of course, storing (A => [B, C]) may be more efficient than storing
+ * the two dependencies (A => B) and (A => C) separately.
+ *
+ *
+ * Dependency mining (ANALYZE)
+ * ---------------------------
+ *
+ * The current build algorithm is rather simple - for each pair [A,B] of
+ * columns, the data are sorted lexicographically (first by A, then B),
+ * and then a number of metrics is computed by walking the sorted data.
+ *
+ * In general the algorithm counts distinct values of A (forming groups
+ * thanks to the sorting), supporting or contradicting the hypothesis
+ * that A => B (i.e. that values of B are predetermined by A). If there
+ * are multiple values of B for a single value of A, it's counted as
+ * contradicting.
+ *
+ * A group may be neither supporting nor contradicting. To be counted as
+ * supporting, the group has to have at least min_group_size(=3) rows.
+ * Smaller 'supporting' groups are counted as neutral.
+ *
+ * Finally, the number of rows in supporting and contradicting groups is
+ * compared, and if there is at least 10x more supporting rows, the
+ * dependency is considered valid.
+ *
+ *
+ * Real-world datasets are imperfect - there may be errors (e.g. due to
+ * data-entry mistakes), or factually correct records, yet contradicting
+ * the dependency (e.g. when a city splits into two, but both keep the
+ * same ZIP code). A strict ANALYZE implementation (where the functional
+ * dependencies are identified) would ignore dependencies on such noisy
+ * data, making the approach unusable in practice.
+ *
+ * The proposed implementation attempts to handle such noisy cases
+ * gracefully, by tolerating small number of contradicting cases.
+ *
+ * In the future this might also perform some sort of test and decide
+ * whether it's worth building any other kind of multivariate stats,
+ * or whether the dependencies sufficiently describe the data. Or at
+ * least not build the MCV list / histogram on the implied columns.
+ * Such reduction would however make the 'verification' (see the next
+ * section) impossible.
+ *
+ *
+ * Clause reduction (planner/optimizer)
+ * ------------------------------------
+ *
+ * Applying the dependencies is quite simple - given a list of clauses,
+ * try to apply all the dependencies. For example given clause list
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d < 100)
+ *
+ * and dependencies [a=>b] and [a=>d], this may be reduced to
+ *
+ * (a = 1) AND (c = 1) AND (d < 100)
+ *
+ * The (d<100) can't be reduced as it's not an equality clause, so the
+ * dependency [a=>d] can't be applied.
+ *
+ * See clauselist_apply_dependencies() for more details.
+ *
+ * The problem with the reduction is that the query may use conditions
+ * that are not redundant, but in fact contradictory - e.g. the user
+ * may search for a ZIP code and a city name not matching that ZIP code.
+ *
+ * In such cases the condition on the city name is not redundant but
+ * contradictory (making the result empty), and removing it while
+ * estimating the cardinality will make the estimate worse.
+ *
+ * The current estimation assuming independence (and multiplying the
+ * selectivities) works better in this case, but only by utter luck.
+ *
+ * In some cases this might be verified using the other multivariate
+ * statistics - MCV lists and histograms. For MCV lists the verification
+ * might be very simple - peek into the list if there are any items
+ * matching the clause on the 'A' column (e.g. ZIP code), and if such
+ * item is found, check that the 'B' column matches the other clause.
+ * If it does not, the clauses are contradictory. We can't really say
+ * much if no such item is found, except maybe restricting the
+ * selectivity using the MCV data (e.g. using min/max selectivity).
+ *
+ * With histograms, it might work similarly - we can't check the values
+ * directly (because histograms use buckets, unlike MCV lists, which
+ * store the actual values). So we can only observe the buckets matching
+ * the clauses - if those buckets have very low frequency, it probably
+ * means the two clauses are incompatible.
+ *
+ * It's unclear what 'low frequency' is, but if one of the clauses is
+ * implied (automatically true because of the other clause), then
+ *
+ * selectivity[clause(A)] = selectivity[clause(A) & clause(B)]
+ *
+ * So we might compute selectivity of the first clause (on the column
+ * A in dependency [A=>B]) - for example using regular statistics.
+ * And then check if the selectivity computed from the histogram is
+ * about the same (or significantly lower).
+ *
+ * The problem is that histograms work well only when the data ordering
+ * matches the natural meaning. For values that serve as labels - like
+ * city names or ZIP codes, or even generated IDs, histograms really
+ * don't work all that well. For example sorting cities by name won't
+ * match the sorting of ZIP codes, rendering the histogram unusable.
+ *
+ * The MCV are probably going to work much better, because they don't
+ * really assume any sort of ordering. And it's probably more appropriate
+ * for the label-like data.
+ *
+ * TODO Support dependencies with multiple columns on left/right.
+ *
+ * TODO Investigate using histogram and MCV list to confirm the
+ * functional dependencies.
+ *
+ * TODO Investigate statistical testing of the distribution (to decide
+ * whether it makes sense to build the histogram/MCV list).
+ *
+ * TODO Using a min/max of selectivities would probably make more sense
+ * for the associated columns.
+ *
+ * TODO Consider eliminating the implied columns from the histogram and
+ * MCV lists (but maybe that's not a good idea, because that'd make
+ * it impossible to use these stats for non-equality clauses and
+ * also it wouldn't be possible to use the stats for verification
+ * of the dependencies as proposed in another TODO).
+ *
+ * TODO This builds a complete set of dependencies, i.e. including
+ * transitive dependencies - if we identify [A => B] and [B => C],
+ * we're likely to identify [A => C] too. It might be better to
+ * keep only the minimal set of dependencies, i.e. prune all the
+ * dependencies that we can recreate by transitivity.
+ *
+ * There are two conceptual ways to do that:
+ *
+ * (a) generate all the rules, and then prune the rules that may
+ * be recreated by combining other dependencies, or
+ *
+ * (b) performing the 'is combination of other dependencies' check
+ * before actually doing the work
+ *
+ * The second option has the advantage that we don't really need
+ * to perform the sort/count. It's not sufficient alone, though,
+ * because we may discover the dependencies in the wrong order.
+ * For example [A => B], [A => C] and then [B => C]. None of those
+ * dependencies is a combination of the already known ones, yet
+ * [A => C] is a combination of [A => B] and [B => C].
+ *
+ * FIXME Not sure the current NULL handling makes much sense. We assume
+ * that NULL is 0, so it's handled like a regular value
+ * (NULL == NULL), so all NULLs in a single column form a single
+ * group. Maybe that's not the right thing to do, especially with
+ * equality conditions - in that case NULLs are irrelevant. So
+ * maybe the right solution would be to just ignore NULL values?
+ *
+ * However simply "ignoring" the NULL values does not seem like
+ * a good idea - imagine columns A and B, where for each value of
+ * A, values in B are constant (same for the whole group) or NULL.
+ * Let's say only 10% of B values in each group are not NULL. Then
+ * ignoring the NULL values will result in 10x misestimate (and
+ * it's trivial to construct arbitrary errors). So maybe handling
+ * NULL values just like a regular value is the right thing here.
+ *
+ * Or maybe NULL values should be treated differently on each side
+ * of the dependency? E.g. as ignored on the left (condition) and
+ * as regular values on the right - this seems consistent with how
+ * equality clauses work, as equality clause means 'NOT NULL'.
+ * So if we say [A => B] then it may also imply "NOT NULL" on the
+ * right side.
+ */
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ /* result */
+ int ndeps = 0;
+ MVDependencies dependencies = NULL;
+ MultiSortSupport mss = multi_sort_init(2); /* 2 dimensions for now */
+
+ /* TODO Maybe this should be somehow related to the number of
+ * distinct values in the two columns we're currently analyzing.
+ * Assuming the distribution is uniform, we can estimate the
+ * average group size and use it as a threshold. Or something
+ * like that. Seems better than a static approach.
+ */
+ int min_group_size = 3;
+
+ /* dimension indexes we'll check for associations [a => b] */
+ int dima, dimb;
+
+ /*
+ * We'll reuse the same array for all the 2-column combinations.
+ *
+ * It's possible to sort the sample rows directly, but this seemed
+ * somewhat simpler / less error prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * 2);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * 2);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * 2];
+ items[i].isnull = &isnull[i * 2];
+ }
+
+ Assert(numattrs >= 2);
+
+ /*
+ * Evaluate all possible combinations of [A => B], using a simple algorithm:
+ *
+ * (a) sort the data by [A,B]
+ * (b) split the data into groups by A (new group whenever a value changes)
+ * (c) count different values in the B column (again, value changes)
+ *
+ * TODO It should be rather simple to merge [A => B] and [A => C] into
+ * [A => B,C]. Just keep A constant, collect all the "implied" columns
+ * and you're done.
+ */
+ for (dima = 0; dima < numattrs; dima++)
+ {
+ /* prepare the sort function for the first dimension */
+ multi_sort_add_dimension(mss, 0, dima, stats);
+
+ for (dimb = 0; dimb < numattrs; dimb++)
+ {
+ SortItem current;
+
+ /* number of groups supporting / contradicting the dependency */
+ int n_supporting = 0;
+ int n_contradicting = 0;
+
+ /* counters valid within a group */
+ int group_size = 0;
+ int n_violations = 0;
+
+ int n_supporting_rows = 0;
+ int n_contradicting_rows = 0;
+
+ /* make sure the columns are different (skip A => A) */
+ if (dima == dimb)
+ continue;
+
+ /* prepare the sort function for the second dimension */
+ multi_sort_add_dimension(mss, 1, dimb, stats);
+
+ /* reset the values and isnull flags */
+ memset(values, 0, sizeof(Datum) * numrows * 2);
+ memset(isnull, 0, sizeof(bool) * numrows * 2);
+
+ /* accumulate all the data for both columns into an array and sort it */
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values[0]
+ = heap_getattr(rows[i], attrs->values[dima],
+ stats[dima]->tupDesc, &items[i].isnull[0]);
+
+ items[i].values[1]
+ = heap_getattr(rows[i], attrs->values[dimb],
+ stats[dimb]->tupDesc, &items[i].isnull[1]);
+ }
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Walk through the array, split it into groups according to
+ * the A value, and count distinct values in the other one.
+ * If there's a single B value for the whole group, we count
+ * it as supporting the association, otherwise we count it
+ * as contradicting.
+ *
+ * Furthermore we require a group to have at least a certain
+ * number of rows to be considered useful for supporting the
+ * dependency; a contradicting group however always counts.
+ */
+
+ /* start with values from the first row */
+ current = items[0];
+ group_size = 1;
+
+ for (i = 1; i < numrows; i++)
+ {
+ /* end of the group */
+ if (multi_sort_compare_dim(0, &items[i], &current, mss) != 0)
+ {
+ /*
+ * If there are no contradicting rows, count it as
+ * supporting (otherwise contradicting), but only if
+ * the group is large enough.
+ *
+ * The requirement of a minimum group size makes it
+ * impossible to identify [unique,unique] cases, but
+ * that's probably a different case. This is more
+ * about [zip => city] associations etc.
+ *
+ * If there are violations, count the group/rows as
+ * a violation.
+ *
+ * It may be neither, if the group is too small (does
+ * not contain at least min_group_size rows).
+ */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations > 0)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /* current values start a new group */
+ n_violations = 0;
+ group_size = 0;
+ }
+ /* mismatch of a B value is contradicting */
+ else if (multi_sort_compare_dim(1, &items[i], &current, mss) != 0)
+ {
+ n_violations += 1;
+ }
+
+ current = items[i];
+ group_size += 1;
+ }
+
+ /* handle the last group (just like above) */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /*
+ * See if the number of rows supporting the association is at least
+ * 10x the number of rows violating the hypothetical dependency.
+ *
+ * TODO This is a rather arbitrary limit - I guess it's possible to do
+ * some math to come up with a better rule (e.g. testing a hypothesis
+ * 'this is due to randomness'). We can create a contingency table
+ * from the values and use it for testing. Possibly only when
+ * there are no contradicting rows?
+ *
+ * TODO Also, if (a => b) and (b => a) at the same time, it pretty much
+ * means there's a 1:1 relation (or one is a 'label'), making the
+ * conditions rather redundant. Although it's possible that the
+ * query uses incompatible combination of values.
+ */
+ if (n_supporting_rows > (n_contradicting_rows * 10))
+ {
+ if (dependencies == NULL)
+ {
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+ dependencies->magic = MVSTAT_DEPS_MAGIC;
+ }
+ else
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData, deps)
+ + sizeof(MVDependency) * (dependencies->ndeps + 1));
+
+ /* add the new dependency to the array */
+ dependencies->deps[ndeps] = (MVDependency)palloc0(sizeof(MVDependencyData));
+ dependencies->deps[ndeps]->a = attrs->values[dima];
+ dependencies->deps[ndeps]->b = attrs->values[dimb];
+
+ dependencies->ndeps = (++ndeps);
+ }
+ }
+ }
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+ pfree(stats);
+ pfree(mss);
+
+ return dependencies;
+}
+
+/*
+ * Store the dependencies into a bytea, so that it can be stored in the
+ * pg_mv_statistic catalog.
+ *
+ * Currently this only supports simple two-column rules, and stores them
+ * as a sequence of attnum pairs. In the future, this needs to be made
+ * more complex to support multiple columns on both sides of the
+ * implication (using AND on left, OR on right).
+ */
+bytea *
+serialize_mv_dependencies(MVDependencies dependencies)
+{
+ int i;
+
+ /* we need to store ndeps, and each needs 2 * int16 */
+ Size len = VARHDRSZ + offsetof(MVDependenciesData, deps)
+ + dependencies->ndeps * (sizeof(int16) * 2);
+
+ bytea * output = (bytea*)palloc0(len);
+
+ char * tmp = VARDATA(output);
+
+ SET_VARSIZE(output, len);
+
+ /* first, store the number of dimensions / items */
+ memcpy(tmp, dependencies, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ /* walk through the dependencies and copy both columns into the bytea */
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ memcpy(tmp, &(dependencies->deps[i]->a), sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(tmp, &(dependencies->deps[i]->b), sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return output;
+}
+
+/*
+ * Reads serialized dependencies into MVDependencies structure.
+ */
+MVDependencies
+deserialize_mv_dependencies(bytea * data)
+{
+ int i;
+ Size expected_size;
+ MVDependencies dependencies;
+ char *tmp;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVDependenciesData,deps))
+ elog(ERROR, "invalid MVDependencies size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVDependenciesData,deps));
+
+ /* read the MVDependencies header */
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(dependencies, tmp, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ if (dependencies->magic != MVSTAT_DEPS_MAGIC)
+ {
+ pfree(dependencies);
+ elog(WARNING, "not a MV Dependencies (magic number mismatch)");
+ return NULL;
+ }
+
+ Assert(dependencies->ndeps > 0);
+
+ /* what bytea size do we expect for those parameters */
+ expected_size = offsetof(MVDependenciesData,deps) +
+ dependencies->ndeps * sizeof(int16) * 2;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid dependencies size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* allocate space for the dependency items */
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData,deps)
+ + (dependencies->ndeps * sizeof(MVDependency)));
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ dependencies->deps[i] = (MVDependency)palloc0(sizeof(MVDependencyData));
+
+ memcpy(&(dependencies->deps[i]->a), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(&(dependencies->deps[i]->b), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return dependencies;
+}
+
+/* print some basic info about dependencies (number of dependencies) */
+Datum
+pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ result = palloc0(128);
+ snprintf(result, 128, "dependencies=%d", dependencies->ndeps);
+
+ /* FIXME free the deserialized data (pfree is not enough) */
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* print the dependencies
+ *
+ * TODO Would be nice if this knew the actual column names (instead of
+ * the attnums).
+ *
+ * FIXME This is really ugly and does not really check the lengths and
+ * strcpy/snprintf return values properly. Needs to be fixed.
+ */
+Datum
+pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
+{
+ int i = 0;
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result = NULL;
+ int len = 0;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ MVDependency dependency = dependencies->deps[i];
+ char buffer[128];
+
+ int tmp = snprintf(buffer, 128, "%s%d => %d",
+ ((i == 0) ? "" : ", "), dependency->a, dependency->b);
+
+ if (tmp < 127)
+ {
+ if (result == NULL)
+ result = palloc0(len + tmp + 1);
+ else
+ result = repalloc(result, len + tmp + 1);
+
+ strcpy(result + len, buffer);
+ len += tmp;
+ }
+ }
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index db56809..912b4f3 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2096,6 +2096,46 @@ describeOneTableDetails(const char *schemaname,
PQclear(result);
}
+ /* print any multivariate statistics */
+ if (pset.sversion >= 90500)
+ {
+ printfPQExpBuffer(&buf,
+ "SELECT oid, stakeys,\n"
+ " deps_enabled,\n"
+ " deps_built,\n"
+ " mcv_max_items, hist_max_buckets,\n"
+ " (SELECT string_agg(attname::text,', ')\n"
+ " FROM ((SELECT unnest(stakeys) AS attnum) s\n"
+ " JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
+ "FROM pg_mv_statistic stat WHERE starelid = '%s' ORDER BY 1;",
+ oid);
+
+ result = PSQLexec(buf.data);
+ if (!result)
+ goto error_return;
+ else
+ tuples = PQntuples(result);
+
+ if (tuples > 0)
+ {
+ printTableAddFooter(&cont, _("Statistics:"));
+ for (i = 0; i < tuples; i++)
+ {
+ printfPQExpBuffer(&buf, " ");
+
+ /* options */
+ if (!strcmp(PQgetvalue(result, i, 2), "t"))
+ appendPQExpBuffer(&buf, "(dependencies)");
+
+ appendPQExpBuffer(&buf, " ON (%s)",
+ PQgetvalue(result, i, 4));
+
+ printTableAddFooter(&cont, buf.data);
+ }
+ }
+ PQclear(result);
+ }
+
/* print rules */
if (tableinfo.hasrules && tableinfo.relkind != 'm')
{
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index e6ac394..36debeb 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -119,6 +119,7 @@ extern void RemoveAttrDefault(Oid relid, AttrNumber attnum,
DropBehavior behavior, bool complain, bool internal);
extern void RemoveAttrDefaultById(Oid attrdefId);
extern void RemoveStatistics(Oid relid, AttrNumber attnum);
+extern void RemoveMVStatistics(Oid relid, AttrNumber attnum);
extern Form_pg_attribute SystemAttributeDefinition(AttrNumber attno,
bool relhasoids);
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 748aadd..03ada1b 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -173,6 +173,11 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
#define LargeObjectMetadataOidIndexId 2996
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_oid_index, 3380, on pg_mv_statistic using btree(oid oid_ops));
+#define MvStatisticOidIndexId 3380
+DECLARE_INDEX(pg_mv_statistic_relid_index, 3379, on pg_mv_statistic using btree(starelid oid_ops));
+#define MvStatisticRelidIndexId 3379
+
DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
#define NamespaceNameIndexId 2684
DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
new file mode 100644
index 0000000..81ec23b
--- /dev/null
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -0,0 +1,69 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_mv_statistic.h
+ * definition of the system "multivariate statistic" relation (pg_mv_statistic)
+ * along with the relation's initial contents.
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_mv_statistic.h
+ *
+ * NOTES
+ * the genbki.pl script reads this file and generates .bki
+ * information from the DATA() statements.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_MV_STATISTIC_H
+#define PG_MV_STATISTIC_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_mv_statistic definition. cpp turns this into
+ * typedef struct FormData_pg_mv_statistic
+ * ----------------
+ */
+#define MvStatisticRelationId 3381
+
+CATALOG(pg_mv_statistic,3381)
+{
+ /* These fields form the unique key for the entry: */
+ Oid starelid; /* relation containing attributes */
+
+ /* statistics requested to build */
+ bool deps_enabled; /* analyze dependencies? */
+
+ /* statistics that are available (if requested) */
+ bool deps_built; /* dependencies were built */
+
+ /* variable-length fields start here, but we allow direct access to stakeys */
+ int2vector stakeys; /* array of column keys */
+
+#ifdef CATALOG_VARLEN
+ bytea stadeps; /* dependencies (serialized) */
+#endif
+
+} FormData_pg_mv_statistic;
+
+/* ----------------
+ * Form_pg_mv_statistic corresponds to a pointer to a tuple with
+ * the format of pg_mv_statistic relation.
+ * ----------------
+ */
+typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
+
+/* ----------------
+ * compiler constants for pg_mv_statistic
+ * ----------------
+ */
+#define Natts_pg_mv_statistic 5
+#define Anum_pg_mv_statistic_starelid 1
+#define Anum_pg_mv_statistic_deps_enabled 2
+#define Anum_pg_mv_statistic_deps_built 3
+#define Anum_pg_mv_statistic_stakeys 4
+#define Anum_pg_mv_statistic_stadeps 5
+
+#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index c0aab38..69fc482 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2735,6 +2735,11 @@ DESCR("current user privilege on any column by rel name");
DATA(insert OID = 3029 ( has_any_column_privilege PGNSP PGUID 12 10 0 0 0 f f f f t f s 2 0 16 "26 25" _null_ _null_ _null_ _null_ _null_ has_any_column_privilege_id _null_ _null_ _null_ ));
DESCR("current user privilege on any column by rel oid");
+DATA(insert OID = 3307 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies info");
+DATA(insert OID = 3308 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies show");
+
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
DATA(insert OID = 1929 ( pg_stat_get_tuples_returned PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_tuples_returned _null_ _null_ _null_ ));
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index fb2f035..55f6079 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -49,6 +49,7 @@ extern void BootstrapToastTable(char *relName,
DECLARE_TOAST(pg_attrdef, 2830, 2831);
DECLARE_TOAST(pg_constraint, 2832, 2833);
DECLARE_TOAST(pg_description, 2834, 2835);
+DECLARE_TOAST(pg_mv_statistic, 3309, 3310);
DECLARE_TOAST(pg_proc, 2836, 2837);
DECLARE_TOAST(pg_rewrite, 2838, 2839);
DECLARE_TOAST(pg_seclabel, 3598, 3599);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 290cdb3..9254f85 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -249,6 +249,7 @@ typedef enum NodeTag
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
+ T_MVStatisticInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
@@ -426,6 +427,7 @@ typedef enum NodeTag
T_RoleSpec,
T_RangeTableSample,
T_TableSampleClause,
+ T_StatisticsDef,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 868905b..d81537c 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -610,6 +610,14 @@ typedef struct ColumnDef
int location; /* parse location, or -1 if none/unknown */
} ColumnDef;
+typedef struct StatisticsDef
+{
+ NodeTag type;
+ List *keys; /* String nodes naming referenced column(s) */
+ List *options; /* list of DefElem nodes */
+} StatisticsDef;
+
+
/*
* TableLikeClause - CREATE TABLE ( ... LIKE ... ) clause
*/
@@ -1515,7 +1523,9 @@ typedef enum AlterTableType
AT_ReplicaIdentity, /* REPLICA IDENTITY */
AT_EnableRowSecurity, /* ENABLE ROW SECURITY */
AT_DisableRowSecurity, /* DISABLE ROW SECURITY */
- AT_GenericOptions /* OPTIONS (...) */
+ AT_GenericOptions, /* OPTIONS (...) */
+ AT_AddStatistics, /* ADD STATISTICS */
+ AT_DropStatistics /* DROP STATISTICS */
} AlterTableType;
typedef struct ReplicaIdentityStmt
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 279051e..10f7425 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -459,6 +459,7 @@ typedef struct RelOptInfo
Relids lateral_relids; /* minimum parameterization of rel */
Relids lateral_referencers; /* rels that reference me laterally */
List *indexlist; /* list of IndexOptInfo */
+ List *mvstatlist; /* list of MVStatisticInfo */
BlockNumber pages; /* size estimates derived from pg_class */
double tuples;
double allvisfrac;
@@ -553,6 +554,33 @@ typedef struct IndexOptInfo
bool amhasgetbitmap; /* does AM have amgetbitmap interface? */
} IndexOptInfo;
+/*
+ * MVStatisticInfo
+ * Information about multivariate stats for planning/optimization
+ *
+ * This contains information about which columns are covered by the
+ * statistics (stakeys), which options were requested while adding the
+ * statistics (*_enabled), and which kinds of statistics were actually
+ * built and are available for the optimizer (*_built).
+ */
+typedef struct MVStatisticInfo
+{
+ NodeTag type;
+
+ Oid mvoid; /* OID of the statistics row */
+ RelOptInfo *rel; /* back-link to index's table */
+
+ /* enabled statistics */
+ bool deps_enabled; /* functional dependencies enabled */
+
+ /* built/available statistics */
+ bool deps_built; /* functional dependencies built */
+
+ /* columns in the statistics (attnums) */
+ int2vector *stakeys; /* attnums of the columns covered */
+
+} MVStatisticInfo;
+
/*
* EquivalenceClasses
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 2414069..f69480b 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -360,7 +360,7 @@ PG_KEYWORD("stable", STABLE, UNRESERVED_KEYWORD)
PG_KEYWORD("standalone", STANDALONE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("start", START, UNRESERVED_KEYWORD)
PG_KEYWORD("statement", STATEMENT, UNRESERVED_KEYWORD)
-PG_KEYWORD("statistics", STATISTICS, UNRESERVED_KEYWORD)
+PG_KEYWORD("statistics", STATISTICS, RESERVED_KEYWORD)
PG_KEYWORD("stdin", STDIN, UNRESERVED_KEYWORD)
PG_KEYWORD("stdout", STDOUT, UNRESERVED_KEYWORD)
PG_KEYWORD("storage", STORAGE, UNRESERVED_KEYWORD)
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
new file mode 100644
index 0000000..411cd16
--- /dev/null
+++ b/src/include/utils/mvstats.h
@@ -0,0 +1,69 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvstats.h
+ * Multivariate statistics and selectivity estimation functions.
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/mvstats.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MVSTATS_H
+#define MVSTATS_H
+
+#include "commands/vacuum.h"
+
+
+#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
+
+/* An association rule, tracking an [a => b] dependency.
+ *
+ * TODO Make this work with multiple columns on both sides.
+ */
+typedef struct MVDependencyData {
+ int16 a;
+ int16 b;
+} MVDependencyData;
+
+typedef MVDependencyData* MVDependency;
+
+typedef struct MVDependenciesData {
+ uint32 magic; /* magic constant marker */
+ int32 ndeps; /* number of dependencies */
+ MVDependency deps[1]; /* XXX why not a pointer? */
+} MVDependenciesData;
+
+typedef MVDependenciesData* MVDependencies;
+
+#define MVSTAT_DEPS_MAGIC 0xB4549A2C /* marks serialized bytea */
+#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
+
+/*
+ * TODO Maybe fetching the histogram/MCV list separately is inefficient?
+ * Consider adding a single `fetch_stats` method, fetching all
+ * stats specified using flags (or something like that).
+ */
+
+bytea * serialize_mv_dependencies(MVDependencies dependencies);
+
+/* deserialization of stats (serialization is private to analyze) */
+MVDependencies deserialize_mv_dependencies(bytea * data);
+
+/* FIXME this probably belongs somewhere else (not to operations stats) */
+extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats);
+
+void update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs);
+
+#endif
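
To make the serialized format easier to see: the bytea produced by
serialize_mv_dependencies() is simply the leading part of
MVDependenciesData (up to offsetof(MVDependenciesData, deps)) followed
by ndeps pairs of int16 attnums. A standalone sketch of the same layout
(leaving out the varlena header and the palloc machinery; DepsHeader is
a made-up stand-in for the struct header, not part of the patch):

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define DEPS_MAGIC 0xB4549A2C

    typedef struct { uint32_t magic; int32_t ndeps; } DepsHeader;

    int
    main(void)
    {
        int16_t    deps[][2] = {{1, 2}, {1, 3}};   /* (a => b) attnum pairs */
        DepsHeader hdr = {DEPS_MAGIC, 2};
        char       buf[sizeof(DepsHeader) + sizeof(deps)];
        char      *p = buf;
        DepsHeader out;
        int16_t    pair[2];
        int        i;

        /* serialize: header first, then the attnum pairs */
        memcpy(p, &hdr, sizeof(hdr));
        p += sizeof(hdr);
        memcpy(p, deps, sizeof(deps));

        /* deserialize: read the header back, sanity-check the magic */
        p = buf;
        memcpy(&out, p, sizeof(out));
        p += sizeof(out);
        assert(out.magic == DEPS_MAGIC);

        for (i = 0; i < out.ndeps; i++)
        {
            memcpy(pair, p, sizeof(pair));
            p += sizeof(pair);
            printf("%d => %d\n", pair[0], pair[1]);
        }

        return 0;
    }
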
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 8a55a09..4d6edb6 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -79,6 +79,7 @@ typedef struct RelationData
bool rd_isvalid; /* relcache entry is valid */
char rd_indexvalid; /* state of rd_indexlist: 0 = not valid, 1 =
* valid, 2 = temporarily forced */
+ bool rd_mvstatvalid; /* is rd_mvstatlist valid? */
/*
* rd_createSubid is the ID of the highest subtransaction the rel has
@@ -111,6 +112,9 @@ typedef struct RelationData
List *rd_indexlist; /* list of OIDs of indexes on relation */
Oid rd_oidindex; /* OID of unique index on OID, if any */
Oid rd_replidindex; /* OID of replica identity index, if any */
+
+ /* data managed by RelationGetMVStatList: */
+ List *rd_mvstatlist; /* list of OIDs of multivariate stats */
/* data managed by RelationGetIndexAttrBitmap: */
Bitmapset *rd_indexattr; /* identifies columns used in indexes */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 6953281..77efeff 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -38,6 +38,7 @@ extern void RelationClose(Relation relation);
* Routines to compute/retrieve additional cached information
*/
extern List *RelationGetIndexList(Relation relation);
+extern List *RelationGetMVStatList(Relation relation);
extern Oid RelationGetOidIndex(Relation relation);
extern Oid RelationGetReplicaIndex(Relation relation);
extern List *RelationGetIndexExpressions(Relation relation);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 2dbd384..814269b 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -66,6 +66,7 @@ enum SysCacheIdentifier
INDEXRELID,
LANGNAME,
LANGOID,
+ MVSTATOID,
NAMESPACENAME,
NAMESPACEOID,
OPERNAMENSP,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 60c1f40..a12ad30 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1363,6 +1363,14 @@ pg_matviews| SELECT n.nspname AS schemaname,
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
WHERE (c.relkind = 'm'::"char");
+pg_mv_stats| SELECT n.nspname AS schemaname,
+ c.relname AS tablename,
+ s.stakeys AS attnums,
+ length(s.stadeps) AS depsbytes,
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ FROM ((pg_mv_statistic s
+ JOIN pg_class c ON ((c.oid = s.starelid)))
+ LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
pg_policies| SELECT n.nspname AS schemaname,
c.relname AS tablename,
pol.polname AS policyname,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index 14acd16..d740241 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -113,6 +113,7 @@ pg_inherits|t
pg_language|t
pg_largeobject|t
pg_largeobject_metadata|t
+pg_mv_statistic|t
pg_namespace|t
pg_opclass|t
pg_operator|t
--
1.9.3
Attachment: 0003-clause-reduction-using-functional-dependencies-v7.patch (text/x-patch)
>From 1bc8e278cf96a33bdb5716023ae9929e4c625893 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 19:42:18 +0200
Subject: [PATCH 3/6] clause reduction using functional dependencies
During planning, use functional dependencies to decide which
clauses to skip during cardinality estimation. Initial and
rather simplistic implementation.
This only works with regular WHERE clauses, not with clauses
used as join clauses.
Note: The clause_is_mv_compatible() needs to identify the
relation (so that we can fetch the list of multivariate stats
by OID). planner_rt_fetch() seems like the appropriate way to
get the relation OID, but apparently it only works with simple
vars. Maybe examine_variable() would make this work with more
complex vars too?
Includes regression tests analyzing functional dependencies
(part of ANALYZE) on several datasets (no dependencies, no
transitive dependencies, ...).
Checks that a query with conditions on two columns, where one (B)
is functionally dependent on the other one (A), correctly ignores
the clause on (B) and chooses bitmap index scan instead of plain
index scan (which is what happens otherwise, thanks to assumption
of independence).
Note: Functional dependencies only work with equality clauses,
no inequalities etc.
---
src/backend/commands/tablecmds.c | 6 +
src/backend/nodes/copyfuncs.c | 1 +
src/backend/optimizer/path/clausesel.c | 911 +++++++++++++++++++++++++-
src/backend/utils/mvstats/common.c | 5 +-
src/backend/utils/mvstats/dependencies.c | 24 +
src/bin/psql/describe.c | 1 -
src/include/utils/mvstats.h | 16 +-
src/test/regress/expected/mv_dependencies.out | 172 +++++
src/test/regress/parallel_schedule | 3 +
src/test/regress/regression.diffs | 30 +
src/test/regress/regression.out | 156 +++++
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_dependencies.sql | 150 +++++
13 files changed, 1470 insertions(+), 6 deletions(-)
create mode 100644 src/test/regress/expected/mv_dependencies.out
create mode 100644 src/test/regress/regression.diffs
create mode 100644 src/test/regress/regression.out
create mode 100644 src/test/regress/sql/mv_dependencies.sql
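
Judging from the new function names in clausesel.c below
(build_adjacency_matrix, multiply_adjacency_matrix,
fdeps_reduce_clauses), the reduction treats the dependencies as edges
in a directed graph over the attnums and computes a transitive closure,
so that knowing [a => b] and [b => c] also makes [a => c] usable. A
standalone sketch of that closure step (Warshall-style on a small
boolean matrix; the real code additionally maps attnums to matrix
indexes first):

    #include <stdbool.h>
    #include <stdio.h>

    #define NATTS 3

    int
    main(void)
    {
        /* edge[i][j] means "column i determines column j" */
        bool edge[NATTS][NATTS] = {{false}};
        int  i, j, k;

        edge[0][1] = true;      /* a => b */
        edge[1][2] = true;      /* b => c */

        /* transitive closure: if i => k and k => j, then i => j */
        for (k = 0; k < NATTS; k++)
            for (i = 0; i < NATTS; i++)
                for (j = 0; j < NATTS; j++)
                    if (edge[i][k] && edge[k][j])
                        edge[i][j] = true;

        for (i = 0; i < NATTS; i++)
            for (j = 0; j < NATTS; j++)
                if (edge[i][j])
                    printf("col %d => col %d\n", i, j);

        return 0;
    }
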
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d6c6f8e..107e9fc 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -11980,6 +11980,12 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
opt->defname)));
}
+ /* check that at least some statistics were requested */
+ if (! build_dependencies)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies) was requested")));
+
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
stakeys = buildint2vector(attnums, numcols);
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e5a3d96..36094c0 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4949,6 +4949,7 @@ copyObject(const void *from)
break;
case T_TableSampleClause:
retval = _copyTableSampleClause(from);
+ break;
case T_StatisticsDef:
retval = _copyStatisticsDef(from);
break;
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index dcac1c1..6365425 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -14,15 +14,19 @@
*/
#include "postgres.h"
+#include "access/sysattr.h"
#include "catalog/pg_operator.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/plancat.h"
+#include "optimizer/var.h"
#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
+#include "utils/mvstats.h"
#include "utils/selfuncs.h"
+#include "utils/typcache.h"
/*
@@ -42,6 +46,44 @@ typedef struct RangeQueryClause
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
+#define MV_CLAUSE_TYPE_FDEP 0x01
+
+static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
+ Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo);
+
+static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
+ Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo);
+
+static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Oid varRelid, List *stats,
+ SpecialJoinInfo *sjinfo);
+
+static bool has_stats(List *stats, int type);
+
+static List * find_stats(PlannerInfo *root, List *clauses,
+ Oid varRelid, Index *relid);
+
+static Bitmapset* fdeps_collect_attnums(List *stats);
+
+static int *make_idx_to_attnum_mapping(Bitmapset *attnums);
+static int *make_attnum_to_idx_mapping(Bitmapset *attnums);
+
+static bool *build_adjacency_matrix(List *stats, Bitmapset *attnums,
+ int *idx_to_attnum, int *attnum_to_idx);
+
+static void multiply_adjacency_matrix(bool *matrix, int natts);
+
+static List* fdeps_reduce_clauses(List *clauses,
+ Bitmapset *attnums, bool *matrix,
+ int *idx_to_attnum, int *attnum_to_idx,
+ Index relid);
+
+static Bitmapset *fdeps_filter_clauses(PlannerInfo *root,
+ List *clauses, Bitmapset *deps_attnums,
+ List **reduced_clauses, List **deps_clauses,
+ Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo);
+
+static Bitmapset * get_varattnos(Node * node, Index relid);
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
@@ -61,7 +103,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
* subclauses. However, that's only right if the subclauses have independent
* probabilities, and in reality they are often NOT independent. So,
* we want to be smarter where we can.
-
+ *
* Currently, the only extra smarts we have is to recognize "range queries",
* such as "x > 34 AND x < 42". Clauses are recognized as possible range
* query components if they are restriction opclauses whose operators have
@@ -88,6 +130,88 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
*
* Of course this is all very dependent on the behavior of
* scalarltsel/scalargtsel; perhaps some day we can generalize the approach.
+ *
+ *
+ * Multivariate statistics
+ * -----------------------
+ * This also uses multivariate stats to estimate combinations of
+ * conditions, in a way (a) maximizing the estimate accuracy by using
+ * as many stats as possible, and (b) minimizing the overhead,
+ * especially when there are no suitable multivariate stats (so if you
+ * are not using multivariate stats, there's no additional overhead).
+ *
+ * The following checks are performed (in this order), and the optimizer
+ * falls back to regular stats on the first 'false'.
+ *
+ * NOTE: This explains how this works with all the patches applied, not
+ * just the functional dependencies.
+ *
+ * (0) check if there are multivariate stats on the relation
+ *
+ * If no, just skip all the following steps (directly to the
+ * original code).
+ *
+ * (1) check how many attributes there are in conditions compatible
+ * with functional dependencies
+ *
+ * Only simple equality clauses are considered compatible with
+ * functional dependencies (and that's unlikely to change, because
+ * that's the only case when functional dependencies are useful).
+ *
+ * If there are no conditions that might be handled by multivariate
+ * stats, or if the conditions reference just a single column, it
+ * makes no sense to use functional dependencies, so skip to (4).
+ *
+ * (2) reduce the clauses using functional dependencies
+ *
+ * This simply attempts to 'reduce' the clauses by applying functional
+ * dependencies. For example if there are two clauses:
+ *
+ * WHERE (a = 1) AND (b = 2)
+ *
+ * and we know that 'a' determines the value of 'b', we may remove
+ * the second condition (b = 2) when computing the selectivity.
+ * This is of course tricky - see mvstats/dependencies.c for details.
+ *
+ * After the reduction, step (1) is to be repeated.
+ *
+ * (3) check which conditions are compatible with MCV lists and
+ * histograms
+ *
+ * What conditions are compatible with multivariate stats is decided
+ * by clause_is_mv_compatible(). At this moment, only conditions
+ * of the form "column operator constant" (for simple comparison
+ * operators), IS [NOT] NULL and some AND/OR clauses are considered
+ * compatible with multivariate statistics.
+ *
+ * Again, see clause_is_mv_compatible() for details.
+ *
+ * (4) check how many attributes there are in conditions compatible
+ * with MCV lists and histograms
+ *
+ * If there are no conditions that might be handled by MCV lists
+ * or histograms, or if the conditions reference just a single
+ * column, it makes no sense to continue, so just skip to (7).
+ *
+ * (5) choose the stats matching the most columns
+ *
+ * If there are multiple instances of multivariate statistics (e.g.
+ * built on different sets of columns), we choose the stats covering
+ * the most columns from step (1). It may happen that all available
+ * stats match just a single column - for example with conditions
+ *
+ * WHERE a = 1 AND b = 2
+ *
+ * and statistics built on (a,c) and (b,c). In such case just fall
+ * back to the regular stats because it makes no sense to use the
+ * multivariate statistics.
+ *
+ * For more details about how exactly we choose the stats, see
+ * choose_mv_statistics().
+ *
+ * (6) use the multivariate stats to estimate matching clauses
+ *
+ * (7) estimate the remaining clauses using the regular statistics
*/
Selectivity
clauselist_selectivity(PlannerInfo *root,
@@ -100,6 +224,16 @@ clauselist_selectivity(PlannerInfo *root,
RangeQueryClause *rqlist = NULL;
ListCell *l;
+ /* processing mv stats */
+ Oid relid = InvalidOid;
+
+ /* attributes in mv-compatible clauses */
+ Bitmapset *mvattnums = NULL;
+ List *stats = NIL;
+
+ /* use clauses (not conditions), because those are always non-empty */
+ stats = find_stats(root, clauses, varRelid, &relid);
+
/*
* If there's exactly one clause, then no use in trying to match up pairs,
* so just go directly to clause_selectivity().
@@ -109,6 +243,31 @@ clauselist_selectivity(PlannerInfo *root,
varRelid, jointype, sjinfo);
/*
+ * Check that there are some stats with functional dependencies
+ * built (by walking the stats list). We're going to find that
+ * anyway when trying to apply the functional dependencies, but
+ * this is probably a tad faster.
+ */
+ if (has_stats(stats, MV_CLAUSE_TYPE_FDEP))
+ {
+ /* collect attributes referenced by mv-compatible clauses */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo);
+
+ /*
+ * If there are mv-compatible clauses, referencing at least two
+ * different columns (otherwise it makes no sense to use mv stats),
+ * try to reduce the clauses using functional dependencies, and
+ * recollect the attributes from the reduced list.
+ *
+ * We don't need to select a single statistics for this - we can
+ * apply all the functional dependencies we have.
+ */
+ if (bms_num_members(mvattnums) >= 2)
+ clauses = clauselist_apply_dependencies(root, clauses, varRelid,
+ stats, sjinfo);
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -782,3 +941,753 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ */
+static Bitmapset *
+collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
+ Index *relid, SpecialJoinInfo *sjinfo)
+{
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(l);
+
+ /* ignore the result for now - we only need the info */
+ if (clause_is_mv_compatible(root, clause, varRelid, relid, &attnum, sjinfo))
+ attnums = bms_add_member(attnums, attnum);
+ }
+
+ /*
+ * If there are not at least two attributes referenced by the clause(s),
+ * we can throw everything out (as we'll revert to simple stats).
+ */
+ if (bms_num_members(attnums) <= 1)
+ {
+ if (attnums != NULL)
+ pfree(attnums);
+ attnums = NULL;
+ *relid = InvalidOid;
+ }
+
+ return attnums;
+}
+
+/*
+ * Determines whether the clause is compatible with multivariate stats,
+ * and if it is, returns some additional information - varno (index
+ * into simple_rte_array) and the attnum of the referenced column.
+ * This is then used to fetch related multivariate statistics.
+ *
+ * At this moment we only support basic conditions of the form
+ *
+ * variable OP constant
+ *
+ * where OP is '=' (determined by looking at the associated function
+ * for estimating selectivity, just like with the single-dimensional
+ * case; for functional dependencies, only equality is useful anyway).
+ *
+ * TODO Support 'OR clauses' - shouldn't be all that difficult to
+ * evaluate them using multivariate stats.
+ */
+static bool
+clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
+ Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo)
+{
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ /* Pseudoconstants are not really interesting here. */
+ if (rinfo->pseudoconstant)
+ return false;
+
+ /* no support for OR clauses at this point */
+ if (rinfo->orclause)
+ return false;
+
+ /* get the actual clause from the RestrictInfo (it's not an OR clause) */
+ clause = (Node*)rinfo->clause;
+
+ /* only simple opclauses are compatible with multivariate stats */
+ if (! is_opclause(clause))
+ return false;
+
+ /* we don't support join conditions at this moment */
+ if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo))
+ return false;
+
+ /* is it 'variable op constant' ? */
+ if (list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+
+ ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
+ (is_pseudo_constant_clause_relids(lsecond(expr->args),
+ rinfo->right_relids) ||
+ (varonleft = false,
+ is_pseudo_constant_clause_relids(linitial(expr->args),
+ rinfo->left_relids)));
+
+ if (ok)
+ {
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+ * (return NULL).
+ *
+ * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
+
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not be really necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
+
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
+
+ *relid = var->varno;
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ * This uses the function for estimating selectivity, not the
+ * operator directly (a bit awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_EQSEL:
+ *attnum = var->varattno;
+ return true;
+ }
+ }
+ }
+ }
+
+ return false;
+
+}
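+
+/*
+ * For illustration: given "WHERE a = 1 AND b < 2 AND (c = 3 OR d = 4)"
+ * only the (a = 1) clause is mv-compatible here - the OR clause is
+ * rejected right away, and the inequality fails the F_EQSEL check.
+ */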
+
+/*
+ * Performs reduction of clauses using functional dependencies, i.e.
+ * removes clauses that are considered redundant. It simply walks
+ * through dependencies, and checks whether the dependency 'matches'
+ * the clauses, i.e. if there's a clause matching the condition. If yes,
+ * all clauses matching the implied part of the dependency are removed
+ * from the list.
+ *
+ * This simply looks at attnums referenced by the clauses, not at the
+ * type of the operator (equality, inequality, ...). This may not be the
+ * right way to do it - it certainly works best for equalities, which is
+ * naturally consistent with functional dependencies (implications).
+ * It's not clear that other operators are handled sensibly - for
+ * example for inequalities, like
+ *
+ * WHERE (A >= 10) AND (B <= 20)
+ *
+ * and a trivial case where [A == B], resulting in a symmetric pair of
+ * rules [A => B], [B => A], it's rather clear we can't remove either of
+ * those clauses.
+ *
+ * That only highlights that functional dependencies are most suitable
+ * for label-like data, where using non-equality operators is very rare.
+ * Using the common city/zipcode example, clauses like
+ *
+ * (zipcode <= 12345)
+ *
+ * or
+ *
+ * (cityname >= 'Washington')
+ *
+ * are rare. So restricting the reduction to equality should not harm
+ * the usefulness / applicability.
+ *
+ * The other assumption is that the clauses are 'compatible'. With
+ * a mismatching zip code and city name, for example, this is unable
+ * to identify the discrepancy and still eliminates one of the clauses.
+ * The usual approach (multiplying both selectivities) thus produces a
+ * more accurate estimate, although mostly by luck - the multiplication
+ * comes from the assumption of statistical independence of the two
+ * conditions (which is not valid in this case), but moves the
+ * estimate in the right direction (towards 0%).
+ *
+ * This might be somewhat improved by cross-checking the selectivities
+ * against MCV and/or histogram.
+ *
+ * The implementation needs to be careful about cyclic rules, i.e. rules
+ * like [A => B] and [B => A] at the same time. This must not reduce
+ * clauses on both attributes at the same time.
+ *
+ * Technically we might consider selectivities here too, somehow. E.g.
+ * when (A => B) and (B => A), we might use the clauses with minimum
+ * selectivity.
+ *
+ * TODO Consider restricting the reduction to equality clauses. Or maybe
+ * use equality classes somehow?
+ *
+ * TODO Merge this docs to dependencies.c, as it's saying mostly the
+ * same things as the comments there.
+ *
+ * TODO Currently this is applied only to the top-level clauses, but
+ * maybe we could apply it to lists at subtrees too, e.g. to the
+ * two AND-clauses in
+ *
+ * (x=1 AND y=2) OR (z=3 AND q=10)
+ *
+ */
+static List *
+clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Oid varRelid, List *stats,
+ SpecialJoinInfo *sjinfo)
+{
+ List *reduced_clauses = NIL;
+ Index relid;
+
+ /*
+ * matrix of (natts x natts), 1 means x=>y
+ *
+ * This serves two purposes - first, it merges dependencies from all
+ * the statistics, second it makes generating all the transitive
+ * dependencies easier.
+ *
+ * We need to build this only for attributes from the dependencies,
+ * not for all attributes in the table.
+ *
+ * We can't do that only for attributes from the clauses, because we
+ * want to build transitive dependencies (including those going
+ * through attributes not listed in the stats).
+ *
+ * This only works for A=>B dependencies; it's not clear how to do
+ * that for more complex (multi-column) dependencies.
+ */
+ bool *deps_matrix;
+ int deps_natts; /* size of the matrix */
+
+ /* mapping attnum <=> matrix index */
+ int *deps_idx_to_attnum;
+ int *deps_attnum_to_idx;
+
+ /* attnums in dependencies and clauses (and intersection) */
+ List *deps_clauses = NIL;
+ Bitmapset *deps_attnums = NULL;
+ Bitmapset *clause_attnums = NULL;
+ Bitmapset *intersect_attnums = NULL;
+
+ /*
+ * Is there at least one statistics with functional dependencies?
+ * If not, return the original clauses right away.
+ *
+ * XXX Isn't this pointless, thanks to exactly the same check in
+ * clauselist_selectivity()? Can we trigger the condition here?
+ */
+ if (! has_stats(stats, MV_CLAUSE_TYPE_FDEP))
+ return clauses;
+
+ /*
+ * Build the dependency matrix, i.e. attribute adjacency matrix,
+ * where 1 means (a=>b). Once we have the adjacency matrix, we'll
+ * multiply it by itself, to get transitive dependencies.
+ *
+ * Note: This is pretty much transitive closure from graph theory.
+ *
+ * First, let's see what attributes are covered by functional
+ * dependencies (sides of the adjacency matrix), and also a maximum
+ * attribute (size of mapping to simple integer indexes);
+ */
+ deps_attnums = fdeps_collect_attnums(stats);
+
+ /*
+ * Walk through the clauses - clauses that are (one of)
+ *
+ * (a) not mv-compatible
+ * (b) using more than a single attnum
+ * (c) using an attnum not covered by functional dependencies
+ *
+ * may be copied directly to the result. The interesting clauses are
+ * kept in 'deps_clauses' and will be processed later.
+ */
+ clause_attnums = fdeps_filter_clauses(root, clauses, deps_attnums,
+ &reduced_clauses, &deps_clauses,
+ varRelid, &relid, sjinfo);
+
+ /*
+ * we need at least two clauses referencing two different attributes
+ * to do the reduction
+ */
+ if ((list_length(deps_clauses) < 2) || (bms_num_members(clause_attnums) < 2))
+ {
+ bms_free(clause_attnums);
+ list_free(reduced_clauses);
+ list_free(deps_clauses);
+
+ return clauses;
+ }
+
+
+ /*
+ * We need at least two matching attributes in the clauses and
+ * dependencies, otherwise we can't really reduce anything.
+ */
+ intersect_attnums = bms_intersect(clause_attnums, deps_attnums);
+ if (bms_num_members(intersect_attnums) < 2)
+ {
+ bms_free(clause_attnums);
+ bms_free(deps_attnums);
+ bms_free(intersect_attnums);
+
+ list_free(deps_clauses);
+ list_free(reduced_clauses);
+
+ return clauses;
+ }
+
+ /*
+ * Build mapping between matrix indexes and attnums, and then the
+ * adjacency matrix itself.
+ */
+ deps_idx_to_attnum = make_idx_to_attnum_mapping(deps_attnums);
+ deps_attnum_to_idx = make_attnum_to_idx_mapping(deps_attnums);
+
+ /* build the adjacency matrix */
+ deps_matrix = build_adjacency_matrix(stats, deps_attnums,
+ deps_idx_to_attnum,
+ deps_attnum_to_idx);
+
+ deps_natts = bms_num_members(deps_attnums);
+
+ /*
+ * Multiply the matrix N-times (N = size of the matrix), so that we
+ * get all the transitive dependencies. That makes the next step
+ * much easier and faster.
+ *
+ * This is essentially an adjacency matrix from graph theory, and
+ * by multiplying it we get transitive edges. We don't really care
+ * about the exact number (number of paths between vertices) though,
+ * so we can do the multiplication in-place (we don't care whether
+ * we found the dependency in this round or in the previous one).
+ *
+ * Track how many new dependencies were added, and stop when 0, but
+ * we can't multiply more than N-times (longest path in the graph).
+ */
+ multiply_adjacency_matrix(deps_matrix, deps_natts);
+
+ /*
+ * Walk through the clauses, and see which other clauses we may
+ * reduce. The matrix contains all transitive dependencies, which
+ * makes this very fast.
+ *
+ * We have to be careful not to reduce the clause using itself, or
+ * reducing all clauses forming a cycle (so we have to skip already
+ * eliminated clauses).
+ *
+ * I'm not sure whether this guarantees finding the best solution,
+ * i.e. reducing the most clauses, but it probably does (thanks to
+ * having all the transitive dependencies).
+ */
+ deps_clauses = fdeps_reduce_clauses(deps_clauses,
+ deps_attnums, deps_matrix,
+ deps_idx_to_attnum,
+ deps_attnum_to_idx, relid);
+
+ /* join the two lists of clauses */
+ reduced_clauses = list_union(reduced_clauses, deps_clauses);
+
+ pfree(deps_matrix);
+ pfree(deps_idx_to_attnum);
+ pfree(deps_attnum_to_idx);
+
+ bms_free(deps_attnums);
+ bms_free(clause_attnums);
+ bms_free(intersect_attnums);
+
+ return reduced_clauses;
+}
+
+static bool
+has_stats(List *stats, int type)
+{
+ ListCell *s;
+
+ foreach (s, stats)
+ {
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Determine the relid (either from varRelid or from the clauses) and
+ * then look up stats using it.
+ */
+static List *
+find_stats(PlannerInfo *root, List *clauses, Oid varRelid, Index *relid)
+{
+ /* unknown relid by default */
+ *relid = InvalidOid;
+
+ /*
+ * First we need to find the relid (index into simple_rel_array).
+ * If varRelid is not 0, we already have it, otherwise we have to
+ * look it up from the clauses.
+ */
+ if (varRelid != 0)
+ *relid = varRelid;
+ else
+ {
+ Relids relids = pull_varnos((Node*)clauses);
+
+ /*
+ * We only expect 0 or 1 members in the bitmapset. If there are
+ * no vars, we'll get empty bitmapset, otherwise we'll get the
+ * relid as the single member.
+ *
+ * FIXME For some reason we can get 2 relids here (e.g. \d in
+ * psql does that).
+ */
+ if (bms_num_members(relids) == 1)
+ *relid = bms_singleton_member(relids);
+
+ bms_free(relids);
+ }
+
+ /*
+ * if we found the relid, we can get the stats from simple_rel_array
+ *
+ * This only gets stats that are already built, because that's how
+ * we load it into RelOptInfo (see get_relation_info), but we don't
+ * detoast the whole stats yet. That'll be done later, after we
+ * decide which stats to use.
+ */
+ if (*relid != InvalidOid)
+ return root->simple_rel_array[*relid]->mvstatlist;
+
+ return NIL;
+}
+
+static Bitmapset*
+fdeps_collect_attnums(List *stats)
+{
+ ListCell *lc;
+ Bitmapset *attnums = NULL;
+
+ foreach (lc, stats)
+ {
+ int j;
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ int2vector *stakeys = info->stakeys;
+
+ /* skip stats without functional dependencies built */
+ if (! info->deps_built)
+ continue;
+
+ for (j = 0; j < stakeys->dim1; j++)
+ attnums = bms_add_member(attnums, stakeys->values[j]);
+ }
+
+ return attnums;
+}
+
+
+static int*
+make_idx_to_attnum_mapping(Bitmapset *attnums)
+{
+ int attidx = 0;
+ int attnum = -1;
+
+ int *mapping = (int*)palloc0(bms_num_members(attnums) * sizeof(int));
+
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ mapping[attidx++] = attnum;
+
+ Assert(attidx == bms_num_members(attnums));
+
+ return mapping;
+}
+
+static int*
+make_attnum_to_idx_mapping(Bitmapset *attnums)
+{
+ int attidx = 0;
+ int attnum = -1;
+ int maxattnum = -1;
+ int *mapping;
+
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ maxattnum = attnum;
+
+ mapping = (int*)palloc0((maxattnum+1) * sizeof(int));
+
+ attnum = -1;
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ mapping[attnum] = attidx++;
+
+ Assert(attidx == bms_num_members(attnums));
+
+ return mapping;
+}
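+
+/*
+ * For illustration: with attnums {2,5,7} the mappings built above are
+ *
+ * idx_to_attnum: [0] = 2, [1] = 5, [2] = 7
+ * attnum_to_idx: [2] = 0, [5] = 1, [7] = 2 (other slots unused)
+ */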
+
+static bool*
+build_adjacency_matrix(List *stats, Bitmapset *attnums,
+ int *idx_to_attnum, int *attnum_to_idx)
+{
+ ListCell *lc;
+ int natts = bms_num_members(attnums);
+ bool *matrix = (bool*)palloc0(natts * natts * sizeof(bool));
+
+ foreach (lc, stats)
+ {
+ int j;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
+ MVDependencies dependencies = NULL;
+
+ /* skip stats without functional dependencies built */
+ if (! stat->deps_built)
+ continue;
+
+ /* fetch and deserialize dependencies */
+ dependencies = load_mv_dependencies(stat->mvoid);
+ if (dependencies == NULL)
+ {
+ elog(WARNING, "failed to deserialize func deps %d", stat->mvoid);
+ continue;
+ }
+
+ /* set matrix[a,b] to 'true' if 'a=>b' */
+ for (j = 0; j < dependencies->ndeps; j++)
+ {
+ int aidx = attnum_to_idx[dependencies->deps[j]->a];
+ int bidx = attnum_to_idx[dependencies->deps[j]->b];
+
+ /* a=> b */
+ matrix[aidx * natts + bidx] = true;
+ }
+ }
+
+ return matrix;
+}
+
+static void
+multiply_adjacency_matrix(bool *matrix, int natts)
+{
+ int i;
+
+ for (i = 0; i < natts; i++)
+ {
+ int k, l, m;
+ int nchanges = 0;
+
+ /* k => l */
+ for (k = 0; k < natts; k++)
+ {
+ for (l = 0; l < natts; l++)
+ {
+ /* we already have this dependency */
+ if (matrix[k * natts + l])
+ continue;
+
+ /* we don't really care about the exact value, just 0/1 */
+ for (m = 0; m < natts; m++)
+ {
+ if (matrix[k * natts + m] * matrix[m * natts + l])
+ {
+ matrix[k * natts + l] = true;
+ nchanges += 1;
+ break;
+ }
+ }
+ }
+ }
+
+ /* no transitive dependency added here, so terminate */
+ if (nchanges == 0)
+ break;
+ }
+}
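+
+/*
+ * For illustration: with dependencies (a => b) and (b => c), the first
+ * pass adds the transitive (a => c):
+ *
+ * a b c a b c
+ * a . 1 . a . 1 1
+ * b . . 1 ==> b . . 1
+ * c . . . c . . .
+ *
+ * and the second pass adds nothing, terminating the loop early.
+ */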
+
+static List*
+fdeps_reduce_clauses(List *clauses, Bitmapset *attnums, bool *matrix,
+ int *idx_to_attnum, int *attnum_to_idx, Index relid)
+{
+ int i;
+ ListCell *lc;
+ List *reduced_clauses = NIL;
+
+ int nmvclauses; /* size of the arrays */
+ bool *reduced;
+ AttrNumber *mvattnums;
+ Node **mvclauses;
+
+ int natts = bms_num_members(attnums);
+
+ /*
+ * Preallocate space for all clauses (the list only contains
+ * compatible clauses at this point). This makes it somewhat easier
+ * to access the stats / attnums randomly.
+ *
+ * XXX This assumes each clause references exactly one Var, so the
+ * arrays are sized accordingly - for functional dependencies
+ * this is safe, because it only works with Var=Const.
+ */
+ mvclauses = (Node**)palloc0(list_length(clauses) * sizeof(Node*));
+ mvattnums = (AttrNumber*)palloc0(list_length(clauses) * sizeof(AttrNumber));
+ reduced = (bool*)palloc0(list_length(clauses) * sizeof(bool));
+
+ /* fill the arrays */
+ nmvclauses = 0;
+ foreach (lc, clauses)
+ {
+ Node * clause = (Node*)lfirst(lc);
+ Bitmapset * attnums = get_varattnos(clause, relid);
+
+ mvclauses[nmvclauses] = clause;
+ mvattnums[nmvclauses] = bms_singleton_member(attnums);
+ nmvclauses++;
+ }
+
+ Assert(nmvclauses == list_length(clauses));
+
+ /* now try to reduce the clauses (using the dependencies) */
+ for (i = 0; i < nmvclauses; i++)
+ {
+ int j;
+
+ /* not covered by dependencies */
+ if (! bms_is_member(mvattnums[i], attnums))
+ continue;
+
+ /* this clause was already reduced, so let's skip it */
+ if (reduced[i])
+ continue;
+
+ /* walk the potentially 'implied' clauses */
+ for (j = 0; j < nmvclauses; j++)
+ {
+ int aidx, bidx;
+
+ /* not covered by dependencies */
+ if (! bms_is_member(mvattnums[j], attnums))
+ continue;
+
+ aidx = attnum_to_idx[mvattnums[i]];
+ bidx = attnum_to_idx[mvattnums[j]];
+
+ /* can't reduce the clause by itself, or if already reduced */
+ if ((i == j) || reduced[j])
+ continue;
+
+ /* mark the clause as reduced (if aidx => bidx) */
+ reduced[j] = matrix[aidx * natts + bidx];
+ }
+ }
+
+ /* now walk through the clauses, and keep only those not reduced */
+ for (i = 0; i < nmvclauses; i++)
+ if (! reduced[i])
+ reduced_clauses = lappend(reduced_clauses, mvclauses[i]);
+
+ pfree(reduced);
+ pfree(mvclauses);
+ pfree(mvattnums);
+
+ return reduced_clauses;
+}
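+
+/*
+ * For illustration: given clauses (a = 1), (b = 2) and (c = 3), and a
+ * (transitive) matrix with a => b, a => c and b => c, the clause on
+ * 'a' reduces the clauses on 'b' and 'c', so only (a = 1) survives.
+ * The reduced[] flags prevent reducing a clause by itself or by an
+ * already-reduced clause, which is what breaks cycles like
+ * [a => b, b => a].
+ */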
+
+
+static Bitmapset *
+fdeps_filter_clauses(PlannerInfo *root,
+ List *clauses, Bitmapset *deps_attnums,
+ List **reduced_clauses, List **deps_clauses,
+ Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo)
+{
+ ListCell *lc;
+ Bitmapset *clause_attnums = NULL;
+
+ foreach (lc, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(lc);
+
+ if (! clause_is_mv_compatible(root, clause, varRelid, relid,
+ &attnum, sjinfo))
+
+ /* clause incompatible with functional dependencies */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else if (! bms_is_member(attnum, deps_attnums))
+
+ /* clause not covered by the dependencies */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else
+ {
+ *deps_clauses = lappend(*deps_clauses, clause);
+ clause_attnums = bms_add_member(clause_attnums, attnum);
+ }
+ }
+
+ return clause_attnums;
+}
+
+/*
+ * Pull varattnos from the clauses, similarly to pull_varattnos() but:
+ *
+ * (a) only get attributes for a particular relation (relid)
+ * (b) ignore system attributes (we can't build stats on them anyway)
+ *
+ * This makes it possible to directly compare the result with attnum
+ * values from pg_attribute etc.
+ */
+static Bitmapset *
+get_varattnos(Node * node, Index relid)
+{
+ int k;
+ Bitmapset *varattnos = NULL;
+ Bitmapset *result = NULL;
+
+ /* get the varattnos */
+ pull_varattnos(node, relid, &varattnos);
+
+ k = -1;
+ while ((k = bms_next_member(varattnos, k)) >= 0)
+ {
+ if (k + FirstLowInvalidHeapAttributeNumber > 0)
+ result
+ = bms_add_member(result,
+ k + FirstLowInvalidHeapAttributeNumber);
+ }
+
+ bms_free(varattnos);
+
+ return result;
+}
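
To see the reduction logic without wading through the planner
plumbing, here is a small self-contained sketch (not part of the
patch) of the same closure-and-reduce idea, using attribute indexes
instead of clauses and Warshall's algorithm for the closure (which
produces the same transitive closure as the repeated multiplication
above):

    #include <stdbool.h>
    #include <stdio.h>

    #define NATTS 3

    int
    main(void)
    {
        /* deps[i][j] means "attribute i determines attribute j" */
        bool deps[NATTS][NATTS] = {{false}};
        bool reduced[NATTS] = {false};
        int  i, j, k;

        deps[0][1] = true;      /* a => b */
        deps[1][2] = true;      /* b => c */

        /* transitive closure (adds a => c) */
        for (k = 0; k < NATTS; k++)
            for (i = 0; i < NATTS; i++)
                for (j = 0; j < NATTS; j++)
                    if (deps[i][k] && deps[k][j])
                        deps[i][j] = true;

        /* a clause on j is implied by a clause on i when deps[i][j] */
        for (i = 0; i < NATTS; i++)
        {
            if (reduced[i])
                continue;
            for (j = 0; j < NATTS; j++)
                if (i != j && !reduced[j] && deps[i][j])
                    reduced[j] = true;
        }

        /* keeps the clause on 0, reduces the clauses on 1 and 2 */
        for (i = 0; i < NATTS; i++)
            printf("clause on attribute %d: %s\n",
                   i, reduced[i] ? "reduced" : "kept");

        return 0;
    }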
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index a755c49..bd200bc 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -84,7 +84,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
/*
* Analyze functional dependencies of columns.
*/
- deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (stat->deps_enabled)
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
/* store the histogram / MCV list in the catalog */
update_mv_stats(stat->mvoid, deps, attrs);
@@ -163,6 +164,7 @@ list_mv_stats(Oid relid)
info->mvoid = HeapTupleGetOid(htup);
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
result = lappend(result, info);
@@ -274,6 +276,7 @@ compare_scalars_partition(const void *a, const void *b, void *arg)
return ApplySortComparator(da, false, db, false, ssup);
}
+
/* initialize multi-dimensional sort */
MultiSortSupport
multi_sort_init(int ndims)
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
index 84b6561..0a08d12 100644
--- a/src/backend/utils/mvstats/dependencies.c
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -636,3 +636,27 @@ pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(result));
}
+
+MVDependencies
+load_mv_dependencies(Oid mvoid)
+{
+ bool isnull = false;
+ Datum deps;
+
+ /* Fetch the pg_mv_statistic tuple for the given statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->deps_enabled && mvstat->deps_built);
+#endif
+
+ deps = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stadeps, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_dependencies(DatumGetByteaP(deps));
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 912b4f3..5f89604 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2103,7 +2103,6 @@ describeOneTableDetails(const char *schemaname,
"SELECT oid, stakeys,\n"
" deps_enabled,\n"
" deps_built,\n"
- " mcv_max_items, hist_max_buckets,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 411cd16..02a7dda 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -16,12 +16,20 @@
#include "commands/vacuum.h"
+/*
+ * Degree of how much MCV item / histogram bucket matches a clause.
+ * This is then considered when computing the selectivity.
+ */
+#define MVSTATS_MATCH_NONE 0 /* no match at all */
+#define MVSTATS_MATCH_PARTIAL 1 /* partial match */
+#define MVSTATS_MATCH_FULL 2 /* full match */
#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
-/* An associative rule, tracking [a => b] dependency.
- *
- * TODO Make this work with multiple columns on both sides.
+
+/*
+ * Functional dependencies, tracking column-level relationships (values
+ * in one column determine values in another one).
*/
typedef struct MVDependencyData {
int16 a;
@@ -47,6 +55,8 @@ typedef MVDependenciesData* MVDependencies;
* stats specified using flags (or something like that).
*/
+MVDependencies load_mv_dependencies(Oid mvoid);
+
bytea * serialize_mv_dependencies(MVDependencies dependencies);
/* deserialization of stats (serialization is private to analyze) */
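
Consuming the deserialized dependencies is then just a loop over the
(a => b) pairs - a sketch of the access pattern (assuming the ndeps /
deps fields of MVDependenciesData, as used by build_adjacency_matrix
in clausesel.c):

    static void
    print_dependencies(MVDependencies dependencies)
    {
        int i;

        for (i = 0; i < dependencies->ndeps; i++)
            elog(DEBUG1, "dependency: %d => %d",
                 dependencies->deps[i]->a,
                 dependencies->deps[i]->b);
    }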
diff --git a/src/test/regress/expected/mv_dependencies.out b/src/test/regress/expected/mv_dependencies.out
new file mode 100644
index 0000000..cf986e8
--- /dev/null
+++ b/src/test/regress/expected/mv_dependencies.out
@@ -0,0 +1,172 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (unknown_column);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, a);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, a, b);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+ALTER TABLE functional_dependencies ADD STATISTICS (unknown_option) ON (a, b, c);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- correct command
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+ QUERY PLAN
+---------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE functional_dependencies;
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+DROP TABLE functional_dependencies;
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c, d);
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+----------------------------------------
+ t | t | 2 => 1, 3 => 1, 3 => 2, 4 => 1, 4 => 2
+(1 row)
+
+DROP TABLE functional_dependencies;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 91780cd..11d9d38 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -110,3 +110,6 @@ test: event_trigger
# run stats by itself because its delay may be insufficient under heavy load
test: stats
+
+# run tests of multivariate stats
+test: mv_dependencies
diff --git a/src/test/regress/regression.diffs b/src/test/regress/regression.diffs
new file mode 100644
index 0000000..95b9cc5
--- /dev/null
+++ b/src/test/regress/regression.diffs
@@ -0,0 +1,30 @@
+*** /home/user/work/tvondra_postgres/src/test/regress/expected/rolenames.out Wed May 6 21:31:06 2015
+--- /home/user/work/tvondra_postgres/src/test/regress/results/rolenames.out Mon May 25 22:24:21 2015
+***************
+*** 38,47 ****
+--- 38,52 ----
+ ORDER BY 2;
+ $$ LANGUAGE SQL;
+ CREATE ROLE "Public";
++ ERROR: role "Public" already exists
+ CREATE ROLE "None";
++ ERROR: role "None" already exists
+ CREATE ROLE "current_user";
++ ERROR: role "current_user" already exists
+ CREATE ROLE "session_user";
++ ERROR: role "session_user" already exists
+ CREATE ROLE "user";
++ ERROR: role "user" already exists
+ CREATE ROLE current_user; -- error
+ ERROR: CURRENT_USER cannot be used as a role name here
+ LINE 1: CREATE ROLE current_user;
+***************
+*** 938,940 ****
+--- 943,946 ----
+ DROP OWNED BY testrol0, "Public", "current_user", testrol1, testrol2, testrolx CASCADE;
+ DROP ROLE testrol0, testrol1, testrol2, testrolx;
+ DROP ROLE "Public", "None", "current_user", "session_user", "user";
++ ERROR: current user cannot be dropped
+
+======================================================================
+
diff --git a/src/test/regress/regression.out b/src/test/regress/regression.out
new file mode 100644
index 0000000..bd81385
--- /dev/null
+++ b/src/test/regress/regression.out
@@ -0,0 +1,156 @@
+test tablespace ... ok
+test boolean ... ok
+test char ... ok
+test name ... ok
+test varchar ... ok
+test text ... ok
+test int2 ... ok
+test int4 ... ok
+test int8 ... ok
+test oid ... ok
+test float4 ... ok
+test float8 ... ok
+test bit ... ok
+test numeric ... ok
+test txid ... ok
+test uuid ... ok
+test enum ... ok
+test money ... ok
+test rangetypes ... ok
+test pg_lsn ... ok
+test regproc ... ok
+test strings ... ok
+test numerology ... ok
+test point ... ok
+test lseg ... ok
+test line ... ok
+test box ... ok
+test path ... ok
+test polygon ... ok
+test circle ... ok
+test date ... ok
+test time ... ok
+test timetz ... ok
+test timestamp ... ok
+test timestamptz ... ok
+test interval ... ok
+test abstime ... ok
+test reltime ... ok
+test tinterval ... ok
+test inet ... ok
+test macaddr ... ok
+test tstypes ... ok
+test comments ... ok
+test geometry ... ok
+test horology ... ok
+test regex ... ok
+test oidjoins ... ok
+test type_sanity ... ok
+test opr_sanity ... ok
+test insert ... ok
+test insert_conflict ... ok
+test create_function_1 ... ok
+test create_type ... ok
+test create_table ... ok
+test create_function_2 ... ok
+test copy ... ok
+test copyselect ... ok
+test create_misc ... ok
+test create_operator ... ok
+test create_index ... ok
+test create_view ... ok
+test create_aggregate ... ok
+test create_function_3 ... ok
+test create_cast ... ok
+test constraints ... ok
+test triggers ... ok
+test inherit ... ok
+test create_table_like ... ok
+test typed_table ... ok
+test vacuum ... ok
+test drop_if_exists ... ok
+test updatable_views ... ok
+test rolenames ... FAILED
+test sanity_check ... ok
+test errors ... ok
+test select ... ok
+test select_into ... ok
+test select_distinct ... ok
+test select_distinct_on ... ok
+test select_implicit ... ok
+test select_having ... ok
+test subselect ... ok
+test union ... ok
+test case ... ok
+test join ... ok
+test aggregates ... ok
+test groupingsets ... ok
+test transactions ... ok
+test random ... ok
+test portals ... ok
+test arrays ... ok
+test btree_index ... ok
+test hash_index ... ok
+test update ... ok
+test delete ... ok
+test namespace ... ok
+test prepared_xacts ... ok
+test brin ... ok
+test gin ... ok
+test gist ... ok
+test spgist ... ok
+test privileges ... ok
+test security_label ... ok
+test collate ... ok
+test matview ... ok
+test lock ... ok
+test replica_identity ... ok
+test rowsecurity ... ok
+test object_address ... ok
+test alter_generic ... ok
+test misc ... ok
+test psql ... ok
+test async ... ok
+test rules ... ok
+test select_views ... ok
+test portals_p2 ... ok
+test foreign_key ... ok
+test cluster ... ok
+test dependency ... ok
+test guc ... ok
+test bitmapops ... ok
+test combocid ... ok
+test tsearch ... ok
+test tsdicts ... ok
+test foreign_data ... ok
+test window ... ok
+test xmlmap ... ok
+test functional_deps ... ok
+test advisory_lock ... ok
+test json ... ok
+test jsonb ... ok
+test indirect_toast ... ok
+test equivclass ... ok
+test plancache ... ok
+test limit ... ok
+test plpgsql ... ok
+test copy2 ... ok
+test temp ... ok
+test domain ... ok
+test rangefuncs ... ok
+test prepare ... ok
+test without_oid ... ok
+test conversion ... ok
+test truncate ... ok
+test alter_table ... ok
+test sequence ... ok
+test polymorphism ... ok
+test rowtypes ... ok
+test returning ... ok
+test largeobject ... ok
+test with ... ok
+test xml ... ok
+test event_trigger ... ok
+test stats ... ok
+test tablesample ... ok
+test mv_dependencies ... ok
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index a2e0ceb..66925b3 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -156,3 +156,4 @@ test: xml
test: event_trigger
test: stats
test: tablesample
+test: mv_dependencies
diff --git a/src/test/regress/sql/mv_dependencies.sql b/src/test/regress/sql/mv_dependencies.sql
new file mode 100644
index 0000000..2491aca
--- /dev/null
+++ b/src/test/regress/sql/mv_dependencies.sql
@@ -0,0 +1,150 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (unknown_column);
+
+-- single column
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a);
+
+-- single column, duplicated
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, a);
+
+-- two columns, one duplicated
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, a, b);
+
+-- unknown option
+ALTER TABLE functional_dependencies ADD STATISTICS (unknown_option) ON (a, b, c);
+
+-- correct command
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+
+DROP TABLE functional_dependencies;
+
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+
+DROP TABLE functional_dependencies;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+ALTER TABLE functional_dependencies ADD STATISTICS (dependencies) ON (a, b, c, d);
+
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+DROP TABLE functional_dependencies;
--
1.9.3
Attachment: 0004-multivariate-MCV-lists-v7.patch (text/x-patch)
From 3114c82ae310d840f613583b169ac1cc79520f81 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 16:52:15 +0200
Subject: [PATCH 4/6] multivariate MCV lists
- extends the pg_mv_statistic catalog (add 'mcv' fields)
- building the MCV lists during ANALYZE
- simple estimation while planning the queries
Includes regression tests, mostly equal to regression tests for
functional dependencies.
---
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/tablecmds.c | 89 ++-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 1079 ++++++++++++++++++++++++++--
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/common.c | 104 ++-
src/backend/utils/mvstats/common.h | 11 +-
src/backend/utils/mvstats/mcv.c | 1237 ++++++++++++++++++++++++++++++++
src/bin/psql/describe.c | 25 +-
src/include/catalog/pg_mv_statistic.h | 18 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 69 +-
src/test/regress/expected/mv_mcv.out | 207 ++++++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/regression.diffs | 30 -
src/test/regress/regression.out | 156 ----
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_mcv.sql | 178 +++++
21 files changed, 2940 insertions(+), 288 deletions(-)
create mode 100644 src/backend/utils/mvstats/mcv.c
create mode 100644 src/test/regress/expected/mv_mcv.out
delete mode 100644 src/test/regress/regression.diffs
delete mode 100644 src/test/regress/regression.out
create mode 100644 src/test/regress/sql/mv_mcv.sql
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 0dedaba..3144a29 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -156,7 +156,9 @@ CREATE VIEW pg_mv_stats AS
C.relname AS tablename,
S.stakeys AS attnums,
length(S.stadeps) as depsbytes,
- pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
+ length(S.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 107e9fc..0d72aec 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -11918,7 +11918,13 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
Relation mvstatrel;
/* by default build nothing */
- bool build_dependencies = false;
+ bool build_dependencies = false,
+ build_mcv = false;
+
+ int32 max_mcv_items = -1;
+
+ /* options required because of other options */
+ bool require_mcv = false;
Assert(IsA(def, StatisticsDef));
@@ -11973,6 +11979,29 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "mcv") == 0)
+ build_mcv = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_mcv_items") == 0)
+ {
+ max_mcv_items = defGetInt32(opt);
+
+ /* this option requires 'mcv' to be enabled */
+ require_mcv = true;
+
+ /* sanity check */
+ if (max_mcv_items < MVSTAT_MCVLIST_MIN_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items must be at least %d",
+ MVSTAT_MCVLIST_MIN_ITEMS)));
+
+ else if (max_mcv_items > MVSTAT_MCVLIST_MAX_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items is %d",
+ MVSTAT_MCVLIST_MAX_ITEMS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -11981,10 +12010,16 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
}
/* check that at least some statistics were requested */
- if (! build_dependencies)
+ if (! (build_dependencies || build_mcv))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies, mcv) was requested")));
+
+ /* now do some checking of the options */
+ if (require_mcv && (! build_mcv))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies) was requested")));
+ errmsg("option 'mcv' is required by other options(s)")));
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
@@ -12000,9 +12035,13 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+ values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+ values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
- nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+ nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+ nulls[Anum_pg_mv_statistic_stamcv -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
@@ -12049,7 +12088,13 @@ static void ATExecDropStatistics(AlteredTableInfo *tab, Relation rel,
/* checking whether the statistics matches / should be dropped */
bool build_dependencies = false;
+ bool build_mcv = false;
+
+ int32 max_mcv_items = 0;
+
bool check_dependencies = false;
+ bool check_mcv = false;
+ bool check_mcv_items = false;
if (def != NULL)
{
@@ -12091,6 +12136,18 @@ static void ATExecDropStatistics(AlteredTableInfo *tab, Relation rel,
check_dependencies = true;
build_dependencies = defGetBoolean(opt);
}
+ else if (strcmp(opt->defname, "mcv") == 0)
+ {
+ check_mcv = true;
+ build_mcv = defGetBoolean(opt);
+ }
+ else if (strcmp(opt->defname, "max_mcv_items") == 0)
+ {
+ check_mcv = true;
+ check_mcv_items = true;
+ build_mcv = true;
+ max_mcv_items = defGetInt32(opt);
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -12130,6 +12187,30 @@ static void ATExecDropStatistics(AlteredTableInfo *tab, Relation rel,
(DatumGetBool(adatum) == build_dependencies);
}
+ if (delete && check_mcv)
+ {
+ bool isnull;
+ Datum adatum = heap_getattr(tuple,
+ Anum_pg_mv_statistic_mcv_enabled,
+ RelationGetDescr(statrel),
+ &isnull);
+
+ delete = (! isnull) &&
+ (DatumGetBool(adatum) == build_mcv);
+ }
+
+ if (delete && check_mcv_items)
+ {
+ bool isnull;
+ Datum adatum = heap_getattr(tuple,
+ Anum_pg_mv_statistic_mcv_max_items,
+ RelationGetDescr(statrel),
+ &isnull);
+
+ delete = (! isnull) &&
+ (DatumGetInt32(adatum) == max_mcv_items);
+ }
+
/* check that the columns match the statistics definition */
if (delete && (numcols > 0))
{
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 93a6f04..1867ab7 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1907,9 +1907,11 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
+ WRITE_BOOL_FIELD(mcv_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
+ WRITE_BOOL_FIELD(mcv_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 6365425..95872de 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -15,6 +15,7 @@
#include "postgres.h"
#include "access/sysattr.h"
+#include "catalog/pg_collation.h"
#include "catalog/pg_operator.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
@@ -47,17 +48,38 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
#define MV_CLAUSE_TYPE_FDEP 0x01
+#define MV_CLAUSE_TYPE_MCV 0x02
static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
- Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo);
+ Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
+ int type);
static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
- Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo);
+ Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo,
+ int type);
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Oid varRelid, List *stats,
SpecialJoinInfo *sjinfo);
+static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
+
+static List *clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
+ List *clauses, Oid varRelid,
+ List **mvclauses, MVStatisticInfo *mvstats, int types);
+
+static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
+static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats,
+ bool *fullmatch, Selectivity *lowsel);
+
+static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, List *clauses,
@@ -85,6 +107,13 @@ static Bitmapset *fdeps_filter_clauses(PlannerInfo *root,
static Bitmapset * get_varattnos(Node * node, Index relid);
+/* used for merging bitmaps - AND (min), OR (max) */
+#define MAX(x, y) (((x) > (y)) ? (x) : (y))
+#define MIN(x, y) (((x) < (y)) ? (x) : (y))
+
+#define UPDATE_RESULT(m,r,isor) \
+ (m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
+
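A standalone illustration of the merge semantics (not part of the
patch - it merely demonstrates what UPDATE_RESULT does, assuming
MVSTATS_MATCH_NONE < MVSTATS_MATCH_FULL, e.g. 0 and 1):

    #include <stdio.h>

    #define MAX(x, y) (((x) > (y)) ? (x) : (y))
    #define MIN(x, y) (((x) < (y)) ? (x) : (y))
    #define UPDATE_RESULT(m,r,isor) (m) = (isor) ? (MAX(m,r)) : (MIN(m,r))

    int
    main(void)
    {
        char a[4] = {0, 0, 1, 1};   /* current match bitmap */
        char b[4] = {0, 1, 0, 1};   /* bitmap built for a sub-clause list */
        int  i;

        /* AND-merge uses MIN(), so an item remains a match only when
         * it matches in both bitmaps; passing 'true' (OR-merge, MAX())
         * would keep items matching in either bitmap */
        for (i = 0; i < 4; i++)
            UPDATE_RESULT(a[i], b[i], false);

        for (i = 0; i < 4; i++)
            printf("%d ", a[i]);    /* prints: 0 0 0 1 */
        printf("\n");

        return 0;
    }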
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -250,8 +279,12 @@ clauselist_selectivity(PlannerInfo *root,
*/
if (has_stats(stats, MV_CLAUSE_TYPE_FDEP))
{
- /* collect attributes referenced by mv-compatible clauses */
- mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo);
+ /*
+ * Collect attributes referenced by mv-compatible clauses (looking
+ * for clauses compatible with functional dependencies for now).
+ */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
+ MV_CLAUSE_TYPE_FDEP);
/*
* If there are mv-compatible clauses, referencing at least two
@@ -268,6 +301,48 @@ clauselist_selectivity(PlannerInfo *root,
}
/*
+ * Check that there are statistics with a MCV list. If not, we don't
+ * need to waste time on the optimization.
+ */
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV))
+ {
+ /*
+ * Recollect attributes from mv-compatible clauses (maybe we've
+ * removed so many clauses that only a single mv-compatible attnum
+ * remains). From now on we're only interested in MCV-compatible
+ * clauses.
+ */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
+ MV_CLAUSE_TYPE_MCV);
+
+ /*
+ * If there still are at least two columns, we'll try to select
+ * suitable multivariate statistics.
+ */
+ if (bms_num_members(mvattnums) >= 2)
+ {
+ /* see choose_mv_statistics() for details */
+ MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+
+ if (mvstat != NULL) /* we have a matching stats */
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ /* split the clauselist into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, sjinfo, clauses,
+ varRelid, &mvclauses, mvstat,
+ MV_CLAUSE_TYPE_MCV);
+
+ /* we've chosen the statistics matching the clauses */
+ Assert(mvclauses != NIL);
+
+ /* compute the multivariate stats */
+ s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ }
+ }
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -942,12 +1017,129 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Estimate selectivity for the list of MV-compatible clauses, using
+ * multivariate statistics (combining a histogram and MCV list).
+ *
+ * This simply passes the estimation to the MCV list and then to the
+ * histogram, if available.
+ *
+ * TODO Clamp the selectivity by min of the per-clause selectivities
+ * (i.e. the selectivity of the most restrictive clause), because
+ * that's the maximum we can ever get from an ANDed list of clauses.
+ * This would probably prevent issues with hitting too many buckets
+ * and low precision histograms.
+ *
+ * TODO We may support some additional conditions, most importantly
+ * those matching multiple columns (e.g. "a = b" or "a < b").
+ * Ultimately we could track multi-table histograms for join
+ * cardinality estimation.
+ *
+ * TODO Further thoughts on processing equality clauses: Maybe it'd be
+ * better to look for stats (with MCV) covered by the equality
+ * clauses, because then we have a chance to find an exact match
+ * in the MCV list, which is pretty much the best we can do. We may
+ * also look at the least frequent MCV item, and use it as an upper
+ * boundary for the selectivity (had there been a more frequent
+ * item, it'd be in the MCV list).
+ *
+ * TODO There are several options for 'sanity clamping' the estimates.
+ *
+ * First, if we have selectivities for each condition, then
+ *
+ * P(A,B) <= MIN(P(A), P(B))
+ *
+ * Because additional conditions (connected by AND) can only lower
+ * the probability.
+ *
+ * So we can do some basic sanity checks using the single-variate
+ * stats (the ones we have right now).
+ *
+ * Second, when we have multivariate stats with a MCV list, then
+ *
+ * (a) if we have a full equality condition (one equality condition
+ * on each column) and we found a match in the MCV list, this is
+ * the selectivity (and it's supposed to be exact)
+ *
+ * (b) if we have a full equality condition and we haven't found a
+ * match in the MCV list, then the selectivity is below the
+ * lowest selectivity in the MCV list
+ *
+ * (c) if we have an equality condition (not full), we can still
+ * search the MCV for matches and use the sum of probabilities
+ * as a lower boundary for the histogram (if there are no
+ * matches in the MCV list, then we have no boundary)
+ *
+ * Third, if there are multiple (combinations of) multivariate
+ * stats for a set of clauses, we may compute all of them and then
+ * somehow aggregate them - e.g. by choosing the minimum, median or
+ * average. The stats are susceptible to overestimation (because
+ * we take 50% of the bucket for partial matches). Some stats may
+ * give better estimates than others, but it's very difficult to
+ * say that in advance which one is the best (it depends on the
+ * number of buckets, number of additional columns not referenced
+ * in the clauses, type of condition etc.).
+ *
+ * So we may compute them all and then choose a sane aggregation
+ * (minimum seems like a good approach). Of course, this may result
+ * in longer / more expensive estimation (CPU-wise), but it may be
+ * worth it.
+ *
+ * It's possible to add a GUC choosing whether to do a 'simple'
+ * estimation (using a single statistics expected to give the best
+ * estimate) or a 'complex' one (combining the multiple estimates).
+ *
+ * multivariate_estimates = (simple|full)
+ *
+ * Also, this might be enabled at a table level, by something like
+ *
+ * ALTER TABLE ... SET STATISTICS (simple|full)
+ *
+ * Which would make it possible to use this only for the tables
+ * where the simple approach does not work.
+ *
+ * Also, there are ways to optimize this algorithmically. E.g. we
+ * may try to get an estimate from a matching MCV list first, and
+ * if we happen to get a "full equality match" we may stop computing
+ * the estimates from other stats (for this condition) because
+ * that's probably the best estimate we can really get.
+ *
+ * TODO When applying the clauses to the histogram/MCV list, we can do
+ * that from the most selective clauses first, because that'll
+ * eliminate the buckets/items sooner (so we'll be able to skip
+ * them without inspection, which is more expensive). But this
+ * requires really knowing the per-clause selectivities in advance,
+ * and that's not what we do now.
+ */
+static Selectivity
+clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+{
+ bool fullmatch = false;
+
+ /*
+ * Lowest frequency in the MCV list (may be used as an upper bound
+ * for full equality conditions that did not match any MCV item).
+ */
+ Selectivity mcv_low = 0.0;
+
+ /* TODO Evaluate simple 1D selectivities, use the smallest one as
+ * an upper bound, product as lower bound, and sort the
+ * clauses in ascending order by selectivity (to optimize the
+ * MCV/histogram evaluation).
+ */
+
+ /* Evaluate the MCV selectivity */
+ return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ &fullmatch, &mcv_low);
+}
+
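Regarding the clamping TODOs above, a minimal sketch of what I have in
mind (not implemented yet - clamp_mv_selectivity is a hypothetical
helper, and the per-clause selectivities are assumed to come from the
existing single-column estimation):

    typedef double Selectivity;     /* as in nodes/nodes.h */

    /*
     * Clamp the multivariate estimate by the most restrictive single
     * clause, because for an ANDed list P(A and B) <= MIN(P(A), P(B)).
     */
    static Selectivity
    clamp_mv_selectivity(Selectivity s_mv,
                         Selectivity *s_clauses, int nclauses)
    {
        int         i;
        Selectivity s_min = 1.0;

        for (i = 0; i < nclauses; i++)
            if (s_clauses[i] < s_min)
                s_min = s_clauses[i];

        return (s_mv < s_min) ? s_mv : s_min;
    }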
/*
* Collect attributes from mv-compatible clauses.
*/
static Bitmapset *
collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
- Index *relid, SpecialJoinInfo *sjinfo)
+ Index *relid, SpecialJoinInfo *sjinfo, int types)
{
Bitmapset *attnums = NULL;
ListCell *l;
@@ -963,12 +1155,11 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
*/
foreach (l, clauses)
{
- AttrNumber attnum;
Node *clause = (Node *) lfirst(l);
- /* ignore the result for now - we only need the info */
- if (clause_is_mv_compatible(root, clause, varRelid, relid, &attnum, sjinfo))
- attnums = bms_add_member(attnums, attnum);
+ /* ignore the result here - we only need the attnums */
+ clause_is_mv_compatible(root, clause, varRelid, relid, &attnums,
+ sjinfo, types);
}
/*
@@ -987,6 +1178,188 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
}
/*
+ * We're looking for statistics matching at least 2 attributes,
+ * referenced in the clauses compatible with multivariate statistics.
+ * The current selection criterion is very simple - we choose the
+ * statistics referencing the most attributes.
+ *
+ * If there are multiple statistics referencing the same number of
+ * columns (from the clauses), the one with fewer source columns
+ * (as listed in the ADD STATISTICS when creating the statistics) wins.
+ * Otherwise the first one wins.
+ *
+ * This is a very simple criterion, and it has several weaknesses:
+ *
+ * (a) does not consider the accuracy of the statistics
+ *
+ * If there are two histograms built on the same set of columns,
+ * but one has 100 buckets and the other one has 1000 buckets (thus
+ * likely providing better estimates), this is not currently
+ * considered.
+ *
+ * (b) does not consider the type of statistics
+ *
+ * If there are three statistics - one containing just a MCV list,
+ * another one with just a histogram and a third one with both,
+ * this is not considered.
+ *
+ * (c) does not consider the number of clauses
+ *
+ * As explained, only the number of referenced attributes counts,
+ * so if there are multiple clauses on a single attribute, this
+ * still counts as a single attribute.
+ *
+ * (d) does not consider the type of condition
+ *
+ * Some clauses may work better with some statistics - for example
+ * equality clauses probably work better with MCV lists than with
+ * histograms. But IS [NOT] NULL conditions may often work better
+ * with histograms (thanks to NULL-buckets).
+ *
+ * So for example with five WHERE conditions
+ *
+ * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
+ *
+ * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be
+ * selected as it references the most columns.
+ *
+ * Once we have selected the multivariate statistics, we split the list
+ * of clauses into two parts - conditions that are compatible with the
+ * selected stats, and conditions that will be estimated using the
+ * simple (single-column) statistics.
+ *
+ * From the example above, conditions
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
+ *
+ * will be estimated using the multivariate statistics (a,b,c,d) while
+ * the last condition (e = 1) will get estimated using the regular ones.
+ *
+ * There are various alternative selection criteria (e.g. counting
+ * conditions instead of just referenced attributes), but eventually
+ * the best option should be to combine multiple statistics. But that's
+ * much harder to do correctly.
+ *
+ * TODO Select multiple statistics and combine them when computing
+ * the estimate.
+ *
+ * TODO This will probably have to consider compatibility of clauses,
+ * because 'dependencies' will probably work only with equality
+ * clauses.
+ */
+static MVStatisticInfo *
+choose_mv_statistics(List *stats, Bitmapset *attnums)
+{
+ int i;
+ ListCell *lc;
+
+ MVStatisticInfo *choice = NULL;
+
+ int current_matches = 1; /* goal #1: maximize */
+ int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+
+ /*
+ * Walk through the list of statistics, and for each one count the
+ * attributes it shares with the clauses (encoded in the 'attnums'
+ * bitmap).
+ */
+ foreach (lc, stats)
+ {
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ /* columns matching this statistics */
+ int matches = 0;
+
+ int2vector * attrs = info->stakeys;
+ int numattrs = attrs->dim1;
+
+ /* skip dependencies-only stats */
+ if (! info->mcv_built)
+ continue;
+
+ /* count columns covered by this statistics */
+ for (i = 0; i < numattrs; i++)
+ if (bms_is_member(attrs->values[i], attnums))
+ matches++;
+
+ /*
+ * Use this statistics when it improves the number of matches or
+ * when it matches the same number of attributes but is smaller.
+ */
+ if ((matches > current_matches) ||
+ ((matches == current_matches) && (current_dims > numattrs)))
+ {
+ choice = info;
+ current_matches = matches;
+ current_dims = numattrs;
+ }
+ }
+
+ return choice;
+}
+
+
+/*
+ * This splits the clauses list into two parts - one containing clauses
+ * that will be evaluated using the chosen statistics, and the remaining
+ * clauses (either not mv-compatible, or not covered by the statistics).
+ */
+static List *
+clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
+ List *clauses, Oid varRelid, List **mvclauses,
+ MVStatisticInfo *mvstats, int types)
+{
+ int i;
+ ListCell *l;
+ List *non_mvclauses = NIL;
+
+ /* FIXME is there a better way to get info on int2vector? */
+ int2vector * attrs = mvstats->stakeys;
+ int numattrs = mvstats->stakeys->dim1;
+
+ Bitmapset *mvattnums = NULL;
+
+ /* build bitmap of attributes covered by the stats, so we can
+ * do bms_is_subset later */
+ for (i = 0; i < numattrs; i++)
+ mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+
+ /* erase the list of mv-compatible clauses */
+ *mvclauses = NIL;
+
+ foreach (l, clauses)
+ {
+ bool match = false; /* by default not mv-compatible */
+ Bitmapset *attnums = NULL;
+ Node *clause = (Node *) lfirst(l);
+
+ if (clause_is_mv_compatible(root, clause, varRelid, NULL,
+ &attnums, sjinfo, types))
+ {
+ /* are all the attributes part of the selected stats? */
+ if (bms_is_subset(attnums, mvattnums))
+ match = true;
+ }
+
+ /*
+ * The clause matches the selected stats, so put it into the list
+ * of mv-compatible clauses. Otherwise, keep it in the list of
+ * 'regular' clauses (to be estimated the usual way).
+ */
+ if (match)
+ *mvclauses = lappend(*mvclauses, clause);
+ else
+ non_mvclauses = lappend(non_mvclauses, clause);
+ }
+
+ /*
+ * Return the remaining clauses, to be estimated using the regular
+ * (single-column) statistics.
+ */
+ return non_mvclauses;
+}
+
+/*
* Determines whether the clause is compatible with multivariate stats,
* and if it is, returns some additional information - varno (index
* into simple_rte_array) and a bitmap of attributes. This is then
@@ -1005,8 +1378,12 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
*/
static bool
clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
- Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo)
+ Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
+ int types)
{
+ Relids clause_relids;
+ Relids left_relids;
+ Relids right_relids;
if (IsA(clause, RestrictInfo))
{
@@ -1016,82 +1393,176 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
if (rinfo->pseudoconstant)
return false;
- /* no support for OR clauses at this point */
- if (rinfo->orclause)
- return false;
-
/* get the actual clause from the RestrictInfo (it's not an OR clause) */
clause = (Node*)rinfo->clause;
- /* only simple opclauses are compatible with multivariate stats */
- if (! is_opclause(clause))
- return false;
-
/* we don't support join conditions at this moment */
if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo))
return false;
+ clause_relids = rinfo->clause_relids;
+ left_relids = rinfo->left_relids;
+ right_relids = rinfo->right_relids;
+ }
+ else if (is_opclause(clause) && list_length(((OpExpr *) clause)->args) == 2)
+ {
+ left_relids = pull_varnos(get_leftop((Expr*)clause));
+ right_relids = pull_varnos(get_rightop((Expr*)clause));
+
+ clause_relids = bms_union(left_relids,
+ right_relids);
+ }
+ else
+ {
+ /* Not a binary opclause, so mark left/right relid sets as empty */
+ left_relids = NULL;
+ right_relids = NULL;
+ /* and get the total relid set the hard way */
+ clause_relids = pull_varnos((Node *) clause);
+ }
+
+ /*
+ * Only simple opclauses and IS NULL tests are compatible with
+ * multivariate stats at this point.
+ */
+ if ((is_opclause(clause))
+ && (list_length(((OpExpr *) clause)->args) == 2))
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+
/* is it 'variable op constant' ? */
- if (list_length(((OpExpr *) clause)->args) == 2)
+
+ ok = (bms_membership(clause_relids) == BMS_SINGLETON) &&
+ (is_pseudo_constant_clause_relids(lsecond(expr->args),
+ right_relids) ||
+ (varonleft = false,
+ is_pseudo_constant_clause_relids(linitial(expr->args),
+ left_relids)));
+
+ if (ok)
{
- OpExpr *expr = (OpExpr *) clause;
- bool varonleft = true;
- bool ok;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
- ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
- (is_pseudo_constant_clause_relids(lsecond(expr->args),
- rinfo->right_relids) ||
- (varonleft = false,
- is_pseudo_constant_clause_relids(linitial(expr->args),
- rinfo->left_relids)));
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+ * (return NULL).
+ *
+ * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
- if (ok)
- {
- Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not be really necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
- /*
- * Simple variables only - otherwise the planner_rt_fetch seems to fail
- * (return NULL).
- *
- * TODO Maybe use examine_variable() would fix that?
- */
- if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
- return false;
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
- /*
- * Only consider this variable if (varRelid == 0) or when the varno
- * matches varRelid (see explanation at clause_selectivity).
- *
- * FIXME I suspect this may not be really necessary. The (varRelid == 0)
- * part seems to be enforced by treat_as_join_clause().
- */
- if (! ((varRelid == 0) || (varRelid == var->varno)))
- return false;
+ /* Lookup info about the base relation (we need to pass the OID out) */
+ if (relid != NULL)
+ *relid = var->varno;
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ * This uses the function for estimating selectivity, not the
+ * operator directly (a bit awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_SCALARLTSEL:
+ case F_SCALARGTSEL:
+ /* not compatible with functional dependencies */
+ if (types & MV_CLAUSE_TYPE_MCV)
+ {
+ *attnums = bms_add_member(*attnums, var->varattno);
+ return true;
+ }
+ return false;
+
+ case F_EQSEL:
+ *attnums = bms_add_member(*attnums, var->varattno);
+ return true;
+ }
+ }
+ }
+ else if (IsA(clause, NullTest)
+ && IsA(((NullTest*)clause)->arg, Var))
+ {
+ Var * var = (Var*)((NullTest*)clause)->arg;
+
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+ * (return NULL).
+ *
+ * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
+
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not be really necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
- /* Also skip special varno values, and system attributes ... */
- if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
- return false;
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
+ /* Lookup info about the base relation (we need to pass the OID out) */
+ if (relid != NULL)
*relid = var->varno;
- /*
- * If it's not a "<" or ">" or "=" operator, just ignore the
- * clause. Otherwise note the relid and attnum for the variable.
- * This uses the function for estimating selectivity, ont the
- * operator directly (a bit awkward, but well ...).
- */
- switch (get_oprrest(expr->opno))
- {
- case F_EQSEL:
- *attnum = var->varattno;
- return true;
- }
- }
+ *attnums = bms_add_member(*attnums, var->varattno);
+
+ return true;
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /*
+ * AND/OR-clauses are supported if all sub-clauses are supported
+ *
+ * TODO We might support mixed case, where some of the clauses
+ * are supported and some are not, and treat all supported
+ * subclauses as a single clause, compute its selectivity
+ * using mv stats, and compute the total selectivity using
+ * the current algorithm.
+ *
+ * TODO For RestrictInfo above an OR-clause, we might use the
+ * orclause with nested RestrictInfo - we won't have to
+ * call pull_varnos() for each clause, saving time.
+ */
+ Bitmapset *tmp = NULL;
+ ListCell *l;
+ foreach (l, ((BoolExpr*)clause)->args)
+ {
+ if (! clause_is_mv_compatible(root, (Node*)lfirst(l),
+ varRelid, relid, &tmp, sjinfo, types))
+ return false;
}
+
+ /* add the attnums from the AND/OR-clause to the set of attnums */
+ *attnums = bms_join(*attnums, tmp);
+
+ return true;
}
return false;
-
}
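To summarize what this function now accepts, on a few examples: (a = 1)
is compatible with both functional dependencies and MCV stats, (a < 1)
and (a > 1) only with MCV stats (hence the MV_CLAUSE_TYPE_MCV check
above), (a IS [NOT] NULL) is compatible, and (a = 1 OR b < 2) is
compatible iff every subclause is. Clauses like (a = b) and join
clauses are still rejected.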
/*
@@ -1340,6 +1811,9 @@ has_stats(List *stats, int type)
if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
return true;
+
+ if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
+ return true;
}
return false;
@@ -1635,25 +2109,39 @@ fdeps_filter_clauses(PlannerInfo *root,
foreach (lc, clauses)
{
- AttrNumber attnum;
+ Bitmapset *attnums = NULL;
Node *clause = (Node *) lfirst(lc);
- if (! clause_is_mv_compatible(root, clause, varRelid, relid,
- &attnum, sjinfo))
+ if (! clause_is_mv_compatible(root, clause, varRelid, relid, &attnums,
+ sjinfo, MV_CLAUSE_TYPE_FDEP))
/* clause incompatible with functional dependencies */
*reduced_clauses = lappend(*reduced_clauses, clause);
- else if (! bms_is_member(attnum, deps_attnums))
+ else if (bms_num_members(attnums) > 1)
+
+ /*
+ * clause referencing multiple attributes - strange, shouldn't
+ * this be handled by clause_is_mv_compatible directly?
+ */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else if (! bms_is_member(bms_singleton_member(attnums), deps_attnums))
/* clause not covered by the dependencies */
*reduced_clauses = lappend(*reduced_clauses, clause);
else
{
+ /* ok, clause compatible with existing dependencies */
+ Assert(bms_num_members(attnums) == 1);
+
*deps_clauses = lappend(*deps_clauses, clause);
- clause_attnums = bms_add_member(clause_attnums, attnum);
+ clause_attnums = bms_add_member(clause_attnums,
+ bms_singleton_member(attnums));
}
+
+ bms_free(attnums);
}
return clause_attnums;
@@ -1691,3 +2179,454 @@ get_varattnos(Node * node, Index relid)
return result;
}
+
+/*
+ * Estimate selectivity of clauses using a MCV list.
+ *
+ * If there's no MCV list for the stats, the function returns 0.0.
+ *
+ * While computing the estimate, the function checks whether all the
+ * columns were matched with an equality condition. If that's the case,
+ * we can skip processing the histogram, as there can be no rows in
+ * it with the same values - all the rows matching the condition are
+ * represented by the MCV item. This can only happen with equality
+ * on all the attributes.
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all items as 'match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the items
+ * 4) skip items that are already 'no match'
+ * 5) check clause for items that still match
+ * 6) sum frequencies for items to get selectivity
+ *
+ * The function also returns the frequency of the least frequent item
+ * on the MCV list, which may be useful for clamping the estimate from
+ * the histogram (all items not present in the MCV list are less frequent).
+ * This however seems useful only for cases with conditions on all
+ * attributes.
+ *
+ * TODO This only handles AND-ed clauses, but it might work for OR-ed
+ * lists too - it just needs to reverse the logic a bit. I.e. start
+ * with 'no match' for all items, and mark the items as a match
+ * as the clauses are processed (and skip items that are 'match').
+ */
+static Selectivity
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats, bool *fullmatch,
+ Selectivity *lowsel)
+{
+ int i;
+ Selectivity s = 0.0;
+ Selectivity u = 0.0;
+
+ MCVList mcvlist = NULL;
+ int nmatches = 0;
+
+ /* match/mismatch bitmap for each MCV item */
+ char * matches = NULL;
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 2);
+
+ /* there's no MCV list built yet */
+ if (! mvstats->mcv_built)
+ return 0.0;
+
+ mcvlist = load_mv_mcvlist(mvstats->mvoid);
+
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* by default all the MCV items match the clauses fully */
+ matches = palloc0(sizeof(char) * mcvlist->nitems);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
+
+ /* number of matching MCV items */
+ nmatches = mcvlist->nitems;
+
+ nmatches = update_match_bitmap_mcvlist(root, clauses,
+ mvstats->stakeys, mcvlist,
+ nmatches, matches,
+ lowsel, fullmatch, false);
+
+ /* sum frequencies for all the matching MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* used to 'scale' for MCV lists not covering all tuples */
+ u += mcvlist->items[i]->frequency;
+
+ if (matches[i] != MVSTATS_MATCH_NONE)
+ s += mcvlist->items[i]->frequency;
+ }
+
+ pfree(matches);
+ pfree(mcvlist);
+
+ return s*u;
+}
+
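A worked example of the estimation, with made-up numbers: say the MCV
list on columns (a,b) contains three items

    (1,1) with frequency 0.30
    (1,2) with frequency 0.20
    (2,2) with frequency 0.15

so the items cover u = 0.65 of the table. For WHERE (a = 1) AND (b = 2)
only the second item remains MVSTATS_MATCH_FULL after both clauses are
applied, giving s = 0.20; fullmatch is set (both columns were matched
by equality) and lowsel = 0.15 is available as an upper bound for
combinations that did not make it into the list.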
+/*
+ * Evaluate clauses using the MCV list, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * TODO This works with 'bitmap' where each bit is represented as a char,
+ * which is slightly wasteful. Instead, we could use a regular
+ * bitmap, reducing the size to ~1/8. Another thing is merging the
+ * bitmaps using & and |, which might be faster than min/max.
+ */
+static int
+update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ Bitmapset *eqmatches = NULL; /* attributes with equality matches */
+
+ /* The bitmap may be partially built. */
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mcvlist->nitems);
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* No chance of changing the result - no matches left (AND) or everything matches (OR) */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ return nmatches;
+
+ /* frequency of the lowest MCV item */
+ *lowsel = 1.0;
+
+ /*
+ * Loop through the list of clauses, and for each of them evaluate
+ * all the MCV items not yet eliminated by the preceding clauses.
+ *
+ * FIXME This would probably deserve a refactoring, I guess. Unify
+ * the two loops and put the checks inside, or something like
+ * that.
+ */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* if there are no remaining matches possible, we can stop */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /* it's either an OpExpr, a NullTest, or an AND/OR clause */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ /* operator */
+ FmgrInfo opproc;
+
+ /* get procedure computing operator selectivity */
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+
+ FmgrInfo ltproc, gtproc;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ /*
+ * TODO Fetch only when really needed (probably for equality only)
+ * TODO Technically either lt/gt is sufficient.
+ *
+ * FIXME The code in analyze.c creates histograms only for types
+ * with enough ordering (by calling get_sort_group_operators).
+ * Is this the same assumption, i.e. are we certain that we
+ * get the ltproc/gtproc every time we ask? Or are there types
+ * where get_sort_group_operators returns ltopr and here we
+ * get nothing?
+ */
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype,
+ TYPECACHE_EQ_OPR | TYPECACHE_LT_OPR | TYPECACHE_GT_OPR);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), <proc);
+ fmgr_info(get_opcode(typecache->gt_opr), >proc);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ bool mismatch = false;
+ MCVItem item = mcvlist->items[i];
+
+ /*
+ * find the lowest frequency in the MCV list
+ * FIXME Maybe not the best place to do this (it runs once per clause).
+ */
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+
+ /*
+ * If there are no more matches (AND) or no remaining unmatched
+ * items (OR), we can stop processing this clause.
+ */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /*
+ * For AND-lists, we can also mark NULL items as 'no match' (and
+ * then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && item->isnull[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /* skip MCV items that were already ruled out */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* TODO consider bsearch here (list is sorted by values)
+ * TODO handle other operators too (LT, GT)
+ * TODO identify "full match" when the clauses fully
+ * match the whole MCV list (so that checking the
+ * histogram is not needed)
+ */
+ if (oprrest == F_EQSEL)
+ {
+ /*
+ * We don't care about isgt in equality, because it does not
+ * matter whether it's (var = const) or (const = var).
+ */
+ bool match = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ if (match)
+ eqmatches = bms_add_member(eqmatches, idx);
+
+ mismatch = (! match);
+ }
+ else if (oprrest == F_SCALARLTSEL) /* column < constant */
+ {
+
+ if (! isgt) /* (var < const) */
+ {
+ /*
+ * If the constant is below the MCV item value, the item
+ * cannot match (var < const), so mark it as a mismatch.
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ } /* (get_oprrest(expr->opno) == F_SCALARLTSEL) */
+ else /* (const < var) */
+ {
+ /*
+ * If the MCV item value is below the constant, the item
+ * cannot match (const < var), so mark it as a mismatch.
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ item->values[idx],
+ cst->constvalue));
+ }
+ }
+ else if (oprrest == F_SCALARGTSEL) /* column > constant */
+ {
+
+ if (! isgt) /* (var > const) */
+ {
+ /*
+ * If the constant is above the MCV item value, the item
+ * cannot match (var > const), so mark it as a mismatch.
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+ }
+ else /* (const > var) */
+ {
+ /*
+ * If the MCV item value is above the constant, the item
+ * cannot match (const > var), so mark it as a mismatch.
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ item->values[idx],
+ cst->constvalue));
+ }
+
+ } /* (get_oprrest(expr->opno) == F_SCALARGTSEL) */
+
+ /* XXX The conditions on matches[i] are not needed, as we
+ * skip MCV items that can't become true/false, depending
+ * on the current flag. See beginning of the loop over
+ * MCV items.
+ */
+
+ if ((is_or) && (matches[i] == MVSTATS_MATCH_NONE) && (! mismatch))
+ {
+ /* OR - was MATCH_NONE, but will be MATCH_FULL */
+ matches[i] = MVSTATS_MATCH_FULL;
+ ++nmatches;
+ continue;
+ }
+ else if ((! is_or) && (matches[i] == MVSTATS_MATCH_FULL) && mismatch)
+ {
+ /* AND - was MATCH_FULL, but will be MATCH_NONE */
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ continue;
+ }
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = mcvlist->items[i];
+
+ /*
+ * find the lowest frequency in the MCV list
+ * FIXME Maybe not the best place to do this (it runs once per clause).
+ */
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+
+ /* if there are no more matches, we can stop processing this clause */
+ if (nmatches == 0)
+ break;
+
+ /* skip MCV items that were already ruled out */
+ if (matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /* if the clause mismatches the MCV item, set it as MATCH_NONE */
+ if (((expr->nulltesttype == IS_NULL) && (! mcvlist->items[i]->isnull[idx])) ||
+ ((expr->nulltesttype == IS_NOT_NULL) && (mcvlist->items[i]->isnull[idx])))
+ {
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ }
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each MCV item */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching MCV items */
+ or_nmatches = mcvlist->nitems;
+
+ /* allocate the match bitmap for the sub-clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the sub-clauses */
+ or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ stakeys, mcvlist,
+ or_nmatches, or_matches,
+ lowsel, fullmatch, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
+ *
+ * FIXME this does not update nmatches to reflect the merged bitmap
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ {
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+ }
+
+ /*
+ * If all the columns were matched by equality, it's a full match.
+ * In that case there can be at most a single matching MCV item
+ * (two matching items would have to be exactly the same).
+ */
+ *fullmatch = (bms_num_members(eqmatches) == mcvlist->ndimensions);
+
+ /* free the allocated pieces */
+ if (eqmatches)
+ pfree(eqmatches);
+
+ return nmatches;
+}
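The TODO above about using a real bitmap could look roughly like this
(a sketch, not part of the patch - one bit per MCV item, merging via
& and | instead of MIN/MAX):

    #include <stdint.h>

    #define BITMAP_NBYTES(nitems)   (((nitems) + 7) / 8)

    static inline void
    bitmap_set(uint8_t *bm, int i)
    {
        bm[i / 8] |= (1 << (i % 8));
    }

    static inline int
    bitmap_get(const uint8_t *bm, int i)
    {
        return (bm[i / 8] >> (i % 8)) & 1;
    }

    /* merge 'other' into 'bm' - & for AND-ed lists, | for OR-ed ones */
    static void
    bitmap_merge(uint8_t *bm, const uint8_t *other, int nitems, int is_or)
    {
        int i;

        for (i = 0; i < BITMAP_NBYTES(nitems); i++)
            bm[i] = is_or ? (bm[i] | other[i]) : (bm[i] & other[i]);
    }

One caveat is that a single bit can't represent a partial match, which
the histogram part presumably needs, so this would only work where we
track full matches and mismatches.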
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index c397773..8c4396a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -409,7 +409,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built)
+ if (mvstat->deps_built || mvstat->mcv_built)
{
info = makeNode(MVStatisticInfo);
@@ -418,9 +418,11 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
+ info->mcv_enabled = mvstat->mcv_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
+ info->mcv_built = mvstat->mcv_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 099f1ed..f9bf10c 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o
+OBJS = common.o dependencies.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index bd200bc..d1da714 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -16,12 +16,14 @@
#include "common.h"
+#include "utils/array.h"
+
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
- int natts, VacAttrStats **vacattrstats);
+ int natts,
+ VacAttrStats **vacattrstats);
static List* list_mv_stats(Oid relid);
-
/*
* Compute requested multivariate stats, using the rows sampled for the
* plain (single-column) stats.
@@ -49,6 +51,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int j;
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
+ MCVList mcvlist = NULL;
+ int numrows_filtered = 0;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -87,8 +91,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ /* build the MCV list */
+ if (stat->mcv_enabled)
+ mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, attrs);
+ update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
}
}
@@ -166,6 +174,8 @@ list_mv_stats(Oid relid)
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
+ info->mcv_enabled = stats->mcv_enabled;
+ info->mcv_built = stats->mcv_built;
result = lappend(result, info);
}
@@ -180,8 +190,56 @@ list_mv_stats(Oid relid)
return result;
}
+
+/*
+ * Find attnums of MV stats using the mvoid.
+ */
+int2vector*
+find_mv_attnums(Oid mvoid, Oid *relid)
+{
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ HeapTuple htup;
+ int2vector *keys;
+
+ /* Fetch the pg_mv_statistic tuple for the given mvoid from syscache. */
+ htup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+ /* starelid */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_starelid, &isnull);
+ Assert(!isnull);
+
+ *relid = DatumGetObjectId(adatum);
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ keys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+ ReleaseSysCache(htup);
+
+ /* TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which was used as an inspiration for this function)? */
+
+ return keys;
+}
+
+
void
-update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+update_mv_stats(Oid mvoid,
+ MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -206,18 +264,29 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
= PointerGetDatum(serialize_mv_dependencies(dependencies));
}
+ if (mcvlist != NULL)
+ {
+ bytea * data = serialize_mv_mcvlist(mcvlist, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stamcv -1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+ replaces[Anum_pg_mv_statistic_stamcv -1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
@@ -246,6 +315,21 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
heap_close(sd, RowExclusiveLock);
}
+
+int
+mv_get_index(AttrNumber varattno, int2vector * stakeys)
+{
+ int i, idx = 0;
+ for (i = 0; i < stakeys->dim1; i++)
+ {
+ if (stakeys->values[i] < varattno)
+ idx += 1;
+ else
+ break;
+ }
+ return idx;
+}
+
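For example, with stakeys = {2, 5, 7} (the attnums are sorted when the
statistics is created), mv_get_index(5, stakeys) skips the one smaller
attnum and returns 1, i.e. attnum 5 is the second dimension. Note that
an attnum not present in the vector is not detected - the function
simply returns the number of smaller attnums.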
/* multi-variate stats comparator */
/*
@@ -256,11 +340,15 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
int
compare_scalars_simple(const void *a, const void *b, void *arg)
{
- Datum da = *(Datum*)a;
- Datum db = *(Datum*)b;
- SortSupport ssup= (SortSupport) arg;
+ return compare_datums_simple(*(Datum*)a,
+ *(Datum*)b,
+ (SortSupport)arg);
+}
- return ApplySortComparator(da, false, db, false, ssup);
+int
+compare_datums_simple(Datum a, Datum b, SortSupport ssup)
+{
+ return ApplySortComparator(a, false, b, false, ssup);
}
/*
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
index 6d5465b..f4309f7 100644
--- a/src/backend/utils/mvstats/common.h
+++ b/src/backend/utils/mvstats/common.h
@@ -46,7 +46,15 @@ typedef struct
Datum value; /* a data value */
int tupno; /* position index for tuple it came from */
} ScalarItem;
-
+
+/* (de)serialization info */
+typedef struct DimensionInfo {
+ int nvalues; /* number of deduplicated values */
+ int nbytes; /* number of bytes (serialized) */
+ int typlen; /* pg_type.typlen */
+ bool typbyval; /* pg_type.typbyval */
+} DimensionInfo;
+
/* multi-sort */
typedef struct MultiSortSupportData {
int ndims; /* number of dimensions supported by the */
@@ -71,5 +79,6 @@ int multi_sort_compare_dim(int dim, const SortItem *a,
const SortItem *b, MultiSortSupport mss);
/* comparators, used when constructing multivariate stats */
+int compare_datums_simple(Datum a, Datum b, SortSupport ssup);
int compare_scalars_simple(const void *a, const void *b, void *arg);
int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/mcv.c b/src/backend/utils/mvstats/mcv.c
new file mode 100644
index 0000000..670dbda
--- /dev/null
+++ b/src/backend/utils/mvstats/mcv.c
@@ -0,0 +1,1237 @@
+/*-------------------------------------------------------------------------
+ *
+ * mcv.c
+ * POSTGRES multivariate MCV lists
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mcv.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+
+/*
+ * Multivariate MCVs (most-common values lists) are a straightforward
+ * extension of regular MCV lists, tracking combinations of values for
+ * several attributes (columns), including NULL flags and the frequency
+ * of each combination.
+ *
+ * For columns with a small number of distinct values, this works quite
+ * well and may represent the distribution very accurately. For columns
+ * with a large number of distinct values (e.g. stored as FLOAT), this
+ * does not work that well, especially if the distribution is mostly
+ * uniform, with no very common combinations.
+ *
+ * If we can represent the distribution as a MCV list, we can estimate
+ * some clauses (e.g. equality clauses) much more accurately than
+ * using histograms, for example.
+ *
+ * Another benefit of MCV lists (compared to histograms) is that they
+ * don't require sorting of the values, so that they work better for
+ * data types that either don't support sorting at all, or when the
+ * sorting does not really match the meaning. For example we know how to
+ * sort strings, but it's unlikely to make much sense for city names.
+ *
+ *
+ * Hashed MCV (not yet implemented)
+ * --------------------------------
+ * By restricting to MCV list and equality conditions, we may use hash
+ * values instead of the long varlena values. This significantly reduces
+ * the storage requirements, and we can still use it to estimate the
+ * equality conditions (assuming the collisions are rare enough).
+ *
+ * This however complicates matching the columns to available stats, as
+ * it requires matching clauses (not columns) to stats. And it may get
+ * quite complex - e.g. what if there are multiple clauses, each
+ * compatible with different stats subset?
+ *
+ *
+ * Selectivity estimation
+ * ----------------------
+ * The estimation, implemented in clauselist_mv_selectivity_mcvlist(),
+ * is quite simple in principle - walk through the MCV items and sum
+ * frequencies of all the items that match all the clauses.
+ *
+ * The current implementation uses MCV lists to estimate these types
+ * of clauses (think of WHERE conditions):
+ *
+ * (a) equality clauses WHERE (a = 1) AND (b = 2)
+ * (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ * (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ * (d) OR clauses WHERE (a < 1) OR (b >= 2)
+ *
+ * It's possible to add more clauses, for example:
+ *
+ * (e) multi-var clauses WHERE (a > b)
+ *
+ * and so on. These are tasks for the future, not yet implemented.
+ *
+ *
+ * Estimating equality clauses
+ * ---------------------------
+ * When computing selectivity estimate for equality clauses
+ *
+ * (a = 1) AND (b = 2)
+ *
+ * we can do this estimate pretty exactly assuming that two conditions
+ * are met:
+ *
+ * (1) there's an equality condition on each attribute
+ *
+ * (2) we find a matching item in the MCV list
+ *
+ * In that case we know the MCV item represents all the tuples matching
+ * the clauses, and the selectivity estimate is complete. This is what
+ * we call 'full match'.
+ *
+ * When only (1) holds, but there's no matching MCV item, we don't know
+ * whether such rows don't exist at all or are just not very frequent.
+ * We can however use the frequency of the least frequent MCV item as
+ * an upper bound for the selectivity.
+ *
+ * If the equality conditions match only a subset of the attributes
+ * the MCV list is built on, we can't get a full match - we may get
+ * multiple MCV items matching the clauses, and even a single match
+ * does not rule out rows that did not get into the MCV list. But in
+ * this case we can still use the frequency of the least frequent MCV
+ * item to clamp the 'additional' selectivity not accounted for by the
+ * matching items.
+ *
+ * If there's no histogram, because the MCV list approximates the
+ * distribution accurately (not because the histogram was disabled),
+ * it does not really matter whether there are equality conditions on
+ * all the columns - we can do pretty accurate estimation using the MCV.
+ *
+ * TODO For a combination of equality conditions (not full-match case)
+ * we probably can clamp the selectivity by the minimum of
+ * selectivities for each condition. For example if we know the
+ * number of distinct values for each column, we can use 1/ndistinct
+ * as a per-column estimate. Or rather 1/ndistinct + selectivity
+ * derived from the MCV list.
+ *
+ * If we know the estimate of number of combinations of the columns
+ * (i.e. ndistinct(A,B)), we may estimate the average frequency of
+ * items in the remaining 10% as [10% / ndistinct(A,B)].
+ *
+ *
+ * Bounding estimates
+ * ------------------
+ * In general the MCV lists may not provide estimates as accurate as
+ * for the full-match equality case, but may provide some useful
+ * lower/upper boundaries for the estimation error.
+ *
+ * With equality clauses we can do a few more tricks to narrow this
+ * error range (see the previous section and TODO), but with inequality
+ * clauses (or generally non-equality clauses), it's rather difficult.
+ * There's nothing like a 'full match' - we have to consider both the
+ * MCV items and the remaining part every time. We can't use the minimum
+ * selectivity of MCV items, as the clauses may match multiple items.
+ *
+ * For example with a MCV list on columns (A, B), covering 90% of the
+ * table (computed while building the MCV list), about ~10% of the table
+ * is not represented by the MCV list. So even if the conditions match
+ * all the remaining rows (not represented by the MCV items), we can't
+ * get selectivity higher than those 10%. We may use 1/2 the remaining
+ * selectivity as an estimate (minimizing average error).
+ *
+ * TODO Most of these ideas (error limiting) are not yet implemented.
+ *
+ *
+ * General TODO
+ * ------------
+ *
+ * FIXME Use max_mcv_items from ALTER TABLE ADD STATISTICS command.
+ *
+ * TODO Add support for clauses referencing multiple columns (a < b).
+ *
+ * TODO It's possible to build a special case of MCV list, storing not
+ * the actual values but only 32/64-bit hash. This is only useful
+ * for estimating equality clauses and for large varlena types,
+ * which are very impractical for plain MCV list because of size.
+ * But for those data types we really want just the equality
+ * clauses, so it's actually a good solution.
+ *
+ * TODO Currently there's no logic to consider building only a MCV list
+ * (and not building the histogram at all), except for making this
+ * decision manually in ADD STATISTICS.
+ */
+
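To make the full-match reasoning above concrete (made-up numbers):
with an MCV list on (a,b) whose least frequent item has frequency
0.008, a query WHERE (a = 1) AND (b = 2) is estimated as

    - item (1,2) found with frequency 0.04  =>  selectivity 0.04, exact

    - no matching item  =>  selectivity bounded by 0.008, because a
      more frequent combination would have made it into the MCV list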
+/*
+ * Each serialized item needs to store (in this order):
+ *
+ * - indexes (ndim * sizeof(uint16))
+ * - null flags (ndim * sizeof(bool))
+ * - frequency (sizeof(double))
+ *
+ * So in total:
+ *
+ * ndim * (sizeof(uint16) + sizeof(bool)) + sizeof(double)
+ */
+#define ITEM_SIZE(ndims) \
+ (ndims * (sizeof(uint16) + sizeof(bool)) + sizeof(double))
+
+/* pointers into a flat serialized item of ITEM_SIZE(n) bytes */
+#define ITEM_INDEXES(item) ((uint16*)item)
+#define ITEM_NULLS(item,ndims) ((bool*)(ITEM_INDEXES(item) + ndims))
+#define ITEM_FREQUENCY(item,ndims) ((double*)(ITEM_NULLS(item,ndims) + ndims))
+
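So with ndims = 3 an item takes 3 * (2 + 1) + 8 = 17 bytes: three
uint16 indexes at offset 0, three bool NULL flags at offset 6, and the
double frequency at offset 9. That offset is not 8-byte aligned, so
code reading the frequency may want to copy it out rather than
dereference the pointer - a sketch (illustration only, relying on the
macros above):

    #include <string.h>

    static double
    item_get_frequency(const char *item, int ndims)
    {
        double freq;

        /* the double sits at an unaligned offset (9 for ndims = 3),
         * so memcpy() it instead of dereferencing the pointer */
        memcpy(&freq, ITEM_FREQUENCY(item, ndims), sizeof(double));
        return freq;
    }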
+/*
+ * Builds MCV list from sample rows, and removes rows represented by
+ * the MCV list from the sample (the number of remaining sample rows is
+ * returned by the numrows_filtered parameter).
+ *
+ * The method is quite simple - in short it does about these steps:
+ *
+ * (1) sort the data (default collation, '<' for the data type)
+ *
+ * (2) count distinct groups, decide how many to keep
+ *
+ * (3) build the MCV list using the threshold determined in (2)
+ *
+ * (4) remove rows represented by the MCV from the sample
+ *
+ * For more details, see the comments in the code.
+ *
+ * FIXME Single-dimensional MCV is sorted by frequency (descending). We
+ * should do that too, because when walking through the list we
+ * want to check the most frequent items first.
+ *
+ * TODO We're using Datum (8B), even for data types (e.g. int4 or
+ * float4). Maybe we could save some space here, but the bytea
+ * compression should handle it just fine.
+ *
+ * TODO This probably should not use the ndistinct computed from the
+ * sample directly, but rather an estimate of the number of
+ * distinct values in the whole table, no?
+ */
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ int ndistinct = 0;
+ int mcv_threshold = 0;
+ int count = 0;
+ int nitems = 0;
+
+ MCVList mcvlist = NULL;
+
+ /* Sort by multiple columns (using array of SortSupport) */
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * Preallocate space for all the items as a single chunk, and point
+ * the items to the appropriate parts of the array.
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * numattrs);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * numattrs);
+
+ /* keep all the rows by default (as if there was no MCV list) */
+ *numrows_filtered = numrows;
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* load the values/null flags from sample rows */
+ for (j = 0; j < numrows; j++)
+ for (i = 0; i < numattrs; i++)
+ items[j].values[i] = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+
+ /* prepare the sort functions for all the attributes */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* do the sort, using the multi-sort */
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Count the number of distinct groups - just walk through the
+ * sorted list and count the number of key changes. We use this to
+ * determine the threshold (125% of the average frequency).
+ */
+ ndistinct = 1;
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ ndistinct += 1;
+
+ /*
+ * Determine how many groups actually exceed the threshold, and then
+ * walk the array again and collect them into an array. We'll always
+ * require at least 4 rows per group.
+ *
+ * But if we can fit all the distinct values in the MCV list (i.e.
+ * if there are fewer distinct groups than MVSTAT_MCVLIST_MAX_ITEMS),
+ * we'll require only 2 rows per group.
+ *
+ * TODO For now the threshold is the same as in the single-column
+ * case (average + 25%), but maybe that's worth revisiting
+ * for the multivariate case.
+ *
+ * TODO We can do this only if we believe we got all the distinct
+ * values of the table.
+ *
+ * FIXME This should really reference mcv_max_items (from catalog)
+ * instead of the constant MVSTAT_MCVLIST_MAX_ITEMS.
+ */
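+ /*
+ * A worked example with hypothetical numbers: with 30000 sample
+ * rows and 1000 distinct groups the average group size is 30
+ * rows, so the threshold evaluates to 1.25 * 30 = 37.5, which
+ * the integer assignment truncates to 37 rows.
+ */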
+ mcv_threshold = 1.25 * numrows / ndistinct;
+ mcv_threshold = (mcv_threshold < 4) ? 4 : mcv_threshold;
+
+ if (ndistinct <= MVSTAT_MCVLIST_MAX_ITEMS)
+ mcv_threshold = 2;
+
+ /*
+ * Walk through the sorted data again, and see how many groups
+ * reach the mcv_threshold (and become an item in the MCV list).
+ */
+ count = 1;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or new group, so check if we exceed mcv_threshold */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* group hits the threshold, count the group as MCV item */
+ if (count >= mcv_threshold)
+ nitems += 1;
+
+ count = 1;
+ }
+ else /* within group, so increase the number of items */
+ count += 1;
+ }
+
+ /* we know the number of MCV list items, so let's build the list */
+ if (nitems > 0)
+ {
+ /* allocate the MCV list structure, set parameters we know */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ mcvlist->magic = MVSTAT_MCV_MAGIC;
+ mcvlist->type = MVSTAT_MCV_TYPE_BASIC;
+ mcvlist->ndimensions = numattrs;
+ mcvlist->nitems = nitems;
+
+ /*
+ * Preallocate the Datum/isnull arrays (not as a single chunk, as
+ * we'll pass this outside this method and thus it needs to be
+ * easy to pfree() the data - we wouldn't know where the
+ * arrays start).
+ *
+ * TODO Maybe the reasoning that we can't allocate a single
+ * piece because we're passing it out is bogus? Who'd
+ * free a single item of the MCV list, anyway?
+ *
+ * TODO Maybe with a proper encoding (stuffing all the values
+ * into a list-level array), this will be untrue?
+ */
+ mcvlist->items = (MCVItem*)palloc0(sizeof(MCVItem)*nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ mcvlist->items[i] = (MCVItem)palloc0(sizeof(MCVItemData));
+ mcvlist->items[i]->values = (Datum*)palloc0(sizeof(Datum)*numattrs);
+ mcvlist->items[i]->isnull = (bool*)palloc0(sizeof(bool)*numattrs);
+ }
+
+ /*
+ * Repeat the same loop as above, but this time copy the data
+ * into the MCV list (for items exceeding the threshold).
+ *
+ * TODO Maybe we could simply remember indexes of the last item
+ * in each group (from the previous loop)?
+ */
+ count = 1;
+ nitems = 0;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or a new group */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* count the MCV item if exceeding the threshold (and copy into the array) */
+ if (count >= mcv_threshold)
+ {
+ /* just pointer to the proper place in the list */
+ MCVItem item = mcvlist->items[nitems];
+
+ /* copy values from the _previous_ group (i.e. its last item) */
+ memcpy(item->values, items[(i-1)].values, sizeof(Datum) * numattrs);
+ memcpy(item->isnull, items[(i-1)].isnull, sizeof(bool) * numattrs);
+
+ /* and finally the group frequency */
+ item->frequency = (double)count / numrows;
+
+ /* next item */
+ nitems += 1;
+ }
+
+ count = 1;
+ }
+ else /* same group, just increase the number of items */
+ count += 1;
+ }
+
+ /* make sure the loops are consistent */
+ Assert(nitems == mcvlist->nitems);
+
+ /*
+ * Remove the rows matching the MCV list (i.e. keep only rows
+ * that are not represented by the MCV list).
+ *
+ * FIXME This implementation is rather naive, effectively O(N^2).
+ * As the MCV list grows, the check will take longer and
+ * longer. And as the number of sampled rows increases (by
+ * increasing statistics target), it will take longer and
+ * longer. One option is to sort the MCV items first and
+ * then perform a binary search.
+ *
+ * A better option would be keeping the ID of the row in
+ * the sort item, and then just walking through the items
+ * and marking rows to remove (in a bitmap of the same
+ * size). There's no space for that in SortItem at the
+ * moment, but it's trivial to add a 'private' pointer, or
+ * to use another structure with an extra field (starting
+ * with SortItem, so that the comparators etc. still work).
+ *
+ * Another option is to use the sorted array of items
+ * (because that's how we sorted the source data), and
+ * simply do a bsearch() into it. If we find a matching
+ * item, the row belongs to the MCV list.
+ */
+ if (nitems == ndistinct) /* all rows are covered by MCV items */
+ *numrows_filtered = 0;
+ else /* (nitems < ndistinct) && (nitems > 0) */
+ {
+ int nfiltered = 0;
+ HeapTuple *rows_filtered = (HeapTuple*)palloc0(sizeof(HeapTuple) * numrows);
+
+ /* used for the searches */
+ SortItem item, mcvitem;
+
+ item.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ item.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /*
+ * FIXME we don't need to allocate this, we can reference
+ * the MCV item directly ...
+ */
+ mcvitem.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ mcvitem.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* walk through the tuples, compare the values to MCV items */
+ for (i = 0; i < numrows; i++)
+ {
+ bool match = false;
+
+ /* collect the key values from the row */
+ for (j = 0; j < numattrs; j++)
+ item.values[j] = heap_getattr(rows[i], attrs->values[j],
+ stats[j]->tupDesc, &item.isnull[j]);
+
+ /* scan through the MCV list for matches */
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ /*
+ * TODO Create a SortItem/MCVItem comparator so that
+ * we don't need to do memcpy() like crazy.
+ */
+ memcpy(mcvitem.values, mcvlist->items[j]->values,
+ numattrs * sizeof(Datum));
+ memcpy(mcvitem.isnull, mcvlist->items[j]->isnull,
+ numattrs * sizeof(bool));
+
+ if (multi_sort_compare(&item, &mcvitem, mss) == 0)
+ {
+ match = true;
+ break;
+ }
+ }
+
+ /* if no match in the MCV list, copy the row into the filtered ones */
+ if (! match)
+ memcpy(&rows_filtered[nfiltered++], &rows[i], sizeof(HeapTuple));
+ }
+
+ /* replace the rows and remember how many rows we kept */
+ memcpy(rows, rows_filtered, sizeof(HeapTuple) * nfiltered);
+ *numrows_filtered = nfiltered;
+
+ /* free all the data used here */
+ pfree(rows_filtered);
+ pfree(item.values);
+ pfree(item.isnull);
+ pfree(mcvitem.values);
+ pfree(mcvitem.isnull);
+ }
+ }
+
+ pfree(values);
+ pfree(items);
+ pfree(isnull);
+
+ return mcvlist;
+}
+
+
+/* fetch the MCV list (as a bytea) from the pg_mv_statistic catalog */
+MCVList
+load_mv_mcvlist(Oid mvoid)
+{
+ bool isnull = false;
+ Datum mcvlist;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* fetch the pg_mv_statistic entry for the given statistics OID */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->mcv_enabled && mvstat->mcv_built);
+#endif
+
+ mcvlist = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stamcv, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_mcvlist(DatumGetByteaP(mcvlist));
+}
+
+/*
+ * Print some basic info about the MCV list.
+ *
+ * TODO Add info about what part of the table this covers.
+ */
+Datum
+pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MCVList mcvlist = deserialize_mv_mcvlist(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nitems=%d", mcvlist->nitems);
+
+ pfree(mcvlist);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* used to pass context into bsearch() */
+static SortSupport ssup_private = NULL;
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Serialize MCV list into a bytea value. The basic algorithm is simple:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ * (a) collect all (non-NULL) attribute values from all MCV items
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all MCV list items
+ * (a) replace values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, because we're mixing
+ * different datatypes, and we don't know what equality means for them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ * We'll use uint16 values for the indexes in step (3), as we don't
+ * allow more than 8k MCV items (see MVSTAT_MCVLIST_MAX_ITEMS). We
+ * might increase this to 65k and still fit into uint16.
+ *
+ * We don't really expect compression as high as with histograms,
+ * because we're not doing any bucket splits etc. (which is the source
+ * of high redundancy there), but we need to do it anyway as we need
+ * to serialize varlena values etc. We might invent another way to
+ * serialize MCV lists, but let's keep it consistent.
+ *
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider using 16-bit values for the indexes in step (3).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
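+
+/*
+ * A small illustration of steps (1) and (3) with made-up data: for
+ * MCV items (1,'x'), (1,'y') and (2,'x') the deduplicated arrays
+ * are {1,2} for the first dimension and {'x','y'} for the second,
+ * so the serialized items become the index pairs (0,0), (0,1) and
+ * (1,0).
+ */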
+bytea *
+serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i, j;
+ int ndims = mcvlist->ndimensions;
+ int itemsize = ITEM_SIZE(ndims);
+
+ Size total_length = 0;
+
+ char *item = palloc0(itemsize);
+
+ /* serialized items (indexes into arrays, etc.) */
+ bytea *output;
+ char *data = NULL;
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /* allocate space for all values, including NULLs (won't use them) */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * mcvlist->nitems);
+
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ if (! mcvlist->items[j]->isnull[i]) /* skip NULL values */
+ {
+ values[i][counts[i]] = mcvlist->items[j]->values[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
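+ /* e.g. a sorted array {1, 1, 2, 3, 3} is compacted to {1, 2, 3} (count = 3) */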
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* do not exceed UINT16_MAX */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typbyval || (info[i].typlen > 0))
+ /* passed by value, or by reference but with fixed length */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so simply strlen */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j]));
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * MCV list, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nitems (4B)
+ * - info (ndims * sizeof(DimensionInfo))
+ * - arrays of values for each dimension
+ * - serialized items (nitems * itemsize)
+ *
+ * So the 'header' size is 20B + ndims * sizeof(DimensionInfo), and
+ * then we'll place the data.
+ */
+ total_length = (sizeof(int32) + offsetof(MCVListData, items)
+ + ndims * sizeof(DimensionInfo)
+ + mcvlist->nitems * itemsize);
+
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 1MB */
+ if (total_length > 1024 * 1024)
+ elog(ERROR, "serialized MCV exceeds 1MB (%ld)", total_length);
+
+ /* allocate space for the serialized MCV list, set header fields */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write data */
+ data = VARDATA(output);
+
+ memcpy(data, mcvlist, offsetof(MCVListData, items));
+ data += offsetof(MCVListData, items);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - itemsize);
+
+ /* reset the values for each item */
+ memset(item, 0, itemsize);
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! mcvlist->items[i]->isnull[j])
+ {
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ v = (Datum*)bsearch(&mcvlist->items[i]->values[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ ITEM_INDEXES(item)[j] = (v - values[j]);
+
+ /* check the index is within expected bounds */
+ Assert(ITEM_INDEXES(item)[j] >= 0);
+ Assert(ITEM_INDEXES(item)[j] < info[j].nvalues);
+ }
+ }
+
+ /* copy NULL and frequency flags into the item */
+ memcpy(ITEM_NULLS(item, ndims),
+ mcvlist->items[i]->isnull, sizeof(bool) * ndims);
+ memcpy(ITEM_FREQUENCY(item, ndims),
+ &mcvlist->items[i]->frequency, sizeof(double));
+
+ /* copy the item into the array */
+ memcpy(data, item, itemsize);
+
+ data += itemsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ return output;
+}
+
+/*
+ * Inverse to serialize_mv_mcvlist() - see the comment there.
+ *
+ * We'll do a full deserialization, because we don't really expect high
+ * duplication of values, so caching would not be as efficient as with
+ * histograms.
+ */
+MCVList
+deserialize_mv_mcvlist(bytea *data)
+{
+ int i, j;
+ Size expected_size;
+ MCVList mcvlist;
+ char *tmp;
+
+ int ndims, nitems, itemsize;
+ DimensionInfo *info = NULL;
+
+ uint16 *indexes = NULL;
+ Datum **values = NULL;
+
+ /* local allocation buffer (used only for deserialization) */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ /* buffer used for the result */
+ int rbufflen;
+ char *rbuff;
+ char *rptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MCVListData,items))
+ elog(ERROR, "invalid MCV Size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MCVListData,items));
+
+ /* read the MCV list header */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(mcvlist, tmp, offsetof(MCVListData,items));
+ tmp += offsetof(MCVListData,items);
+
+ if (mcvlist->magic != MVSTAT_MCV_MAGIC)
+ elog(ERROR, "invalid MCV magic %d (expected %dd)",
+ mcvlist->magic, MVSTAT_MCV_MAGIC);
+
+ if (mcvlist->type != MVSTAT_MCV_TYPE_BASIC)
+ elog(ERROR, "invalid MCV type %d (expected %dd)",
+ mcvlist->type, MVSTAT_MCV_TYPE_BASIC);
+
+ nitems = mcvlist->nitems;
+ ndims = mcvlist->ndimensions;
+ itemsize = ITEM_SIZE(ndims);
+
+ Assert(nitems > 0);
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * What size do we expect with those parameters? It's incomplete,
+ * as we have yet to add the sizes of the value arrays (from the
+ * DimensionInfo records).
+ */
+ expected_size = offsetof(MCVListData,items) +
+ ndims * sizeof(DimensionInfo) +
+ (nitems * itemsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /*
+ * We'll allocate one large chunk of memory for the intermediate
+ * data, needed only for deserializing the MCV list, and we'll
+ * use a local dense allocation to minimize the palloc overhead.
+ *
+ * Let's see how much space we'll actually need, and also include
+ * space for the array with pointers.
+ */
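+ /*
+ * For instance (hypothetical numbers): two int4 dimensions on a
+ * 64-bit build (pass-by-value, but typlen 4 != sizeof(Datum))
+ * with 10 and 20 deduplicated values need 2 * sizeof(Datum*) for
+ * the pointers plus (10 + 20) * sizeof(Datum) for the arrays.
+ */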
+ bufflen = sizeof(Datum*) * ndims; /* space for pointers */
+
+ for (i = 0; i < ndims; i++)
+ /* for full-size byval types, we reuse the serialized value */
+ if (! (info[i].typbyval && info[i].typlen == sizeof(Datum)))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+
+ buff = palloc(bufflen);
+ ptr = buff;
+
+ values = (Datum**)buff;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers into the original data array (for types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mv_mcvlist(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. This needs to
+ * copy the pieces.
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum - simply reuse the array */
+ if (info[i].typlen == sizeof(Datum))
+ {
+ values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value into the array */
+ memcpy(&values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ }
+ else
+ {
+ /* all the varlena data need a chunk from the buffer */
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ /* passed by reference, but fixed length (name, tid, ...) */
+ if (info[i].typlen > 0)
+ {
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ /* we should exhaust the buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ /* allocate space for the MCV items in a single piece */
+ rbufflen = (sizeof(MCVItem) + sizeof(MCVItemData) +
+ sizeof(Datum)*ndims + sizeof(bool)*ndims) * nitems;
+
+ rbuff = palloc(rbufflen);
+ rptr = rbuff;
+
+ mcvlist->items = (MCVItem*)rbuff;
+ rptr += (sizeof(MCVItem) * nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ MCVItem item = (MCVItem)rptr;
+ rptr += (sizeof(MCVItemData));
+
+ item->values = (Datum*)rptr;
+ rptr += (sizeof(Datum)*ndims);
+
+ item->isnull = (bool*)rptr;
+ rptr += (sizeof(bool) *ndims);
+
+ /* just point to the right place */
+ indexes = ITEM_INDEXES(tmp);
+
+ memcpy(item->isnull, ITEM_NULLS(tmp, ndims), sizeof(bool) * ndims);
+ memcpy(&item->frequency, ITEM_FREQUENCY(tmp, ndims), sizeof(double));
+
+#ifdef USE_ASSERT_CHECKING
+ for (j = 0; j < ndims; j++)
+ Assert(indexes[j] <= UINT16_MAX);
+#endif
+
+ /* translate the values */
+ for (j = 0; j < ndims; j++)
+ if (! item->isnull[j])
+ item->values[j] = values[j][indexes[j]];
+
+ mcvlist->items[i] = item;
+
+ tmp += ITEM_SIZE(ndims);
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+ }
+
+ /* check that we processed all the data */
+ Assert(tmp == (char*)data + VARSIZE_ANY(data));
+
+ /* release the temporary buffer */
+ pfree(buff);
+
+ return mcvlist;
+}
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
+
+/*
+ * SRF with details about the items of an MCV list:
+ *
+ * - item ID (0...nitems)
+ * - values (string array)
+ * - nulls only (boolean array)
+ * - frequency (double precision)
+ *
+ * The input is the OID of the statistics, and there are no rows
+ * returned if the statistics contains no MCV list.
+ */
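+
+/*
+ * A hypothetical usage example (with 16385 standing in for the OID
+ * of a row in pg_mv_statistic):
+ *
+ * SELECT * FROM pg_mv_mcv_items(16385);
+ *
+ * This returns one row per MCV item, with columns (index, values,
+ * nulls, frequency).
+ */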
+PG_FUNCTION_INFO_V1(pg_mv_mcv_items);
+
+Datum
+pg_mv_mcv_items(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MCVList mcvlist;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ mcvlist = load_mv_mcvlist(PG_GETARG_OID(0));
+
+ funcctx->user_fctx = mcvlist;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = mcvlist->nitems;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples
+ * from raw C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MCVList mcvlist;
+ MCVItem item;
+
+ mcvlist = (MCVList)funcctx->user_fctx;
+
+ Assert(call_cntr < mcvlist->nitems);
+
+ item = mcvlist->items[call_cntr];
+
+ stakeys = find_mv_attnums(PG_GETARG_OID(0), &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple.
+ * This should be an array of C strings which will
+ * be processed later by the type input functions.
+ */
+ values = (char **) palloc(4 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+
+ /* frequency */
+ values[3] = (char *) palloc(64 * sizeof(char));
+
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * mcvlist->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * mcvlist->ndimensions);
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* item ID */
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ Datum val, valout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == mcvlist->ndimensions-1)
+ format = "%s, %s}";
+
+ val = item->values[i];
+ valout = FunctionCall1(&fmgrinfo[i], val);
+
+ snprintf(buff, 1024, format, values[1], DatumGetPointer(valout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], item->isnull[i] ? "t" : "f");
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[3], 64, "%f", item->frequency); /* frequency */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 5f89604..01d29db 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2101,8 +2101,9 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, stakeys,\n"
- " deps_enabled,\n"
- " deps_built,\n"
+ " deps_enabled, mcv_enabled,\n"
+ " deps_built, mcv_built,\n"
+ " mcv_max_items,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
@@ -2120,14 +2121,28 @@ describeOneTableDetails(const char *schemaname,
printTableAddFooter(&cont, _("Statistics:"));
for (i = 0; i < tuples; i++)
{
+ bool first = true;
+
printfPQExpBuffer(&buf, " ");
/* options */
if (!strcmp(PQgetvalue(result, i, 2), "t"))
- appendPQExpBuffer(&buf, "(dependencies)");
+ {
+ appendPQExpBuffer(&buf, "(dependencies");
+ first = false;
+ }
+
+ if (!strcmp(PQgetvalue(result, i, 3), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", mcv");
+ else
+ appendPQExpBuffer(&buf, "(mcv");
+ first = false;
+ }
- appendPQExpBuffer(&buf, " ON (%s)",
- PQgetvalue(result, i, 6));
+ appendPQExpBuffer(&buf, ") ON (%s)",
+ PQgetvalue(result, i, 8));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index 81ec23b..c6e7d74 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -35,15 +35,21 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
+ bool mcv_enabled; /* build MCV list? */
+
+ /* MCV size */
+ int32 mcv_max_items; /* max MCV items */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
+ bool mcv_built; /* MCV list was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
+ bytea stamcv; /* MCV list (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -59,11 +65,15 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_attrdef
* ----------------
*/
-#define Natts_pg_mv_statistic 5
+#define Natts_pg_mv_statistic 9
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_deps_enabled 2
-#define Anum_pg_mv_statistic_deps_built 3
-#define Anum_pg_mv_statistic_stakeys 4
-#define Anum_pg_mv_statistic_stadeps 5
+#define Anum_pg_mv_statistic_mcv_enabled 3
+#define Anum_pg_mv_statistic_mcv_max_items 4
+#define Anum_pg_mv_statistic_deps_built 5
+#define Anum_pg_mv_statistic_mcv_built 6
+#define Anum_pg_mv_statistic_stakeys 7
+#define Anum_pg_mv_statistic_stadeps 8
+#define Anum_pg_mv_statistic_stamcv 9
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 69fc482..890c763 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2739,6 +2739,10 @@ DATA(insert OID = 3307 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0
DESCR("multivariate stats: functional dependencies info");
DATA(insert OID = 3308 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
DESCR("multivariate stats: functional dependencies show");
+DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_mcvlist_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: MCV list info");
+DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
+DESCR("details about MCV list items");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 10f7425..917ae8d 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -572,9 +572,11 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
+ bool mcv_enabled; /* MCV list enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
+ bool mcv_built; /* MCV list built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 02a7dda..b028192 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -50,30 +50,89 @@ typedef MVDependenciesData* MVDependencies;
#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
/*
+ * Multivariate MCV (most-common value) lists
+ *
+ * A straightforward extension of MCV items - i.e. a list (array) of
+ * combinations of attribute values, together with a frequency and
+ * null flags.
+ */
+typedef struct MCVItemData {
+ double frequency; /* frequency of this combination */
+ bool *isnull; /* flags of NULL values (up to 32 columns) */
+ Datum *values; /* variable-length (ndimensions) */
+} MCVItemData;
+
+typedef MCVItemData *MCVItem;
+
+/* multivariate MCV list - essentially an array of MCV items */
+typedef struct MCVListData {
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of MCV list (BASIC) */
+ uint32 ndimensions; /* number of dimensions */
+ uint32 nitems; /* number of MCV items in the array */
+ MCVItem *items; /* array of MCV items */
+} MCVListData;
+
+typedef MCVListData *MCVList;
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_MCV_MAGIC 0xE1A651C2 /* marks serialized bytea */
+#define MVSTAT_MCV_TYPE_BASIC 1 /* basic MCV list type */
+
+/*
+ * Limits used for the max_mcv_items option, i.e. we're always
+ * guaranteed to have space for at least MVSTAT_MCVLIST_MIN_ITEMS
+ * items, and we cannot have more than MVSTAT_MCVLIST_MAX_ITEMS items.
+ *
+ * This is just a boundary for the 'max' threshold - the actual list
+ * may of course contain fewer items than MVSTAT_MCVLIST_MIN_ITEMS.
+ */
+#define MVSTAT_MCVLIST_MIN_ITEMS 128 /* min items in MCV list */
+#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
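+
+/*
+ * A hypothetical example, following the syntax exercised in the
+ * regression tests - this requests an MCV list with at most 200
+ * items on three columns:
+ *
+ * ALTER TABLE t ADD STATISTICS (mcv, max_mcv_items 200) ON (a, b, c);
+ */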
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
*/
MVDependencies load_mv_dependencies(Oid mvoid);
+MCVList load_mv_mcvlist(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
+bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
+MCVList deserialize_mv_mcvlist(bytea * data);
+
+/*
+ * Returns the index of the attribute number within the vector (i.e.
+ * the dimension within the stats).
+ */
+int mv_get_index(AttrNumber varattno, int2vector * stakeys);
+
+int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
/* FIXME this probably belongs somewhere else (not to operations stats) */
extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_mcv_items(PG_FUNCTION_ARGS);
MVDependencies
-build_mv_dependencies(int numrows, HeapTuple *rows,
- int2vector *attrs,
- VacAttrStats **stats);
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats);
+
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered);
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
- int natts, VacAttrStats **vacattrstats);
+ int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, int2vector *attrs);
+void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats);
#endif
diff --git a/src/test/regress/expected/mv_mcv.out b/src/test/regress/expected/mv_mcv.out
new file mode 100644
index 0000000..85e8499
--- /dev/null
+++ b/src/test/regress/expected/mv_mcv.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (unknown_column);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, a);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, a, b);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+ALTER TABLE mcv_list ADD STATISTICS (unknown_option) ON (a, b, c);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing MCV statistics
+ALTER TABLE mcv_list ADD STATISTICS (dependencies, max_mcv_items 200) ON (a, b, c);
+ERROR: option 'mcv' is required by other options(s)
+-- invalid max_mcv_items value / too low
+ALTER TABLE mcv_list ADD STATISTICS (mcv, max_mcv_items 10) ON (a, b, c);
+ERROR: max number of MCV items must be at least 128
+-- invalid max_mcv_items value / too high
+ALTER TABLE mcv_list ADD STATISTICS (mcv, max_mcv_items 10000) ON (a, b, c);
+ERROR: max number of MCV items is 8192
+-- correct command
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c, d);
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1200
+(1 row)
+
+DROP TABLE mcv_list;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a12ad30..faa41c7 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1367,7 +1367,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
c.relname AS tablename,
s.stakeys AS attnums,
length(s.stadeps) AS depsbytes,
- pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
+ length(s.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 11d9d38..d083442 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,4 +112,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies
+test: mv_dependencies mv_mcv
diff --git a/src/test/regress/regression.diffs b/src/test/regress/regression.diffs
deleted file mode 100644
index 95b9cc5..0000000
--- a/src/test/regress/regression.diffs
+++ /dev/null
@@ -1,30 +0,0 @@
-*** /home/user/work/tvondra_postgres/src/test/regress/expected/rolenames.out Wed May 6 21:31:06 2015
---- /home/user/work/tvondra_postgres/src/test/regress/results/rolenames.out Mon May 25 22:24:21 2015
-***************
-*** 38,47 ****
---- 38,52 ----
- ORDER BY 2;
- $$ LANGUAGE SQL;
- CREATE ROLE "Public";
-+ ERROR: role "Public" already exists
- CREATE ROLE "None";
-+ ERROR: role "None" already exists
- CREATE ROLE "current_user";
-+ ERROR: role "current_user" already exists
- CREATE ROLE "session_user";
-+ ERROR: role "session_user" already exists
- CREATE ROLE "user";
-+ ERROR: role "user" already exists
- CREATE ROLE current_user; -- error
- ERROR: CURRENT_USER cannot be used as a role name here
- LINE 1: CREATE ROLE current_user;
-***************
-*** 938,940 ****
---- 943,946 ----
- DROP OWNED BY testrol0, "Public", "current_user", testrol1, testrol2, testrolx CASCADE;
- DROP ROLE testrol0, testrol1, testrol2, testrolx;
- DROP ROLE "Public", "None", "current_user", "session_user", "user";
-+ ERROR: current user cannot be dropped
-
-======================================================================
-
diff --git a/src/test/regress/regression.out b/src/test/regress/regression.out
deleted file mode 100644
index bd81385..0000000
--- a/src/test/regress/regression.out
+++ /dev/null
@@ -1,156 +0,0 @@
-test tablespace ... ok
-test boolean ... ok
-test char ... ok
-test name ... ok
-test varchar ... ok
-test text ... ok
-test int2 ... ok
-test int4 ... ok
-test int8 ... ok
-test oid ... ok
-test float4 ... ok
-test float8 ... ok
-test bit ... ok
-test numeric ... ok
-test txid ... ok
-test uuid ... ok
-test enum ... ok
-test money ... ok
-test rangetypes ... ok
-test pg_lsn ... ok
-test regproc ... ok
-test strings ... ok
-test numerology ... ok
-test point ... ok
-test lseg ... ok
-test line ... ok
-test box ... ok
-test path ... ok
-test polygon ... ok
-test circle ... ok
-test date ... ok
-test time ... ok
-test timetz ... ok
-test timestamp ... ok
-test timestamptz ... ok
-test interval ... ok
-test abstime ... ok
-test reltime ... ok
-test tinterval ... ok
-test inet ... ok
-test macaddr ... ok
-test tstypes ... ok
-test comments ... ok
-test geometry ... ok
-test horology ... ok
-test regex ... ok
-test oidjoins ... ok
-test type_sanity ... ok
-test opr_sanity ... ok
-test insert ... ok
-test insert_conflict ... ok
-test create_function_1 ... ok
-test create_type ... ok
-test create_table ... ok
-test create_function_2 ... ok
-test copy ... ok
-test copyselect ... ok
-test create_misc ... ok
-test create_operator ... ok
-test create_index ... ok
-test create_view ... ok
-test create_aggregate ... ok
-test create_function_3 ... ok
-test create_cast ... ok
-test constraints ... ok
-test triggers ... ok
-test inherit ... ok
-test create_table_like ... ok
-test typed_table ... ok
-test vacuum ... ok
-test drop_if_exists ... ok
-test updatable_views ... ok
-test rolenames ... FAILED
-test sanity_check ... ok
-test errors ... ok
-test select ... ok
-test select_into ... ok
-test select_distinct ... ok
-test select_distinct_on ... ok
-test select_implicit ... ok
-test select_having ... ok
-test subselect ... ok
-test union ... ok
-test case ... ok
-test join ... ok
-test aggregates ... ok
-test groupingsets ... ok
-test transactions ... ok
-test random ... ok
-test portals ... ok
-test arrays ... ok
-test btree_index ... ok
-test hash_index ... ok
-test update ... ok
-test delete ... ok
-test namespace ... ok
-test prepared_xacts ... ok
-test brin ... ok
-test gin ... ok
-test gist ... ok
-test spgist ... ok
-test privileges ... ok
-test security_label ... ok
-test collate ... ok
-test matview ... ok
-test lock ... ok
-test replica_identity ... ok
-test rowsecurity ... ok
-test object_address ... ok
-test alter_generic ... ok
-test misc ... ok
-test psql ... ok
-test async ... ok
-test rules ... ok
-test select_views ... ok
-test portals_p2 ... ok
-test foreign_key ... ok
-test cluster ... ok
-test dependency ... ok
-test guc ... ok
-test bitmapops ... ok
-test combocid ... ok
-test tsearch ... ok
-test tsdicts ... ok
-test foreign_data ... ok
-test window ... ok
-test xmlmap ... ok
-test functional_deps ... ok
-test advisory_lock ... ok
-test json ... ok
-test jsonb ... ok
-test indirect_toast ... ok
-test equivclass ... ok
-test plancache ... ok
-test limit ... ok
-test plpgsql ... ok
-test copy2 ... ok
-test temp ... ok
-test domain ... ok
-test rangefuncs ... ok
-test prepare ... ok
-test without_oid ... ok
-test conversion ... ok
-test truncate ... ok
-test alter_table ... ok
-test sequence ... ok
-test polymorphism ... ok
-test rowtypes ... ok
-test returning ... ok
-test largeobject ... ok
-test with ... ok
-test xml ... ok
-test event_trigger ... ok
-test stats ... ok
-test tablesample ... ok
-test mv_dependencies ... ok
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 66925b3..e63b7aa 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -157,3 +157,4 @@ test: event_trigger
test: stats
test: tablesample
test: mv_dependencies
+test: mv_mcv
diff --git a/src/test/regress/sql/mv_mcv.sql b/src/test/regress/sql/mv_mcv.sql
new file mode 100644
index 0000000..5de3d29
--- /dev/null
+++ b/src/test/regress/sql/mv_mcv.sql
@@ -0,0 +1,178 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (unknown_column);
+
+-- single column
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a);
+
+-- single column, duplicated
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, a);
+
+-- two columns, one duplicated
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, a, b);
+
+-- unknown option
+ALTER TABLE mcv_list ADD STATISTICS (unknown_option) ON (a, b, c);
+
+-- missing MCV statistics
+ALTER TABLE mcv_list ADD STATISTICS (dependencies, max_mcv_items 200) ON (a, b, c);
+
+-- invalid max_mcv_items value / too low
+ALTER TABLE mcv_list ADD STATISTICS (mcv, max_mcv_items 10) ON (a, b, c);
+
+-- invalid max_mcv_items value / too high
+ALTER TABLE mcv_list ADD STATISTICS (mcv, max_mcv_items 10000) ON (a, b, c);
+
+-- correct command
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+
+DROP TABLE mcv_list;
+
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mcv_list;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+ALTER TABLE mcv_list ADD STATISTICS (mcv) ON (a, b, c, d);
+
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+DROP TABLE mcv_list;
--
1.9.3
Attachment: 0005-multivariate-histograms-v7.patch (text/x-patch)
>From 89db32a7015e92bb5642604b822e9d3a41db2701 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 20:18:24 +0100
Subject: [PATCH 5/6] multivariate histograms
- extends the pg_mv_statistic catalog (add 'hist' fields)
- building the histograms during ANALYZE
- simple estimation while planning the queries
Includes regression tests mostly equal to those for functional
dependencies / MCV lists.
---
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/tablecmds.c | 86 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 713 ++++++++-
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/common.c | 37 +-
src/backend/utils/mvstats/histogram.c | 2188 ++++++++++++++++++++++++++++
src/bin/psql/describe.c | 17 +-
src/include/catalog/pg_mv_statistic.h | 24 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 131 +-
src/test/regress/expected/mv_histogram.out | 207 +++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_histogram.sql | 176 +++
18 files changed, 3566 insertions(+), 38 deletions(-)
create mode 100644 src/backend/utils/mvstats/histogram.c
create mode 100644 src/test/regress/expected/mv_histogram.out
create mode 100644 src/test/regress/sql/mv_histogram.sql
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 3144a29..0a1c25b 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -158,7 +158,9 @@ CREATE VIEW pg_mv_stats AS
length(S.stadeps) as depsbytes,
pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
length(S.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo,
+ length(S.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0d72aec..4c2da51 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -11919,12 +11919,15 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
/* by default build nothing */
bool build_dependencies = false,
- build_mcv = false;
+ build_mcv = false,
+ build_histogram = false;
- int32 max_mcv_items = -1;
+ int32 max_buckets = -1,
+ max_mcv_items = -1;
/* options required because of other options */
- bool require_mcv = false;
+ bool require_mcv = false,
+ require_histogram = false;
Assert(IsA(def, StatisticsDef));
@@ -12002,6 +12005,29 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
MVSTAT_MCVLIST_MAX_ITEMS)));
}
+ else if (strcmp(opt->defname, "histogram") == 0)
+ build_histogram = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_buckets") == 0)
+ {
+ max_buckets = defGetInt32(opt);
+
+ /* this option requires 'histogram' to be enabled */
+ require_histogram = true;
+
+ /* sanity check */
+ if (max_buckets < MVSTAT_HIST_MIN_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is %d",
+ MVSTAT_HIST_MIN_BUCKETS)));
+
+ else if (max_buckets > MVSTAT_HIST_MAX_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is %d",
+ MVSTAT_HIST_MAX_BUCKETS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -12010,10 +12036,10 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv))
+ if (! (build_dependencies || build_mcv || build_histogram))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -12021,6 +12047,11 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("option 'mcv' is required by other options(s)")));
+ if (require_histogram && (! build_histogram))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option 'histogram' is required by other options(s)")));
+
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
stakeys = buildint2vector(attnums, numcols);
@@ -12038,10 +12069,14 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+ values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
+
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
+ values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
+ nulls[Anum_pg_mv_statistic_stahist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
@@ -12064,6 +12099,7 @@ static void ATExecAddStatistics(AlteredTableInfo *tab, Relation rel,
return;
}
+
/*
* Implements the ALTER TABLE ... DROP STATISTICS in two forms:
*
@@ -12089,12 +12125,16 @@ static void ATExecDropStatistics(AlteredTableInfo *tab, Relation rel,
/* checking whether the statistics matches / should be dropped */
bool build_dependencies = false;
bool build_mcv = false;
+ bool build_histogram = false;
bool max_mcv_items = 0;
+ int32 max_buckets = 0;
bool check_dependencies = false;
bool check_mcv = false;
bool check_mcv_items = false;
+ bool check_histogram = false;
+ bool check_buckets = false;
if (def != NULL)
{
@@ -12148,6 +12188,18 @@ static void ATExecDropStatistics(AlteredTableInfo *tab, Relation rel,
build_mcv = true;
max_mcv_items = defGetInt32(opt);
}
+ else if (strcmp(opt->defname, "histogram") == 0)
+ {
+ check_histogram = true;
+ build_histogram = defGetBoolean(opt);
+ }
+ else if (strcmp(opt->defname, "max_buckets") == 0)
+ {
+ check_histogram = true;
+ check_buckets = true;
+ max_buckets = defGetInt32(opt);
+ build_histogram = true;
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -12211,6 +12263,30 @@ static void ATExecDropStatistics(AlteredTableInfo *tab, Relation rel,
(DatumGetInt32(adatum) == max_mcv_items);
}
+ if (delete && check_histogram)
+ {
+ bool isnull;
+ Datum adatum = heap_getattr(tuple,
+ Anum_pg_mv_statistic_hist_enabled,
+ RelationGetDescr(statrel),
+ &isnull);
+
+ delete = (! isnull) &&
+ (DatumGetBool(adatum) == build_histogram);
+ }
+
+ if (delete && check_buckets)
+ {
+ bool isnull;
+ Datum adatum = heap_getattr(tuple,
+ Anum_pg_mv_statistic_hist_max_buckets,
+ RelationGetDescr(statrel),
+ &isnull);
+
+ delete = (! isnull) &&
+ (DatumGetInt32(adatum) == max_buckets);
+ }
+
/* check that the columns match the statistics definition */
if (delete && (numcols > 0))
{
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1867ab7..19d672f 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1908,10 +1908,12 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
WRITE_BOOL_FIELD(mcv_enabled);
+ WRITE_BOOL_FIELD(hist_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
WRITE_BOOL_FIELD(mcv_built);
+ WRITE_BOOL_FIELD(hist_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 95872de..bc02e92 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -49,6 +49,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
+#define MV_CLAUSE_TYPE_HIST 0x04
static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
@@ -73,6 +74,8 @@ static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
List *clauses, MVStatisticInfo *mvstats,
bool *fullmatch, Selectivity *lowsel);
+static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -80,6 +83,12 @@ static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
Selectivity *lowsel, bool *fullmatch,
bool is_or);
+static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, List *clauses,
@@ -304,7 +313,7 @@ clauselist_selectivity(PlannerInfo *root,
* Check that there are statistics with MCV list. If not, we don't
* need to waste time with the optimization.
*/
- if (has_stats(stats, MV_CLAUSE_TYPE_MCV))
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
{
/*
* Recollect attributes from mv-compatible clauses (maybe we've
@@ -312,7 +321,7 @@ clauselist_selectivity(PlannerInfo *root,
* From now on we're only interested in MCV-compatible clauses.
*/
mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
- MV_CLAUSE_TYPE_MCV);
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
/*
* If there still are at least two columns, we'll try to select
@@ -331,7 +340,7 @@ clauselist_selectivity(PlannerInfo *root,
/* split the clauselist into regular and mv-clauses */
clauses = clauselist_mv_split(root, sjinfo, clauses,
varRelid, &mvclauses, mvstat,
- MV_CLAUSE_TYPE_MCV);
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
/* we've chosen the histogram to match the clauses */
Assert(mvclauses != NIL);
@@ -1116,6 +1125,7 @@ static Selectivity
clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
{
bool fullmatch = false;
+ Selectivity s1 = 0.0, s2 = 0.0;
/*
* Lowest frequency in the MCV list (may be used as an upper bound
@@ -1129,9 +1139,24 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
* MCV/histogram evaluation).
*/
- /* Evaluate the MCV selectivity */
- return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ /* Evaluate the MCV first. */
+ s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
&fullmatch, &mcv_low);
+
+ /*
+ * If we got a full equality match on the MCV list, we're done (and
+ * the estimate is pretty good).
+ */
+ if (fullmatch && (s1 > 0.0))
+ return s1;
+
+ /* FIXME if (fullmatch) without matching MCV item, use the mcv_low
+ * selectivity as upper bound */
+
+ s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+
+ /* TODO clamp to <= 1.0 (or more strictly, when possible) */
+ return s1 + s2;
}
/*
@@ -1273,7 +1298,7 @@ choose_mv_statistics(List *stats, Bitmapset *attnums)
int numattrs = attrs->dim1;
/* skip dependencies-only stats */
- if (! info->mcv_built)
+ if (! (info->mcv_built || info->hist_built))
continue;
/* count columns covered by the histogram */
@@ -1433,7 +1458,6 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
bool ok;
/* is it 'variable op constant' ? */
-
ok = (bms_membership(clause_relids) == BMS_SINGLETON) &&
(is_pseudo_constant_clause_relids(lsecond(expr->args),
right_relids) ||
@@ -1483,10 +1507,10 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
case F_SCALARLTSEL:
case F_SCALARGTSEL:
/* not compatible with functional dependencies */
- if (types & MV_CLAUSE_TYPE_MCV)
+ if (types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
{
*attnums = bms_add_member(*attnums, var->varattno);
- return (types & MV_CLAUSE_TYPE_MCV);
+ return (types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
}
return false;
@@ -1814,6 +1838,9 @@ has_stats(List *stats, int type)
if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
return true;
+
+ if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
+ return true;
}
return false;
@@ -2630,3 +2657,671 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
return nmatches;
}
+
+/*
+ * Estimate selectivity of clauses using a histogram.
+ *
+ * If there's no histogram for the stats, the function returns 0.0.
+ *
+ * The general idea of this method is similar to how MCV lists are
+ * processed, except that this introduces the concept of a partial
+ * match (MCV only works with full match / mismatch).
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all buckets as 'full match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the buckets
+ * 4) skip buckets that are already 'no match'
+ * 5) check clause for buckets that still match (at least partially)
+ * 6) sum frequencies for buckets to get selectivity
+ *
+ * Unlike MCV lists, histograms have a concept of a partial match. In
+ * that case we use 1/2 the bucket, to minimize the average error. The
+ * MV histograms are usually less detailed than the per-column ones,
+ * meaning the sum is often quite high (thanks to combining a lot of
+ * "partially hit" buckets).
+ *
+ * Maybe we could use per-bucket information with number of distinct
+ * values it contains (for each dimension), and then use that to correct
+ * the estimate (so with 10 distinct values, we'd use 1/10 of the bucket
+ * frequency). We might also scale the value depending on the actual
+ * ndistinct estimate (not just the values observed in the sample).
+ *
+ * Another option would be to multiply the selectivities, i.e. if we get
+ * 'partial match' for a bucket for multiple conditions, we might use
+ * 0.5^k (where k is the number of conditions), instead of 0.5. This
+ * probably does not minimize the average error, though.
+ *
+ * TODO This might use a similar shortcut to MCV lists - count buckets
+ * marked as partial/full match, and terminate once this drop to 0.
+ * Not sure if it's really worth it - for MCV lists a situation like
+ * this is not uncommon, but for histograms it's not that clear.
+ */
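+/*
+ * A worked example (a sketch, not tied to any particular data set):
+ * assume four 1-D buckets [0,5], [5,10], [10,15] and [15,20], each
+ * with frequency 0.25, and a clause WHERE (a < 7). The first bucket
+ * is a full match, the second a partial match and the rest no match,
+ * so the estimate is 0.25 + 0.5 * 0.25 = 0.375.
+ */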
+static Selectivity
+clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats)
+{
+ int i;
+ Selectivity s = 0.0;
+ Selectivity u = 0.0;
+
+ int nmatches = 0;
+ char *matches = NULL;
+
+ MVSerializedHistogram mvhist = NULL;
+
+ /* there's no histogram */
+ if (! mvstats->hist_built)
+ return 0.0;
+
+ /* load the histogram from the catalog */
+ mvhist = load_mv_histogram(mvstats->mvoid);
+
+ Assert (mvhist != NULL);
+ Assert (clauses != NIL);
+ Assert (list_length(clauses) >= 2);
+
+ /*
+ * Bitmap of bucket matches (mismatch, partial, full). By default
+ * all buckets fully match (and we'll eliminate them).
+ */
+ matches = palloc0(sizeof(char) * mvhist->nbuckets);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+
+ nmatches = mvhist->nbuckets;
+
+ /* build the match bitmap */
+ update_match_bitmap_histogram(root, clauses,
+ mvstats->stakeys, mvhist,
+ nmatches, matches, false);
+
+ /* now, walk through the buckets and sum the selectivities */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * Find out what part of the data is covered by the histogram,
+ * so that we can 'scale' the selectivity properly (e.g. when
+ * only 50% of the sample got into the histogram, and the rest
+ * is in a MCV list).
+ *
+ * TODO This might be handled by keeping a global "frequency"
+ * for the whole histogram, which might save us some time
+ * spent accessing the not-matching part of the histogram.
+ * Although it's likely in a cache, so it's very fast.
+ */
+ u += mvhist->buckets[i]->ntuples;
+
+ if (matches[i] == MVSTATS_MATCH_FULL)
+ s += mvhist->buckets[i]->ntuples;
+ else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+ s += 0.5 * mvhist->buckets[i]->ntuples;
+ }
+
+ /* release the allocated bitmap and deserialized histogram */
+ pfree(matches);
+ pfree(mvhist);
+
+ return s * u;
+}
+
+/*
+ * Evaluate clauses using the histogram, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * Note: This is not a simple bitmap in the sense that there are more
+ * than two possible values for each item - no match, partial
+ * match and full match. So we need 2 bits per item.
+ *
+ * TODO This works with 'bitmap' where each item is represented as a
+ * char, which is slightly wasteful. Instead, we could use a bitmap
+ * with 2 bits per item, reducing the size to ~1/4. By using values
+ * 0, 1 and 3 (instead of 0, 1 and 2), the operations (merging etc.)
+ * might be performed just like for simple bitmap by using & and |,
+ * which might be faster than min/max.
+ */
+static int
+update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ /*
+ * Used for caching function calls, only once per deduplicated value.
+ *
+ * We know there may be up to (2 * nbuckets) values per dimension.
+ * It's probably overkill, but let's allocate that once for all
+ * clauses, to minimize overhead.
+ *
+ * Also, we only need two bits per value, but this allocates a byte
+ * per value. Might be worth optimizing.
+ *
+ * 0x00 - not yet called
+ * 0x01 - called, result is 'false'
+ * 0x03 - called, result is 'true'
+ */
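+ /*
+ * (Encoding sketch: bit 0 marks the value as already computed, bit 1
+ * stores the result, so 0x00 means 'not computed yet' and
+ * (callcache[x] & 0x02) recovers the cached boolean.)
+ */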
+ char *callcache = palloc(2 * mvhist->nbuckets);
+
+ Assert(mvhist != NULL);
+ Assert(mvhist->nbuckets > 0);
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mvhist->nbuckets);
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+
+ /* loop through the clauses and do the estimation */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* it's either OpClause, or NullTest */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ FmgrInfo opproc; /* operator */
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ /* reset the cache (per clause) */
+ memset(callcache, 0, 2 * mvhist->nbuckets);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+ FmgrInfo ltproc;
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ /*
+ * TODO Fetch only when really needed (probably for equality only)
+ *
+ * TODO Technically either lt/gt is sufficient.
+ *
+ * FIXME The code in analyze.c creates histograms only for types
+ * with enough ordering (by calling get_sort_group_operators).
+ * Is this the same assumption, i.e. are we certain that we
+ * get the ltproc/gtproc every time we ask? Or are there types
+ * where get_sort_group_operators returns ltopr and here we
+ * get nothing?
+ */
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_EQ_OPR | TYPECACHE_LT_OPR
+ | TYPECACHE_GT_OPR);
+
+ /* lookup dimension for the attribute */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), <proc);
+
+ /*
+ * Check this for all buckets that still have "true" in the bitmap
+ *
+ * We already know the clauses use suitable operators (because that's
+ * how we filtered them).
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ bool tmp;
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /* histogram boundaries */
+ Datum minval, maxval;
+
+ /* values from the call cache */
+ char mincached, maxcached;
+
+ /*
+ * For AND-lists, we can also mark NULL buckets as 'no match'
+ * (and then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && bucket->nullsonly[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match).
+ * We can't really do anything about the MATCH_PARTIAL buckets.
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* lookup the values and cache of function calls */
+ minval = mvhist->values[idx][bucket->min[idx]];
+ maxval = mvhist->values[idx][bucket->max[idx]];
+
+ mincached = callcache[bucket->min[idx]];
+ maxcached = callcache[bucket->max[idx]];
+
+ /*
+ * TODO Maybe it's possible to add here a similar optimization
+ * as for the MCV lists:
+ *
+ * (nmatches == 0) && AND-list => all eliminated (FALSE)
+ * (nmatches == N) && OR-list => all eliminated (TRUE)
+ *
+ * But it's more complex because of the partial matches.
+ */
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ *
+ * TODO I'm really unsure the handling of 'isgt' flag (that is, clauses
+ * with reverse order of variable/constant) is correct. I wouldn't
+ * be surprised if there was some mixup. Using the lt/gt operators
+ * instead of messing with the opproc could make it simpler.
+ * It would however be using a different operator than the query,
+ * although it's not any shadier than using the selectivity function
+ * as is done currently.
+ *
+ * FIXME Once the min/max values are deduplicated, we can easily minimize
+ * the number of calls to the comparator (assuming we keep the
+ * deduplicated structure). See the note on compression at MVBucket
+ * serialize/deserialize methods.
+ */
+ switch (oprrest)
+ {
+ case F_SCALARLTSEL: /* column < constant */
+
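+ /*
+ * Example (a sketch): for a bucket [10, 20] and a clause
+ * (a < 15), the first call (15 < 10) is false, so the
+ * bucket is not eliminated, and the second call (15 < 20)
+ * is true, so the bucket becomes a partial match.
+ */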
+ if (! isgt) /* (var < const) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ minval));
+
+ /*
+ * Update the cache, but with the inverse value, as we keep the
+ * cache for calls with (minval, constvalue).
+ */
+ callcache[bucket->min[idx]] = (tmp) ? 0x01 : 0x03;
+ }
+ else
+ tmp = !(mincached & 0x02); /* get call result from the cache (inverse) */
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the upper boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ maxval));
+
+ /*
+ * Update the cache, but with the inverse value, as we keep the
+ * cache for calls with (maxval, constvalue).
+ */
+ callcache[bucket->max[idx]] = (tmp) ? 0x01 : 0x03;
+ }
+ else
+ tmp = !(maxcached & 0x02); /* extract the result (reverse) */
+
+ if (tmp) /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+
+ }
+ else /* (const < var) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ maxval,
+ cst->constvalue));
+
+ /* Update the cache. */
+ callcache[bucket->max[idx]] = (tmp) ? 0x03 : 0x01;
+ }
+ else
+ tmp = (maxcached & 0x02); /* extract the result */
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the lower boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ minval,
+ cst->constvalue));
+
+ /* Update the cache. */
+ callcache[bucket->min[idx]] = (tmp) ? 0x03 : 0x01;
+ }
+ else
+ tmp = (mincached & 0x02); /* extract the result */
+
+ if (tmp) /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ }
+ break;
+
+ case F_SCALARGTSEL: /* column > constant */
+
+ if (! isgt) /* (var > const) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ maxval));
+
+ /*
+ * Update the cache, but with the inverse value, as we keep the
+ * cache for calls with (val, constvalue).
+ */
+ callcache[bucket->max[idx]] = (tmp) ? 0x01 : 0x03;
+ }
+ else
+ tmp = !(maxcached & 0x02); /* extract the result */
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the lower boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ minval));
+
+ /*
+ * Update the cache, but with the inverse value, as we keep the
+ * cache for calls with (val, constvalue).
+ */
+ callcache[bucket->min[idx]] = (tmp) ? 0x01 : 0x03;
+ }
+ else
+ tmp = !(mincached & 0x02); /* extract the result */
+
+ if (tmp)
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ }
+ else /* (const > var) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in
+ * that case we can skip the bucket, because there's no overlap).
+ */
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ minval,
+ cst->constvalue));
+
+ /* Update the cache. */
+ callcache[bucket->min[idx]] = (tmp) ? 0x03 : 0x01;
+ }
+ else
+ tmp = (mincached & 0x02); /* extract the result */
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the upper boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ maxval,
+ cst->constvalue));
+
+ /* Update the cache. */
+ callcache[bucket->max[idx]] = (tmp) ? 0x03 : 0x01;
+ }
+ else
+ tmp = (maxcached & 0x02); /* extract the result */
+
+ if (tmp)
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ }
+ break;
+
+ case F_EQSEL:
+
+ /*
+ * We only check whether the value is within the bucket, using the lt/gt
+ * operators fetched from type cache.
+ *
+ * TODO We'll use the default 50% estimate, but that's probably way off
+ * if there are multiple distinct values. Consider tweaking this
+ * somehow, e.g. using only a part inversely proportional to the
+ * estimated number of distinct values in the bucket.
+ *
+ * TODO This does not handle inclusion flags at the moment, thus counting
+ * some buckets twice (when hitting the boundary).
+ *
+ * TODO One optimization: if max[i] == min[i], it's effectively an MCV
+ * item and we can count the whole bucket as a complete match (thus
+ * using 100% bucket selectivity and not just 50%).
+ *
+ * TODO Technically some buckets may "degenerate" into single-value
+ * buckets (not necessarily for all the dimensions) - maybe this
+ * is better than keeping a separate MCV list (multi-dimensional).
+ * Update: Actually, that's unlikely to be better than a separate
+ * MCV list for two reasons - first, it requires ~2x the space
+ * (because of storing lower/upper boundaries) and second because
+ * the buckets are ranges - depending on the partitioning algorithm
+ * it may not even degenerate into a (min=max) bucket. For example
+ * the current partitioning algorithm never does that.
+ */
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(<proc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ minval));
+
+ /* Update the cache. */
+ callcache[bucket->min[idx]] = (tmp) ? 0x03 : 0x01;
+ }
+ else
+ tmp = (mincached & 0x02); /* extract the result */
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(<proc,
+ DEFAULT_COLLATION_OID,
+ maxval,
+ cst->constvalue));
+
+ /* Update the cache. */
+ callcache[bucket->max[idx]] = (tmp) ? 0x03 : 0x01;
+ }
+ else
+ tmp = (maxcached & 0x02); /* extract the result */
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+
+ break;
+ }
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME properly match the attribute to the histogram dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the buckets and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining buckets that might possibly match.
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match)
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* if the clause mismatches the bucket, set it as MATCH_NONE */
+ if ((expr->nulltesttype == IS_NULL)
+ && (! bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+
+ else if ((expr->nulltesttype == IS_NOT_NULL) &&
+ (bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each bucket */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching buckets */
+ or_nmatches = mvhist->nbuckets;
+
+ /* by default none of the buckets matches the clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the OR-clauses */
+ or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ stakeys, mvhist,
+ or_nmatches, or_matches, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
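+ * E.g. min(FULL, PARTIAL) = PARTIAL and max(NONE, PARTIAL) =
+ * PARTIAL, which is the expected behavior for AND and OR.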
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+
+ /* free the call cache */
+ pfree(callcache);
+
+ return nmatches;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8c4396a..0dc575a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -409,7 +409,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built || mvstat->mcv_built)
+ if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built)
{
info = makeNode(MVStatisticInfo);
@@ -419,10 +419,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
info->mcv_enabled = mvstat->mcv_enabled;
+ info->hist_enabled = mvstat->hist_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
info->mcv_built = mvstat->mcv_built;
+ info->hist_built = mvstat->hist_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index f9bf10c..9dbb3b6 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o mcv.o
+OBJS = common.o dependencies.o histogram.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index d1da714..ffb76f4 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -13,11 +13,11 @@
*
*-------------------------------------------------------------------------
*/
+#include "postgres.h"
+#include "utils/array.h"
#include "common.h"
-#include "utils/array.h"
-
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
int natts,
VacAttrStats **vacattrstats);
@@ -52,7 +52,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
- int numrows_filtered = 0;
+ MVHistogram histogram = NULL;
+ int numrows_filtered = numrows;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -95,8 +96,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+ /* build a multivariate histogram on the columns */
+ if ((numrows_filtered > 0) && (stat->hist_enabled))
+ histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
+ update_mv_stats(stat->mvoid, deps, mcvlist, histogram, attrs, stats);
}
}
@@ -176,6 +181,8 @@ list_mv_stats(Oid relid)
info->deps_built = stats->deps_built;
info->mcv_enabled = stats->mcv_enabled;
info->mcv_built = stats->mcv_built;
+ info->hist_enabled = stats->hist_enabled;
+ info->hist_built = stats->hist_built;
result = lappend(result, info);
}
@@ -190,7 +197,6 @@ list_mv_stats(Oid relid)
return result;
}
-
/*
* Find attnims of MV stats using the mvoid.
*/
@@ -236,9 +242,16 @@ find_mv_attnums(Oid mvoid, Oid *relid)
}
+/*
+ * FIXME This adds statistics, but we need to drop statistics when the
+ * table is dropped. Not sure what to do when a column is dropped.
+ * Either we can (a) remove all stats on that column, (b) remove
+ * the column from defined stats and force rebuild, (c) remove the
+ * column on next ANALYZE. Or maybe something else?
+ */
void
update_mv_stats(Oid mvoid,
- MVDependencies dependencies, MCVList mcvlist,
+ MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
@@ -271,22 +284,34 @@ update_mv_stats(Oid mvoid,
values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
}
+ if (histogram != NULL)
+ {
+ bytea * data = serialize_mv_histogram(histogram, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stahist-1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stahist - 1]
+ = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
+ replaces[Anum_pg_mv_statistic_stahist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
+ nulls[Anum_pg_mv_statistic_hist_built-1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_hist_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
+ values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
diff --git a/src/backend/utils/mvstats/histogram.c b/src/backend/utils/mvstats/histogram.c
new file mode 100644
index 0000000..6290d2f
--- /dev/null
+++ b/src/backend/utils/mvstats/histogram.c
@@ -0,0 +1,2188 @@
+/*-------------------------------------------------------------------------
+ *
+ * histogram.c
+ * POSTGRES multivariate histograms
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/histogram.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+#include <math.h>
+
+/*
+ * Multivariate histograms
+ * -----------------------
+ *
+ * Histograms are a collection of buckets, represented by n-dimensional
+ * rectangles. Each rectangle is delimited by a min/max value in each
+ * dimension, stored in an array, so that the bucket includes values
+ * fulfilling condition
+ *
+ * min[i] <= value[i] <= max[i]
+ *
+ * where 'i' is the dimension. In 1D this corresponds to a simple
+ * interval, in 2D to a rectangle, and in 3D to a block. If you can
+ * imagine this in 4D, congrats!
+ *
+ * In addition to the boundaries, each bucket tracks additional details:
+ *
+ * * frequency (fraction of tuples it matches)
+ * * whether the boundaries are inclusive or exclusive
+ * * whether the dimension contains only NULL values
+ * * number of distinct values in each dimension (for building)
+ *
+ * and possibly some additional information.
+ *
+ * We do expect to support multiple histogram types, with different
+ * features etc. The 'type' field is used to identify those types.
+ * Technically some histogram types might use completely different
+ * bucket representation, but that's not expected at the moment.
+ *
+ * Although the current implementation builds non-overlapping buckets,
+ * the code does not (and should not) rely on the non-overlapping
+ * nature - there are interesting types of histograms / histogram
+ * building algorithms producing overlapping buckets.
+ *
+ *
+ * NULL handling (create_null_buckets)
+ * -----------------------------------
+ * Another thing worth mentioning is handling of NULL values. It would
+ * be quite difficult to work with buckets containing NULL and non-NULL
+ * values for a single dimension. To work around this, the initial step
+ * in building a histogram is building a set of 'NULL-buckets', i.e.
+ * buckets with one or more NULL-only dimensions.
+ *
+ * After that, no buckets are mixing NULL and non-NULL values in one
+ * dimension, and the actual histogram building starts. As that only
+ * splits the buckets into smaller ones, the resulting buckets can't
+ * mix NULL and non-NULL values either.
+ *
+ * The maximum number of NULL-buckets is determined by the number of
+ * attributes the histogram is built on. For N-dimensional histogram,
+ * the maximum number of NULL-buckets is 2^N. So for 8 attributes
+ * (which is the current value of MVSTATS_MAX_DIMENSIONS), there may be
+ * up to 256 NULL-buckets.
+ *
+ * Those buckets are only built if needed - if there are no NULL values
+ * in the data, no such buckets are built.
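+ *
+ * For example, for a histogram on two columns (a, b) there may be
+ * NULL-buckets for (a NULL, b NULL), (a NULL, b non-NULL) and
+ * (a non-NULL, b NULL), i.e. together with the regular non-NULL
+ * buckets up to 2^2 = 4 combinations.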
+ *
+ *
+ * Estimating selectivity
+ * ----------------------
+ * With histograms, we always "match" a whole bucket, not individual
+ * rows (or values), irrespective of the type of clause. Therefore we
+ * can't use the optimizations for equality clauses, as in MCV lists.
+ *
+ * The current implementation uses histograms to estimate these types
+ * of clauses (think of WHERE conditions):
+ *
+ * (a) equality clauses WHERE (a = 1) AND (b = 2)
+ * (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ * (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ * (d) OR-clauses WHERE (a = 1) OR (b = 2)
+ *
+ * It's possible to add more clauses, for example:
+ *
+ * (e) multi-var clauses WHERE (a > b)
+ *
+ * and so on. These are tasks for the future, not yet implemented.
+ *
+ * When used on low-cardinality data, histograms usually perform
+ * considerably worse than MCV lists (which are a good fit for this
+ * kind of data). This is especially true on categorical data, where
+ * ordering of the values is mostly unrelated to meaning of the data,
+ * as proper ordering is crucial for histograms.
+ *
+ * On high-cardinality data the histograms are usually a better choice,
+ * because MCV lists can't represent the distribution accurately enough.
+ *
+ * By evaluating a clause on a bucket, we may get one of three results:
+ *
+ * (a) FULL_MATCH - The bucket definitely matches the clause.
+ *
+ * (b) PARTIAL_MATCH - The bucket matches the clause, but not
+ * necessarily all the tuples it represents.
+ *
+ * (c) NO_MATCH - The bucket definitely does not match the clause.
+ *
+ * This may be illustrated using a range [1, 5], which is essentially
+ * a 1-D bucket. With clause
+ *
+ * WHERE (a < 10) => FULL_MATCH (all range values are below
+ * 10, so the whole bucket matches)
+ *
+ * WHERE (a < 3) => PARTIAL_MATCH (there may be values matching
+ * the clause, but we don't know how many)
+ *
+ * WHERE (a < 0) => NO_MATCH (the whole range is above 0, so
+ * no values from the bucket can match)
+ *
+ * Some clauses may produce only some of those results - for example
+ * equality clauses may never produce FULL_MATCH as we always hit only
+ * part of the bucket (we can't match both boundaries at the same time).
+ * This results in less accurate estimates compared to MCV lists, where
+ * we can hit an MCV item exactly (there's no PARTIAL match in MCV).
+ *
+ * There are clauses that may not produce any PARTIAL_MATCH results.
+ * A nice example of that is 'IS [NOT] NULL' clause, which either
+ * matches the bucket completely (FULL_MATCH) or not at all (NO_MATCH),
+ * thanks to how the NULL-buckets are constructed.
+ *
+ * Computing the total selectivity estimate is trivial - simply sum
+ * selectivities from all the FULL_MATCH and PARTIAL_MATCH buckets (but
+ * multiply the PARTIAL_MATCH buckets by 0.5 to minimize average error).
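+ *
+ * In pseudo-notation:
+ *
+ * selectivity = sum(frequency of FULL_MATCH buckets)
+ * + 0.5 * sum(frequency of PARTIAL_MATCH buckets)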
+ *
+ *
+ * Serialization
+ * -------------
+ * After building, the histogram is serialized into a more efficient
+ * form (dedup boundary values etc.). See serialize_mv_histogram() for
+ * more details about how it's done.
+ *
+ * Serialized histograms are marked with 'magic' constant, to make it
+ * easier to check the bytea value really is a serialized histogram.
+ *
+ * In the serialized form, values for each dimension are deduplicated,
+ * and referenced using an uint16 index. This saves a lot of space,
+ * because every time we split a bucket, we introduce a single new
+ * boundary value (to split the bucket by the selected dimension), but
+ * we actually copy all the boundary values for all dimensions. So for
+ * a histogram with 4 dimensions and 1000 buckets, we do have
+ *
+ * 1000 * 4 * 2 = 8000
+ *
+ * boundary values, but many of them are actually duplicated because
+ * the histogram started with a single bucket (8 boundary values) and
+ * then there were 999 splits (each introducing 1 new value):
+ *
+ * 8 + 999 = 1007
+ *
+ * So that's quite a large difference. Let's assume the Datum values are
+ * 8 bytes each. Storing the raw histogram would take ~ 64 kB, while
+ * with deduplication it's only ~18 kB.
+ *
+ * The difference may be removed by the transparent bytea compression,
+ * but the deduplication is also used to optimize the estimation. It's
+ * possible to process the deduplicated values, and then use this as
+ * a cache to minimize the actual function calls while checking the
+ * buckets. This significantly reduces the number of calls to the
+ * (often quite expensive) operator functions etc.
+ *
+ *
+ * The current limit on number of buckets (16384) is mostly arbitrary,
+ * but set so that it makes sure we don't exceed the number of distinct
+ * values indexable by uint16. In practice we could handle more buckets,
+ * because we index each dimension independently, and we do the splits
+ * over multiple dimensions.
+ *
+ * Histograms with more than 16k buckets are quite expensive to build
+ * and process, so the current limit is somewhat reasonable.
+ *
+ * The actual number of buckets is also related to statistics target,
+ * because we require MIN_BUCKET_ROWS (10) tuples per bucket before
+ * a split, so we can't have more than (2 * 300 * target / 10) buckets.
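+ *
+ * (For the default statistics target of 100 that works out to
+ * 2 * 300 * 100 / 10 = 6000 buckets, safely below the 16384 limit.)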
+ *
+ *
+ * TODO Maybe the distinct stats (both for combination of all columns
+ * and for combinations of various subsets of columns) should be
+ * moved to a separate structure (next to histogram/MCV/...) to
+ * make it useful even without a histogram computed etc.
+ *
+ * This would actually make mvcoeff (proposed by Kyotaro Horiguchi
+ * in [1]) possible. Seems like a good way to estimate GROUP BY
+ * cardinality, and also some other cases, pointed out by Kyotaro:
+ *
+ * [1] http://www.postgresql.org/message-id/20150515.152936.83796179.horiguchi.kyotaro@lab.ntt.co.jp
+ *
+ * This is not implemented at the moment, though. Also, Kyotaro's
+ * patch only works with pairs of columns, but maybe tracking all
+ * the combinations would be useful to handle more complex
+ * conditions. It only seems to handle equalities, though (but for
+ * GROUP BY estimation that's not a big deal).
+ */
+
+static MVBucket create_initial_mv_bucket(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+static MVBucket select_bucket_to_partition(int nbuckets, MVBucket * buckets);
+
+static MVBucket partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues);
+
+static MVBucket copy_mv_bucket(MVBucket bucket, uint32 ndimensions);
+
+static void update_bucket_ndistinct(MVBucket bucket, int2vector *attrs,
+ VacAttrStats ** stats);
+
+static void update_dimension_ndistinct(MVBucket bucket, int dimension,
+ int2vector *attrs,
+ VacAttrStats ** stats,
+ bool update_boundaries);
+
+static void create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats);
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Each serialized bucket needs to store (in this order):
+ *
+ * - number of tuples (float)
+ * - min inclusive flags (ndim * sizeof(bool))
+ * - max inclusive flags (ndim * sizeof(bool))
+ * - null dimension flags (ndim * sizeof(bool))
+ * - min boundary indexes (ndim * sizeof(uint16))
+ * - max boundary indexes (ndim * sizeof(uint16))
+ *
+ * So in total (the macro below reserves twice the space actually
+ * needed for the indexes, which is harmless):
+ *
+ * ndims * (4 * sizeof(uint16) + 3 * sizeof(bool)) +
+ * sizeof(float)
+ */
+#define BUCKET_SIZE(ndims) \
+ (ndims * (4 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float))
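+
+/*
+ * For example, with ndims = 2 this amounts to 2 * (4 * 2 + 3 * 1) + 4
+ * = 26 bytes per serialized bucket (assuming 2-byte uint16, 1-byte
+ * bool and 4-byte float).
+ */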
+
+/* pointers into a flat serialized bucket of BUCKET_SIZE(n) bytes */
+#define BUCKET_NTUPLES(b) ((float*)b)
+#define BUCKET_MIN_INCL(b,n) ((bool*)(b + sizeof(float)))
+#define BUCKET_MAX_INCL(b,n) (BUCKET_MIN_INCL(b,n) + n)
+#define BUCKET_NULLS_ONLY(b,n) (BUCKET_MAX_INCL(b,n) + n)
+#define BUCKET_MIN_INDEXES(b,n) ((uint16*)(BUCKET_NULLS_ONLY(b,n) + n))
+#define BUCKET_MAX_INDEXES(b,n) ((BUCKET_MIN_INDEXES(b,n) + n))
+
+/* can't split bucket with less than 10 rows */
+#define MIN_BUCKET_ROWS 10
+
+/*
+ * Data used while building the histogram.
+ */
+typedef struct HistogramBuildData {
+
+ float ndistinct; /* frequency of distinct values */
+
+ HeapTuple *rows; /* array of sample rows */
+ uint32 numrows; /* number of sample rows (array size) */
+
+ /*
+ * Number of distinct values in each dimension. This is used when
+ * building the histogram (and is not serialized/deserialized).
+ */
+ uint32 *ndistincts;
+
+} HistogramBuildData;
+
+typedef HistogramBuildData *HistogramBuild;
+
+/*
+ * Building a multivariate histogram. In short, it first creates a single
+ * bucket containing all the rows, and then repeatedly splits it, first
+ * searching for the bucket / dimension most in need of a split.
+ *
+ * The current criteria is rather simple, chosen so that the algorithm
+ * produces buckets with about equal frequency and regular size.
+ *
+ * See the discussion at select_bucket_to_partition and partition_bucket
+ * for more details about the algorithm.
+ *
+ * The current algorithm works like this:
+ *
+ * build NULL-buckets (create_null_buckets)
+ *
+ * while [not reaching maximum number of buckets]
+ *
+ * choose bucket to partition (largest bucket)
+ * if no bucket to partition
+ * terminate the algorithm
+ *
+ * choose bucket dimension to partition (largest dimension)
+ * split the bucket into two buckets
+ */
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ int *ndistvalues;
+ Datum **distvalues;
+
+ MVHistogram histogram = (MVHistogram)palloc0(sizeof(MVHistogramData));
+
+ HeapTuple * rows_copy = (HeapTuple*)palloc0(numrows * sizeof(HeapTuple));
+ memcpy(rows_copy, rows, sizeof(HeapTuple) * numrows);
+
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ histogram->ndimensions = numattrs;
+
+ histogram->magic = MVSTAT_HIST_MAGIC;
+ histogram->type = MVSTAT_HIST_TYPE_BASIC;
+ histogram->nbuckets = 1;
+
+ /* create max buckets (better than repalloc for short-lived objects) */
+ histogram->buckets
+ = (MVBucket*)palloc0(MVSTAT_HIST_MAX_BUCKETS * sizeof(MVBucket));
+
+ /* create the initial bucket, covering the whole sample set */
+ histogram->buckets[0]
+ = create_initial_mv_bucket(numrows, rows_copy, attrs, stats);
+
+ /*
+ * Collect info on distinct values in each dimension (used later
+ * to select dimension to partition).
+ */
+ ndistvalues = (int*)palloc0(sizeof(int) * numattrs);
+ distvalues = (Datum**)palloc0(sizeof(Datum*) * numattrs);
+
+ for (i = 0; i < numattrs; i++)
+ {
+ int j;
+ int nvals;
+ Datum *tmp;
+
+ SortSupportData ssup;
+ StdAnalyzeData *mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ nvals = 0;
+ tmp = (Datum*)palloc0(sizeof(Datum) * numrows);
+
+ for (j = 0; j < numrows; j++)
+ {
+ bool isnull;
+
+ /* fetch the attribute value for this sample row */
+ Datum value = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &isnull);
+
+ if (isnull)
+ continue;
+
+ tmp[nvals++] = value;
+ }
+
+ /* do the sort and stuff only if there are non-NULL values */
+ if (nvals > 0)
+ {
+ /* sort the array of values */
+ qsort_arg((void *) tmp, nvals, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /* count distinct values */
+ ndistvalues[i] = 1;
+ for (j = 1; j < nvals; j++)
+ if (compare_scalars_simple(&tmp[j], &tmp[j-1], &ssup) != 0)
+ ndistvalues[i] += 1;
+
+ /* FIXME allocate only needed space (count ndistinct first) */
+ distvalues[i] = (Datum*)palloc0(sizeof(Datum) * ndistvalues[i]);
+
+ /* now collect distinct values into the array */
+ distvalues[i][0] = tmp[0];
+ ndistvalues[i] = 1;
+
+ for (j = 1; j < nvals; j++)
+ {
+ if (compare_scalars_simple(&tmp[j], &tmp[j-1], &ssup) != 0)
+ {
+ distvalues[i][ndistvalues[i]] = tmp[j];
+ ndistvalues[i] += 1;
+ }
+ }
+ }
+
+ pfree(tmp);
+ }
+
+ /*
+ * The initial bucket may contain NULL values, so we have to create
+ * buckets with NULL-only dimensions.
+ *
+ * FIXME We may need up to 2^ndims buckets - check that there are
+ * enough buckets (MVSTAT_HIST_MAX_BUCKETS >= 2^ndims).
+ */
+ create_null_buckets(histogram, 0, attrs, stats);
+
+ while (histogram->nbuckets < MVSTAT_HIST_MAX_BUCKETS)
+ {
+ MVBucket bucket = select_bucket_to_partition(histogram->nbuckets,
+ histogram->buckets);
+
+ /* no more buckets to partition */
+ if (bucket == NULL)
+ break;
+
+ histogram->buckets[histogram->nbuckets]
+ = partition_bucket(bucket, attrs, stats,
+ ndistvalues, distvalues);
+
+ histogram->nbuckets += 1;
+ }
+
+ /* finalize the frequencies etc. */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ HistogramBuild build_data
+ = ((HistogramBuild)histogram->buckets[i]->build_data);
+
+ /*
+ * The frequency has to be computed from the whole sample, in
+ * case some of the rows were used for MCV (and thus are missing
+ * from the histogram).
+ */
+ histogram->buckets[i]->ntuples
+ = (build_data->numrows * 1.0) / numrows_total;
+ }
+
+ return histogram;
+}
+
+/* fetch the histogram (as a bytea) from the pg_mv_statistic catalog */
+MVSerializedHistogram
+load_mv_histogram(Oid mvoid)
+{
+ bool isnull = false;
+ Datum histogram;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* Fetch the pg_mv_statistic tuple for the given stats OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->hist_enabled && mvstat->hist_built);
+#endif
+
+ histogram = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stahist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_histogram(DatumGetByteaP(histogram));
+}
+
+/* print some basic info about the histogram */
+Datum
+pg_mv_stats_histogram_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVSerializedHistogram hist = deserialize_mv_histogram(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nbuckets=%d", hist->nbuckets);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
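+
+/*
+ * Usage sketch (mirroring the earlier MCV regression test; 'test' is
+ * a hypothetical table with histogram stats defined):
+ *
+ * SELECT hist_enabled, hist_built, pg_mv_stats_histogram_info(stahist)
+ * FROM pg_mv_statistic WHERE starelid = 'test'::regclass;
+ */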
+
+
+/* used to pass context into bsearch() */
+static SortSupport ssup_private = NULL;
+
+/*
+ * Serialize the MV histogram into a bytea value. The basic algorithm
+ * is simple, and mostly mimics the MCV serialization:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ * (a) collect all (non-NULL) attribute values from all buckets
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all buckets
+ * (a) replace min/max values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, because we're mixing
+ * different datatypes, and we don't know what equality means for them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ * We'll use uint16 values for the indexes in step (3), which is
+ * enough as we don't allow more than 16k buckets in the histogram
+ * (see MVSTAT_HIST_MAX_BUCKETS). Many of the index bytes will still
+ * be 0x00, so the varlena compression should kick in nicely.
+ *
+ *
+ * Deduplication in serialization
+ * ------------------------------
+ * The deduplication is very effective and important here, because every
+ * time we split a bucket, we keep all the boundary values, except for
+ * the dimension that was used for the split. Another way to look at
+ * this is that each split introduces 1 new value (the value used to do
+ * the split). A histogram with M buckets was created by (M-1) splits
+ * of the initial bucket, and each bucket has 2*N boundary values. So
+ * assuming the initial bucket does not have any 'collapsed' dimensions,
+ * the number of distinct values is
+ *
+ * (2*N + (M-1))
+ *
+ * but the total number of boundary values is
+ *
+ * 2*N*M
+ *
+ * which is clearly much higher. For a histogram on two columns, with
+ * 1024 buckets, it's 1027 vs. 4096. Of course, we're not saving all
+ * the difference (because we'll use 32-bit indexes into the values).
+ * But with large values (e.g. stored as varlena), this saves a lot.
+ *
+ * An interesting feature is that the total number of distinct values
+ * does not really grow with the number of dimensions, except for the
+ * size of the initial bucket. After that it only depends on number of
+ * buckets (i.e. number of splits).
+ *
+ * XXX Of course this only holds for the current histogram building
+ * algorithm. Algorithms doing the splits differently (e.g.
+ * producing overlapping buckets) may behave differently.
+ *
+ * TODO This only confirms we can use the uint16 indexes. The worst
+ * that could happen is if all the splits happened by a single
+ * dimension. To exhaust the uint16 this would require ~64k
+ * splits (needs to be reflected in MVSTAT_HIST_MAX_BUCKETS).
+ *
+ * TODO We don't need to use a separate boolean for each flag, instead
+ * use a single char and set bits.
+ *
+ * TODO We might get a bit better compression by considering the actual
+ * data type length. The current implementation treats all data
+ * types passed by value as requiring 8B, but for INT it's actually
+ * just 4B etc.
+ *
+ * OTOH this is only related to the lookup table, and most of the
+ * space is occupied by the buckets (with int16 indexes).
+ *
+ *
+ * Varlena compression
+ * -------------------
+ * This encoding may prevent automatic varlena compression (similarly
+ * to JSONB), because first part of the serialized bytea will be an
+ * array of unique values (although sorted), and pglz decides whether
+ * to compress by trying to compress the first part (~1kB or so), which
+ * is likely to compress poorly, due to the lack of repetition.
+ *
+ * One possible cure to that might be storing the buckets first, and
+ * then the deduplicated arrays. The buckets might be better suited
+ * for compression.
+ *
+ * On the other hand the encoding scheme is a context-aware compression,
+ * usually compressing to ~30% (or less, with large data types). So the
+ * lack of pglz compression may be OK.
+ *
+ * XXX But maybe we don't really want to compress this, to save on
+ * planning time?
+ *
+ * TODO Try storing the buckets / deduplicated arrays in reverse order,
+ * measure impact on compression.
+ *
+ *
+ * Deserialization
+ * ---------------
+ * The deserialization is currently implemented so that it reconstructs
+ * the histogram back into the same structures - this involves quite
+ * a few of memcpy() and palloc(), but maybe we could create a special
+ * structure for the serialized histogram, and access the data directly,
+ * without the unpacking.
+ *
+ * Not only would it save some memory and CPU time, but it might actually
+ * work better with CPU caches (not polluting the caches).
+ *
+ * TODO Try to keep the compressed form, instead of deserializing it to
+ * MVHistogram/MVBucket.
+ *
+ *
+ * General TODOs
+ * -------------
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
+bytea *
+serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i = 0, j = 0;
+ Size total_length = 0;
+
+ bytea *output = NULL;
+ char *data = NULL;
+
+ int nbuckets = histogram->nbuckets;
+ int ndims = histogram->ndimensions;
+
+ /* allocated for serialized bucket data */
+ int bucketsize = BUCKET_SIZE(ndims);
+ char *bucket = palloc0(bucketsize);
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension separately */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /*
+ * Allocate space for all min/max values, including NULLs
+ * (we won't use them, but we don't know how many there are),
+ * and then collect all non-NULL values.
+ */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * nbuckets * 2);
+
+ for (j = 0; j < histogram->nbuckets; j++)
+ {
+ /* skip buckets where this dimension is NULL-only */
+ if (! histogram->buckets[j]->nullsonly[i])
+ {
+ values[i][counts[i]] = histogram->buckets[j]->min[i];
+ counts[i] += 1;
+
+ values[i][counts[i]] = histogram->buckets[j]->max[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* make sure we fit into uint16 */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typlen > 0)
+ /* byval or byref, but with fixed length (name, tid, ...) */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so simply strlen */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j]));
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * histogram, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nbuckets (4B)
+ * - info (ndim * sizeof(DimensionInfo))
+ * - arrays of values for each dimension
+ * - serialized buckets (nbuckets * bucketsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data (and buckets).
+ */
+ total_length = (sizeof(int32) + offsetof(MVHistogramData, buckets)
+ + ndims * sizeof(DimensionInfo)
+ + nbuckets * bucketsize);
+
+ /* account for the deduplicated data */
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 10MB */
+ if (total_length > (10 * 1024 * 1024))
+ elog(ERROR, "serialized histogram exceeds 10MB (%ld > %d)",
+ total_length, (10 * 1024 * 1024));
+
+ /* allocate space for the serialized histogram list, set header */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write data */
+ data = VARDATA(output);
+
+ memcpy(data, histogram, offsetof(MVHistogramData, buckets));
+ data += offsetof(MVHistogramData, buckets);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typlen > 0)
+ {
+ /* passed by value or reference, but fixed length */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the histogram buckets */
+ for (i = 0; i < nbuckets; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - bucketsize);
+
+ /* reset the values for each item */
+ memset(bucket, 0, bucketsize);
+
+ *BUCKET_NTUPLES(bucket) = histogram->buckets[i]->ntuples;
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! histogram->buckets[i]->nullsonly[j])
+ {
+ uint16 idx;
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ /* min boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->min[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MIN_INDEXES(bucket, ndims)[j] = idx;
+
+ /* max boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->max[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MAX_INDEXES(bucket, ndims)[j] = idx;
+ }
+ }
+
+ /* copy flags (nulls, min/max inclusive) */
+ memcpy(BUCKET_NULLS_ONLY(bucket, ndims),
+ histogram->buckets[i]->nullsonly, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MIN_INCL(bucket, ndims),
+ histogram->buckets[i]->min_inclusive, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MAX_INCL(bucket, ndims),
+ histogram->buckets[i]->max_inclusive, sizeof(bool) * ndims);
+
+ /* copy the item into the array */
+ memcpy(data, bucket, bucketsize);
+
+ data += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ /* FIXME free the values/counts arrays here */
+
+ return output;
+}
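For readers who want to see the deduplicate-and-index encoding in isolation, here is a minimal stand-alone sketch of the same idea in plain C (independent of the patch; the names and data are made up for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

int
main(void)
{
    /* boundary values collected from the buckets (with duplicates) */
    int bounds[] = {10, 20, 10, 30, 20, 40};
    int n = 6;
    int ndistinct = 1;
    int values[6];
    int i;

    /* sort a copy, then deduplicate in place (keeping the order) */
    memcpy(values, bounds, sizeof(bounds));
    qsort(values, n, sizeof(int), cmp_int);

    for (i = 1; i < n; i++)
        if (values[i] != values[ndistinct - 1])
            values[ndistinct++] = values[i];

    /* replace each boundary value with a uint16 index into 'values' */
    for (i = 0; i < n; i++)
    {
        int *v = bsearch(&bounds[i], values, ndistinct,
                         sizeof(int), cmp_int);
        uint16_t idx = (uint16_t) (v - values);

        printf("%d -> index %d\n", bounds[i], (int) idx);
    }

    return 0;
}

Here 6 boundary values collapse to 4 distinct ones, so the serialized form stores the 4 values once plus 6 small indexes - the same saving the earlier comment computes for the 1027 vs. 4096 example.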
+
+/*
+ * Returns histogram in a partially-serialized form (keeps the boundary
+ * values deduplicated, so that it's possible to optimize the estimation
+ * part by caching function call results between buckets etc.).
+ */
+MVSerializedHistogram
+deserialize_mv_histogram(bytea * data)
+{
+ int i = 0, j = 0;
+
+ Size expected_size;
+ char *tmp = NULL;
+
+ MVSerializedHistogram histogram;
+ DimensionInfo *info;
+
+ int nbuckets;
+ int ndims;
+ int bucketsize;
+
+ /* temporary deserialization buffer */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVSerializedHistogramData,buckets))
+ elog(ERROR, "invalid histogram size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVSerializedHistogramData,buckets));
+
+ /* read the histogram header */
+ histogram
+ = (MVSerializedHistogram)palloc(sizeof(MVSerializedHistogramData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(histogram, tmp, offsetof(MVSerializedHistogramData, buckets));
+ tmp += offsetof(MVSerializedHistogramData, buckets);
+
+ if (histogram->magic != MVSTAT_HIST_MAGIC)
+ elog(ERROR, "invalid histogram magic %d (expected %d)",
+ histogram->magic, MVSTAT_HIST_MAGIC);
+
+ if (histogram->type != MVSTAT_HIST_TYPE_BASIC)
+ elog(ERROR, "invalid histogram type %d (expected %d)",
+ histogram->type, MVSTAT_HIST_TYPE_BASIC);
+
+ nbuckets = histogram->nbuckets;
+ ndims = histogram->ndimensions;
+ bucketsize = BUCKET_SIZE(ndims);
+
+ Assert((nbuckets > 0) && (nbuckets <= MVSTAT_HIST_MAX_BUCKETS));
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Compute the size we expect for these parameters. It's incomplete
+ * at this point, as we have yet to add the sizes of the value
+ * arrays (from the DimensionInfo records).
+ */
+ expected_size = offsetof(MVSerializedHistogramData,buckets) +
+ ndims * sizeof(DimensionInfo) +
+ (nbuckets * bucketsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - the data is not obviously corrupted */
+
+ /* now let's allocate a single buffer for all the values and counts */
+
+ bufflen = (sizeof(int) + sizeof(Datum*)) * ndims;
+ for (i = 0; i < ndims; i++)
+ {
+ /* no extra space needed for byval types that fit into a Datum */
+ if (! (info[i].typbyval && (info[i].typlen == sizeof(Datum))))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+ }
+
+ /* also, include space for the result, tracking the buckets */
+ bufflen += nbuckets * (
+ sizeof(MVSerializedBucket) + /* bucket pointer */
+ sizeof(MVSerializedBucketData)); /* bucket data */
+
+ buff = palloc(bufflen);
+ ptr = buff;
+
+ histogram->nvalues = (int*)ptr;
+ ptr += (sizeof(int) * ndims);
+
+ histogram->values = (Datum**)ptr;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mcv_list(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. This needs to
+ * copy the pieces.
+ *
+ * TODO same as in MCV deserialization / consider moving to common.c
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ histogram->nvalues[i] = info[i].nvalues;
+
+ if (info[i].typbyval && info[i].typlen == sizeof(Datum))
+ {
+ /* passed by value / Datum - simply reuse the array */
+ histogram->values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ /* all the other types need a Datum array carved from the buffer */
+ histogram->values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ if (info[i].typbyval)
+ {
+ /* passed by value, but smaller than Datum */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value into the Datum array */
+ memcpy(&histogram->values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ histogram->buckets = (MVSerializedBucket*)ptr;
+ ptr += (sizeof(MVSerializedBucket) * nbuckets);
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ MVSerializedBucket bucket = (MVSerializedBucket)ptr;
+ ptr += sizeof(MVSerializedBucketData);
+
+ bucket->ntuples = *BUCKET_NTUPLES(tmp);
+ bucket->nullsonly = BUCKET_NULLS_ONLY(tmp, ndims);
+ bucket->min_inclusive = BUCKET_MIN_INCL(tmp, ndims);
+ bucket->max_inclusive = BUCKET_MAX_INCL(tmp, ndims);
+
+ bucket->min = BUCKET_MIN_INDEXES(tmp, ndims);
+ bucket->max = BUCKET_MAX_INDEXES(tmp, ndims);
+
+ histogram->buckets[i] = bucket;
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+
+ tmp += bucketsize;
+ }
+
+ /* at this point we expect to have consumed expected_size exactly */
+ Assert((tmp - VARDATA(data)) == expected_size);
+
+ /* we should exhaust the output buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ return histogram;
+}
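The FIXME above (the deserialized histogram keeping pointers into the source bytea) is a classic dangling-pointer hazard. A minimal stand-alone illustration in plain C - 'Deserialized' and 'deserialize' are made-up stand-ins, not names from the patch:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Deserialized
{
    const char *payload;        /* points into the input buffer */
} Deserialized;

/* stand-in for deserialize_mv_histogram(): no copy, just pointers */
static Deserialized *
deserialize(const char *data)
{
    Deserialized *d = malloc(sizeof(Deserialized));

    d->payload = data + 4;      /* skip a 4-byte "header" */
    return d;
}

int
main(void)
{
    char *data = malloc(16);

    memcpy(data, "hdr:payload", 12);

    Deserialized *d = deserialize(data);

    printf("%s\n", d->payload); /* fine, 'data' is still alive */

    free(data);
    /* printf("%s\n", d->payload); -- would now be undefined behavior */

    free(d);
    return 0;
}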
+
+/*
+ * Build the initial bucket, which will be then split into smaller ones.
+ */
+static MVBucket
+create_initial_mv_bucket(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+ HistogramBuild data = NULL;
+
+ /* TODO allocate bucket as a single piece, including all the fields. */
+ MVBucket bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ Assert(numrows > 0);
+ Assert(rows != NULL);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /* allocate the per-dimension arrays */
+
+ /* flags for null-only dimensions */
+ bucket->nullsonly = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ bucket->min_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+ bucket->max_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* lower/upper boundaries */
+ bucket->min = (Datum*)palloc0(numattrs * sizeof(Datum));
+ bucket->max = (Datum*)palloc0(numattrs * sizeof(Datum));
+
+ /* build-data */
+ data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /* number of distinct values (per dimension) */
+ data->ndistincts = (uint32*)palloc0(numattrs * sizeof(uint32));
+
+ /* all the sample rows fall into the initial bucket */
+ data->numrows = numrows;
+ data->rows = rows;
+
+ bucket->build_data = data;
+
+ /*
+ * Update the number of ndistinct combinations in the bucket (which
+ * we use when selecting bucket to partition), and then number of
+ * distinct values for each dimension (which we use when choosing
+ * which dimension to split).
+ */
+ update_bucket_ndistinct(bucket, attrs, stats);
+
+ /* Update ndistinct (and also set min/max) for all dimensions. */
+ for (i = 0; i < numattrs; i++)
+ update_dimension_ndistinct(bucket, i, attrs, stats, true);
+
+ return bucket;
+}
+
+/*
+ * Choose the bucket to partition next.
+ *
+ * The current criterion is rather simple, chosen so that the algorithm
+ * produces buckets with about equal frequency and regular size. We
+ * select the bucket with the highest number of distinct values, and
+ * then split it by the longest dimension.
+ *
+ * The distinct values are uniformly mapped to [0,1] interval, and this
+ * is used to compute length of the value range.
+ *
+ * NOTE: This is not the same array used for deduplication, as this
+ * contains values for all the tuples from the sample, not just
+ * the boundary values.
+ *
+ * Returns either pointer to the bucket selected to be partitioned,
+ * or NULL if there are no buckets that may be split (i.e. all buckets
+ * contain a single distinct value).
+ *
+ * TODO Consider other partitioning criteria (v-optimal, maxdiff etc.).
+ * For example use the "bucket volume" (product of dimension
+ * lengths) to select the bucket.
+ *
+ * TODO Allowing the bucket to degenerate to a single combination of
+ * values makes it rather strange MCV list. Maybe we should use
+ * higher lower boundary, or maybe make the selection criteria
+ * more complex (e.g. consider number of rows in the bucket, etc.).
+ *
+ * That however is different from buckets 'degenerated' only for
+ * some dimensions (e.g. half of them), which is perfectly
+ * appropriate for statistics on a combination of low and high
+ * cardinality columns.
+ */
+static MVBucket
+select_bucket_to_partition(int nbuckets, MVBucket * buckets)
+{
+ int i;
+ int numrows = 0;
+ MVBucket bucket = NULL;
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ HistogramBuild data = (HistogramBuild)buckets[i]->build_data;
+ /* if the number of rows is higher, use this bucket */
+ if ((data->ndistinct > 2) &&
+ (data->numrows > numrows) &&
+ (data->numrows >= MIN_BUCKET_ROWS))
+ {
+ bucket = buckets[i];
+ numrows = data->numrows;
+ }
+ }
+
+ /* may be NULL if no bucket satisfies the conditions above */
+ return bucket;
+}
+
+/*
+ * A simple bucket partitioning implementation - we choose the longest
+ * bucket dimension, measured using the array of distinct values built
+ * at the very beginning of the build.
+ *
+ * We map all the distinct values to a [0,1] interval, uniformly
+ * distributed, and then use this to measure length. It's essentially
+ * the number of distinct values within the range, normalized to [0,1].
+ *
+ * Then we choose a 'middle' value splitting the bucket into two parts
+ * with roughly the same frequency.
+ *
+ * This splits the bucket by tweaking the existing one, and returning
+ * the new bucket (essentially shrinking the existing one in-place and
+ * returning the other "half" as a new bucket). The caller is responsible
+ * for adding the new bucket into the list of buckets.
+ *
+ * There are multiple histogram options, centered around the partitioning
+ * criteria, specifying both how to choose a bucket and the dimension
+ * most in need of a split. For a nice summary and general overview, see
+ * "rK-Hist : an R-Tree based histogram for multi-dimensional selectivity
+ * estimation" thesis by J. A. Lopez, Concordia University, p.34-37 (and
+ * possibly p. 32-34 for explanation of the terms).
+ *
+ * TODO It requires care to prevent splitting only one dimension and not
+ * splitting another one at all (which might happen easily in case
+ * of strongly dependent columns - e.g. y=x). The current algorithm
+ * minimizes this, but may still happen for perfectly dependent
+ * examples (when all the dimensions have equal length, the first
+ * one will be selected).
+ *
+ * TODO Should probably consider statistics target for the columns (e.g.
+ * to split dimensions with higher statistics target more frequently).
+ */
+static MVBucket
+partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues)
+{
+ int i;
+ int dimension;
+ int numattrs = attrs->dim1;
+
+ Datum split_value;
+ MVBucket new_bucket;
+ HistogramBuild new_data;
+
+ /* needed for sort, when looking for the split value */
+ bool isNull;
+ int nvalues = 0;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ StdAnalyzeData * mystats = NULL;
+ ScalarItem * values = (ScalarItem*)palloc0(data->numrows * sizeof(ScalarItem));
+ SortSupportData ssup;
+
+ /* looking for the split value */
+ int nrows = 1; /* number of rows below current value */
+ double delta;
+
+ /* needed when splitting the values */
+ HeapTuple * oldrows = data->rows;
+ int oldnrows = data->numrows;
+
+ /*
+ * We can't split buckets with a single distinct value (this also
+ * disqualifies NULL-only dimensions). Also, there have to be multiple
+ * sample rows (otherwise there couldn't be multiple distinct values).
+ */
+ Assert(data->ndistinct > 1);
+ Assert(data->numrows > 1);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Look for the next dimension to split.
+ */
+ delta = 0.0;
+ dimension = -1;
+
+ for (i = 0; i < numattrs; i++)
+ {
+ Datum *a, *b;
+
+ mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ /* can't split NULL-only dimension */
+ if (bucket->nullsonly[i])
+ continue;
+
+ /* can't split dimension with a single ndistinct value */
+ if (data->ndistincts[i] <= 1)
+ continue;
+
+ /* sort support for the bsearch_comparator */
+ ssup_private = &ssup;
+
+ /* search for min boundary in the distinct list */
+ a = (Datum*)bsearch(&bucket->min[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), bsearch_comparator);
+
+ b = (Datum*)bsearch(&bucket->max[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), bsearch_comparator);
+
+ /* if this dimension is 'longer', we'll partition by it */
+ if (((b-a)*1.0 / ndistvalues[i]) > delta)
+ {
+ delta = ((b-a)*1.0 / ndistvalues[i]);
+ dimension = i;
+ }
+ }
+
+ /*
+ * If we haven't found a dimension here, we've done something
+ * wrong in select_bucket_to_partition.
+ */
+ Assert(dimension != -1);
+
+ /*
+ * Walk through the selected dimension, collect and sort the values
+ * and then choose the value to use as the new boundary.
+ */
+ mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (i = 0; i < data->numrows; i++)
+ {
+ /* remember the index of the sample row, to make the partitioning simpler */
+ values[nvalues].value = heap_getattr(data->rows[i], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+ values[nvalues].tupno = i;
+
+ /* no NULL values allowed here (we don't do splits by null-only dimensions) */
+ Assert(!isNull);
+
+ nvalues++;
+ }
+
+ /* sort the array of values */
+ qsort_arg((void *) values, nvalues, sizeof(ScalarItem),
+ compare_scalars_partition, (void *) &ssup);
+
+ /*
+ * We know there are bucket->ndistincts[dimension] distinct values
+ * in this dimension, and we want to split this into half, so walk
+ * through the array and stop once we see (ndistinct/2) values.
+ *
+ * We always choose the "next" value, i.e. (n/2+1)-th distinct value,
+ * and use it as an exclusive upper boundary (and inclusive lower
+ * boundary).
+ *
+ * TODO Maybe we should use "average" of the two middle distinct
+ * values (at least for even distinct counts), but that would
+ * require being able to do an average (which does not work
+ * for non-arithmetic types).
+ *
+ * TODO Another option is to look for a split that'd give about
+ * 50% tuples (not distinct values) in each partition. That
+ * might work better when there are a few very frequent
+ * values, and many rare ones.
+ */
+ delta = fabs(data->numrows);
+ split_value = values[0].value;
+
+ for (i = 1; i < data->numrows; i++)
+ {
+ if (values[i].value != values[i-1].value)
+ {
+ /* are we closer to splitting the bucket in half? */
+ if (fabs(i - data->numrows/2.0) < delta)
+ {
+ /* let's assume we'll use this value for the split */
+ split_value = values[i].value;
+ delta = fabs(i - data->numrows/2.0);
+ nrows = i;
+ }
+ }
+ }
+
+ Assert(nrows > 0);
+ Assert(nrows < data->numrows);
+
+ /* create the new bucket as an (incomplete) copy of the one being partitioned */
+ new_bucket = copy_mv_bucket(bucket, numattrs);
+ new_data = (HistogramBuild)new_bucket->build_data;
+
+ /*
+ * Do the actual split of the chosen dimension, using the split value as the
+ * upper bound for the existing bucket, and lower bound for the new one.
+ */
+ bucket->max[dimension] = split_value;
+ new_bucket->min[dimension] = split_value;
+
+ bucket->max_inclusive[dimension] = false;
+ new_bucket->min_inclusive[dimension] = true;
+
+ /*
+ * Redistribute the sample tuples using the 'ScalarItem->tupno'
+ * index. We know 'nrows' rows should remain in the original
+ * bucket and the rest goes to the new one.
+ */
+
+ data->rows = (HeapTuple*)palloc0(nrows * sizeof(HeapTuple));
+ new_data->rows = (HeapTuple*)palloc0((oldnrows - nrows) * sizeof(HeapTuple));
+
+ data->numrows = nrows;
+ new_data->numrows = (oldnrows - nrows);
+
+ /*
+ * The first nrows should go to the first bucket, the rest should
+ * go to the new one. Use the tupno field to get the actual HeapTuple
+ * row from the original array of sample rows.
+ */
+ for (i = 0; i < nrows; i++)
+ memcpy(&data->rows[i], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ for (i = nrows; i < oldnrows; i++)
+ memcpy(&new_data->rows[i-nrows], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(new_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each partition.
+ */
+ for (i = 0; i < numattrs; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(new_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+ pfree(values);
+
+ return new_bucket;
+}
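The split-value search above (pick the distinct-value boundary closest to numrows/2) may be easier to follow in isolation. A minimal sketch in plain C, with made-up sample data:

#include <stdio.h>
#include <math.h>

int
main(void)
{
    /* sorted sample values in the chosen dimension (with duplicates) */
    int values[] = {1, 1, 1, 2, 2, 3, 3, 3, 3, 4};
    int numrows = 10;
    int nrows = 1;
    int split_value = values[0];
    double delta = numrows;
    int i;

    /* each position where the value changes is a split candidate */
    for (i = 1; i < numrows; i++)
    {
        if (values[i] != values[i - 1] &&
            fabs(i - numrows / 2.0) < delta)
        {
            split_value = values[i];
            delta = fabs(i - numrows / 2.0);
            nrows = i;          /* rows that stay in the old bucket */
        }
    }

    /* rows [0, nrows) keep the old bucket, [nrows, numrows) move */
    printf("split at value %d, %d rows below\n", split_value, nrows);
    return 0;
}

With this data the best boundary is at position 5 (value 3), leaving 5 rows in each half.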
+
+/*
+ * Copy a histogram bucket. The copy does not include the build-time
+ * data, i.e. sampled rows etc.
+ */
+static MVBucket
+copy_mv_bucket(MVBucket bucket, uint32 ndimensions)
+{
+ /* TODO allocate as a single piece (including all the fields) */
+ MVBucket new_bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+ HistogramBuild data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /*
+ * Copy only the attributes that will stay the same after the split;
+ * the rest will be recomputed afterwards.
+ */
+
+ /* allocate the per-dimension arrays */
+ new_bucket->nullsonly = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ new_bucket->min_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+ new_bucket->max_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* lower/upper boundaries */
+ new_bucket->min = (Datum*)palloc0(ndimensions * sizeof(Datum));
+ new_bucket->max = (Datum*)palloc0(ndimensions * sizeof(Datum));
+
+ /* copy data */
+ memcpy(new_bucket->nullsonly, bucket->nullsonly, ndimensions * sizeof(bool));
+
+ memcpy(new_bucket->min_inclusive, bucket->min_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->min, bucket->min, ndimensions*sizeof(Datum));
+
+ memcpy(new_bucket->max_inclusive, bucket->max_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->max, bucket->max, ndimensions*sizeof(Datum));
+
+ /* allocate and copy the interesting part of the build data */
+ data->ndistincts = (uint32*)palloc0(ndimensions * sizeof(uint32));
+
+ new_bucket->build_data = data;
+
+ return new_bucket;
+}
+
+/*
+ * Counts the number of distinct value combinations in the bucket. The
+ * values are copied into an array of sort items and sorted using a
+ * multi-column comparator built on the per-column sort support.
+ *
+ * TODO This might evaluate and store the distinct counts for all
+ * possible attribute combinations. The assumption is this might be
+ * useful for estimating things like GROUP BY cardinalities (e.g.
+ * in cases when some buckets contain a lot of low-frequency
+ * combinations, and other buckets contain few high-frequency ones).
+ *
+ * But it's unclear whether it's worth the price. Computing this
+ * is actually quite cheap, because it may be evaluated at the very
+ * end, when the buckets are rather small (so sorting it in 2^N ways
+ * is not a big deal). Assuming the partitioning algorithm does not
+ * use these values to do the decisions, of course (the current
+ * algorithm does not).
+ *
+ * The overhead with storing, fetching and parsing the data is more
+ * concerning - adding 2^N values per bucket (even if it's just
+ * a 1B or 2B value) would significantly bloat the histogram, and
+ * thus the impact on optimizer. Which is not really desirable.
+ *
+ * TODO This only updates the ndistinct for the sample (or bucket), but
+ * we eventually need an estimate of the total number of distinct
+ * values in the dataset. It's possible to either use the current
+ * 1D approach (i.e., if it's more than 10% of the sample, assume
+ * it's proportional to the number of rows). Or it's possible to
+ * implement the estimator suggested in the article, supposedly
+ * giving 'optimal' estimates (w.r.t. probability of error).
+ */
+static void
+update_bucket_ndistinct(MVBucket bucket, int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ int numrows = data->numrows;
+
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * We could collect this while walking through all the attributes
+ * above (as it is, heap_getattr ends up being called twice).
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(numrows * sizeof(Datum) * numattrs);
+ bool *isnull = (bool*)palloc0(numrows * sizeof(bool) * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* prepare the sort functions for all dimensions */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* collect the values */
+ for (i = 0; i < numrows; i++)
+ for (j = 0; j < numattrs; j++)
+ items[i].values[j]
+ = heap_getattr(data->rows[i], attrs->values[j],
+ stats[j]->tupDesc, &items[i].isnull[j]);
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ data->ndistinct = 1;
+
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ data->ndistinct += 1;
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+}
+
+/*
+ * Count distinct values per bucket dimension.
+ */
+static void
+update_dimension_ndistinct(MVBucket bucket, int dimension, int2vector *attrs,
+ VacAttrStats ** stats, bool update_boundaries)
+{
+ int j;
+ int nvalues = 0;
+ bool isNull;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ Datum * values = (Datum*)palloc0(data->numrows * sizeof(Datum));
+ SortSupportData ssup;
+
+ StdAnalyzeData * mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* we may already know this is a NULL-only dimension */
+ if (bucket->nullsonly[dimension])
+ data->ndistincts[dimension] = 1;
+
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (j = 0; j < data->numrows; j++)
+ {
+ values[nvalues] = heap_getattr(data->rows[j], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+
+ /* ignore NULL values */
+ if (! isNull)
+ nvalues++;
+ }
+
+ /* there's always at least 1 distinct value (may be NULL) */
+ data->ndistincts[dimension] = 1;
+
+ /*
+ * If there are only NULL values in the column, mark the dimension
+ * as NULL-only and bail out.
+ */
+ if (nvalues == 0)
+ {
+ pfree(values);
+ bucket->nullsonly[dimension] = true;
+ return;
+ }
+
+ /* sort the array of (pass-by-value) datums */
+ qsort_arg((void *) values, nvalues, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /*
+ * Update min/max boundaries to the smallest bounding box. Generally, this
+ * needs to be done only when constructing the initial bucket.
+ */
+ if (update_boundaries)
+ {
+ /* store the min/max values */
+ bucket->min[dimension] = values[0];
+ bucket->min_inclusive[dimension] = true;
+
+ bucket->max[dimension] = values[nvalues-1];
+ bucket->max_inclusive[dimension] = true;
+ }
+
+ /*
+ * Walk through the array and count distinct values by comparing
+ * succeeding values.
+ *
+ * FIXME This only works for pass-by-value types (i.e. not VARCHARs
+ * etc.). Although thanks to the deduplication it might work
+ * even for those types (equal values will get the same item
+ * in the deduplicated array).
+ */
+ for (j = 1; j < nvalues; j++)
+ {
+ if (values[j] != values[j-1])
+ data->ndistincts[dimension] += 1;
+ }
+
+ pfree(values);
+}
+
+/*
+ * A properly built histogram must not contain buckets mixing NULL and
+ * non-NULL values in a single dimension. Each dimension may either be
+ * marked as 'nulls only', and thus containing only NULL values, or
+ * it must not contain any NULL values.
+ *
+ * Therefore, if the sample contains NULL values in any of the columns,
+ * it's necessary to build those NULL-buckets. This is done in an
+ * iterative way using this algorithm, operating on a single bucket:
+ *
+ * (1) Check that all dimensions are well-formed (not mixing NULL
+ * and non-NULL values).
+ *
+ * (2) If all dimensions are well-formed, terminate.
+ *
+ * (3) If the dimension contains only NULL values, but is not
+ * marked as NULL-only, mark it as NULL-only and run the
+ * algorithm again (on this bucket).
+ *
+ * (4) If the dimension mixes NULL and non-NULL values, split the
+ * bucket into two parts - one with NULL values, one with
+ * non-NULL values (replacing the current one). Then run
+ * the algorithm on both buckets.
+ *
+ * This is executed in a recursive manner, but the number of executions
+ * should be quite low - limited by the number of NULL-buckets. Also,
+ * in each branch the number of nested calls is limited by the number
+ * of dimensions (attributes) of the histogram.
+ *
+ * At the end, there should be buckets with no mixed dimensions. The
+ * number of buckets produced by this algorithm is rather limited - with
+ * N dimensions, there may be only 2^N such buckets (each dimension may
+ * be either NULL or non-NULL). So with 8 dimensions (current value of
+ * MVSTATS_MAX_DIMENSIONS) there may be only 256 such buckets.
+ *
+ * After this, a 'regular' bucket-split algorithm shall run, further
+ * optimizing the histogram.
+ */
+static void
+create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int null_dim = -1;
+ int null_count = 0;
+ bool null_found = false;
+ MVBucket bucket, null_bucket;
+ int null_idx, curr_idx;
+ HistogramBuild data, null_data;
+
+ /* remember original values from the bucket */
+ int numrows;
+ HeapTuple *oldrows = NULL;
+
+ Assert(bucket_idx < histogram->nbuckets);
+ Assert(histogram->ndimensions == attrs->dim1);
+
+ bucket = histogram->buckets[bucket_idx];
+ data = (HistogramBuild)bucket->build_data;
+
+ numrows = data->numrows;
+ oldrows = data->rows;
+
+ /*
+ * Walk through all rows / dimensions, and stop once we find NULL
+ * in a dimension not yet marked as NULL-only.
+ */
+ for (i = 0; i < data->numrows; i++)
+ {
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ /* Is this a NULL-only dimension? If yes, skip. */
+ if (bucket->nullsonly[j])
+ continue;
+
+ /* found a NULL in that dimension? */
+ if (heap_attisnull(data->rows[i], attrs->values[j]))
+ {
+ null_found = true;
+ null_dim = j;
+ break;
+ }
+ }
+
+ /* terminate if we found attribute with NULL values */
+ if (null_found)
+ break;
+ }
+
+ /* no regular dimension contains NULL values => we're done */
+ if (! null_found)
+ return;
+
+ /* walk through the rows again, count NULL values in 'null_dim' */
+ for (i = 0; i < data->numrows; i++)
+ {
+ if (heap_attisnull(data->rows[i], attrs->values[null_dim]))
+ null_count += 1;
+ }
+
+ Assert(null_count <= data->numrows);
+
+ /*
+ * If (null_count == numrows) the dimension already is NULL-only,
+ * but is not yet marked as such. It's enough to mark it and
+ * repeat the process recursively (until we run out of dimensions).
+ */
+ if (null_count == data->numrows)
+ {
+ bucket->nullsonly[null_dim] = true;
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+ return;
+ }
+
+ /*
+ * We have to split the bucket into two - one with NULL values in
+ * the dimension, one with non-NULL values. We don't need to sort
+ * the data or anything, but otherwise it's similar to what's done
+ * in partition_bucket().
+ */
+
+ /* create bucket with NULL-only dimension 'null_dim' */
+ null_bucket = copy_mv_bucket(bucket, histogram->ndimensions);
+ null_data = (HistogramBuild)null_bucket->build_data;
+
+ /* remember the current array info */
+ oldrows = data->rows;
+ numrows = data->numrows;
+
+ /* we'll keep non-NULL values in the current bucket */
+ data->numrows = (numrows - null_count);
+ data->rows
+ = (HeapTuple*)palloc0(data->numrows * sizeof(HeapTuple));
+
+ /* and the NULL values will go to the new one */
+ null_data->numrows = null_count;
+ null_data->rows
+ = (HeapTuple*)palloc0(null_data->numrows * sizeof(HeapTuple));
+
+ /* mark the dimension as NULL-only (in the new bucket) */
+ null_bucket->nullsonly[null_dim] = true;
+
+ /* walk through the sample rows and distribute them accordingly */
+ null_idx = 0;
+ curr_idx = 0;
+ for (i = 0; i < numrows; i++)
+ {
+ if (heap_attisnull(oldrows[i], attrs->values[null_dim]))
+ /* NULL => copy to the new bucket */
+ memcpy(&null_data->rows[null_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ else
+ memcpy(&data->rows[curr_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ }
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(null_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each
+ * bucket (NULL is not a value, so 0, and the other bucket got
+ * all the ndistinct values).
+ */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(null_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+
+ /* add the NULL bucket to the histogram */
+ histogram->buckets[histogram->nbuckets++] = null_bucket;
+
+ /*
+ * And now run the function recursively on both buckets (the new
+ * one first, because the call may change number of buckets, and
+ * it's used as an index).
+ */
+ create_null_buckets(histogram, (histogram->nbuckets-1), attrs, stats);
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+}
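The 2^N bound on NULL-buckets mentioned in the comment above follows from each dimension independently being either NULL-only or NULL-free. A trivial enumeration, for illustration only (plain C, ndims chosen arbitrarily):

#include <stdio.h>

int
main(void)
{
    int ndims = 3;
    int mask;
    int d;

    /* each dimension is either NULL-only ('N') or NULL-free ('-') */
    for (mask = 0; mask < (1 << ndims); mask++)
    {
        for (d = 0; d < ndims; d++)
            putchar((mask & (1 << d)) ? 'N' : '-');
        putchar('\n');
    }

    return 0;                   /* prints 2^ndims = 8 patterns */
}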
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
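For readers unfamiliar with this workaround, here is the same trick in a minimal stand-alone form (plain C; 'CmpContext' and the other names are invented for the example):

#include <stdio.h>
#include <stdlib.h>

typedef struct CmpContext
{
    int sign;                   /* stand-in for SortSupport state */
} CmpContext;

static CmpContext *cmp_private; /* the "ugly" file-scope variable */

static int
cmp_with_context(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return cmp_private->sign * ((x > y) - (x < y));
}

int
main(void)
{
    int keys[] = {1, 3, 5, 7, 9};
    int needle = 7;
    CmpContext ctx = {1};       /* ascending order */
    int *hit;

    cmp_private = &ctx;         /* set the context before bsearch */
    hit = bsearch(&needle, keys, 5, sizeof(int), cmp_with_context);

    printf("found at index %ld\n", hit ? (long) (hit - keys) : -1L);
    return 0;
}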
+
+/*
+ * SRF with details about buckets of a histogram:
+ *
+ * - bucket ID (0...nbuckets)
+ * - min values (string array)
+ * - max values (string array)
+ * - nulls only (boolean array)
+ * - min inclusive flags (boolean array)
+ * - max inclusive flags (boolean array)
+ * - frequency (double precision)
+ *
+ * The input is the OID of the statistics, and there are no rows
+ * returned if the statistics contains no histogram (or if there's no
+ * statistics for the OID).
+ *
+ * The second parameter (type) determines what values will be returned
+ * in the (minvals,maxvals). There are three possible values:
+ *
+ * 0 (actual values)
+ * -----------------
+ * - prints actual values
+ * - using the output function of the data type (as string)
+ * - handy for investigating the histogram
+ *
+ * 1 (distinct index)
+ * ------------------
+ * - prints index of the distinct value (into the serialized array)
+ * - makes it easier to spot neighbor buckets, etc.
+ * - handy for plotting the histogram
+ *
+ * 2 (normalized distinct index)
+ * -----------------------------
+ * - prints index of the distinct value, but normalized into [0,1]
+ * - similar to 1, but shows how 'long' the bucket range is
+ * - handy for plotting the histogram
+ *
+ * When plotting the histogram, be careful as the (1) and (2) options
+ * skew the lengths by distributing the distinct values uniformly. For
+ * data types without a clear meaning of 'distance' (e.g. strings) that
+ * is not a big deal, but for numbers it may be confusing.
+ */
+PG_FUNCTION_INFO_V1(pg_mv_histogram_buckets);
+
+Datum
+pg_mv_histogram_buckets(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ Oid mvoid = PG_GETARG_OID(0);
+ int otype = PG_GETARG_INT32(1);
+
+ if ((otype < 0) || (otype > 2))
+ elog(ERROR, "invalid output type specified");
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MVSerializedHistogram histogram;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ histogram = load_mv_histogram(mvoid);
+
+ funcctx->user_fctx = histogram;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = histogram->nbuckets;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples
+ * from raw C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+ double bucket_size = 1.0;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MVSerializedHistogram histogram;
+ MVSerializedBucket bucket;
+
+ histogram = (MVSerializedHistogram)funcctx->user_fctx;
+
+ Assert(call_cntr < histogram->nbuckets);
+
+ bucket = histogram->buckets[call_cntr];
+
+ stakeys = find_mv_attnums(mvoid, &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple.
+ * This should be an array of C strings which will
+ * be processed later by the type input functions.
+ */
+ values = (char **) palloc(9 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+ values[3] = (char *) palloc0(1024 * sizeof(char));
+ values[4] = (char *) palloc0(1024 * sizeof(char));
+ values[5] = (char *) palloc0(1024 * sizeof(char));
+
+ values[6] = (char *) palloc(64 * sizeof(char));
+ values[7] = (char *) palloc(64 * sizeof(char));
+ values[8] = (char *) palloc(64 * sizeof(char));
+
+ /* we need to do this only when printing the actual values */
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * histogram->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * histogram->ndimensions);
+
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* bucket ID */
+
+ /*
+ * Print the boundaries either as actual values (otype 0), as indexes
+ * into the deduplicated arrays (otype 1), or as indexes normalized
+ * into [0,1] (otype 2). The deduplicated values are sorted, so even
+ * the indexes give a good idea of the bucket layout.
+ */
+
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bucket_size *= (bucket->max[i] - bucket->min[i]) * 1.0
+ / (histogram->nvalues[i]-1);
+
+ /* print the actual values, i.e. use output function etc. */
+ if (otype == 0)
+ {
+ Datum minval, maxval;
+ Datum minout, maxout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %s}";
+
+ minval = histogram->values[i][bucket->min[i]];
+ minout = FunctionCall1(&fmgrinfo[i], minval);
+
+ maxval = histogram->values[i][bucket->max[i]];
+ maxout = FunctionCall1(&fmgrinfo[i], maxval);
+
+ snprintf(buff, 1024, format, values[1], DatumGetPointer(minout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], DatumGetPointer(maxout));
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+ else if (otype == 1)
+ {
+ format = "%s, %d";
+ if (i == 0)
+ format = "{%s%d";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %d}";
+
+ snprintf(buff, 1024, format, values[1], bucket->min[i]);
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], bucket->max[i]);
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+ else
+ {
+ format = "%s, %f";
+ if (i == 0)
+ format = "{%s%f";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %f}";
+
+ snprintf(buff, 1024, format, values[1],
+ bucket->min[i] * 1.0 / (histogram->nvalues[i]-1));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2],
+ bucket->max[i] * 1.0 / (histogram->nvalues[i]-1));
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %s}";
+
+ snprintf(buff, 1024, format, values[3], bucket->nullsonly[i] ? "t" : "f");
+ strncpy(values[3], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[4], bucket->min_inclusive[i] ? "t" : "f");
+ strncpy(values[4], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[5], bucket->max_inclusive[i] ? "t" : "f");
+ strncpy(values[5], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[6], 64, "%f", bucket->ntuples); /* frequency */
+ snprintf(values[7], 64, "%f", bucket->ntuples / bucket_size); /* density */
+ snprintf(values[8], 64, "%f", bucket_size); /* bucket_size */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+ pfree(values[4]);
+ pfree(values[5]);
+ pfree(values[6]);
+ pfree(values[7]);
+ pfree(values[8]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 01d29db..af3bd62 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2101,9 +2101,9 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, stakeys,\n"
- " deps_enabled, mcv_enabled,\n"
- " deps_built, mcv_built,\n"
- " mcv_max_items,\n"
+ " deps_enabled, mcv_enabled, hist_enabled,\n"
+ " deps_built, mcv_built, hist_built,\n"
+ " mcv_max_items, hist_max_buckets,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
@@ -2141,8 +2141,17 @@ describeOneTableDetails(const char *schemaname,
first = false;
}
+ if (!strcmp(PQgetvalue(result, i, 4), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", histogram");
+ else
+ appendPQExpBuffer(&buf, "(histogram");
+ first = false;
+ }
+
appendPQExpBuffer(&buf, ") ON (%s)",
- PQgetvalue(result, i, 8));
+ PQgetvalue(result, i, 10));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index c6e7d74..84579da 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -36,13 +36,16 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
+ bool hist_enabled; /* build histogram? */
- /* MCV size */
+ /* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
+ int32 hist_max_buckets; /* max histogram buckets */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
+ bool hist_built; /* histogram was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -50,6 +53,7 @@ CATALOG(pg_mv_statistic,3381)
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
+ bytea stahist; /* MV histogram (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -65,15 +69,19 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_attrdef
* ----------------
*/
-#define Natts_pg_mv_statistic 9
+#define Natts_pg_mv_statistic 13
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_deps_enabled 2
#define Anum_pg_mv_statistic_mcv_enabled 3
-#define Anum_pg_mv_statistic_mcv_max_items 4
-#define Anum_pg_mv_statistic_deps_built 5
-#define Anum_pg_mv_statistic_mcv_built 6
-#define Anum_pg_mv_statistic_stakeys 7
-#define Anum_pg_mv_statistic_stadeps 8
-#define Anum_pg_mv_statistic_stamcv 9
+#define Anum_pg_mv_statistic_hist_enabled 4
+#define Anum_pg_mv_statistic_mcv_max_items 5
+#define Anum_pg_mv_statistic_hist_max_buckets 6
+#define Anum_pg_mv_statistic_deps_built 7
+#define Anum_pg_mv_statistic_mcv_built 8
+#define Anum_pg_mv_statistic_hist_built 9
+#define Anum_pg_mv_statistic_stakeys 10
+#define Anum_pg_mv_statistic_stadeps 11
+#define Anum_pg_mv_statistic_stamcv 12
+#define Anum_pg_mv_statistic_stahist 13
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 890c763..1d451f6 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2743,6 +2743,10 @@ DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f
DESCR("multi-variate statistics: MCV list info");
DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
DESCR("details about MCV list items");
+DATA(insert OID = 3375 ( pg_mv_stats_histogram_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_histogram_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: histogram info");
+DATA(insert OID = 3374 ( pg_mv_histogram_buckets PGNSP PGUID 12 1 1000 0 0 f f f f t t i 2 0 2249 "26 23" "{26,23,23,1009,1009,1000,1000,1000,701,701,701}" "{i,i,o,o,o,o,o,o,o,o,o}" "{oid,otype,index,minvals,maxvals,nullsonly,mininclusive,maxinclusive,frequency,density,bucket_size}" _null_ _null_ pg_mv_histogram_buckets _null_ _null_ _null_ ));
+DESCR("details about histogram buckets");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 917ae8d..abf5815 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -573,10 +573,12 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
bool mcv_enabled; /* MCV list enabled */
+ bool hist_enabled; /* histogram enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
bool mcv_built; /* MCV list built */
+ bool hist_built; /* histogram built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index b028192..70f79ed 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -91,6 +91,123 @@ typedef MCVListData *MCVList;
#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
/*
+ * Multivariate histograms
+ */
+typedef struct MVBucketData {
+
+ /* Frequencies of this bucket. */
+ float ntuples; /* frequency of tuples in this bucket */
+
+ /*
+ * Information about dimensions being NULL-only.
+ */
+ bool *nullsonly;
+
+ /* lower boundaries - values and information about the inequalities */
+ Datum *min;
+ bool *min_inclusive;
+
+ /* upper boundaries - values and information about the inequalities */
+ Datum *max;
+ bool *max_inclusive;
+
+ /* used when building the histogram (not serialized/deserialized) */
+ void *build_data;
+
+} MVBucketData;
+
+typedef MVBucketData *MVBucket;
+
+
+typedef struct MVHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ MVBucket *buckets; /* array of buckets */
+
+} MVHistogramData;
+
+typedef MVHistogramData *MVHistogram;
+
+/*
+ * Histogram in a partially serialized form, with deduplicated boundary
+ * values etc.
+ *
+ * TODO add more detailed description here
+ */
+
+typedef struct MVSerializedBucketData {
+
+ /* Frequencies of this bucket. */
+ float ntuples; /* frequency of tuples in this bucket */
+
+ /*
+ * Information about dimensions being NULL-only.
+ */
+ bool *nullsonly;
+
+ /* indexes of lower boundaries, and information about the
+ * inequalities (exclusive vs. inclusive) */
+ uint16 *min;
+ bool *min_inclusive;
+
+ /* indexes of upper boundaries - values and information about the
+ * inequalities (exclusive vs. inclusive) */
+ uint16 *max;
+ bool *max_inclusive;
+
+} MVSerializedBucketData;
+
+typedef MVSerializedBucketData *MVSerializedBucket;
+
+typedef struct MVSerializedHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ /*
+ * keep this in sync with MVHistogramData, because deserialization
+ * relies on the fields being at the same offsets
+ */
+ MVSerializedBucket *buckets; /* array of buckets */
+
+ /*
+ * serialized boundary values, one array per dimension, deduplicated
+ * (the min/max indexes point into these arrays)
+ */
+ int *nvalues;
+ Datum **values;
+
+} MVSerializedHistogramData;
+
+typedef MVSerializedHistogramData *MVSerializedHistogram;
+
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_HIST_MAGIC 0x7F8C5670 /* marks serialized bytea */
+#define MVSTAT_HIST_TYPE_BASIC 1 /* basic histogram type */
+
+/*
+ * Limits used for max_buckets option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_HIST_MIN_BUCKETS, and we cannot
+ * have more than MVSTAT_HIST_MAX_BUCKETS buckets.
+ *
+ * This is just a boundary for the 'max' threshold - the actual
+ * histogram may use fewer buckets than MVSTAT_HIST_MAX_BUCKETS.
+ *
+ * TODO The MVSTAT_HIST_MIN_BUCKETS should be related to the number of
+ * attributes (MVSTATS_MAX_DIMENSIONS) because of NULL-buckets.
+ * There should be at least 2^N buckets, otherwise we may be unable
+ * to build the NULL buckets.
+ */
+#define MVSTAT_HIST_MIN_BUCKETS 128 /* min number of buckets */
+#define MVSTAT_HIST_MAX_BUCKETS 16384 /* max number of buckets */
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
@@ -98,20 +215,25 @@ typedef MCVListData *MCVList;
MVDependencies load_mv_dependencies(Oid mvoid);
MCVList load_mv_mcvlist(Oid mvoid);
+MVSerializedHistogram load_mv_histogram(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
VacAttrStats **stats);
+bytea * serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
MCVList deserialize_mv_mcvlist(bytea * data);
+MVSerializedHistogram deserialize_mv_histogram(bytea * data);
/*
* Returns index of the attribute number within the vector (i.e. a
* dimension within the stats).
*/
int mv_get_index(AttrNumber varattno, int2vector * stakeys);
int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
@@ -120,6 +242,8 @@ extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_mcvlist_items(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_histogram_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_histogram_buckets(PG_FUNCTION_ARGS);
MVDependencies
build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
@@ -129,10 +253,15 @@ MCVList
build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int *numrows_filtered);
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total);
+
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+void update_mv_stats(Oid relid, MVDependencies dependencies,
+ MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats);
#endif
diff --git a/src/test/regress/expected/mv_histogram.out b/src/test/regress/expected/mv_histogram.out
new file mode 100644
index 0000000..a3d3fd8
--- /dev/null
+++ b/src/test/regress/expected/mv_histogram.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (unknown_column);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, a);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, a, b);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+ALTER TABLE mv_histogram ADD STATISTICS (unknown_option) ON (a, b, c);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing histogram statistics
+ALTER TABLE mv_histogram ADD STATISTICS (dependencies, max_buckets 200) ON (a, b, c);
+ERROR: option 'histogram' is required by other option(s)
+-- invalid max_buckets value / too low
+ALTER TABLE mv_histogram ADD STATISTICS (histogram, max_buckets 10) ON (a, b, c);
+ERROR: minimum number of buckets is 128
+-- invalid max_buckets value / too high
+ALTER TABLE mv_histogram ADD STATISTICS (histogram, max_buckets 100000) ON (a, b, c);
+ERROR: maximum number of buckets is 16384
+-- correct command
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c, d);
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+DROP TABLE mv_histogram;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index faa41c7..e230e58 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1369,7 +1369,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
length(s.stadeps) AS depsbytes,
pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
length(s.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo,
+ length(s.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(s.stahist) AS histinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index d083442..8715d17 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,4 +112,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies mv_mcv
+test: mv_dependencies mv_mcv mv_histogram
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index e63b7aa..6b9ed27 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -158,3 +158,4 @@ test: stats
test: tablesample
test: mv_dependencies
test: mv_mcv
+test: mv_histogram
diff --git a/src/test/regress/sql/mv_histogram.sql b/src/test/regress/sql/mv_histogram.sql
new file mode 100644
index 0000000..31c627a
--- /dev/null
+++ b/src/test/regress/sql/mv_histogram.sql
@@ -0,0 +1,176 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (unknown_column);
+
+-- single column
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a);
+
+-- single column, duplicated
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, a);
+
+-- two columns, one duplicated
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, a, b);
+
+-- unknown option
+ALTER TABLE mv_histogram ADD STATISTICS (unknown_option) ON (a, b, c);
+
+-- missing histogram statistics
+ALTER TABLE mv_histogram ADD STATISTICS (dependencies, max_buckets 200) ON (a, b, c);
+
+-- invalid max_buckets value / too low
+ALTER TABLE mv_histogram ADD STATISTICS (histogram, max_buckets 10) ON (a, b, c);
+
+-- invalid max_buckets value / too high
+ALTER TABLE mv_histogram ADD STATISTICS (histogram, max_buckets 100000) ON (a, b, c);
+
+-- correct command
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+
+DROP TABLE mv_histogram;
+
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mv_histogram;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+ALTER TABLE mv_histogram ADD STATISTICS (histogram) ON (a, b, c, d);
+
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+DROP TABLE mv_histogram;
--
1.9.3
0006-multi-statistics-estimation-v7.patch
From a9df974e90067f68ea106e89a08ebc887412b5b5 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Fri, 6 Feb 2015 01:42:38 +0100
Subject: [PATCH 6/6] multi-statistics estimation
The general idea is that a probability (which
is what selectivity is) can be split into a product of
conditional probabilities like this:
P(A & B & C) = P(A & B) * P(C|A & B)
If we assume that C and B are conditionally independent
(given A), the last part may be simplified like this
P(A & B & C) = P(A & B) * P(C|A)
so we only need probabilities on [A,B] and [C,A] to compute
the original probability.
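For example, with P(A & B) = 0.01 and P(C|A) = 0.5 this
gives 0.01 * 0.5 = 0.005, while the plain independence
assumption would simply multiply P(A), P(B) and P(C),
ignoring the correlation entirely.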
The implementation works in the other direction, though.
We know what probability P(A & B & C) we need to compute,
and also what statistics are available.
So we search for a combination of statistics, covering
the clauses in an optimal way (most clauses covered, most
dependencies exploited).
There are two possible approaches - exhaustive and greedy.
The exhaustive one walks through all permutations of the
stats (a recursive backtracking search), so it's guaranteed
to find the optimal solution, but it soon gets very slow,
as it's roughly O(N!). Pruning dead branches early improves
that a bit, but it's still far too expensive for large
numbers of statistics (on a single table).
The greedy algorithm is very simple - at every step it
picks the statistics that looks best locally. That may not
guarantee the best solution globally (but maybe it does?),
but it only needs N steps to find a solution, so it's very
fast (processing the selected stats is usually way more
expensive).
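To illustrate the difference: with 8 statistics on a table
the exhaustive search may walk through up to 8! = 40320
orderings, while the greedy one picks one of the remaining
statistics in each of its (at most) 8 steps.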
There's a GUC for selecting the search algorithm
mvstat_search = {'greedy', 'exhaustive'}
The default value is 'greedy' as that's much safer (with
respect to runtime). See choose_mv_statistics().
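For example, to try the exhaustive search in a session
(assuming the GUC behaves like the other planner GUCs and
can be changed with a plain SET):
SET mvstat_search = 'exhaustive';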
Once we have found a sequence of statistics, we apply
them to the clauses using the conditional probabilities.
We process the selected stats one by one, and for each
we select the estimated clauses and conditions. See
clauselist_selectivity() for more details.
Limitations
-----------
It's still true that each clause at a given level has to
be covered by a single MV statistics. So with this query
WHERE (clause1) AND (clause2) AND (clause3 OR clause4)
each parenthesized clause has to be covered by a single
multivariate statistics.
Clauses not covered by a single statistics at this level
will be passed to clause_selectivity() but this will treat
them as a collection of simpler clauses (connected by AND
or OR), and the clauses from the previous level will be
used as conditions.
So using the same example, the last clause will be passed
to clause_selectivity() with 'clause1' and 'clause2' as
conditions, and it will be processed using multivariate
stats if possible.
The other limitation is that all the expressions have to
be mv-compatible, i.e. a clause can't mix mv-compatible and
incompatible expressions. If this is violated, the clause
may be passed to the next level (just like a list of clauses
not covered by a single statistics), which splits it into
clauses handled by multivariate stats and clauses handled
by regular statistics.
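For example a clause like
WHERE (a = 1 OR f(b) < 10)
presumably mixes an mv-compatible expression (a = 1) with
an incompatible one (the function call), and so has to be
estimated the old way.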
---
contrib/file_fdw/file_fdw.c | 3 +-
contrib/postgres_fdw/postgres_fdw.c | 6 +-
src/backend/optimizer/path/clausesel.c | 2151 +++++++++++++++++++++++++++++---
src/backend/optimizer/path/costsize.c | 23 +-
src/backend/optimizer/util/orclauses.c | 4 +-
src/backend/utils/adt/selfuncs.c | 17 +-
src/backend/utils/misc/guc.c | 20 +
src/include/optimizer/cost.h | 6 +-
src/include/utils/mvstats.h | 8 +
9 files changed, 2016 insertions(+), 222 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 499f24f..0d7d2e7 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -949,7 +949,8 @@ estimate_size(PlannerInfo *root, RelOptInfo *baserel,
baserel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 6da01e1..bd487c5 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -479,7 +479,8 @@ postgresGetForeignRelSize(PlannerInfo *root,
fpinfo->local_conds,
baserel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
@@ -1785,7 +1786,8 @@ estimate_path_cost_size(PlannerInfo *root,
local_join_conds,
baserel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
local_sel *= fpinfo->local_conds_sel;
rows = clamp_row_est(rows * local_sel);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index bc02e92..fce77ec 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -29,6 +29,8 @@
#include "utils/selfuncs.h"
#include "utils/typcache.h"
+#include "miscadmin.h"
+
/*
* Data structure for accumulating info about possible range-query
@@ -44,6 +46,13 @@ typedef struct RangeQueryClause
Selectivity hibound; /* Selectivity of a var < something clause */
} RangeQueryClause;
+static Selectivity clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
+
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
@@ -59,23 +68,29 @@ static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo,
int type);
+static Bitmapset *clause_mv_get_attnums(PlannerInfo *root, Node *clause);
+
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Oid varRelid, List *stats,
SpecialJoinInfo *sjinfo);
-static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
-
static List *clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
List *clauses, Oid varRelid,
List **mvclauses, MVStatisticInfo *mvstats, int types);
static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats, List *clauses,
+ List *conditions, bool is_or);
+
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats,
- bool *fullmatch, Selectivity *lowsel);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or, bool *fullmatch,
+ Selectivity *lowsel);
static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -89,11 +104,59 @@ static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
int nmatches, char * matches,
bool is_or);
+/*
+ * Describes a combination of multiple statistics, used to cover the
+ * attributes referenced by the clauses. The array 'stats' (with
+ * nstats elements) lists the selected statistics as indexes, in the
+ * order they are applied, along with the number of clauses and
+ * conditions covered by the solution. For example nstats = 2 with
+ * stats = {3, 0} means statistics 3 is applied first, then
+ * statistics 0.
+ *
+ * choose_mv_statistics_exhaustive() uses this to track both the
+ * current and the best solutions, while walking through the space
+ * of possible combinations.
+ */
+typedef struct mv_solution_t {
+ int nclauses; /* number of clauses covered */
+ int nconditions; /* number of conditions covered */
+ int nstats; /* number of stats applied */
+ int *stats; /* stats (in the apply order) */
+} mv_solution_t;
+
+static List *choose_mv_statistics(PlannerInfo *root,
+ List *mvstats,
+ List *clauses, List *conditions,
+ Oid varRelid,
+ SpecialJoinInfo *sjinfo);
+
+static List *filter_clauses(PlannerInfo *root, Oid varRelid,
+ SpecialJoinInfo *sjinfo, int type,
+ List *stats, List *clauses,
+ Bitmapset **attnums);
+
+static List *filter_stats(List *stats, Bitmapset *new_attnums,
+ Bitmapset *all_attnums);
+
+static Bitmapset **make_stats_attnums(MVStatisticInfo *mvstats,
+ int nmvstats);
+
+static MVStatisticInfo *make_stats_array(List *stats, int *nmvstats);
+
+static List* filter_redundant_stats(List *stats,
+ List *clauses, List *conditions);
+
+static Node** make_clauses_array(List *clauses, int *nclauses);
+
+static Bitmapset ** make_clauses_attnums(PlannerInfo *root, Oid varRelid,
+ SpecialJoinInfo *sjinfo, int type,
+ Node **clauses, int nclauses);
+
+static bool* make_cover_map(Bitmapset **stats_attnums, int nmvstats,
+ Bitmapset **clauses_attnums, int nclauses);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, List *clauses,
Oid varRelid, Index *relid);
-
+
static Bitmapset* fdeps_collect_attnums(List *stats);
static int *make_idx_to_attnum_mapping(Bitmapset *attnums);
@@ -116,6 +179,8 @@ static Bitmapset *fdeps_filter_clauses(PlannerInfo *root,
static Bitmapset * get_varattnos(Node * node, Index relid);
+int mvstat_search_type = MVSTAT_SEARCH_GREEDY;
+
/* used for merging bitmaps - AND (min), OR (max) */
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
#define MIN(x, y) (((x) < (y)) ? (x) : (y))
@@ -256,14 +321,15 @@ clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 1.0;
RangeQueryClause *rqlist = NULL;
ListCell *l;
/* processing mv stats */
- Oid relid = InvalidOid;
+ Index relid = InvalidOid;
/* attributes in mv-compatible clauses */
Bitmapset *mvattnums = NULL;
@@ -273,12 +339,13 @@ clauselist_selectivity(PlannerInfo *root,
stats = find_stats(root, clauses, varRelid, &relid);
/*
- * If there's exactly one clause, then no use in trying to match up pairs,
- * so just go directly to clause_selectivity().
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, or matching multivariate statistics, so just go directly
+ * to clause_selectivity().
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
- varRelid, jointype, sjinfo);
+ varRelid, jointype, sjinfo, conditions);
/*
* Check that there are some stats with functional dependencies
@@ -310,8 +377,8 @@ clauselist_selectivity(PlannerInfo *root,
}
/*
- * Check that there are statistics with MCV list. If not, we don't
- * need to waste time with the optimization.
+ * Check that there are statistics with MCV list or histogram.
+ * If not, we don't need to waste time with the optimization.
*/
if (has_stats(stats, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
{
@@ -325,33 +392,194 @@ clauselist_selectivity(PlannerInfo *root,
/*
* If there still are at least two columns, we'll try to select
- * a suitable multivariate stats.
+ * a suitable combination of multivariate stats. If there are
+ * multiple combinations, we'll try to choose the best one.
+ * See choose_mv_statistics for more details.
*/
if (bms_num_members(mvattnums) >= 2)
{
- /* see choose_mv_statistics() for details */
- MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+ int k;
+ ListCell *s;
- if (mvstat != NULL) /* we have a matching stats */
+ /*
+ * Copy the list of conditions, so that we can build a list
+ * of local conditions (and keep the original intact, for
+ * the other clauses at the same level).
+ */
+ List *conditions_local = list_copy(conditions);
+
+ /* find the best combination of statistics */
+ List *solution = choose_mv_statistics(root, stats,
+ clauses, conditions,
+ varRelid, sjinfo);
+
+ /* we have a good solution (list of stats) */
+ foreach (s, solution)
{
+ MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
+
/* clauses compatible with multi-variate stats */
List *mvclauses = NIL;
+ List *mvclauses_new = NIL;
+ List *mvclauses_conditions = NIL;
+ Bitmapset *stat_attnums = NULL;
- /* split the clauselist into regular and mv-clauses */
- clauses = clauselist_mv_split(root, sjinfo, clauses,
+ /* build attnum bitmapset for this statistics */
+ for (k = 0; k < mvstat->stakeys->dim1; k++)
+ stat_attnums = bms_add_member(stat_attnums,
+ mvstat->stakeys->values[k]);
+
+ /*
+ * Append the compatible conditions (passed from above)
+ * to mvclauses_conditions.
+ */
+ foreach (l, conditions)
+ {
+ Node *c = (Node*)lfirst(l);
+ Bitmapset *tmp = clause_mv_get_attnums(root, c);
+
+ if (bms_is_subset(tmp, stat_attnums))
+ mvclauses_conditions
+ = lappend(mvclauses_conditions, c);
+
+ bms_free(tmp);
+ }
+
+ /* split the clauselist into regular and mv-clauses
+ *
+ * We keep the list of clauses (we don't remove the
+ * clauses yet, because we want to use the clauses
+ * as conditions of other clauses).
+ *
+ * FIXME Do this only once, i.e. filter the clauses
+ * once (selecting clauses covered by at least
+ * one statistics) and then convert them into
+ * smaller per-statistics lists of conditions
+ * and estimated clauses.
+ */
+ clauselist_mv_split(root, sjinfo, clauses,
varRelid, &mvclauses, mvstat,
(MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
- /* we've chosen the histogram to match the clauses */
+ /*
+ * We've chosen the statistics to match the clauses, so
+ * each statistics from the solution should have at least
+ * one new clause (not covered by the previous stats).
+ */
Assert(mvclauses != NIL);
+ /*
+ * Mvclauses now contains only clauses compatible
+ * with the currently selected stats, but we have to
+ * split that into conditions (already matched by
+ * the previous stats), and the new clauses we need
+ * to estimate using this stats.
+ */
+ foreach (l, mvclauses)
+ {
+ ListCell *p;
+ bool covered = false;
+ Node *clause = (Node *) lfirst(l);
+ Bitmapset *clause_attnums = clause_mv_get_attnums(root, clause);
+
+ /*
+ * If already covered by previous stats, add it to
+ * conditions.
+ *
+ * TODO Maybe this could be relaxed a bit? Because
+ * with complex and/or clauses, this might
+ * mean no statistics actually covers such
+ * complex clause.
+ */
+ foreach (p, solution)
+ {
+ int k;
+ Bitmapset *stat_attnums = NULL;
+
+ MVStatisticInfo *prev_stat
+ = (MVStatisticInfo *)lfirst(p);
+
+ /* break if we've run into the current statistics */
+ if (prev_stat == mvstat)
+ break;
+
+ for (k = 0; k < prev_stat->stakeys->dim1; k++)
+ stat_attnums = bms_add_member(stat_attnums,
+ prev_stat->stakeys->values[k]);
+
+ covered = bms_is_subset(clause_attnums, stat_attnums);
+
+ bms_free(stat_attnums);
+
+ if (covered)
+ break;
+ }
+
+ if (covered)
+ mvclauses_conditions
+ = lappend(mvclauses_conditions, clause);
+ else
+ mvclauses_new
+ = lappend(mvclauses_new, clause);
+ }
+
+ /*
+ * We need at least one new clause (not just conditions).
+ */
+ Assert(mvclauses_new != NIL);
+
/* compute the multivariate stats */
- s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ s1 *= clauselist_mv_selectivity(root, mvstat,
+ mvclauses_new,
+ mvclauses_conditions,
+ false); /* AND */
+ }
+
+ /*
+ * And now finally remove all the mv-compatible clauses.
+ *
+ * This only repeats the same split as above, but this
+ * time we actually use the result list (and feed it to
+ * the next call).
+ */
+ foreach (s, solution)
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
+
+ /* split the list into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, sjinfo, clauses,
+ varRelid, &mvclauses, mvstat,
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+
+ /*
+ * Add the clauses to the conditions (to be passed
+ * to regular clauses), irrespective of whether they
+ * are used as a condition or a clause here.
+ *
+ * We only keep the remaining clauses (what
+ * clauselist_mv_split returns), so we add each MV
+ * condition exactly once.
+ */
+ conditions_local = list_concat(conditions_local, mvclauses);
}
+
+ /* from now on, work with the 'local' list of conditions */
+ conditions = conditions_local;
}
}
/*
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, so just go directly to clause_selectivity().
+ */
+ if (list_length(clauses) == 1)
+ return clause_selectivity(root, (Node *) linitial(clauses),
+ varRelid, jointype, sjinfo, conditions);
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -363,7 +591,8 @@ clauselist_selectivity(PlannerInfo *root,
Selectivity s2;
/* Always compute the selectivity using clause_selectivity */
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo);
+ s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo,
+ conditions);
/*
* Check for being passed a RestrictInfo.
@@ -522,6 +751,253 @@ clauselist_selectivity(PlannerInfo *root,
}
/*
+ * Similar to clauselist_selectivity(), but for clauses connected by OR.
+ *
+ * That means a few differences:
+ *
+ * - functional dependencies don't apply to OR-clauses
+ *
+ * - we can't add the previous clauses to conditions
+ *
+ * - selectivities are combined using (s1+s2 - s1*s2)
+ * and not as a multiplication (s1*s2)
+ *
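+ * For example, with s1 = 0.3 and s2 = 0.5 the OR formula
+ * gives 0.3 + 0.5 - 0.3 * 0.5 = 0.65, accounting for rows
+ * matched by both clauses.
+ *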
+ * Another way to evaluate this might be turning
+ *
+ * (a OR b OR c)
+ *
+ * into
+ *
+ * NOT ((NOT a) AND (NOT b) AND (NOT c))
+ *
+ * and computing selectivity of that using clauselist_selectivity().
+ * That would allow (a) using the clauselist_selectivity directly and
+ * (b) using the previous clauses as conditions. Not sure if it's
+ * worth the additional complexity, though.
+ */
+static Selectivity
+clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
+{
+ Selectivity s1 = 0.0;
+ ListCell *l;
+
+ /* processing mv stats */
+ Index relid = InvalidOid;
+
+ /* attributes in mv-compatible clauses */
+ Bitmapset *mvattnums = NULL;
+ List *stats = NIL;
+
+ /* use clauses (not conditions), because those are always non-empty */
+ stats = find_stats(root, clauses, varRelid, &relid);
+
+ /* OR-clauses do not work with functional dependencies */
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
+ {
+ /*
+ * Recollect attributes from mv-compatible clauses (maybe we've
+ * removed so many clauses we have a single mv-compatible attnum).
+ * From now on we're only interested in clauses compatible
+ * with MCV lists or histograms.
+ */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+
+ /*
+ * If there still are at least two columns, we'll try to select
+ * a suitable multivariate stats.
+ */
+ if (bms_num_members(mvattnums) >= 2)
+ {
+ int k;
+ ListCell *s;
+
+ List *solution
+ = choose_mv_statistics(root, stats,
+ clauses, conditions,
+ varRelid, sjinfo);
+
+ /* we have a good solution stats */
+ foreach (s, solution)
+ {
+ Selectivity s2;
+ MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
+
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+ List *mvclauses_new = NIL;
+ List *mvclauses_conditions = NIL;
+ Bitmapset *stat_attnums = NULL;
+
+ /* build attnum bitmapset for this statistics */
+ for (k = 0; k < mvstat->stakeys->dim1; k++)
+ stat_attnums = bms_add_member(stat_attnums,
+ mvstat->stakeys->values[k]);
+
+ /*
+ * Append the compatible conditions (passed from above)
+ * to mvclauses_conditions.
+ */
+ foreach (l, conditions)
+ {
+ Node *c = (Node*)lfirst(l);
+ Bitmapset *tmp = clause_mv_get_attnums(root, c);
+
+ if (bms_is_subset(tmp, stat_attnums))
+ mvclauses_conditions
+ = lappend(mvclauses_conditions, c);
+
+ bms_free(tmp);
+ }
+
+ /* split the clauselist into regular and mv-clauses
+ *
+ * We keep the list of clauses (we don't remove the
+ * clauses yet, because we want to use the clauses
+ * as conditions of other clauses).
+ *
+ * FIXME Do this only once, i.e. filter the clauses
+ * once (selecting clauses covered by at least
+ * one statistics) and then convert them into
+ * smaller per-statistics lists of conditions
+ * and estimated clauses.
+ */
+ clauselist_mv_split(root, sjinfo, clauses,
+ varRelid, &mvclauses, mvstat,
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+
+ /*
+ * We've chosen the statistics to match the clauses, so
+ * each statistics from the solution should have at least
+ * one new clause (not covered by the previous stats).
+ */
+ Assert(mvclauses != NIL);
+
+ /*
+ * Mvclauses now contains only clauses compatible
+ * with the currently selected stats, but we have to
+ * split that into conditions (already matched by
+ * the previous stats), and the new clauses we need
+ * to estimate using this stats.
+ *
+ * XXX We'll only use the new clauses, but maybe we
+ * should use the conditions too, somehow. We can't
+ * use that directly in conditional probability, but
+ * maybe we might use them in a different way?
+ *
+ * If we have a clause (a OR b OR c), then knowing
+ * that 'a' is TRUE means (b OR c) can't make the
+ * whole clause FALSE.
+ *
+ * This is pretty much what
+ *
+ * (a OR b) == NOT ((NOT a) AND (NOT b))
+ *
+ * implies.
+ */
+ foreach (l, mvclauses)
+ {
+ ListCell *p;
+ bool covered = false;
+ Node *clause = (Node *) lfirst(l);
+ Bitmapset *clause_attnums = clause_mv_get_attnums(root, clause);
+
+ /*
+ * If already covered by previous stats, skip it here
+ * (see the XXX above - for OR clauses we currently use
+ * only the new clauses, not the covered ones).
+ *
+ * TODO Maybe this could be relaxed a bit? Because
+ * with complex and/or clauses, this might
+ * mean no statistics actually covers such
+ * complex clause.
+ */
+ foreach (p, solution)
+ {
+ int k;
+ Bitmapset *stat_attnums = NULL;
+
+ MVStatisticInfo *prev_stat
+ = (MVStatisticInfo *)lfirst(p);
+
+ /* break if we've run into the current statistics */
+ if (prev_stat == mvstat)
+ break;
+
+ for (k = 0; k < prev_stat->stakeys->dim1; k++)
+ stat_attnums = bms_add_member(stat_attnums,
+ prev_stat->stakeys->values[k]);
+
+ covered = bms_is_subset(clause_attnums, stat_attnums);
+
+ bms_free(stat_attnums);
+
+ if (covered)
+ break;
+ }
+
+ if (! covered)
+ mvclauses_new = lappend(mvclauses_new, clause);
+ }
+
+ /*
+ * We need at least one new clause (not just conditions).
+ */
+ Assert(mvclauses_new != NIL);
+
+ /* compute the multivariate stats */
+ s2 = clauselist_mv_selectivity(root, mvstat,
+ mvclauses_new,
+ mvclauses_conditions,
+ true); /* OR */
+
+ s1 = s1 + s2 - s1 * s2;
+ }
+
+ /*
+ * And now finally remove all the mv-compatible clauses.
+ *
+ * This only repeats the same split as above, but this
+ * time we actually use the result list (and feed it to
+ * the next call).
+ */
+ foreach (s, solution)
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
+
+ /* split the list into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, sjinfo, clauses,
+ varRelid, &mvclauses, mvstat,
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+ }
+ }
+ }
+
+ /*
+ * Handle the remaining clauses (either using regular statistics,
+ * or by multivariate stats at the next level).
+ */
+ foreach(l, clauses)
+ {
+ Selectivity s2 = clause_selectivity(root,
+ (Node *) lfirst(l),
+ varRelid,
+ jointype,
+ sjinfo,
+ conditions);
+ s1 = s1 + s2 - s1 * s2;
+ }
+
+ return s1;
+}
+
+/*
* addRangeClause --- add a new range clause for clauselist_selectivity
*
* Here is where we try to match up pairs of range-query clauses
@@ -728,7 +1204,8 @@ clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 0.5; /* default for any unhandled clause type */
RestrictInfo *rinfo = NULL;
@@ -858,7 +1335,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) get_notclausearg((Expr *) clause),
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (and_clause(clause))
{
@@ -867,29 +1345,18 @@ clause_selectivity(PlannerInfo *root,
((BoolExpr *) clause)->args,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (or_clause(clause))
{
- /*
- * Selectivities for an OR clause are computed as s1+s2 - s1*s2 to
- * account for the probable overlap of selected tuple sets.
- *
- * XXX is this too conservative?
- */
- ListCell *arg;
-
- s1 = 0.0;
- foreach(arg, ((BoolExpr *) clause)->args)
- {
- Selectivity s2 = clause_selectivity(root,
- (Node *) lfirst(arg),
- varRelid,
- jointype,
- sjinfo);
-
- s1 = s1 + s2 - s1 * s2;
- }
+ /* just call to clauselist_selectivity_or() */
+ s1 = clauselist_selectivity_or(root,
+ ((BoolExpr *) clause)->args,
+ varRelid,
+ jointype,
+ sjinfo,
+ conditions);
}
else if (is_opclause(clause) || IsA(clause, DistinctExpr))
{
@@ -998,7 +1465,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((RelabelType *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (IsA(clause, CoerceToDomain))
{
@@ -1007,7 +1475,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((CoerceToDomain *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
/* Cache the result if possible */
@@ -1120,9 +1589,67 @@ clause_selectivity(PlannerInfo *root,
* them without inspection, which is more expensive). But this
* requires really knowing the per-clause selectivities in advance,
* and that's not what we do now.
+ *
+ * TODO All this is based on the assumption that the statistics represent
+ * the necessary dependencies, i.e. that if two columns are not in
+ * the same statistics, there's no dependency. If that's not the
+ * case, we may get misestimates, just like before. For example
+ * assume we have a table with three columns [a,b,c] with exactly
+ * the same values, and statistics on [a,b] and [b,c]. So something
+ * like this:
+ *
+ * CREATE TABLE test AS SELECT i AS a, i AS b, i AS c
+ * FROM generate_series(1,1000) s(i);
+ *
+ * ALTER TABLE test ADD STATISTICS (mcv) ON (a,b);
+ * ALTER TABLE test ADD STATISTICS (mcv) ON (b,c);
+ *
+ * ANALYZE test;
+ *
+ * EXPLAIN ANALYZE SELECT * FROM test
+ * WHERE (a < 10) AND (b < 20) AND (c < 10);
+ *
+ * The problem here is that the only shared column between the two
+ * statistics is 'b' so the probability will be computed like this
+ *
+ * P[(a < 10) & (b < 20) & (c < 10)]
+ * = P[(a < 10) & (b < 20)] * P[(c < 10) | (a < 10) & (b < 20)]
+ * = P[(a < 10) & (b < 20)] * P[(c < 10) | (b < 20)]
+ *
+ * or like this
+ *
+ * P[(a < 10) & (b < 20) & (c < 10)]
+ * = P[(b < 20) & (c < 10)] * P[(a < 10) | (b < 20) & (c < 10)]
+ * = P[(b < 20) & (c < 10)] * P[(a < 10) | (b < 20)]
+ *
+ * In both cases the conditional probabilities will be evaluated as
+ * 0.5, because they lack the other column (which would make it 1.0).
+ *
+ * Theoretically it might be possible to transfer the dependency,
+ * e.g. by building bitmap for [a,b] and then combine it with [b,c]
+ * by doing something like this:
+ *
+ * 1) build bitmap on [a,b] using [(a<10) & (b < 20)]
+ * 2) for each element in [b,c] check the bitmap
+ *
+ * But that's certainly nontrivial - for example the statistics may
+ * be different (MCV list vs. histogram) and/or the items may not
+ * match (e.g. MCV items or histogram buckets will be built
+ * differently). Also, for one value of 'b' there might be multiple
+ * MCV items (because of the other column values) with different
+ * bitmap values (some will match, some won't) - so it's not exactly
+ * bitmap but a partial match.
+ *
+ * Maybe a hash table with number of matches and mismatches (or
+ * maybe sums of frequencies) would work? The step (2) would then
+ * lookup the values and use that to weight the item somehow.
+ *
+ * Currently the only solution is to build statistics on all three
+ * columns.
*/
static Selectivity
-clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+clauselist_mv_selectivity(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
bool fullmatch = false;
Selectivity s1 = 0.0, s2 = 0.0;
@@ -1140,7 +1667,8 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
*/
/* Evaluate the MCV first. */
- s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ s1 = clauselist_mv_selectivity_mcvlist(root, mvstats,
+ clauses, conditions, is_or,
&fullmatch, &mcv_low);
/*
@@ -1153,7 +1681,8 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
/* FIXME if (fullmatch) without matching MCV item, use the mcv_low
* selectivity as upper bound */
- s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+ s2 = clauselist_mv_selectivity_histogram(root, mvstats,
+ clauses, conditions, is_or);
/* TODO clamp to <= 1.0 (or more strictly, when possible) */
return s1 + s2;
@@ -1193,8 +1722,7 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
*/
if (bms_num_members(attnums) <= 1)
{
- if (attnums != NULL)
- pfree(attnums);
+ bms_free(attnums);
attnums = NULL;
*relid = InvalidOid;
}
@@ -1203,123 +1731,852 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
}
/*
- * We're looking for statistics matching at least 2 attributes,
- * referenced in the clauses compatible with multivariate statistics.
- * The current selection criteria is very simple - we choose the
- * statistics referencing the most attributes.
+ * Selects the best combination of multivariate statistics, in an
+ * exhaustive way, where 'best' means:
+ *
+ * (a) covering the most attributes (referenced by clauses)
+ * (b) using the least number of multivariate stats
+ * (c) using the most conditions to exploit dependency
+ *
+ * There may be other optimality criteria, not considered in the initial
+ * implementation (more on that in the 'weaknesses' section below).
+ *
+ * This pretty much splits the probability of clauses (aka selectivity)
+ * into a sequence of conditional probabilities, like this
+ *
+ * P(A,B,C,D) = P(A,B) * P(C|A,B) * P(D|A,B,C)
+ *
+ * and removing the attributes not referenced by the existing stats,
+ * under the assumption that there's no dependency (otherwise the DBA
+ * would create the stats).
+ *
+ * The last criterion means that when we have the choice to compute like
+ * this
+ *
+ * P(A,B,C,D) = P(A,B,C) * P(D|B,C)
*
- * If there are multiple statistics referencing the same number of
- * columns (from the clauses), the one with less source columns
- * (as listed in the ADD STATISTICS when creating the statistics) wins.
- * Other wise the first one wins.
+ * or like this
*
- * This is a very simple criteria, and has several weaknesses:
+ * P(A,B,C,D) = P(A,B,C) * P(D|C)
*
- * (a) does not consider the accuracy of the statistics
+ * we should use the first option, as that exploits more dependencies.
*
- * If there are two histograms built on the same set of columns,
- * but one has 100 buckets and the other one has 1000 buckets (thus
- * likely providing better estimates), this is not currently
- * considered.
+ * The order of statistics in the solution implicitly determines the
+ * order of estimation of clauses, because as we apply a statistics,
+ * we always use it to estimate all the clauses covered by it (and
+ * then we use those clauses as conditions for the next statistics).
*
- * (b) does not consider the type of statistics
+ * Don't call this directly but through choose_mv_statistics().
*
- * If there are three statistics - one containing just a MCV list,
- * another one with just a histogram and a third one with both,
- * this is not considered.
*
- * (c) does not consider the number of clauses
+ * Algorithm
+ * ---------
+ * The algorithm is a recursive implementation of backtracking, with
+ * maximum 'depth' equal to the number of multi-variate statistics
+ * available on the table.
*
- * As explained, only the number of referenced attributes counts,
- * so if there are multiple clauses on a single attribute, this
- * still counts as a single attribute.
+ * It explores all the possible permutations of the stats.
+ *
+ * Whenever it considers adding the next statistics, the clauses it
+ * matches are divided into 'conditions' (clauses already matched by at
+ * least one previous statistics) and clauses that are estimated.
*
- * (d) does not consider type of condition
+ * Then several checks are performed:
*
- * Some clauses may work better with some statistics - for example
- * equality clauses probably work better with MCV lists than with
- * histograms. But IS [NOT] NULL conditions may often work better
- * with histograms (thanks to NULL-buckets).
+ * (a) The statistics covers at least 2 columns, referenced in the
+ * estimated clauses (otherwise multi-variate stats are useless).
*
- * So for example with five WHERE conditions
+ * (b) The statistics covers at least 1 new column, i.e. column not
+ * referenced by the already used stats (and the new column has
+ * to be referenced by the clauses, of course). Otherwise the
+ * statistics would not add any new information.
*
- * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
+ * There are some other sanity checks (e.g. that the stats must not be
+ * used twice etc.).
*
- * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be
- * selected as it references the most columns.
+ * Finally the new solution is compared to the currently best one, and
+ * if it's considered better, it's used instead.
*
- * Once we have selected the multivariate statistics, we split the list
- * of clauses into two parts - conditions that are compatible with the
- * selected stats, and conditions are estimated using simple statistics.
*
- * From the example above, conditions
+ * Weaknesses
+ * ----------
+ * The current implementation uses somewhat simplistic optimality
+ * criteria, suffering from the following weaknesses.
*
- * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
+ * (a) There may be multiple solutions with the same number of covered
+ * attributes and number of statistics (e.g. the same solution but
+ * with statistics in a different order). It's unclear which solution
+ * is the best one - in a sense all of them are equal.
*
- * will be estimated using the multivariate statistics (a,b,c,d) while
- * the last condition (e = 1) will get estimated using the regular ones.
+ * TODO It might be possible to compute estimate for each of those
+ * solutions, and then combine them to get the final estimate
+ * (e.g. by using average or median).
*
- * There are various alternative selection criteria (e.g. counting
- * conditions instead of just referenced attributes), but eventually
- * the best option should be to combine multiple statistics. But that's
- * much harder to do correctly.
+ * (b) Does not consider that some types of stats are a better match for
+ * some types of clauses (e.g. a MCV list is a better match for
+ * equality clauses than a histogram).
*
- * TODO Select multiple statistics and combine them when computing
- * the estimate.
+ * XXX Maybe MCV is almost always better / more accurate?
+ *
+ * But maybe this is pointless - generally, each column is either
+ * a label (it's not important whether because of the data type or
+ * how it's used), or a value with ordering that makes sense. So
+ * either a MCV list is more appropriate (labels) or a histogram
+ * (values with orderings).
+ *
+ * Not sure what to do with statistics mixing columns of
+ * both types - maybe it'd be better to invent a new type of stats
+ * combining MCV list and histogram (keeping a small histogram for
+ * each MCV item, and a separate histogram for values not on the
+ * MCV list). But that's not implemented at this moment.
+ *
+ * TODO The algorithm should probably count number of Vars (not just
+ * attnums) when computing the 'score' of each solution. Computing
+ * the ratio of (num of all vars) / (num of condition vars) as a
+ * measure of how well the solution uses conditions might be
+ * useful.
+ */
+static void
+choose_mv_statistics_exhaustive(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ CHECK_FOR_INTERRUPTS();
+
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ /*
+ * Now try to apply each statistics, matching at least two attributes,
+ * unless it's already used in one of the previous steps.
+ */
+ for (i = 0; i < nmvstats; i++)
+ {
+ int c;
+
+ int ncovered_clauses = 0; /* number of covered clauses */
+ int ncovered_conditions = 0; /* number of covered conditions */
+ int nattnums = 0; /* number of covered attributes */
+
+ Bitmapset *all_attnums = NULL;
+ Bitmapset *new_attnums = NULL;
+
+ /* skip statistics that were already used or eliminated */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /*
+ * See if we have clauses covered by this statistics, but not
+ * yet covered by any of the preceding ones.
+ */
+ for (c = 0; c < nclauses; c++)
+ {
+ bool covered = false;
+ Bitmapset *clause_attnums = clauses_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this clause is not covered by this stats, we can't
+ * use the stats to estimate that at all.
+ */
+ if (! cover_map[i * nclauses + c])
+ continue;
+
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add the
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
+
+ /* let's see if it's covered by any of the previous stats */
+ for (j = 0; j < step; j++)
+ {
+ /* already covered by the previous stats */
+ if (cover_map[current->stats[j] * nclauses + c])
+ covered = true;
+
+ if (covered)
+ break;
+ }
+
+ /* if already covered, continue with the next clause */
+ if (covered)
+ {
+ ncovered_conditions += 1;
+ continue;
+ }
+
+ /*
+ * OK, this clause is covered by this statistics (and not by
+ * any of the previous ones)
+ */
+ ncovered_clauses += 1;
+
+ /* add the attnums into attnums from 'new clauses' */
+ // new_attnums = bms_union(new_attnums, clause_attnums);
+ }
+
+ /* can't have more new clauses than original clauses */
+ Assert(nclauses >= ncovered_clauses);
+ Assert(ncovered_clauses >= 0); /* mostly paranoia */
+
+ nattnums = bms_num_members(all_attnums);
+
+ /* free all the bitmapsets - we don't need them anymore */
+ bms_free(all_attnums);
+ bms_free(new_attnums);
+
+ all_attnums = NULL;
+ new_attnums = NULL;
+
+ /*
+ * Walk the conditions (passed from above) and see which
+ * of them are covered by this statistics.
+ */
+ for (c = 0; c < nconditions; c++)
+ {
+ Bitmapset *clause_attnums = conditions_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this clause is not covered by this stats, we can't
+ * use the stats to estimate that at all.
+ */
+ if (! condition_map[i * nconditions + c])
+ continue;
+
+ /* count this as a condition */
+ ncovered_conditions += 1;
+
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add the
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
+ }
+
+ /*
+ * Let's mark the statistics as 'ruled out' - either we'll use
+ * it (and proceed to the next step), or it's incompatible.
+ */
+ ruled_out[i] = step;
+
+ /*
+ * There are no clauses usable with this statistics (not already
+ * covered by some of the previous stats).
+ *
+ * Similarly, if the clauses only use a single attribute, we
+ * can't really use that.
+ */
+ if ((ncovered_clauses == 0) || (nattnums < 2))
+ continue;
+
+ /*
+ * TODO Not sure if it's possible to add a clause referencing
+ * only attributes already covered by previous stats?
+ * Introducing only some new dependency, not a new
+ * attribute. Couldn't come up with an example, though.
+ * Might be worth adding some assert.
+ */
+
+ /*
+ * got a suitable statistics - let's update the current solution,
+ * maybe use it as the best solution
+ */
+ current->nclauses += ncovered_clauses;
+ current->nconditions += ncovered_conditions;
+ current->nstats += 1;
+ current->stats[step] = i;
+
+ /*
+ * We can never cover more clauses, or use more stats that we
+ * actually have at the beginning.
+ */
+ Assert(nclauses >= current->nclauses);
+ Assert(nmvstats >= current->nstats);
+ Assert(step < nmvstats);
+
+ /* we can't get more conditions than clauses and conditions combined
+ *
+ * FIXME This assert does not work because we count the conditions
+ * repeatedly (once for each statistics covering it).
+ */
+ /* Assert((nconditions + nclauses) >= current->nconditions); */
+
+ if (*best == NULL)
+ {
+ *best = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ (*best)->nstats = 0;
+ (*best)->nclauses = 0;
+ (*best)->nconditions = 0;
+ }
+
+ /*
+ * See if it's better than the current 'best' solution, i.e.
+ * whether it covers more clauses, or the same number of
+ * clauses with fewer statistics (see the optimality criteria
+ * at the top of this comment block).
+ */
+ if ((current->nclauses > (*best)->nclauses) ||
+ ((current->nclauses == (*best)->nclauses) &&
+ ((current->nstats < (*best)->nstats))))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+
+ /*
+ * The recursion only makes sense if there are statistics left to
+ * add (otherwise extending the solution is not possible).
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_exhaustive(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= ncovered_clauses;
+ current->nconditions -= ncovered_conditions;
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[i] = -1;
+
+ Assert(current->nclauses >= 0);
+ Assert(current->nstats >= 0);
+ }
+
+ /* reset all statistics as 'incompatible' in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+}
+
+/*
+ * Greedy search for a multivariate solution - a sequence of statistics
+ * covering the clauses. This chooses the "best" statistics at each step,
+ * so the resulting solution may not be the best solution globally, but
+ * this produces the solution in only N steps (where N is the number of
+ * statistics), while the exhaustive approach may have to walk through
+ * ~N! combinations (although some of those are terminated early).
+ *
+ * See the comments at choose_mv_statistics_exhaustive() as this does
+ * the same thing (but in a different way).
+ *
+ * Don't call this directly, but through choose_mv_statistics().
+ *
+ * TODO There are probably other metrics we might use - e.g. using
+ * number of columns (num_cond_columns / num_cov_columns), which
+ * might work better with a mix of simple and complex clauses.
+ *
+ * TODO Also the choice at the very first step should be handled
+ * in a special way, because there will be 0 conditions at that
+ * moment, so there needs to be some other criteria - e.g. using
+ * the simplest (or most complex?) clause might be a good idea.
+ *
+ * TODO We might also select multiple stats using different criteria,
+ * and branch the search. This is however tricky, because if we
+ * choose k statistics at each step, we get k^N branches to
+ * walk through (with N steps). That's not really good with a
+ * large number of stats (yet better than exhaustive search).
+ */
+static void
+choose_mv_statistics_greedy(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
+ int best_stat = -1;
+ double gain, max_gain = -1.0;
+
+ /*
+ * Bitmap tracking which clauses are already covered (by the previous
+ * statistics) and may thus serve only as a condition in this step.
+ */
+ bool *covered_clauses = (bool*)palloc0(nclauses * sizeof(bool));
+
+ /*
+ * Number of clauses and columns covered by each statistics - this
+ * includes both conditions and clauses covered by the statistics for
+ * the first time. The number of columns may count some columns
+ * repeatedly - if a column is shared by multiple clauses, it will
+ * be counted once for each clause (covered by the statistics).
+ * So with two clauses [(a=1 OR b=2),(a<2 OR c>1)] the column "a"
+ * will be counted twice (if both clauses are covered).
+ *
+ * The values for ruled-out statistics (that can't be applied) are
+ * not computed, because that'd be pointless.
+ */
+ int *num_cov_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cov_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Same as above, but this only includes clauses that are already
+ * covered by the previous stats (and the current one).
+ */
+ int *num_cond_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cond_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Number of attributes for each clause.
+ *
+ * TODO Might be computed in choose_mv_statistics() and then passed
+ * here, but then the function would not have the same signature
+ * as _exhaustive().
+ */
+ int *attnum_counts = (int*)palloc0(sizeof(int) * nclauses);
+ int *attnum_cond_counts = (int*)palloc0(sizeof(int) * nconditions);
+
+ CHECK_FOR_INTERRUPTS();
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ /* compute attributes (columns) for each clause */
+ for (i = 0; i < nclauses; i++)
+ attnum_counts[i] = bms_num_members(clauses_attnums[i]);
+
+ /* compute attributes (columns) for each condition */
+ for (i = 0; i < nconditions; i++)
+ attnum_cond_counts[i] = bms_num_members(conditions_attnums[i]);
+
+ /* see which clauses are already covered at this point (by previous stats) */
+ for (i = 0; i < step; i++)
+ for (j = 0; j < nclauses; j++)
+ covered_clauses[j] |= (cover_map[current->stats[i] * nclauses + j]);
+
+ /* which remaining statistics covers most clauses / uses most conditions? */
+ for (i = 0; i < nmvstats; i++)
+ {
+ Bitmapset *attnums_covered = NULL;
+ Bitmapset *attnums_conditions = NULL;
+
+ /* skip stats that are already ruled out (either used or inapplicable) */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* count covered clauses and conditions (for the statistics) */
+ for (j = 0; j < nclauses; j++)
+ {
+ if (cover_map[i * nclauses + j])
+ {
+ Bitmapset *attnums_new
+ = bms_union(attnums_covered, clauses_attnums[j]);
+
+ /* get rid of the old bitmap and keep the unified result */
+ bms_free(attnums_covered);
+ attnums_covered = attnums_new;
+
+ num_cov_clauses[i] += 1;
+ num_cov_columns[i] += attnum_counts[j];
+
+ /* is the clause already covered (i.e. a condition)? */
+ if (covered_clauses[j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_counts[j];
+ attnums_new = bms_union(attnums_conditions,
+ clauses_attnums[j]);
+
+ bms_free(attnums_conditions);
+ attnums_conditions = attnums_new;
+ }
+ }
+ }
+
+ /* if all covered clauses are covered by prev stats (thus conditions) */
+ if (num_cov_clauses[i] == num_cond_clauses[i])
+ ruled_out[i] = step;
+
+ /* same if there are no new attributes */
+ else if (bms_num_members(attnums_conditions) == bms_num_members(attnums_covered))
+ ruled_out[i] = step;
+
+ bms_free(attnums_covered);
+ bms_free(attnums_conditions);
+
+ /* if the statistics is inapplicable, try the next one */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* now let's walk through conditions and count the covered */
+ for (j = 0; j < nconditions; j++)
+ {
+ if (condition_map[i * nconditions + j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_cond_counts[j];
+ }
+ }
+
+ /* otherwise see if this improves the interesting metrics */
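+ /*
+ * The gain is the ratio of condition columns to covered columns
+ * (both counted once per covered clause). E.g. 3 condition columns
+ * against 4 covered columns gives gain 3/4 = 0.75 - most of what
+ * the statistics covers is already known from the conditions.
+ */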
+ gain = num_cond_columns[i] / (double)num_cov_columns[i];
+
+ if (gain > max_gain)
+ {
+ max_gain = gain;
+ best_stat = i;
+ }
+ }
+
+ /*
+ * Have we found a suitable statistics? Add it to the solution and
+ * try next step.
+ */
+ if (best_stat != -1)
+ {
+ /* mark the statistics, so that we skip it in next steps */
+ ruled_out[best_stat] = step;
+
+ /* allocate current solution if necessary */
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ current->nclauses += num_cov_clauses[best_stat];
+ current->nconditions += num_cond_clauses[best_stat];
+ current->stats[step] = best_stat;
+ current->nstats++;
+
+ if (*best == NULL)
+ {
+ (*best) = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ else
+ {
+ /* see if this is a better solution */
+ double current_gain = (double)current->nconditions / current->nclauses;
+ double best_gain = (double)(*best)->nconditions / (*best)->nclauses;
+
+ if ((current_gain > best_gain) ||
+ ((current_gain == best_gain) && (current->nstats < (*best)->nstats)))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ }
+
+ /*
+ * The recursion only makes sense if there are statistics left to
+ * add (otherwise extending the solution is not possible).
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_greedy(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= num_cov_clauses[best_stat];
+ current->nconditions -= num_cond_clauses[best_stat];
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[best_stat] = -1;
+ }
+
+ /* reset all statistics eliminated in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+ /* free everything allocated in this step */
+ pfree(covered_clauses);
+ pfree(attnum_counts);
+ pfree(attnum_cond_counts);
+ pfree(num_cov_clauses);
+ pfree(num_cov_columns);
+ pfree(num_cond_clauses);
+ pfree(num_cond_columns);
+}
+
+/*
+ * Chooses the combination of statistics, optimal for estimation of
+ * a particular clause list.
+ *
+ * This only handles the 'preparation' shared by the exhaustive and
+ * greedy implementations (see the previous functions), mostly trying
+ * to reduce the size of the problem (eliminating clauses/statistics
+ * that can't really be used in the solution).
+ *
+ * It also precomputes bitmaps for attributes covered by clauses and
+ * statistics, so that we don't need to do that over and over in the
+ * actual optimizations (as it's both CPU and memory intensive).
*
* TODO This will probably have to consider compatibility of clauses,
* because 'dependencies' will probably work only with equality
* clauses.
+ *
+ * TODO Another way to make the optimization problems smaller might
+ * be splitting the statistics into several disjoint subsets, i.e.
+ * if we can split the graph of statistics (after the elimination)
+ * into multiple components (so that stats in different components
+ * share no attributes), we can do the optimization for each
+ * component separately.
+ *
+ * TODO If we could compute what is a "perfect solution" maybe we could
+ * terminate the search after reaching ~90% of it? Say, if we knew
+ * that we can cover 10 clauses and reuse 8 dependencies, maybe
+ * covering 9 clauses and 7 dependencies would be OK?
*/
-static MVStatisticInfo *
-choose_mv_statistics(List *stats, Bitmapset *attnums)
+static List*
+choose_mv_statistics(PlannerInfo *root, List *stats,
+ List *clauses, List *conditions,
+ Oid varRelid, SpecialJoinInfo *sjinfo)
{
int i;
- ListCell *lc;
+ mv_solution_t *best = NULL;
+ List *result = NIL;
+
+ int nmvstats;
+ MVStatisticInfo *mvstats;
+
+ /* we only work with MCV lists and histograms here */
+ int type = (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+
+ bool *clause_cover_map = NULL,
+ *condition_cover_map = NULL;
+ int *ruled_out = NULL;
+
+ /* build bitmapsets for all stats and clauses */
+ Bitmapset **stats_attnums;
+ Bitmapset **clauses_attnums;
+ Bitmapset **conditions_attnums;
- MVStatisticInfo *choice = NULL;
+ int nclauses, nconditions;
+ Node ** clauses_array;
+ Node ** conditions_array;
- int current_matches = 1; /* goal #1: maximize */
- int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+ /* copy lists, so that we can free them during elimination easily */
+ clauses = list_copy(clauses);
+ conditions = list_copy(conditions);
+ stats = list_copy(stats);
/*
- * Walk through the statistics (simple array with nmvstats elements)
- * and for each one count the referenced attributes (encoded in
- * the 'attnums' bitmap).
+ * Reduce the optimization problem size as much as possible.
+ *
+ * Eliminate clauses and conditions not covered by any statistics,
+ * or statistics not matching at least two attributes (one of them
+ * has to be in a regular clause).
+ *
+ * It's possible that removing a statistics in one iteration
+ * eliminates a clause in the next one, so we'll repeat this until
+ * an iteration eliminates no clauses/stats.
+ *
+ * This can only happen after eliminating a statistics - clauses are
+ * eliminated first, so statistics always reflect that.
*/
- foreach (lc, stats)
+ while (true)
{
- MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+ List *tmp;
+
+ Bitmapset *compatible_attnums = NULL;
+ Bitmapset *condition_attnums = NULL;
+ Bitmapset *all_attnums = NULL;
+
+ /*
+ * Clauses
+ *
+ * Walk through clauses and keep only those covered by at least
+ * one of the statistics we still have. We'll also keep info
+ * about attnums in clauses (without conditions) so that we can
+ * ignore stats covering just conditions (which is pointless).
+ */
+ tmp = filter_clauses(root, varRelid, sjinfo, type,
+ stats, clauses, &compatible_attnums);
+
+ /* discard the original list */
+ list_free(clauses);
+ clauses = tmp;
+
+ /*
+ * Conditions
+ *
+ * Walk through conditions and keep only those covered by at least
+ * one of the statistics we still have. Also, collect bitmap of
+ * attributes so that we can make sure we add at least one new
+ * attribute (by comparing with clauses).
+ */
+ if (conditions != NIL)
+ {
+ tmp = filter_clauses(root, varRelid, sjinfo, type,
+ stats, conditions, &condition_attnums);
+
+ /* discard the original list */
+ list_free(conditions);
+ conditions = tmp;
+ }
+
+ /* get a union of attnums (from conditions and new clauses) */
+ all_attnums = bms_union(compatible_attnums, condition_attnums);
+
+ /*
+ * Statistics
+ *
+ * Walk through statistics and only keep those covering at least
+ * one new attribute (excluding conditions) and at least two
+ * attributes in clauses and conditions combined.
+ */
+ tmp = filter_stats(stats, compatible_attnums, all_attnums);
+
+ /* if we've not eliminated anything, terminate */
+ if (list_length(stats) == list_length(tmp))
+ break;
+
+ /* work only with filtered statistics from now */
+ list_free(stats);
+ stats = tmp;
+ }
+
+ /* only do the optimization if we have clauses/statistics */
+ if ((list_length(stats) == 0) || (list_length(clauses) == 0))
+ return NIL;
+
+ /* remove redundant stats (stats covered by another stats) */
+ stats = filter_redundant_stats(stats, clauses, conditions);
+
+ /*
+ * TODO We should sort the stats to make the order deterministic,
+ * otherwise we may get different estimates on different
+ * executions - if there are multiple "equally good" solutions,
+ * we'll keep the first solution we see.
+ *
+ * Sorting by OID probably is not the right solution though,
+ * because we'd like it to be somehow reproducible,
+ * irrespective of the order of ADD STATISTICS commands.
+ * So maybe statkeys?
+ */
+ mvstats = make_stats_array(stats, &nmvstats);
+ stats_attnums = make_stats_attnums(mvstats, nmvstats);
+
+ /* collect clauses and bitmaps of their attnums */
+ clauses_array = make_clauses_array(clauses, &nclauses);
+ clauses_attnums = make_clauses_attnums(root, varRelid, sjinfo, type,
+ clauses_array, nclauses);
+
+ /* collect conditions and bitmaps of their attnums */
+ conditions_array = make_clauses_array(conditions, &nconditions);
+ conditions_attnums = make_clauses_attnums(root, varRelid, sjinfo, type,
+ conditions_array, nconditions);
+
+ /*
+ * Build bitmaps with info about which clauses/conditions are
+ * covered by each statistics (so that we don't need to call the
+ * bms_is_subset over and over again).
+ */
+ clause_cover_map = make_cover_map(stats_attnums, nmvstats,
+ clauses_attnums, nclauses);
+
+ condition_cover_map = make_cover_map(stats_attnums, nmvstats,
+ conditions_attnums, nconditions);
+
+ ruled_out = (int*)palloc0(nmvstats * sizeof(int));
+
+ /* no stats are ruled out by default */
+ for (i = 0; i < nmvstats; i++)
+ ruled_out[i] = -1;
+
+ /* do the optimization itself */
+ if (mvstat_search_type == MVSTAT_SEARCH_EXHAUSTIVE)
+ choose_mv_statistics_exhaustive(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+ else
+ choose_mv_statistics_greedy(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+
+ /* create a list of statistics from the array */
+ if (best != NULL)
+ {
+ for (i = 0; i < best->nstats; i++)
+ {
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[best->stats[i]], sizeof(MVStatisticInfo));
+ result = lappend(result, info);
+ }
+ pfree(best);
+ }
- /* columns matching this statistics */
- int matches = 0;
+ /* cleanup (maybe leave it up to the memory context?) */
+ for (i = 0; i < nmvstats; i++)
+ bms_free(stats_attnums[i]);
- int2vector * attrs = info->stakeys;
- int numattrs = attrs->dim1;
+ for (i = 0; i < nclauses; i++)
+ bms_free(clauses_attnums[i]);
- /* skip dependencies-only stats */
- if (! (info->mcv_built || info->hist_built))
- continue;
+ for (i = 0; i < nconditions; i++)
+ bms_free(conditions_attnums[i]);
- /* count columns covered by the histogram */
- for (i = 0; i < numattrs; i++)
- if (bms_is_member(attrs->values[i], attnums))
- matches++;
+ pfree(stats_attnums);
+ pfree(clauses_attnums);
+ pfree(conditions_attnums);
- /*
- * Use this statistics when it improves the number of matches or
- * when it matches the same number of attributes but is smaller.
- */
- if ((matches > current_matches) ||
- ((matches == current_matches) && (current_dims > numattrs)))
- {
- choice = info;
- current_matches = matches;
- current_dims = numattrs;
- }
- }
+ pfree(clauses_array);
+ pfree(conditions_array);
+ pfree(clause_cover_map);
+ pfree(condition_cover_map);
+ pfree(ruled_out);
+ pfree(mvstats);
- return choice;
+ list_free(clauses);
+ list_free(conditions);
+ list_free(stats);
+
+ return result;
}
@@ -1589,6 +2846,51 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
return false;
}
+
+static Bitmapset *
+clause_mv_get_attnums(PlannerInfo *root, Node *clause)
+{
+ Bitmapset * attnums = NULL;
+
+ /* Extract clause from restrict info, if needed. */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /*
+ * Only simple opclauses and IS NULL tests are compatible with
+ * multivariate stats at this point.
+ */
+ if ((is_opclause(clause))
+ && (list_length(((OpExpr *) clause)->args) == 2))
+ {
+ OpExpr *expr = (OpExpr *) clause;
+
+ if (IsA(linitial(expr->args), Var))
+ attnums = bms_add_member(attnums,
+ ((Var*)linitial(expr->args))->varattno);
+ else
+ attnums = bms_add_member(attnums,
+ ((Var*)lsecond(expr->args))->varattno);
+ }
+ else if (IsA(clause, NullTest)
+ && IsA(((NullTest*)clause)->arg, Var))
+ {
+ attnums = bms_add_member(attnums,
+ ((Var*)((NullTest*)clause)->arg)->varattno);
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ ListCell *l;
+ foreach (l, ((BoolExpr*)clause)->args)
+ {
+ attnums = bms_join(attnums,
+ clause_mv_get_attnums(root, (Node*)lfirst(l)));
+ }
+ }
+
+ return attnums;
+}
+
/*
* Performs reduction of clauses using functional dependencies, i.e.
* removes clauses that are considered redundant. It simply walks
@@ -2240,22 +3542,26 @@ get_varattnos(Node * node, Index relid)
* as the clauses are processed (and skip items that are 'match').
*/
static Selectivity
-clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats, bool *fullmatch,
- Selectivity *lowsel)
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or,
+ bool *fullmatch, Selectivity *lowsel)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
MCVList mcvlist = NULL;
+
int nmatches = 0;
+ int nconditions = 0;
/* match/mismatch bitmap for each MCV item */
char * matches = NULL;
+ char * condition_matches = NULL;
Assert(clauses != NIL);
- Assert(list_length(clauses) >= 2);
+ Assert(list_length(clauses) >= 1);
/* there's no MCV list built yet */
if (! mvstats->mcv_built)
@@ -2266,32 +3572,85 @@ clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
Assert(mcvlist != NULL);
Assert(mcvlist->nitems > 0);
- /* by default all the MCV items match the clauses fully */
- matches = palloc0(sizeof(char) * mcvlist->nitems);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
-
/* number of matching MCV items */
nmatches = mcvlist->nitems;
+ nconditions = mcvlist->nitems;
+
+ /*
+ * Bitmap of MCV item matches (mismatch, partial, full).
+ *
+ * For AND clauses all items match initially (and we'll eliminate them).
+ * For OR clauses no items match initially (and we'll add them).
+ *
+ * We only need to do the memset for AND clauses (for OR clauses
+ * it's already set correctly by the palloc0).
+ */
+ matches = palloc0(sizeof(char) * nmatches);
+ if (! is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+
+ /* Conditions are treated as an AND clause, so everything matches by default. */
+ condition_matches = palloc0(sizeof(char) * nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+
+ /*
+ * build the match bitmap for the conditions (conditions are always
+ * connected by AND)
+ */
+ if (conditions != NIL)
+ nconditions = update_match_bitmap_mcvlist(root, conditions,
+ mvstats->stakeys, mcvlist,
+ nconditions, condition_matches,
+ lowsel, fullmatch, false);
+
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all MCV items, even those
+ * ruled out by the conditions. The final result should be the
+ * same, but skipping them might be faster.
+ */
nmatches = update_match_bitmap_mcvlist(root, clauses,
mvstats->stakeys, mcvlist,
- nmatches, matches,
- lowsel, fullmatch, false);
+ ((is_or) ? 0 : nmatches), matches,
+ lowsel, fullmatch, is_or);
/* sum frequencies for all the matching MCV items */
for (i = 0; i < mcvlist->nitems; i++)
{
- /* used to 'scale' for MCV lists not covering all tuples */
+ /*
+ * Find out what part of the data is covered by the MCV list,
+ * so that we can 'scale' the selectivity properly (e.g. when
+ * only 50% of the sample items got into the MCV, and the rest
+ * is either in a histogram, or not covered by stats).
+ *
+ * TODO This might be handled by keeping a global "frequency"
+ * for the whole list, which might save us a bit of time
+ * spent on accessing the not-matching part of the MCV list.
+ * Although it's likely in a cache, so it's very fast.
+ */
u += mcvlist->items[i]->frequency;
+ /* skip MCV items not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
if (matches[i] != MVSTATS_MATCH_NONE)
s += mcvlist->items[i]->frequency;
+
+ t += mcvlist->items[i]->frequency;
}
pfree(matches);
+ pfree(condition_matches);
pfree(mcvlist);
- return s*u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
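+ /*
+ * s/t is the conditional selectivity of the clauses, given the
+ * conditions (within the MCV list), and u scales the result to
+ * the fraction of data the MCV list actually covers.
+ */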
+ return (s / t) * u;
}
/*
@@ -2589,38 +3948,29 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
/* AND/OR clause, with all clauses compatible with the selected MV stat */
int i;
- BoolExpr *orclause = ((BoolExpr*)clause);
- List *orclauses = orclause->args;
+ List *tmp_clauses = ((BoolExpr*)clause)->args;
/* match/mismatch bitmap for each MCV item */
- int or_nmatches = 0;
- char * or_matches = NULL;
+ int tmp_nmatches = 0;
+ char * tmp_matches = NULL;
- Assert(orclauses != NIL);
- Assert(list_length(orclauses) >= 2);
+ Assert(tmp_clauses != NIL);
+ Assert(list_length(tmp_clauses) >= 2);
/* number of matching MCV items */
- or_nmatches = mcvlist->nitems;
+ tmp_nmatches = (or_clause(clause)) ? 0 : mcvlist->nitems;
/* by default none of the MCV items matches the clauses */
- or_matches = palloc0(sizeof(char) * or_nmatches);
+ tmp_matches = palloc0(sizeof(char) * mcvlist->nitems);
- if (or_clause(clause))
- {
- /* OR clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
- or_nmatches = 0;
- }
- else
- {
- /* AND clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
- }
+ /* AND clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
/* build the match bitmap for the OR-clauses */
- or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ tmp_nmatches = update_match_bitmap_mcvlist(root, tmp_clauses,
stakeys, mcvlist,
- or_nmatches, or_matches,
+ tmp_nmatches, tmp_matches,
lowsel, fullmatch, or_clause(clause));
/* merge the bitmap into the existing one*/
@@ -2632,16 +3982,14 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
*
* FIXME this does not decrease the number of matches
*/
- UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
}
- pfree(or_matches);
+ pfree(tmp_matches);
}
else
- {
elog(ERROR, "unknown clause type: %d", clause->type);
- }
}
/*
@@ -2699,15 +4047,18 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
* this is not uncommon, but for histograms it's not that clear.
*/
static Selectivity
-clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats)
+clauselist_mv_selectivity_histogram(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
int nmatches = 0;
+ int nconditions = 0;
char *matches = NULL;
+ char *condition_matches = NULL;
MVSerializedHistogram mvhist = NULL;
@@ -2720,25 +4071,52 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
Assert (mvhist != NULL);
Assert (clauses != NIL);
- Assert (list_length(clauses) >= 2);
+ Assert (list_length(clauses) >= 1);
+
+ nmatches = mvhist->nbuckets;
+ nconditions = mvhist->nbuckets;
/*
- * Bitmap of bucket matches (mismatch, partial, full). by default
- * all buckets fully match (and we'll eliminate them).
+ * Bitmap of bucket matches (mismatch, partial, full).
+ *
+ * For AND clauses all buckets match (and we'll eliminate them).
+ * For OR clauses no buckets match (and we'll add them).
+ *
+ * We only need to do the memset for AND clauses (for OR clauses
+ * it's already set correctly by the palloc0).
*/
- matches = palloc0(sizeof(char) * mvhist->nbuckets);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+ matches = palloc0(sizeof(char) * nmatches);
- nmatches = mvhist->nbuckets;
+ if (! is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+
+ /* Conditions are treated as an AND clause, so everything matches by default. */
+ condition_matches = palloc0(sizeof(char)*nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
- /* build the match bitmap */
+ /* build the match bitmap for the conditions */
+ if (conditions != NIL)
+ update_match_bitmap_histogram(root, conditions,
+ mvstats->stakeys, mvhist,
+ nconditions, condition_matches, is_or);
+
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all buckets, even those
+ * ruled out by the conditions. The final result should be
+ * the same, but skipping them might be faster.
+ */
update_match_bitmap_histogram(root, clauses,
mvstats->stakeys, mvhist,
- nmatches, matches, false);
+ ((is_or) ? 0 : nmatches), matches,
+ is_or);
/* now, walk through the buckets and sum the selectivities */
for (i = 0; i < mvhist->nbuckets; i++)
{
+ float coeff = 1.0;
+
/*
* Find out what part of the data is covered by the histogram,
* so that we can 'scale' the selectivity properly (e.g. when
@@ -2752,17 +4130,35 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
*/
u += mvhist->buckets[i]->ntuples;
+ /* skip buckets not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+ else if (condition_matches[i] == MVSTATS_MATCH_PARTIAL)
+ coeff = 0.5;
+
+ t += coeff * mvhist->buckets[i]->ntuples;
+
if (matches[i] == MVSTATS_MATCH_FULL)
- s += mvhist->buckets[i]->ntuples;
+ s += coeff * mvhist->buckets[i]->ntuples;
else if (matches[i] == MVSTATS_MATCH_PARTIAL)
- s += 0.5 * mvhist->buckets[i]->ntuples;
+ /*
+ * TODO If both conditions and clauses match partially, this
+ * will use a 0.25 match - not sure that's the right
+ * solution, but it seems about right.
+ */
+ s += coeff * 0.5 * mvhist->buckets[i]->ntuples;
}
/* release the allocated bitmap and deserialized histogram */
pfree(matches);
+ pfree(condition_matches);
pfree(mvhist);
- return s * u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
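+ /*
+ * As with the MCV list: s/t is the conditional selectivity of
+ * the clauses given the conditions, scaled by u, the fraction
+ * of data covered by the histogram.
+ */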
+ return (s / t) * u;
}
/*
@@ -3268,38 +4664,31 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
/* AND/OR clause, with all clauses compatible with the selected MV stat */
int i;
- BoolExpr *orclause = ((BoolExpr*)clause);
- List *orclauses = orclause->args;
+ List *tmp_clauses = ((BoolExpr*)clause)->args;
/* match/mismatch bitmap for each bucket */
- int or_nmatches = 0;
- char * or_matches = NULL;
+ int tmp_nmatches = 0;
+ char * tmp_matches = NULL;
- Assert(orclauses != NIL);
- Assert(list_length(orclauses) >= 2);
+ Assert(tmp_clauses != NIL);
+ Assert(list_length(tmp_clauses) >= 2);
/* number of matching buckets */
- or_nmatches = mvhist->nbuckets;
+ tmp_nmatches = (or_clause(clause)) ? 0 : mvhist->nbuckets;
/* by default none of the buckets matches the clauses */
- or_matches = palloc0(sizeof(char) * or_nmatches);
+ tmp_matches = palloc0(sizeof(char) * mvhist->nbuckets);
- if (or_clause(clause))
+ if (! or_clause(clause))
{
- /* OR clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
- or_nmatches = 0;
- }
- else
- {
- /* AND clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ /* AND clauses assume everything matches, initially */
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
}
/* build the match bitmap for the OR-clauses */
- or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ tmp_nmatches = update_match_bitmap_histogram(root, tmp_clauses,
stakeys, mvhist,
- or_nmatches, or_matches, or_clause(clause));
+ tmp_nmatches, tmp_matches, or_clause(clause));
/* merge the bitmap into the existing one*/
for (i = 0; i < mvhist->nbuckets; i++)
@@ -3310,10 +4699,10 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
*
* FIXME this does not decrease the number of matches
*/
- UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
}
- pfree(or_matches);
+ pfree(tmp_matches);
}
else
@@ -3325,3 +4714,363 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
return nmatches;
}
+
+/*
+ * Walk through clauses and keep only those covered by at least
+ * one of the statistics.
+ */
+static List *
+filter_clauses(PlannerInfo *root, Oid varRelid, SpecialJoinInfo *sjinfo,
+ int type, List *stats, List *clauses, Bitmapset **attnums)
+{
+ ListCell *c;
+ ListCell *s;
+
+ /* results (list of compatible clauses, attnums) */
+ List *rclauses = NIL;
+
+ foreach (c, clauses)
+ {
+ Node *clause = (Node*)lfirst(c);
+ Bitmapset *clause_attnums = NULL;
+ Index relid;
+
+ /*
+ * The clause has to be mv-compatible (suitable operators etc.).
+ */
+ if (! clause_is_mv_compatible(root, clause, varRelid,
+ &relid, &clause_attnums, sjinfo, type))
+ elog(ERROR, "should not get non-mv-compatible cluase");
+
+ /* is there a statistics covering this clause? */
+ foreach (s, stats)
+ {
+ int k, matches = 0;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ for (k = 0; k < stat->stakeys->dim1; k++)
+ {
+ if (bms_is_member(stat->stakeys->values[k],
+ clause_attnums))
+ matches += 1;
+ }
+
+ /*
+ * The clause is compatible if all attributes it references
+ * are covered by the statistics.
+ */
+ if (bms_num_members(clause_attnums) == matches)
+ {
+ *attnums = bms_union(*attnums, clause_attnums);
+ rclauses = lappend(rclauses, clause);
+ break;
+ }
+ }
+
+ bms_free(clause_attnums);
+ }
+
+ /* we can't have more compatible clauses than source clauses */
+ Assert(list_length(clauses) >= list_length(rclauses));
+
+ return rclauses;
+}
+
+
+/*
+ * Walk through statistics and only keep those covering at least
+ * one new attribute (excluding conditions) and at least two
+ * attributes in clauses and conditions combined.
+ *
+ * This check might be made more strict by checking against individual
+ * clauses, because by using the bitmapsets of all attnums we may
+ * actually use attnums from clauses that are not covered by the
+ * statistics. For example, we may have a condition
+ *
+ * (a=1 AND b=2)
+ *
+ * and a new clause
+ *
+ * (c=1 AND d=1)
+ *
+ * With only bitmapsets, statistics on [b,c] will pass through this
+ * (assuming there are some statistics covering both clauses).
+ *
+ * TODO Do the more strict check.
+ */
+static List *
+filter_stats(List *stats, Bitmapset *new_attnums, Bitmapset *all_attnums)
+{
+ ListCell *s;
+ List *stats_filtered = NIL;
+
+ foreach (s, stats)
+ {
+ int k;
+ int matches_new = 0,
+ matches_all = 0;
+
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ /* see how many attributes the statistics covers */
+ for (k = 0; k < stat->stakeys->dim1; k++)
+ {
+ /* attributes from new clauses */
+ if (bms_is_member(stat->stakeys->values[k], new_attnums))
+ matches_new += 1;
+
+ /* attributes from conditions */
+ if (bms_is_member(stat->stakeys->values[k], all_attnums))
+ matches_all += 1;
+ }
+
+ /* check we have enough attributes for this statistics */
+ if ((matches_new >= 1) && (matches_all >= 2))
+ stats_filtered = lappend(stats_filtered, stat);
+ }
+
+ /* we can't have more useful stats than we had originally */
+ Assert(list_length(stats) >= list_length(stats_filtered));
+
+ return stats_filtered;
+}
+
+static MVStatisticInfo *
+make_stats_array(List *stats, int *nmvstats)
+{
+ int i;
+ ListCell *l;
+
+ MVStatisticInfo *mvstats = NULL;
+ *nmvstats = list_length(stats);
+
+ mvstats
+ = (MVStatisticInfo*)palloc0((*nmvstats) * sizeof(MVStatisticInfo));
+
+ i = 0;
+ foreach (l, stats)
+ {
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(l);
+ memcpy(&mvstats[i++], stat, sizeof(MVStatisticInfo));
+ }
+
+ return mvstats;
+}
+
+static Bitmapset **
+make_stats_attnums(MVStatisticInfo *mvstats, int nmvstats)
+{
+ int i, j;
+ Bitmapset **stats_attnums = NULL;
+
+ Assert(nmvstats > 0);
+
+ /* build bitmaps of attnums for the stats (easier to compare) */
+ stats_attnums = (Bitmapset **)palloc0(nmvstats * sizeof(Bitmapset*));
+
+ for (i = 0; i < nmvstats; i++)
+ for (j = 0; j < mvstats[i].stakeys->dim1; j++)
+ stats_attnums[i]
+ = bms_add_member(stats_attnums[i],
+ mvstats[i].stakeys->values[j]);
+
+ return stats_attnums;
+}
+
+
+/*
+ * Now let's remove redundant statistics, covering the same columns
+ * as some other stats, when restricted to the attributes from
+ * remaining clauses.
+ *
+ * If statistics S1 covers S2 (covers S2 attributes and possibly
+ * some more), we can probably remove S2. What actually matters are
+ * attributes from covered clauses (not all the attributes). This
+ * might however prefer larger, and thus less accurate, statistics.
+ *
+ * When a redundancy is detected, we simply keep the smaller
+ * statistics (fewer columns), on the assumption that it's
+ * more accurate and faster to process. That might be incorrect for
+ * two reasons - first, the accuracy really depends on number of
+ * buckets/MCV items, not the number of columns. Second, we might
+ * prefer MCV lists over histograms or something like that.
+ */
+static List*
+filter_redundant_stats(List *stats, List *clauses, List *conditions)
+{
+ int i, j, nmvstats;
+
+ MVStatisticInfo *mvstats;
+ bool *redundant;
+ Bitmapset **stats_attnums;
+ Bitmapset *varattnos;
+ Index relid;
+
+ Assert(list_length(stats) > 0);
+ Assert(list_length(clauses) > 0);
+
+ /*
+ * We'll convert the list of statistics into an array now, because
+ * the reduction of redundant statistics is easier to do that way
+ * (we can mark previous stats as redundant, etc.).
+ */
+ mvstats = make_stats_array(stats, &nmvstats);
+ stats_attnums = make_stats_attnums(mvstats, nmvstats);
+
+ /* by default, none of the stats is redundant (so palloc0) */
+ redundant = palloc0(nmvstats * sizeof(bool));
+
+ /*
+ * We only expect a single relid here, and also we should get the
+ * same relid from clauses and conditions (but we get it from
+ * clauses, because those are certainly non-empty).
+ */
+ relid = bms_singleton_member(pull_varnos((Node*)clauses));
+
+ /*
+ * Get the varattnos from both conditions and clauses.
+ *
+ * This skips system attributes, although that should be impossible
+ * thanks to previous filtering out of incompatible clauses.
+ *
+ * XXX Is that really true?
+ */
+ varattnos = bms_union(get_varattnos((Node*)clauses, relid),
+ get_varattnos((Node*)conditions, relid));
+
+ for (i = 1; i < nmvstats; i++)
+ {
+ /* intersect with current statistics */
+ Bitmapset *curr = bms_intersect(stats_attnums[i], varattnos);
+
+ /* walk through 'previous' stats and check redundancy */
+ for (j = 0; j < i; j++)
+ {
+ /* intersect with current statistics */
+ Bitmapset *prev;
+
+ /* skip stats already identified as redundant */
+ if (redundant[j])
+ continue;
+
+ prev = bms_intersect(stats_attnums[j], varattnos);
+
+ switch (bms_subset_compare(curr, prev))
+ {
+ case BMS_EQUAL:
+ /*
+ * Use the smaller one (hopefully more accurate).
+ * If both have the same size, use the first one.
+ */
+ if (mvstats[i].stakeys->dim1 >= mvstats[j].stakeys->dim1)
+ redundant[i] = TRUE;
+ else
+ redundant[j] = TRUE;
+
+ break;
+
+ case BMS_SUBSET1: /* curr is subset of prev */
+ redundant[i] = TRUE;
+ break;
+
+ case BMS_SUBSET2: /* prev is subset of curr */
+ redundant[j] = TRUE;
+ break;
+
+ case BMS_DIFFERENT:
+ /* do nothing - keep both stats */
+ break;
+ }
+
+ bms_free(prev);
+ }
+
+ bms_free(curr);
+ }
+
+ /* can't reduce all statistics (at least one has to remain) */
+ Assert(nmvstats > 0);
+
+ /* now, let's remove the reduced statistics from the arrays */
+ list_free(stats);
+ stats = NIL;
+
+ for (i = 0; i < nmvstats; i++)
+ {
+ MVStatisticInfo *info;
+
+ pfree(stats_attnums[i]);
+
+ if (redundant[i])
+ continue;
+
+ info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[i], sizeof(MVStatisticInfo));
+
+ stats = lappend(stats, info);
+ }
+
+ pfree(mvstats);
+ pfree(stats_attnums);
+ pfree(redundant);
+
+ return stats;
+}
+
+static Node**
+make_clauses_array(List *clauses, int *nclauses)
+{
+ int i;
+ ListCell *l;
+
+ Node** clauses_array;
+
+ *nclauses = list_length(clauses);
+ clauses_array = (Node **)palloc0((*nclauses) * sizeof(Node *));
+
+ i = 0;
+ foreach (l, clauses)
+ clauses_array[i++] = (Node *)lfirst(l);
+
+ *nclauses = i;
+
+ return clauses_array;
+}
+
+static Bitmapset **
+make_clauses_attnums(PlannerInfo *root, Oid varRelid, SpecialJoinInfo *sjinfo,
+ int type, Node **clauses, int nclauses)
+{
+ int i;
+ Index relid;
+ Bitmapset **clauses_attnums
+ = (Bitmapset **)palloc0(nclauses * sizeof(Bitmapset *));
+
+ for (i = 0; i < nclauses; i++)
+ {
+ Bitmapset * attnums = NULL;
+
+ if (! clause_is_mv_compatible(root, clauses[i], varRelid,
+ &relid, &attnums, sjinfo, type))
+ elog(ERROR, "should not get non-mv-compatible cluase");
+
+ clauses_attnums[i] = attnums;
+ }
+
+ return clauses_attnums;
+}
+
+static bool*
+make_cover_map(Bitmapset **stats_attnums, int nmvstats,
+ Bitmapset **clauses_attnums, int nclauses)
+{
+ int i, j;
+ bool *cover_map = (bool*)palloc0(nclauses * nmvstats * sizeof(bool));
+
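+ /*
+ * The result is a flat row-major matrix: cover_map[i * nclauses + j]
+ * is true if clause j references only attributes covered by
+ * statistics i (and so may be estimated using it).
+ */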
+ for (i = 0; i < nmvstats; i++)
+ for (j = 0; j < nclauses; j++)
+ cover_map[i * nclauses + j]
+ = bms_is_subset(clauses_attnums[j], stats_attnums[i]);
+
+ return cover_map;
+}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ac865be..8f625e6 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3347,7 +3347,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/*
* Also get the normal inner-join selectivity of the join clauses.
@@ -3370,7 +3371,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
JOIN_INNER,
- &norm_sjinfo);
+ &norm_sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
if (jointype == JOIN_ANTI)
@@ -3537,7 +3539,7 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
Node *qual = (Node *) lfirst(l);
/* Note that clause_selectivity will be able to cache its result */
- selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo);
+ selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo, NIL);
}
/* Apply it to the input relation sizes */
@@ -3573,7 +3575,8 @@ set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
rel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
rel->rows = clamp_row_est(nrows);
@@ -3610,7 +3613,8 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
allclauses,
rel->relid, /* do not use 0! */
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
/* For safety, make sure result is not more than the base estimate */
if (nrows > rel->rows)
@@ -3748,12 +3752,14 @@ calc_joinrel_size_estimate(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = clauselist_selectivity(root,
pushedquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
list_free(joinquals);
@@ -3765,7 +3771,8 @@ calc_joinrel_size_estimate(PlannerInfo *root,
restrictlist,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = 0.0; /* not used, keep compiler quiet */
}
diff --git a/src/backend/optimizer/util/orclauses.c b/src/backend/optimizer/util/orclauses.c
index f0acc14..e41508b 100644
--- a/src/backend/optimizer/util/orclauses.c
+++ b/src/backend/optimizer/util/orclauses.c
@@ -280,7 +280,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
* saving work later.)
*/
or_selec = clause_selectivity(root, (Node *) or_rinfo,
- 0, JOIN_INNER, NULL);
+ 0, JOIN_INNER, NULL, NIL);
/*
* The clause is only worth adding to the query if it rejects a useful
@@ -342,7 +342,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
/* Compute inner-join size */
orig_selec = clause_selectivity(root, (Node *) join_or_rinfo,
- 0, JOIN_INNER, &sjinfo);
+ 0, JOIN_INNER, &sjinfo, NIL);
/* And hack cached selectivity so join size remains the same */
join_or_rinfo->norm_selec = orig_selec / or_selec;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 04ed07b..3e2f7a4 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1580,13 +1580,15 @@ booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg,
case IS_NOT_FALSE:
selec = (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
case IS_FALSE:
case IS_NOT_TRUE:
selec = 1.0 - (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
default:
elog(ERROR, "unrecognized booltesttype: %d",
@@ -6209,7 +6211,8 @@ genericcostestimate(PlannerInfo *root,
indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/*
* If caller didn't give us an estimate, estimate the number of index
@@ -6534,7 +6537,8 @@ btcostestimate(PG_FUNCTION_ARGS)
btreeSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
numIndexTuples = btreeSelectivity * index->rel->tuples;
/*
@@ -7277,7 +7281,8 @@ gincostestimate(PG_FUNCTION_ARGS)
*indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/* fetch estimated page cost for tablespace containing index */
get_tablespace_page_costs(index->reltablespace,
@@ -7509,7 +7514,7 @@ brincostestimate(PG_FUNCTION_ARGS)
*indexSelectivity =
clauselist_selectivity(root, indexQuals,
path->indexinfo->rel->relid,
- JOIN_INNER, NULL);
+ JOIN_INNER, NULL, NIL);
*indexCorrelation = 1;
/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index be7ba4f..982b66a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -75,6 +75,7 @@
#include "utils/bytea.h"
#include "utils/guc_tables.h"
#include "utils/memutils.h"
+#include "utils/mvstats.h"
#include "utils/pg_locale.h"
#include "utils/plancache.h"
#include "utils/portal.h"
@@ -393,6 +394,15 @@ static const struct config_enum_entry row_security_options[] = {
};
/*
+ * Search algorithm for multivariate stats.
+ */
+static const struct config_enum_entry mvstat_search_options[] = {
+ {"greedy", MVSTAT_SEARCH_GREEDY, false},
+ {"exhaustive", MVSTAT_SEARCH_EXHAUSTIVE, false},
+ {NULL, 0, false}
+};
+
+/*
* Options for enum values stored in other modules
*/
extern const struct config_enum_entry wal_level_options[];
@@ -3648,6 +3658,16 @@ static struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"mvstat_search", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Sets the algorithm used for combining multivariate stats."),
+ NULL
+ },
+ &mvstat_search_type,
+ MVSTAT_SEARCH_GREEDY, mvstat_search_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 24003ae..6bfd338 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -183,11 +183,13 @@ extern Selectivity clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
extern Selectivity clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
#endif /* COST_H */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 70f79ed..f2fbc11 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -16,6 +16,14 @@
#include "commands/vacuum.h"
+typedef enum MVStatSearchType
+{
+ MVSTAT_SEARCH_EXHAUSTIVE, /* exhaustive search */
+ MVSTAT_SEARCH_GREEDY /* greedy search */
+} MVStatSearchType;
+
+extern int mvstat_search_type;
+
/*
* Degree of how much MCV item / histogram bucket matches a clause.
* This is then considered when computing the selectivity.
--
1.9.3
Hello, I started to work on this patch.
attached is v7 of the multivariate stats patch. The main improvement
is major refactoring of the clausesel.c portion - splitting the
awfully long spaghetti-style functions into smaller pieces, making it
much more understandable etc.
Thank you, it looks clearer. I have some comments from a brief
look at this. This patchset is relatively large, so I will comment
on a "per-notice" basis, which means I'll send comments before
examining the entire patchset. Sorry in advance for the
desultory comments.
=======
General comments:
- You included unnecessary stuff such as regression.diffs in
these patches.
- Now OID 3307 is used by pg_stat_file. I moved
pg_mv_stats_dependencies_info/show to 3311/3312.
- Single-variate stats have a mechanism to inject arbitrary
values as statistics, that is, get_relation_stats_hook and
similar facilities. I want a similar mechanism for multivariate
statistics, too.
0001:
- I also don't think it is the right thing for expression_tree_walker
to recognize RestrictInfo, since it is not part of an expression.
0003:
- In clauselist_selectivity, find_stats is uselessly called for a
single clause. It should be called only after the clause list is
found to consist of more than one clause.
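Something like the following is what I have in mind - just a sketch
against the patched clauselist_selectivity() (untested, and the exact
find_stats signature is my guess):

    Selectivity
    clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
                           JoinType jointype, SpecialJoinInfo *sjinfo,
                           List *conditions)
    {
        List   *stats;

        /* a single clause cannot benefit from multivariate stats */
        if (list_length(clauses) == 1)
            return clause_selectivity(root, (Node *) linitial(clauses),
                                      varRelid, jointype, sjinfo,
                                      conditions);

        /* only now pay for looking up multivariate statistics */
        stats = find_stats(root, clauses, varRelid);

        /* ... proceed as in the patch ... */
    }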
- The search for vars to be compared with mv-stats columns, which
find_stats does, should stop at disjunctions. But this patch
doesn't behave that way, which seems unwanted. The
following steps show that.
====
=# CREATE TABLE t1 (a int, b int, c int);
=# INSERT INTO t1 (SELECT a, a * 2, a * 3 FROM generate_series(0, 9999) a);
=# EXPLAIN SELECT * FROM t1 WHERE a = 1 AND b = 2 OR c = 3;
Seq Scan on t1 (cost=0.00..230.00 rows=1 width=12)
=# ALTER TABLE t1 ADD STATISTICS (HISTOGRAM) ON (a, b, c);
=# ANALYZE t1;
=# EXPLAIN SELECT * FROM t1 WHERE a = 1 AND b = 2 OR c = 3;
Seq Scan on t1 (cost=0.00..230.00 rows=268 width=12)
====
Rows changed unwantedly.
It seems not to be as simple a thing as your code assumes.
I do assume some of those pieces are unnecessary because there already
is a helper function with the same purpose (but I'm not aware of
that). But IMHO this piece of code begins to look reasonable
(especially when compared to the previous state).
Yeah, that kind of work should be done later :p. This patch is
not so invasive as to make that undoable.
The other major improvement is a review of the comments (including
FIXMEs and TODOs), and removal of the obsolete / misplaced ones. And
there was plenty of those ...

These changes made this version ~20k smaller than v6.
The patch also rebases to current master, which I assume shall be
quite stable - so hopefully no more duplicate OIDs for a while.

There are 6 files attached, but only 0002-0006 are actually part of
the multivariate statistics patch itself. The first part makes it
possible to use pull_varnos() with expression trees containing
RestrictInfo nodes, but maybe this is not the right way to fix this
(there's another thread where this was discussed).
As mentioned above, checking whether mv stats can be applied would
be a more complex matter than you are now assuming. I will also
consider that.
Also, the regression tests testing plan choice with multivariate stats
(e.g. that a bitmap index scan is chosen instead of an index scan) fail
from time to time. I suppose this happens because the invalidation
after ANALYZE is not processed before executing the query, so the
optimizer does not see the stats, or something like that.
I saw that occur, but so far I have no idea how it happens.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello Horiguchi-san!
On 07/03/2015 07:30 AM, Kyotaro HORIGUCHI wrote:
Hello, I started to work on this patch.
attached is v7 of the multivariate stats patch. The main improvement
is major refactoring of the clausesel.c portion - splitting the
awfully long spaghetti-style functions into smaller pieces, making it
much more understandable etc.

Thank you, it looks clearer. I have some comments from a brief look
at this. This patchset is relatively large, so I will comment on a
"per-notice" basis, which means I'll send comments before examining
the entire patchset. Sorry in advance for the desultory comments.
Sure. If you run into something that's not clear enough, I'm happy to
explain that (I tried to cover all the important details in the
comments, but it's a large patch, indeed.)
=======
General comments:

- You included unnecessary stuff such as regression.diffs in
these patches.
Ahhhh :-/ Will fix.
- Now OID 3307 is used by pg_stat_file. I moved
pg_mv_stats_dependencies_info/show to 3311/3312.
Will fix while rebasing to current master.
- Single-variate stats have a mechanism to inject arbitrary
values as statistics, that is, get_relation_stats_hook and
similar facilities. I want a similar mechanism for multivariate
statistics, too.
Fair point, although I'm not sure where we should place the hook, how
exactly it should be defined, and how useful it would be in the end.
Can you give an example of how you'd use such a hook?
I've never used get_relation_stats_hook, but if I get it right, the
plugins can use the hook to create the stats (for each column), either
from scratch or by tweaking the existing stats.
I'm not sure how this should work with multivariate stats, though,
because there can be an arbitrary number of stats for a column, and it
really depends on all the clauses (so examine_variable() seems a bit
inappropriate, as it only sees a single variable at a time).
Moreover, with multivariate stats
(a) there may be an arbitrary number of stats for a column
(b) only some of the stats end up being used for the estimation
I see two or three possible places for calling such a hook:
(a) at the very beginning, after fetching the list of stats
- sees all the existing stats on a table
- may add entirely new stats or tweak the existing ones
(b) after collecting the list of variables compatible with
multivariate stats
- like (a) and additionally knows which columns are interesting
for the query (but only with respect to the existing stats)
(c) after optimization (selection of the right combination of stats)
- like (b), but can't affect the optimization
But I can't really imagine anyone building multivariate stats on the
fly, in the hook.
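To make the discussion a bit more concrete, for (a) I'd imagine
something along these lines (names entirely hypothetical) - the hook
sees the list of stats fetched from the catalog and may return a
modified list:

    typedef List *(*get_mv_stats_hook_type) (PlannerInfo *root,
                                             Oid relid,
                                             List *mvstats);

    extern get_mv_stats_hook_type get_mv_stats_hook;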
It's more complicated, though, because the query may call
clauselist_selectivity multiple times, depending on how complex the
WHERE clauses are.
0001:
- I also don't think it is the right thing for expression_tree_walker
to recognize RestrictInfo, since it is not part of an expression.
Yes. In my working git repo, I've reworked this to use the second
option, i.e. adding RestrictInfo handling to pull_(varno|varattno)_walker:
https://github.com/tvondra/postgres/commit/2dc79b914c759d31becd8ae670b37b79663a595f
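In essence it adds a case like this to the walkers (simplified sketch;
the exact shape in the commit may differ):

    if (IsA(node, RestrictInfo))
    {
        /* look through the RestrictInfo and walk the contained clause */
        RestrictInfo *rinfo = (RestrictInfo *) node;

        return pull_varnos_walker((Node *) rinfo->clause, context);
    }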
Do you think this is the correct solution? If not, how to fix it?
0003:
- In clauselist_selectivity, find_stats is uselessly called for a
single clause. It should be called only after the clause list is
found to consist of more than one clause.
Ok, will fix.
- The search for vars to be compared with mv-stats columns, which
find_stats does, should stop at disjunctions. But this patch
doesn't behave that way, which seems unwanted. The
following steps show that.
Why should it stop at disjunctions? There's nothing wrong with using
multivariate stats to estimate OR-clauses, IMHO.
====
=# CREATE TABLE t1 (a int, b int, c int);
=# INSERT INTO t1 (SELECT a, a * 2, a * 3 FROM generate_series(0, 9999) a);
=# EXPLAIN SELECT * FROM t1 WHERE a = 1 AND b = 2 OR c = 3;
Seq Scan on t1 (cost=0.00..230.00 rows=1 width=12)
=# ALTER TABLE t1 ADD STATISTICS (HISTOGRAM) ON (a, b, c);
=# ANALYZE t1;
=# EXPLAIN SELECT * FROM t1 WHERE a = 1 AND b = 2 OR c = 3;
Seq Scan on t1 (cost=0.00..230.00 rows=268 width=12)
====
Rows changed unwantedly.
That has nothing to do with OR clauses, but rather with using a type of
statistics that does not fit the data and queries. Histograms are quite
inaccurate for discrete data and equality conditions - in this case the
clauses probably match one bucket, and so we use 1/2 the bucket as an
estimate. There's nothing wrong with that.
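(E.g. a partial match of a bucket holding 5% of the rows contributes
0.5 * 0.05 to the selectivity, no matter how many rows in the bucket
actually satisfy the equality clauses.)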
So let's use MCV instead:
ALTER TABLE t1 ADD STATISTICS (MCV) ON (a, b, c);
ANALYZE t1;
EXPLAIN SELECT * FROM t1 WHERE a = 1 AND b = 2 OR c = 3;
QUERY PLAN
-----------------------------------------------------
Seq Scan on t1 (cost=0.00..230.00 rows=1 width=12)
Filter: (((a = 1) AND (b = 2)) OR (c = 3))
(2 rows)
It seems not to be as simple a thing as your code assumes.
Maybe, but I don't see which assumption is invalid. I see nothing wrong
with the previous query.
kind regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, Tomas. I'll kick the gas pedal.
Thank you, it looks clearer. I have some comments from a brief look
at this. This patchset is relatively large, so I will comment on a
"per-notice" basis, which means I'll send comments before examining
the entire patchset. Sorry in advance for the desultory comments.

Sure. If you run into something that's not clear enough, I'm happy to
explain that (I tried to cover all the important details in the
comments, but it's a large patch, indeed.)
- Single-variate stats have a mechanism to inject arbitrary
values as statistics, that is, get_relation_stats_hook and
similar facilities. I want a similar mechanism for multivariate
statistics, too.

Fair point, although I'm not sure where we should place the hook, how
exactly it should be defined, and how useful it would be in the end.
Can you give an example of how you'd use such a hook?
It's my secret, but it is open :p. This is crucial for us when examining
many planner-related problems that occurred at our customers, in vitro.
http://pgdbmsstats.osdn.jp/pg_dbms_stats-en.html
# Mmm, this doc is a bit too old..
One of our tools works like the following:
- Copy pg_statistic and some attributes of pg_class into some
table. Of course this is exportable.
- For example, in examine_simple_variable, using the hook
get_relation_stats_hook, inject the saved statistics in place
of the real statistics.
The hook point is placed where the parameters specifying which
statistics are needed are available in a compact shape, and all the
hook function has to do is return the corresponding statistics
values.
So the parallel facility for mv stats would look like this:
MVStatisticInfo *
get_mv_statistics(PlannerInfo *root, Index relid);
or
MVStatisticInfo *
get_mv_statistics(PlannerInfo *root, Index relid, <bitmap or list of attnos>);
So by simply applying this, the current clauselist_selectivity
code will turn into the following:
if (list_length(clauses) == 1)
    return clause_selectivity(....);
Index varrelid = find_singleton_relid(root, clauses, varRelid);
if (varrelid)
{
    /* Bitmapset *attnums = collect_attnums(root, clauses, varrelid); */
    if (get_mv_statistics_hook)
        stats = get_mv_statistics_hook(root, varrelid /*, attnums */);
    else
        stats = get_mv_statistics(root, varrelid /*, attnums */);
    ....
In comparison to single-column statistics, it might be preferable
to separate the statistics values from the statistics definition.
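For illustration, a declaration parallel to get_relation_stats_hook
might look like this - just a sketch of the idea with made-up names
and types, not anything from the patch:
typedef List *(*get_mv_statistics_hook_type) (PlannerInfo *root,
                                              Index relid,
                                              Bitmapset *attnums);
extern PGDLLIMPORT get_mv_statistics_hook_type get_mv_statistics_hook;
The hook would return the list of MVStatisticInfo entries to consider
for the given relation and attribute set, or NIL to fall back to the
regular catalog lookup.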
I've never used get_relation_stats_hook, but if I get it right, the
plugins can use the hook to create the stats (for each column), either
from scratch or by tweaking the existing stats.
Mostly existing stats without change. I have seen a few hackers who
wanted to provide predefined statistics for typical cases. I haven't
seen anyone who tweaks existing stats.
I'm not sure how this should work with multivariate stats, though,
because there can be an arbitrary number of stats for a column, and it
really depends on all the clauses (so examine_variable() seems a bit
inappropriate, as it only sees a single variable at a time).
Restriction clauses are not a problem. What is needed to replace
stats values is defining a few APIs to retrieve them, and then
retrieving the stats values only in a way that is compatible with
the API. Substitute views for mv stats would be okay as an extreme
case, but that is not good.
Moreover, with multivariate stats
(a) there may be an arbitrary number of stats for a column
(b) only some of the stats end up being used for the estimation
I see two or three possible places for calling such a hook:
(a) at the very beginning, after fetching the list of stats
- sees all the existing stats on a table
- may add entirely new stats or tweak the existing ones
Getting all stats for a table would be okay, but an attnum list can
narrow things down, as in the second form of the example APIs
above. And we may ignore the case of forged or tweaked stats; that
is the hook author's problem, not ours.
(b) after collecting the list of variables compatible with
multivariate stats
- like (a), and additionally knows which columns are interesting
for the query (but only with respect to the existing stats)
We should carefully design the API to be able to point to the
pertinent stats in every situation. Mv stats are based on the
correlation of multiple columns, so I think a relid and an
attribute list are enough as the parameters.
| if (st.relid == param.relid && bms_equal(st.attnums, param.attnums))
|     /* these are the stats we want */
If we can filter the appropriate stats from all the stats using the
clause list, we can definitely build the appropriate parameter
(column set) prior to retrieving mv statistics. Isn't that correct?
(c) after optimization (selection of the right combination of stats)
- like (b), but can't affect the optimization
But I can't really imagine anyone building multivariate stats on the
fly, in the hook.
It's more complicated, though, because the query may call
clauselist_selectivity multiple times, depending on how complex the
WHERE clauses are.
0001:
- I also don't think it is the right thing for expression_tree_walker
to recognize RestrictInfo, since it is not a part of an expression.
Yes. In my working git repo, I've reworked this to use the second
option, i.e. adding RestrictInfo support to pull_(varno|varattno)_walker:
https://github.com/tvondra/postgres/commit/2dc79b914c759d31becd8ae670b37b79663a595f
Do you think this is the correct solution? If not, how to fix it?
The reason why I think it is not appropriate is that RestrictInfo
is not a part of an expression.
Increasing the selectivity of a condition based on column correlation
occurs only for a set of conjunctive clauses. An OR operation
divides the sets. Is that agreeable? RestrictInfos can be nested
within each other, and we should be aware of the AND/OR operators.
This is what expression_tree_walker does not handle.
Perhaps we should provide a dedicated function, such as
find_conjunctive_attr_set, which does this (see the sketch below):
- Check the type of the top expression of the clause.
- If it is a RestrictInfo, check clause_relids, then check the
clause.
- If it is a bool OR, stop searching and return an empty set of
attributes.
- If it is a bool AND, check the components further. A list of
RestrictInfos should be treated as an AND connection.
- If it is an operator expression, collect the used relids and attrs
by walking the expression tree.
I may be missing something, but I think the outline is correct.
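A rough sketch of such a walker, using only existing backend helpers
(or_clause, and_clause, is_opclause, pull_varattnos); this is merely
my reading of the outline above, not code from the patch:
#include "postgres.h"
#include "nodes/relation.h"
#include "optimizer/clauses.h"
#include "optimizer/var.h"
static Bitmapset *
find_conjunctive_attr_set(PlannerInfo *root, Node *clause, Index relid)
{
    /* RestrictInfo is not an expression - look at the clause inside */
    if (IsA(clause, RestrictInfo))
        return find_conjunctive_attr_set(root,
                    (Node *) ((RestrictInfo *) clause)->clause, relid);
    /* a disjunction stops the search: return an empty attribute set */
    if (or_clause(clause))
        return NULL;
    /* a conjunction: recurse into the arguments, union the results */
    if (and_clause(clause))
    {
        Bitmapset  *attnums = NULL;
        ListCell   *lc;
        foreach(lc, ((BoolExpr *) clause)->args)
            attnums = bms_union(attnums,
                                find_conjunctive_attr_set(root,
                                    (Node *) lfirst(lc), relid));
        return attnums;
    }
    /* an operator expression: collect the attributes it references */
    if (is_opclause(clause))
    {
        Bitmapset  *attnums = NULL;
        pull_varattnos(clause, relid, &attnums);
        return attnums;
    }
    return NULL;        /* anything else is unsupported */
}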
In addition to that, we should carefully avoid duplicate corrections
using the same mv statistics.
I haven't understood precisely what choose_mv_statistics does, but I
suppose what this function does could be split into a 'making the
parameter to find stats' part and a 'matching the parameter with
stats in order to retrieve the desired stats' part. Could you
reconstruct this process into a form like that?
I feel it is too invasive, or excessively intermixed.
0003:
- In clauselist_selectivity, find_stats is uselessly called for a
single clause. It should be called only after the clause list is
found to consist of more than one clause.
Ok, will fix.
- The search for vars to be compared with mv-stat columns, which
find_stats does, should stop at disjunctions. But this patch
doesn't behave that way, which is unwanted behavior. The
following steps show that.
Why should it stop at disjunctions? There's nothing wrong with using
multivariate stats to estimate OR-clauses, IMHO.
Mv statistics represent how often *every combination of the
column values* occurs. Is that correct? Here "combination" means the
values coexist, that is, AND. For example, an MV-MCV list:
(a, b, c)   freq
(1, 2, 3)    100
(1, 2, 5)     50
(1, 3, 8)     20
(1, 7, 2)      5
================
total        175
| select * from t where a = 1 and b = 2 and c = 3;
| SELECT 100
This is correct.
| select * from t where a = 1 and b = 2 or c = 3;
| SELECT 100
This is *not* correct. The correct number of tuples is 150.
This is a simple example where OR breaks the MV stats assumption.
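To spell out where the 150 comes from, by inclusion-exclusion over the
MCV list above: 150 rows match (a = 1 AND b = 2), 100 rows match
(c = 3), and the 100 rows of (1, 2, 3) satisfy both, so
    150 + 100 - 100 = 150.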
====
=# CREATE TABLE t1 (a int, b int, c int);
=# INSERT INTO t1 (SELECT a, a * 2, a * 3 FROM generate_series(0,
9999) a);
=# EXPLAIN SELECT * FROM t1 WHERE a = 1 AND b = 2 OR c = 3;
Seq Scan on t1 (cost=0.00..230.00 rows=1 width=12)
=# ALTER TABLE t1 ADD STATISTICS (HISTOGRAM) ON (a, b, c);
=# ANALYZE t1;
=# EXPLAIN SELECT * FROM t1 WHERE a = 1 AND b = 2 OR c = 3;
Seq Scan on t1 (cost=0.00..230.00 rows=268 width=12)
====
Rows changed unwantedly.
That has nothing to do with OR clauses, but rather with using a type
of statistics that does not fit the data and queries. Histograms are
quite inaccurate for discrete data and equality conditions - in this
case the clauses probably match one bucket, and so we use 1/2 the
bucket as an estimate. There's nothing wrong with that.
So let's use MCV instead:
Hmm, the problem is not what specific number is displayed as
rows. What is crucial is the fact that the row count changed even
though it shouldn't have, as I demonstrated above.
ALTER TABLE t1 ADD STATISTICS (MCV) ON (a, b, c);
ANALYZE t1;
EXPLAIN SELECT * FROM t1 WHERE a = 1 AND b = 2 OR c = 3;
QUERY PLAN
-----------------------------------------------------
Seq Scan on t1 (cost=0.00..230.00 rows=1 width=12)
Filter: (((a = 1) AND (b = 2)) OR (c = 3))
(2 rows)
It seems it is not so simple a thing as your code assumes.
Maybe, but I don't see which assumption is invalid? I see nothing wrong
with the previous query.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hi,
On 07/07/2015 08:05 AM, Kyotaro HORIGUCHI wrote:
Hi, Tomas. I'll kick the gas pedal.
Thank you, it looks clearer. I have some comments from a brief look
at this. This patchset is relatively large, so I will comment on a
"per-notice" basis, which means I'll send comments before examining
the entire patchset. Sorry in advance for the desultory comments.
Sure. If you run into something that's not clear enough, I'm happy to
explain that (I tried to cover all the important details in the
comments, but it's a large patch, indeed.)
- Single-variate stats have a mechanism to inject arbitrary
values as statistics, that is, get_relation_stats_hook and
similar stuff. I want a similar mechanism for multivariate
statistics, too.
Fair point, although I'm not sure where we should place the hook,
how exactly it should be defined and how useful it would be in
the end. Can you give an example of how you'd use such a hook?
...
We should carefully design the API to be able to point to the
pertinent stats in every situation. Mv stats are based on the
correlation of multiple columns, so I think a relid and an attribute
list are enough as the parameters.
| if (st.relid == param.relid && bms_equal(st.attnums, param.attnums))
|     /* these are the stats we want */
If we can filter the appropriate stats from all the stats using the
clause list, we can definitely build the appropriate parameter (column
set) prior to retrieving mv statistics. Isn't that correct?
Let me briefly explain how the current clauselist_selectivity
implementation works.
(1) check if there are multivariate statistics on the table - if not,
skip the multivariate parts altogether (the point of this is to
minimize impact on users who don't use the new feature)
(2) see if there are clauses compatible with multivariate stats - this
only checks "general compatibility" without actually checking the
existing stats (the point is to terminate early if the clauses
are not compatible somehow - e.g. if the clauses reference only a
single attribute, use unsupported operators etc.)
(3) if there are multivariate stats and compatible clauses, the
function choose_mv_stats tries to find the best combination of
multivariate stats with respect to the clauses (details later)
(4) the clauses are estimated using the stats, the remaining clauses
are estimated using the current statistics (single attribute)
The only way to reliably inject new stats is by calling a hook before
(1), allowing it to arbitrarily modify the list of stats. Based on the
use cases you provided, I don't think it makes much sense to add
additional hooks in the other phases.
At this point it's however not yet known which clauses are compatible
with multivariate stats, or which attributes they reference. It might
be possible to simply call pull_varattnos() and pass the result to the
hook, except that does not work with RestrictInfo :-/
Or maybe we could / should put the hook not into clauselist_selectivity
but somewhere else? Say, into get_relation_info, where we actually read
the list of stats for the relation?
0001:
- I also don't think it is the right thing for expression_tree_walker
to recognize RestrictInfo, since it is not a part of an expression.
Yes. In my working git repo, I've reworked this to use the second
option, i.e. adding RestrictInfo support to pull_(varno|varattno)_walker:
https://github.com/tvondra/postgres/commit/2dc79b914c759d31becd8ae670b37b79663a595f
Do you think this is the correct solution? If not, how to fix it?
The reason why I think it is not appropriate is that RestrictInfo
is not a part of an expression.
Increasing the selectivity of a condition based on column correlation
occurs only for a set of conjunctive clauses. An OR operation
divides the sets. Is that agreeable? RestrictInfos can be nested
within each other, and we should be aware of the AND/OR operators.
This is what expression_tree_walker does not handle.
I still don't understand why you think we need to differentiate between
AND and OR operators. There's nothing wrong with estimating OR clauses
using multivariate statistics.
Perhaps we should provide a dedicated function, such as
find_conjunctive_attr_set, which does this:
Perhaps. The reason why I added support for RestrictInfo into the
existing walker implementations is that it seemed like the easiest way
to fix the issue. But if there are reasons why that's incorrect, then
inventing a new function is probably the right way.
- Check the type of the top expression of the clause.
- If it is a RestrictInfo, check clause_relids, then check the
clause.
- If it is a bool OR, stop searching and return an empty set of
attributes.
- If it is a bool AND, check the components further. A list of
RestrictInfos should be treated as an AND connection.
- If it is an operator expression, collect the used relids and attrs
by walking the expression tree.
I may be missing something, but I think the outline is correct.
As I said before, there's nothing wrong with estimating OR clauses using
multivariate statistics. So OR and AND should be handled exactly the same.
I think you're missing the fact that it's not enough to look at the
relids from the RestrictInfo - we need to actually check what clauses
are used inside, i.e. we need to check the clauses.
That's because only some of the clauses are compatible with multivariate
stats, and only if all the clauses of the BoolExpr are "compatible" can
we estimate the clause as a whole. If it's a mix of supported and
unsupported clauses, we can simply pass it to clauselist_selectivity,
which will repeat the whole process.
In addition to that, we should carefully avoid duplicate corrections
using the same mv statistics.
Sure. That's what choose_mv_statistics does.
I haven't understood precisely what choose_mv_statistics does, but I
suppose what this function does could be split into a 'making the
parameter to find stats' part and a 'matching the parameter with
stats in order to retrieve the desired stats' part. Could you
reconstruct this process into a form like that?
The goal of choose_mv_statistics is very simple - given a list of
clauses, it tries to find the best combination of statistics, exploiting
as much information as possible.
So let's say you have clauses
WHERE a=1 AND b=1 AND c=1 AND d=1
but you only have statistics on [a,b], [b,c] and [b,c,d].
The simplest approach would be to use the 'largest' statistics, covering
the most columns from the clauses - in this case [b,c,d]. This is what
the initial patches do.
The last patch improves this significantly, by combining the statistics
using conditional probability. In this case it'd probably use all three
statistics, effectively decomposing the selectivity like this:
P(a=1,b=1,c=1,d=1) = P(a=1,b=1) * P(c=1|b=1) * P(d=1|b=1,c=1)
                       [a,b]        [b,c]         [b,c,d]
And each of those probabilities can be estimated using one of the stats.
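As a concrete illustration, using the ALTER TABLE syntax from this
patch (the table and data are made up), the setup for this example
would be:
CREATE TABLE t4 (a INT, b INT, c INT, d INT);
ALTER TABLE t4 ADD STATISTICS (mcv) ON (a, b);
ALTER TABLE t4 ADD STATISTICS (mcv) ON (b, c);
ALTER TABLE t4 ADD STATISTICS (mcv) ON (b, c, d);
ANALYZE t4;
EXPLAIN SELECT * FROM t4 WHERE a = 1 AND b = 1 AND c = 1 AND d = 1;
and the planner is then free to decompose the estimate as shown above.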
I feel it is too invasive, or excessively intermixed.
I don't think it really fits your model - the hook has to be called much
sooner, effectively at the very beginning of clauselist_selectivity
or even before that. Otherwise it might not get called at all (e.g. if
there are no multivariate stats on the table, this whole part will be
skipped).
Why should it stop at disjunctions? There's nothing wrong with using
multivariate stats to estimate OR-clauses, IMHO.
Mv statistics represent how often *every combination of the
column values* occurs. Is that correct? Here "combination" means the
values coexist, that is, AND. For example, an MV-MCV list:
(a, b, c)   freq
(1, 2, 3)    100
(1, 2, 5)     50
(1, 3, 8)     20
(1, 7, 2)      5
================
total        175
| select * from t where a = 1 and b = 2 and c = 3;
| SELECT 100
This is correct.
| select * from t where a = 1 and b = 2 or c = 3;
| SELECT 100
This is *not* correct. The correct number of tuples is 150.
This is a simple example where OR breaks the MV stats assumption.
No, it does not.
I'm not sure where the numbers are coming from, though. So let's see how
this actually works with multivariate statistics. I'll create a table
with the 4 combinations you used in your example, but with 1000x more
rows, to make the estimates a bit more accurate:
CREATE TABLE t (a INT, b INT, c INT);
INSERT INTO t SELECT 1, 2, 3 FROM generate_series(1,100000);
INSERT INTO t SELECT 1, 2, 5 FROM generate_series(1,50000);
INSERT INTO t SELECT 1, 3, 8 FROM generate_series(1,20000);
INSERT INTO t SELECT 1, 7, 2 FROM generate_series(1,5000);
ALTER TABLE t ADD STATISTICS (mcv) ON (a,b,c);
ANALYZE t;
And now let's see the two queries:
EXPLAIN select * from t where a = 1 and b = 2 and c = 3;
QUERY PLAN
----------------------------------------------------------
Seq Scan on t (cost=0.00..4008.50 rows=100403 width=12)
Filter: ((a = 1) AND (b = 2) AND (c = 3))
(2 rows)
EXPLAIN select * from t where a = 1 and b = 2 or c = 3;
QUERY PLAN
----------------------------------------------------------
Seq Scan on t (cost=0.00..4008.50 rows=150103 width=12)
Filter: (((a = 1) AND (b = 2)) OR (c = 3))
(2 rows)
So the first query estimates ~100k rows, the second one ~150k rows -
exactly the 100:150 ratio from your example, scaled up 1000x. That is
as expected, because MCV lists are discrete, match the data perfectly,
and behave exactly like your mental model.
If you try this with histograms though, you'll get the same estimate in
both cases:
ALTER TABLE t DROP STATISTICS ALL;
ALTER TABLE t ADD STATISTICS (histogram) ON (a,b,c);
ANALYZE t;
EXPLAIN select * from t where a = 1 and b = 2 and c = 3;
QUERY PLAN
---------------------------------------------------------
Seq Scan on t (cost=0.00..4008.50 rows=52707 width=12)
Filter: ((a = 1) AND (b = 2) AND (c = 3))
(2 rows)
EXPLAIN select * from t where a = 1 and b = 2 or c = 3;
QUERY PLAN
---------------------------------------------------------
Seq Scan on t (cost=0.00..4008.50 rows=52707 width=12)
Filter: (((a = 1) AND (b = 2)) OR (c = 3))
(2 rows)
That's unfortunate, but it has nothing to do with some assumptions of
multivariate statistics. The "problem" is that histograms are naturally
fuzzy, and both conditions hit the same bucket.
The solution is simple - don't use histograms for such discrete data.
====
=# CREATE TABLE t1 (a int, b int, c int);
=# INSERT INTO t1 (SELECT a, a * 2, a * 3 FROM generate_series(0,
9999) a);
=# EXPLAIN SELECT * FROM t1 WHERE a = 1 AND b = 2 OR c = 3;
Seq Scan on t1 (cost=0.00..230.00 rows=1 width=12)
=# ALTER TABLE t1 ADD STATISTICS (HISTOGRAM) ON (a, b, c);
=# ANALYZE t1;
=# EXPLAIN SELECT * FROM t1 WHERE a = 1 AND b = 2 OR c = 3;
Seq Scan on t1 (cost=0.00..230.00 rows=268 width=12)
====
Rows changed unwantedly.
That has nothing to do with OR clauses, but rather with using a
type of statistics that does not fit the data and queries.
Histograms are quite inaccurate for discrete data and equality
conditions - in this case the clauses probably match one bucket,
and so we use 1/2 the bucket as an estimate. There's nothing wrong
with that.
So let's use MCV instead:
Hmm, the problem is not what specific number is displayed as
rows. What is crucial is the fact that the row count changed even
though it shouldn't have, as I demonstrated above.
Again, that has nothing to do with any assumptions, and it certainly
does not demonstrate that OR clauses should not be handled by
multivariate statistics.
In this case, you're observing two effects.
(1) Natural inaccuracy of histograms when used for discrete data,
especially in combination with equality conditions (because
that's impossible to estimate accurately with histograms).
(2) The original estimate (without multivariate statistics) is only
seemingly accurate, because it falsely assumes independence.
It simply assumes that each condition matches 1/10000 of the
table, and multiplies that, getting a ~0.00001 row estimate. This
is rounded up to 1, which is accidentally the exact value.
Let me demonstrate this on two examples - one with discrete data, one
with continuous distribution.
1) discrete data
CREATE TABLE t (a INT, b INT, c INT);
INSERT INTO t SELECT i/1000, 2*(i/1000), 3*(i/1000)
FROM generate_series(1, 1000000) s(i);
ANALYZE t;
-- no multivariate stats (so assumption of independence)
EXPLAIN ANALYZE select * from t where a = 1 and b = 2 and c = 3;
Seq Scan on t (cost=0.00..22906.00 rows=1 width=12)
(actual time=0.290..59.120 rows=1000 loops=1)
EXPLAIN ANALYZE select * from t where a = 1 and b = 2 or c = 3;
Seq Scan on t (cost=0.00..22906.00 rows=966 width=12)
(actual time=0.434..117.643 rows=1000 loops=1)
EXPLAIN ANALYZE select * from t where a = 1 and b = 2 or c = 6;
Seq Scan on t (cost=0.00..22906.00 rows=966 width=12)
(actual time=0.433..96.956 rows=2000 loops=1)
-- now let's add a histogram
ALTER TABLE t ADD STATISTICS (histogram) on (a,b,c);
ANALYZE t;
EXPLAIN ANALYZE select * from t where a = 1 and b = 2 and c = 3;
Seq Scan on t (cost=0.00..22906.00 rows=817 width=12)
(actual time=0.268..116.318 rows=1000 loops=1)
EXPLAIN ANALYZE select * from t where a = 1 and b = 2 or c = 3;
Seq Scan on t (cost=0.00..22906.00 rows=30333 width=12)
(actual time=0.435..93.232 rows=1000 loops=1)
EXPLAIN ANALYZE select * from t where a = 1 and b = 2 or c = 6;
Seq Scan on t (cost=0.00..22906.00 rows=30333 width=12)
(actual time=0.434..122.930 rows=2000 loops=1)
-- now let's use a MCV list
ALTER TABLE t DROP STATISTICS ALL;
ALTER TABLE t ADD STATISTICS (mcv) on (a,b,c);
ANALYZE t;
EXPLAIN ANALYZE select * from t where a = 1 and b = 2 and c = 3;
Seq Scan on t (cost=0.00..22906.00 rows=767 width=12)
(actual time=0.268..70.604 rows=1000 loops=1)
EXPLAIN ANALYZE select * from t where a = 1 and b = 2 or c = 3;
Seq Scan on t (cost=0.00..22906.00 rows=767 width=12)
(actual time=0.268..70.604 rows=1000 loops=1)
EXPLAIN ANALYZE select * from t where a = 1 and b = 2 or c = 6;
Seq Scan on t (cost=0.00..22906.00 rows=1767 width=12)
(actual time=0.428..100.607 rows=2000 loops=1)
The default estimate of the AND query is rather bad. For the OR
clauses, it's not that bad (OR selectivity is less sensitive to the
dependency, but it's not difficult to construct counter-examples).
The histogram is not that good - for the OR queries it often results in
over-estimates (for equality conditions on discrete data).
But the MCV estimates are very accurate. The slight under-estimate is
probably caused by the block sampling we're using to get sample rows.
2) continuous data (I'll only show histograms)
CREATE TABLE t (a FLOAT, b FLOAT, c FLOAT);
INSERT INTO t SELECT r,
r + r*(random() - 0.5)/2,
r + r*(random() - 0.5)/2
FROM (SELECT random() as r
FROM generate_series(1,1000000)) foo;
ANALYZE t;
-- no multivariate stats
EXPLAIN ANALYZE select * from t where a < 0.3 and b < 0.3 and c < 0.3;
Seq Scan on t (cost=0.00..23870.00 rows=28768 width=24)
(actual time=0.026..323.383 rows=273897 loops=1)
EXPLAIN ANALYZE select * from t where a < 0.3 and b < 0.3 or c < 0.3;
Seq Scan on t (cost=0.00..23870.00 rows=372362 width=24)
(actual time=0.026..375.005 rows=317533 loops=1)
EXPLAIN ANALYZE select * from t where a < 0.3 and b < 0.3 or c > 0.9;
Seq Scan on t (cost=0.00..23870.00 rows=192979 width=24)
(actual time=0.026..431.376 rows=393528 loops=1)
-- histograms
ALTER TABLE t ADD STATISTICS (histogram) on (a,b,c);
ANALYZE t;
EXPLAIN ANALYZE select * from t where a < 0.3 and b < 0.3 and c < 0.3;
Seq Scan on t (cost=0.00..23870.00 rows=267033 width=24)
(actual time=0.021..330.487 rows=273897 loops=1)
EXPLAIN ANALYZE select * from t where a < 0.3 and b < 0.3 or c > 0.3;
Seq Scan on t (cost=0.00..23870.00 rows=14317 width=24)
(actual time=0.027..906.321 rows=966870 loops=1)
EXPLAIN ANALYZE select * from t where a < 0.3 and b < 0.3 or c > 0.9;
Seq Scan on t (cost=0.00..23870.00 rows=20367 width=24)
(actual time=0.028..452.494 rows=393528 loops=1)
This seems wrong, because the estimate for the OR queries should not be
lower than the estimate for the first query (with just AND), and it
should not increase when increasing the boundary. I'd bet this is a bug
in how the inequalities are handled with histograms, or how the AND/OR
clauses are combined. I'll look into that.
But once again, there's nothing that would make OR clauses somehow
incompatible with multivariate stats.
kind regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Horiguchi-san!
On 07/07/2015 09:43 PM, Tomas Vondra wrote:
-- histograms
ALTER TABLE t ADD STATISTICS (histogram) on (a,b,c);
ANALYZE t;
EXPLAIN ANALYZE select * from t where a < 0.3 and b < 0.3 and c < 0.3;
Seq Scan on t (cost=0.00..23870.00 rows=267033 width=24)
(actual time=0.021..330.487 rows=273897 loops=1)
EXPLAIN ANALYZE select * from t where a < 0.3 and b < 0.3 or c > 0.3;
Seq Scan on t (cost=0.00..23870.00 rows=14317 width=24)
(actual time=0.027..906.321 rows=966870 loops=1)
EXPLAIN ANALYZE select * from t where a < 0.3 and b < 0.3 or c > 0.9;
Seq Scan on t (cost=0.00..23870.00 rows=20367 width=24)
(actual time=0.028..452.494 rows=393528 loops=1)
This seems wrong, because the estimate for the OR queries should not be
lower than the estimate for the first query (with just AND), and it
should not increase when increasing the boundary. I'd bet this is a bug
in how the inequalities are handled with histograms, or how the AND/OR
clauses are combined. I'll look into that.
FWIW this was a stupid bug in update_match_bitmap_histogram(), which
initially handled only AND clauses, and thus assumed the "match" of a
bucket can only decrease. But for OR clauses this is exactly the
opposite (we assume no buckets match and add buckets matching at least
one of the clauses).
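In pseudo-code, the fix amounts to merging the per-bucket match levels
differently for the two cases (a simplified sketch with made-up array
names; the actual code uses the UPDATE_RESULT macro with the same
MIN/MAX logic):
int     i;
for (i = 0; i < nbuckets; i++)
{
    if (is_or)
        matches[i] = Max(matches[i], clause_matches[i]);   /* OR grows */
    else
        matches[i] = Min(matches[i], clause_matches[i]);   /* AND shrinks */
}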
With this fixed, the estimates look like this:
EXPLAIN ANALYZE select * from t where a < 0.3 and b < 0.3 and c < 0.3;
Seq Scan on t (cost=0.00..23870.00 rows=267033 width=24)
(actual time=0.102..321.524 rows=273897 loops=1)
EXPLAIN ANALYZE select * from t where a < 0.3 and b < 0.3 or c < 0.3;
Seq Scan on t (cost=0.00..23870.00 rows=319400 width=24)
(actual time=0.103..386.089 rows=317533 loops=1)
EXPLAIN ANALYZE select * from t where a < 0.3 and b < 0.3 or c > 0.3;
Seq Scan on t (cost=0.00..23870.00 rows=956833 width=24)
(actual time=0.133..908.455 rows=966870 loops=1)
EXPLAIN ANALYZE select * from t where a < 0.3 and b < 0.3 or c > 0.9;
Seq Scan on t (cost=0.00..23870.00 rows=393633 width=24)
(actual time=0.105..440.607 rows=393528 loops=1)
IMHO pretty accurate estimates - no issue with OR clauses.
I've pushed this to github [1] but I need to do some additional fixes. I
also had to remove some optimizations while fixing this, and will have
to reimplement those.
That's not to say that the handling of OR-clauses is perfectly correct.
After looking at clauselist_selectivity_or(), I believe it's a bit
broken and will need a bunch of fixes, as explained in the FIXMEs I
pushed to github.
[1]: https://github.com/tvondra/postgres/tree/mvstats
kind regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, thanks for the detailed explanation. I misunderstood the
code (to be more honest, I didn't look very closely there). Then I
looked at it more closely.
At Wed, 08 Jul 2015 03:03:16 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <559C76D4.2030805@2ndquadrant.com>
FWIW this was a stupid bug in update_match_bitmap_histogram(), which
initially handled only AND clauses, and thus assumed the "match" of a
bucket can only decrease. But for OR clauses this is exactly the
opposite (we assume no buckets match and add buckets matching at least
one of the clauses).
With this fixed, the estimates look like this:
IMHO pretty accurate estimates - no issue with OR clauses.
Ok, I understood the difference between what I thought and what
you say. The code is actually conscious of OR clauses, but it looks
somewhat confused.
Currently, choosing mv stats in clauselist_selectivity can be
outlined as follows:
1. find_stats finds candidate mv stats containing *all*
attributes that appear in the whole clause list, regardless of
AND/OR exprs, by walking the whole clause tree.
Perhaps this is a measure for early bailout.
2.1. Within every disjunction element, collect mv-related
attributes while checking whether all the leaf nodes (binop or
ifnull) are compatible, by (eventually) walking the whole
clause tree.
2.2. Check if all the collected attributes are contained in
mv-stats columns.
3. Finally, clauseset_mv_selectivity_histogram() (and others).
This function applies every ExprOp to every attribute in
every histogram bucket and (tries to) perform the boolean
operation on the result bitmaps.
I have some comments on the implementation, and I am also trying to
find solutions for them.
1. The flow above looks like it does very similar things repeatedly.
2. I believe what the current code does can be simplified.
3. As you mentioned in comments, some additional infrastructure is
needed.
After all, I think what we should do next is as follows,
as a first step.
- Add a means to judge the selectivity operator(?) by something
other than the oprrest of the ExprOp's operator. (You already
missed neqsel.)
I suppose one solution for this is adding oprmvstats, taking
'm' (MCV), 'h' (histogram) and 'f' (functional dependencies) and
their combinations. Or, for convenience, it could be a fixed-length
string like this:
oprname | oprmvstats
=       | 'mhf'
<>      | 'mhf'
<       | 'mh-'
>       | 'mh-'
>=      | 'mh-'
<=      | 'mh-'
This would make the code in clause_is_mv_compatible like this:
oprmvstats = get_mvstatsset(expr->opno); /* bitwise representation */
if (oprmvstats & types)
{
    *attnums = bms_add_member(*attnums, var->varattno);
    return true;
}
return false;
- The current design just manages to work, but it is too complicated
and has little affinity with the existing estimation
framework. I previously proposed separating the finding-stats phase
from the calculation phase, but after looking at the patch more
closely, I would now propose a transforming-RestrictInfo (and
finding-mvstats) phase and a phase that runs the transformed
RestrictInfo.
I think transforming RestrictInfo makes the situation
better. Since it needs different information, maybe it is
better to have a new struct, say, RestrictInfoForEstimate
(boo!). Then provide mvstatssel() for use in the new struct.
The rough shape of the code would be like below:
clauselist_selectivity()
{
    ...
    RestrictInfoForEstimate *esclause =
        transformClauseListForEstimation(root, clauses, varRelid);
    ...
    return clause_selectivity(esclause);
}
clause_selectivity(RestrictInfoForEstimate *esclause)
{
    if (IsA(clause, RestrictInfo)) ...
    if (IsA(clause, RestrictInfoForEstimate))
    {
        RestrictInfoForEstimate *ecl = (RestrictInfoForEstimate *) clause;
        if (ecl->selfunc)
        {
            sx = ecl->selfunc(root, ecl);
        }
    }
    if (IsA(clause, Var)) ...
}
transformClauseListForEstimation(...)
{
    ...
    relid = collect_mvstats_info(root, clause, &attlist);
    if (!relid)
        return;
    if (get_mvstats_hook)
        mvstats = (*get_mvstats_hook) (root, relid, attset);
    else
        mvstats = find_mv_stats(root, relid, attset);
}
...
I've pushed this to github [1] but I need to do some additional
fixes. I also had to remove some optimizations while fixing this, and
will have to reimplement those.
That's not to say that the handling of OR-clauses is perfectly
correct. After looking at clauselist_selectivity_or(), I believe it's
a bit broken and will need a bunch of fixes, as explained in the
FIXMEs I pushed to github.
I don't yet see whether it is doable or not, and I suppose you're
unwilling to change the big picture, so I will consider the idea
and show you the result, if it turns out to be possible and
promising.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hi,
On 07/13/2015 10:51 AM, Kyotaro HORIGUCHI wrote:
Ok, I understood the difference between what I thought and what you
say. The code is actually conscious of OR clauses, but it looks somewhat
confused.
I'm not sure which part is confused by the OR clauses, but it's
certainly possible. Initially it only handled AND clauses, and the
support for OR clauses was added later, so it's possible some parts are
not behaving correctly.
Currently, choosing mv stats in clauselist_selectivity can be
outlined as follows:
1. find_stats finds candidate mv stats containing *all*
attributes that appear in the whole clause list, regardless of
AND/OR exprs, by walking the whole clause tree.
Perhaps this is a measure for early bailout.
Not entirely. The goal of find_stats() is to look up all stats on the
'current' relation - it's coded the way it is because I had to deal with
varRelid=0 cases, in which case I have to inspect the Var nodes. But
maybe I got this wrong and there's a much simpler way to do that?
It is an early bailout in the sense that if there are no multivariate
stats defined on the table, there's no point in doing any of the
following steps. So that we don't increase planning times for users not
using multivariate stats.
2.1. Within every disjunction element, collect mv-related
attributes while checking whether all the leaf nodes (binop or
ifnull) are compatible, by (eventually) walking the whole
clause tree.
Generally, yes. The idea is to check whether there are clauses that
might be estimated using multivariate statistics, and whether the
clauses reference at least two different attributes. Imagine a query
like this:
SELECT * FROM t WHERE (a=1) AND (a>0) AND (a<100)
It makes no sense to process this using multivariate statistics, because
all the Var nodes reference a single attribute.
Similarly, the check is not just about the leaf nodes - to be able to
estimate a clause at this point, we have to be able to process the whole
tree, starting from the top-level clause. Although maybe that's no
longer true, now that support for OR clauses was added ... I wonder
whether there are other BoolExpr-like nodes that might make the tree
incompatible with multivariate statistics (in the sense that the current
implementation does not know how to handle them).
Also note that even though the clause may be "incompatible" at this
level, it may get partially processed by multivariate statistics later.
For example with a query:
SELECT * FROM t WHERE (a=1 OR b=2 OR c ~* 'xyz') AND (q=1 OR r=4)
the first condition is "incompatible" because it contains the
unsupported operator '~*', but it will eventually be processed as a
BoolExpr node, and
should be split into two parts - (a=1 OR b=2) which is compatible, and
(c ~* 'xyz') which is incompatible.
This split should happen in clauselist_selectivity_or(), and the other
interesting thing is that it uses (q=1 OR r=4) as a
condition. So if there's a statistic built on (a,b,q,r), we'll compute
the conditional probability
P(a=1,b=2 | q=1,r=4)
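(Spelled out, that is just the usual definition of conditional
probability:
    P(a=1,b=2 | q=1,r=4) = P(a=1,b=2,q=1,r=4) / P(q=1,r=4)
with both the numerator and the denominator estimated from the
(a,b,q,r) statistics.)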
2.2. Check if all the collected attributes are contained in
mv-stats columns.
No, I think you got this wrong. We do not check that *all* the
attributes are contained in mvstats columns - we only need two such
columns (then there's a chance that the multivariate statistics will get
applied).
Anyway, both 2.1 and 2.2 are meant as a quick bailout, before doing the
most expensive part, which is choose_mv_statistics(). Which is however
missing in this list.
3. Finally, clauseset_mv_selectivity_histogram() (and others).
This function applies every ExprOp to every attribute in
every histogram bucket and (tries to) perform the boolean
operation on the result bitmaps.
Yes, but this only happens after choose_mv_statistics(), because that's
the code that decides which statistics will be used and in what order.
The list is also missing handling of the 'functional dependencies', so a
complete list of steps would look like this:
1) find_stats - lookup stats on the current relation (from RelOptInfo)
2) apply functional dependencies
a) check if there are equality clauses that may be reduced using
functional dependencies, referencing at least two columns
b) if yes, perform the clause reduction
3) apply MCV lists and histograms
a) check if there are clauses 'compatible' with those types of
statistics, again containing at least two columns
b) if yes, use choose_mv_statistics() to decide which statistics to
apply and in which order
c) apply the selected histograms and MCV lists
4) estimate the remaining clauses using the regular statistics
a) this is where clauselist_mv_selectivity_histogram and the others
are called
I tried to explain this in the comment before clauselist_selectivity(),
but maybe it's not detailed enough / missing some important details.
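Condensed into pseudo-C, the flow looks roughly like this (a sketch
only - the function names and signatures match the patch, but the
control flow is heavily simplified):
Selectivity
clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
                       JoinType jointype, SpecialJoinInfo *sjinfo)
{
    Index   relid;
    /* (1) look up the stats on the current relation */
    List   *stats = find_stats(root, clauses, varRelid, &relid);
    if (stats != NIL && list_length(clauses) >= 2)
    {
        /* (2) reduce equality clauses using functional dependencies */
        clauses = clauselist_apply_dependencies(root, clauses, varRelid,
                                                stats, sjinfo);
        /* (3) choose the best combination of MCV lists / histograms
         * and estimate the clauses covered by them */
        ... choose_mv_statistics(root, stats, clauses, ...) ...
    }
    /* (4) estimate the remaining clauses using per-column statistics */
    ...
}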
I have some comments on the implementation, and I am also trying to
find solutions for them.
1. The flow above looks like it does very similar things repeatedly.
I worked hard to remove such code duplication, and believe all the
current steps are necessary - for example 2(a) and 3(a) may seem
similar, but it's really necessary to do that twice.
2. I believe what the current code does can be simplified.
Possibly.
3. As you mentioned in comments, some additional infrastructure is
needed.
After all, I think what we should do next is as follows,
as a first step.
OK.
- Add a means to judge the selectivity operator(?) by something
other than the oprrest of the ExprOp's operator. (You already
missed neqsel.)
Yes, the way we use 'oprno' to determine how to estimate the selectivity
is a bit awkward. It's inspired by the handling of range queries, and
having
something better would be nice.
But I don't think this is the reason why I missed neqsel, and I don't
see this as a significant design issue at this point. But if we can come
up with a better solution, why not ...
I suppose one solution for this is adding oprmvstats, taking
'm', 'h' and 'f' and their combinations. Or, for convenience, it
could be a fixed-length string like this:
oprname | oprmvstats
=       | 'mhf'
<>      | 'mhf'
<       | 'mh-'
>       | 'mh-'
>=      | 'mh-'
<=      | 'mh-'
This would make the code in clause_is_mv_compatible like this:
oprmvstats = get_mvstatsset(expr->opno); /* bitwise representation */
if (oprmvstats & types)
{
    *attnums = bms_add_member(*attnums, var->varattno);
    return true;
}
return false;
So this only determines the compatibility of operators with respect to
different types of statistics? How does that solve the neqsel case? It
will probably decide the clause is compatible, but it will later fail at
the actual estimation, no?
- The current design just manages to work, but it is too complicated
and has little affinity with the existing estimation
framework.
I respectfully disagree. I've strived to make it fit the current
implementation as closely as possible - maybe it's possible to improve
that, but I believe there's a natural difference between the two types
of statistics. It may be somewhat simplified, but it will never be
exactly the same.
I previously proposed separating the finding-stats phase from the
calculation phase, but after looking at the patch more closely,
I would now propose a transforming-RestrictInfo (and
finding-mvstats) phase and a phase that runs the transformed
RestrictInfo.
Those phases are already separated, as is illustrated by the steps
explained above.
So technically we might place a hook either right after the find_stats()
call, so that it's possible to process all the stats on the table, or
maybe after the choose_mv_statistics() call, so that we only process the
actually used stats.
I think transforming RestrictInfo makes the situation
better. Since it needs different information, maybe it is
better to have a new struct, say, RestrictInfoForEstimate
(boo!). Then provide mvstatssel() for use in the new struct.
The rough shape of the code would be like below:
clauselist_selectivity()
{
    ...
    RestrictInfoForEstimate *esclause =
        transformClauseListForEstimation(root, clauses, varRelid);
    ...
    return clause_selectivity(esclause);
}
clause_selectivity(RestrictInfoForEstimate *esclause)
{
    if (IsA(clause, RestrictInfo)) ...
    if (IsA(clause, RestrictInfoForEstimate))
    {
        RestrictInfoForEstimate *ecl = (RestrictInfoForEstimate *) clause;
        if (ecl->selfunc)
        {
            sx = ecl->selfunc(root, ecl);
        }
    }
    if (IsA(clause, Var)) ...
}
transformClauseListForEstimation(...)
{
    ...
    relid = collect_mvstats_info(root, clause, &attlist);
    if (!relid)
        return;
    if (get_mvstats_hook)
        mvstats = (*get_mvstats_hook) (root, relid, attset);
    else
        mvstats = find_mv_stats(root, relid, attset);
}
...
So you'd transform the clause tree first, replacing parts of the tree
(to be estimated by multivariate stats) by a new node type? That's an
interesting idea, I think ...
I can't really say whether it's a good approach, though. Can you explain
why you think it'd make the situation better?
The one benefit I can think of is being able to look at the processed
tree and see which parts will be estimated using multivariate stats.
But we'd effectively have to do the same stuff (choosing the stats,
...), and if we move this pre-processing before clauselist_selectivity
(I assume that's the point), we'd end up repeating a lot of the code. Or
maybe not, I'm not sure.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, I'd like to show you the modified structure of the
multivariate statistics application logic. Please find the
patches attached. They apply on top of your v7 patch.
The code that finds mv-applicable clauses is moved out of the main
flow of clauselist_selectivity. As I said in the previous mail,
the new function transformRestrictInfoForEstimate (a bad name,
but it's just for the PoC :) scans the clause list and generates a
RestrictStatData struct which drives the mv-aware selectivity
calculation. This struct isolates MV from non-MV estimation.
The struct RestrictStatData mainly consists of the following
three parts (see the sketch below):
- the clause to be estimated by the current logic (MV is not applicable),
- the clause to be estimated by MV-statistics,
- a list of child RestrictStatDatas, which are to be processed
recursively.
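Reconstructed from this description, the struct might look roughly
like the following - the field names are my guesses for illustration;
see the attached patch for the real definition:
typedef struct RestrictStatData
{
    NodeTag          type;
    List            *plainclauses;  /* estimated by the current logic */
    List            *mvclauses;     /* estimated using mv statistics */
    MVStatisticInfo *mvstats;       /* stats chosen for mvclauses */
    List            *children;      /* child RestrictStatData nodes,
                                     * processed recursively */
} RestrictStatData;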
mvclause_selectivty() is the topmost function where mv stats
work. This structure effectively prevents the main estimation flow
from being broken by modifications to the mvstats part. Although I
haven't measured it, I'm positive the code is much smaller than yours.
I attached two patches to this message. The first one rebases the
v7 patch onto the current (maybe) master, and the second applies
the refactoring.
I'm a little anxious about performance, but I think this makes the
process of applying mv-stats far clearer. Regression tests for mvstats
succeed as-is, except for fdep, which is not implemented in this
patch.
What do you think about this?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0001-rebase-v7-patch-to-current-master.patch (text/x-patch)
From bd5a497a8eaa3276f4491537d2633268de079b18 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Mon, 6 Jul 2015 17:42:36 +0900
Subject: [PATCH 1/2] rebase v7 patch to current master
---
src/backend/nodes/nodeFuncs.c | 1 +
src/include/catalog/pg_proc.h | 4 ++--
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 9932c8c..115ff98 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -1996,6 +1996,7 @@ expression_tree_walker(Node *node,
case T_RangeTblFunction:
return walker(((RangeTblFunction *) node)->funcexpr, context);
case T_RestrictInfo:
+ elog(LOG, "HOGEEEEE: RestrictInfo");
return walker(((RestrictInfo *) node)->clause, context);
default:
elog(ERROR, "unrecognized node type: %d",
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 7810f97..b1e78a8 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2735,9 +2735,9 @@ DESCR("current user privilege on any column by rel name");
DATA(insert OID = 3029 ( has_any_column_privilege PGNSP PGUID 12 10 0 0 0 f f f f t f s 2 0 16 "26 25" _null_ _null_ _null_ _null_ _null_ has_any_column_privilege_id _null_ _null_ _null_ ));
DESCR("current user privilege on any column by rel oid");
-DATA(insert OID = 3307 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
+DATA(insert OID = 3311 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
DESCR("multivariate stats: functional dependencies info");
-DATA(insert OID = 3308 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
+DATA(insert OID = 3312 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
DESCR("multivariate stats: functional dependencies show");
DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_mcvlist_info _null_ _null_ _null_ ));
DESCR("multi-variate statistics: MCV list info");
--
1.8.3.1
0002-PoC-Planner-part-refactoring-of-mv-stats-facility.patch (text/x-patch)
From 77ccd9c8d455a365b2ad6eb779ed76da0d431e3f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 16 Jul 2015 13:56:58 +0900
Subject: [PATCH 2/2] PoC: Planner part refactoring of mv stats facility
---
src/backend/catalog/pg_operator.c | 6 +
src/backend/nodes/nodeFuncs.c | 2 +-
src/backend/optimizer/path/clausesel.c | 4107 +++++++-------------------------
src/backend/utils/cache/lsyscache.c | 40 +
src/include/catalog/pg_operator.h | 1550 ++++++------
src/include/nodes/nodes.h | 1 +
src/include/nodes/relation.h | 22 +-
src/include/optimizer/cost.h | 6 +-
src/include/utils/lsyscache.h | 1 +
src/include/utils/mvstats.h | 3 +
10 files changed, 1668 insertions(+), 4070 deletions(-)
diff --git a/src/backend/catalog/pg_operator.c b/src/backend/catalog/pg_operator.c
index 072f530..dea39d3 100644
--- a/src/backend/catalog/pg_operator.c
+++ b/src/backend/catalog/pg_operator.c
@@ -251,6 +251,9 @@ OperatorShellMake(const char *operatorName,
values[Anum_pg_operator_oprrest - 1] = ObjectIdGetDatum(InvalidOid);
values[Anum_pg_operator_oprjoin - 1] = ObjectIdGetDatum(InvalidOid);
+ /* XXXX: How this should be implemented? */
+ values[Anum_pg_operator_oprmvstat - 1] = CStringGetTextDatum("---");
+
/*
* open pg_operator
*/
@@ -508,6 +511,9 @@ OperatorCreate(const char *operatorName,
values[Anum_pg_operator_oprrest - 1] = ObjectIdGetDatum(restrictionId);
values[Anum_pg_operator_oprjoin - 1] = ObjectIdGetDatum(joinId);
+ /* XXXX: How this should be implemented? */
+ values[Anum_pg_operator_oprmvstat - 1] = CStringGetTextDatum("---");
+
pg_operator_desc = heap_open(OperatorRelationId, RowExclusiveLock);
/*
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 115ff98..00ef04b 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -1996,7 +1996,7 @@ expression_tree_walker(Node *node,
case T_RangeTblFunction:
return walker(((RangeTblFunction *) node)->funcexpr, context);
case T_RestrictInfo:
- elog(LOG, "HOGEEEEE: RestrictInfo");
+// elog(LOG, "HOGEEEEE: RestrictInfo");
return walker(((RestrictInfo *) node)->clause, context);
default:
elog(ERROR, "unrecognized node type: %d",
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index fce77ec..61f3cd8 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -46,13 +46,6 @@ typedef struct RangeQueryClause
Selectivity hibound; /* Selectivity of a var < something clause */
} RangeQueryClause;
-static Selectivity clauselist_selectivity_or(PlannerInfo *root,
- List *clauses,
- int varRelid,
- JoinType jointype,
- SpecialJoinInfo *sjinfo,
- List *conditions);
-
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
@@ -60,38 +53,6 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_MCV 0x02
#define MV_CLAUSE_TYPE_HIST 0x04
-static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
- Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
- int type);
-
-static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
- Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo,
- int type);
-
-static Bitmapset *clause_mv_get_attnums(PlannerInfo *root, Node *clause);
-
-static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
- Oid varRelid, List *stats,
- SpecialJoinInfo *sjinfo);
-
-static List *clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
- List *clauses, Oid varRelid,
- List **mvclauses, MVStatisticInfo *mvstats, int types);
-
-static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
- MVStatisticInfo *mvstats, List *clauses,
- List *conditions, bool is_or);
-
-static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
- MVStatisticInfo *mvstats,
- List *clauses, List *conditions,
- bool is_or, bool *fullmatch,
- Selectivity *lowsel);
-static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
- MVStatisticInfo *mvstats,
- List *clauses, List *conditions,
- bool is_or);
-
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
int nmatches, char * matches,
@@ -104,79 +65,11 @@ static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
int nmatches, char * matches,
bool is_or);
-/*
- * Describes a combination of multiple statistics to cover attributes
- * referenced by the clauses. The array 'stats' (with nstats elements)
- * lists attributes (in the order as they are applied), and number of
- * clause attributes covered by this solution.
- *
- * choose_mv_statistics_exhaustive() uses this to track both the current
- * and the best solutions, while walking through the state of possible
- * combination.
- */
-typedef struct mv_solution_t {
- int nclauses; /* number of clauses covered */
- int nconditions; /* number of conditions covered */
- int nstats; /* number of stats applied */
- int *stats; /* stats (in the apply order) */
-} mv_solution_t;
-
-static List *choose_mv_statistics(PlannerInfo *root,
- List *mvstats,
- List *clauses, List *conditions,
- Oid varRelid,
- SpecialJoinInfo *sjinfo);
-
-static List *filter_clauses(PlannerInfo *root, Oid varRelid,
- SpecialJoinInfo *sjinfo, int type,
- List *stats, List *clauses,
- Bitmapset **attnums);
-
-static List *filter_stats(List *stats, Bitmapset *new_attnums,
- Bitmapset *all_attnums);
-
-static Bitmapset **make_stats_attnums(MVStatisticInfo *mvstats,
- int nmvstats);
-
-static MVStatisticInfo *make_stats_array(List *stats, int *nmvstats);
-
-static List* filter_redundant_stats(List *stats,
- List *clauses, List *conditions);
-
-static Node** make_clauses_array(List *clauses, int *nclauses);
-
-static Bitmapset ** make_clauses_attnums(PlannerInfo *root, Oid varRelid,
- SpecialJoinInfo *sjinfo, int type,
- Node **clauses, int nclauses);
-
-static bool* make_cover_map(Bitmapset **stats_attnums, int nmvstats,
- Bitmapset **clauses_attnums, int nclauses);
-
-static bool has_stats(List *stats, int type);
-
-static List * find_stats(PlannerInfo *root, List *clauses,
- Oid varRelid, Index *relid);
-
-static Bitmapset* fdeps_collect_attnums(List *stats);
-
-static int *make_idx_to_attnum_mapping(Bitmapset *attnums);
-static int *make_attnum_to_idx_mapping(Bitmapset *attnums);
-
-static bool *build_adjacency_matrix(List *stats, Bitmapset *attnums,
- int *idx_to_attnum, int *attnum_to_idx);
-
-static void multiply_adjacency_matrix(bool *matrix, int natts);
-
static List* fdeps_reduce_clauses(List *clauses,
Bitmapset *attnums, bool *matrix,
int *idx_to_attnum, int *attnum_to_idx,
Index relid);
-static Bitmapset *fdeps_filter_clauses(PlannerInfo *root,
- List *clauses, Bitmapset *deps_attnums,
- List **reduced_clauses, List **deps_clauses,
- Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo);
-
static Bitmapset * get_varattnos(Node * node, Index relid);
int mvstat_search_type = MVSTAT_SEARCH_GREEDY;
@@ -188,397 +81,41 @@ int mvstat_search_type = MVSTAT_SEARCH_GREEDY;
#define UPDATE_RESULT(m,r,isor) \
(m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
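+/*
+ * Result of applying multivariate statistics to a clause (a sketch of the
+ * protocol, as used by the functions below): NORMAL means an ordinary
+ * estimate, FULL_MATCH that a full equality match was found in the MCV
+ * list (so the estimate is considered exact), and FAILURE that the
+ * statistics could not be applied to the clause at all.
+ */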
+typedef enum mv_selec_status
+{
+ NORMAL,
+ FULL_MATCH,
+ FAILURE
+} mv_selec_status;
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
+/*
+ * Transforms a list of RestrictInfos into a tree of RestrictStatData
+ * nodes, so that multivariate statistics can be applied to the covered
+ * clauses (XXX this prototype should eventually move to a header file).
+ */
+RestrictStatData *
+transformRestrictInfoForEstimate(PlannerInfo *root, List *clauses,
+								 int varRelid, SpecialJoinInfo *sjinfo);
/*
- * clauselist_selectivity -
- * Compute the selectivity of an implicitly-ANDed list of boolean
- * expression clauses. The list can be empty, in which case 1.0
- * must be returned. List elements may be either RestrictInfos
- * or bare expression clauses --- the former is preferred since
- * it allows caching of results.
- *
- * See clause_selectivity() for the meaning of the additional parameters.
- *
- * Our basic approach is to take the product of the selectivities of the
- * subclauses. However, that's only right if the subclauses have independent
- * probabilities, and in reality they are often NOT independent. So,
- * we want to be smarter where we can.
- *
- * Currently, the only extra smarts we have is to recognize "range queries",
- * such as "x > 34 AND x < 42". Clauses are recognized as possible range
- * query components if they are restriction opclauses whose operators have
- * scalarltsel() or scalargtsel() as their restriction selectivity estimator.
- * We pair up clauses of this form that refer to the same variable. An
- * unpairable clause of this kind is simply multiplied into the selectivity
- * product in the normal way. But when we find a pair, we know that the
- * selectivities represent the relative positions of the low and high bounds
- * within the column's range, so instead of figuring the selectivity as
- * hisel * losel, we can figure it as hisel + losel - 1. (To visualize this,
- * see that hisel is the fraction of the range below the high bound, while
- * losel is the fraction above the low bound; so hisel can be interpreted
- * directly as a 0..1 value but we need to convert losel to 1-losel before
- * interpreting it as a value. Then the available range is 1-losel to hisel.
- * However, this calculation double-excludes nulls, so really we need
- * hisel + losel + null_frac - 1.)
- *
- * If either selectivity is exactly DEFAULT_INEQ_SEL, we forget this equation
- * and instead use DEFAULT_RANGE_INEQ_SEL. The same applies if the equation
- * yields an impossible (negative) result.
- *
- * A free side-effect is that we can recognize redundant inequalities such
- * as "x < 4 AND x < 5"; only the tighter constraint will be counted.
- *
- * Of course this is all very dependent on the behavior of
- * scalarltsel/scalargtsel; perhaps some day we can generalize the approach.
- *
- *
- * Multivariate statististics
- * --------------------------
- * This also uses multivariate stats to estimate combinations of
- * conditions, in a way (a) maximizing the estimate accuracy by using
- * as many stats as possible, and (b) minimizing the overhead,
- * especially when there are no suitable multivariate stats (so if you
- * are not using multivariate stats, there's no additional overhead).
- *
- * The following checks are performed (in this order), and the optimizer
- * falls back to regular stats on the first 'false'.
- *
- * NOTE: This explains how this works with all the patches applied, not
- * just the functional dependencies.
- *
- * (0) check if there are multivariate stats on the relation
- *
- * If no, just skip all the following steps (directly to the
- * original code).
- *
- * (1) check how many attributes are there in conditions compatible
- * with functional dependencies
- *
- * Only simple equality clauses are considered compatible with
- * functional dependencies (and that's unlikely to change, because
- * that's the only case when functional dependencies are useful).
- *
- * If there are no conditions that might be handled by multivariate
- * stats, or if the conditions reference just a single column, it
- * makes no sense to use functional dependencies, so skip to (4).
- *
- * (2) reduce the clauses using functional dependencies
- *
- * This simply attempts to 'reduce' the clauses by applying functional
- * dependencies. For example if there are two clauses:
- *
- * WHERE (a = 1) AND (b = 2)
- *
- * and we know that 'a' determines the value of 'b', we may remove
- * the second condition (b = 2) when computing the selectivity.
- * This is of course tricky - see mvstats/dependencies.c for details.
- *
- * After the reduction, step (1) is to be repeated.
- *
- * (3) check how many attributes are there in conditions compatible
- * with MCV lists and histograms
- *
- * What conditions are compatible with multivariate stats is decided
- * by clause_is_mv_compatible(). At this moment, only conditions
- * of the form "column operator constant" (for simple comparison
- * operators), IS [NOT] NULL and some AND/OR clauses are considered
- * compatible with multivariate statistics.
- *
- * Again, see clause_is_mv_compatible() for details.
- *
- * (4) check how many attributes are there in conditions compatible
- * with MCV lists and histograms
- *
- * If there are no conditions that might be handled by MCV lists
- * or histograms, or if the conditions reference just a single
- * column, it makes no sense to continue, so just skip to (7).
- *
- * (5) choose the stats matching the most columns
- *
- * If there are multiple instances of multivariate statistics (e.g.
- * built on different sets of columns), we choose the stats covering
- * the most columns from step (1). It may happen that all available
- * stats match just a single column - for example with conditions
- *
- * WHERE a = 1 AND b = 2
- *
- * and statistics built on (a,c) and (b,c). In such case just fall
- * back to the regular stats because it makes no sense to use the
- * multivariate statistics.
- *
- * For more details about how exactly we choose the stats, see
- * choose_mv_statistics().
- *
- * (6) use the multivariate stats to estimate matching clauses
- *
- * (7) estimate the remaining clauses using the regular statistics
+ * and_clause_selectivity -
+ *	  Estimate an implicitly-ANDed list of clauses using the regular
+ *	  per-column statistics; multivariate statistics are applied by the
+ *	  callers before we get here.
*/
-Selectivity
-clauselist_selectivity(PlannerInfo *root,
+static Selectivity
+and_clause_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- List *conditions)
+ SpecialJoinInfo *sjinfo)
{
Selectivity s1 = 1.0;
RangeQueryClause *rqlist = NULL;
ListCell *l;
- /* processing mv stats */
- Index relid = InvalidOid;
-
- /* attributes in mv-compatible clauses */
- Bitmapset *mvattnums = NULL;
- List *stats = NIL;
-
- /* use clauses (not conditions), because those are always non-empty */
- stats = find_stats(root, clauses, varRelid, &relid);
-
- /*
- * If there's exactly one clause, then no use in trying to match up
- * pairs, or matching multivariate statistics, so just go directly
- * to clause_selectivity().
- */
- if (list_length(clauses) == 1)
- return clause_selectivity(root, (Node *) linitial(clauses),
- varRelid, jointype, sjinfo, conditions);
-
- /*
- * Check that there are some stats with functional dependencies
- * built (by walking the stats list). We're going to find that
- * anyway when trying to apply the functional dependencies, but
- * this is probably a tad faster.
- */
- if (has_stats(stats, MV_CLAUSE_TYPE_FDEP))
- {
- /*
- * Collect attributes referenced by mv-compatible clauses (looking
- * for clauses compatible with functional dependencies for now).
- */
- mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
- MV_CLAUSE_TYPE_FDEP);
-
- /*
- * If there are mv-compatible clauses, referencing at least two
- * different columns (otherwise it makes no sense to use mv stats),
- * try to reduce the clauses using functional dependencies, and
- * recollect the attributes from the reduced list.
- *
- * We don't need to select a single statistics for this - we can
- * apply all the functional dependencies we have.
- */
- if (bms_num_members(mvattnums) >= 2)
- clauses = clauselist_apply_dependencies(root, clauses, varRelid,
- stats, sjinfo);
- }
-
- /*
- * Check that there are statistics with MCV list or histogram.
- * If not, we don't need to waste time with the optimization.
- */
- if (has_stats(stats, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
- {
- /*
- * Recollect attributes from mv-compatible clauses (maybe we've
- * removed so many clauses we have a single mv-compatible attnum).
- * From now on we're only interested in MCV-compatible clauses.
- */
- mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
- (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
-
- /*
- * If there still are at least two columns, we'll try to select
- * a suitable combination of multivariate stats. If there are
- * multiple combinations, we'll try to choose the best one.
- * See choose_mv_statistics for more details.
- */
- if (bms_num_members(mvattnums) >= 2)
- {
- int k;
- ListCell *s;
-
- /*
- * Copy the list of conditions, so that we can build a list
- * of local conditions (and keep the original intact, for
- * the other clauses at the same level).
- */
- List *conditions_local = list_copy(conditions);
-
- /* find the best combination of statistics */
- List *solution = choose_mv_statistics(root, stats,
- clauses, conditions,
- varRelid, sjinfo);
-
- /* we have a good solution (list of stats) */
- foreach (s, solution)
- {
- MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
-
- /* clauses compatible with multi-variate stats */
- List *mvclauses = NIL;
- List *mvclauses_new = NIL;
- List *mvclauses_conditions = NIL;
- Bitmapset *stat_attnums = NULL;
-
- /* build attnum bitmapset for this statistics */
- for (k = 0; k < mvstat->stakeys->dim1; k++)
- stat_attnums = bms_add_member(stat_attnums,
- mvstat->stakeys->values[k]);
-
- /*
- * Append the compatible conditions (passed from above)
- * to mvclauses_conditions.
- */
- foreach (l, conditions)
- {
- Node *c = (Node*)lfirst(l);
- Bitmapset *tmp = clause_mv_get_attnums(root, c);
-
- if (bms_is_subset(tmp, stat_attnums))
- mvclauses_conditions
- = lappend(mvclauses_conditions, c);
-
- bms_free(tmp);
- }
-
- /* split the clauselist into regular and mv-clauses
- *
- * We keep the list of clauses (we don't remove the
- * clauses yet, because we want to use the clauses
- * as conditions of other clauses).
- *
- * FIXME Do this only once, i.e. filter the clauses
- * once (selecting clauses covered by at least
- * one statistics) and then convert them into
- * smaller per-statistics lists of conditions
- * and estimated clauses.
- */
- clauselist_mv_split(root, sjinfo, clauses,
- varRelid, &mvclauses, mvstat,
- (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
-
- /*
- * We've chosen the statistics to match the clauses, so
- * each statistics from the solution should have at least
- * one new clause (not covered by the previous stats).
- */
- Assert(mvclauses != NIL);
-
- /*
- * Mvclauses now contains only clauses compatible
- * with the currently selected stats, but we have to
- * split that into conditions (already matched by
- * the previous stats), and the new clauses we need
- * to estimate using this stats.
- */
- foreach (l, mvclauses)
- {
- ListCell *p;
- bool covered = false;
- Node *clause = (Node *) lfirst(l);
- Bitmapset *clause_attnums = clause_mv_get_attnums(root, clause);
-
- /*
- * If already covered by previous stats, add it to
- * conditions.
- *
- * TODO Maybe this could be relaxed a bit? Because
- * with complex and/or clauses, this might
- * mean no statistics actually covers such
- * complex clause.
- */
- foreach (p, solution)
- {
- int k;
- Bitmapset *stat_attnums = NULL;
-
- MVStatisticInfo *prev_stat
- = (MVStatisticInfo *)lfirst(p);
-
- /* break if we've ran into current statistic */
- if (prev_stat == mvstat)
- break;
-
- for (k = 0; k < prev_stat->stakeys->dim1; k++)
- stat_attnums = bms_add_member(stat_attnums,
- prev_stat->stakeys->values[k]);
-
- covered = bms_is_subset(clause_attnums, stat_attnums);
-
- bms_free(stat_attnums);
-
- if (covered)
- break;
- }
-
- if (covered)
- mvclauses_conditions
- = lappend(mvclauses_conditions, clause);
- else
- mvclauses_new
- = lappend(mvclauses_new, clause);
- }
-
- /*
- * We need at least one new clause (not just conditions).
- */
- Assert(mvclauses_new != NIL);
-
- /* compute the multivariate stats */
- s1 *= clauselist_mv_selectivity(root, mvstat,
- mvclauses_new,
- mvclauses_conditions,
- false); /* AND */
- }
-
- /*
- * And now finally remove all the mv-compatible clauses.
- *
- * This only repeats the same split as above, but this
- * time we actually use the result list (and feed it to
- * the next call).
- */
- foreach (s, solution)
- {
- /* clauses compatible with multi-variate stats */
- List *mvclauses = NIL;
-
- MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
-
- /* split the list into regular and mv-clauses */
- clauses = clauselist_mv_split(root, sjinfo, clauses,
- varRelid, &mvclauses, mvstat,
- (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
-
- /*
- * Add the clauses to the conditions (to be passed
- * to regular clauses), irrespectedly whether it
- * will be used as a condition or a clause here.
- *
- * We only keep the remaining conditions in the
- * clauses (we keep what clauselist_mv_split returns)
- * so we add each MV condition exactly once.
- */
- conditions_local = list_concat(conditions_local, mvclauses);
- }
-
- /* from now on, work with the 'local' list of conditions */
- conditions = conditions_local;
- }
- }
-
/*
* If there's exactly one clause, then no use in trying to match up
* pairs, so just go directly to clause_selectivity().
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
- varRelid, jointype, sjinfo, conditions);
-
+ varRelid, jointype, sjinfo);
/*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
@@ -591,8 +128,7 @@ clauselist_selectivity(PlannerInfo *root,
Selectivity s2;
/* Always compute the selectivity using clause_selectivity */
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo,
- conditions);
+ s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo);
/*
* Check for being passed a RestrictInfo.
@@ -750,250 +286,334 @@ clauselist_selectivity(PlannerInfo *root,
return s1;
}
-/*
- * Similar to clauselist_selectivity(), but for clauses connected by OR.
- *
- * That means a few differences:
- *
- * - functional dependencies don't apply to OR-clauses
- *
- * - we can't add the previous clauses to conditions
- *
- * - combined selectivities are combined using (s1+s2 - s1*s2)
- * and not as a multiplication (s1*s2)
- *
- * Another way to evaluate this might be turning
- *
- * (a OR b OR c)
- *
- * into
- *
- * NOT ((NOT a) AND (NOT b) AND (NOT c))
- *
- * and computing selectivity of that using clauselist_selectivity().
- * That would allow (a) using the clauselist_selectivity directly and
- * (b) using the previous clauses as conditions. Not sure if it's
- * worth the additional complexity, though.
- */
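+/*
+ * clause_mcv_selectivity -
+ *	  Estimate a bool-expr clause using a multivariate MCV list. Sets
+ *	  *status to FULL_MATCH when the clauses fully match an MCV item
+ *	  (equality on all columns), to FAILURE when the MCV list could not
+ *	  be used at all, and to NORMAL otherwise.
+ */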
static Selectivity
-clauselist_selectivity_or(PlannerInfo *root,
- List *clauses,
- int varRelid,
- JoinType jointype,
- SpecialJoinInfo *sjinfo,
- List *conditions)
+clause_mcv_selectivity(PlannerInfo *root, MVStatisticInfo *stats,
+ Node *clause, int *status)
{
- Selectivity s1 = 0.0;
- ListCell *l;
-
- /* processing mv stats */
- Index relid = InvalidOid;
-
- /* attributes in mv-compatible clauses */
- Bitmapset *mvattnums = NULL;
- List *stats = NIL;
-
- /* use clauses (not conditions), because those are always non-empty */
- stats = find_stats(root, clauses, varRelid, &relid);
+ MCVList mcvlist = NULL;
+ int nmatches = 0;
+ int nconditions = 0;
+ char *matches = NULL;
+ char *condition_matches = NULL;
+ Selectivity s = 0.0;
+ Selectivity t = 0.0;
+ Selectivity u = 0.0;
+ BoolExpr *expr = (BoolExpr*) clause;
+ bool is_or = or_clause(clause);
+ int i;
+ bool fullmatch;
+ Selectivity lowsel;
- /* OR-clauses do not work with functional dependencies */
- if (has_stats(stats, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
+	Assert(expr == NULL || IsA(expr, BoolExpr));
+
+	if (!expr || not_clause(clause))	/* for now */
{
- /*
- * Recollect attributes from mv-compatible clauses (maybe we've
- * removed so many clauses we have a single mv-compatible attnum).
- * From now on we're only interested in MCV-compatible clauses.
- */
- mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
- (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
-
- /*
- * If there still are at least two columns, we'll try to select
- * a suitable multivariate stats.
- */
- if (bms_num_members(mvattnums) >= 2)
- {
- int k;
- ListCell *s;
+ *status = FAILURE;
+ return 0.0;
+ }
+ if (!stats->mcv_built)
+ {
+ *status = FAILURE;
+ return 0.0;
+ }
+
+ mcvlist = load_mv_mcvlist(stats->mvoid);
+ Assert (mcvlist != NULL);
+ Assert (mcvlist->nitems > 0);
- List *solution
- = choose_mv_statistics(root, stats,
- clauses, conditions,
- varRelid, sjinfo);
+ nmatches = mcvlist->nitems;
+ nconditions = mcvlist->nitems;
- /* we have a good solution stats */
- foreach (s, solution)
- {
- Selectivity s2;
- MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
+ matches = palloc0(sizeof(char) * nmatches);
- /* clauses compatible with multi-variate stats */
- List *mvclauses = NIL;
- List *mvclauses_new = NIL;
- List *mvclauses_conditions = NIL;
- Bitmapset *stat_attnums = NULL;
+ if (!is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
- /* build attnum bitmapset for this statistics */
- for (k = 0; k < mvstat->stakeys->dim1; k++)
- stat_attnums = bms_add_member(stat_attnums,
- mvstat->stakeys->values[k]);
+ /* Conditions are treated as AND clause, so match by default. */
+ condition_matches = palloc0(sizeof(char)*nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
- /*
- * Append the compatible conditions (passed from above)
- * to mvclauses_conditions.
- */
- foreach (l, conditions)
- {
- Node *c = (Node*)lfirst(l);
- Bitmapset *tmp = clause_mv_get_attnums(root, c);
+ nmatches = update_match_bitmap_mcvlist(root, expr->args,
+ stats->stakeys, mcvlist,
+ (is_or ? 0 : nmatches), matches,
+ &lowsel, &fullmatch, is_or);
- if (bms_is_subset(tmp, stat_attnums))
- mvclauses_conditions
- = lappend(mvclauses_conditions, c);
-
- bms_free(tmp);
- }
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ u += mcvlist->items[i]->frequency;
+
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
- /* split the clauselist into regular and mv-clauses
- *
- * We keep the list of clauses (we don't remove the
- * clauses yet, because we want to use the clauses
- * as conditions of other clauses).
- *
- * FIXME Do this only once, i.e. filter the clauses
- * once (selecting clauses covered by at least
- * one statistics) and then convert them into
- * smaller per-statistics lists of conditions
- * and estimated clauses.
- */
- clauselist_mv_split(root, sjinfo, clauses,
- varRelid, &mvclauses, mvstat,
- (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+ if (matches[i] != MVSTATS_MATCH_NONE)
+ s += mcvlist->items[i]->frequency;
- /*
- * We've chosen the statistics to match the clauses, so
- * each statistics from the solution should have at least
- * one new clause (not covered by the previous stats).
- */
- Assert(mvclauses != NIL);
+ t += mcvlist->items[i]->frequency;
+ }
- /*
- * Mvclauses now contains only clauses compatible
- * with the currently selected stats, but we have to
- * split that into conditions (already matched by
- * the previous stats), and the new clauses we need
- * to estimate using this stats.
- *
- * XXX We'll only use the new clauses, but maybe we
- * should use the conditions too, somehow. We can't
- * use that directly in conditional probability, but
- * maybe we might use them in a different way?
- *
- * If we have a clause (a OR b OR c), then knowing
- * that 'a' is TRUE means (b OR c) can't make the
- * whole clause FALSE.
- *
- * This is pretty much what
- *
- * (a OR b) == NOT ((NOT a) AND (NOT b))
- *
- * implies.
- */
- foreach (l, mvclauses)
- {
- ListCell *p;
- bool covered = false;
- Node *clause = (Node *) lfirst(l);
- Bitmapset *clause_attnums = clause_mv_get_attnums(root, clause);
+ pfree(matches);
+ pfree(condition_matches);
+ pfree(mcvlist);
- /*
- * If already covered by previous stats, add it to
- * conditions.
- *
- * TODO Maybe this could be relaxed a bit? Because
- * with complex and/or clauses, this might
- * mean no statistics actually covers such
- * complex clause.
- */
- foreach (p, solution)
- {
- int k;
- Bitmapset *stat_attnums = NULL;
+	/* always set the status - the caller may read it uninitialized */
+	*status = fullmatch ? FULL_MATCH : NORMAL;
- MVStatisticInfo *prev_stat
- = (MVStatisticInfo *)lfirst(p);
+ /* mcv_low is omitted for now */
- /* break if we've ran into current statistic */
- if (prev_stat == mvstat)
- break;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
- for (k = 0; k < prev_stat->stakeys->dim1; k++)
- stat_attnums = bms_add_member(stat_attnums,
- prev_stat->stakeys->values[k]);
+ return (s / t) * u;
+}
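+
+/*
+ * Both here and in clause_hist_selectivity() the result is computed as
+ * (s / t) * u, where s is the frequency matching both the estimated
+ * clause and the conditions, t the frequency matching the conditions
+ * alone, and u the total frequency covered by the statistics. That is,
+ * (s / t) is the conditional estimate P(clause | conditions) within the
+ * statistics, scaled back by the part u the statistics covers. With
+ * made-up numbers s = 0.1, t = 0.4 and u = 0.8, the estimate comes out
+ * as (0.1 / 0.4) * 0.8 = 0.2.
+ */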
- covered = bms_is_subset(clause_attnums, stat_attnums);
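+
+/*
+ * clause_hist_selectivity -
+ *	  Estimate a bool-expr clause using a multivariate histogram. Buckets
+ *	  matching the clause only partially are counted as 50% matches.
+ */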
+static Selectivity
+clause_hist_selectivity(PlannerInfo *root, MVStatisticInfo *stats,
+ Node *clause, int *status)
+{
+ MVSerializedHistogram mvhist = NULL;
+ int nmatches = 0;
+ int nconditions = 0;
+ char *matches = NULL;
+ char *condition_matches = NULL;
+ Selectivity s = 0.0;
+ Selectivity t = 0.0;
+ Selectivity u = 0.0;
+ BoolExpr *expr = (BoolExpr*) clause;
+ bool is_or = or_clause(clause);
+ int i;
- bms_free(stat_attnums);
+	Assert(expr == NULL || IsA(expr, BoolExpr));
- if (covered)
- break;
- }
+ if (!expr || not_clause(clause)) /* for now */
+ {
+		*status = FAILURE;
+ return 0.0;
+ }
+ if (!stats->hist_built)
+ {
+		*status = FAILURE;
+ return 0.0;
+ }
+ mvhist = load_mv_histogram(stats->mvoid);
+ Assert (mvhist != NULL);
+ Assert (clause != NULL);
- if (! covered)
- mvclauses_new = lappend(mvclauses_new, clause);
- }
+ nmatches = mvhist->nbuckets;
+ nconditions = mvhist->nbuckets;
+ matches = palloc0(sizeof(char) * nmatches);
+ if (!is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
- /*
- * We need at least one new clause (not just conditions).
- */
- Assert(mvclauses_new != NIL);
+ /* Conditions are treated as AND clause, so match by default. */
+ condition_matches = palloc0(sizeof(char)*nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
- /* compute the multivariate stats */
- s2 = clauselist_mv_selectivity(root, mvstat,
- mvclauses_new,
- mvclauses_conditions,
- true); /* OR */
+ update_match_bitmap_histogram(root, expr->args, stats->stakeys, mvhist,
+ (is_or ? 0 : nmatches), matches, is_or);
- s1 = s1 + s2 - s1 * s2;
- }
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ float coeff = 1.0;
+ u += mvhist->buckets[i]->ntuples;
- /*
- * And now finally remove all the mv-compatible clauses.
- *
- * This only repeats the same split as above, but this
- * time we actually use the result list (and feed it to
- * the next call).
- */
- foreach (s, solution)
- {
- /* clauses compatible with multi-variate stats */
- List *mvclauses = NIL;
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+ else if (condition_matches[i] == MVSTATS_MATCH_PARTIAL)
+ coeff = 0.5;
- MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
+ t += coeff * mvhist->buckets[i]->ntuples;
- /* split the list into regular and mv-clauses */
- clauses = clauselist_mv_split(root, sjinfo, clauses,
- varRelid, &mvclauses, mvstat,
- (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
- }
- }
+ if (matches[i] == MVSTATS_MATCH_FULL)
+ s += coeff * mvhist->buckets[i]->ntuples;
+ else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+ s += coeff * 0.5 * mvhist->buckets[i]->ntuples;
}
- /*
- * Handle the remaining clauses (either using regular statistics,
- * or by multivariate stats at the next level).
- */
- foreach(l, clauses)
+ pfree(matches);
+ pfree(condition_matches);
+ pfree(mvhist);
+
+	*status = NORMAL;
+
+	/* no condition matches */
+	if (t == 0.0)
+		return (Selectivity)0.0;
+
+ return (s / t) * u;
+}
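+
+/*
+ * XXX Counting partially matching buckets as 50% matches (both for the
+ * conditions and for the estimated clauses) is only a heuristic and
+ * tends to overestimate; clamping the result by the per-column
+ * selectivities might limit such errors.
+ */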
+
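+/*
+ * apply_mvstats -
+ *	  Estimate a clause using a single statistics entry. The MCV list is
+ *	  tried first (a full equality match there is taken as exact), then
+ *	  the histogram estimate is added. Returns a negative value when
+ *	  neither could be applied, so that the caller falls back to the
+ *	  regular per-column estimate.
+ */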
+static Selectivity
+apply_mvstats(PlannerInfo *root, Node *clause, bm_mvstat *statent)
+{
+	Selectivity s1 = 0.0;
+	int			status = FAILURE;
+	bool		applied = false;
+
+ if (statent->mvkind & MVSTATISTIC_MCV)
{
- Selectivity s2 = clause_selectivity(root,
- (Node *) lfirst(l),
- varRelid,
- jointype,
- sjinfo,
- conditions);
+		s1 = clause_mcv_selectivity(root, statent->stats, clause, &status);
+		if (status == FULL_MATCH && s1 > 0.0)
+			return s1;
+
+		applied = (status != FAILURE);
+	}
+
+	if (statent->mvkind & MVSTATISTIC_HIST)
+	{
+		Selectivity s2 = clause_hist_selectivity(root, statent->stats,
+												 clause, &status);
+
+		if (status != FAILURE)
+		{
+			s1 += s2;
+			applied = true;
+		}
+	}
+
+	/* nothing applied - let the caller fall back to the regular code */
+	if (!applied)
+		return -1.0;
+
+	return s1;
+}
+
+static inline Selectivity
+merge_selectivity(Selectivity s1, Selectivity s2, BoolExprType op)
+{
+ if (op == AND_EXPR)
+ s1 = s1 * s2;
+ else
s1 = s1 + s2 - s1 * s2;
+
+ return s1;
+}
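+
+/*
+ * Example (made-up selectivities): with s1 = 0.2 and s2 = 0.3 this gives
+ * 0.2 * 0.3 = 0.06 for AND_EXPR and 0.2 + 0.3 - 0.2 * 0.3 = 0.44 for
+ * OR_EXPR, i.e. the usual combinations under the independence
+ * assumption.
+ */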
+
+/*
+ * mvclause_selectivity -
+ *	  Compute the selectivity of a RestrictStatData tree. Multivariate
+ *	  statistics are applied to the covered part (mvclause), the regular
+ *	  code handles the rest (nonmvclause), and child subtrees are merged
+ *	  recursively according to the node's boolop.
+ */
+static Selectivity
+mvclause_selectivity(PlannerInfo *root,
+ RestrictStatData *rstat,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo)
+{
+ Selectivity s1;
+ ListCell *lc;
+
+ if (!rstat->mvclause && !rstat->nonmvclause && !rstat->children)
+ return clause_selectivity(root, rstat->clause, varRelid, jointype,
+ sjinfo);
+
+ if (rstat->boolop == NOT_EXPR)
+ {
+ RestrictStatData *clause =
+ (RestrictStatData *)linitial(rstat->children);
+
+ s1 = 1.0 - mvclause_selectivity(root, clause, varRelid,
+ jointype, sjinfo);
+ return s1;
+ }
+
+ s1 = (rstat->boolop == AND_EXPR ? 1.0 : 0.0);
+
+ if (rstat->nonmvclause)
+ s1 = merge_selectivity(s1,
+ clause_selectivity(root, rstat->nonmvclause,
+ varRelid, jointype, sjinfo),
+ rstat->boolop);
+
+ if (rstat->mvclause)
+ {
+ bm_mvstat *mvs = (bm_mvstat*)linitial(rstat->mvstats);
+ Selectivity s2 = apply_mvstats(root, rstat->mvclause, mvs);
+
+ /* Fall back to ordinary calculation */
+ if (s2 < 0)
+ s2 = clause_selectivity(root, rstat->mvclause, varRelid,
+ jointype, sjinfo);
+ s1 = merge_selectivity(s1, s2, rstat->boolop);
+ }
+
+ foreach(lc, rstat->children)
+ {
+ RestrictStatData *rsd = (RestrictStatData *) lfirst(lc);
+ Assert(IsA(rsd, RestrictStatData));
+
+ s1 = merge_selectivity(s1,
+ mvclause_selectivity(root, rsd, varRelid,
+ jointype, sjinfo),
+ rstat->boolop);
+ }
+
+ return s1;
+}
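+
+/*
+ * For illustration (a hypothetical transformation): a clause list like
+ *
+ *	WHERE (a = 1) AND (b = 2) AND (c = 3)
+ *
+ * with statistics on (a,b) may end up as a single RestrictStatData node
+ * with boolop = AND_EXPR, (a = 1) AND (b = 2) as its mvclause (estimated
+ * by apply_mvstats), (c = 3) as its nonmvclause (estimated by
+ * clause_selectivity), and the two results combined by
+ * merge_selectivity().
+ */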
+
+
+/*
+ * clauselist_selectivity -
+ * Compute the selectivity of an implicitly-ANDed list of boolean
+ * expression clauses. The list can be empty, in which case 1.0
+ * must be returned. List elements may be either RestrictInfos
+ * or bare expression clauses --- the former is preferred since
+ * it allows caching of results.
+ *
+ * See clause_selectivity() for the meaning of the additional parameters.
+ *
+ * Our basic approach is to take the product of the selectivities of the
+ * subclauses. However, that's only right if the subclauses have independent
+ * probabilities, and in reality they are often NOT independent. So,
+ * we want to be smarter where we can.
+ *
+ * Currently, the only extra smarts we have is to recognize "range queries",
+ * such as "x > 34 AND x < 42". Clauses are recognized as possible range
+ * query components if they are restriction opclauses whose operators have
+ * scalarltsel() or scalargtsel() as their restriction selectivity estimator.
+ * We pair up clauses of this form that refer to the same variable. An
+ * unpairable clause of this kind is simply multiplied into the selectivity
+ * product in the normal way. But when we find a pair, we know that the
+ * selectivities represent the relative positions of the low and high bounds
+ * within the column's range, so instead of figuring the selectivity as
+ * hisel * losel, we can figure it as hisel + losel - 1. (To visualize this,
+ * see that hisel is the fraction of the range below the high bound, while
+ * losel is the fraction above the low bound; so hisel can be interpreted
+ * directly as a 0..1 value but we need to convert losel to 1-losel before
+ * interpreting it as a value. Then the available range is 1-losel to hisel.
+ * However, this calculation double-excludes nulls, so really we need
+ * hisel + losel + null_frac - 1.)
+ *
+ * If either selectivity is exactly DEFAULT_INEQ_SEL, we forget this equation
+ * and instead use DEFAULT_RANGE_INEQ_SEL. The same applies if the equation
+ * yields an impossible (negative) result.
+ *
+ * A free side-effect is that we can recognize redundant inequalities such
+ * as "x < 4 AND x < 5"; only the tighter constraint will be counted.
+ *
+ * Of course this is all very dependent on the behavior of
+ * scalarltsel/scalargtsel; perhaps some day we can generalize the approach.
+ *
+ *
+ * Multivariate statistics
+ * --------------------------
+ * This also uses multivariate stats to estimate combinations of
+ * conditions, in a way (a) maximizing the estimate accuracy by using
+ * as many stats as possible, and (b) minimizing the overhead,
+ * especially when there are no suitable multivariate stats (so if you
+ * are not using multivariate stats, there's no additional overhead).
+ *
+ * The clause list is first transformed into a tree of RestrictStatData
+ * nodes (see transformRestrictInfoForEstimate), multivariate statistics
+ * are applied to the covered clauses (see mvclause_selectivity), and
+ * whatever remains is estimated by the regular per-column code in
+ * and_clause_selectivity(), which also performs the range-query pairing
+ * described above.
+ */
+Selectivity
+clauselist_selectivity(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo)
+{
+ Selectivity s1 = 1.0;
+ RestrictStatData *rstat;
+ List *rinfos = clauses;
+
+ /* Reconstruct clauses so that multivariate statistics can be applied */
+ rstat = transformRestrictInfoForEstimate(root, clauses, varRelid, sjinfo);
+
+ if (rstat)
+ {
+ rinfos = rstat->unusedrinfos;
+
+ s1 = mvclause_selectivity(root, rstat, varRelid, jointype, sjinfo);
}
+ s1 = s1 * and_clause_selectivity(root, rinfos, varRelid, jointype, sjinfo);
+
return s1;
}
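+
+/*
+ * A worked example of the range-query pairing described above (made-up
+ * numbers): for "x > 34 AND x < 42", with hisel = 0.6 (the fraction
+ * below the high bound) and losel = 0.7 (the fraction above the low
+ * bound), the paired estimate is 0.6 + 0.7 - 1 = 0.3, while multiplying
+ * the two selectivities would give 0.6 * 0.7 = 0.42.
+ */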
@@ -1204,8 +824,7 @@ clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- List *conditions)
+ SpecialJoinInfo *sjinfo)
{
Selectivity s1 = 0.5; /* default for any unhandled clause type */
RestrictInfo *rinfo = NULL;
@@ -1335,28 +954,37 @@ clause_selectivity(PlannerInfo *root,
(Node *) get_notclausearg((Expr *) clause),
varRelid,
jointype,
- sjinfo,
- conditions);
+ sjinfo);
}
else if (and_clause(clause))
{
- /* share code with clauselist_selectivity() */
- s1 = clauselist_selectivity(root,
+ s1 = and_clause_selectivity(root,
((BoolExpr *) clause)->args,
varRelid,
jointype,
- sjinfo,
- conditions);
+ sjinfo);
}
else if (or_clause(clause))
{
- /* just call to clauselist_selectivity_or() */
- s1 = clauselist_selectivity_or(root,
- ((BoolExpr *) clause)->args,
- varRelid,
- jointype,
- sjinfo,
- conditions);
+ /*
+ * Selectivities for an OR clause are computed as s1+s2 - s1*s2 to
+ * account for the probable overlap of selected tuple sets.
+ *
+ * XXX is this too conservative?
+ */
+ ListCell *arg;
+
+ s1 = 0.0;
+ foreach(arg, ((BoolExpr *) clause)->args)
+ {
+ Selectivity s2 = clause_selectivity(root,
+ (Node *) lfirst(arg),
+ varRelid,
+ jointype,
+ sjinfo);
+
+ s1 = s1 + s2 - s1 * s2;
+ }
}
else if (is_opclause(clause) || IsA(clause, DistinctExpr))
{
@@ -1445,1899 +1073,55 @@ clause_selectivity(PlannerInfo *root,
s1 = booltestsel(root,
((BooleanTest *) clause)->booltesttype,
(Node *) ((BooleanTest *) clause)->arg,
- varRelid,
- jointype,
- sjinfo);
- }
- else if (IsA(clause, CurrentOfExpr))
- {
- /* CURRENT OF selects at most one row of its table */
- CurrentOfExpr *cexpr = (CurrentOfExpr *) clause;
- RelOptInfo *crel = find_base_rel(root, cexpr->cvarno);
-
- if (crel->tuples > 0)
- s1 = 1.0 / crel->tuples;
- }
- else if (IsA(clause, RelabelType))
- {
- /* Not sure this case is needed, but it can't hurt */
- s1 = clause_selectivity(root,
- (Node *) ((RelabelType *) clause)->arg,
- varRelid,
- jointype,
- sjinfo,
- conditions);
- }
- else if (IsA(clause, CoerceToDomain))
- {
- /* Not sure this case is needed, but it can't hurt */
- s1 = clause_selectivity(root,
- (Node *) ((CoerceToDomain *) clause)->arg,
- varRelid,
- jointype,
- sjinfo,
- conditions);
- }
-
- /* Cache the result if possible */
- if (cacheable)
- {
- if (jointype == JOIN_INNER)
- rinfo->norm_selec = s1;
- else
- rinfo->outer_selec = s1;
- }
-
-#ifdef SELECTIVITY_DEBUG
- elog(DEBUG4, "clause_selectivity: s1 %f", s1);
-#endif /* SELECTIVITY_DEBUG */
-
- return s1;
-}
-
-
-/*
- * Estimate selectivity for the list of MV-compatible clauses, using
- * using a MV statistics (combining a histogram and MCV list).
- *
- * This simply passes the estimation to the MCV list and then to the
- * histogram, if available.
- *
- * TODO Clamp the selectivity by min of the per-clause selectivities
- * (i.e. the selectivity of the most restrictive clause), because
- * that's the maximum we can ever get from ANDed list of clauses.
- * This may probably prevent issues with hitting too many buckets
- * and low precision histograms.
- *
- * TODO We may support some additional conditions, most importantly
- * those matching multiple columns (e.g. "a = b" or "a < b").
- * Ultimately we could track multi-table histograms for join
- * cardinality estimation.
- *
- * TODO Further thoughts on processing equality clauses: Maybe it'd be
- * better to look for stats (with MCV) covered by the equality
- * clauses, because then we have a chance to find an exact match
- * in the MCV list, which is pretty much the best we can do. We may
- * also look at the least frequent MCV item, and use it as a upper
- * boundary for the selectivity (had there been a more frequent
- * item, it'd be in the MCV list).
- *
- * TODO There are several options for 'sanity clamping' the estimates.
- *
- * First, if we have selectivities for each condition, then
- *
- * P(A,B) <= MIN(P(A), P(B))
- *
- * Because additional conditions (connected by AND) can only lower
- * the probability.
- *
- * So we can do some basic sanity checks using the single-variate
- * stats (the ones we have right now).
- *
- * Second, when we have multivariate stats with a MCV list, then
- *
- * (a) if we have a full equality condition (one equality condition
- * on each column) and we found a match in the MCV list, this is
- * the selectivity (and it's supposed to be exact)
- *
- * (b) if we have a full equality condition and we haven't found a
- * match in the MCV list, then the selectivity is below the
- * lowest selectivity in the MCV list
- *
- * (c) if we have a equality condition (not full), we can still
- * search the MCV for matches and use the sum of probabilities
- * as a lower boundary for the histogram (if there are no
- * matches in the MCV list, then we have no boundary)
- *
- * Third, if there are multiple (combinations of) multivariate
- * stats for a set of clauses, we may compute all of them and then
- * somehow aggregate them - e.g. by choosing the minimum, median or
- * average. The stats are susceptible to overestimation (because
- * we take 50% of the bucket for partial matches). Some stats may
- * give better estimates than others, but it's very difficult to
- * say that in advance which one is the best (it depends on the
- * number of buckets, number of additional columns not referenced
- * in the clauses, type of condition etc.).
- *
- * So we may compute them all and then choose a sane aggregation
- * (minimum seems like a good approach). Of course, this may result
- * in longer / more expensive estimation (CPU-wise), but it may be
- * worth it.
- *
- * It's possible to add a GUC choosing whether to do a 'simple'
- * (using a single stats expected to give the best estimate) and
- * 'complex' (combining the multiple estimates).
- *
- * multivariate_estimates = (simple|full)
- *
- * Also, this might be enabled at a table level, by something like
- *
- * ALTER TABLE ... SET STATISTICS (simple|full)
- *
- * Which would make it possible to use this only for the tables
- * where the simple approach does not work.
- *
- * Also, there are ways to optimize this algorithmically. E.g. we
- * may try to get an estimate from a matching MCV list first, and
- * if we happen to get a "full equality match" we may stop computing
- * the estimates from other stats (for this condition) because
- * that's probably the best estimate we can really get.
- *
- * TODO When applying the clauses to the histogram/MCV list, we can do
- * that from the most selective clauses first, because that'll
- * eliminate the buckets/items sooner (so we'll be able to skip
- * them without inspection, which is more expensive). But this
- * requires really knowing the per-clause selectivities in advance,
- * and that's not what we do now.
- *
- * TODO All this is based on the assumption that the statistics represent
- * the necessary dependencies, i.e. that if two colunms are not in
- * the same statistics, there's no dependency. If that's not the
- * case, we may get misestimates, just like before. For example
- * assume we have a table with three columns [a,b,c] with exactly
- * the same values, and statistics on [a,b] and [b,c]. So somthing
- * like this:
- *
- * CREATE TABLE test AS SELECT i, i, i
- FROM generate_series(1,1000);
- *
- * ALTER TABLE test ADD STATISTICS (mcv) ON (a,b);
- * ALTER TABLE test ADD STATISTICS (mcv) ON (b,c);
- *
- * ANALYZE test;
- *
- * EXPLAIN ANALYZE SELECT * FROM test
- * WHERE (a < 10) AND (b < 20) AND (c < 10);
- *
- * The problem here is that the only shared column between the two
- * statistics is 'b' so the probability will be computed like this
- *
- * P[(a < 10) & (b < 20) & (c < 10)]
- * = P[(a < 10) & (b < 20)] * P[(c < 10) | (a < 10) & (b < 20)]
- * = P[(a < 10) & (b < 20)] * P[(c < 10) | (b < 20)]
- *
- * or like this
- *
- * P[(a < 10) & (b < 20) & (c < 10)]
- * = P[(b < 20) & (c < 10)] * P[(a < 10) | (b < 20) & (c < 10)]
- * = P[(b < 20) & (c < 10)] * P[(a < 10) | (b < 20)]
- *
- * In both cases the conditional probabilities will be evaluated as
- * 0.5, because they lack the other column (which would make it 1.0).
- *
- * Theoretically it might be possible to transfer the dependency,
- * e.g. by building bitmap for [a,b] and then combine it with [b,c]
- * by doing something like this:
- *
- * 1) build bitmap on [a,b] using [(a<10) & (b < 20)]
- * 2) for each element in [b,c] check the bitmap
- *
- * But that's certainly nontrivial - for example the statistics may
- * be different (MCV list vs. histogram) and/or the items may not
- * match (e.g. MCV items or histogram buckets will be built
- * differently). Also, for one value of 'b' there might be multiple
- * MCV items (because of the other column values) with different
- * bitmap values (some will match, some won't) - so it's not exactly
- * bitmap but a partial match.
- *
- * Maybe a hash table with number of matches and mismatches (or
- * maybe sums of frequencies) would work? The step (2) would then
- * lookup the values and use that to weight the item somehow.
- *
- * Currently the only solution is to build statistics on all three
- * columns.
- */
-static Selectivity
-clauselist_mv_selectivity(PlannerInfo *root, MVStatisticInfo *mvstats,
- List *clauses, List *conditions, bool is_or)
-{
- bool fullmatch = false;
- Selectivity s1 = 0.0, s2 = 0.0;
-
- /*
- * Lowest frequency in the MCV list (may be used as an upper bound
- * for full equality conditions that did not match any MCV item).
- */
- Selectivity mcv_low = 0.0;
-
- /* TODO Evaluate simple 1D selectivities, use the smallest one as
- * an upper bound, product as lower bound, and sort the
- * clauses in ascending order by selectivity (to optimize the
- * MCV/histogram evaluation).
- */
-
- /* Evaluate the MCV first. */
- s1 = clauselist_mv_selectivity_mcvlist(root, mvstats,
- clauses, conditions, is_or,
- &fullmatch, &mcv_low);
-
- /*
- * If we got a full equality match on the MCV list, we're done (and
- * the estimate is pretty good).
- */
- if (fullmatch && (s1 > 0.0))
- return s1;
-
- /* FIXME if (fullmatch) without matching MCV item, use the mcv_low
- * selectivity as upper bound */
-
- s2 = clauselist_mv_selectivity_histogram(root, mvstats,
- clauses, conditions, is_or);
-
- /* TODO clamp to <= 1.0 (or more strictly, when possible) */
- return s1 + s2;
-}
-
-/*
- * Collect attributes from mv-compatible clauses.
- */
-static Bitmapset *
-collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
- Index *relid, SpecialJoinInfo *sjinfo, int types)
-{
- Bitmapset *attnums = NULL;
- ListCell *l;
-
- /*
- * Walk through the clauses and identify the ones we can estimate
- * using multivariate stats, and remember the relid/columns. We'll
- * then cross-check if we have suitable stats, and only if needed
- * we'll split the clauses into multivariate and regular lists.
- *
- * For now we're only interested in RestrictInfo nodes with nested
- * OpExpr, using either a range or equality.
- */
- foreach (l, clauses)
- {
- Node *clause = (Node *) lfirst(l);
-
- /* ignore the result here - we only need the attnums */
- clause_is_mv_compatible(root, clause, varRelid, relid, &attnums,
- sjinfo, types);
- }
-
- /*
- * If there are not at least two attributes referenced by the clause(s),
- * we can throw everything out (as we'll revert to simple stats).
- */
- if (bms_num_members(attnums) <= 1)
- {
- bms_free(attnums);
- attnums = NULL;
- *relid = InvalidOid;
- }
-
- return attnums;
-}
-
-/*
- * Selects the best combination of multivariate statistics, in an
- * exhaustive way, where 'best' means:
- *
- * (a) covering the most attributes (referenced by clauses)
- * (b) using the least number of multivariate stats
- * (c) using the most conditions to exploit dependency
- *
- * There may be other optimality criteria, not considered in the initial
- * implementation (more on that 'weaknesses' section).
- *
- * This pretty much splits the probability of clauses (aka selectivity)
- * into a sequence of conditional probabilities, like this
- *
- * P(A,B,C,D) = P(A,B) * P(C|A,B) * P(D|A,B,C)
- *
- * and removing the attributes not referenced by the existing stats,
- * under the assumption that there's no dependency (otherwise the DBA
- * would create the stats).
- *
- * The last criteria means that when we have the choice to compute like
- * this
- *
- * P(A,B,C,D) = P(A,B,C) * P(D|B,C)
- *
- * or like this
- *
- * P(A,B,C,D) = P(A,B,C) * P(D|C)
- *
- * we should use the first option, as that exploits more dependencies.
- *
- * The order of statistics in the solution implicitly determines the
- * order of estimation of clauses, because as we apply a statistics,
- * we always use it to estimate all the clauses covered by it (and
- * then we use those clauses as conditions for the next statistics).
- *
- * Don't call this directly but through choose_mv_statistics().
- *
- *
- * Algorithm
- * ---------
- * The algorithm is a recursive implementation of backtracking, with
- * maximum 'depth' equal to the number of multi-variate statistics
- * available on the table.
- *
- * It explores all the possible permutations of the stats.
- *
- * Whenever it considers adding the next statistics, the clauses it
- * matches are divided into 'conditions' (clauses already matched by at
- * least one previous statistics) and clauses that are estimated.
- *
- * Then several checks are performed:
- *
- * (a) The statistics covers at least 2 columns, referenced in the
- * estimated clauses (otherwise multi-variate stats are useless).
- *
- * (b) The statistics covers at least 1 new column, i.e. column not
- * refefenced by the already used stats (and the new column has
- * to be referenced by the clauses, of couse). Otherwise the
- * statistics would not add any new information.
- *
- * There are some other sanity checks (e.g. that the stats must not be
- * used twice etc.).
- *
- * Finally the new solution is compared to the currently best one, and
- * if it's considered better, it's used instead.
- *
- *
- * Weaknesses
- * ----------
- * The current implemetation uses a somewhat simple optimality criteria,
- * suffering by the following weaknesses.
- *
- * (a) There may be multiple solutions with the same number of covered
- * attributes and number of statistics (e.g. the same solution but
- * with statistics in a different order). It's unclear which solution
- * is the best one - in a sense all of them are equal.
- *
- * TODO It might be possible to compute estimate for each of those
- * solutions, and then combine them to get the final estimate
- * (e.g. by using average or median).
- *
- * (b) Does not consider that some types of stats are a better match for
- * some types of clauses (e.g. MCV list is a good match for equality
- * than a histogram).
- *
- * XXX Maybe MCV is almost always better / more accurate?
- *
- * But maybe this is pointless - generally, each column is either
- * a label (it's not important whether because of the data type or
- * how it's used), or a value with ordering that makes sense. So
- * either a MCV list is more appropriate (labels) or a histogram
- * (values with orderings).
- *
- * Now sure what to do with statistics on columns mixing columns of
- * both types - maybe it'd be beeter to invent a new type of stats
- * combining MCV list and histogram (keeping a small histogram for
- * each MCV item, and a separate histogram for values not on the
- * MCV list). But that's not implemented at this moment.
- *
- * TODO The algorithm should probably count number of Vars (not just
- * attnums) when computing the 'score' of each solution. Computing
- * the ratio of (num of all vars) / (num of condition vars) as a
- * measure of how well the solution uses conditions might be
- * useful.
- */
-static void
-choose_mv_statistics_exhaustive(PlannerInfo *root, int step,
- int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
- int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
- int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
- bool *cover_map, bool *condition_map, int *ruled_out,
- mv_solution_t *current, mv_solution_t **best)
-{
- int i, j;
-
- Assert(best != NULL);
- Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
-
- CHECK_FOR_INTERRUPTS();
-
- if (current == NULL)
- {
- current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
- current->stats = (int*)palloc0(sizeof(int)*nmvstats);
- current->nstats = 0;
- current->nclauses = 0;
- current->nconditions = 0;
- }
-
- /*
- * Now try to apply each statistics, matching at least two attributes,
- * unless it's already used in one of the previous steps.
- */
- for (i = 0; i < nmvstats; i++)
- {
- int c;
-
- int ncovered_clauses = 0; /* number of covered clauses */
- int ncovered_conditions = 0; /* number of covered conditions */
- int nattnums = 0; /* number of covered attributes */
-
- Bitmapset *all_attnums = NULL;
- Bitmapset *new_attnums = NULL;
-
- /* skip statistics that were already used or eliminated */
- if (ruled_out[i] != -1)
- continue;
-
- /*
- * See if we have clauses covered by this statistics, but not
- * yet covered by any of the preceding onces.
- */
- for (c = 0; c < nclauses; c++)
- {
- bool covered = false;
- Bitmapset *clause_attnums = clauses_attnums[c];
- Bitmapset *tmp = NULL;
-
- /*
- * If this clause is not covered by this stats, we can't
- * use the stats to estimate that at all.
- */
- if (! cover_map[i * nclauses + c])
- continue;
-
- /*
- * Now we know we'll use this clause - either as a condition
- * or as a new clause (the estimated one). So let's add the
- * attributes to the attnums from all the clauses usable with
- * this statistics.
- */
- tmp = bms_union(all_attnums, clause_attnums);
-
- /* free the old bitmap */
- bms_free(all_attnums);
- all_attnums = tmp;
-
- /* let's see if it's covered by any of the previous stats */
- for (j = 0; j < step; j++)
- {
- /* already covered by the previous stats */
- if (cover_map[current->stats[j] * nclauses + c])
- covered = true;
-
- if (covered)
- break;
- }
-
- /* if already covered, continue with the next clause */
- if (covered)
- {
- ncovered_conditions += 1;
- continue;
- }
-
- /*
- * OK, this clause is covered by this statistics (and not by
- * any of the previous ones)
- */
- ncovered_clauses += 1;
-
- /* add the attnums into attnums from 'new clauses' */
- // new_attnums = bms_union(new_attnums, clause_attnums);
- }
-
- /* can't have more new clauses than original clauses */
- Assert(nclauses >= ncovered_clauses);
- Assert(ncovered_clauses >= 0); /* mostly paranoia */
-
- nattnums = bms_num_members(all_attnums);
-
- /* free all the bitmapsets - we don't need them anymore */
- bms_free(all_attnums);
- bms_free(new_attnums);
-
- all_attnums = NULL;
- new_attnums = NULL;
-
- /*
- * See if we have clauses covered by this statistics, but not
- * yet covered by any of the preceding onces.
- */
- for (c = 0; c < nconditions; c++)
- {
- Bitmapset *clause_attnums = conditions_attnums[c];
- Bitmapset *tmp = NULL;
-
- /*
- * If this clause is not covered by this stats, we can't
- * use the stats to estimate that at all.
- */
- if (! condition_map[i * nconditions + c])
- continue;
-
- /* count this as a condition */
- ncovered_conditions += 1;
-
- /*
- * Now we know we'll use this clause - either as a condition
- * or as a new clause (the estimated one). So let's add the
- * attributes to the attnums from all the clauses usable with
- * this statistics.
- */
- tmp = bms_union(all_attnums, clause_attnums);
-
- /* free the old bitmap */
- bms_free(all_attnums);
- all_attnums = tmp;
- }
-
- /*
- * Let's mark the statistics as 'ruled out' - either we'll use
- * it (and proceed to the next step), or it's incompatible.
- */
- ruled_out[i] = step;
-
- /*
- * There are no clauses usable with this statistics (not already
- * covered by aome of the previous stats).
- *
- * Similarly, if the clauses only use a single attribute, we
- * can't really use that.
- */
- if ((ncovered_clauses == 0) || (nattnums < 2))
- continue;
-
- /*
- * TODO Not sure if it's possible to add a clause referencing
- * only attributes already covered by previous stats?
- * Introducing only some new dependency, not a new
- * attribute. Couldn't come up with an example, though.
- * Might be worth adding some assert.
- */
-
- /*
- * got a suitable statistics - let's update the current solution,
- * maybe use it as the best solution
- */
- current->nclauses += ncovered_clauses;
- current->nconditions += ncovered_conditions;
- current->nstats += 1;
- current->stats[step] = i;
-
- /*
- * We can never cover more clauses, or use more stats that we
- * actually have at the beginning.
- */
- Assert(nclauses >= current->nclauses);
- Assert(nmvstats >= current->nstats);
- Assert(step < nmvstats);
-
- /* we can't get more conditions that clauses and conditions combined
- *
- * FIXME This assert does not work because we count the conditions
- * repeatedly (once for each statistics covering it).
- */
- /* Assert((nconditions + nclauses) >= current->nconditions); */
-
- if (*best == NULL)
- {
- *best = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
- (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
- (*best)->nstats = 0;
- (*best)->nclauses = 0;
- (*best)->nconditions = 0;
- }
-
- /* see if it's better than the current 'best' solution */
- if ((current->nclauses > (*best)->nclauses) ||
- ((current->nclauses == (*best)->nclauses) &&
- ((current->nstats > (*best)->nstats))))
- {
- (*best)->nstats = current->nstats;
- (*best)->nclauses = current->nclauses;
- (*best)->nconditions = current->nconditions;
- memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
- }
-
- /*
- * The recursion only makes sense if we haven't covered all the
- * attributes (then adding stats is not really possible).
- */
- if ((step + 1) < nmvstats)
- choose_mv_statistics_exhaustive(root, step+1,
- nmvstats, mvstats, stats_attnums,
- nclauses, clauses, clauses_attnums,
- nconditions, conditions, conditions_attnums,
- cover_map, condition_map, ruled_out,
- current, best);
-
- /* reset the last step */
- current->nclauses -= ncovered_clauses;
- current->nconditions -= ncovered_conditions;
- current->nstats -= 1;
- current->stats[step] = 0;
-
- /* mark the statistics as usable again */
- ruled_out[i] = -1;
-
- Assert(current->nclauses >= 0);
- Assert(current->nstats >= 0);
- }
-
- /* reset all statistics as 'incompatible' in this step */
- for (i = 0; i < nmvstats; i++)
- if (ruled_out[i] == step)
- ruled_out[i] = -1;
-
-}
-
-/*
- * Greedy search for a multivariate solution - a sequence of statistics
- * covering the clauses. This chooses the "best" statistics at each step,
- * so the resulting solution may not be the best solution globally, but
- * this produces the solution in only N steps (where N is the number of
- * statistics), while the exhaustive approach may have to walk through
- * ~N! combinations (although some of those are terminated early).
- *
- * See the comments at choose_mv_statistics_exhaustive() as this does
- * the same thing (but in a different way).
- *
- * Don't call this directly, but through choose_mv_statistics().
- *
- * TODO There are probably other metrics we might use - e.g. using
- * number of columns (num_cond_columns / num_cov_columns), which
- * might work better with a mix of simple and complex clauses.
- *
- * TODO Also the choice at the very first step should be handled
- * in a special way, because there will be 0 conditions at that
- * moment, so there needs to be some other criteria - e.g. using
- * the simplest (or most complex?) clause might be a good idea.
- *
- * TODO We might also select multiple stats using different criteria,
- * and branch the search. This is however tricky, because if we
- * choose k statistics at each step, we get k^N branches to
- * walk through (with N steps). That's not really good with
- * large number of stats (yet better than exhaustive search).
- */
-static void
-choose_mv_statistics_greedy(PlannerInfo *root, int step,
- int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
- int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
- int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
- bool *cover_map, bool *condition_map, int *ruled_out,
- mv_solution_t *current, mv_solution_t **best)
-{
- int i, j;
- int best_stat = -1;
- double gain, max_gain = -1.0;
-
- /*
- * Bitmap tracking which clauses are already covered (by the previous
- * statistics) and may thus serve only as a condition in this step.
- */
- bool *covered_clauses = (bool*)palloc0(nclauses);
-
- /*
- * Number of clauses and columns covered by each statistics - this
- * includes both conditions and clauses covered by the statistics for
- * the first time. The number of columns may count some columns
- * repeatedly - if a column is shared by multiple clauses, it will
- * be counted once for each clause (covered by the statistics).
- * So with two clauses [(a=1 OR b=2),(a<2 OR c>1)] the column "a"
- * will be counted twice (if both clauses are covered).
- *
- * The values for reduded statistics (that can't be applied) are
- * not computed, because that'd be pointless.
- */
- int *num_cov_clauses = (int*)palloc0(sizeof(int) * nmvstats);
- int *num_cov_columns = (int*)palloc0(sizeof(int) * nmvstats);
-
- /*
- * Same as above, but this only includes clauses that are already
- * covered by the previous stats (and the current one).
- */
- int *num_cond_clauses = (int*)palloc0(sizeof(int) * nmvstats);
- int *num_cond_columns = (int*)palloc0(sizeof(int) * nmvstats);
-
- /*
- * Number of attributes for each clause.
- *
- * TODO Might be computed in choose_mv_statistics() and then passed
- * here, but then the function would not have the same signature
- * as _exhaustive().
- */
- int *attnum_counts = (int*)palloc0(sizeof(int) * nclauses);
- int *attnum_cond_counts = (int*)palloc0(sizeof(int) * nconditions);
-
- CHECK_FOR_INTERRUPTS();
-
- Assert(best != NULL);
- Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
-
- /* compute attributes (columns) for each clause */
- for (i = 0; i < nclauses; i++)
- attnum_counts[i] = bms_num_members(clauses_attnums[i]);
-
- /* compute attributes (columns) for each condition */
- for (i = 0; i < nconditions; i++)
- attnum_cond_counts[i] = bms_num_members(conditions_attnums[i]);
-
- /* see which clauses are already covered at this point (by previous stats) */
- for (i = 0; i < step; i++)
- for (j = 0; j < nclauses; j++)
- covered_clauses[j] |= (cover_map[current->stats[i] * nclauses + j]);
-
- /* which remaining statistics covers most clauses / uses most conditions? */
- for (i = 0; i < nmvstats; i++)
- {
- Bitmapset *attnums_covered = NULL;
- Bitmapset *attnums_conditions = NULL;
-
- /* skip stats that are already ruled out (either used or inapplicable) */
- if (ruled_out[i] != -1)
- continue;
-
- /* count covered clauses and conditions (for the statistics) */
- for (j = 0; j < nclauses; j++)
- {
- if (cover_map[i * nclauses + j])
- {
- Bitmapset *attnums_new
- = bms_union(attnums_covered, clauses_attnums[j]);
-
- /* get rid of the old bitmap and keep the unified result */
- bms_free(attnums_covered);
- attnums_covered = attnums_new;
-
- num_cov_clauses[i] += 1;
- num_cov_columns[i] += attnum_counts[j];
-
- /* is the clause already covered (i.e. a condition)? */
- if (covered_clauses[j])
- {
- num_cond_clauses[i] += 1;
- num_cond_columns[i] += attnum_counts[j];
- attnums_new = bms_union(attnums_conditions,
- clauses_attnums[j]);
-
- bms_free(attnums_conditions);
- attnums_conditions = attnums_new;
- }
- }
- }
-
- /* if all covered clauses are covered by prev stats (thus conditions) */
- if (num_cov_clauses[i] == num_cond_clauses[i])
- ruled_out[i] = step;
-
- /* same if there are no new attributes */
- else if (bms_num_members(attnums_conditions) == bms_num_members(attnums_covered))
- ruled_out[i] = step;
-
- bms_free(attnums_covered);
- bms_free(attnums_conditions);
-
- /* if the statistics is inapplicable, try the next one */
- if (ruled_out[i] != -1)
- continue;
-
- /* now let's walk through conditions and count the covered */
- for (j = 0; j < nconditions; j++)
- {
- if (condition_map[i * nconditions + j])
- {
- num_cond_clauses[i] += 1;
- num_cond_columns[i] += attnum_cond_counts[j];
- }
- }
-
- /* otherwise see if this improves the interesting metrics */
- gain = num_cond_columns[i] / (double)num_cov_columns[i];
-
- if (gain > max_gain)
- {
- max_gain = gain;
- best_stat = i;
- }
- }
-
- /*
- * Have we found a suitable statistics? Add it to the solution and
- * try next step.
- */
- if (best_stat != -1)
- {
- /* mark the statistics, so that we skip it in next steps */
- ruled_out[best_stat] = step;
-
- /* allocate current solution if necessary */
- if (current == NULL)
- {
- current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
- current->stats = (int*)palloc0(sizeof(int)*nmvstats);
- current->nstats = 0;
- current->nclauses = 0;
- current->nconditions = 0;
- }
-
- current->nclauses += num_cov_clauses[best_stat];
- current->nconditions += num_cond_clauses[best_stat];
- current->stats[step] = best_stat;
- current->nstats++;
-
- if (*best == NULL)
- {
- (*best) = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
- (*best)->nstats = current->nstats;
- (*best)->nclauses = current->nclauses;
- (*best)->nconditions = current->nconditions;
-
- (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
- memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
- }
- else
- {
- /* see if this is a better solution */
- double current_gain = (double)current->nconditions / current->nclauses;
- double best_gain = (double)(*best)->nconditions / (*best)->nclauses;
-
- if ((current_gain > best_gain) ||
- ((current_gain == best_gain) && (current->nstats < (*best)->nstats)))
- {
- (*best)->nstats = current->nstats;
- (*best)->nclauses = current->nclauses;
- (*best)->nconditions = current->nconditions;
- memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
- }
- }
-
- /*
- * The recursion only makes sense if there are statistics left
- * to add (each step of the greedy search consumes one).
- */
- if ((step + 1) < nmvstats)
- choose_mv_statistics_greedy(root, step+1,
- nmvstats, mvstats, stats_attnums,
- nclauses, clauses, clauses_attnums,
- nconditions, conditions, conditions_attnums,
- cover_map, condition_map, ruled_out,
- current, best);
-
- /* reset the last step */
- current->nclauses -= num_cov_clauses[best_stat];
- current->nconditions -= num_cond_clauses[best_stat];
- current->nstats -= 1;
- current->stats[step] = 0;
-
- /* mark the statistics as usable again */
- ruled_out[best_stat] = -1;
- }
-
- /* reset all statistics eliminated in this step */
- for (i = 0; i < nmvstats; i++)
- if (ruled_out[i] == step)
- ruled_out[i] = -1;
-
- /* free everything allocated in this step */
- pfree(covered_clauses);
- pfree(attnum_counts);
- pfree(attnum_cond_counts);
- pfree(num_cov_clauses);
- pfree(num_cov_columns);
- pfree(num_cond_clauses);
- pfree(num_cond_columns);
-}
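-
-/*
- * A worked example of the greedy gain metric (hypothetical numbers,
- * ignoring the separate conditions list): suppose a candidate
- * statistics covers three clauses, (a=1), (b=2) and (a<2 OR c>1),
- * of which (a=1) and (b=2) are already covered by previously chosen
- * stats. Then
- *
- *     num_cov_columns  = 1 + 1 + 2 = 4
- *     num_cond_columns = 1 + 1     = 2
- *     gain             = 2 / 4     = 0.5
- *
- * so candidates reusing more already-estimated columns relative to
- * the columns they cover score higher.
- */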
-
-/*
- * Chooses the combination of statistics, optimal for estimation of
- * a particular clause list.
- *
- * This only handles the 'preparation' step shared by the exhaustive
- * and greedy implementations (see the previous methods), mostly trying
- * to reduce the size of the problem (eliminating clauses/statistics
- * that can't really be used in the solution).
- *
- * It also precomputes bitmaps for attributes covered by clauses and
- * statistics, so that we don't need to do that over and over in the
- * actual optimizations (as it's both CPU and memory intensive).
- *
- * TODO This will probably have to consider compatibility of clauses,
- * because 'dependencies' will probably work only with equality
- * clauses.
- *
- * TODO Another way to make the optimization problems smaller might
- * be splitting the statistics into several disjoint subsets, i.e.
- * if we can split the graph of statistics (after the elimination)
- * into multiple components (so that stats in different components
- * share no attributes), we can do the optimization for each
- * component separately.
- *
- * TODO If we could compute what is a "perfect solution" maybe we could
- * terminate the search after reaching ~90% of it? Say, if we knew
- * that we can cover 10 clauses and reuse 8 dependencies, maybe
- * covering 9 clauses and 7 dependencies would be OK?
- */
-static List*
-choose_mv_statistics(PlannerInfo *root, List *stats,
- List *clauses, List *conditions,
- Oid varRelid, SpecialJoinInfo *sjinfo)
-{
- int i;
- mv_solution_t *best = NULL;
- List *result = NIL;
-
- int nmvstats;
- MVStatisticInfo *mvstats;
-
- /* we only work with MCV lists and histograms here */
- int type = (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
-
- bool *clause_cover_map = NULL,
- *condition_cover_map = NULL;
- int *ruled_out = NULL;
-
- /* build bitmapsets for all stats and clauses */
- Bitmapset **stats_attnums;
- Bitmapset **clauses_attnums;
- Bitmapset **conditions_attnums;
-
- int nclauses, nconditions;
- Node ** clauses_array;
- Node ** conditions_array;
-
- /* copy lists, so that we can free them during elimination easily */
- clauses = list_copy(clauses);
- conditions = list_copy(conditions);
- stats = list_copy(stats);
-
- /*
- * Reduce the optimization problem size as much as possible.
- *
- * Eliminate clauses and conditions not covered by any statistics,
- * or statistics not matching at least two attributes (one of them
- * has to be in a regular clause).
- *
- * It's possible that removing a statistics in one iteration
- * eliminates a clause in the next one, so we'll repeat this until we
- * eliminate no clauses/stats in that iteration.
- *
- * This can only happen after eliminating a statistics - clauses are
- * eliminated first, so statistics always reflect that.
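- *
- * For example, with clauses [(a=1), (b=2), (c=3)] and statistics
- * S1 on (a,b), S2 on (c,d): in the first iteration S2 matches
- * only a single clause attribute (c) and gets eliminated; in the
- * next iteration (c=3) is no longer covered by any statistics and
- * gets eliminated too, leaving [(a=1), (b=2)] and [S1].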
- */
- while (true)
- {
- List *tmp;
-
- Bitmapset *compatible_attnums = NULL;
- Bitmapset *condition_attnums = NULL;
- Bitmapset *all_attnums = NULL;
-
- /*
- * Clauses
- *
- * Walk through clauses and keep only those covered by at least
- * one of the statistics we still have. We'll also keep info
- * about attnums in clauses (without conditions) so that we can
- * ignore stats covering just conditions (which is pointless).
- */
- tmp = filter_clauses(root, varRelid, sjinfo, type,
- stats, clauses, &compatible_attnums);
-
- /* discard the original list */
- list_free(clauses);
- clauses = tmp;
-
- /*
- * Conditions
- *
- * Walk through conditions and keep only those covered by at least
- * one of the statistics we still have. Also, collect bitmap of
- * attributes so that we can make sure we add at least one new
- * attribute (by comparing with clauses).
- */
- if (conditions != NIL)
- {
- tmp = filter_clauses(root, varRelid, sjinfo, type,
- stats, conditions, &condition_attnums);
-
- /* discard the original list */
- list_free(conditions);
- conditions = tmp;
- }
-
- /* get a union of attnums (from conditions and new clauses) */
- all_attnums = bms_union(compatible_attnums, condition_attnums);
-
- /*
- * Statistics
- *
- * Walk through statistics and only keep those covering at least
- * one new attribute (excluding conditions) and at least two attributes
- * in both clauses and conditions.
- */
- tmp = filter_stats(stats, compatible_attnums, all_attnums);
-
- /* if we've not eliminated anything, terminate */
- if (list_length(stats) == list_length(tmp))
- break;
-
- /* work only with filtered statistics from now */
- list_free(stats);
- stats = tmp;
- }
-
- /* only do the optimization if we have clauses/statistics */
- if ((list_length(stats) == 0) || (list_length(clauses) == 0))
- return NULL;
-
- /* remove redundant stats (stats covered by another stats) */
- stats = filter_redundant_stats(stats, clauses, conditions);
-
- /*
- * TODO We should sort the stats to make the order deterministic,
- * otherwise we may get different estimates on different
- * executions - if there are multiple "equally good" solutions,
- * we'll keep the first solution we see.
- *
- * Sorting by OID probably is not the right solution though,
- * because we'd like it to be somehow reproducible,
- * irrespective of the order of ADD STATISTICS commands.
- * So maybe statkeys?
- */
- mvstats = make_stats_array(stats, &nmvstats);
- stats_attnums = make_stats_attnums(mvstats, nmvstats);
-
- /* collect clauses and a bitmap of attnums */
- clauses_array = make_clauses_array(clauses, &nclauses);
- clauses_attnums = make_clauses_attnums(root, varRelid, sjinfo, type,
- clauses_array, nclauses);
-
- /* collect conditions and bitmap of attnums */
- conditions_array = make_clauses_array(conditions, &nconditions);
- conditions_attnums = make_clauses_attnums(root, varRelid, sjinfo, type,
- conditions_array, nconditions);
-
- /*
- * Build bitmaps with info about which clauses/conditions are
- * covered by each statistics (so that we don't need to call the
- * bms_is_subset over and over again).
- */
- clause_cover_map = make_cover_map(stats_attnums, nmvstats,
- clauses_attnums, nclauses);
-
- condition_cover_map = make_cover_map(stats_attnums, nmvstats,
- conditions_attnums, nconditions);
-
- ruled_out = (int*)palloc0(nmvstats * sizeof(int));
-
- /* no stats are ruled out by default */
- for (i = 0; i < nmvstats; i++)
- ruled_out[i] = -1;
-
- /* do the optimization itself */
- if (mvstat_search_type == MVSTAT_SEARCH_EXHAUSTIVE)
- choose_mv_statistics_exhaustive(root, 0,
- nmvstats, mvstats, stats_attnums,
- nclauses, clauses_array, clauses_attnums,
- nconditions, conditions_array, conditions_attnums,
- clause_cover_map, condition_cover_map,
- ruled_out, NULL, &best);
- else
- choose_mv_statistics_greedy(root, 0,
- nmvstats, mvstats, stats_attnums,
- nclauses, clauses_array, clauses_attnums,
- nconditions, conditions_array, conditions_attnums,
- clause_cover_map, condition_cover_map,
- ruled_out, NULL, &best);
-
- /* create a list of statistics from the array */
- if (best != NULL)
- {
- for (i = 0; i < best->nstats; i++)
- {
- MVStatisticInfo *info = makeNode(MVStatisticInfo);
- memcpy(info, &mvstats[best->stats[i]], sizeof(MVStatisticInfo));
- result = lappend(result, info);
- }
- pfree(best);
- }
-
- /* cleanup (maybe leave it up to the memory context?) */
- for (i = 0; i < nmvstats; i++)
- bms_free(stats_attnums[i]);
-
- for (i = 0; i < nclauses; i++)
- bms_free(clauses_attnums[i]);
-
- for (i = 0; i < nconditions; i++)
- bms_free(conditions_attnums[i]);
-
- pfree(stats_attnums);
- pfree(clauses_attnums);
- pfree(conditions_attnums);
-
- pfree(clauses_array);
- pfree(conditions_array);
- pfree(clause_cover_map);
- pfree(condition_cover_map);
- pfree(ruled_out);
- pfree(mvstats);
-
- list_free(clauses);
- list_free(conditions);
- list_free(stats);
-
- return result;
-}
-
-
-/*
- * This splits the clauses list into two parts - one containing clauses
- * that will be evaluated using the chosen statistics, and the remaining
- * clauses (either non-mvcompatible, or not related to the histogram).
- */
-static List *
-clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
- List *clauses, Oid varRelid, List **mvclauses,
- MVStatisticInfo *mvstats, int types)
-{
- int i;
- ListCell *l;
- List *non_mvclauses = NIL;
-
- /* FIXME is there a better way to get info on int2vector? */
- int2vector * attrs = mvstats->stakeys;
- int numattrs = mvstats->stakeys->dim1;
-
- Bitmapset *mvattnums = NULL;
-
- /*
- * build bitmap of attributes covered by the stats, so we can
- * do bms_is_subset later
- */
- for (i = 0; i < numattrs; i++)
- mvattnums = bms_add_member(mvattnums, attrs->values[i]);
-
- /* erase the list of mv-compatible clauses */
- *mvclauses = NIL;
-
- foreach (l, clauses)
- {
- bool match = false; /* by default not mv-compatible */
- Bitmapset *attnums = NULL;
- Node *clause = (Node *) lfirst(l);
-
- if (clause_is_mv_compatible(root, clause, varRelid, NULL,
- &attnums, sjinfo, types))
- {
- /* are all the attributes part of the selected stats? */
- if (bms_is_subset(attnums, mvattnums))
- match = true;
- }
-
- /*
- * The clause matches the selected stats, so put it on the list
- * of mv-compatible clauses. Otherwise, keep it in the list of
- * 'regular' clauses (that may be selected later).
- */
- if (match)
- *mvclauses = lappend(*mvclauses, clause);
- else
- non_mvclauses = lappend(non_mvclauses, clause);
- }
-
- /*
- * Perform regular estimation using the clauses incompatible
- * with the chosen histogram (or MV stats in general).
- */
- return non_mvclauses;
-
-}
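-
-/*
- * For example, with statistics on (a,b) and clauses
- * [(a=1), (b=2), (c=3)], *mvclauses is set to [(a=1), (b=2)] and
- * [(c=3)] is returned for regular (per-column) estimation.
- */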
-
-/*
- * Determines whether the clause is compatible with multivariate stats,
- * and if it is, returns some additional information - varno (index
- * into simple_rte_array) and a bitmap of attributes. This is then
- * used to fetch related multivariate statistics.
- *
- * At this moment we only support basic conditions of the form
- *
- * variable OP constant
- *
- * where OP is one of [=,<,<=,>=,>] (which is however determined by
- * looking at the associated function for estimating selectivity, just
- * like with the single-dimensional case).
- *
- * TODO Support 'OR clauses' - shouldn't be all that difficult to
- * evaluate them using multivariate stats.
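- *
- * For illustration, clauses like (a = 1), (a < 10) or (a IS NULL)
- * are accepted, while (a = b) is rejected as a join clause and
- * (a + 1 = 3) is rejected because one side is not a simple Var.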
- */
-static bool
-clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
- Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
- int types)
-{
- Relids clause_relids;
- Relids left_relids;
- Relids right_relids;
-
- if (IsA(clause, RestrictInfo))
- {
- RestrictInfo *rinfo = (RestrictInfo *) clause;
-
- /* Pseudoconstants are not really interesting here. */
- if (rinfo->pseudoconstant)
- return false;
-
- /* get the actual clause from the RestrictInfo (it's not an OR clause) */
- clause = (Node*)rinfo->clause;
-
- /* we don't support join conditions at this moment */
- if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo))
- return false;
-
- clause_relids = rinfo->clause_relids;
- left_relids = rinfo->left_relids;
- right_relids = rinfo->right_relids;
- }
- else if (is_opclause(clause) && list_length(((OpExpr *) clause)->args) == 2)
- {
- left_relids = pull_varnos(get_leftop((Expr*)clause));
- right_relids = pull_varnos(get_rightop((Expr*)clause));
-
- clause_relids = bms_union(left_relids,
- right_relids);
- }
- else
- {
- /* Not a binary opclause, so mark left/right relid sets as empty */
- left_relids = NULL;
- right_relids = NULL;
- /* and get the total relid set the hard way */
- clause_relids = pull_varnos((Node *) clause);
- }
-
- /*
- * Only simple opclauses and IS NULL tests are compatible with
- * multivariate stats at this point.
- */
- if ((is_opclause(clause))
- && (list_length(((OpExpr *) clause)->args) == 2))
- {
- OpExpr *expr = (OpExpr *) clause;
- bool varonleft = true;
- bool ok;
-
- /* is it 'variable op constant' ? */
- ok = (bms_membership(clause_relids) == BMS_SINGLETON) &&
- (is_pseudo_constant_clause_relids(lsecond(expr->args),
- right_relids) ||
- (varonleft = false,
- is_pseudo_constant_clause_relids(linitial(expr->args),
- left_relids)));
-
- if (ok)
- {
- Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
-
- /*
- * Simple variables only - otherwise the planner_rt_fetch seems to fail
- * (return NULL).
- *
- * TODO Maybe using examine_variable() would fix that?
- */
- if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
- return false;
-
- /*
- * Only consider this variable if (varRelid == 0) or when the varno
- * matches varRelid (see explanation at clause_selectivity).
- *
- * FIXME I suspect this may not be really necessary. The (varRelid == 0)
- * part seems to be enforced by treat_as_join_clause().
- */
- if (! ((varRelid == 0) || (varRelid == var->varno)))
- return false;
-
- /* Also skip special varno values, and system attributes ... */
- if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
- return false;
-
- /* Lookup info about the base relation (we need to pass the OID out) */
- if (relid != NULL)
- *relid = var->varno;
-
- /*
- * If it's not a "<" or ">" or "=" operator, just ignore the
- * clause. Otherwise note the relid and attnum for the variable.
- * This uses the function for estimating selectivity, not the
- * operator directly (a bit awkward, but well ...).
- */
- switch (get_oprrest(expr->opno))
- {
- case F_SCALARLTSEL:
- case F_SCALARGTSEL:
- /* not compatible with functional dependencies */
- if (types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
- {
- *attnums = bms_add_member(*attnums, var->varattno);
- return (types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
- }
- return false;
-
- case F_EQSEL:
- *attnums = bms_add_member(*attnums, var->varattno);
- return true;
- }
- }
- }
- else if (IsA(clause, NullTest)
- && IsA(((NullTest*)clause)->arg, Var))
- {
- Var * var = (Var*)((NullTest*)clause)->arg;
-
- /*
- * Simple variables only - otherwise the planner_rt_fetch seems to fail
- * (return NULL).
- *
- * TODO Maybe using examine_variable() would fix that?
- */
- if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
- return false;
-
- /*
- * Only consider this variable if (varRelid == 0) or when the varno
- * matches varRelid (see explanation at clause_selectivity).
- *
- * FIXME I suspect this may not be really necessary. The (varRelid == 0)
- * part seems to be enforced by treat_as_join_clause().
- */
- if (! ((varRelid == 0) || (varRelid == var->varno)))
- return false;
-
- /* Also skip special varno values, and system attributes ... */
- if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
- return false;
-
- /* Lookup info about the base relation (we need to pass the OID out) */
- if (relid != NULL)
- *relid = var->varno;
-
- *attnums = bms_add_member(*attnums, var->varattno);
-
- return true;
- }
- else if (or_clause(clause) || and_clause(clause))
- {
- /*
- * AND/OR-clauses are supported if all sub-clauses are supported
- *
- * TODO We might support mixed case, where some of the clauses
- * are supported and some are not, and treat all supported
- * subclauses as a single clause, compute its selectivity
- * using mv stats, and compute the total selectivity using
- * the current algorithm.
- *
- * TODO For RestrictInfo above an OR-clause, we might use the
- * orclause with nested RestrictInfo - we won't have to
- * call pull_varnos() for each clause, saving time.
- */
- Bitmapset *tmp = NULL;
- ListCell *l;
- foreach (l, ((BoolExpr*)clause)->args)
- {
- if (! clause_is_mv_compatible(root, (Node*)lfirst(l),
- varRelid, relid, &tmp, sjinfo, types))
- return false;
- }
-
- /* add the attnums from the OR-clause to the set of attnums */
- *attnums = bms_join(*attnums, tmp);
-
- return true;
- }
-
- return false;
-}
-
-
-static Bitmapset *
-clause_mv_get_attnums(PlannerInfo *root, Node *clause)
-{
- Bitmapset * attnums = NULL;
-
- /* Extract clause from restrict info, if needed. */
- if (IsA(clause, RestrictInfo))
- clause = (Node*)((RestrictInfo*)clause)->clause;
-
- /*
- * Only simple opclauses and IS NULL tests are compatible with
- * multivariate stats at this point.
- */
- if ((is_opclause(clause))
- && (list_length(((OpExpr *) clause)->args) == 2))
- {
- OpExpr *expr = (OpExpr *) clause;
-
- if (IsA(linitial(expr->args), Var))
- attnums = bms_add_member(attnums,
- ((Var*)linitial(expr->args))->varattno);
- else
- attnums = bms_add_member(attnums,
- ((Var*)lsecond(expr->args))->varattno);
- }
- else if (IsA(clause, NullTest)
- && IsA(((NullTest*)clause)->arg, Var))
- {
- attnums = bms_add_member(attnums,
- ((Var*)((NullTest*)clause)->arg)->varattno);
- }
- else if (or_clause(clause) || and_clause(clause))
- {
- ListCell *l;
- foreach (l, ((BoolExpr*)clause)->args)
- {
- attnums = bms_join(attnums,
- clause_mv_get_attnums(root, (Node*)lfirst(l)));
- }
- }
-
- return attnums;
-}
-
-/*
- * Performs reduction of clauses using functional dependencies, i.e.
- * removes clauses that are considered redundant. It simply walks
- * through dependencies, and checks whether the dependency 'matches'
- * the clauses, i.e. if there's a clause matching the condition. If yes,
- * all clauses matching the implied part of the dependency are removed
- * from the list.
- *
- * This simply looks at attnums references by the clauses, not at the
- * type of the operator (equality, inequality, ...). This may not be the
- * right way to do - it certainly works best for equalities, which is
- * naturally consistent with functional dependencies (implications).
- * It's not clear that other operators are handled sensibly - for
- * example for inequalities, like
- *
- * WHERE (A >= 10) AND (B <= 20)
- *
- * and a trivial case where [A == B], resulting in symmetric pair of
- * rules [A => B], [B => A], it's rather clear we can't remove either of
- * those clauses.
- *
- * That only highlights that functional dependencies are most suitable
- * for label-like data, where using non-equality operators is very rare.
- * Using the common city/zipcode example, clauses like
- *
- * (zipcode <= 12345)
- *
- * or
- *
- * (cityname >= 'Washington')
- *
- * are rare. So restricting the reduction to equality should not harm
- * the usefulness / applicability.
- *
- * The other assumption is that this assumes 'compatible' clauses. For
- * example by using mismatching zip code and city name, this is unable
- * to identify the discrepancy and eliminates one of the clauses. The
- * usual approach (multiplying both selectivities) thus produces a more
- * accurate estimate, although mostly by luck - the multiplication
- * comes from assumption of statistical independence of the two
- * conditions (which is not valid in this case), but moves the
- * estimate in the right direction (towards 0%).
- *
- * This might be somewhat improved by cross-checking the selectivities
- * against MCV and/or histogram.
- *
- * The implementation needs to be careful about cyclic rules, i.e. rules
- * like [A => B] and [B => A] at the same time. This must not reduce
- * clauses on both attributes at the same time.
- *
- * Technically we might consider selectivities here too, somehow. E.g.
- * when (A => B) and (B => A), we might use the clauses with minimum
- * selectivity.
- *
- * TODO Consider restricting the reduction to equality clauses. Or maybe
- * use equality classes somehow?
- *
- * TODO Merge this docs to dependencies.c, as it's saying mostly the
- * same things as the comments there.
- *
- * TODO Currently this is applied only to the top-level clauses, but
- * maybe we could apply it to lists at subtrees too, e.g. to the
- * two AND-clauses in
- *
- * (x=1 AND y=2) OR (z=3 AND q=10)
- *
- */
-static List *
-clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
- Oid varRelid, List *stats,
- SpecialJoinInfo *sjinfo)
-{
- List *reduced_clauses = NIL;
- Index relid;
-
- /*
- * matrix of (natts x natts), 1 means x=>y
- *
- * This serves two purposes - first, it merges dependencies from all
- * the statistics, second it makes generating all the transitive
- * dependencies easier.
- *
- * We need to build this only for attributes from the dependencies,
- * not for all attributes in the table.
- *
- * We can't do that only for attributes from the clauses, because we
- * want to build transitive dependencies (including those going
- * through attributes not listed in the stats).
- *
- * This only works for A=>B dependencies, not sure how to do that
- * for complex dependencies.
- */
- bool *deps_matrix;
- int deps_natts; /* size of the matrix */
-
- /* mapping attnum <=> matrix index */
- int *deps_idx_to_attnum;
- int *deps_attnum_to_idx;
-
- /* attnums in dependencies and clauses (and intersection) */
- List *deps_clauses = NIL;
- Bitmapset *deps_attnums = NULL;
- Bitmapset *clause_attnums = NULL;
- Bitmapset *intersect_attnums = NULL;
-
- /*
- * Is there at least one statistics with functional dependencies?
- * If not, return the original clauses right away.
- *
- * XXX Isn't this pointless, thanks to exactly the same check in
- * clauselist_selectivity()? Can we trigger the condition here?
- */
- if (! has_stats(stats, MV_CLAUSE_TYPE_FDEP))
- return clauses;
-
- /*
- * Build the dependency matrix, i.e. attribute adjacency matrix,
- * where 1 means (a=>b). Once we have the adjacency matrix, we'll
- * multiply it by itself, to get transitive dependencies.
- *
- * Note: This is pretty much transitive closure from graph theory.
- *
- * First, let's see what attributes are covered by functional
- * dependencies (sides of the adjacency matrix), and also a maximum
- * attribute (size of mapping to simple integer indexes);
- */
- deps_attnums = fdeps_collect_attnums(stats);
-
- /*
- * Walk through the clauses - clauses that are (one of)
- *
- * (a) not mv-compatible
- * (b) are using more than a single attnum
- * (c) using an attnum not covered by functional dependencies
- *
- * may be copied directly to the result. The interesting clauses are
- * kept in 'deps_clauses' and will be processed later.
- */
- clause_attnums = fdeps_filter_clauses(root, clauses, deps_attnums,
- &reduced_clauses, &deps_clauses,
- varRelid, &relid, sjinfo);
-
- /*
- * we need at least two clauses referencing two different
- * attributes to do the reduction
- */
- if ((list_length(deps_clauses) < 2) || (bms_num_members(clause_attnums) < 2))
- {
- bms_free(clause_attnums);
- list_free(reduced_clauses);
- list_free(deps_clauses);
-
- return clauses;
- }
-
-
- /*
- * We need at least two matching attributes in the clauses and
- * dependencies, otherwise we can't really reduce anything.
- */
- intersect_attnums = bms_intersect(clause_attnums, deps_attnums);
- if (bms_num_members(intersect_attnums) < 2)
- {
- bms_free(clause_attnums);
- bms_free(deps_attnums);
- bms_free(intersect_attnums);
-
- list_free(deps_clauses);
- list_free(reduced_clauses);
-
- return clauses;
- }
-
- /*
- * Build mapping between matrix indexes and attnums, and then the
- * adjacency matrix itself.
- */
- deps_idx_to_attnum = make_idx_to_attnum_mapping(deps_attnums);
- deps_attnum_to_idx = make_attnum_to_idx_mapping(deps_attnums);
-
- /* build the adjacency matrix */
- deps_matrix = build_adjacency_matrix(stats, deps_attnums,
- deps_idx_to_attnum,
- deps_attnum_to_idx);
-
- deps_natts = bms_num_members(deps_attnums);
-
- /*
- * Multiply the matrix N-times (N = size of the matrix), so that we
- * get all the transitive dependencies. That makes the next step
- * much easier and faster.
- *
- * This is essentially an adjacency matrix from graph theory, and
- * by multiplying it we get transitive edges. We don't really care
- * about the exact number (number of paths between vertices) though,
- * so we can do the multiplication in-place (we don't care whether
- * we found the dependency in this round or in the previous one).
- *
- * Track how many new dependencies were added, and stop when 0, but
- * we can't multiply more than N-times (longest path in the graph).
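- *
- * For example, with dependencies [a => b] and [b => c] the first
- * pass adds the transitive [a => c]; a second pass adds nothing
- * new, so the loop terminates early.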
- */
- multiply_adjacency_matrix(deps_matrix, deps_natts);
-
- /*
- * Walk through the clauses, and see which other clauses we may
- * reduce. The matrix contains all transitive dependencies, which
- * makes this very fast.
- *
- * We have to be careful not to reduce the clause using itself, or
- * reducing all clauses forming a cycle (so we have to skip already
- * eliminated clauses).
- *
- * I'm not sure whether this guarantees finding the best solution,
- * i.e. reducing the most clauses, but it probably does (thanks to
- * having all the transitive dependencies).
- */
- deps_clauses = fdeps_reduce_clauses(deps_clauses,
- deps_attnums, deps_matrix,
- deps_idx_to_attnum,
- deps_attnum_to_idx, relid);
-
- /* join the two lists of clauses */
- reduced_clauses = list_union(reduced_clauses, deps_clauses);
-
- pfree(deps_matrix);
- pfree(deps_idx_to_attnum);
- pfree(deps_attnum_to_idx);
-
- bms_free(deps_attnums);
- bms_free(clause_attnums);
- bms_free(intersect_attnums);
-
- return reduced_clauses;
-}
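-
-/*
- * A worked example of the reduction: with a functional dependency
- * (zipcode => city) and clauses
- *
- *     (zipcode = 12345) AND (city = 'Washington')
- *
- * the city clause is implied by the zipcode clause, so it is
- * removed and only the zipcode clause gets estimated, avoiding the
- * underestimate caused by multiplying the two selectivities.
- */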
-
-static bool
-has_stats(List *stats, int type)
-{
- ListCell *s;
-
- foreach (s, stats)
- {
- MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
-
- if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
- return true;
-
- if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
- return true;
-
- if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
- return true;
- }
-
- return false;
-}
-
-/*
- * Determine the relid (either from varRelid or from the clauses) and
- * then look up stats using it.
- */
-static List *
-find_stats(PlannerInfo *root, List *clauses, Oid varRelid, Index *relid)
-{
- /* unknown relid by default */
- *relid = InvalidOid;
-
- /*
- * First we need to find the relid (index into simple_rel_array).
- * If varRelid is not 0, we already have it, otherwise we have to
- * look it up from the clauses.
- */
- if (varRelid != 0)
- *relid = varRelid;
- else
- {
- Relids relids = pull_varnos((Node*)clauses);
-
- /*
- * We only expect 0 or 1 members in the bitmapset. If there are
- * no vars, we'll get empty bitmapset, otherwise we'll get the
- * relid as the single member.
- *
- * FIXME For some reason we can get 2 relids here (e.g. \d in
- * psql does that).
- */
- if (bms_num_members(relids) == 1)
- *relid = bms_singleton_member(relids);
-
- bms_free(relids);
- }
-
- /*
- * if we found the relid, we can get the stats from simple_rel_array
- *
- * This only gets stats that are already built, because that's how
- * we load it into RelOptInfo (see get_relation_info), but we don't
- * detoast the whole stats yet. That'll be done later, after we
- * decide which stats to use.
- */
- if (*relid != InvalidOid)
- return root->simple_rel_array[*relid]->mvstatlist;
-
- return NIL;
-}
-
-static Bitmapset*
-fdeps_collect_attnums(List *stats)
-{
- ListCell *lc;
- Bitmapset *attnums = NULL;
-
- foreach (lc, stats)
- {
- int j;
- MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
-
- int2vector *stakeys = info->stakeys;
-
- /* skip stats without functional dependencies built */
- if (! info->deps_built)
- continue;
-
- for (j = 0; j < stakeys->dim1; j++)
- attnums = bms_add_member(attnums, stakeys->values[j]);
- }
-
- return attnums;
-}
-
-
-static int*
-make_idx_to_attnum_mapping(Bitmapset *attnums)
-{
- int attidx = 0;
- int attnum = -1;
-
- int *mapping = (int*)palloc0(bms_num_members(attnums) * sizeof(int));
-
- while ((attnum = bms_next_member(attnums, attnum)) >= 0)
- mapping[attidx++] = attnum;
-
- Assert(attidx == bms_num_members(attnums));
-
- return mapping;
-}
-
-static int*
-make_attnum_to_idx_mapping(Bitmapset *attnums)
-{
- int attidx = 0;
- int attnum = -1;
- int maxattnum = -1;
- int *mapping;
-
- while ((attnum = bms_next_member(attnums, attnum)) >= 0)
- maxattnum = attnum;
-
- mapping = (int*)palloc0((maxattnum+1) * sizeof(int));
-
- attnum = -1;
- while ((attnum = bms_next_member(attnums, attnum)) >= 0)
- mapping[attnum] = attidx++;
-
- Assert(attidx == bms_num_members(attnums));
-
- return mapping;
-}
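-
-/*
- * For example, for attnums {2, 5, 7} the two mappings above are
- *
- *     idx_to_attnum = [2, 5, 7]                (index -> attnum)
- *     attnum_to_idx = [2 -> 0, 5 -> 1, 7 -> 2] (attnum -> index)
- *
- * which lets the adjacency matrix use dense indexes for sparse
- * attnums.
- */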
-
-static bool*
-build_adjacency_matrix(List *stats, Bitmapset *attnums,
- int *idx_to_attnum, int *attnum_to_idx)
-{
- ListCell *lc;
- int natts = bms_num_members(attnums);
- bool *matrix = (bool*)palloc0(natts * natts * sizeof(bool));
-
- foreach (lc, stats)
- {
- int j;
- MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
- MVDependencies dependencies = NULL;
-
- /* skip stats without functional dependencies built */
- if (! stat->deps_built)
- continue;
-
- /* fetch and deserialize dependencies */
- dependencies = load_mv_dependencies(stat->mvoid);
- if (dependencies == NULL)
- {
- elog(WARNING, "failed to deserialize func deps %d", stat->mvoid);
- continue;
- }
-
- /* set matrix[a,b] to 'true' if 'a=>b' */
- for (j = 0; j < dependencies->ndeps; j++)
- {
- int aidx = attnum_to_idx[dependencies->deps[j]->a];
- int bidx = attnum_to_idx[dependencies->deps[j]->b];
-
- /* a=> b */
- matrix[aidx * natts + bidx] = true;
- }
+ varRelid,
+ jointype,
+ sjinfo);
}
+ else if (IsA(clause, CurrentOfExpr))
+ {
+ /* CURRENT OF selects at most one row of its table */
+ CurrentOfExpr *cexpr = (CurrentOfExpr *) clause;
+ RelOptInfo *crel = find_base_rel(root, cexpr->cvarno);
- return matrix;
-}
-
-static void
-multiply_adjacency_matrix(bool *matrix, int natts)
-{
- int i;
-
- for (i = 0; i < natts; i++)
+ if (crel->tuples > 0)
+ s1 = 1.0 / crel->tuples;
+ }
+ else if (IsA(clause, RelabelType))
+ {
+ /* Not sure this case is needed, but it can't hurt */
+ s1 = clause_selectivity(root,
+ (Node *) ((RelabelType *) clause)->arg,
+ varRelid,
+ jointype,
+ sjinfo);
+ }
+ else if (IsA(clause, CoerceToDomain))
{
- int k, l, m;
- int nchanges = 0;
+ /* Not sure this case is needed, but it can't hurt */
+ s1 = clause_selectivity(root,
+ (Node *) ((CoerceToDomain *) clause)->arg,
+ varRelid,
+ jointype,
+ sjinfo);
+ }
- /* k => l */
- for (k = 0; k < natts; k++)
- {
- for (l = 0; l < natts; l++)
- {
- /* we already have this dependency */
- if (matrix[k * natts + l])
- continue;
+ /* Cache the result if possible */
+ if (cacheable)
+ {
+ if (jointype == JOIN_INNER)
+ rinfo->norm_selec = s1;
+ else
+ rinfo->outer_selec = s1;
+ }
- /* we don't really care about the exact value, just 0/1 */
- for (m = 0; m < natts; m++)
- {
- if (matrix[k * natts + m] * matrix[m * natts + l])
- {
- matrix[k * natts + l] = true;
- nchanges += 1;
- break;
- }
- }
- }
- }
+#ifdef SELECTIVITY_DEBUG
+ elog(DEBUG4, "clause_selectivity: s1 %f", s1);
+#endif /* SELECTIVITY_DEBUG */
- /* no transitive dependency added here, so terminate */
- if (nchanges == 0)
- break;
- }
+ return s1;
}
+
static List*
fdeps_reduce_clauses(List *clauses, Bitmapset *attnums, bool *matrix,
int *idx_to_attnum, int *attnum_to_idx, Index relid)
@@ -3427,55 +1211,6 @@ fdeps_reduce_clauses(List *clauses, Bitmapset *attnums, bool *matrix,
}
-static Bitmapset *
-fdeps_filter_clauses(PlannerInfo *root,
- List *clauses, Bitmapset *deps_attnums,
- List **reduced_clauses, List **deps_clauses,
- Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo)
-{
- ListCell *lc;
- Bitmapset *clause_attnums = NULL;
-
- foreach (lc, clauses)
- {
- Bitmapset *attnums = NULL;
- Node *clause = (Node *) lfirst(lc);
-
- if (! clause_is_mv_compatible(root, clause, varRelid, relid, &attnums,
- sjinfo, MV_CLAUSE_TYPE_FDEP))
-
- /* clause incompatible with functional dependencies */
- *reduced_clauses = lappend(*reduced_clauses, clause);
-
- else if (bms_num_members(attnums) > 1)
-
- /*
- * clause referencing multiple attributes (strange; shouldn't
- * this be handled by clause_is_mv_compatible directly?)
- */
- *reduced_clauses = lappend(*reduced_clauses, clause);
-
- else if (! bms_is_member(bms_singleton_member(attnums), deps_attnums))
-
- /* clause not covered by the dependencies */
- *reduced_clauses = lappend(*reduced_clauses, clause);
-
- else
- {
- /* ok, clause compatible with existing dependencies */
- Assert(bms_num_members(attnums) == 1);
-
- *deps_clauses = lappend(*deps_clauses, clause);
- clause_attnums = bms_add_member(clause_attnums,
- bms_singleton_member(attnums));
- }
-
- bms_free(attnums);
- }
-
- return clause_attnums;
-}
-
/*
* Pull varattnos from the clauses, similarly to pull_varattnos() but:
*
@@ -3509,162 +1244,6 @@ get_varattnos(Node * node, Index relid)
return result;
}
-/*
- * Estimate selectivity of clauses using a MCV list.
- *
- * If there's no MCV list for the stats, the function returns 0.0.
- *
- * While computing the estimate, the function checks whether all the
- * columns were matched with an equality condition. If that's the case,
- * we can skip processing the histogram, as there can be no rows in
- * it with the same values - all the rows matching the condition are
- * represented by the MCV item. This can only happen with equality
- * on all the attributes.
- *
- * The algorithm works like this:
- *
- * 1) mark all items as 'match'
- * 2) walk through all the clauses
- * 3) for a particular clause, walk through all the items
- * 4) skip items that are already 'no match'
- * 5) check clause for items that still match
- * 6) sum frequencies for items to get selectivity
- *
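- * A worked example of the final estimate (hypothetical numbers):
- * if the MCV list covers u = 0.8 of the data, items matching the
- * conditions sum to t = 0.4, and items matching both conditions
- * and clauses sum to s = 0.1, the result is (s / t) * u = 0.2.
- *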
- * The function also returns the frequency of the least frequent item
- * on the MCV list, which may be useful for clamping the estimate from
- * histogram (all items not present in the MCV list are less frequent).
- * This however seems useful only for cases with conditions on all
- * attributes.
- *
- * TODO This only handles AND-ed clauses, but it might work for OR-ed
- * lists too - it just needs to reverse the logic a bit. I.e. start
- * with 'no match' for all items, and mark the items as a match
- * as the clauses are processed (and skip items that are 'match').
- */
-static Selectivity
-clauselist_mv_selectivity_mcvlist(PlannerInfo *root, MVStatisticInfo *mvstats,
- List *clauses, List *conditions, bool is_or,
- bool *fullmatch, Selectivity *lowsel)
-{
- int i;
- Selectivity s = 0.0;
- Selectivity t = 0.0;
- Selectivity u = 0.0;
-
- MCVList mcvlist = NULL;
-
- int nmatches = 0;
- int nconditions = 0;
-
- /* match/mismatch bitmap for each MCV item */
- char * matches = NULL;
- char * condition_matches = NULL;
-
- Assert(clauses != NIL);
- Assert(list_length(clauses) >= 1);
-
- /* there's no MCV list built yet */
- if (! mvstats->mcv_built)
- return 0.0;
-
- mcvlist = load_mv_mcvlist(mvstats->mvoid);
-
- Assert(mcvlist != NULL);
- Assert(mcvlist->nitems > 0);
-
- /* number of matching MCV items */
- nmatches = mcvlist->nitems;
- nconditions = mcvlist->nitems;
-
- /*
- * Bitmap of MCV item matches (mismatch, partial, full).
- *
- * For AND clauses all items match (and we'll eliminate them).
- * For OR clauses no items match (and we'll add them).
- *
- * We only need to do the memset for AND clauses (for OR clauses
- * it's already set correctly by the palloc0).
- */
- matches = palloc0(sizeof(char) * nmatches);
-
- if (! is_or) /* AND-clause */
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
-
- /* Conditions are treated as an AND clause, so match by default. */
- condition_matches = palloc0(sizeof(char) * nconditions);
- memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
-
- /*
- * build the match bitmap for the conditions (conditions are always
- * connected by AND)
- */
- if (conditions != NIL)
- nconditions = update_match_bitmap_mcvlist(root, conditions,
- mvstats->stakeys, mcvlist,
- nconditions, condition_matches,
- lowsel, fullmatch, false);
-
- /*
- * build the match bitmap for the estimated clauses
- *
- * TODO This evaluates the clauses for all MCV items, even those
- * ruled out by the conditions. The final result should be the
- * same, but it might be faster.
- */
- nmatches = update_match_bitmap_mcvlist(root, clauses,
- mvstats->stakeys, mcvlist,
- ((is_or) ? 0 : nmatches), matches,
- lowsel, fullmatch, is_or);
-
- /* sum frequencies for all the matching MCV items */
- for (i = 0; i < mcvlist->nitems; i++)
- {
- /*
- * Find out what part of the data is covered by the MCV list,
- * so that we can 'scale' the selectivity properly (e.g. when
- * only 50% of the sample items got into the MCV, and the rest
- * is either in a histogram, or not covered by stats).
- *
- * TODO This might be handled by keeping a global "frequency"
- * for the whole list, which might save us a bit of time
- * spent on accessing the not-matching part of the MCV list.
- * Although it's likely in a cache, so it's very fast.
- */
- u += mcvlist->items[i]->frequency;
-
- /* skip MCV items not matching the conditions */
- if (condition_matches[i] == MVSTATS_MATCH_NONE)
- continue;
-
- if (matches[i] != MVSTATS_MATCH_NONE)
- s += mcvlist->items[i]->frequency;
-
- t += mcvlist->items[i]->frequency;
- }
-
- pfree(matches);
- pfree(condition_matches);
- pfree(mcvlist);
-
- /* no condition matches */
- if (t == 0.0)
- return (Selectivity)0.0;
-
- return (s / t) * u;
-}
-
-/*
- * Evaluate clauses using the MCV list, and update the match bitmap.
- *
- * The bitmap may be already partially set, so this is really a way to
- * combine results of several clause lists - either when computing
- * conditional probability P(A|B) or a combination of AND/OR clauses.
- *
- * TODO This works with 'bitmap' where each bit is represented as a char,
- * which is slightly wasteful. Instead, we could use a regular
- * bitmap, reducing the size to ~1/8. Another thing is merging the
- * bitmaps using & and |, which might be faster than min/max.
- */
static int
update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -3952,213 +1531,58 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
/* match/mismatch bitmap for each MCV item */
int tmp_nmatches = 0;
- char * tmp_matches = NULL;
-
- Assert(tmp_clauses != NIL);
- Assert(list_length(tmp_clauses) >= 2);
-
- /* number of matching MCV items */
- tmp_nmatches = (or_clause(clause)) ? 0 : mcvlist->nitems;
-
- /* by default none of the MCV items matches the clauses */
- tmp_matches = palloc0(sizeof(char) * mcvlist->nitems);
-
- /* AND clauses assume everything matches, initially */
- if (! or_clause(clause))
- memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
-
- /* build the match bitmap for the OR-clauses */
- tmp_nmatches = update_match_bitmap_mcvlist(root, tmp_clauses,
- stakeys, mcvlist,
- tmp_nmatches, tmp_matches,
- lowsel, fullmatch, or_clause(clause));
-
- /* merge the bitmap into the existing one */
- for (i = 0; i < mcvlist->nitems; i++)
- {
- /*
- * To AND-merge the bitmaps, a MIN() semantics is used.
- * For OR-merge, use MAX().
- *
- * FIXME this does not decrease the number of matches
- */
- UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
- }
-
- pfree(tmp_matches);
-
- }
- else
- elog(ERROR, "unknown clause type: %d", clause->type);
- }
-
- /*
- * If all the columns were matched by equality, it's a full match.
- * In this case there can be just a single MCV item matching the
- * clause (if two items matched, they would have to be equal).
- */
- *fullmatch = (bms_num_members(eqmatches) == mcvlist->ndimensions);
-
- /* free the allocated pieces */
- if (eqmatches)
- pfree(eqmatches);
-
- return nmatches;
-}
-
-/*
- * Estimate selectivity of clauses using a histogram.
- *
- * If there's no histogram for the stats, the function returns 0.0.
- *
- * The general idea of this method is similar to how MCV lists are
- * processed, except that this introduces the concept of a partial
- * match (MCV only works with full match / mismatch).
- *
- * The algorithm works like this:
- *
- * 1) mark all buckets as 'full match'
- * 2) walk through all the clauses
- * 3) for a particular clause, walk through all the buckets
- * 4) skip buckets that are already 'no match'
- * 5) check clause for buckets that still match (at least partially)
- * 6) sum frequencies for buckets to get selectivity
- *
- * Unlike MCV lists, histograms have a concept of a partial match. In
- * that case we use 1/2 the bucket, to minimize the average error. The
- * MV histograms are usually less detailed than the per-column ones,
- * meaning the sum is often quite high (thanks to combining a lot of
- * "partially hit" buckets).
- *
- * Maybe we could use per-bucket information with number of distinct
- * values it contains (for each dimension), and then use that to correct
- * the estimate (so with 10 distinct values, we'd use 1/10 of the bucket
- * frequency). We might also scale the value depending on the actual
- * ndistinct estimate (not just the values observed in the sample).
- *
- * Another option would be to multiply the selectivities, i.e. if we get
- * 'partial match' for a bucket for multiple conditions, we might use
- * 0.5^k (where k is the number of conditions), instead of 0.5. This
- * probably does not minimize the average error, though.
- *
- * TODO This might use a similar shortcut to MCV lists - count buckets
- * marked as partial/full match, and terminate once this drop to 0.
- * Not sure if it's really worth it - for MCV lists a situation like
- * this is not uncommon, but for histograms it's not that clear.
- */
-static Selectivity
-clauselist_mv_selectivity_histogram(PlannerInfo *root, MVStatisticInfo *mvstats,
- List *clauses, List *conditions, bool is_or)
-{
- int i;
- Selectivity s = 0.0;
- Selectivity t = 0.0;
- Selectivity u = 0.0;
-
- int nmatches = 0;
- int nconditions = 0;
- char *matches = NULL;
- char *condition_matches = NULL;
-
- MVSerializedHistogram mvhist = NULL;
-
- /* there's no histogram */
- if (! mvstats->hist_built)
- return 0.0;
-
- /* load and deserialize the histogram (hist_built was checked above) */
- mvhist = load_mv_histogram(mvstats->mvoid);
-
- Assert (mvhist != NULL);
- Assert (clauses != NIL);
- Assert (list_length(clauses) >= 1);
-
- nmatches = mvhist->nbuckets;
- nconditions = mvhist->nbuckets;
-
- /*
- * Bitmap of bucket matches (mismatch, partial, full).
- *
- * For AND clauses all buckets match (and we'll eliminate them).
- * For OR clauses no buckets match (and we'll add them).
- *
- * We only need to do the memset for AND clauses (for OR clauses
- * it's already set correctly by the palloc0).
- */
- matches = palloc0(sizeof(char) * nmatches);
-
- if (! is_or) /* AND-clause */
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+ char * tmp_matches = NULL;
- /* Conditions are treated as an AND clause, so match by default. */
- condition_matches = palloc0(sizeof(char)*nconditions);
- memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+ Assert(tmp_clauses != NIL);
+ Assert(list_length(tmp_clauses) >= 2);
- /* build the match bitmap for the conditions */
- if (conditions != NIL)
- update_match_bitmap_histogram(root, conditions,
- mvstats->stakeys, mvhist,
- nconditions, condition_matches, is_or);
+ /* number of matching MCV items */
+ tmp_nmatches = (or_clause(clause)) ? 0 : mcvlist->nitems;
- /*
- * build the match bitmap for the estimated clauses
- *
- * TODO This evaluates the clauses for all buckets, even those
- * ruled out by the conditions. The final result should be
- * the same, but it might be faster.
- */
- update_match_bitmap_histogram(root, clauses,
- mvstats->stakeys, mvhist,
- ((is_or) ? 0 : nmatches), matches,
- is_or);
+ /* by default none of the MCV items matches the clauses */
+ tmp_matches = palloc0(sizeof(char) * mcvlist->nitems);
- /* now, walk through the buckets and sum the selectivities */
- for (i = 0; i < mvhist->nbuckets; i++)
- {
- float coeff = 1.0;
+ /* AND clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
- /*
- * Find out what part of the data is covered by the histogram,
- * so that we can 'scale' the selectivity properly (e.g. when
- * only 50% of the sample got into the histogram, and the rest
- * is in a MCV list).
- *
- * TODO This might be handled by keeping a global "frequency"
- * for the whole histogram, which might save us some time
- * spent accessing the not-matching part of the histogram.
- * Although it's likely in a cache, so it's very fast.
- */
- u += mvhist->buckets[i]->ntuples;
+ /* build the match bitmap for the OR-clauses */
+ tmp_nmatches = update_match_bitmap_mcvlist(root, tmp_clauses,
+ stakeys, mcvlist,
+ tmp_nmatches, tmp_matches,
+ lowsel, fullmatch, or_clause(clause));
- /* skip buckets not matching the conditions */
- if (condition_matches[i] == MVSTATS_MATCH_NONE)
- continue;
- else if (condition_matches[i] == MVSTATS_MATCH_PARTIAL)
- coeff = 0.5;
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
+ }
- t += coeff * mvhist->buckets[i]->ntuples;
+ pfree(tmp_matches);
- if (matches[i] == MVSTATS_MATCH_FULL)
- s += coeff * mvhist->buckets[i]->ntuples;
- else if (matches[i] == MVSTATS_MATCH_PARTIAL)
- /*
- * TODO If both conditions and clauses match partially, this
+ * will use 0.25 match - not sure if that's the right
+ * solution, but it seems about right.
- */
- s += coeff * 0.5 * mvhist->buckets[i]->ntuples;
+ }
+ else
+ elog(ERROR, "unknown clause type: %d", clause->type);
}
- /* release the allocated bitmap and deserialized histogram */
- pfree(matches);
- pfree(condition_matches);
- pfree(mvhist);
+ /*
+ * If all the columns were matched by equality, it's a full match.
+ * In this case there can be just a single MCV item matching the
+ * clause (if two items matched, they would have to be equal).
+ */
+ *fullmatch = (bms_num_members(eqmatches) == mcvlist->ndimensions);
- /* no condition matches */
- if (t == 0.0)
- return (Selectivity)0.0;
+ /* free the allocated pieces */
+ if (eqmatches)
+ pfree(eqmatches);
- return (s / t) * u;
+ return nmatches;
}
/*
@@ -4715,362 +2139,463 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
return nmatches;
}
-/*
- * Walk through clauses and keep only those covered by at least
- * one of the statistics.
- */
-static List *
-filter_clauses(PlannerInfo *root, Oid varRelid, SpecialJoinInfo *sjinfo,
- int type, List *stats, List *clauses, Bitmapset **attnums)
+static Node *
+stripRestrictStatData(List *clauses, BoolExprType boolop, Bitmapset **attrs)
{
- ListCell *c;
- ListCell *s;
-
- /* results (list of compatible clauses, attnums) */
- List *rclauses = NIL;
+ Expr *newexpr;
+ ListCell *lc;
- foreach (c, clauses)
+ if (attrs) *attrs = NULL;
+
+ if (list_length(clauses) == 0)
+ newexpr = NULL;
+ else if (list_length(clauses) == 1)
{
- Node *clause = (Node*)lfirst(c);
- Bitmapset *clause_attnums = NULL;
- Index relid;
+ RestrictStatData *rsd = (RestrictStatData *) linitial(clauses);
+ Assert(IsA(rsd, RestrictStatData));
- /*
- * The clause has to be mv-compatible (suitable operators etc.).
- */
- if (! clause_is_mv_compatible(root, clause, varRelid,
- &relid, &clause_attnums, sjinfo, type))
- elog(ERROR, "should not get non-mv-compatible cluase");
+ newexpr = (Expr*)(rsd->clause);
+ if (attrs) *attrs = rsd->mvattrs;
+ }
+ else
+ {
+ BoolExpr *newboolexpr;
+ newboolexpr = makeNode(BoolExpr);
+ newboolexpr->boolop = boolop;
- /* is there a statistics covering this clause? */
- foreach (s, stats)
+ foreach (lc, clauses)
{
- int k, matches = 0;
- MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
-
- for (k = 0; k < stat->stakeys->dim1; k++)
- {
- if (bms_is_member(stat->stakeys->values[k],
- clause_attnums))
- matches += 1;
- }
-
- /*
- * The clause is compatible if all attributes it references
- * are covered by the statistics.
- */
- if (bms_num_members(clause_attnums) == matches)
- {
- *attnums = bms_union(*attnums, clause_attnums);
- rclauses = lappend(rclauses, clause);
- break;
- }
+ RestrictStatData *rsd = (RestrictStatData *) lfirst(lc);
+ Assert(IsA(rsd, RestrictStatData));
+ newboolexpr->args =
+ lappend(newboolexpr->args, rsd->clause);
+ if (attrs)
+ *attrs = bms_add_members(*attrs, rsd->mvattrs);
}
-
- bms_free(clause_attnums);
+ newexpr = (Expr*) newboolexpr;
}
- /* we can't have more compatible conditions than source conditions */
- Assert(list_length(clauses) >= list_length(rclauses));
-
- return rclauses;
+ return (Node*)newexpr;
}
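+
+/*
+ * Usage sketch: given a list of two RestrictStatData nodes wrapping
+ * (a = 1) and (b > 2) and boolop = AND_EXPR, this returns the
+ * expression (a = 1 AND b > 2) and sets *attrs to the union of the
+ * nodes' mvattrs; a single-element list returns the bare clause,
+ * and an empty list yields NULL.
+ */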
-
-/*
- * Walk through statistics and only keep those covering at least
- * one new attribute (excluding conditions) and at least two attributes
- * in both clauses and conditions.
- *
- * This check might be made more strict by checking against individual
- * clauses, because by using the bitmapsets of all attnums we may
- * actually use attnums from clauses that are not covered by the
- * statistics. For example, we may have a condition
- *
- * (a=1 AND b=2)
- *
- * and a new clause
- *
- * (c=1 AND d=1)
- *
- * With only bitmapsets, statistics on [b,c] will pass through this
- * (assuming there are some statistics covering both clauses).
- *
- * TODO Do the more strict check.
- */
-static List *
-filter_stats(List *stats, Bitmapset *new_attnums, Bitmapset *all_attnums)
+RestrictStatData *
+transformRestrictInfoForEstimate(PlannerInfo *root, List *clauses,
+ int relid, SpecialJoinInfo *sjinfo)
{
- ListCell *s;
- List *stats_filtered = NIL;
+ static int level = 0;
+ int i = -1;
+ char head[100];
+ RestrictStatData *rdata = makeNode(RestrictStatData);
+ Node *clause;
- foreach (s, stats)
+ memset(head, '.', 100);
+ head[level] = 0;
+
+ if (list_length(clauses) == 1 &&
+ !IsA((Node*)linitial(clauses), RestrictInfo))
{
- int k;
- int matches_new = 0,
- matches_all = 0;
-
- MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
-
- /* see how many attributes the statistics covers */
- for (k = 0; k < stat->stakeys->dim1; k++)
- {
- /* attributes from new clauses */
- if (bms_is_member(stat->stakeys->values[k], new_attnums))
- matches_new += 1;
-
- /* attributes from conditions */
- if (bms_is_member(stat->stakeys->values[k], all_attnums))
- matches_all += 1;
- }
-
- /* check we have enough attributes for this statistics */
- if ((matches_new >= 1) && (matches_all >= 2))
- stats_filtered = lappend(stats_filtered, stat);
+ Assert(relid > 0);
+ clause = (Node*)linitial(clauses);
}
+ else
+ {
+ /* This is the top-level clause list. Convert it to an AND expression. */
+ ListCell *lc;
+ Index clauserelid = 0;
+ Relids relids = pull_varnos((Node*)clauses);
- /* we can't have more useful stats than we had originally */
- Assert(list_length(stats) >= list_length(stats_filtered));
-
- return stats_filtered;
-}
+ if (bms_num_members(relids) != 1)
+ return NULL;
-static MVStatisticInfo *
-make_stats_array(List *stats, int *nmvstats)
-{
- int i;
- ListCell *l;
+ clauserelid = bms_singleton_member(relids);
+ if (relid != 0 && relid != clauserelid)
+ return NULL;
- MVStatisticInfo *mvstats = NULL;
- *nmvstats = list_length(stats);
+ relid = clauserelid;
- mvstats
- = (MVStatisticInfo*)palloc0((*nmvstats) * sizeof(MVStatisticInfo));
+ if (list_length(clauses) == 1)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) linitial(clauses);
+ Assert(IsA(rinfo, RestrictInfo));
+
+ clause = (Node*) rinfo->clause;
+ }
+ else
+ {
+ BoolExpr *andexpr = makeNode(BoolExpr);
+ andexpr->boolop = AND_EXPR;
+ foreach (lc, clauses)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+ Assert(IsA(rinfo, RestrictInfo));
+ if (rinfo->pseudoconstant ||
+ treat_as_join_clause((Node*)rinfo->clause,
+ rinfo, 0, sjinfo))
+ rdata->unusedrinfos = lappend(rdata->unusedrinfos,
+ rinfo);
+ else
+ andexpr->args = lappend(andexpr->args, rinfo->clause);
+ }
+ clause = (Node*)andexpr;
+ }
- i = 0;
- foreach (l, stats)
- {
- MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(l);
- memcpy(&mvstats[i++], stat, sizeof(MVStatisticInfo));
}
- return mvstats;
-}
-
-static Bitmapset **
-make_stats_attnums(MVStatisticInfo *mvstats, int nmvstats)
-{
- int i, j;
- Bitmapset **stats_attnums = NULL;
-
- Assert(nmvstats > 0);
+ Assert(!IsA(clause, RestrictInfo));
- /* build bitmaps of attnums for the stats (easier to compare) */
- stats_attnums = (Bitmapset **)palloc0(nmvstats * sizeof(Bitmapset*));
+ rdata->clause = clause;
+ rdata->boolop = AND_EXPR;
- for (i = 0; i < nmvstats; i++)
- for (j = 0; j < mvstats[i].stakeys->dim1; j++)
- stats_attnums[i]
- = bms_add_member(stats_attnums[i],
- mvstats[i].stakeys->values[j]);
+ if (and_clause(clause) || or_clause(clause))
+ {
+ BoolExpr *boolexpr = (BoolExpr *)clause;
+ ListCell *lc;
+ List *mvclauses = NIL;
+ List *nonmvclauses = NIL;
+ List *partialclauses = NIL;
+ Bitmapset *resultattrs = NULL;
+ List *resultstats = NIL;
- return stats_attnums;
-}
+ rdata->boolop = boolexpr->boolop;
+ ereport(DEBUG1,
+ (errmsg ("%s%s[%d][%d](%d)",
+ head,
+ and_clause(clause)?"AND":
+ (or_clause(clause)?"OR":"NOT"),
+ level, i, list_length(boolexpr->args)),
+ errhidestmt(level)));
+ /* Recursively process the subexpressions */
+ level++;
+ foreach (lc, (boolexpr->args))
+ {
+ Node *nd = (Node*) lfirst(lc);
+ RestrictStatData *tmpsd;
-/*
- * Now let's remove redundant statistics, covering the same columns
- * as some other stats, when restricted to the attributes from
- * remaining clauses.
- *
- * If statistics S1 covers S2 (covers S2 attributes and possibly
- * some more), we can probably remove S2. What actually matters are
- * attributes from covered clauses (not all the attributes). This
- * might however prefer larger, and thus less accurate, statistics.
- *
- * When a redundancy is detected, we simply keep the smaller
- * statistics (less number of columns), on the assumption that it's
- * more accurate and faster to process. That might be incorrect for
- * two reasons - first, the accuracy really depends on number of
- * buckets/MCV items, not the number of columns. Second, we might
- * prefer MCV lists over histograms or something like that.
- */
-static List*
-filter_redundant_stats(List *stats, List *clauses, List *conditions)
-{
- int i, j, nmvstats;
+ tmpsd = transformRestrictInfoForEstimate(root,
+ list_make1(nd),
+ relid, sjinfo);
+ /*
+ * mvclauses holds the child RestrictStatData nodes that can
+ * potentially be pulled up into this node's mvclause, which is
+ * to be estimated using multivariate statistics.
+ *
+ * partialclauses holds the child RestrictStatData nodes that
+ * cannot be pulled up.
+ *
+ * nonmvclauses holds the child RestrictStatData nodes to be
+ * pulled up into the clause estimated in the normal way.
+ */
+ if (tmpsd->mvattrs)
+ mvclauses = lappend(mvclauses, tmpsd);
+ else if (tmpsd->mvclause)
+ partialclauses = lappend(partialclauses, tmpsd);
+ else
+ nonmvclauses = lappend(nonmvclauses, tmpsd);
+ }
+ level--;
- MVStatisticInfo *mvstats;
- bool *redundant;
- Bitmapset **stats_attnums;
- Bitmapset *varattnos;
- Index relid;
- Assert(list_length(stats) > 0);
- Assert(list_length(clauses) > 0);
+ if (list_length(mvclauses) == 1)
+ {
+ /*
+ * If this boolean clause has only one mv clause, pull it up for
+ * now.
+ */
+ RestrictStatData *rsd = (RestrictStatData *) linitial(mvclauses);
+ resultattrs = rsd->mvattrs;
+ resultstats = rsd->mvstats;
+ }
+ if (list_length(mvclauses) > 1)
+ {
+ /*
+ * Pick the smallest mv stats covering as large a part as possible
+ * of the attributes appearing in the subclauses, then remove the
+ * clauses that are not covered by the selected mv stats.
+ */
+ int nmvstats = 0;
+ ListCell *lc;
+ bm_mvstat *mvstatslist[16];
+ int maxnattrs = 0;
+ int candidatestats;
+ int i;
+
+ /* Check functional dependency first, maybe.. */
+// if (list_length(mvclauses) == 2)
+// {
+// RestrictStatData *rsd1 =
+// (RestrictStatData *) linitial(mvclauses);
+// RestrictStatData *rsd2 =
+// (RestrictStatData *) lsecond(mvclauses);
+// /* To do more...*/
+// }
- /*
- * We'll convert the list of statistics into an array now, because
- * the reduction of redundant statistics is easier to do that way
- * (we can mark previous stats as redundant, etc.).
- */
- mvstats = make_stats_array(stats, &nmvstats);
- stats_attnums = make_stats_attnums(mvstats, nmvstats);
+ /*
+ * Collect all mvstats from all subclauses. The attribute set should
+ * be unique, so use it as the key. There should not be many stats.
+ */
+ foreach (lc, mvclauses)
+ {
+ RestrictStatData *rsd = (RestrictStatData *) lfirst(lc);
+ Bitmapset *mvattrs = rsd->mvattrs;
+ ListCell *lcs;
- /* by default, none of the stats is redundant (so palloc0) */
- redundant = palloc0(nmvstats * sizeof(bool));
+ /* make a covering attribute set of all clauses */
+ resultattrs = bms_add_members(resultattrs, mvattrs);
- /*
- * We only expect a single relid here, and also we should get the
- * same relid from clauses and conditions (but we get it from
- * clauses, because those are certainly non-empty).
- */
- relid = bms_singleton_member(pull_varnos((Node*)clauses));
+ /* pick up new mv stats */
+ foreach (lcs, rsd->mvstats)
+ {
+ bm_mvstat *mvs = (bm_mvstat*) lfirst(lcs);
+ bool found = false;
- /*
- * Get the varattnos from both conditions and clauses.
- *
- * This skips system attributes, although that should be impossible
- * thanks to previous filtering out of incompatible clauses.
- *
- * XXX Is that really true?
- */
- varattnos = bms_union(get_varattnos((Node*)clauses, relid),
- get_varattnos((Node*)conditions, relid));
+ for (i = 0 ; !found && i < nmvstats ; i++)
+ {
+ if (bms_equal(mvstatslist[i]->attrs, mvs->attrs))
+ found = true;
+ }
+ if (!found)
+ {
+ mvstatslist[nmvstats] = mvs;
+ nmvstats++;
+ }
- for (i = 1; i < nmvstats; i++)
- {
- /* intersect with current statistics */
- Bitmapset *curr = bms_intersect(stats_attnums[i], varattnos);
+ /* ignore more than 15(!) stats for a clause */
+ if (nmvstats > 15)
+ break;
+ }
+ }
- /* walk through 'previous' stats and check redundancy */
- for (j = 0; j < i; j++)
- {
- /* intersect with current statistics */
- Bitmapset *prev;
+ /* we try functional dependency first? */
+ //if (clauseboolop == AND_EXPR && ...
+
+ /*
+ * Find the mv stats that covers the largest number of attributes
+ * used in the clauses while having the smallest attribute set.
+ */
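+ /*
+ * For example, given stats on (a,b) and on (a,b,c,d) and clauses
+ * referencing {a,b,c}, the (a,b,c,d) stats covers three of the
+ * clause attributes versus two, so it is chosen despite being
+ * wider; the stats width only breaks ties in coverage.
+ */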
+ maxnattrs = 0;
+ candidatestats = -1;
+ for (i = 0 ; i < nmvstats ; i++)
+ {
+ Bitmapset *matchattr =
+ bms_intersect(resultattrs, mvstatslist[i]->attrs);
+ int nmatchattrs = bms_num_members(matchattr);
- /* skip stats already identified as redundant */
- if (redundant[j])
- continue;
+ if (maxnattrs < nmatchattrs)
+ {
+ candidatestats = i;
+ maxnattrs = nmatchattrs;
+ }
+ else if (maxnattrs > 0 && maxnattrs == nmatchattrs)
+ {
+ if (bms_num_members(mvstatslist[i]->attrs) <
+ bms_num_members(mvstatslist[candidatestats]->attrs))
+ candidatestats = i;
+ }
+ }
- prev = bms_intersect(stats_attnums[j], varattnos);
+ Assert(candidatestats >= 0);
- switch (bms_subset_compare(curr, prev))
+ if (maxnattrs == 1)
{
- case BMS_EQUAL:
+ /*
+ * No two of the mvclauses share an mv statistics. Make this node
+ * non-mv.
+ */
+ mvclauses = NIL;
+ nonmvclauses = NIL;
+ resultattrs = NULL;
+ resultstats = NIL;
+ }
+ else
+ {
+ if (!bms_is_subset(resultattrs,
+ mvstatslist[candidatestats]->attrs))
+ {
/*
- * Use the smaller one (hopefully more accurate).
- * If both have the same size, use the first one.
+ * move out the clauses that are not covered by the
+ * candidate stats
*/
- if (mvstats[i].stakeys->dim1 >= mvstats[j].stakeys->dim1)
- redundant[i] = TRUE;
- else
- redundant[j] = TRUE;
-
- break;
-
- case BMS_SUBSET1: /* curr is subset of prev */
- redundant[i] = TRUE;
- break;
+ List *old_mvclauses = mvclauses;
+ ListCell *lc;
+ Bitmapset *statsattrs =
+ mvstatslist[candidatestats]->attrs;
+ mvclauses = NIL;
- case BMS_SUBSET2: /* prev is subset of curr */
- redundant[j] = TRUE;
- break;
+ foreach(lc, old_mvclauses)
+ {
+ RestrictStatData *rsd = (RestrictStatData *) lfirst(lc);
+ Assert(IsA(rsd, RestrictStatData));
- case BMS_DIFFERENT:
- /* do nothing - keep both stats */
- break;
+ if (bms_is_subset(rsd->mvattrs, statsattrs))
+ mvclauses = lappend(mvclauses, rsd);
+ else
+ nonmvclauses = lappend(nonmvclauses, rsd);
+ }
+ resultattrs = bms_intersect(resultattrs,
+ mvstatslist[candidatestats]->attrs);
+ }
+ resultstats = list_make1(mvstatslist[candidatestats]);
}
-
- bms_free(prev);
}
- bms_free(curr);
- }
-
- /* can't reduce all statistics (at least one has to remain) */
- Assert(nmvstats > 0);
+ if (bms_num_members(resultattrs) < 2)
+ {
+ /*
+ * Make this node non-mv if the mvclauses cover only one mv attribute.
+ */
+ nonmvclauses = list_concat(nonmvclauses, mvclauses);
+ mvclauses = NULL;
+ resultattrs = NULL;
+ resultstats = NIL;
+ }
- /* now, let's remove the reduced statistics from the arrays */
- list_free(stats);
- stats = NIL;
+ /*
+ * All mvclauses are covered by the candidate stats here.
+ */
+ rdata->mvclause =
+ stripRestrictStatData(mvclauses, rdata->boolop, NULL);
+ rdata->children = partialclauses;
+ rdata->mvattrs = resultattrs;
+ rdata->nonmvclause =
+ stripRestrictStatData(nonmvclauses, rdata->boolop, NULL);
+ rdata->mvstats = resultstats;
- for (i = 0; i < nmvstats; i++)
+ }
+ else if (not_clause(clause))
{
- MVStatisticInfo *info;
-
- pfree(stats_attnums[i]);
+ Node *nd = (Node *) linitial(((BoolExpr*)clause)->args);
+ RestrictStatData *tmpsd;
- if (redundant[i])
- continue;
-
- info = makeNode(MVStatisticInfo);
- memcpy(info, &mvstats[i], sizeof(MVStatisticInfo));
-
- stats = lappend(stats, info);
+ tmpsd = transformRestrictInfoForEstimate(root, list_make1(nd),
+ relid, sjinfo);
+ rdata->children = list_make1(tmpsd);
}
-
- pfree(mvstats);
- pfree(stats_attnums);
- pfree(redundant);
-
- return stats;
-}
-
-static Node**
-make_clauses_array(List *clauses, int *nclauses)
-{
- int i;
- ListCell *l;
-
- Node** clauses_array;
-
- *nclauses = list_length(clauses);
- clauses_array = (Node **)palloc0((*nclauses) * sizeof(Node *));
-
- i = 0;
- foreach (l, clauses)
- clauses_array[i++] = (Node *)lfirst(l);
-
- *nclauses = i;
-
- return clauses_array;
-}
-
-static Bitmapset **
-make_clauses_attnums(PlannerInfo *root, Oid varRelid, SpecialJoinInfo *sjinfo,
- int type, Node **clauses, int nclauses)
-{
- int i;
- Index relid;
- Bitmapset **clauses_attnums
- = (Bitmapset **)palloc0(nclauses * sizeof(Bitmapset *));
-
- for (i = 0; i < nclauses; i++)
+ else if (is_opclause(clause) &&
+ list_length(((OpExpr *) clause)->args) == 2)
{
- Bitmapset * attnums = NULL;
+ Node *varnode = get_leftop((Expr*)clause);
+ Node *nonvarnode = get_rightop((Expr*)clause);
- if (! clause_is_mv_compatible(root, clauses[i], varRelid,
- &relid, &attnums, sjinfo, type))
- elog(ERROR, "should not get non-mv-compatible cluase");
+ /* Put the Var, if any, on varnode */
+ if (!IsA(varnode, Var))
+ {
+ Node *tmp = nonvarnode;
+ nonvarnode = varnode;
+ varnode = tmp;
+ }
+
+ if (IsA(varnode, Var) && is_pseudo_constant_clause(nonvarnode))
+ {
+ Var *var = (Var *)varnode;
+ List *statslist = root->simple_rel_array[relid]->mvstatlist;
+ Oid opno = ((OpExpr*)clause)->opno;
+ int varmvbitmap = get_oprmvstat(opno);
+
+ if (varmvbitmap &&
+ !IS_SPECIAL_VARNO(var->varno) &&
+ AttrNumberIsForUserDefinedAttr(var->varattno))
+ {
+ List *mvstats = NIL;
+ ListCell *lc;
+ Bitmapset *varattrs = bms_make_singleton(var->varattno);
- clauses_attnums[i] = attnums;
+ /*
+ * Add the mv statistics if it is applicable to this expression
+ */
+ foreach (lc, statslist)
+ {
+ int k;
+ MVStatisticInfo *stats = (MVStatisticInfo *) lfirst(lc);
+ Bitmapset *statsattrs = NULL;
+ int statsmvbitmap =
+ (stats->mcv_built ? MVSTATISTIC_MCV : 0) |
+ (stats->hist_built ? MVSTATISTIC_HIST : 0) |
+ (stats->deps_built ? MVSTATISTIC_FDEP : 0);
+
+ for (k = 0 ; k < stats->stakeys->dim1 ; k++)
+ statsattrs = bms_add_member(statsattrs,
+ stats->stakeys->values[k]);
+ /* XXX: Does this work as expected? */
+ if (bms_is_subset(varattrs, statsattrs) &&
+ (statsmvbitmap & varmvbitmap))
+ {
+ bm_mvstat *mvstatsent = palloc0(sizeof(bm_mvstat));
+ mvstatsent->attrs = statsattrs;
+ mvstatsent->stats = stats;
+ mvstatsent->mvkind = statsmvbitmap;
+ mvstats = lappend(mvstats, mvstatsent);
+ }
+ }
+ if (mvstats)
+ {
+ /* MV stats are potentially applicable to this expression */
+ ereport(DEBUG1,
+ (errmsg ("%sMATCH[%d][%d](varno = %d, attno = %d)",
+ head, level, i,
+ var->varno, var->varattno),
+ errhidestmt(level)));
+
+ rdata->mvstats = mvstats;
+ rdata->mvattrs = varattrs;
+ }
+ }
+ }
+ else
+ {
+ ereport(DEBUG1,
+ (errmsg ("%sno match BinOp[%d][%d]: r=%d, l=%d",
+ head, level, i,
+ varnode->type, nonvarnode->type),
+ errhidestmt(level)));
+ }
}
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *expr = (NullTest*)clause;
+ Var *var = (Var *)(expr->arg);
- return clauses_attnums;
-}
-
-static bool*
-make_cover_map(Bitmapset **stats_attnums, int nmvstats,
- Bitmapset **clauses_attnums, int nclauses)
-{
- int i, j;
- bool *cover_map = (bool*)palloc0(nclauses * nmvstats);
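+ /*
+ * A NullTest has no operator to look up in pg_operator, so it is
+ * treated as compatible with MCV lists and histograms
+ * unconditionally (hence the hard-coded mvkind below).
+ */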
+ if (IsA(var, Var) &&
+ !IS_SPECIAL_VARNO(var->varno) &&
+ AttrNumberIsForUserDefinedAttr(var->varattno))
+ {
+ Bitmapset *varattrs = bms_make_singleton(var->varattno);
+ List *mvstats = NIL;
+ ListCell *lc;
- for (i = 0; i < nmvstats; i++)
- for (j = 0; j < nclauses; j++)
- cover_map[i * nclauses + j]
- = bms_is_subset(clauses_attnums[j], stats_attnums[i]);
+ foreach(lc, root->simple_rel_array[relid]->mvstatlist)
+ {
+ MVStatisticInfo *stats = (MVStatisticInfo *) lfirst(lc);
+ Bitmapset *statsattrs = NULL;
+ int k;
+
+ for (k = 0 ; k < stats->stakeys->dim1 ; k++)
+ statsattrs = bms_add_member(statsattrs,
+ stats->stakeys->values[k]);
+ if (bms_is_subset(varattrs, statsattrs))
+ {
+ bm_mvstat *mvstatsent = palloc0(sizeof(bm_mvstat));
+ mvstatsent->stats = stats;
+ mvstatsent->attrs = statsattrs;
+ mvstatsent->mvkind = (MVSTATISTIC_MCV |MVSTATISTIC_HIST);
+ mvstats = lappend(mvstats, mvstatsent);
+ }
+ }
+ if (mvstats)
+ {
+ rdata->mvstats = mvstats;
+ rdata->mvattrs = varattrs;
+ }
+ }
+ }
+ else
+ {
+ ereport(DEBUG1,
+ (errmsg ("%sno match node(%d)[%d][%d]",
+ head, clause->type, level, i),
+ errhidestmt(level)));
+ }
- return cover_map;
+ return rdata;
}
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 7b32247..61e578f 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -45,6 +45,7 @@
#include "utils/rel.h"
#include "utils/syscache.h"
#include "utils/typcache.h"
+#include "utils/mvstats.h"
/* Hook for plugins to get control in get_attavgwidth() */
get_attavgwidth_hook_type get_attavgwidth_hook = NULL;
@@ -1345,6 +1346,45 @@ get_oprjoin(Oid opno)
return (RegProcedure) InvalidOid;
}
+/*
+ * get_oprmvstat
+ *
+ * Returns the mv stats compatibility of an operator for computing
+ * selectivity. The return value is a bitwise OR of MVSTATISTIC_*
+ * symbols.
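+ *
+ * For example, the int4 "=" operator is marked "mhf" and thus yields
+ * MVSTATISTIC_MCV | MVSTATISTIC_HIST | MVSTATISTIC_FDEP, while "<"
+ * is marked "mh-" and yields no MVSTATISTIC_FDEP bit.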
+ */
+int
+get_oprmvstat(Oid opno)
+{
+ HeapTuple tp;
+
+ tp = SearchSysCache1(OPEROID, ObjectIdGetDatum(opno));
+ if (HeapTupleIsValid(tp))
+ {
+ Datum tmp;
+ bool isnull;
+ char *str;
+ int result = 0;
+
+ tmp = SysCacheGetAttr(OPEROID, tp,
+ Anum_pg_operator_oprmvstat, &isnull);
+ if (!isnull)
+ {
+ str = TextDatumGetCString(tmp);
+ if (strlen(str) == 3)
+ {
+ if (str[0] != '-') result |= MVSTATISTIC_MCV;
+ if (str[1] != '-') result |= MVSTATISTIC_HIST;
+ if (str[2] != '-') result |= MVSTATISTIC_FDEP;
+ }
+ }
+ ReleaseSysCache(tp);
+ return result;
+ }
+ else
+ return 0;
+}
+
+
/* ---------- FUNCTION CACHE ---------- */
/*
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index 26c9d4e..c75ac72 100644
--- a/src/include/catalog/pg_operator.h
+++ b/src/include/catalog/pg_operator.h
@@ -49,6 +49,9 @@ CATALOG(pg_operator,2617)
regproc oprcode; /* OID of underlying function */
regproc oprrest; /* OID of restriction estimator, or 0 */
regproc oprjoin; /* OID of join estimator, or 0 */
+#ifdef CATALOG_VARLEN /* variable-length fields start here */
+ text oprmvstat; /* MV stat compatibility in '[m-][h-][f-]' */
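+ /* e.g. "mhf" for equality operators,
+ * "mh-" for inequalities, "---" when
+ * mv stats do not apply */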
+#endif
} FormData_pg_operator;
/* ----------------
@@ -63,7 +66,7 @@ typedef FormData_pg_operator *Form_pg_operator;
* ----------------
*/
-#define Natts_pg_operator 14
+#define Natts_pg_operator 15
#define Anum_pg_operator_oprname 1
#define Anum_pg_operator_oprnamespace 2
#define Anum_pg_operator_oprowner 3
@@ -78,6 +81,7 @@ typedef FormData_pg_operator *Form_pg_operator;
#define Anum_pg_operator_oprcode 12
#define Anum_pg_operator_oprrest 13
#define Anum_pg_operator_oprjoin 14
+#define Anum_pg_operator_oprmvstat 15
/* ----------------
* initial contents of pg_operator
@@ -91,1735 +95,1735 @@ typedef FormData_pg_operator *Form_pg_operator;
* for the underlying function.
*/
-DATA(insert OID = 15 ( "=" PGNSP PGUID b t t 23 20 16 416 36 int48eq eqsel eqjoinsel ));
+DATA(insert OID = 15 ( "=" PGNSP PGUID b t t 23 20 16 416 36 int48eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 36 ( "<>" PGNSP PGUID b f f 23 20 16 417 15 int48ne neqsel neqjoinsel ));
+DATA(insert OID = 36 ( "<>" PGNSP PGUID b f f 23 20 16 417 15 int48ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 37 ( "<" PGNSP PGUID b f f 23 20 16 419 82 int48lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 37 ( "<" PGNSP PGUID b f f 23 20 16 419 82 int48lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 76 ( ">" PGNSP PGUID b f f 23 20 16 418 80 int48gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 76 ( ">" PGNSP PGUID b f f 23 20 16 418 80 int48gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 80 ( "<=" PGNSP PGUID b f f 23 20 16 430 76 int48le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 80 ( "<=" PGNSP PGUID b f f 23 20 16 430 76 int48le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 82 ( ">=" PGNSP PGUID b f f 23 20 16 420 37 int48ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 82 ( ">=" PGNSP PGUID b f f 23 20 16 420 37 int48ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 58 ( "<" PGNSP PGUID b f f 16 16 16 59 1695 boollt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 58 ( "<" PGNSP PGUID b f f 16 16 16 59 1695 boollt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 59 ( ">" PGNSP PGUID b f f 16 16 16 58 1694 boolgt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 59 ( ">" PGNSP PGUID b f f 16 16 16 58 1694 boolgt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 85 ( "<>" PGNSP PGUID b f f 16 16 16 85 91 boolne neqsel neqjoinsel ));
+DATA(insert OID = 85 ( "<>" PGNSP PGUID b f f 16 16 16 85 91 boolne neqsel neqjoinsel "mhf"));
DESCR("not equal");
#define BooleanNotEqualOperator 85
-DATA(insert OID = 91 ( "=" PGNSP PGUID b t t 16 16 16 91 85 booleq eqsel eqjoinsel ));
+DATA(insert OID = 91 ( "=" PGNSP PGUID b t t 16 16 16 91 85 booleq eqsel eqjoinsel "mhf"));
DESCR("equal");
#define BooleanEqualOperator 91
-DATA(insert OID = 1694 ( "<=" PGNSP PGUID b f f 16 16 16 1695 59 boolle scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1694 ( "<=" PGNSP PGUID b f f 16 16 16 1695 59 boolle scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1695 ( ">=" PGNSP PGUID b f f 16 16 16 1694 58 boolge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1695 ( ">=" PGNSP PGUID b f f 16 16 16 1694 58 boolge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 92 ( "=" PGNSP PGUID b t t 18 18 16 92 630 chareq eqsel eqjoinsel ));
+DATA(insert OID = 92 ( "=" PGNSP PGUID b t t 18 18 16 92 630 chareq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 93 ( "=" PGNSP PGUID b t t 19 19 16 93 643 nameeq eqsel eqjoinsel ));
+DATA(insert OID = 93 ( "=" PGNSP PGUID b t t 19 19 16 93 643 nameeq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 94 ( "=" PGNSP PGUID b t t 21 21 16 94 519 int2eq eqsel eqjoinsel ));
+DATA(insert OID = 94 ( "=" PGNSP PGUID b t t 21 21 16 94 519 int2eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 95 ( "<" PGNSP PGUID b f f 21 21 16 520 524 int2lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 95 ( "<" PGNSP PGUID b f f 21 21 16 520 524 int2lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 96 ( "=" PGNSP PGUID b t t 23 23 16 96 518 int4eq eqsel eqjoinsel ));
+DATA(insert OID = 96 ( "=" PGNSP PGUID b t t 23 23 16 96 518 int4eq eqsel eqjoinsel "mhf"));
DESCR("equal");
#define Int4EqualOperator 96
-DATA(insert OID = 97 ( "<" PGNSP PGUID b f f 23 23 16 521 525 int4lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 97 ( "<" PGNSP PGUID b f f 23 23 16 521 525 int4lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
#define Int4LessOperator 97
-DATA(insert OID = 98 ( "=" PGNSP PGUID b t t 25 25 16 98 531 texteq eqsel eqjoinsel ));
+DATA(insert OID = 98 ( "=" PGNSP PGUID b t t 25 25 16 98 531 texteq eqsel eqjoinsel "mhf"));
DESCR("equal");
#define TextEqualOperator 98
-DATA(insert OID = 349 ( "||" PGNSP PGUID b f f 2277 2283 2277 0 0 array_append - - ));
+DATA(insert OID = 349 ( "||" PGNSP PGUID b f f 2277 2283 2277 0 0 array_append - - "---"));
DESCR("append element onto end of array");
-DATA(insert OID = 374 ( "||" PGNSP PGUID b f f 2283 2277 2277 0 0 array_prepend - - ));
+DATA(insert OID = 374 ( "||" PGNSP PGUID b f f 2283 2277 2277 0 0 array_prepend - - "---"));
DESCR("prepend element onto front of array");
-DATA(insert OID = 375 ( "||" PGNSP PGUID b f f 2277 2277 2277 0 0 array_cat - - ));
+DATA(insert OID = 375 ( "||" PGNSP PGUID b f f 2277 2277 2277 0 0 array_cat - - "---"));
DESCR("concatenate");
-DATA(insert OID = 352 ( "=" PGNSP PGUID b f t 28 28 16 352 0 xideq eqsel eqjoinsel ));
+DATA(insert OID = 352 ( "=" PGNSP PGUID b f t 28 28 16 352 0 xideq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 353 ( "=" PGNSP PGUID b f f 28 23 16 0 0 xideqint4 eqsel eqjoinsel ));
+DATA(insert OID = 353 ( "=" PGNSP PGUID b f f 28 23 16 0 0 xideqint4 eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 388 ( "!" PGNSP PGUID r f f 20 0 1700 0 0 numeric_fac - - ));
+DATA(insert OID = 388 ( "!" PGNSP PGUID r f f 20 0 1700 0 0 numeric_fac - - "---"));
DESCR("factorial");
-DATA(insert OID = 389 ( "!!" PGNSP PGUID l f f 0 20 1700 0 0 numeric_fac - - ));
+DATA(insert OID = 389 ( "!!" PGNSP PGUID l f f 0 20 1700 0 0 numeric_fac - - "---"));
DESCR("deprecated, use ! instead");
-DATA(insert OID = 385 ( "=" PGNSP PGUID b f t 29 29 16 385 0 cideq eqsel eqjoinsel ));
+DATA(insert OID = 385 ( "=" PGNSP PGUID b f t 29 29 16 385 0 cideq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 386 ( "=" PGNSP PGUID b f t 22 22 16 386 0 int2vectoreq eqsel eqjoinsel ));
+DATA(insert OID = 386 ( "=" PGNSP PGUID b f t 22 22 16 386 0 int2vectoreq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 387 ( "=" PGNSP PGUID b t f 27 27 16 387 402 tideq eqsel eqjoinsel ));
+DATA(insert OID = 387 ( "=" PGNSP PGUID b t f 27 27 16 387 402 tideq eqsel eqjoinsel "mhf"));
DESCR("equal");
#define TIDEqualOperator 387
-DATA(insert OID = 402 ( "<>" PGNSP PGUID b f f 27 27 16 402 387 tidne neqsel neqjoinsel ));
+DATA(insert OID = 402 ( "<>" PGNSP PGUID b f f 27 27 16 402 387 tidne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 2799 ( "<" PGNSP PGUID b f f 27 27 16 2800 2802 tidlt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2799 ( "<" PGNSP PGUID b f f 27 27 16 2800 2802 tidlt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
#define TIDLessOperator 2799
-DATA(insert OID = 2800 ( ">" PGNSP PGUID b f f 27 27 16 2799 2801 tidgt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2800 ( ">" PGNSP PGUID b f f 27 27 16 2799 2801 tidgt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2801 ( "<=" PGNSP PGUID b f f 27 27 16 2802 2800 tidle scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2801 ( "<=" PGNSP PGUID b f f 27 27 16 2802 2800 tidle scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2802 ( ">=" PGNSP PGUID b f f 27 27 16 2801 2799 tidge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2802 ( ">=" PGNSP PGUID b f f 27 27 16 2801 2799 tidge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 410 ( "=" PGNSP PGUID b t t 20 20 16 410 411 int8eq eqsel eqjoinsel ));
+DATA(insert OID = 410 ( "=" PGNSP PGUID b t t 20 20 16 410 411 int8eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 411 ( "<>" PGNSP PGUID b f f 20 20 16 411 410 int8ne neqsel neqjoinsel ));
+DATA(insert OID = 411 ( "<>" PGNSP PGUID b f f 20 20 16 411 410 int8ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 412 ( "<" PGNSP PGUID b f f 20 20 16 413 415 int8lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 412 ( "<" PGNSP PGUID b f f 20 20 16 413 415 int8lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
#define Int8LessOperator 412
-DATA(insert OID = 413 ( ">" PGNSP PGUID b f f 20 20 16 412 414 int8gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 413 ( ">" PGNSP PGUID b f f 20 20 16 412 414 int8gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 414 ( "<=" PGNSP PGUID b f f 20 20 16 415 413 int8le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 414 ( "<=" PGNSP PGUID b f f 20 20 16 415 413 int8le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 415 ( ">=" PGNSP PGUID b f f 20 20 16 414 412 int8ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 415 ( ">=" PGNSP PGUID b f f 20 20 16 414 412 int8ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 416 ( "=" PGNSP PGUID b t t 20 23 16 15 417 int84eq eqsel eqjoinsel ));
+DATA(insert OID = 416 ( "=" PGNSP PGUID b t t 20 23 16 15 417 int84eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 417 ( "<>" PGNSP PGUID b f f 20 23 16 36 416 int84ne neqsel neqjoinsel ));
+DATA(insert OID = 417 ( "<>" PGNSP PGUID b f f 20 23 16 36 416 int84ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 418 ( "<" PGNSP PGUID b f f 20 23 16 76 430 int84lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 418 ( "<" PGNSP PGUID b f f 20 23 16 76 430 int84lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 419 ( ">" PGNSP PGUID b f f 20 23 16 37 420 int84gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 419 ( ">" PGNSP PGUID b f f 20 23 16 37 420 int84gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 420 ( "<=" PGNSP PGUID b f f 20 23 16 82 419 int84le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 420 ( "<=" PGNSP PGUID b f f 20 23 16 82 419 int84le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 430 ( ">=" PGNSP PGUID b f f 20 23 16 80 418 int84ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 430 ( ">=" PGNSP PGUID b f f 20 23 16 80 418 int84ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 439 ( "%" PGNSP PGUID b f f 20 20 20 0 0 int8mod - - ));
+DATA(insert OID = 439 ( "%" PGNSP PGUID b f f 20 20 20 0 0 int8mod - - "---"));
DESCR("modulus");
-DATA(insert OID = 473 ( "@" PGNSP PGUID l f f 0 20 20 0 0 int8abs - - ));
+DATA(insert OID = 473 ( "@" PGNSP PGUID l f f 0 20 20 0 0 int8abs - - "---"));
DESCR("absolute value");
-DATA(insert OID = 484 ( "-" PGNSP PGUID l f f 0 20 20 0 0 int8um - - ));
+DATA(insert OID = 484 ( "-" PGNSP PGUID l f f 0 20 20 0 0 int8um - - "---"));
DESCR("negate");
-DATA(insert OID = 485 ( "<<" PGNSP PGUID b f f 604 604 16 0 0 poly_left positionsel positionjoinsel ));
+DATA(insert OID = 485 ( "<<" PGNSP PGUID b f f 604 604 16 0 0 poly_left positionsel positionjoinsel "---"));
DESCR("is left of");
-DATA(insert OID = 486 ( "&<" PGNSP PGUID b f f 604 604 16 0 0 poly_overleft positionsel positionjoinsel ));
+DATA(insert OID = 486 ( "&<" PGNSP PGUID b f f 604 604 16 0 0 poly_overleft positionsel positionjoinsel "---"));
DESCR("overlaps or is left of");
-DATA(insert OID = 487 ( "&>" PGNSP PGUID b f f 604 604 16 0 0 poly_overright positionsel positionjoinsel ));
+DATA(insert OID = 487 ( "&>" PGNSP PGUID b f f 604 604 16 0 0 poly_overright positionsel positionjoinsel "---"));
DESCR("overlaps or is right of");
-DATA(insert OID = 488 ( ">>" PGNSP PGUID b f f 604 604 16 0 0 poly_right positionsel positionjoinsel ));
+DATA(insert OID = 488 ( ">>" PGNSP PGUID b f f 604 604 16 0 0 poly_right positionsel positionjoinsel "---"));
DESCR("is right of");
-DATA(insert OID = 489 ( "<@" PGNSP PGUID b f f 604 604 16 490 0 poly_contained contsel contjoinsel ));
+DATA(insert OID = 489 ( "<@" PGNSP PGUID b f f 604 604 16 490 0 poly_contained contsel contjoinsel "---"));
DESCR("is contained by");
-DATA(insert OID = 490 ( "@>" PGNSP PGUID b f f 604 604 16 489 0 poly_contain contsel contjoinsel ));
+DATA(insert OID = 490 ( "@>" PGNSP PGUID b f f 604 604 16 489 0 poly_contain contsel contjoinsel "---"));
DESCR("contains");
-DATA(insert OID = 491 ( "~=" PGNSP PGUID b f f 604 604 16 491 0 poly_same eqsel eqjoinsel ));
+DATA(insert OID = 491 ( "~=" PGNSP PGUID b f f 604 604 16 491 0 poly_same eqsel eqjoinsel "mhf"));
DESCR("same as");
-DATA(insert OID = 492 ( "&&" PGNSP PGUID b f f 604 604 16 492 0 poly_overlap areasel areajoinsel ));
+DATA(insert OID = 492 ( "&&" PGNSP PGUID b f f 604 604 16 492 0 poly_overlap areasel areajoinsel "---"));
DESCR("overlaps");
-DATA(insert OID = 493 ( "<<" PGNSP PGUID b f f 603 603 16 0 0 box_left positionsel positionjoinsel ));
+DATA(insert OID = 493 ( "<<" PGNSP PGUID b f f 603 603 16 0 0 box_left positionsel positionjoinsel "---"));
DESCR("is left of");
-DATA(insert OID = 494 ( "&<" PGNSP PGUID b f f 603 603 16 0 0 box_overleft positionsel positionjoinsel ));
+DATA(insert OID = 494 ( "&<" PGNSP PGUID b f f 603 603 16 0 0 box_overleft positionsel positionjoinsel "---"));
DESCR("overlaps or is left of");
-DATA(insert OID = 495 ( "&>" PGNSP PGUID b f f 603 603 16 0 0 box_overright positionsel positionjoinsel ));
+DATA(insert OID = 495 ( "&>" PGNSP PGUID b f f 603 603 16 0 0 box_overright positionsel positionjoinsel "---"));
DESCR("overlaps or is right of");
-DATA(insert OID = 496 ( ">>" PGNSP PGUID b f f 603 603 16 0 0 box_right positionsel positionjoinsel ));
+DATA(insert OID = 496 ( ">>" PGNSP PGUID b f f 603 603 16 0 0 box_right positionsel positionjoinsel "---"));
DESCR("is right of");
-DATA(insert OID = 497 ( "<@" PGNSP PGUID b f f 603 603 16 498 0 box_contained contsel contjoinsel ));
+DATA(insert OID = 497 ( "<@" PGNSP PGUID b f f 603 603 16 498 0 box_contained contsel contjoinsel "---"));
DESCR("is contained by");
-DATA(insert OID = 498 ( "@>" PGNSP PGUID b f f 603 603 16 497 0 box_contain contsel contjoinsel ));
+DATA(insert OID = 498 ( "@>" PGNSP PGUID b f f 603 603 16 497 0 box_contain contsel contjoinsel "---"));
DESCR("contains");
-DATA(insert OID = 499 ( "~=" PGNSP PGUID b f f 603 603 16 499 0 box_same eqsel eqjoinsel ));
+DATA(insert OID = 499 ( "~=" PGNSP PGUID b f f 603 603 16 499 0 box_same eqsel eqjoinsel "mhf"));
DESCR("same as");
-DATA(insert OID = 500 ( "&&" PGNSP PGUID b f f 603 603 16 500 0 box_overlap areasel areajoinsel ));
+DATA(insert OID = 500 ( "&&" PGNSP PGUID b f f 603 603 16 500 0 box_overlap areasel areajoinsel "---"));
DESCR("overlaps");
-DATA(insert OID = 501 ( ">=" PGNSP PGUID b f f 603 603 16 505 504 box_ge areasel areajoinsel ));
+DATA(insert OID = 501 ( ">=" PGNSP PGUID b f f 603 603 16 505 504 box_ge areasel areajoinsel "---"));
DESCR("greater than or equal by area");
-DATA(insert OID = 502 ( ">" PGNSP PGUID b f f 603 603 16 504 505 box_gt areasel areajoinsel ));
+DATA(insert OID = 502 ( ">" PGNSP PGUID b f f 603 603 16 504 505 box_gt areasel areajoinsel "---"));
DESCR("greater than by area");
-DATA(insert OID = 503 ( "=" PGNSP PGUID b f f 603 603 16 503 0 box_eq eqsel eqjoinsel ));
+DATA(insert OID = 503 ( "=" PGNSP PGUID b f f 603 603 16 503 0 box_eq eqsel eqjoinsel "mhf"));
DESCR("equal by area");
-DATA(insert OID = 504 ( "<" PGNSP PGUID b f f 603 603 16 502 501 box_lt areasel areajoinsel ));
+DATA(insert OID = 504 ( "<" PGNSP PGUID b f f 603 603 16 502 501 box_lt areasel areajoinsel "---"));
DESCR("less than by area");
-DATA(insert OID = 505 ( "<=" PGNSP PGUID b f f 603 603 16 501 502 box_le areasel areajoinsel ));
+DATA(insert OID = 505 ( "<=" PGNSP PGUID b f f 603 603 16 501 502 box_le areasel areajoinsel "---"));
DESCR("less than or equal by area");
-DATA(insert OID = 506 ( ">^" PGNSP PGUID b f f 600 600 16 0 0 point_above positionsel positionjoinsel ));
+DATA(insert OID = 506 ( ">^" PGNSP PGUID b f f 600 600 16 0 0 point_above positionsel positionjoinsel "---"));
DESCR("is above");
-DATA(insert OID = 507 ( "<<" PGNSP PGUID b f f 600 600 16 0 0 point_left positionsel positionjoinsel ));
+DATA(insert OID = 507 ( "<<" PGNSP PGUID b f f 600 600 16 0 0 point_left positionsel positionjoinsel "---"));
DESCR("is left of");
-DATA(insert OID = 508 ( ">>" PGNSP PGUID b f f 600 600 16 0 0 point_right positionsel positionjoinsel ));
+DATA(insert OID = 508 ( ">>" PGNSP PGUID b f f 600 600 16 0 0 point_right positionsel positionjoinsel "---"));
DESCR("is right of");
-DATA(insert OID = 509 ( "<^" PGNSP PGUID b f f 600 600 16 0 0 point_below positionsel positionjoinsel ));
+DATA(insert OID = 509 ( "<^" PGNSP PGUID b f f 600 600 16 0 0 point_below positionsel positionjoinsel "---"));
DESCR("is below");
-DATA(insert OID = 510 ( "~=" PGNSP PGUID b f f 600 600 16 510 713 point_eq eqsel eqjoinsel ));
+DATA(insert OID = 510 ( "~=" PGNSP PGUID b f f 600 600 16 510 713 point_eq eqsel eqjoinsel "mhf"));
DESCR("same as");
-DATA(insert OID = 511 ( "<@" PGNSP PGUID b f f 600 603 16 433 0 on_pb contsel contjoinsel ));
+DATA(insert OID = 511 ( "<@" PGNSP PGUID b f f 600 603 16 433 0 on_pb contsel contjoinsel "---"));
DESCR("point inside box");
-DATA(insert OID = 433 ( "@>" PGNSP PGUID b f f 603 600 16 511 0 box_contain_pt contsel contjoinsel ));
+DATA(insert OID = 433 ( "@>" PGNSP PGUID b f f 603 600 16 511 0 box_contain_pt contsel contjoinsel "---"));
DESCR("contains");
-DATA(insert OID = 512 ( "<@" PGNSP PGUID b f f 600 602 16 755 0 on_ppath - - ));
+DATA(insert OID = 512 ( "<@" PGNSP PGUID b f f 600 602 16 755 0 on_ppath - - "---"));
DESCR("point within closed path, or point on open path");
-DATA(insert OID = 513 ( "@@" PGNSP PGUID l f f 0 603 600 0 0 box_center - - ));
+DATA(insert OID = 513 ( "@@" PGNSP PGUID l f f 0 603 600 0 0 box_center - - "---"));
DESCR("center of");
-DATA(insert OID = 514 ( "*" PGNSP PGUID b f f 23 23 23 514 0 int4mul - - ));
+DATA(insert OID = 514 ( "*" PGNSP PGUID b f f 23 23 23 514 0 int4mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 517 ( "<->" PGNSP PGUID b f f 600 600 701 517 0 point_distance - - ));
+DATA(insert OID = 517 ( "<->" PGNSP PGUID b f f 600 600 701 517 0 point_distance - - "---"));
DESCR("distance between");
-DATA(insert OID = 518 ( "<>" PGNSP PGUID b f f 23 23 16 518 96 int4ne neqsel neqjoinsel ));
+DATA(insert OID = 518 ( "<>" PGNSP PGUID b f f 23 23 16 518 96 int4ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 519 ( "<>" PGNSP PGUID b f f 21 21 16 519 94 int2ne neqsel neqjoinsel ));
+DATA(insert OID = 519 ( "<>" PGNSP PGUID b f f 21 21 16 519 94 int2ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 520 ( ">" PGNSP PGUID b f f 21 21 16 95 522 int2gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 520 ( ">" PGNSP PGUID b f f 21 21 16 95 522 int2gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 521 ( ">" PGNSP PGUID b f f 23 23 16 97 523 int4gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 521 ( ">" PGNSP PGUID b f f 23 23 16 97 523 int4gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 522 ( "<=" PGNSP PGUID b f f 21 21 16 524 520 int2le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 522 ( "<=" PGNSP PGUID b f f 21 21 16 524 520 int2le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 523 ( "<=" PGNSP PGUID b f f 23 23 16 525 521 int4le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 523 ( "<=" PGNSP PGUID b f f 23 23 16 525 521 int4le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 524 ( ">=" PGNSP PGUID b f f 21 21 16 522 95 int2ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 524 ( ">=" PGNSP PGUID b f f 21 21 16 522 95 int2ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 525 ( ">=" PGNSP PGUID b f f 23 23 16 523 97 int4ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 525 ( ">=" PGNSP PGUID b f f 23 23 16 523 97 int4ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 526 ( "*" PGNSP PGUID b f f 21 21 21 526 0 int2mul - - ));
+DATA(insert OID = 526 ( "*" PGNSP PGUID b f f 21 21 21 526 0 int2mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 527 ( "/" PGNSP PGUID b f f 21 21 21 0 0 int2div - - ));
+DATA(insert OID = 527 ( "/" PGNSP PGUID b f f 21 21 21 0 0 int2div - - "---"));
DESCR("divide");
-DATA(insert OID = 528 ( "/" PGNSP PGUID b f f 23 23 23 0 0 int4div - - ));
+DATA(insert OID = 528 ( "/" PGNSP PGUID b f f 23 23 23 0 0 int4div - - "---"));
DESCR("divide");
-DATA(insert OID = 529 ( "%" PGNSP PGUID b f f 21 21 21 0 0 int2mod - - ));
+DATA(insert OID = 529 ( "%" PGNSP PGUID b f f 21 21 21 0 0 int2mod - - "---"));
DESCR("modulus");
-DATA(insert OID = 530 ( "%" PGNSP PGUID b f f 23 23 23 0 0 int4mod - - ));
+DATA(insert OID = 530 ( "%" PGNSP PGUID b f f 23 23 23 0 0 int4mod - - "---"));
DESCR("modulus");
-DATA(insert OID = 531 ( "<>" PGNSP PGUID b f f 25 25 16 531 98 textne neqsel neqjoinsel ));
+DATA(insert OID = 531 ( "<>" PGNSP PGUID b f f 25 25 16 531 98 textne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 532 ( "=" PGNSP PGUID b t t 21 23 16 533 538 int24eq eqsel eqjoinsel ));
+DATA(insert OID = 532 ( "=" PGNSP PGUID b t t 21 23 16 533 538 int24eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 533 ( "=" PGNSP PGUID b t t 23 21 16 532 539 int42eq eqsel eqjoinsel ));
+DATA(insert OID = 533 ( "=" PGNSP PGUID b t t 23 21 16 532 539 int42eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 534 ( "<" PGNSP PGUID b f f 21 23 16 537 542 int24lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 534 ( "<" PGNSP PGUID b f f 21 23 16 537 542 int24lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 535 ( "<" PGNSP PGUID b f f 23 21 16 536 543 int42lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 535 ( "<" PGNSP PGUID b f f 23 21 16 536 543 int42lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 536 ( ">" PGNSP PGUID b f f 21 23 16 535 540 int24gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 536 ( ">" PGNSP PGUID b f f 21 23 16 535 540 int24gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 537 ( ">" PGNSP PGUID b f f 23 21 16 534 541 int42gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 537 ( ">" PGNSP PGUID b f f 23 21 16 534 541 int42gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 538 ( "<>" PGNSP PGUID b f f 21 23 16 539 532 int24ne neqsel neqjoinsel ));
+DATA(insert OID = 538 ( "<>" PGNSP PGUID b f f 21 23 16 539 532 int24ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 539 ( "<>" PGNSP PGUID b f f 23 21 16 538 533 int42ne neqsel neqjoinsel ));
+DATA(insert OID = 539 ( "<>" PGNSP PGUID b f f 23 21 16 538 533 int42ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 540 ( "<=" PGNSP PGUID b f f 21 23 16 543 536 int24le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 540 ( "<=" PGNSP PGUID b f f 21 23 16 543 536 int24le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 541 ( "<=" PGNSP PGUID b f f 23 21 16 542 537 int42le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 541 ( "<=" PGNSP PGUID b f f 23 21 16 542 537 int42le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 542 ( ">=" PGNSP PGUID b f f 21 23 16 541 534 int24ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 542 ( ">=" PGNSP PGUID b f f 21 23 16 541 534 int24ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 543 ( ">=" PGNSP PGUID b f f 23 21 16 540 535 int42ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 543 ( ">=" PGNSP PGUID b f f 23 21 16 540 535 int42ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 544 ( "*" PGNSP PGUID b f f 21 23 23 545 0 int24mul - - ));
+DATA(insert OID = 544 ( "*" PGNSP PGUID b f f 21 23 23 545 0 int24mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 545 ( "*" PGNSP PGUID b f f 23 21 23 544 0 int42mul - - ));
+DATA(insert OID = 545 ( "*" PGNSP PGUID b f f 23 21 23 544 0 int42mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 546 ( "/" PGNSP PGUID b f f 21 23 23 0 0 int24div - - ));
+DATA(insert OID = 546 ( "/" PGNSP PGUID b f f 21 23 23 0 0 int24div - - "---"));
DESCR("divide");
-DATA(insert OID = 547 ( "/" PGNSP PGUID b f f 23 21 23 0 0 int42div - - ));
+DATA(insert OID = 547 ( "/" PGNSP PGUID b f f 23 21 23 0 0 int42div - - "---"));
DESCR("divide");
-DATA(insert OID = 550 ( "+" PGNSP PGUID b f f 21 21 21 550 0 int2pl - - ));
+DATA(insert OID = 550 ( "+" PGNSP PGUID b f f 21 21 21 550 0 int2pl - - "---"));
DESCR("add");
-DATA(insert OID = 551 ( "+" PGNSP PGUID b f f 23 23 23 551 0 int4pl - - ));
+DATA(insert OID = 551 ( "+" PGNSP PGUID b f f 23 23 23 551 0 int4pl - - "---"));
DESCR("add");
-DATA(insert OID = 552 ( "+" PGNSP PGUID b f f 21 23 23 553 0 int24pl - - ));
+DATA(insert OID = 552 ( "+" PGNSP PGUID b f f 21 23 23 553 0 int24pl - - "---"));
DESCR("add");
-DATA(insert OID = 553 ( "+" PGNSP PGUID b f f 23 21 23 552 0 int42pl - - ));
+DATA(insert OID = 553 ( "+" PGNSP PGUID b f f 23 21 23 552 0 int42pl - - "---"));
DESCR("add");
-DATA(insert OID = 554 ( "-" PGNSP PGUID b f f 21 21 21 0 0 int2mi - - ));
+DATA(insert OID = 554 ( "-" PGNSP PGUID b f f 21 21 21 0 0 int2mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 555 ( "-" PGNSP PGUID b f f 23 23 23 0 0 int4mi - - ));
+DATA(insert OID = 555 ( "-" PGNSP PGUID b f f 23 23 23 0 0 int4mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 556 ( "-" PGNSP PGUID b f f 21 23 23 0 0 int24mi - - ));
+DATA(insert OID = 556 ( "-" PGNSP PGUID b f f 21 23 23 0 0 int24mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 557 ( "-" PGNSP PGUID b f f 23 21 23 0 0 int42mi - - ));
+DATA(insert OID = 557 ( "-" PGNSP PGUID b f f 23 21 23 0 0 int42mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 558 ( "-" PGNSP PGUID l f f 0 23 23 0 0 int4um - - ));
+DATA(insert OID = 558 ( "-" PGNSP PGUID l f f 0 23 23 0 0 int4um - - "---"));
DESCR("negate");
-DATA(insert OID = 559 ( "-" PGNSP PGUID l f f 0 21 21 0 0 int2um - - ));
+DATA(insert OID = 559 ( "-" PGNSP PGUID l f f 0 21 21 0 0 int2um - - "---"));
DESCR("negate");
-DATA(insert OID = 560 ( "=" PGNSP PGUID b t t 702 702 16 560 561 abstimeeq eqsel eqjoinsel ));
+DATA(insert OID = 560 ( "=" PGNSP PGUID b t t 702 702 16 560 561 abstimeeq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 561 ( "<>" PGNSP PGUID b f f 702 702 16 561 560 abstimene neqsel neqjoinsel ));
+DATA(insert OID = 561 ( "<>" PGNSP PGUID b f f 702 702 16 561 560 abstimene neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 562 ( "<" PGNSP PGUID b f f 702 702 16 563 565 abstimelt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 562 ( "<" PGNSP PGUID b f f 702 702 16 563 565 abstimelt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 563 ( ">" PGNSP PGUID b f f 702 702 16 562 564 abstimegt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 563 ( ">" PGNSP PGUID b f f 702 702 16 562 564 abstimegt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 564 ( "<=" PGNSP PGUID b f f 702 702 16 565 563 abstimele scalarltsel scalarltjoinsel ));
+DATA(insert OID = 564 ( "<=" PGNSP PGUID b f f 702 702 16 565 563 abstimele scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 565 ( ">=" PGNSP PGUID b f f 702 702 16 564 562 abstimege scalargtsel scalargtjoinsel ));
+DATA(insert OID = 565 ( ">=" PGNSP PGUID b f f 702 702 16 564 562 abstimege scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 566 ( "=" PGNSP PGUID b t t 703 703 16 566 567 reltimeeq eqsel eqjoinsel ));
+DATA(insert OID = 566 ( "=" PGNSP PGUID b t t 703 703 16 566 567 reltimeeq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 567 ( "<>" PGNSP PGUID b f f 703 703 16 567 566 reltimene neqsel neqjoinsel ));
+DATA(insert OID = 567 ( "<>" PGNSP PGUID b f f 703 703 16 567 566 reltimene neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 568 ( "<" PGNSP PGUID b f f 703 703 16 569 571 reltimelt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 568 ( "<" PGNSP PGUID b f f 703 703 16 569 571 reltimelt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 569 ( ">" PGNSP PGUID b f f 703 703 16 568 570 reltimegt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 569 ( ">" PGNSP PGUID b f f 703 703 16 568 570 reltimegt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 570 ( "<=" PGNSP PGUID b f f 703 703 16 571 569 reltimele scalarltsel scalarltjoinsel ));
+DATA(insert OID = 570 ( "<=" PGNSP PGUID b f f 703 703 16 571 569 reltimele scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 571 ( ">=" PGNSP PGUID b f f 703 703 16 570 568 reltimege scalargtsel scalargtjoinsel ));
+DATA(insert OID = 571 ( ">=" PGNSP PGUID b f f 703 703 16 570 568 reltimege scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 572 ( "~=" PGNSP PGUID b f f 704 704 16 572 0 tintervalsame eqsel eqjoinsel ));
+DATA(insert OID = 572 ( "~=" PGNSP PGUID b f f 704 704 16 572 0 tintervalsame eqsel eqjoinsel "mhf"));
DESCR("same as");
-DATA(insert OID = 573 ( "<<" PGNSP PGUID b f f 704 704 16 0 0 tintervalct - - ));
+DATA(insert OID = 573 ( "<<" PGNSP PGUID b f f 704 704 16 0 0 tintervalct - - "---"));
DESCR("contains");
-DATA(insert OID = 574 ( "&&" PGNSP PGUID b f f 704 704 16 574 0 tintervalov - - ));
+DATA(insert OID = 574 ( "&&" PGNSP PGUID b f f 704 704 16 574 0 tintervalov - - "---"));
DESCR("overlaps");
-DATA(insert OID = 575 ( "#=" PGNSP PGUID b f f 704 703 16 0 576 tintervalleneq - - ));
+DATA(insert OID = 575 ( "#=" PGNSP PGUID b f f 704 703 16 0 576 tintervalleneq - - "---"));
DESCR("equal by length");
-DATA(insert OID = 576 ( "#<>" PGNSP PGUID b f f 704 703 16 0 575 tintervallenne - - ));
+DATA(insert OID = 576 ( "#<>" PGNSP PGUID b f f 704 703 16 0 575 tintervallenne - - "---"));
DESCR("not equal by length");
-DATA(insert OID = 577 ( "#<" PGNSP PGUID b f f 704 703 16 0 580 tintervallenlt - - ));
+DATA(insert OID = 577 ( "#<" PGNSP PGUID b f f 704 703 16 0 580 tintervallenlt - - "---"));
DESCR("less than by length");
-DATA(insert OID = 578 ( "#>" PGNSP PGUID b f f 704 703 16 0 579 tintervallengt - - ));
+DATA(insert OID = 578 ( "#>" PGNSP PGUID b f f 704 703 16 0 579 tintervallengt - - "---"));
DESCR("greater than by length");
-DATA(insert OID = 579 ( "#<=" PGNSP PGUID b f f 704 703 16 0 578 tintervallenle - - ));
+DATA(insert OID = 579 ( "#<=" PGNSP PGUID b f f 704 703 16 0 578 tintervallenle - - "---"));
DESCR("less than or equal by length");
-DATA(insert OID = 580 ( "#>=" PGNSP PGUID b f f 704 703 16 0 577 tintervallenge - - ));
+DATA(insert OID = 580 ( "#>=" PGNSP PGUID b f f 704 703 16 0 577 tintervallenge - - "---"));
DESCR("greater than or equal by length");
-DATA(insert OID = 581 ( "+" PGNSP PGUID b f f 702 703 702 0 0 timepl - - ));
+DATA(insert OID = 581 ( "+" PGNSP PGUID b f f 702 703 702 0 0 timepl - - "---"));
DESCR("add");
-DATA(insert OID = 582 ( "-" PGNSP PGUID b f f 702 703 702 0 0 timemi - - ));
+DATA(insert OID = 582 ( "-" PGNSP PGUID b f f 702 703 702 0 0 timemi - - "---"));
DESCR("subtract");
-DATA(insert OID = 583 ( "<?>" PGNSP PGUID b f f 702 704 16 0 0 intinterval - - ));
+DATA(insert OID = 583 ( "<?>" PGNSP PGUID b f f 702 704 16 0 0 intinterval - - "---"));
DESCR("is contained by");
-DATA(insert OID = 584 ( "-" PGNSP PGUID l f f 0 700 700 0 0 float4um - - ));
+DATA(insert OID = 584 ( "-" PGNSP PGUID l f f 0 700 700 0 0 float4um - - "---"));
DESCR("negate");
-DATA(insert OID = 585 ( "-" PGNSP PGUID l f f 0 701 701 0 0 float8um - - ));
+DATA(insert OID = 585 ( "-" PGNSP PGUID l f f 0 701 701 0 0 float8um - - "---"));
DESCR("negate");
-DATA(insert OID = 586 ( "+" PGNSP PGUID b f f 700 700 700 586 0 float4pl - - ));
+DATA(insert OID = 586 ( "+" PGNSP PGUID b f f 700 700 700 586 0 float4pl - - "---"));
DESCR("add");
-DATA(insert OID = 587 ( "-" PGNSP PGUID b f f 700 700 700 0 0 float4mi - - ));
+DATA(insert OID = 587 ( "-" PGNSP PGUID b f f 700 700 700 0 0 float4mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 588 ( "/" PGNSP PGUID b f f 700 700 700 0 0 float4div - - ));
+DATA(insert OID = 588 ( "/" PGNSP PGUID b f f 700 700 700 0 0 float4div - - "---"));
DESCR("divide");
-DATA(insert OID = 589 ( "*" PGNSP PGUID b f f 700 700 700 589 0 float4mul - - ));
+DATA(insert OID = 589 ( "*" PGNSP PGUID b f f 700 700 700 589 0 float4mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 590 ( "@" PGNSP PGUID l f f 0 700 700 0 0 float4abs - - ));
+DATA(insert OID = 590 ( "@" PGNSP PGUID l f f 0 700 700 0 0 float4abs - - "---"));
DESCR("absolute value");
-DATA(insert OID = 591 ( "+" PGNSP PGUID b f f 701 701 701 591 0 float8pl - - ));
+DATA(insert OID = 591 ( "+" PGNSP PGUID b f f 701 701 701 591 0 float8pl - - "---"));
DESCR("add");
-DATA(insert OID = 592 ( "-" PGNSP PGUID b f f 701 701 701 0 0 float8mi - - ));
+DATA(insert OID = 592 ( "-" PGNSP PGUID b f f 701 701 701 0 0 float8mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 593 ( "/" PGNSP PGUID b f f 701 701 701 0 0 float8div - - ));
+DATA(insert OID = 593 ( "/" PGNSP PGUID b f f 701 701 701 0 0 float8div - - "---"));
DESCR("divide");
-DATA(insert OID = 594 ( "*" PGNSP PGUID b f f 701 701 701 594 0 float8mul - - ));
+DATA(insert OID = 594 ( "*" PGNSP PGUID b f f 701 701 701 594 0 float8mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 595 ( "@" PGNSP PGUID l f f 0 701 701 0 0 float8abs - - ));
+DATA(insert OID = 595 ( "@" PGNSP PGUID l f f 0 701 701 0 0 float8abs - - "---"));
DESCR("absolute value");
-DATA(insert OID = 596 ( "|/" PGNSP PGUID l f f 0 701 701 0 0 dsqrt - - ));
+DATA(insert OID = 596 ( "|/" PGNSP PGUID l f f 0 701 701 0 0 dsqrt - - "---"));
DESCR("square root");
-DATA(insert OID = 597 ( "||/" PGNSP PGUID l f f 0 701 701 0 0 dcbrt - - ));
+DATA(insert OID = 597 ( "||/" PGNSP PGUID l f f 0 701 701 0 0 dcbrt - - "---"));
DESCR("cube root");
-DATA(insert OID = 1284 ( "|" PGNSP PGUID l f f 0 704 702 0 0 tintervalstart - - ));
+DATA(insert OID = 1284 ( "|" PGNSP PGUID l f f 0 704 702 0 0 tintervalstart - - "---"));
DESCR("start of interval");
-DATA(insert OID = 606 ( "<#>" PGNSP PGUID b f f 702 702 704 0 0 mktinterval - - ));
+DATA(insert OID = 606 ( "<#>" PGNSP PGUID b f f 702 702 704 0 0 mktinterval - - "---"));
DESCR("convert to tinterval");
-DATA(insert OID = 607 ( "=" PGNSP PGUID b t t 26 26 16 607 608 oideq eqsel eqjoinsel ));
+DATA(insert OID = 607 ( "=" PGNSP PGUID b t t 26 26 16 607 608 oideq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 608 ( "<>" PGNSP PGUID b f f 26 26 16 608 607 oidne neqsel neqjoinsel ));
+DATA(insert OID = 608 ( "<>" PGNSP PGUID b f f 26 26 16 608 607 oidne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 609 ( "<" PGNSP PGUID b f f 26 26 16 610 612 oidlt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 609 ( "<" PGNSP PGUID b f f 26 26 16 610 612 oidlt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 610 ( ">" PGNSP PGUID b f f 26 26 16 609 611 oidgt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 610 ( ">" PGNSP PGUID b f f 26 26 16 609 611 oidgt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 611 ( "<=" PGNSP PGUID b f f 26 26 16 612 610 oidle scalarltsel scalarltjoinsel ));
+DATA(insert OID = 611 ( "<=" PGNSP PGUID b f f 26 26 16 612 610 oidle scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 612 ( ">=" PGNSP PGUID b f f 26 26 16 611 609 oidge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 612 ( ">=" PGNSP PGUID b f f 26 26 16 611 609 oidge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 644 ( "<>" PGNSP PGUID b f f 30 30 16 644 649 oidvectorne neqsel neqjoinsel ));
+DATA(insert OID = 644 ( "<>" PGNSP PGUID b f f 30 30 16 644 649 oidvectorne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 645 ( "<" PGNSP PGUID b f f 30 30 16 646 648 oidvectorlt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 645 ( "<" PGNSP PGUID b f f 30 30 16 646 648 oidvectorlt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 646 ( ">" PGNSP PGUID b f f 30 30 16 645 647 oidvectorgt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 646 ( ">" PGNSP PGUID b f f 30 30 16 645 647 oidvectorgt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 647 ( "<=" PGNSP PGUID b f f 30 30 16 648 646 oidvectorle scalarltsel scalarltjoinsel ));
+DATA(insert OID = 647 ( "<=" PGNSP PGUID b f f 30 30 16 648 646 oidvectorle scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 648 ( ">=" PGNSP PGUID b f f 30 30 16 647 645 oidvectorge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 648 ( ">=" PGNSP PGUID b f f 30 30 16 647 645 oidvectorge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 649 ( "=" PGNSP PGUID b t t 30 30 16 649 644 oidvectoreq eqsel eqjoinsel ));
+DATA(insert OID = 649 ( "=" PGNSP PGUID b t t 30 30 16 649 644 oidvectoreq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 613 ( "<->" PGNSP PGUID b f f 600 628 701 0 0 dist_pl - - ));
+DATA(insert OID = 613 ( "<->" PGNSP PGUID b f f 600 628 701 0 0 dist_pl - - "---"));
DESCR("distance between");
-DATA(insert OID = 614 ( "<->" PGNSP PGUID b f f 600 601 701 0 0 dist_ps - - ));
+DATA(insert OID = 614 ( "<->" PGNSP PGUID b f f 600 601 701 0 0 dist_ps - - "---"));
DESCR("distance between");
-DATA(insert OID = 615 ( "<->" PGNSP PGUID b f f 600 603 701 0 0 dist_pb - - ));
+DATA(insert OID = 615 ( "<->" PGNSP PGUID b f f 600 603 701 0 0 dist_pb - - "---"));
DESCR("distance between");
-DATA(insert OID = 616 ( "<->" PGNSP PGUID b f f 601 628 701 0 0 dist_sl - - ));
+DATA(insert OID = 616 ( "<->" PGNSP PGUID b f f 601 628 701 0 0 dist_sl - - "---"));
DESCR("distance between");
-DATA(insert OID = 617 ( "<->" PGNSP PGUID b f f 601 603 701 0 0 dist_sb - - ));
+DATA(insert OID = 617 ( "<->" PGNSP PGUID b f f 601 603 701 0 0 dist_sb - - "---"));
DESCR("distance between");
-DATA(insert OID = 618 ( "<->" PGNSP PGUID b f f 600 602 701 0 0 dist_ppath - - ));
+DATA(insert OID = 618 ( "<->" PGNSP PGUID b f f 600 602 701 0 0 dist_ppath - - "---"));
DESCR("distance between");
-DATA(insert OID = 620 ( "=" PGNSP PGUID b t t 700 700 16 620 621 float4eq eqsel eqjoinsel ));
+DATA(insert OID = 620 ( "=" PGNSP PGUID b t t 700 700 16 620 621 float4eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 621 ( "<>" PGNSP PGUID b f f 700 700 16 621 620 float4ne neqsel neqjoinsel ));
+DATA(insert OID = 621 ( "<>" PGNSP PGUID b f f 700 700 16 621 620 float4ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 622 ( "<" PGNSP PGUID b f f 700 700 16 623 625 float4lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 622 ( "<" PGNSP PGUID b f f 700 700 16 623 625 float4lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 623 ( ">" PGNSP PGUID b f f 700 700 16 622 624 float4gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 623 ( ">" PGNSP PGUID b f f 700 700 16 622 624 float4gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 624 ( "<=" PGNSP PGUID b f f 700 700 16 625 623 float4le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 624 ( "<=" PGNSP PGUID b f f 700 700 16 625 623 float4le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 625 ( ">=" PGNSP PGUID b f f 700 700 16 624 622 float4ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 625 ( ">=" PGNSP PGUID b f f 700 700 16 624 622 float4ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 630 ( "<>" PGNSP PGUID b f f 18 18 16 630 92 charne neqsel neqjoinsel ));
+DATA(insert OID = 630 ( "<>" PGNSP PGUID b f f 18 18 16 630 92 charne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 631 ( "<" PGNSP PGUID b f f 18 18 16 633 634 charlt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 631 ( "<" PGNSP PGUID b f f 18 18 16 633 634 charlt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 632 ( "<=" PGNSP PGUID b f f 18 18 16 634 633 charle scalarltsel scalarltjoinsel ));
+DATA(insert OID = 632 ( "<=" PGNSP PGUID b f f 18 18 16 634 633 charle scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 633 ( ">" PGNSP PGUID b f f 18 18 16 631 632 chargt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 633 ( ">" PGNSP PGUID b f f 18 18 16 631 632 chargt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 634 ( ">=" PGNSP PGUID b f f 18 18 16 632 631 charge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 634 ( ">=" PGNSP PGUID b f f 18 18 16 632 631 charge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 639 ( "~" PGNSP PGUID b f f 19 25 16 0 640 nameregexeq regexeqsel regexeqjoinsel ));
+DATA(insert OID = 639 ( "~" PGNSP PGUID b f f 19 25 16 0 640 nameregexeq regexeqsel regexeqjoinsel "mhf"));
DESCR("matches regular expression, case-sensitive");
#define OID_NAME_REGEXEQ_OP 639
-DATA(insert OID = 640 ( "!~" PGNSP PGUID b f f 19 25 16 0 639 nameregexne regexnesel regexnejoinsel ));
+DATA(insert OID = 640 ( "!~" PGNSP PGUID b f f 19 25 16 0 639 nameregexne regexnesel regexnejoinsel "---"));
DESCR("does not match regular expression, case-sensitive");
-DATA(insert OID = 641 ( "~" PGNSP PGUID b f f 25 25 16 0 642 textregexeq regexeqsel regexeqjoinsel ));
+DATA(insert OID = 641 ( "~" PGNSP PGUID b f f 25 25 16 0 642 textregexeq regexeqsel regexeqjoinsel "mhf"));
DESCR("matches regular expression, case-sensitive");
#define OID_TEXT_REGEXEQ_OP 641
-DATA(insert OID = 642 ( "!~" PGNSP PGUID b f f 25 25 16 0 641 textregexne regexnesel regexnejoinsel ));
+DATA(insert OID = 642 ( "!~" PGNSP PGUID b f f 25 25 16 0 641 textregexne regexnesel regexnejoinsel "---"));
DESCR("does not match regular expression, case-sensitive");
-DATA(insert OID = 643 ( "<>" PGNSP PGUID b f f 19 19 16 643 93 namene neqsel neqjoinsel ));
+DATA(insert OID = 643 ( "<>" PGNSP PGUID b f f 19 19 16 643 93 namene neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 654 ( "||" PGNSP PGUID b f f 25 25 25 0 0 textcat - - ));
+DATA(insert OID = 654 ( "||" PGNSP PGUID b f f 25 25 25 0 0 textcat - - "---"));
DESCR("concatenate");
-DATA(insert OID = 660 ( "<" PGNSP PGUID b f f 19 19 16 662 663 namelt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 660 ( "<" PGNSP PGUID b f f 19 19 16 662 663 namelt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 661 ( "<=" PGNSP PGUID b f f 19 19 16 663 662 namele scalarltsel scalarltjoinsel ));
+DATA(insert OID = 661 ( "<=" PGNSP PGUID b f f 19 19 16 663 662 namele scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 662 ( ">" PGNSP PGUID b f f 19 19 16 660 661 namegt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 662 ( ">" PGNSP PGUID b f f 19 19 16 660 661 namegt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 663 ( ">=" PGNSP PGUID b f f 19 19 16 661 660 namege scalargtsel scalargtjoinsel ));
+DATA(insert OID = 663 ( ">=" PGNSP PGUID b f f 19 19 16 661 660 namege scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 664 ( "<" PGNSP PGUID b f f 25 25 16 666 667 text_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 664 ( "<" PGNSP PGUID b f f 25 25 16 666 667 text_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 665 ( "<=" PGNSP PGUID b f f 25 25 16 667 666 text_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 665 ( "<=" PGNSP PGUID b f f 25 25 16 667 666 text_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 666 ( ">" PGNSP PGUID b f f 25 25 16 664 665 text_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 666 ( ">" PGNSP PGUID b f f 25 25 16 664 665 text_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 667 ( ">=" PGNSP PGUID b f f 25 25 16 665 664 text_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 667 ( ">=" PGNSP PGUID b f f 25 25 16 665 664 text_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 670 ( "=" PGNSP PGUID b t t 701 701 16 670 671 float8eq eqsel eqjoinsel ));
+DATA(insert OID = 670 ( "=" PGNSP PGUID b t t 701 701 16 670 671 float8eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 671 ( "<>" PGNSP PGUID b f f 701 701 16 671 670 float8ne neqsel neqjoinsel ));
+DATA(insert OID = 671 ( "<>" PGNSP PGUID b f f 701 701 16 671 670 float8ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 672 ( "<" PGNSP PGUID b f f 701 701 16 674 675 float8lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 672 ( "<" PGNSP PGUID b f f 701 701 16 674 675 float8lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
#define Float8LessOperator 672
-DATA(insert OID = 673 ( "<=" PGNSP PGUID b f f 701 701 16 675 674 float8le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 673 ( "<=" PGNSP PGUID b f f 701 701 16 675 674 float8le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 674 ( ">" PGNSP PGUID b f f 701 701 16 672 673 float8gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 674 ( ">" PGNSP PGUID b f f 701 701 16 672 673 float8gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 675 ( ">=" PGNSP PGUID b f f 701 701 16 673 672 float8ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 675 ( ">=" PGNSP PGUID b f f 701 701 16 673 672 float8ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 682 ( "@" PGNSP PGUID l f f 0 21 21 0 0 int2abs - - ));
+DATA(insert OID = 682 ( "@" PGNSP PGUID l f f 0 21 21 0 0 int2abs - - "---"));
DESCR("absolute value");
-DATA(insert OID = 684 ( "+" PGNSP PGUID b f f 20 20 20 684 0 int8pl - - ));
+DATA(insert OID = 684 ( "+" PGNSP PGUID b f f 20 20 20 684 0 int8pl - - "---"));
DESCR("add");
-DATA(insert OID = 685 ( "-" PGNSP PGUID b f f 20 20 20 0 0 int8mi - - ));
+DATA(insert OID = 685 ( "-" PGNSP PGUID b f f 20 20 20 0 0 int8mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 686 ( "*" PGNSP PGUID b f f 20 20 20 686 0 int8mul - - ));
+DATA(insert OID = 686 ( "*" PGNSP PGUID b f f 20 20 20 686 0 int8mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 687 ( "/" PGNSP PGUID b f f 20 20 20 0 0 int8div - - ));
+DATA(insert OID = 687 ( "/" PGNSP PGUID b f f 20 20 20 0 0 int8div - - "---"));
DESCR("divide");
-DATA(insert OID = 688 ( "+" PGNSP PGUID b f f 20 23 20 692 0 int84pl - - ));
+DATA(insert OID = 688 ( "+" PGNSP PGUID b f f 20 23 20 692 0 int84pl - - "---"));
DESCR("add");
-DATA(insert OID = 689 ( "-" PGNSP PGUID b f f 20 23 20 0 0 int84mi - - ));
+DATA(insert OID = 689 ( "-" PGNSP PGUID b f f 20 23 20 0 0 int84mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 690 ( "*" PGNSP PGUID b f f 20 23 20 694 0 int84mul - - ));
+DATA(insert OID = 690 ( "*" PGNSP PGUID b f f 20 23 20 694 0 int84mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 691 ( "/" PGNSP PGUID b f f 20 23 20 0 0 int84div - - ));
+DATA(insert OID = 691 ( "/" PGNSP PGUID b f f 20 23 20 0 0 int84div - - "---"));
DESCR("divide");
-DATA(insert OID = 692 ( "+" PGNSP PGUID b f f 23 20 20 688 0 int48pl - - ));
+DATA(insert OID = 692 ( "+" PGNSP PGUID b f f 23 20 20 688 0 int48pl - - "---"));
DESCR("add");
-DATA(insert OID = 693 ( "-" PGNSP PGUID b f f 23 20 20 0 0 int48mi - - ));
+DATA(insert OID = 693 ( "-" PGNSP PGUID b f f 23 20 20 0 0 int48mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 694 ( "*" PGNSP PGUID b f f 23 20 20 690 0 int48mul - - ));
+DATA(insert OID = 694 ( "*" PGNSP PGUID b f f 23 20 20 690 0 int48mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 695 ( "/" PGNSP PGUID b f f 23 20 20 0 0 int48div - - ));
+DATA(insert OID = 695 ( "/" PGNSP PGUID b f f 23 20 20 0 0 int48div - - "---"));
DESCR("divide");
-DATA(insert OID = 818 ( "+" PGNSP PGUID b f f 20 21 20 822 0 int82pl - - ));
+DATA(insert OID = 818 ( "+" PGNSP PGUID b f f 20 21 20 822 0 int82pl - - "---"));
DESCR("add");
-DATA(insert OID = 819 ( "-" PGNSP PGUID b f f 20 21 20 0 0 int82mi - - ));
+DATA(insert OID = 819 ( "-" PGNSP PGUID b f f 20 21 20 0 0 int82mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 820 ( "*" PGNSP PGUID b f f 20 21 20 824 0 int82mul - - ));
+DATA(insert OID = 820 ( "*" PGNSP PGUID b f f 20 21 20 824 0 int82mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 821 ( "/" PGNSP PGUID b f f 20 21 20 0 0 int82div - - ));
+DATA(insert OID = 821 ( "/" PGNSP PGUID b f f 20 21 20 0 0 int82div - - "---"));
DESCR("divide");
-DATA(insert OID = 822 ( "+" PGNSP PGUID b f f 21 20 20 818 0 int28pl - - ));
+DATA(insert OID = 822 ( "+" PGNSP PGUID b f f 21 20 20 818 0 int28pl - - "---"));
DESCR("add");
-DATA(insert OID = 823 ( "-" PGNSP PGUID b f f 21 20 20 0 0 int28mi - - ));
+DATA(insert OID = 823 ( "-" PGNSP PGUID b f f 21 20 20 0 0 int28mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 824 ( "*" PGNSP PGUID b f f 21 20 20 820 0 int28mul - - ));
+DATA(insert OID = 824 ( "*" PGNSP PGUID b f f 21 20 20 820 0 int28mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 825 ( "/" PGNSP PGUID b f f 21 20 20 0 0 int28div - - ));
+DATA(insert OID = 825 ( "/" PGNSP PGUID b f f 21 20 20 0 0 int28div - - "---"));
DESCR("divide");
-DATA(insert OID = 706 ( "<->" PGNSP PGUID b f f 603 603 701 706 0 box_distance - - ));
+DATA(insert OID = 706 ( "<->" PGNSP PGUID b f f 603 603 701 706 0 box_distance - - "---"));
DESCR("distance between");
-DATA(insert OID = 707 ( "<->" PGNSP PGUID b f f 602 602 701 707 0 path_distance - - ));
+DATA(insert OID = 707 ( "<->" PGNSP PGUID b f f 602 602 701 707 0 path_distance - - "---"));
DESCR("distance between");
-DATA(insert OID = 708 ( "<->" PGNSP PGUID b f f 628 628 701 708 0 line_distance - - ));
+DATA(insert OID = 708 ( "<->" PGNSP PGUID b f f 628 628 701 708 0 line_distance - - "---"));
DESCR("distance between");
-DATA(insert OID = 709 ( "<->" PGNSP PGUID b f f 601 601 701 709 0 lseg_distance - - ));
+DATA(insert OID = 709 ( "<->" PGNSP PGUID b f f 601 601 701 709 0 lseg_distance - - "---"));
DESCR("distance between");
-DATA(insert OID = 712 ( "<->" PGNSP PGUID b f f 604 604 701 712 0 poly_distance - - ));
+DATA(insert OID = 712 ( "<->" PGNSP PGUID b f f 604 604 701 712 0 poly_distance - - "---"));
DESCR("distance between");
-DATA(insert OID = 713 ( "<>" PGNSP PGUID b f f 600 600 16 713 510 point_ne neqsel neqjoinsel ));
+DATA(insert OID = 713 ( "<>" PGNSP PGUID b f f 600 600 16 713 510 point_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
/* add translation/rotation/scaling operators for geometric types. - thomas 97/05/10 */
-DATA(insert OID = 731 ( "+" PGNSP PGUID b f f 600 600 600 731 0 point_add - - ));
+DATA(insert OID = 731 ( "+" PGNSP PGUID b f f 600 600 600 731 0 point_add - - "---"));
DESCR("add points (translate)");
-DATA(insert OID = 732 ( "-" PGNSP PGUID b f f 600 600 600 0 0 point_sub - - ));
+DATA(insert OID = 732 ( "-" PGNSP PGUID b f f 600 600 600 0 0 point_sub - - "---"));
DESCR("subtract points (translate)");
-DATA(insert OID = 733 ( "*" PGNSP PGUID b f f 600 600 600 733 0 point_mul - - ));
+DATA(insert OID = 733 ( "*" PGNSP PGUID b f f 600 600 600 733 0 point_mul - - "---"));
DESCR("multiply points (scale/rotate)");
-DATA(insert OID = 734 ( "/" PGNSP PGUID b f f 600 600 600 0 0 point_div - - ));
+DATA(insert OID = 734 ( "/" PGNSP PGUID b f f 600 600 600 0 0 point_div - - "---"));
DESCR("divide points (scale/rotate)");
-DATA(insert OID = 735 ( "+" PGNSP PGUID b f f 602 602 602 735 0 path_add - - ));
+DATA(insert OID = 735 ( "+" PGNSP PGUID b f f 602 602 602 735 0 path_add - - "---"));
DESCR("concatenate");
-DATA(insert OID = 736 ( "+" PGNSP PGUID b f f 602 600 602 0 0 path_add_pt - - ));
+DATA(insert OID = 736 ( "+" PGNSP PGUID b f f 602 600 602 0 0 path_add_pt - - "---"));
DESCR("add (translate path)");
-DATA(insert OID = 737 ( "-" PGNSP PGUID b f f 602 600 602 0 0 path_sub_pt - - ));
+DATA(insert OID = 737 ( "-" PGNSP PGUID b f f 602 600 602 0 0 path_sub_pt - - "---"));
DESCR("subtract (translate path)");
-DATA(insert OID = 738 ( "*" PGNSP PGUID b f f 602 600 602 0 0 path_mul_pt - - ));
+DATA(insert OID = 738 ( "*" PGNSP PGUID b f f 602 600 602 0 0 path_mul_pt - - "---"));
DESCR("multiply (rotate/scale path)");
-DATA(insert OID = 739 ( "/" PGNSP PGUID b f f 602 600 602 0 0 path_div_pt - - ));
+DATA(insert OID = 739 ( "/" PGNSP PGUID b f f 602 600 602 0 0 path_div_pt - - "---"));
DESCR("divide (rotate/scale path)");
-DATA(insert OID = 755 ( "@>" PGNSP PGUID b f f 602 600 16 512 0 path_contain_pt - - ));
+DATA(insert OID = 755 ( "@>" PGNSP PGUID b f f 602 600 16 512 0 path_contain_pt - - "---"));
DESCR("contains");
-DATA(insert OID = 756 ( "<@" PGNSP PGUID b f f 600 604 16 757 0 pt_contained_poly contsel contjoinsel ));
+DATA(insert OID = 756 ( "<@" PGNSP PGUID b f f 600 604 16 757 0 pt_contained_poly contsel contjoinsel "---"));
DESCR("is contained by");
-DATA(insert OID = 757 ( "@>" PGNSP PGUID b f f 604 600 16 756 0 poly_contain_pt contsel contjoinsel ));
+DATA(insert OID = 757 ( "@>" PGNSP PGUID b f f 604 600 16 756 0 poly_contain_pt contsel contjoinsel "---"));
DESCR("contains");
-DATA(insert OID = 758 ( "<@" PGNSP PGUID b f f 600 718 16 759 0 pt_contained_circle contsel contjoinsel ));
+DATA(insert OID = 758 ( "<@" PGNSP PGUID b f f 600 718 16 759 0 pt_contained_circle contsel contjoinsel "---"));
DESCR("is contained by");
-DATA(insert OID = 759 ( "@>" PGNSP PGUID b f f 718 600 16 758 0 circle_contain_pt contsel contjoinsel ));
+DATA(insert OID = 759 ( "@>" PGNSP PGUID b f f 718 600 16 758 0 circle_contain_pt contsel contjoinsel "---"));
DESCR("contains");
-DATA(insert OID = 773 ( "@" PGNSP PGUID l f f 0 23 23 0 0 int4abs - - ));
+DATA(insert OID = 773 ( "@" PGNSP PGUID l f f 0 23 23 0 0 int4abs - - "---"));
DESCR("absolute value");
/* additional operators for geometric types - thomas 1997-07-09 */
-DATA(insert OID = 792 ( "=" PGNSP PGUID b f f 602 602 16 792 0 path_n_eq eqsel eqjoinsel ));
+DATA(insert OID = 792 ( "=" PGNSP PGUID b f f 602 602 16 792 0 path_n_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 793 ( "<" PGNSP PGUID b f f 602 602 16 794 0 path_n_lt - - ));
+DATA(insert OID = 793 ( "<" PGNSP PGUID b f f 602 602 16 794 0 path_n_lt - - "---"));
DESCR("less than");
-DATA(insert OID = 794 ( ">" PGNSP PGUID b f f 602 602 16 793 0 path_n_gt - - ));
+DATA(insert OID = 794 ( ">" PGNSP PGUID b f f 602 602 16 793 0 path_n_gt - - "---"));
DESCR("greater than");
-DATA(insert OID = 795 ( "<=" PGNSP PGUID b f f 602 602 16 796 0 path_n_le - - ));
+DATA(insert OID = 795 ( "<=" PGNSP PGUID b f f 602 602 16 796 0 path_n_le - - "---"));
DESCR("less than or equal");
-DATA(insert OID = 796 ( ">=" PGNSP PGUID b f f 602 602 16 795 0 path_n_ge - - ));
+DATA(insert OID = 796 ( ">=" PGNSP PGUID b f f 602 602 16 795 0 path_n_ge - - "---"));
DESCR("greater than or equal");
-DATA(insert OID = 797 ( "#" PGNSP PGUID l f f 0 602 23 0 0 path_npoints - - ));
+DATA(insert OID = 797 ( "#" PGNSP PGUID l f f 0 602 23 0 0 path_npoints - - "---"));
DESCR("number of points");
-DATA(insert OID = 798 ( "?#" PGNSP PGUID b f f 602 602 16 0 0 path_inter - - ));
+DATA(insert OID = 798 ( "?#" PGNSP PGUID b f f 602 602 16 0 0 path_inter - - "---"));
DESCR("intersect");
-DATA(insert OID = 799 ( "@-@" PGNSP PGUID l f f 0 602 701 0 0 path_length - - ));
+DATA(insert OID = 799 ( "@-@" PGNSP PGUID l f f 0 602 701 0 0 path_length - - "---"));
DESCR("sum of path segment lengths");
-DATA(insert OID = 800 ( ">^" PGNSP PGUID b f f 603 603 16 0 0 box_above_eq positionsel positionjoinsel ));
+DATA(insert OID = 800 ( ">^" PGNSP PGUID b f f 603 603 16 0 0 box_above_eq positionsel positionjoinsel "---"));
DESCR("is above (allows touching)");
-DATA(insert OID = 801 ( "<^" PGNSP PGUID b f f 603 603 16 0 0 box_below_eq positionsel positionjoinsel ));
+DATA(insert OID = 801 ( "<^" PGNSP PGUID b f f 603 603 16 0 0 box_below_eq positionsel positionjoinsel "---"));
DESCR("is below (allows touching)");
-DATA(insert OID = 802 ( "?#" PGNSP PGUID b f f 603 603 16 0 0 box_overlap areasel areajoinsel ));
+DATA(insert OID = 802 ( "?#" PGNSP PGUID b f f 603 603 16 0 0 box_overlap areasel areajoinsel "---"));
DESCR("deprecated, use && instead");
-DATA(insert OID = 803 ( "#" PGNSP PGUID b f f 603 603 603 0 0 box_intersect - - ));
+DATA(insert OID = 803 ( "#" PGNSP PGUID b f f 603 603 603 0 0 box_intersect - - "---"));
DESCR("box intersection");
-DATA(insert OID = 804 ( "+" PGNSP PGUID b f f 603 600 603 0 0 box_add - - ));
+DATA(insert OID = 804 ( "+" PGNSP PGUID b f f 603 600 603 0 0 box_add - - "---"));
DESCR("add point to box (translate)");
-DATA(insert OID = 805 ( "-" PGNSP PGUID b f f 603 600 603 0 0 box_sub - - ));
+DATA(insert OID = 805 ( "-" PGNSP PGUID b f f 603 600 603 0 0 box_sub - - "---"));
DESCR("subtract point from box (translate)");
-DATA(insert OID = 806 ( "*" PGNSP PGUID b f f 603 600 603 0 0 box_mul - - ));
+DATA(insert OID = 806 ( "*" PGNSP PGUID b f f 603 600 603 0 0 box_mul - - "---"));
DESCR("multiply box by point (scale)");
-DATA(insert OID = 807 ( "/" PGNSP PGUID b f f 603 600 603 0 0 box_div - - ));
+DATA(insert OID = 807 ( "/" PGNSP PGUID b f f 603 600 603 0 0 box_div - - "---"));
DESCR("divide box by point (scale)");
-DATA(insert OID = 808 ( "?-" PGNSP PGUID b f f 600 600 16 808 0 point_horiz - - ));
+DATA(insert OID = 808 ( "?-" PGNSP PGUID b f f 600 600 16 808 0 point_horiz - - "---"));
DESCR("horizontally aligned");
-DATA(insert OID = 809 ( "?|" PGNSP PGUID b f f 600 600 16 809 0 point_vert - - ));
+DATA(insert OID = 809 ( "?|" PGNSP PGUID b f f 600 600 16 809 0 point_vert - - "---"));
DESCR("vertically aligned");
-DATA(insert OID = 811 ( "=" PGNSP PGUID b t f 704 704 16 811 812 tintervaleq eqsel eqjoinsel ));
+DATA(insert OID = 811 ( "=" PGNSP PGUID b t f 704 704 16 811 812 tintervaleq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 812 ( "<>" PGNSP PGUID b f f 704 704 16 812 811 tintervalne neqsel neqjoinsel ));
+DATA(insert OID = 812 ( "<>" PGNSP PGUID b f f 704 704 16 812 811 tintervalne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 813 ( "<" PGNSP PGUID b f f 704 704 16 814 816 tintervallt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 813 ( "<" PGNSP PGUID b f f 704 704 16 814 816 tintervallt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 814 ( ">" PGNSP PGUID b f f 704 704 16 813 815 tintervalgt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 814 ( ">" PGNSP PGUID b f f 704 704 16 813 815 tintervalgt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 815 ( "<=" PGNSP PGUID b f f 704 704 16 816 814 tintervalle scalarltsel scalarltjoinsel ));
+DATA(insert OID = 815 ( "<=" PGNSP PGUID b f f 704 704 16 816 814 tintervalle scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 816 ( ">=" PGNSP PGUID b f f 704 704 16 815 813 tintervalge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 816 ( ">=" PGNSP PGUID b f f 704 704 16 815 813 tintervalge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 843 ( "*" PGNSP PGUID b f f 790 700 790 845 0 cash_mul_flt4 - - ));
+DATA(insert OID = 843 ( "*" PGNSP PGUID b f f 790 700 790 845 0 cash_mul_flt4 - - "---"));
DESCR("multiply");
-DATA(insert OID = 844 ( "/" PGNSP PGUID b f f 790 700 790 0 0 cash_div_flt4 - - ));
+DATA(insert OID = 844 ( "/" PGNSP PGUID b f f 790 700 790 0 0 cash_div_flt4 - - "---"));
DESCR("divide");
-DATA(insert OID = 845 ( "*" PGNSP PGUID b f f 700 790 790 843 0 flt4_mul_cash - - ));
+DATA(insert OID = 845 ( "*" PGNSP PGUID b f f 700 790 790 843 0 flt4_mul_cash - - "---"));
DESCR("multiply");
-DATA(insert OID = 900 ( "=" PGNSP PGUID b t f 790 790 16 900 901 cash_eq eqsel eqjoinsel ));
+DATA(insert OID = 900 ( "=" PGNSP PGUID b t f 790 790 16 900 901 cash_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 901 ( "<>" PGNSP PGUID b f f 790 790 16 901 900 cash_ne neqsel neqjoinsel ));
+DATA(insert OID = 901 ( "<>" PGNSP PGUID b f f 790 790 16 901 900 cash_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 902 ( "<" PGNSP PGUID b f f 790 790 16 903 905 cash_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 902 ( "<" PGNSP PGUID b f f 790 790 16 903 905 cash_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 903 ( ">" PGNSP PGUID b f f 790 790 16 902 904 cash_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 903 ( ">" PGNSP PGUID b f f 790 790 16 902 904 cash_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 904 ( "<=" PGNSP PGUID b f f 790 790 16 905 903 cash_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 904 ( "<=" PGNSP PGUID b f f 790 790 16 905 903 cash_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 905 ( ">=" PGNSP PGUID b f f 790 790 16 904 902 cash_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 905 ( ">=" PGNSP PGUID b f f 790 790 16 904 902 cash_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 906 ( "+" PGNSP PGUID b f f 790 790 790 906 0 cash_pl - - ));
+DATA(insert OID = 906 ( "+" PGNSP PGUID b f f 790 790 790 906 0 cash_pl - - "---"));
DESCR("add");
-DATA(insert OID = 907 ( "-" PGNSP PGUID b f f 790 790 790 0 0 cash_mi - - ));
+DATA(insert OID = 907 ( "-" PGNSP PGUID b f f 790 790 790 0 0 cash_mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 908 ( "*" PGNSP PGUID b f f 790 701 790 916 0 cash_mul_flt8 - - ));
+DATA(insert OID = 908 ( "*" PGNSP PGUID b f f 790 701 790 916 0 cash_mul_flt8 - - "---"));
DESCR("multiply");
-DATA(insert OID = 909 ( "/" PGNSP PGUID b f f 790 701 790 0 0 cash_div_flt8 - - ));
+DATA(insert OID = 909 ( "/" PGNSP PGUID b f f 790 701 790 0 0 cash_div_flt8 - - "---"));
DESCR("divide");
-DATA(insert OID = 912 ( "*" PGNSP PGUID b f f 790 23 790 917 0 cash_mul_int4 - - ));
+DATA(insert OID = 912 ( "*" PGNSP PGUID b f f 790 23 790 917 0 cash_mul_int4 - - "---"));
DESCR("multiply");
-DATA(insert OID = 913 ( "/" PGNSP PGUID b f f 790 23 790 0 0 cash_div_int4 - - ));
+DATA(insert OID = 913 ( "/" PGNSP PGUID b f f 790 23 790 0 0 cash_div_int4 - - "---"));
DESCR("divide");
-DATA(insert OID = 914 ( "*" PGNSP PGUID b f f 790 21 790 918 0 cash_mul_int2 - - ));
+DATA(insert OID = 914 ( "*" PGNSP PGUID b f f 790 21 790 918 0 cash_mul_int2 - - "---"));
DESCR("multiply");
-DATA(insert OID = 915 ( "/" PGNSP PGUID b f f 790 21 790 0 0 cash_div_int2 - - ));
+DATA(insert OID = 915 ( "/" PGNSP PGUID b f f 790 21 790 0 0 cash_div_int2 - - "---"));
DESCR("divide");
-DATA(insert OID = 916 ( "*" PGNSP PGUID b f f 701 790 790 908 0 flt8_mul_cash - - ));
+DATA(insert OID = 916 ( "*" PGNSP PGUID b f f 701 790 790 908 0 flt8_mul_cash - - "---"));
DESCR("multiply");
-DATA(insert OID = 917 ( "*" PGNSP PGUID b f f 23 790 790 912 0 int4_mul_cash - - ));
+DATA(insert OID = 917 ( "*" PGNSP PGUID b f f 23 790 790 912 0 int4_mul_cash - - "---"));
DESCR("multiply");
-DATA(insert OID = 918 ( "*" PGNSP PGUID b f f 21 790 790 914 0 int2_mul_cash - - ));
+DATA(insert OID = 918 ( "*" PGNSP PGUID b f f 21 790 790 914 0 int2_mul_cash - - "---"));
DESCR("multiply");
-DATA(insert OID = 3825 ( "/" PGNSP PGUID b f f 790 790 701 0 0 cash_div_cash - - ));
+DATA(insert OID = 3825 ( "/" PGNSP PGUID b f f 790 790 701 0 0 cash_div_cash - - "---"));
DESCR("divide");
-DATA(insert OID = 965 ( "^" PGNSP PGUID b f f 701 701 701 0 0 dpow - - ));
+DATA(insert OID = 965 ( "^" PGNSP PGUID b f f 701 701 701 0 0 dpow - - "---"));
DESCR("exponentiation");
-DATA(insert OID = 966 ( "+" PGNSP PGUID b f f 1034 1033 1034 0 0 aclinsert - - ));
+DATA(insert OID = 966 ( "+" PGNSP PGUID b f f 1034 1033 1034 0 0 aclinsert - - "---"));
DESCR("add/update ACL item");
-DATA(insert OID = 967 ( "-" PGNSP PGUID b f f 1034 1033 1034 0 0 aclremove - - ));
+DATA(insert OID = 967 ( "-" PGNSP PGUID b f f 1034 1033 1034 0 0 aclremove - - "---"));
DESCR("remove ACL item");
-DATA(insert OID = 968 ( "@>" PGNSP PGUID b f f 1034 1033 16 0 0 aclcontains - - ));
+DATA(insert OID = 968 ( "@>" PGNSP PGUID b f f 1034 1033 16 0 0 aclcontains - - "---"));
DESCR("contains");
-DATA(insert OID = 974 ( "=" PGNSP PGUID b f t 1033 1033 16 974 0 aclitemeq eqsel eqjoinsel ));
+DATA(insert OID = 974 ( "=" PGNSP PGUID b f t 1033 1033 16 974 0 aclitemeq eqsel eqjoinsel "mhf"));
DESCR("equal");
/* additional geometric operators - thomas 1997-07-09 */
-DATA(insert OID = 969 ( "@@" PGNSP PGUID l f f 0 601 600 0 0 lseg_center - - ));
+DATA(insert OID = 969 ( "@@" PGNSP PGUID l f f 0 601 600 0 0 lseg_center - - "---"));
DESCR("center of");
-DATA(insert OID = 970 ( "@@" PGNSP PGUID l f f 0 602 600 0 0 path_center - - ));
+DATA(insert OID = 970 ( "@@" PGNSP PGUID l f f 0 602 600 0 0 path_center - - "---"));
DESCR("center of");
-DATA(insert OID = 971 ( "@@" PGNSP PGUID l f f 0 604 600 0 0 poly_center - - ));
+DATA(insert OID = 971 ( "@@" PGNSP PGUID l f f 0 604 600 0 0 poly_center - - "---"));
DESCR("center of");
-DATA(insert OID = 1054 ( "=" PGNSP PGUID b t t 1042 1042 16 1054 1057 bpchareq eqsel eqjoinsel ));
+DATA(insert OID = 1054 ( "=" PGNSP PGUID b t t 1042 1042 16 1054 1057 bpchareq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1055 ( "~" PGNSP PGUID b f f 1042 25 16 0 1056 bpcharregexeq regexeqsel regexeqjoinsel ));
+DATA(insert OID = 1055 ( "~" PGNSP PGUID b f f 1042 25 16 0 1056 bpcharregexeq regexeqsel regexeqjoinsel "mhf"));
DESCR("matches regular expression, case-sensitive");
#define OID_BPCHAR_REGEXEQ_OP 1055
-DATA(insert OID = 1056 ( "!~" PGNSP PGUID b f f 1042 25 16 0 1055 bpcharregexne regexnesel regexnejoinsel ));
+DATA(insert OID = 1056 ( "!~" PGNSP PGUID b f f 1042 25 16 0 1055 bpcharregexne regexnesel regexnejoinsel "---"));
DESCR("does not match regular expression, case-sensitive");
-DATA(insert OID = 1057 ( "<>" PGNSP PGUID b f f 1042 1042 16 1057 1054 bpcharne neqsel neqjoinsel ));
+DATA(insert OID = 1057 ( "<>" PGNSP PGUID b f f 1042 1042 16 1057 1054 bpcharne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1058 ( "<" PGNSP PGUID b f f 1042 1042 16 1060 1061 bpcharlt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1058 ( "<" PGNSP PGUID b f f 1042 1042 16 1060 1061 bpcharlt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1059 ( "<=" PGNSP PGUID b f f 1042 1042 16 1061 1060 bpcharle scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1059 ( "<=" PGNSP PGUID b f f 1042 1042 16 1061 1060 bpcharle scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1060 ( ">" PGNSP PGUID b f f 1042 1042 16 1058 1059 bpchargt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1060 ( ">" PGNSP PGUID b f f 1042 1042 16 1058 1059 bpchargt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1061 ( ">=" PGNSP PGUID b f f 1042 1042 16 1059 1058 bpcharge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1061 ( ">=" PGNSP PGUID b f f 1042 1042 16 1059 1058 bpcharge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/* generic array comparison operators */
-DATA(insert OID = 1070 ( "=" PGNSP PGUID b t t 2277 2277 16 1070 1071 array_eq eqsel eqjoinsel ));
+DATA(insert OID = 1070 ( "=" PGNSP PGUID b t t 2277 2277 16 1070 1071 array_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
#define ARRAY_EQ_OP 1070
-DATA(insert OID = 1071 ( "<>" PGNSP PGUID b f f 2277 2277 16 1071 1070 array_ne neqsel neqjoinsel ));
+DATA(insert OID = 1071 ( "<>" PGNSP PGUID b f f 2277 2277 16 1071 1070 array_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1072 ( "<" PGNSP PGUID b f f 2277 2277 16 1073 1075 array_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1072 ( "<" PGNSP PGUID b f f 2277 2277 16 1073 1075 array_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
#define ARRAY_LT_OP 1072
-DATA(insert OID = 1073 ( ">" PGNSP PGUID b f f 2277 2277 16 1072 1074 array_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1073 ( ">" PGNSP PGUID b f f 2277 2277 16 1072 1074 array_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
#define ARRAY_GT_OP 1073
-DATA(insert OID = 1074 ( "<=" PGNSP PGUID b f f 2277 2277 16 1075 1073 array_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1074 ( "<=" PGNSP PGUID b f f 2277 2277 16 1075 1073 array_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1075 ( ">=" PGNSP PGUID b f f 2277 2277 16 1074 1072 array_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1075 ( ">=" PGNSP PGUID b f f 2277 2277 16 1074 1072 array_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/* date operators */
-DATA(insert OID = 1076 ( "+" PGNSP PGUID b f f 1082 1186 1114 2551 0 date_pl_interval - - ));
+DATA(insert OID = 1076 ( "+" PGNSP PGUID b f f 1082 1186 1114 2551 0 date_pl_interval - - "---"));
DESCR("add");
-DATA(insert OID = 1077 ( "-" PGNSP PGUID b f f 1082 1186 1114 0 0 date_mi_interval - - ));
+DATA(insert OID = 1077 ( "-" PGNSP PGUID b f f 1082 1186 1114 0 0 date_mi_interval - - "---"));
DESCR("subtract");
-DATA(insert OID = 1093 ( "=" PGNSP PGUID b t t 1082 1082 16 1093 1094 date_eq eqsel eqjoinsel ));
+DATA(insert OID = 1093 ( "=" PGNSP PGUID b t t 1082 1082 16 1093 1094 date_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1094 ( "<>" PGNSP PGUID b f f 1082 1082 16 1094 1093 date_ne neqsel neqjoinsel ));
+DATA(insert OID = 1094 ( "<>" PGNSP PGUID b f f 1082 1082 16 1094 1093 date_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1095 ( "<" PGNSP PGUID b f f 1082 1082 16 1097 1098 date_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1095 ( "<" PGNSP PGUID b f f 1082 1082 16 1097 1098 date_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1096 ( "<=" PGNSP PGUID b f f 1082 1082 16 1098 1097 date_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1096 ( "<=" PGNSP PGUID b f f 1082 1082 16 1098 1097 date_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1097 ( ">" PGNSP PGUID b f f 1082 1082 16 1095 1096 date_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1097 ( ">" PGNSP PGUID b f f 1082 1082 16 1095 1096 date_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1098 ( ">=" PGNSP PGUID b f f 1082 1082 16 1096 1095 date_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1098 ( ">=" PGNSP PGUID b f f 1082 1082 16 1096 1095 date_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 1099 ( "-" PGNSP PGUID b f f 1082 1082 23 0 0 date_mi - - ));
+DATA(insert OID = 1099 ( "-" PGNSP PGUID b f f 1082 1082 23 0 0 date_mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 1100 ( "+" PGNSP PGUID b f f 1082 23 1082 2555 0 date_pli - - ));
+DATA(insert OID = 1100 ( "+" PGNSP PGUID b f f 1082 23 1082 2555 0 date_pli - - "---"));
DESCR("add");
-DATA(insert OID = 1101 ( "-" PGNSP PGUID b f f 1082 23 1082 0 0 date_mii - - ));
+DATA(insert OID = 1101 ( "-" PGNSP PGUID b f f 1082 23 1082 0 0 date_mii - - "---"));
DESCR("subtract");
/* time operators */
-DATA(insert OID = 1108 ( "=" PGNSP PGUID b t t 1083 1083 16 1108 1109 time_eq eqsel eqjoinsel ));
+DATA(insert OID = 1108 ( "=" PGNSP PGUID b t t 1083 1083 16 1108 1109 time_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1109 ( "<>" PGNSP PGUID b f f 1083 1083 16 1109 1108 time_ne neqsel neqjoinsel ));
+DATA(insert OID = 1109 ( "<>" PGNSP PGUID b f f 1083 1083 16 1109 1108 time_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1110 ( "<" PGNSP PGUID b f f 1083 1083 16 1112 1113 time_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1110 ( "<" PGNSP PGUID b f f 1083 1083 16 1112 1113 time_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1111 ( "<=" PGNSP PGUID b f f 1083 1083 16 1113 1112 time_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1111 ( "<=" PGNSP PGUID b f f 1083 1083 16 1113 1112 time_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1112 ( ">" PGNSP PGUID b f f 1083 1083 16 1110 1111 time_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1112 ( ">" PGNSP PGUID b f f 1083 1083 16 1110 1111 time_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1113 ( ">=" PGNSP PGUID b f f 1083 1083 16 1111 1110 time_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1113 ( ">=" PGNSP PGUID b f f 1083 1083 16 1111 1110 time_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/* timetz operators */
-DATA(insert OID = 1550 ( "=" PGNSP PGUID b t t 1266 1266 16 1550 1551 timetz_eq eqsel eqjoinsel ));
+DATA(insert OID = 1550 ( "=" PGNSP PGUID b t t 1266 1266 16 1550 1551 timetz_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1551 ( "<>" PGNSP PGUID b f f 1266 1266 16 1551 1550 timetz_ne neqsel neqjoinsel ));
+DATA(insert OID = 1551 ( "<>" PGNSP PGUID b f f 1266 1266 16 1551 1550 timetz_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1552 ( "<" PGNSP PGUID b f f 1266 1266 16 1554 1555 timetz_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1552 ( "<" PGNSP PGUID b f f 1266 1266 16 1554 1555 timetz_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1553 ( "<=" PGNSP PGUID b f f 1266 1266 16 1555 1554 timetz_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1553 ( "<=" PGNSP PGUID b f f 1266 1266 16 1555 1554 timetz_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1554 ( ">" PGNSP PGUID b f f 1266 1266 16 1552 1553 timetz_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1554 ( ">" PGNSP PGUID b f f 1266 1266 16 1552 1553 timetz_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1555 ( ">=" PGNSP PGUID b f f 1266 1266 16 1553 1552 timetz_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1555 ( ">=" PGNSP PGUID b f f 1266 1266 16 1553 1552 timetz_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/* float48 operators */
-DATA(insert OID = 1116 ( "+" PGNSP PGUID b f f 700 701 701 1126 0 float48pl - - ));
+DATA(insert OID = 1116 ( "+" PGNSP PGUID b f f 700 701 701 1126 0 float48pl - - "---"));
DESCR("add");
-DATA(insert OID = 1117 ( "-" PGNSP PGUID b f f 700 701 701 0 0 float48mi - - ));
+DATA(insert OID = 1117 ( "-" PGNSP PGUID b f f 700 701 701 0 0 float48mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 1118 ( "/" PGNSP PGUID b f f 700 701 701 0 0 float48div - - ));
+DATA(insert OID = 1118 ( "/" PGNSP PGUID b f f 700 701 701 0 0 float48div - - "---"));
DESCR("divide");
-DATA(insert OID = 1119 ( "*" PGNSP PGUID b f f 700 701 701 1129 0 float48mul - - ));
+DATA(insert OID = 1119 ( "*" PGNSP PGUID b f f 700 701 701 1129 0 float48mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 1120 ( "=" PGNSP PGUID b t t 700 701 16 1130 1121 float48eq eqsel eqjoinsel ));
+DATA(insert OID = 1120 ( "=" PGNSP PGUID b t t 700 701 16 1130 1121 float48eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1121 ( "<>" PGNSP PGUID b f f 700 701 16 1131 1120 float48ne neqsel neqjoinsel ));
+DATA(insert OID = 1121 ( "<>" PGNSP PGUID b f f 700 701 16 1131 1120 float48ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1122 ( "<" PGNSP PGUID b f f 700 701 16 1133 1125 float48lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1122 ( "<" PGNSP PGUID b f f 700 701 16 1133 1125 float48lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1123 ( ">" PGNSP PGUID b f f 700 701 16 1132 1124 float48gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1123 ( ">" PGNSP PGUID b f f 700 701 16 1132 1124 float48gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1124 ( "<=" PGNSP PGUID b f f 700 701 16 1135 1123 float48le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1124 ( "<=" PGNSP PGUID b f f 700 701 16 1135 1123 float48le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1125 ( ">=" PGNSP PGUID b f f 700 701 16 1134 1122 float48ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1125 ( ">=" PGNSP PGUID b f f 700 701 16 1134 1122 float48ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/* float84 operators */
-DATA(insert OID = 1126 ( "+" PGNSP PGUID b f f 701 700 701 1116 0 float84pl - - ));
+DATA(insert OID = 1126 ( "+" PGNSP PGUID b f f 701 700 701 1116 0 float84pl - - "---"));
DESCR("add");
-DATA(insert OID = 1127 ( "-" PGNSP PGUID b f f 701 700 701 0 0 float84mi - - ));
+DATA(insert OID = 1127 ( "-" PGNSP PGUID b f f 701 700 701 0 0 float84mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 1128 ( "/" PGNSP PGUID b f f 701 700 701 0 0 float84div - - ));
+DATA(insert OID = 1128 ( "/" PGNSP PGUID b f f 701 700 701 0 0 float84div - - "---"));
DESCR("divide");
-DATA(insert OID = 1129 ( "*" PGNSP PGUID b f f 701 700 701 1119 0 float84mul - - ));
+DATA(insert OID = 1129 ( "*" PGNSP PGUID b f f 701 700 701 1119 0 float84mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 1130 ( "=" PGNSP PGUID b t t 701 700 16 1120 1131 float84eq eqsel eqjoinsel ));
+DATA(insert OID = 1130 ( "=" PGNSP PGUID b t t 701 700 16 1120 1131 float84eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1131 ( "<>" PGNSP PGUID b f f 701 700 16 1121 1130 float84ne neqsel neqjoinsel ));
+DATA(insert OID = 1131 ( "<>" PGNSP PGUID b f f 701 700 16 1121 1130 float84ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1132 ( "<" PGNSP PGUID b f f 701 700 16 1123 1135 float84lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1132 ( "<" PGNSP PGUID b f f 701 700 16 1123 1135 float84lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1133 ( ">" PGNSP PGUID b f f 701 700 16 1122 1134 float84gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1133 ( ">" PGNSP PGUID b f f 701 700 16 1122 1134 float84gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1134 ( "<=" PGNSP PGUID b f f 701 700 16 1125 1133 float84le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1134 ( "<=" PGNSP PGUID b f f 701 700 16 1125 1133 float84le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1135 ( ">=" PGNSP PGUID b f f 701 700 16 1124 1132 float84ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1135 ( ">=" PGNSP PGUID b f f 701 700 16 1124 1132 float84ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/* LIKE hacks by Keith Parks. */
-DATA(insert OID = 1207 ( "~~" PGNSP PGUID b f f 19 25 16 0 1208 namelike likesel likejoinsel ));
+DATA(insert OID = 1207 ( "~~" PGNSP PGUID b f f 19 25 16 0 1208 namelike likesel likejoinsel "---"));
DESCR("matches LIKE expression");
#define OID_NAME_LIKE_OP 1207
-DATA(insert OID = 1208 ( "!~~" PGNSP PGUID b f f 19 25 16 0 1207 namenlike nlikesel nlikejoinsel ));
+DATA(insert OID = 1208 ( "!~~" PGNSP PGUID b f f 19 25 16 0 1207 namenlike nlikesel nlikejoinsel "---"));
DESCR("does not match LIKE expression");
-DATA(insert OID = 1209 ( "~~" PGNSP PGUID b f f 25 25 16 0 1210 textlike likesel likejoinsel ));
+DATA(insert OID = 1209 ( "~~" PGNSP PGUID b f f 25 25 16 0 1210 textlike likesel likejoinsel "---"));
DESCR("matches LIKE expression");
#define OID_TEXT_LIKE_OP 1209
-DATA(insert OID = 1210 ( "!~~" PGNSP PGUID b f f 25 25 16 0 1209 textnlike nlikesel nlikejoinsel ));
+DATA(insert OID = 1210 ( "!~~" PGNSP PGUID b f f 25 25 16 0 1209 textnlike nlikesel nlikejoinsel "---"));
DESCR("does not match LIKE expression");
-DATA(insert OID = 1211 ( "~~" PGNSP PGUID b f f 1042 25 16 0 1212 bpcharlike likesel likejoinsel ));
+DATA(insert OID = 1211 ( "~~" PGNSP PGUID b f f 1042 25 16 0 1212 bpcharlike likesel likejoinsel "---"));
DESCR("matches LIKE expression");
#define OID_BPCHAR_LIKE_OP 1211
-DATA(insert OID = 1212 ( "!~~" PGNSP PGUID b f f 1042 25 16 0 1211 bpcharnlike nlikesel nlikejoinsel ));
+DATA(insert OID = 1212 ( "!~~" PGNSP PGUID b f f 1042 25 16 0 1211 bpcharnlike nlikesel nlikejoinsel "---"));
DESCR("does not match LIKE expression");
/* case-insensitive regex hacks */
-DATA(insert OID = 1226 ( "~*" PGNSP PGUID b f f 19 25 16 0 1227 nameicregexeq icregexeqsel icregexeqjoinsel ));
+DATA(insert OID = 1226 ( "~*" PGNSP PGUID b f f 19 25 16 0 1227 nameicregexeq icregexeqsel icregexeqjoinsel "mhf"));
DESCR("matches regular expression, case-insensitive");
#define OID_NAME_ICREGEXEQ_OP 1226
-DATA(insert OID = 1227 ( "!~*" PGNSP PGUID b f f 19 25 16 0 1226 nameicregexne icregexnesel icregexnejoinsel ));
+DATA(insert OID = 1227 ( "!~*" PGNSP PGUID b f f 19 25 16 0 1226 nameicregexne icregexnesel icregexnejoinsel "---"));
DESCR("does not match regular expression, case-insensitive");
-DATA(insert OID = 1228 ( "~*" PGNSP PGUID b f f 25 25 16 0 1229 texticregexeq icregexeqsel icregexeqjoinsel ));
+DATA(insert OID = 1228 ( "~*" PGNSP PGUID b f f 25 25 16 0 1229 texticregexeq icregexeqsel icregexeqjoinsel "mhf"));
DESCR("matches regular expression, case-insensitive");
#define OID_TEXT_ICREGEXEQ_OP 1228
-DATA(insert OID = 1229 ( "!~*" PGNSP PGUID b f f 25 25 16 0 1228 texticregexne icregexnesel icregexnejoinsel ));
+DATA(insert OID = 1229 ( "!~*" PGNSP PGUID b f f 25 25 16 0 1228 texticregexne icregexnesel icregexnejoinsel "---"));
DESCR("does not match regular expression, case-insensitive");
-DATA(insert OID = 1234 ( "~*" PGNSP PGUID b f f 1042 25 16 0 1235 bpcharicregexeq icregexeqsel icregexeqjoinsel ));
+DATA(insert OID = 1234 ( "~*" PGNSP PGUID b f f 1042 25 16 0 1235 bpcharicregexeq icregexeqsel icregexeqjoinsel "mhf"));
DESCR("matches regular expression, case-insensitive");
#define OID_BPCHAR_ICREGEXEQ_OP 1234
-DATA(insert OID = 1235 ( "!~*" PGNSP PGUID b f f 1042 25 16 0 1234 bpcharicregexne icregexnesel icregexnejoinsel ));
+DATA(insert OID = 1235 ( "!~*" PGNSP PGUID b f f 1042 25 16 0 1234 bpcharicregexne icregexnesel icregexnejoinsel "---"));
DESCR("does not match regular expression, case-insensitive");
/* timestamptz operators */
-DATA(insert OID = 1320 ( "=" PGNSP PGUID b t t 1184 1184 16 1320 1321 timestamptz_eq eqsel eqjoinsel ));
+DATA(insert OID = 1320 ( "=" PGNSP PGUID b t t 1184 1184 16 1320 1321 timestamptz_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1321 ( "<>" PGNSP PGUID b f f 1184 1184 16 1321 1320 timestamptz_ne neqsel neqjoinsel ));
+DATA(insert OID = 1321 ( "<>" PGNSP PGUID b f f 1184 1184 16 1321 1320 timestamptz_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1322 ( "<" PGNSP PGUID b f f 1184 1184 16 1324 1325 timestamptz_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1322 ( "<" PGNSP PGUID b f f 1184 1184 16 1324 1325 timestamptz_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1323 ( "<=" PGNSP PGUID b f f 1184 1184 16 1325 1324 timestamptz_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1323 ( "<=" PGNSP PGUID b f f 1184 1184 16 1325 1324 timestamptz_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1324 ( ">" PGNSP PGUID b f f 1184 1184 16 1322 1323 timestamptz_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1324 ( ">" PGNSP PGUID b f f 1184 1184 16 1322 1323 timestamptz_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1325 ( ">=" PGNSP PGUID b f f 1184 1184 16 1323 1322 timestamptz_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1325 ( ">=" PGNSP PGUID b f f 1184 1184 16 1323 1322 timestamptz_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 1327 ( "+" PGNSP PGUID b f f 1184 1186 1184 2554 0 timestamptz_pl_interval - - ));
+DATA(insert OID = 1327 ( "+" PGNSP PGUID b f f 1184 1186 1184 2554 0 timestamptz_pl_interval - - "---"));
DESCR("add");
-DATA(insert OID = 1328 ( "-" PGNSP PGUID b f f 1184 1184 1186 0 0 timestamptz_mi - - ));
+DATA(insert OID = 1328 ( "-" PGNSP PGUID b f f 1184 1184 1186 0 0 timestamptz_mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 1329 ( "-" PGNSP PGUID b f f 1184 1186 1184 0 0 timestamptz_mi_interval - - ));
+DATA(insert OID = 1329 ( "-" PGNSP PGUID b f f 1184 1186 1184 0 0 timestamptz_mi_interval - - "---"));
DESCR("subtract");
/* interval operators */
-DATA(insert OID = 1330 ( "=" PGNSP PGUID b t t 1186 1186 16 1330 1331 interval_eq eqsel eqjoinsel ));
+DATA(insert OID = 1330 ( "=" PGNSP PGUID b t t 1186 1186 16 1330 1331 interval_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1331 ( "<>" PGNSP PGUID b f f 1186 1186 16 1331 1330 interval_ne neqsel neqjoinsel ));
+DATA(insert OID = 1331 ( "<>" PGNSP PGUID b f f 1186 1186 16 1331 1330 interval_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1332 ( "<" PGNSP PGUID b f f 1186 1186 16 1334 1335 interval_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1332 ( "<" PGNSP PGUID b f f 1186 1186 16 1334 1335 interval_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1333 ( "<=" PGNSP PGUID b f f 1186 1186 16 1335 1334 interval_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1333 ( "<=" PGNSP PGUID b f f 1186 1186 16 1335 1334 interval_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1334 ( ">" PGNSP PGUID b f f 1186 1186 16 1332 1333 interval_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1334 ( ">" PGNSP PGUID b f f 1186 1186 16 1332 1333 interval_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1335 ( ">=" PGNSP PGUID b f f 1186 1186 16 1333 1332 interval_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1335 ( ">=" PGNSP PGUID b f f 1186 1186 16 1333 1332 interval_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 1336 ( "-" PGNSP PGUID l f f 0 1186 1186 0 0 interval_um - - ));
+DATA(insert OID = 1336 ( "-" PGNSP PGUID l f f 0 1186 1186 0 0 interval_um - - "---"));
DESCR("negate");
-DATA(insert OID = 1337 ( "+" PGNSP PGUID b f f 1186 1186 1186 1337 0 interval_pl - - ));
+DATA(insert OID = 1337 ( "+" PGNSP PGUID b f f 1186 1186 1186 1337 0 interval_pl - - "---"));
DESCR("add");
-DATA(insert OID = 1338 ( "-" PGNSP PGUID b f f 1186 1186 1186 0 0 interval_mi - - ));
+DATA(insert OID = 1338 ( "-" PGNSP PGUID b f f 1186 1186 1186 0 0 interval_mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 1360 ( "+" PGNSP PGUID b f f 1082 1083 1114 1363 0 datetime_pl - - ));
+DATA(insert OID = 1360 ( "+" PGNSP PGUID b f f 1082 1083 1114 1363 0 datetime_pl - - "---"));
DESCR("convert date and time to timestamp");
-DATA(insert OID = 1361 ( "+" PGNSP PGUID b f f 1082 1266 1184 1366 0 datetimetz_pl - - ));
+DATA(insert OID = 1361 ( "+" PGNSP PGUID b f f 1082 1266 1184 1366 0 datetimetz_pl - - "---"));
DESCR("convert date and time with time zone to timestamp with time zone");
-DATA(insert OID = 1363 ( "+" PGNSP PGUID b f f 1083 1082 1114 1360 0 timedate_pl - - ));
+DATA(insert OID = 1363 ( "+" PGNSP PGUID b f f 1083 1082 1114 1360 0 timedate_pl - - "---"));
DESCR("convert time and date to timestamp");
-DATA(insert OID = 1366 ( "+" PGNSP PGUID b f f 1266 1082 1184 1361 0 timetzdate_pl - - ));
+DATA(insert OID = 1366 ( "+" PGNSP PGUID b f f 1266 1082 1184 1361 0 timetzdate_pl - - "---"));
DESCR("convert time with time zone and date to timestamp with time zone");
-DATA(insert OID = 1399 ( "-" PGNSP PGUID b f f 1083 1083 1186 0 0 time_mi_time - - ));
+DATA(insert OID = 1399 ( "-" PGNSP PGUID b f f 1083 1083 1186 0 0 time_mi_time - - "---"));
DESCR("subtract");
/* additional geometric operators - thomas 97/04/18 */
-DATA(insert OID = 1420 ( "@@" PGNSP PGUID l f f 0 718 600 0 0 circle_center - - ));
+DATA(insert OID = 1420 ( "@@" PGNSP PGUID l f f 0 718 600 0 0 circle_center - - "---"));
DESCR("center of");
-DATA(insert OID = 1500 ( "=" PGNSP PGUID b f f 718 718 16 1500 1501 circle_eq eqsel eqjoinsel ));
+DATA(insert OID = 1500 ( "=" PGNSP PGUID b f f 718 718 16 1500 1501 circle_eq eqsel eqjoinsel "mhf"));
DESCR("equal by area");
-DATA(insert OID = 1501 ( "<>" PGNSP PGUID b f f 718 718 16 1501 1500 circle_ne neqsel neqjoinsel ));
+DATA(insert OID = 1501 ( "<>" PGNSP PGUID b f f 718 718 16 1501 1500 circle_ne neqsel neqjoinsel "mhf"));
DESCR("not equal by area");
-DATA(insert OID = 1502 ( "<" PGNSP PGUID b f f 718 718 16 1503 1505 circle_lt areasel areajoinsel ));
+DATA(insert OID = 1502 ( "<" PGNSP PGUID b f f 718 718 16 1503 1505 circle_lt areasel areajoinsel "---"));
DESCR("less than by area");
-DATA(insert OID = 1503 ( ">" PGNSP PGUID b f f 718 718 16 1502 1504 circle_gt areasel areajoinsel ));
+DATA(insert OID = 1503 ( ">" PGNSP PGUID b f f 718 718 16 1502 1504 circle_gt areasel areajoinsel "---"));
DESCR("greater than by area");
-DATA(insert OID = 1504 ( "<=" PGNSP PGUID b f f 718 718 16 1505 1503 circle_le areasel areajoinsel ));
+DATA(insert OID = 1504 ( "<=" PGNSP PGUID b f f 718 718 16 1505 1503 circle_le areasel areajoinsel "---"));
DESCR("less than or equal by area");
-DATA(insert OID = 1505 ( ">=" PGNSP PGUID b f f 718 718 16 1504 1502 circle_ge areasel areajoinsel ));
+DATA(insert OID = 1505 ( ">=" PGNSP PGUID b f f 718 718 16 1504 1502 circle_ge areasel areajoinsel "---"));
DESCR("greater than or equal by area");
-DATA(insert OID = 1506 ( "<<" PGNSP PGUID b f f 718 718 16 0 0 circle_left positionsel positionjoinsel ));
+DATA(insert OID = 1506 ( "<<" PGNSP PGUID b f f 718 718 16 0 0 circle_left positionsel positionjoinsel "---"));
DESCR("is left of");
-DATA(insert OID = 1507 ( "&<" PGNSP PGUID b f f 718 718 16 0 0 circle_overleft positionsel positionjoinsel ));
+DATA(insert OID = 1507 ( "&<" PGNSP PGUID b f f 718 718 16 0 0 circle_overleft positionsel positionjoinsel "---"));
DESCR("overlaps or is left of");
-DATA(insert OID = 1508 ( "&>" PGNSP PGUID b f f 718 718 16 0 0 circle_overright positionsel positionjoinsel ));
+DATA(insert OID = 1508 ( "&>" PGNSP PGUID b f f 718 718 16 0 0 circle_overright positionsel positionjoinsel "---"));
DESCR("overlaps or is right of");
-DATA(insert OID = 1509 ( ">>" PGNSP PGUID b f f 718 718 16 0 0 circle_right positionsel positionjoinsel ));
+DATA(insert OID = 1509 ( ">>" PGNSP PGUID b f f 718 718 16 0 0 circle_right positionsel positionjoinsel "---"));
DESCR("is right of");
-DATA(insert OID = 1510 ( "<@" PGNSP PGUID b f f 718 718 16 1511 0 circle_contained contsel contjoinsel ));
+DATA(insert OID = 1510 ( "<@" PGNSP PGUID b f f 718 718 16 1511 0 circle_contained contsel contjoinsel "---"));
DESCR("is contained by");
-DATA(insert OID = 1511 ( "@>" PGNSP PGUID b f f 718 718 16 1510 0 circle_contain contsel contjoinsel ));
+DATA(insert OID = 1511 ( "@>" PGNSP PGUID b f f 718 718 16 1510 0 circle_contain contsel contjoinsel "---"));
DESCR("contains");
-DATA(insert OID = 1512 ( "~=" PGNSP PGUID b f f 718 718 16 1512 0 circle_same eqsel eqjoinsel ));
+DATA(insert OID = 1512 ( "~=" PGNSP PGUID b f f 718 718 16 1512 0 circle_same eqsel eqjoinsel "mhf"));
DESCR("same as");
-DATA(insert OID = 1513 ( "&&" PGNSP PGUID b f f 718 718 16 1513 0 circle_overlap areasel areajoinsel ));
+DATA(insert OID = 1513 ( "&&" PGNSP PGUID b f f 718 718 16 1513 0 circle_overlap areasel areajoinsel "---"));
DESCR("overlaps");
-DATA(insert OID = 1514 ( "|>>" PGNSP PGUID b f f 718 718 16 0 0 circle_above positionsel positionjoinsel ));
+DATA(insert OID = 1514 ( "|>>" PGNSP PGUID b f f 718 718 16 0 0 circle_above positionsel positionjoinsel "---"));
DESCR("is above");
-DATA(insert OID = 1515 ( "<<|" PGNSP PGUID b f f 718 718 16 0 0 circle_below positionsel positionjoinsel ));
+DATA(insert OID = 1515 ( "<<|" PGNSP PGUID b f f 718 718 16 0 0 circle_below positionsel positionjoinsel "---"));
DESCR("is below");
-DATA(insert OID = 1516 ( "+" PGNSP PGUID b f f 718 600 718 0 0 circle_add_pt - - ));
+DATA(insert OID = 1516 ( "+" PGNSP PGUID b f f 718 600 718 0 0 circle_add_pt - - "---"));
DESCR("add");
-DATA(insert OID = 1517 ( "-" PGNSP PGUID b f f 718 600 718 0 0 circle_sub_pt - - ));
+DATA(insert OID = 1517 ( "-" PGNSP PGUID b f f 718 600 718 0 0 circle_sub_pt - - "---"));
DESCR("subtract");
-DATA(insert OID = 1518 ( "*" PGNSP PGUID b f f 718 600 718 0 0 circle_mul_pt - - ));
+DATA(insert OID = 1518 ( "*" PGNSP PGUID b f f 718 600 718 0 0 circle_mul_pt - - "---"));
DESCR("multiply");
-DATA(insert OID = 1519 ( "/" PGNSP PGUID b f f 718 600 718 0 0 circle_div_pt - - ));
+DATA(insert OID = 1519 ( "/" PGNSP PGUID b f f 718 600 718 0 0 circle_div_pt - - "---"));
DESCR("divide");
-DATA(insert OID = 1520 ( "<->" PGNSP PGUID b f f 718 718 701 1520 0 circle_distance - - ));
+DATA(insert OID = 1520 ( "<->" PGNSP PGUID b f f 718 718 701 1520 0 circle_distance - - "---"));
DESCR("distance between");
-DATA(insert OID = 1521 ( "#" PGNSP PGUID l f f 0 604 23 0 0 poly_npoints - - ));
+DATA(insert OID = 1521 ( "#" PGNSP PGUID l f f 0 604 23 0 0 poly_npoints - - "---"));
DESCR("number of points");
-DATA(insert OID = 1522 ( "<->" PGNSP PGUID b f f 600 718 701 3291 0 dist_pc - - ));
+DATA(insert OID = 1522 ( "<->" PGNSP PGUID b f f 600 718 701 3291 0 dist_pc - - "---"));
DESCR("distance between");
-DATA(insert OID = 3291 ( "<->" PGNSP PGUID b f f 718 600 701 1522 0 dist_cpoint - - ));
+DATA(insert OID = 3291 ( "<->" PGNSP PGUID b f f 718 600 701 1522 0 dist_cpoint - - "---"));
DESCR("distance between");
-DATA(insert OID = 3276 ( "<->" PGNSP PGUID b f f 600 604 701 3289 0 dist_ppoly - - ));
+DATA(insert OID = 3276 ( "<->" PGNSP PGUID b f f 600 604 701 3289 0 dist_ppoly - - "---"));
DESCR("distance between");
-DATA(insert OID = 3289 ( "<->" PGNSP PGUID b f f 604 600 701 3276 0 dist_polyp - - ));
+DATA(insert OID = 3289 ( "<->" PGNSP PGUID b f f 604 600 701 3276 0 dist_polyp - - "---"));
DESCR("distance between");
-DATA(insert OID = 1523 ( "<->" PGNSP PGUID b f f 718 604 701 0 0 dist_cpoly - - ));
+DATA(insert OID = 1523 ( "<->" PGNSP PGUID b f f 718 604 701 0 0 dist_cpoly - - "---"));
DESCR("distance between");
/* additional geometric operators - thomas 1997-07-09 */
-DATA(insert OID = 1524 ( "<->" PGNSP PGUID b f f 628 603 701 0 0 dist_lb - - ));
+DATA(insert OID = 1524 ( "<->" PGNSP PGUID b f f 628 603 701 0 0 dist_lb - - "---"));
DESCR("distance between");
-DATA(insert OID = 1525 ( "?#" PGNSP PGUID b f f 601 601 16 1525 0 lseg_intersect - - ));
+DATA(insert OID = 1525 ( "?#" PGNSP PGUID b f f 601 601 16 1525 0 lseg_intersect - - "---"));
DESCR("intersect");
-DATA(insert OID = 1526 ( "?||" PGNSP PGUID b f f 601 601 16 1526 0 lseg_parallel - - ));
+DATA(insert OID = 1526 ( "?||" PGNSP PGUID b f f 601 601 16 1526 0 lseg_parallel - - "---"));
DESCR("parallel");
-DATA(insert OID = 1527 ( "?-|" PGNSP PGUID b f f 601 601 16 1527 0 lseg_perp - - ));
+DATA(insert OID = 1527 ( "?-|" PGNSP PGUID b f f 601 601 16 1527 0 lseg_perp - - "---"));
DESCR("perpendicular");
-DATA(insert OID = 1528 ( "?-" PGNSP PGUID l f f 0 601 16 0 0 lseg_horizontal - - ));
+DATA(insert OID = 1528 ( "?-" PGNSP PGUID l f f 0 601 16 0 0 lseg_horizontal - - "---"));
DESCR("horizontal");
-DATA(insert OID = 1529 ( "?|" PGNSP PGUID l f f 0 601 16 0 0 lseg_vertical - - ));
+DATA(insert OID = 1529 ( "?|" PGNSP PGUID l f f 0 601 16 0 0 lseg_vertical - - "---"));
DESCR("vertical");
-DATA(insert OID = 1535 ( "=" PGNSP PGUID b f f 601 601 16 1535 1586 lseg_eq eqsel eqjoinsel ));
+DATA(insert OID = 1535 ( "=" PGNSP PGUID b f f 601 601 16 1535 1586 lseg_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1536 ( "#" PGNSP PGUID b f f 601 601 600 1536 0 lseg_interpt - - ));
+DATA(insert OID = 1536 ( "#" PGNSP PGUID b f f 601 601 600 1536 0 lseg_interpt - - "---"));
DESCR("intersection point");
-DATA(insert OID = 1537 ( "?#" PGNSP PGUID b f f 601 628 16 0 0 inter_sl - - ));
+DATA(insert OID = 1537 ( "?#" PGNSP PGUID b f f 601 628 16 0 0 inter_sl - - "---"));
DESCR("intersect");
-DATA(insert OID = 1538 ( "?#" PGNSP PGUID b f f 601 603 16 0 0 inter_sb - - ));
+DATA(insert OID = 1538 ( "?#" PGNSP PGUID b f f 601 603 16 0 0 inter_sb - - "---"));
DESCR("intersect");
-DATA(insert OID = 1539 ( "?#" PGNSP PGUID b f f 628 603 16 0 0 inter_lb - - ));
+DATA(insert OID = 1539 ( "?#" PGNSP PGUID b f f 628 603 16 0 0 inter_lb - - "---"));
DESCR("intersect");
-DATA(insert OID = 1546 ( "<@" PGNSP PGUID b f f 600 628 16 0 0 on_pl - - ));
+DATA(insert OID = 1546 ( "<@" PGNSP PGUID b f f 600 628 16 0 0 on_pl - - "---"));
DESCR("point on line");
-DATA(insert OID = 1547 ( "<@" PGNSP PGUID b f f 600 601 16 0 0 on_ps - - ));
+DATA(insert OID = 1547 ( "<@" PGNSP PGUID b f f 600 601 16 0 0 on_ps - - "---"));
DESCR("is contained by");
-DATA(insert OID = 1548 ( "<@" PGNSP PGUID b f f 601 628 16 0 0 on_sl - - ));
+DATA(insert OID = 1548 ( "<@" PGNSP PGUID b f f 601 628 16 0 0 on_sl - - "---"));
DESCR("lseg on line");
-DATA(insert OID = 1549 ( "<@" PGNSP PGUID b f f 601 603 16 0 0 on_sb - - ));
+DATA(insert OID = 1549 ( "<@" PGNSP PGUID b f f 601 603 16 0 0 on_sb - - "---"));
DESCR("is contained by");
-DATA(insert OID = 1557 ( "##" PGNSP PGUID b f f 600 628 600 0 0 close_pl - - ));
+DATA(insert OID = 1557 ( "##" PGNSP PGUID b f f 600 628 600 0 0 close_pl - - "---"));
DESCR("closest point to A on B");
-DATA(insert OID = 1558 ( "##" PGNSP PGUID b f f 600 601 600 0 0 close_ps - - ));
+DATA(insert OID = 1558 ( "##" PGNSP PGUID b f f 600 601 600 0 0 close_ps - - "---"));
DESCR("closest point to A on B");
-DATA(insert OID = 1559 ( "##" PGNSP PGUID b f f 600 603 600 0 0 close_pb - - ));
+DATA(insert OID = 1559 ( "##" PGNSP PGUID b f f 600 603 600 0 0 close_pb - - "---"));
DESCR("closest point to A on B");
-DATA(insert OID = 1566 ( "##" PGNSP PGUID b f f 601 628 600 0 0 close_sl - - ));
+DATA(insert OID = 1566 ( "##" PGNSP PGUID b f f 601 628 600 0 0 close_sl - - "---"));
DESCR("closest point to A on B");
-DATA(insert OID = 1567 ( "##" PGNSP PGUID b f f 601 603 600 0 0 close_sb - - ));
+DATA(insert OID = 1567 ( "##" PGNSP PGUID b f f 601 603 600 0 0 close_sb - - "---"));
DESCR("closest point to A on B");
-DATA(insert OID = 1568 ( "##" PGNSP PGUID b f f 628 603 600 0 0 close_lb - - ));
+DATA(insert OID = 1568 ( "##" PGNSP PGUID b f f 628 603 600 0 0 close_lb - - "---"));
DESCR("closest point to A on B");
-DATA(insert OID = 1577 ( "##" PGNSP PGUID b f f 628 601 600 0 0 close_ls - - ));
+DATA(insert OID = 1577 ( "##" PGNSP PGUID b f f 628 601 600 0 0 close_ls - - "---"));
DESCR("closest point to A on B");
-DATA(insert OID = 1578 ( "##" PGNSP PGUID b f f 601 601 600 0 0 close_lseg - - ));
+DATA(insert OID = 1578 ( "##" PGNSP PGUID b f f 601 601 600 0 0 close_lseg - - "---"));
DESCR("closest point to A on B");
-DATA(insert OID = 1583 ( "*" PGNSP PGUID b f f 1186 701 1186 1584 0 interval_mul - - ));
+DATA(insert OID = 1583 ( "*" PGNSP PGUID b f f 1186 701 1186 1584 0 interval_mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 1584 ( "*" PGNSP PGUID b f f 701 1186 1186 1583 0 mul_d_interval - - ));
+DATA(insert OID = 1584 ( "*" PGNSP PGUID b f f 701 1186 1186 1583 0 mul_d_interval - - "---"));
DESCR("multiply");
-DATA(insert OID = 1585 ( "/" PGNSP PGUID b f f 1186 701 1186 0 0 interval_div - - ));
+DATA(insert OID = 1585 ( "/" PGNSP PGUID b f f 1186 701 1186 0 0 interval_div - - "---"));
DESCR("divide");
-DATA(insert OID = 1586 ( "<>" PGNSP PGUID b f f 601 601 16 1586 1535 lseg_ne neqsel neqjoinsel ));
+DATA(insert OID = 1586 ( "<>" PGNSP PGUID b f f 601 601 16 1586 1535 lseg_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1587 ( "<" PGNSP PGUID b f f 601 601 16 1589 1590 lseg_lt - - ));
+DATA(insert OID = 1587 ( "<" PGNSP PGUID b f f 601 601 16 1589 1590 lseg_lt - - "---"));
DESCR("less than by length");
-DATA(insert OID = 1588 ( "<=" PGNSP PGUID b f f 601 601 16 1590 1589 lseg_le - - ));
+DATA(insert OID = 1588 ( "<=" PGNSP PGUID b f f 601 601 16 1590 1589 lseg_le - - "---"));
DESCR("less than or equal by length");
-DATA(insert OID = 1589 ( ">" PGNSP PGUID b f f 601 601 16 1587 1588 lseg_gt - - ));
+DATA(insert OID = 1589 ( ">" PGNSP PGUID b f f 601 601 16 1587 1588 lseg_gt - - "---"));
DESCR("greater than by length");
-DATA(insert OID = 1590 ( ">=" PGNSP PGUID b f f 601 601 16 1588 1587 lseg_ge - - ));
+DATA(insert OID = 1590 ( ">=" PGNSP PGUID b f f 601 601 16 1588 1587 lseg_ge - - "---"));
DESCR("greater than or equal by length");
-DATA(insert OID = 1591 ( "@-@" PGNSP PGUID l f f 0 601 701 0 0 lseg_length - - ));
+DATA(insert OID = 1591 ( "@-@" PGNSP PGUID l f f 0 601 701 0 0 lseg_length - - "---"));
DESCR("distance between endpoints");
-DATA(insert OID = 1611 ( "?#" PGNSP PGUID b f f 628 628 16 1611 0 line_intersect - - ));
+DATA(insert OID = 1611 ( "?#" PGNSP PGUID b f f 628 628 16 1611 0 line_intersect - - "---"));
DESCR("intersect");
-DATA(insert OID = 1612 ( "?||" PGNSP PGUID b f f 628 628 16 1612 0 line_parallel - - ));
+DATA(insert OID = 1612 ( "?||" PGNSP PGUID b f f 628 628 16 1612 0 line_parallel - - "---"));
DESCR("parallel");
-DATA(insert OID = 1613 ( "?-|" PGNSP PGUID b f f 628 628 16 1613 0 line_perp - - ));
+DATA(insert OID = 1613 ( "?-|" PGNSP PGUID b f f 628 628 16 1613 0 line_perp - - "---"));
DESCR("perpendicular");
-DATA(insert OID = 1614 ( "?-" PGNSP PGUID l f f 0 628 16 0 0 line_horizontal - - ));
+DATA(insert OID = 1614 ( "?-" PGNSP PGUID l f f 0 628 16 0 0 line_horizontal - - "---"));
DESCR("horizontal");
-DATA(insert OID = 1615 ( "?|" PGNSP PGUID l f f 0 628 16 0 0 line_vertical - - ));
+DATA(insert OID = 1615 ( "?|" PGNSP PGUID l f f 0 628 16 0 0 line_vertical - - "---"));
DESCR("vertical");
-DATA(insert OID = 1616 ( "=" PGNSP PGUID b f f 628 628 16 1616 0 line_eq eqsel eqjoinsel ));
+DATA(insert OID = 1616 ( "=" PGNSP PGUID b f f 628 628 16 1616 0 line_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1617 ( "#" PGNSP PGUID b f f 628 628 600 1617 0 line_interpt - - ));
+DATA(insert OID = 1617 ( "#" PGNSP PGUID b f f 628 628 600 1617 0 line_interpt - - "---"));
DESCR("intersection point");
/* MAC type */
-DATA(insert OID = 1220 ( "=" PGNSP PGUID b t t 829 829 16 1220 1221 macaddr_eq eqsel eqjoinsel ));
+DATA(insert OID = 1220 ( "=" PGNSP PGUID b t t 829 829 16 1220 1221 macaddr_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1221 ( "<>" PGNSP PGUID b f f 829 829 16 1221 1220 macaddr_ne neqsel neqjoinsel ));
+DATA(insert OID = 1221 ( "<>" PGNSP PGUID b f f 829 829 16 1221 1220 macaddr_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1222 ( "<" PGNSP PGUID b f f 829 829 16 1224 1225 macaddr_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1222 ( "<" PGNSP PGUID b f f 829 829 16 1224 1225 macaddr_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1223 ( "<=" PGNSP PGUID b f f 829 829 16 1225 1224 macaddr_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1223 ( "<=" PGNSP PGUID b f f 829 829 16 1225 1224 macaddr_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1224 ( ">" PGNSP PGUID b f f 829 829 16 1222 1223 macaddr_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1224 ( ">" PGNSP PGUID b f f 829 829 16 1222 1223 macaddr_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1225 ( ">=" PGNSP PGUID b f f 829 829 16 1223 1222 macaddr_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1225 ( ">=" PGNSP PGUID b f f 829 829 16 1223 1222 macaddr_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 3147 ( "~" PGNSP PGUID l f f 0 829 829 0 0 macaddr_not - - ));
+DATA(insert OID = 3147 ( "~" PGNSP PGUID l f f 0 829 829 0 0 macaddr_not - - "---"));
DESCR("bitwise not");
-DATA(insert OID = 3148 ( "&" PGNSP PGUID b f f 829 829 829 0 0 macaddr_and - - ));
+DATA(insert OID = 3148 ( "&" PGNSP PGUID b f f 829 829 829 0 0 macaddr_and - - "---"));
DESCR("bitwise and");
-DATA(insert OID = 3149 ( "|" PGNSP PGUID b f f 829 829 829 0 0 macaddr_or - - ));
+DATA(insert OID = 3149 ( "|" PGNSP PGUID b f f 829 829 829 0 0 macaddr_or - - "---"));
DESCR("bitwise or");
/* INET type (these also support CIDR via implicit cast) */
-DATA(insert OID = 1201 ( "=" PGNSP PGUID b t t 869 869 16 1201 1202 network_eq eqsel eqjoinsel ));
+DATA(insert OID = 1201 ( "=" PGNSP PGUID b t t 869 869 16 1201 1202 network_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1202 ( "<>" PGNSP PGUID b f f 869 869 16 1202 1201 network_ne neqsel neqjoinsel ));
+DATA(insert OID = 1202 ( "<>" PGNSP PGUID b f f 869 869 16 1202 1201 network_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1203 ( "<" PGNSP PGUID b f f 869 869 16 1205 1206 network_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1203 ( "<" PGNSP PGUID b f f 869 869 16 1205 1206 network_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1204 ( "<=" PGNSP PGUID b f f 869 869 16 1206 1205 network_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1204 ( "<=" PGNSP PGUID b f f 869 869 16 1206 1205 network_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1205 ( ">" PGNSP PGUID b f f 869 869 16 1203 1204 network_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1205 ( ">" PGNSP PGUID b f f 869 869 16 1203 1204 network_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1206 ( ">=" PGNSP PGUID b f f 869 869 16 1204 1203 network_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1206 ( ">=" PGNSP PGUID b f f 869 869 16 1204 1203 network_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 931 ( "<<" PGNSP PGUID b f f 869 869 16 933 0 network_sub networksel networkjoinsel ));
+DATA(insert OID = 931 ( "<<" PGNSP PGUID b f f 869 869 16 933 0 network_sub networksel networkjoinsel "---"));
DESCR("is subnet");
#define OID_INET_SUB_OP 931
-DATA(insert OID = 932 ( "<<=" PGNSP PGUID b f f 869 869 16 934 0 network_subeq networksel networkjoinsel ));
+DATA(insert OID = 932 ( "<<=" PGNSP PGUID b f f 869 869 16 934 0 network_subeq networksel networkjoinsel "---"));
DESCR("is subnet or equal");
#define OID_INET_SUBEQ_OP 932
-DATA(insert OID = 933 ( ">>" PGNSP PGUID b f f 869 869 16 931 0 network_sup networksel networkjoinsel ));
+DATA(insert OID = 933 ( ">>" PGNSP PGUID b f f 869 869 16 931 0 network_sup networksel networkjoinsel "---"));
DESCR("is supernet");
#define OID_INET_SUP_OP 933
-DATA(insert OID = 934 ( ">>=" PGNSP PGUID b f f 869 869 16 932 0 network_supeq networksel networkjoinsel ));
+DATA(insert OID = 934 ( ">>=" PGNSP PGUID b f f 869 869 16 932 0 network_supeq networksel networkjoinsel "---"));
DESCR("is supernet or equal");
#define OID_INET_SUPEQ_OP 934
-DATA(insert OID = 3552 ( "&&" PGNSP PGUID b f f 869 869 16 3552 0 network_overlap networksel networkjoinsel ));
+DATA(insert OID = 3552 ( "&&" PGNSP PGUID b f f 869 869 16 3552 0 network_overlap networksel networkjoinsel "---"));
DESCR("overlaps (is subnet or supernet)");
#define OID_INET_OVERLAP_OP 3552
-DATA(insert OID = 2634 ( "~" PGNSP PGUID l f f 0 869 869 0 0 inetnot - - ));
+DATA(insert OID = 2634 ( "~" PGNSP PGUID l f f 0 869 869 0 0 inetnot - - "---"));
DESCR("bitwise not");
-DATA(insert OID = 2635 ( "&" PGNSP PGUID b f f 869 869 869 0 0 inetand - - ));
+DATA(insert OID = 2635 ( "&" PGNSP PGUID b f f 869 869 869 0 0 inetand - - "---"));
DESCR("bitwise and");
-DATA(insert OID = 2636 ( "|" PGNSP PGUID b f f 869 869 869 0 0 inetor - - ));
+DATA(insert OID = 2636 ( "|" PGNSP PGUID b f f 869 869 869 0 0 inetor - - "---"));
DESCR("bitwise or");
-DATA(insert OID = 2637 ( "+" PGNSP PGUID b f f 869 20 869 2638 0 inetpl - - ));
+DATA(insert OID = 2637 ( "+" PGNSP PGUID b f f 869 20 869 2638 0 inetpl - - "---"));
DESCR("add");
-DATA(insert OID = 2638 ( "+" PGNSP PGUID b f f 20 869 869 2637 0 int8pl_inet - - ));
+DATA(insert OID = 2638 ( "+" PGNSP PGUID b f f 20 869 869 2637 0 int8pl_inet - - "---"));
DESCR("add");
-DATA(insert OID = 2639 ( "-" PGNSP PGUID b f f 869 20 869 0 0 inetmi_int8 - - ));
+DATA(insert OID = 2639 ( "-" PGNSP PGUID b f f 869 20 869 0 0 inetmi_int8 - - "---"));
DESCR("subtract");
-DATA(insert OID = 2640 ( "-" PGNSP PGUID b f f 869 869 20 0 0 inetmi - - ));
+DATA(insert OID = 2640 ( "-" PGNSP PGUID b f f 869 869 20 0 0 inetmi - - "---"));
DESCR("subtract");
/* case-insensitive LIKE hacks */
-DATA(insert OID = 1625 ( "~~*" PGNSP PGUID b f f 19 25 16 0 1626 nameiclike iclikesel iclikejoinsel ));
+DATA(insert OID = 1625 ( "~~*" PGNSP PGUID b f f 19 25 16 0 1626 nameiclike iclikesel iclikejoinsel "---"));
DESCR("matches LIKE expression, case-insensitive");
#define OID_NAME_ICLIKE_OP 1625
-DATA(insert OID = 1626 ( "!~~*" PGNSP PGUID b f f 19 25 16 0 1625 nameicnlike icnlikesel icnlikejoinsel ));
+DATA(insert OID = 1626 ( "!~~*" PGNSP PGUID b f f 19 25 16 0 1625 nameicnlike icnlikesel icnlikejoinsel "---"));
DESCR("does not match LIKE expression, case-insensitive");
-DATA(insert OID = 1627 ( "~~*" PGNSP PGUID b f f 25 25 16 0 1628 texticlike iclikesel iclikejoinsel ));
+DATA(insert OID = 1627 ( "~~*" PGNSP PGUID b f f 25 25 16 0 1628 texticlike iclikesel iclikejoinsel "---"));
DESCR("matches LIKE expression, case-insensitive");
#define OID_TEXT_ICLIKE_OP 1627
-DATA(insert OID = 1628 ( "!~~*" PGNSP PGUID b f f 25 25 16 0 1627 texticnlike icnlikesel icnlikejoinsel ));
+DATA(insert OID = 1628 ( "!~~*" PGNSP PGUID b f f 25 25 16 0 1627 texticnlike icnlikesel icnlikejoinsel "---"));
DESCR("does not match LIKE expression, case-insensitive");
-DATA(insert OID = 1629 ( "~~*" PGNSP PGUID b f f 1042 25 16 0 1630 bpchariclike iclikesel iclikejoinsel ));
+DATA(insert OID = 1629 ( "~~*" PGNSP PGUID b f f 1042 25 16 0 1630 bpchariclike iclikesel iclikejoinsel "---"));
DESCR("matches LIKE expression, case-insensitive");
#define OID_BPCHAR_ICLIKE_OP 1629
-DATA(insert OID = 1630 ( "!~~*" PGNSP PGUID b f f 1042 25 16 0 1629 bpcharicnlike icnlikesel icnlikejoinsel ));
+DATA(insert OID = 1630 ( "!~~*" PGNSP PGUID b f f 1042 25 16 0 1629 bpcharicnlike icnlikesel icnlikejoinsel "---"));
DESCR("does not match LIKE expression, case-insensitive");
/* NUMERIC type - OID's 1700-1799 */
-DATA(insert OID = 1751 ( "-" PGNSP PGUID l f f 0 1700 1700 0 0 numeric_uminus - - ));
+DATA(insert OID = 1751 ( "-" PGNSP PGUID l f f 0 1700 1700 0 0 numeric_uminus - - "---"));
DESCR("negate");
-DATA(insert OID = 1752 ( "=" PGNSP PGUID b t t 1700 1700 16 1752 1753 numeric_eq eqsel eqjoinsel ));
+DATA(insert OID = 1752 ( "=" PGNSP PGUID b t t 1700 1700 16 1752 1753 numeric_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1753 ( "<>" PGNSP PGUID b f f 1700 1700 16 1753 1752 numeric_ne neqsel neqjoinsel ));
+DATA(insert OID = 1753 ( "<>" PGNSP PGUID b f f 1700 1700 16 1753 1752 numeric_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1754 ( "<" PGNSP PGUID b f f 1700 1700 16 1756 1757 numeric_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1754 ( "<" PGNSP PGUID b f f 1700 1700 16 1756 1757 numeric_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1755 ( "<=" PGNSP PGUID b f f 1700 1700 16 1757 1756 numeric_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1755 ( "<=" PGNSP PGUID b f f 1700 1700 16 1757 1756 numeric_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1756 ( ">" PGNSP PGUID b f f 1700 1700 16 1754 1755 numeric_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1756 ( ">" PGNSP PGUID b f f 1700 1700 16 1754 1755 numeric_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1757 ( ">=" PGNSP PGUID b f f 1700 1700 16 1755 1754 numeric_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1757 ( ">=" PGNSP PGUID b f f 1700 1700 16 1755 1754 numeric_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 1758 ( "+" PGNSP PGUID b f f 1700 1700 1700 1758 0 numeric_add - - ));
+DATA(insert OID = 1758 ( "+" PGNSP PGUID b f f 1700 1700 1700 1758 0 numeric_add - - "---"));
DESCR("add");
-DATA(insert OID = 1759 ( "-" PGNSP PGUID b f f 1700 1700 1700 0 0 numeric_sub - - ));
+DATA(insert OID = 1759 ( "-" PGNSP PGUID b f f 1700 1700 1700 0 0 numeric_sub - - "---"));
DESCR("subtract");
-DATA(insert OID = 1760 ( "*" PGNSP PGUID b f f 1700 1700 1700 1760 0 numeric_mul - - ));
+DATA(insert OID = 1760 ( "*" PGNSP PGUID b f f 1700 1700 1700 1760 0 numeric_mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 1761 ( "/" PGNSP PGUID b f f 1700 1700 1700 0 0 numeric_div - - ));
+DATA(insert OID = 1761 ( "/" PGNSP PGUID b f f 1700 1700 1700 0 0 numeric_div - - "---"));
DESCR("divide");
-DATA(insert OID = 1762 ( "%" PGNSP PGUID b f f 1700 1700 1700 0 0 numeric_mod - - ));
+DATA(insert OID = 1762 ( "%" PGNSP PGUID b f f 1700 1700 1700 0 0 numeric_mod - - "---"));
DESCR("modulus");
-DATA(insert OID = 1038 ( "^" PGNSP PGUID b f f 1700 1700 1700 0 0 numeric_power - - ));
+DATA(insert OID = 1038 ( "^" PGNSP PGUID b f f 1700 1700 1700 0 0 numeric_power - - "---"));
DESCR("exponentiation");
-DATA(insert OID = 1763 ( "@" PGNSP PGUID l f f 0 1700 1700 0 0 numeric_abs - - ));
+DATA(insert OID = 1763 ( "@" PGNSP PGUID l f f 0 1700 1700 0 0 numeric_abs - - "---"));
DESCR("absolute value");
-DATA(insert OID = 1784 ( "=" PGNSP PGUID b t f 1560 1560 16 1784 1785 biteq eqsel eqjoinsel ));
+DATA(insert OID = 1784 ( "=" PGNSP PGUID b t f 1560 1560 16 1784 1785 biteq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1785 ( "<>" PGNSP PGUID b f f 1560 1560 16 1785 1784 bitne neqsel neqjoinsel ));
+DATA(insert OID = 1785 ( "<>" PGNSP PGUID b f f 1560 1560 16 1785 1784 bitne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1786 ( "<" PGNSP PGUID b f f 1560 1560 16 1787 1789 bitlt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1786 ( "<" PGNSP PGUID b f f 1560 1560 16 1787 1789 bitlt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1787 ( ">" PGNSP PGUID b f f 1560 1560 16 1786 1788 bitgt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1787 ( ">" PGNSP PGUID b f f 1560 1560 16 1786 1788 bitgt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1788 ( "<=" PGNSP PGUID b f f 1560 1560 16 1789 1787 bitle scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1788 ( "<=" PGNSP PGUID b f f 1560 1560 16 1789 1787 bitle scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1789 ( ">=" PGNSP PGUID b f f 1560 1560 16 1788 1786 bitge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1789 ( ">=" PGNSP PGUID b f f 1560 1560 16 1788 1786 bitge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 1791 ( "&" PGNSP PGUID b f f 1560 1560 1560 1791 0 bitand - - ));
+DATA(insert OID = 1791 ( "&" PGNSP PGUID b f f 1560 1560 1560 1791 0 bitand - - "---"));
DESCR("bitwise and");
-DATA(insert OID = 1792 ( "|" PGNSP PGUID b f f 1560 1560 1560 1792 0 bitor - - ));
+DATA(insert OID = 1792 ( "|" PGNSP PGUID b f f 1560 1560 1560 1792 0 bitor - - "---"));
DESCR("bitwise or");
-DATA(insert OID = 1793 ( "#" PGNSP PGUID b f f 1560 1560 1560 1793 0 bitxor - - ));
+DATA(insert OID = 1793 ( "#" PGNSP PGUID b f f 1560 1560 1560 1793 0 bitxor - - "---"));
DESCR("bitwise exclusive or");
-DATA(insert OID = 1794 ( "~" PGNSP PGUID l f f 0 1560 1560 0 0 bitnot - - ));
+DATA(insert OID = 1794 ( "~" PGNSP PGUID l f f 0 1560 1560 0 0 bitnot - - "---"));
DESCR("bitwise not");
-DATA(insert OID = 1795 ( "<<" PGNSP PGUID b f f 1560 23 1560 0 0 bitshiftleft - - ));
+DATA(insert OID = 1795 ( "<<" PGNSP PGUID b f f 1560 23 1560 0 0 bitshiftleft - - "---"));
DESCR("bitwise shift left");
-DATA(insert OID = 1796 ( ">>" PGNSP PGUID b f f 1560 23 1560 0 0 bitshiftright - - ));
+DATA(insert OID = 1796 ( ">>" PGNSP PGUID b f f 1560 23 1560 0 0 bitshiftright - - "---"));
DESCR("bitwise shift right");
-DATA(insert OID = 1797 ( "||" PGNSP PGUID b f f 1562 1562 1562 0 0 bitcat - - ));
+DATA(insert OID = 1797 ( "||" PGNSP PGUID b f f 1562 1562 1562 0 0 bitcat - - "---"));
DESCR("concatenate");
-DATA(insert OID = 1800 ( "+" PGNSP PGUID b f f 1083 1186 1083 1849 0 time_pl_interval - - ));
+DATA(insert OID = 1800 ( "+" PGNSP PGUID b f f 1083 1186 1083 1849 0 time_pl_interval - - "---"));
DESCR("add");
-DATA(insert OID = 1801 ( "-" PGNSP PGUID b f f 1083 1186 1083 0 0 time_mi_interval - - ));
+DATA(insert OID = 1801 ( "-" PGNSP PGUID b f f 1083 1186 1083 0 0 time_mi_interval - - "---"));
DESCR("subtract");
-DATA(insert OID = 1802 ( "+" PGNSP PGUID b f f 1266 1186 1266 2552 0 timetz_pl_interval - - ));
+DATA(insert OID = 1802 ( "+" PGNSP PGUID b f f 1266 1186 1266 2552 0 timetz_pl_interval - - "---"));
DESCR("add");
-DATA(insert OID = 1803 ( "-" PGNSP PGUID b f f 1266 1186 1266 0 0 timetz_mi_interval - - ));
+DATA(insert OID = 1803 ( "-" PGNSP PGUID b f f 1266 1186 1266 0 0 timetz_mi_interval - - "---"));
DESCR("subtract");
-DATA(insert OID = 1804 ( "=" PGNSP PGUID b t f 1562 1562 16 1804 1805 varbiteq eqsel eqjoinsel ));
+DATA(insert OID = 1804 ( "=" PGNSP PGUID b t f 1562 1562 16 1804 1805 varbiteq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1805 ( "<>" PGNSP PGUID b f f 1562 1562 16 1805 1804 varbitne neqsel neqjoinsel ));
+DATA(insert OID = 1805 ( "<>" PGNSP PGUID b f f 1562 1562 16 1805 1804 varbitne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1806 ( "<" PGNSP PGUID b f f 1562 1562 16 1807 1809 varbitlt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1806 ( "<" PGNSP PGUID b f f 1562 1562 16 1807 1809 varbitlt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1807 ( ">" PGNSP PGUID b f f 1562 1562 16 1806 1808 varbitgt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1807 ( ">" PGNSP PGUID b f f 1562 1562 16 1806 1808 varbitgt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1808 ( "<=" PGNSP PGUID b f f 1562 1562 16 1809 1807 varbitle scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1808 ( "<=" PGNSP PGUID b f f 1562 1562 16 1809 1807 varbitle scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1809 ( ">=" PGNSP PGUID b f f 1562 1562 16 1808 1806 varbitge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1809 ( ">=" PGNSP PGUID b f f 1562 1562 16 1808 1806 varbitge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 1849 ( "+" PGNSP PGUID b f f 1186 1083 1083 1800 0 interval_pl_time - - ));
+DATA(insert OID = 1849 ( "+" PGNSP PGUID b f f 1186 1083 1083 1800 0 interval_pl_time - - "---"));
DESCR("add");
-DATA(insert OID = 1862 ( "=" PGNSP PGUID b t t 21 20 16 1868 1863 int28eq eqsel eqjoinsel ));
+DATA(insert OID = 1862 ( "=" PGNSP PGUID b t t 21 20 16 1868 1863 int28eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1863 ( "<>" PGNSP PGUID b f f 21 20 16 1869 1862 int28ne neqsel neqjoinsel ));
+DATA(insert OID = 1863 ( "<>" PGNSP PGUID b f f 21 20 16 1869 1862 int28ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1864 ( "<" PGNSP PGUID b f f 21 20 16 1871 1867 int28lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1864 ( "<" PGNSP PGUID b f f 21 20 16 1871 1867 int28lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1865 ( ">" PGNSP PGUID b f f 21 20 16 1870 1866 int28gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1865 ( ">" PGNSP PGUID b f f 21 20 16 1870 1866 int28gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1866 ( "<=" PGNSP PGUID b f f 21 20 16 1873 1865 int28le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1866 ( "<=" PGNSP PGUID b f f 21 20 16 1873 1865 int28le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1867 ( ">=" PGNSP PGUID b f f 21 20 16 1872 1864 int28ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1867 ( ">=" PGNSP PGUID b f f 21 20 16 1872 1864 int28ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 1868 ( "=" PGNSP PGUID b t t 20 21 16 1862 1869 int82eq eqsel eqjoinsel ));
+DATA(insert OID = 1868 ( "=" PGNSP PGUID b t t 20 21 16 1862 1869 int82eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1869 ( "<>" PGNSP PGUID b f f 20 21 16 1863 1868 int82ne neqsel neqjoinsel ));
+DATA(insert OID = 1869 ( "<>" PGNSP PGUID b f f 20 21 16 1863 1868 int82ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1870 ( "<" PGNSP PGUID b f f 20 21 16 1865 1873 int82lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1870 ( "<" PGNSP PGUID b f f 20 21 16 1865 1873 int82lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1871 ( ">" PGNSP PGUID b f f 20 21 16 1864 1872 int82gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1871 ( ">" PGNSP PGUID b f f 20 21 16 1864 1872 int82gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1872 ( "<=" PGNSP PGUID b f f 20 21 16 1867 1871 int82le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1872 ( "<=" PGNSP PGUID b f f 20 21 16 1867 1871 int82le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1873 ( ">=" PGNSP PGUID b f f 20 21 16 1866 1870 int82ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1873 ( ">=" PGNSP PGUID b f f 20 21 16 1866 1870 int82ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 1874 ( "&" PGNSP PGUID b f f 21 21 21 1874 0 int2and - - ));
+DATA(insert OID = 1874 ( "&" PGNSP PGUID b f f 21 21 21 1874 0 int2and - - "---"));
DESCR("bitwise and");
-DATA(insert OID = 1875 ( "|" PGNSP PGUID b f f 21 21 21 1875 0 int2or - - ));
+DATA(insert OID = 1875 ( "|" PGNSP PGUID b f f 21 21 21 1875 0 int2or - - "---"));
DESCR("bitwise or");
-DATA(insert OID = 1876 ( "#" PGNSP PGUID b f f 21 21 21 1876 0 int2xor - - ));
+DATA(insert OID = 1876 ( "#" PGNSP PGUID b f f 21 21 21 1876 0 int2xor - - "---"));
DESCR("bitwise exclusive or");
-DATA(insert OID = 1877 ( "~" PGNSP PGUID l f f 0 21 21 0 0 int2not - - ));
+DATA(insert OID = 1877 ( "~" PGNSP PGUID l f f 0 21 21 0 0 int2not - - "---"));
DESCR("bitwise not");
-DATA(insert OID = 1878 ( "<<" PGNSP PGUID b f f 21 23 21 0 0 int2shl - - ));
+DATA(insert OID = 1878 ( "<<" PGNSP PGUID b f f 21 23 21 0 0 int2shl - - "---"));
DESCR("bitwise shift left");
-DATA(insert OID = 1879 ( ">>" PGNSP PGUID b f f 21 23 21 0 0 int2shr - - ));
+DATA(insert OID = 1879 ( ">>" PGNSP PGUID b f f 21 23 21 0 0 int2shr - - "---"));
DESCR("bitwise shift right");
-DATA(insert OID = 1880 ( "&" PGNSP PGUID b f f 23 23 23 1880 0 int4and - - ));
+DATA(insert OID = 1880 ( "&" PGNSP PGUID b f f 23 23 23 1880 0 int4and - - "---"));
DESCR("bitwise and");
-DATA(insert OID = 1881 ( "|" PGNSP PGUID b f f 23 23 23 1881 0 int4or - - ));
+DATA(insert OID = 1881 ( "|" PGNSP PGUID b f f 23 23 23 1881 0 int4or - - "---"));
DESCR("bitwise or");
-DATA(insert OID = 1882 ( "#" PGNSP PGUID b f f 23 23 23 1882 0 int4xor - - ));
+DATA(insert OID = 1882 ( "#" PGNSP PGUID b f f 23 23 23 1882 0 int4xor - - "---"));
DESCR("bitwise exclusive or");
-DATA(insert OID = 1883 ( "~" PGNSP PGUID l f f 0 23 23 0 0 int4not - - ));
+DATA(insert OID = 1883 ( "~" PGNSP PGUID l f f 0 23 23 0 0 int4not - - "---"));
DESCR("bitwise not");
-DATA(insert OID = 1884 ( "<<" PGNSP PGUID b f f 23 23 23 0 0 int4shl - - ));
+DATA(insert OID = 1884 ( "<<" PGNSP PGUID b f f 23 23 23 0 0 int4shl - - "---"));
DESCR("bitwise shift left");
-DATA(insert OID = 1885 ( ">>" PGNSP PGUID b f f 23 23 23 0 0 int4shr - - ));
+DATA(insert OID = 1885 ( ">>" PGNSP PGUID b f f 23 23 23 0 0 int4shr - - "---"));
DESCR("bitwise shift right");
-DATA(insert OID = 1886 ( "&" PGNSP PGUID b f f 20 20 20 1886 0 int8and - - ));
+DATA(insert OID = 1886 ( "&" PGNSP PGUID b f f 20 20 20 1886 0 int8and - - "---"));
DESCR("bitwise and");
-DATA(insert OID = 1887 ( "|" PGNSP PGUID b f f 20 20 20 1887 0 int8or - - ));
+DATA(insert OID = 1887 ( "|" PGNSP PGUID b f f 20 20 20 1887 0 int8or - - "---"));
DESCR("bitwise or");
-DATA(insert OID = 1888 ( "#" PGNSP PGUID b f f 20 20 20 1888 0 int8xor - - ));
+DATA(insert OID = 1888 ( "#" PGNSP PGUID b f f 20 20 20 1888 0 int8xor - - "---"));
DESCR("bitwise exclusive or");
-DATA(insert OID = 1889 ( "~" PGNSP PGUID l f f 0 20 20 0 0 int8not - - ));
+DATA(insert OID = 1889 ( "~" PGNSP PGUID l f f 0 20 20 0 0 int8not - - "---"));
DESCR("bitwise not");
-DATA(insert OID = 1890 ( "<<" PGNSP PGUID b f f 20 23 20 0 0 int8shl - - ));
+DATA(insert OID = 1890 ( "<<" PGNSP PGUID b f f 20 23 20 0 0 int8shl - - "---"));
DESCR("bitwise shift left");
-DATA(insert OID = 1891 ( ">>" PGNSP PGUID b f f 20 23 20 0 0 int8shr - - ));
+DATA(insert OID = 1891 ( ">>" PGNSP PGUID b f f 20 23 20 0 0 int8shr - - "---"));
DESCR("bitwise shift right");
-DATA(insert OID = 1916 ( "+" PGNSP PGUID l f f 0 20 20 0 0 int8up - - ));
+DATA(insert OID = 1916 ( "+" PGNSP PGUID l f f 0 20 20 0 0 int8up - - "---"));
DESCR("unary plus");
-DATA(insert OID = 1917 ( "+" PGNSP PGUID l f f 0 21 21 0 0 int2up - - ));
+DATA(insert OID = 1917 ( "+" PGNSP PGUID l f f 0 21 21 0 0 int2up - - "---"));
DESCR("unary plus");
-DATA(insert OID = 1918 ( "+" PGNSP PGUID l f f 0 23 23 0 0 int4up - - ));
+DATA(insert OID = 1918 ( "+" PGNSP PGUID l f f 0 23 23 0 0 int4up - - "---"));
DESCR("unary plus");
-DATA(insert OID = 1919 ( "+" PGNSP PGUID l f f 0 700 700 0 0 float4up - - ));
+DATA(insert OID = 1919 ( "+" PGNSP PGUID l f f 0 700 700 0 0 float4up - - "---"));
DESCR("unary plus");
-DATA(insert OID = 1920 ( "+" PGNSP PGUID l f f 0 701 701 0 0 float8up - - ));
+DATA(insert OID = 1920 ( "+" PGNSP PGUID l f f 0 701 701 0 0 float8up - - "---"));
DESCR("unary plus");
-DATA(insert OID = 1921 ( "+" PGNSP PGUID l f f 0 1700 1700 0 0 numeric_uplus - - ));
+DATA(insert OID = 1921 ( "+" PGNSP PGUID l f f 0 1700 1700 0 0 numeric_uplus - - "---"));
DESCR("unary plus");
/* bytea operators */
-DATA(insert OID = 1955 ( "=" PGNSP PGUID b t t 17 17 16 1955 1956 byteaeq eqsel eqjoinsel ));
+DATA(insert OID = 1955 ( "=" PGNSP PGUID b t t 17 17 16 1955 1956 byteaeq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1956 ( "<>" PGNSP PGUID b f f 17 17 16 1956 1955 byteane neqsel neqjoinsel ));
+DATA(insert OID = 1956 ( "<>" PGNSP PGUID b f f 17 17 16 1956 1955 byteane neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1957 ( "<" PGNSP PGUID b f f 17 17 16 1959 1960 bytealt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1957 ( "<" PGNSP PGUID b f f 17 17 16 1959 1960 bytealt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1958 ( "<=" PGNSP PGUID b f f 17 17 16 1960 1959 byteale scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1958 ( "<=" PGNSP PGUID b f f 17 17 16 1960 1959 byteale scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1959 ( ">" PGNSP PGUID b f f 17 17 16 1957 1958 byteagt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1959 ( ">" PGNSP PGUID b f f 17 17 16 1957 1958 byteagt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1960 ( ">=" PGNSP PGUID b f f 17 17 16 1958 1957 byteage scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1960 ( ">=" PGNSP PGUID b f f 17 17 16 1958 1957 byteage scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2016 ( "~~" PGNSP PGUID b f f 17 17 16 0 2017 bytealike likesel likejoinsel ));
+DATA(insert OID = 2016 ( "~~" PGNSP PGUID b f f 17 17 16 0 2017 bytealike likesel likejoinsel "---"));
DESCR("matches LIKE expression");
#define OID_BYTEA_LIKE_OP 2016
-DATA(insert OID = 2017 ( "!~~" PGNSP PGUID b f f 17 17 16 0 2016 byteanlike nlikesel nlikejoinsel ));
+DATA(insert OID = 2017 ( "!~~" PGNSP PGUID b f f 17 17 16 0 2016 byteanlike nlikesel nlikejoinsel "---"));
DESCR("does not match LIKE expression");
-DATA(insert OID = 2018 ( "||" PGNSP PGUID b f f 17 17 17 0 0 byteacat - - ));
+DATA(insert OID = 2018 ( "||" PGNSP PGUID b f f 17 17 17 0 0 byteacat - - "---"));
DESCR("concatenate");
/* timestamp operators */
-DATA(insert OID = 2060 ( "=" PGNSP PGUID b t t 1114 1114 16 2060 2061 timestamp_eq eqsel eqjoinsel ));
+DATA(insert OID = 2060 ( "=" PGNSP PGUID b t t 1114 1114 16 2060 2061 timestamp_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 2061 ( "<>" PGNSP PGUID b f f 1114 1114 16 2061 2060 timestamp_ne neqsel neqjoinsel ));
+DATA(insert OID = 2061 ( "<>" PGNSP PGUID b f f 1114 1114 16 2061 2060 timestamp_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 2062 ( "<" PGNSP PGUID b f f 1114 1114 16 2064 2065 timestamp_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2062 ( "<" PGNSP PGUID b f f 1114 1114 16 2064 2065 timestamp_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2063 ( "<=" PGNSP PGUID b f f 1114 1114 16 2065 2064 timestamp_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2063 ( "<=" PGNSP PGUID b f f 1114 1114 16 2065 2064 timestamp_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2064 ( ">" PGNSP PGUID b f f 1114 1114 16 2062 2063 timestamp_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2064 ( ">" PGNSP PGUID b f f 1114 1114 16 2062 2063 timestamp_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2065 ( ">=" PGNSP PGUID b f f 1114 1114 16 2063 2062 timestamp_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2065 ( ">=" PGNSP PGUID b f f 1114 1114 16 2063 2062 timestamp_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2066 ( "+" PGNSP PGUID b f f 1114 1186 1114 2553 0 timestamp_pl_interval - - ));
+DATA(insert OID = 2066 ( "+" PGNSP PGUID b f f 1114 1186 1114 2553 0 timestamp_pl_interval - - "---"));
DESCR("add");
-DATA(insert OID = 2067 ( "-" PGNSP PGUID b f f 1114 1114 1186 0 0 timestamp_mi - - ));
+DATA(insert OID = 2067 ( "-" PGNSP PGUID b f f 1114 1114 1186 0 0 timestamp_mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 2068 ( "-" PGNSP PGUID b f f 1114 1186 1114 0 0 timestamp_mi_interval - - ));
+DATA(insert OID = 2068 ( "-" PGNSP PGUID b f f 1114 1186 1114 0 0 timestamp_mi_interval - - "---"));
DESCR("subtract");
/* character-by-character (not collation order) comparison operators for character types */
-DATA(insert OID = 2314 ( "~<~" PGNSP PGUID b f f 25 25 16 2318 2317 text_pattern_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2314 ( "~<~" PGNSP PGUID b f f 25 25 16 2318 2317 text_pattern_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2315 ( "~<=~" PGNSP PGUID b f f 25 25 16 2317 2318 text_pattern_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2315 ( "~<=~" PGNSP PGUID b f f 25 25 16 2317 2318 text_pattern_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2317 ( "~>=~" PGNSP PGUID b f f 25 25 16 2315 2314 text_pattern_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2317 ( "~>=~" PGNSP PGUID b f f 25 25 16 2315 2314 text_pattern_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2318 ( "~>~" PGNSP PGUID b f f 25 25 16 2314 2315 text_pattern_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2318 ( "~>~" PGNSP PGUID b f f 25 25 16 2314 2315 text_pattern_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2326 ( "~<~" PGNSP PGUID b f f 1042 1042 16 2330 2329 bpchar_pattern_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2326 ( "~<~" PGNSP PGUID b f f 1042 1042 16 2330 2329 bpchar_pattern_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2327 ( "~<=~" PGNSP PGUID b f f 1042 1042 16 2329 2330 bpchar_pattern_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2327 ( "~<=~" PGNSP PGUID b f f 1042 1042 16 2329 2330 bpchar_pattern_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2329 ( "~>=~" PGNSP PGUID b f f 1042 1042 16 2327 2326 bpchar_pattern_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2329 ( "~>=~" PGNSP PGUID b f f 1042 1042 16 2327 2326 bpchar_pattern_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2330 ( "~>~" PGNSP PGUID b f f 1042 1042 16 2326 2327 bpchar_pattern_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2330 ( "~>~" PGNSP PGUID b f f 1042 1042 16 2326 2327 bpchar_pattern_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
/* crosstype operations for date vs. timestamp and timestamptz */
-DATA(insert OID = 2345 ( "<" PGNSP PGUID b f f 1082 1114 16 2375 2348 date_lt_timestamp scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2345 ( "<" PGNSP PGUID b f f 1082 1114 16 2375 2348 date_lt_timestamp scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2346 ( "<=" PGNSP PGUID b f f 1082 1114 16 2374 2349 date_le_timestamp scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2346 ( "<=" PGNSP PGUID b f f 1082 1114 16 2374 2349 date_le_timestamp scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2347 ( "=" PGNSP PGUID b t f 1082 1114 16 2373 2350 date_eq_timestamp eqsel eqjoinsel ));
+DATA(insert OID = 2347 ( "=" PGNSP PGUID b t f 1082 1114 16 2373 2350 date_eq_timestamp eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 2348 ( ">=" PGNSP PGUID b f f 1082 1114 16 2372 2345 date_ge_timestamp scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2348 ( ">=" PGNSP PGUID b f f 1082 1114 16 2372 2345 date_ge_timestamp scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2349 ( ">" PGNSP PGUID b f f 1082 1114 16 2371 2346 date_gt_timestamp scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2349 ( ">" PGNSP PGUID b f f 1082 1114 16 2371 2346 date_gt_timestamp scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2350 ( "<>" PGNSP PGUID b f f 1082 1114 16 2376 2347 date_ne_timestamp neqsel neqjoinsel ));
+DATA(insert OID = 2350 ( "<>" PGNSP PGUID b f f 1082 1114 16 2376 2347 date_ne_timestamp neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 2358 ( "<" PGNSP PGUID b f f 1082 1184 16 2388 2361 date_lt_timestamptz scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2358 ( "<" PGNSP PGUID b f f 1082 1184 16 2388 2361 date_lt_timestamptz scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2359 ( "<=" PGNSP PGUID b f f 1082 1184 16 2387 2362 date_le_timestamptz scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2359 ( "<=" PGNSP PGUID b f f 1082 1184 16 2387 2362 date_le_timestamptz scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2360 ( "=" PGNSP PGUID b t f 1082 1184 16 2386 2363 date_eq_timestamptz eqsel eqjoinsel ));
+DATA(insert OID = 2360 ( "=" PGNSP PGUID b t f 1082 1184 16 2386 2363 date_eq_timestamptz eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 2361 ( ">=" PGNSP PGUID b f f 1082 1184 16 2385 2358 date_ge_timestamptz scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2361 ( ">=" PGNSP PGUID b f f 1082 1184 16 2385 2358 date_ge_timestamptz scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2362 ( ">" PGNSP PGUID b f f 1082 1184 16 2384 2359 date_gt_timestamptz scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2362 ( ">" PGNSP PGUID b f f 1082 1184 16 2384 2359 date_gt_timestamptz scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2363 ( "<>" PGNSP PGUID b f f 1082 1184 16 2389 2360 date_ne_timestamptz neqsel neqjoinsel ));
+DATA(insert OID = 2363 ( "<>" PGNSP PGUID b f f 1082 1184 16 2389 2360 date_ne_timestamptz neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 2371 ( "<" PGNSP PGUID b f f 1114 1082 16 2349 2374 timestamp_lt_date scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2371 ( "<" PGNSP PGUID b f f 1114 1082 16 2349 2374 timestamp_lt_date scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2372 ( "<=" PGNSP PGUID b f f 1114 1082 16 2348 2375 timestamp_le_date scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2372 ( "<=" PGNSP PGUID b f f 1114 1082 16 2348 2375 timestamp_le_date scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2373 ( "=" PGNSP PGUID b t f 1114 1082 16 2347 2376 timestamp_eq_date eqsel eqjoinsel ));
+DATA(insert OID = 2373 ( "=" PGNSP PGUID b t f 1114 1082 16 2347 2376 timestamp_eq_date eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 2374 ( ">=" PGNSP PGUID b f f 1114 1082 16 2346 2371 timestamp_ge_date scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2374 ( ">=" PGNSP PGUID b f f 1114 1082 16 2346 2371 timestamp_ge_date scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2375 ( ">" PGNSP PGUID b f f 1114 1082 16 2345 2372 timestamp_gt_date scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2375 ( ">" PGNSP PGUID b f f 1114 1082 16 2345 2372 timestamp_gt_date scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2376 ( "<>" PGNSP PGUID b f f 1114 1082 16 2350 2373 timestamp_ne_date neqsel neqjoinsel ));
+DATA(insert OID = 2376 ( "<>" PGNSP PGUID b f f 1114 1082 16 2350 2373 timestamp_ne_date neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 2384 ( "<" PGNSP PGUID b f f 1184 1082 16 2362 2387 timestamptz_lt_date scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2384 ( "<" PGNSP PGUID b f f 1184 1082 16 2362 2387 timestamptz_lt_date scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2385 ( "<=" PGNSP PGUID b f f 1184 1082 16 2361 2388 timestamptz_le_date scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2385 ( "<=" PGNSP PGUID b f f 1184 1082 16 2361 2388 timestamptz_le_date scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2386 ( "=" PGNSP PGUID b t f 1184 1082 16 2360 2389 timestamptz_eq_date eqsel eqjoinsel ));
+DATA(insert OID = 2386 ( "=" PGNSP PGUID b t f 1184 1082 16 2360 2389 timestamptz_eq_date eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 2387 ( ">=" PGNSP PGUID b f f 1184 1082 16 2359 2384 timestamptz_ge_date scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2387 ( ">=" PGNSP PGUID b f f 1184 1082 16 2359 2384 timestamptz_ge_date scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2388 ( ">" PGNSP PGUID b f f 1184 1082 16 2358 2385 timestamptz_gt_date scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2388 ( ">" PGNSP PGUID b f f 1184 1082 16 2358 2385 timestamptz_gt_date scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2389 ( "<>" PGNSP PGUID b f f 1184 1082 16 2363 2386 timestamptz_ne_date neqsel neqjoinsel ));
+DATA(insert OID = 2389 ( "<>" PGNSP PGUID b f f 1184 1082 16 2363 2386 timestamptz_ne_date neqsel neqjoinsel "mhf"));
DESCR("not equal");
/* crosstype operations for timestamp vs. timestamptz */
-DATA(insert OID = 2534 ( "<" PGNSP PGUID b f f 1114 1184 16 2544 2537 timestamp_lt_timestamptz scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2534 ( "<" PGNSP PGUID b f f 1114 1184 16 2544 2537 timestamp_lt_timestamptz scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2535 ( "<=" PGNSP PGUID b f f 1114 1184 16 2543 2538 timestamp_le_timestamptz scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2535 ( "<=" PGNSP PGUID b f f 1114 1184 16 2543 2538 timestamp_le_timestamptz scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2536 ( "=" PGNSP PGUID b t f 1114 1184 16 2542 2539 timestamp_eq_timestamptz eqsel eqjoinsel ));
+DATA(insert OID = 2536 ( "=" PGNSP PGUID b t f 1114 1184 16 2542 2539 timestamp_eq_timestamptz eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 2537 ( ">=" PGNSP PGUID b f f 1114 1184 16 2541 2534 timestamp_ge_timestamptz scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2537 ( ">=" PGNSP PGUID b f f 1114 1184 16 2541 2534 timestamp_ge_timestamptz scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2538 ( ">" PGNSP PGUID b f f 1114 1184 16 2540 2535 timestamp_gt_timestamptz scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2538 ( ">" PGNSP PGUID b f f 1114 1184 16 2540 2535 timestamp_gt_timestamptz scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2539 ( "<>" PGNSP PGUID b f f 1114 1184 16 2545 2536 timestamp_ne_timestamptz neqsel neqjoinsel ));
+DATA(insert OID = 2539 ( "<>" PGNSP PGUID b f f 1114 1184 16 2545 2536 timestamp_ne_timestamptz neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 2540 ( "<" PGNSP PGUID b f f 1184 1114 16 2538 2543 timestamptz_lt_timestamp scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2540 ( "<" PGNSP PGUID b f f 1184 1114 16 2538 2543 timestamptz_lt_timestamp scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2541 ( "<=" PGNSP PGUID b f f 1184 1114 16 2537 2544 timestamptz_le_timestamp scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2541 ( "<=" PGNSP PGUID b f f 1184 1114 16 2537 2544 timestamptz_le_timestamp scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2542 ( "=" PGNSP PGUID b t f 1184 1114 16 2536 2545 timestamptz_eq_timestamp eqsel eqjoinsel ));
+DATA(insert OID = 2542 ( "=" PGNSP PGUID b t f 1184 1114 16 2536 2545 timestamptz_eq_timestamp eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 2543 ( ">=" PGNSP PGUID b f f 1184 1114 16 2535 2540 timestamptz_ge_timestamp scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2543 ( ">=" PGNSP PGUID b f f 1184 1114 16 2535 2540 timestamptz_ge_timestamp scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2544 ( ">" PGNSP PGUID b f f 1184 1114 16 2534 2541 timestamptz_gt_timestamp scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2544 ( ">" PGNSP PGUID b f f 1184 1114 16 2534 2541 timestamptz_gt_timestamp scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2545 ( "<>" PGNSP PGUID b f f 1184 1114 16 2539 2542 timestamptz_ne_timestamp neqsel neqjoinsel ));
+DATA(insert OID = 2545 ( "<>" PGNSP PGUID b f f 1184 1114 16 2539 2542 timestamptz_ne_timestamp neqsel neqjoinsel "mhf"));
DESCR("not equal");
/* formerly-missing interval + datetime operators */
-DATA(insert OID = 2551 ( "+" PGNSP PGUID b f f 1186 1082 1114 1076 0 interval_pl_date - - ));
+DATA(insert OID = 2551 ( "+" PGNSP PGUID b f f 1186 1082 1114 1076 0 interval_pl_date - - "---"));
DESCR("add");
-DATA(insert OID = 2552 ( "+" PGNSP PGUID b f f 1186 1266 1266 1802 0 interval_pl_timetz - - ));
+DATA(insert OID = 2552 ( "+" PGNSP PGUID b f f 1186 1266 1266 1802 0 interval_pl_timetz - - "---"));
DESCR("add");
-DATA(insert OID = 2553 ( "+" PGNSP PGUID b f f 1186 1114 1114 2066 0 interval_pl_timestamp - - ));
+DATA(insert OID = 2553 ( "+" PGNSP PGUID b f f 1186 1114 1114 2066 0 interval_pl_timestamp - - "---"));
DESCR("add");
-DATA(insert OID = 2554 ( "+" PGNSP PGUID b f f 1186 1184 1184 1327 0 interval_pl_timestamptz - - ));
+DATA(insert OID = 2554 ( "+" PGNSP PGUID b f f 1186 1184 1184 1327 0 interval_pl_timestamptz - - "---"));
DESCR("add");
-DATA(insert OID = 2555 ( "+" PGNSP PGUID b f f 23 1082 1082 1100 0 integer_pl_date - - ));
+DATA(insert OID = 2555 ( "+" PGNSP PGUID b f f 23 1082 1082 1100 0 integer_pl_date - - "---"));
DESCR("add");
/* new operators for Y-direction rtree opfamilies */
-DATA(insert OID = 2570 ( "<<|" PGNSP PGUID b f f 603 603 16 0 0 box_below positionsel positionjoinsel ));
+DATA(insert OID = 2570 ( "<<|" PGNSP PGUID b f f 603 603 16 0 0 box_below positionsel positionjoinsel "---"));
DESCR("is below");
-DATA(insert OID = 2571 ( "&<|" PGNSP PGUID b f f 603 603 16 0 0 box_overbelow positionsel positionjoinsel ));
+DATA(insert OID = 2571 ( "&<|" PGNSP PGUID b f f 603 603 16 0 0 box_overbelow positionsel positionjoinsel "---"));
DESCR("overlaps or is below");
-DATA(insert OID = 2572 ( "|&>" PGNSP PGUID b f f 603 603 16 0 0 box_overabove positionsel positionjoinsel ));
+DATA(insert OID = 2572 ( "|&>" PGNSP PGUID b f f 603 603 16 0 0 box_overabove positionsel positionjoinsel "---"));
DESCR("overlaps or is above");
-DATA(insert OID = 2573 ( "|>>" PGNSP PGUID b f f 603 603 16 0 0 box_above positionsel positionjoinsel ));
+DATA(insert OID = 2573 ( "|>>" PGNSP PGUID b f f 603 603 16 0 0 box_above positionsel positionjoinsel "---"));
DESCR("is above");
-DATA(insert OID = 2574 ( "<<|" PGNSP PGUID b f f 604 604 16 0 0 poly_below positionsel positionjoinsel ));
+DATA(insert OID = 2574 ( "<<|" PGNSP PGUID b f f 604 604 16 0 0 poly_below positionsel positionjoinsel "---"));
DESCR("is below");
-DATA(insert OID = 2575 ( "&<|" PGNSP PGUID b f f 604 604 16 0 0 poly_overbelow positionsel positionjoinsel ));
+DATA(insert OID = 2575 ( "&<|" PGNSP PGUID b f f 604 604 16 0 0 poly_overbelow positionsel positionjoinsel "---"));
DESCR("overlaps or is below");
-DATA(insert OID = 2576 ( "|&>" PGNSP PGUID b f f 604 604 16 0 0 poly_overabove positionsel positionjoinsel ));
+DATA(insert OID = 2576 ( "|&>" PGNSP PGUID b f f 604 604 16 0 0 poly_overabove positionsel positionjoinsel "---"));
DESCR("overlaps or is above");
-DATA(insert OID = 2577 ( "|>>" PGNSP PGUID b f f 604 604 16 0 0 poly_above positionsel positionjoinsel ));
+DATA(insert OID = 2577 ( "|>>" PGNSP PGUID b f f 604 604 16 0 0 poly_above positionsel positionjoinsel "---"));
DESCR("is above");
-DATA(insert OID = 2589 ( "&<|" PGNSP PGUID b f f 718 718 16 0 0 circle_overbelow positionsel positionjoinsel ));
+DATA(insert OID = 2589 ( "&<|" PGNSP PGUID b f f 718 718 16 0 0 circle_overbelow positionsel positionjoinsel "---"));
DESCR("overlaps or is below");
-DATA(insert OID = 2590 ( "|&>" PGNSP PGUID b f f 718 718 16 0 0 circle_overabove positionsel positionjoinsel ));
+DATA(insert OID = 2590 ( "|&>" PGNSP PGUID b f f 718 718 16 0 0 circle_overabove positionsel positionjoinsel "---"));
DESCR("overlaps or is above");
/* overlap/contains/contained for arrays */
-DATA(insert OID = 2750 ( "&&" PGNSP PGUID b f f 2277 2277 16 2750 0 arrayoverlap arraycontsel arraycontjoinsel ));
+DATA(insert OID = 2750 ( "&&" PGNSP PGUID b f f 2277 2277 16 2750 0 arrayoverlap arraycontsel arraycontjoinsel "---"));
DESCR("overlaps");
#define OID_ARRAY_OVERLAP_OP 2750
-DATA(insert OID = 2751 ( "@>" PGNSP PGUID b f f 2277 2277 16 2752 0 arraycontains arraycontsel arraycontjoinsel ));
+DATA(insert OID = 2751 ( "@>" PGNSP PGUID b f f 2277 2277 16 2752 0 arraycontains arraycontsel arraycontjoinsel "---"));
DESCR("contains");
#define OID_ARRAY_CONTAINS_OP 2751
-DATA(insert OID = 2752 ( "<@" PGNSP PGUID b f f 2277 2277 16 2751 0 arraycontained arraycontsel arraycontjoinsel ));
+DATA(insert OID = 2752 ( "<@" PGNSP PGUID b f f 2277 2277 16 2751 0 arraycontained arraycontsel arraycontjoinsel "---"));
DESCR("is contained by");
#define OID_ARRAY_CONTAINED_OP 2752
/* capturing operators to preserve pre-8.3 behavior of text concatenation */
-DATA(insert OID = 2779 ( "||" PGNSP PGUID b f f 25 2776 25 0 0 textanycat - - ));
+DATA(insert OID = 2779 ( "||" PGNSP PGUID b f f 25 2776 25 0 0 textanycat - - "---"));
DESCR("concatenate");
-DATA(insert OID = 2780 ( "||" PGNSP PGUID b f f 2776 25 25 0 0 anytextcat - - ));
+DATA(insert OID = 2780 ( "||" PGNSP PGUID b f f 2776 25 25 0 0 anytextcat - - "---"));
DESCR("concatenate");
/* obsolete names for contains/contained-by operators; remove these someday */
-DATA(insert OID = 2860 ( "@" PGNSP PGUID b f f 604 604 16 2861 0 poly_contained contsel contjoinsel ));
+DATA(insert OID = 2860 ( "@" PGNSP PGUID b f f 604 604 16 2861 0 poly_contained contsel contjoinsel "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2861 ( "~" PGNSP PGUID b f f 604 604 16 2860 0 poly_contain contsel contjoinsel ));
+DATA(insert OID = 2861 ( "~" PGNSP PGUID b f f 604 604 16 2860 0 poly_contain contsel contjoinsel "---"));
DESCR("deprecated, use @> instead");
-DATA(insert OID = 2862 ( "@" PGNSP PGUID b f f 603 603 16 2863 0 box_contained contsel contjoinsel ));
+DATA(insert OID = 2862 ( "@" PGNSP PGUID b f f 603 603 16 2863 0 box_contained contsel contjoinsel "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2863 ( "~" PGNSP PGUID b f f 603 603 16 2862 0 box_contain contsel contjoinsel ));
+DATA(insert OID = 2863 ( "~" PGNSP PGUID b f f 603 603 16 2862 0 box_contain contsel contjoinsel "---"));
DESCR("deprecated, use @> instead");
-DATA(insert OID = 2864 ( "@" PGNSP PGUID b f f 718 718 16 2865 0 circle_contained contsel contjoinsel ));
+DATA(insert OID = 2864 ( "@" PGNSP PGUID b f f 718 718 16 2865 0 circle_contained contsel contjoinsel "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2865 ( "~" PGNSP PGUID b f f 718 718 16 2864 0 circle_contain contsel contjoinsel ));
+DATA(insert OID = 2865 ( "~" PGNSP PGUID b f f 718 718 16 2864 0 circle_contain contsel contjoinsel "---"));
DESCR("deprecated, use @> instead");
-DATA(insert OID = 2866 ( "@" PGNSP PGUID b f f 600 603 16 0 0 on_pb - - ));
+DATA(insert OID = 2866 ( "@" PGNSP PGUID b f f 600 603 16 0 0 on_pb - - "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2867 ( "@" PGNSP PGUID b f f 600 602 16 2868 0 on_ppath - - ));
+DATA(insert OID = 2867 ( "@" PGNSP PGUID b f f 600 602 16 2868 0 on_ppath - - "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2868 ( "~" PGNSP PGUID b f f 602 600 16 2867 0 path_contain_pt - - ));
+DATA(insert OID = 2868 ( "~" PGNSP PGUID b f f 602 600 16 2867 0 path_contain_pt - - "---"));
DESCR("deprecated, use @> instead");
-DATA(insert OID = 2869 ( "@" PGNSP PGUID b f f 600 604 16 2870 0 pt_contained_poly - - ));
+DATA(insert OID = 2869 ( "@" PGNSP PGUID b f f 600 604 16 2870 0 pt_contained_poly - - "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2870 ( "~" PGNSP PGUID b f f 604 600 16 2869 0 poly_contain_pt - - ));
+DATA(insert OID = 2870 ( "~" PGNSP PGUID b f f 604 600 16 2869 0 poly_contain_pt - - "---"));
DESCR("deprecated, use @> instead");
-DATA(insert OID = 2871 ( "@" PGNSP PGUID b f f 600 718 16 2872 0 pt_contained_circle - - ));
+DATA(insert OID = 2871 ( "@" PGNSP PGUID b f f 600 718 16 2872 0 pt_contained_circle - - "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2872 ( "~" PGNSP PGUID b f f 718 600 16 2871 0 circle_contain_pt - - ));
+DATA(insert OID = 2872 ( "~" PGNSP PGUID b f f 718 600 16 2871 0 circle_contain_pt - - "---"));
DESCR("deprecated, use @> instead");
-DATA(insert OID = 2873 ( "@" PGNSP PGUID b f f 600 628 16 0 0 on_pl - - ));
+DATA(insert OID = 2873 ( "@" PGNSP PGUID b f f 600 628 16 0 0 on_pl - - "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2874 ( "@" PGNSP PGUID b f f 600 601 16 0 0 on_ps - - ));
+DATA(insert OID = 2874 ( "@" PGNSP PGUID b f f 600 601 16 0 0 on_ps - - "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2875 ( "@" PGNSP PGUID b f f 601 628 16 0 0 on_sl - - ));
+DATA(insert OID = 2875 ( "@" PGNSP PGUID b f f 601 628 16 0 0 on_sl - - "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2876 ( "@" PGNSP PGUID b f f 601 603 16 0 0 on_sb - - ));
+DATA(insert OID = 2876 ( "@" PGNSP PGUID b f f 601 603 16 0 0 on_sb - - "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2877 ( "~" PGNSP PGUID b f f 1034 1033 16 0 0 aclcontains - - ));
+DATA(insert OID = 2877 ( "~" PGNSP PGUID b f f 1034 1033 16 0 0 aclcontains - - "---"));
DESCR("deprecated, use @> instead");
/* uuid operators */
-DATA(insert OID = 2972 ( "=" PGNSP PGUID b t t 2950 2950 16 2972 2973 uuid_eq eqsel eqjoinsel ));
+DATA(insert OID = 2972 ( "=" PGNSP PGUID b t t 2950 2950 16 2972 2973 uuid_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 2973 ( "<>" PGNSP PGUID b f f 2950 2950 16 2973 2972 uuid_ne neqsel neqjoinsel ));
+DATA(insert OID = 2973 ( "<>" PGNSP PGUID b f f 2950 2950 16 2973 2972 uuid_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 2974 ( "<" PGNSP PGUID b f f 2950 2950 16 2975 2977 uuid_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2974 ( "<" PGNSP PGUID b f f 2950 2950 16 2975 2977 uuid_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2975 ( ">" PGNSP PGUID b f f 2950 2950 16 2974 2976 uuid_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2975 ( ">" PGNSP PGUID b f f 2950 2950 16 2974 2976 uuid_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2976 ( "<=" PGNSP PGUID b f f 2950 2950 16 2977 2975 uuid_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2976 ( "<=" PGNSP PGUID b f f 2950 2950 16 2977 2975 uuid_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2977 ( ">=" PGNSP PGUID b f f 2950 2950 16 2976 2974 uuid_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2977 ( ">=" PGNSP PGUID b f f 2950 2950 16 2976 2974 uuid_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/* pg_lsn operators */
-DATA(insert OID = 3222 ( "=" PGNSP PGUID b t t 3220 3220 16 3222 3223 pg_lsn_eq eqsel eqjoinsel ));
+DATA(insert OID = 3222 ( "=" PGNSP PGUID b t t 3220 3220 16 3222 3223 pg_lsn_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 3223 ( "<>" PGNSP PGUID b f f 3220 3220 16 3223 3222 pg_lsn_ne neqsel neqjoinsel ));
+DATA(insert OID = 3223 ( "<>" PGNSP PGUID b f f 3220 3220 16 3223 3222 pg_lsn_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 3224 ( "<" PGNSP PGUID b f f 3220 3220 16 3225 3227 pg_lsn_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3224 ( "<" PGNSP PGUID b f f 3220 3220 16 3225 3227 pg_lsn_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 3225 ( ">" PGNSP PGUID b f f 3220 3220 16 3224 3226 pg_lsn_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3225 ( ">" PGNSP PGUID b f f 3220 3220 16 3224 3226 pg_lsn_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 3226 ( "<=" PGNSP PGUID b f f 3220 3220 16 3227 3225 pg_lsn_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3226 ( "<=" PGNSP PGUID b f f 3220 3220 16 3227 3225 pg_lsn_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 3227 ( ">=" PGNSP PGUID b f f 3220 3220 16 3226 3224 pg_lsn_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3227 ( ">=" PGNSP PGUID b f f 3220 3220 16 3226 3224 pg_lsn_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 3228 ( "-" PGNSP PGUID b f f 3220 3220 1700 0 0 pg_lsn_mi - - ));
+DATA(insert OID = 3228 ( "-" PGNSP PGUID b f f 3220 3220 1700 0 0 pg_lsn_mi - - "---"));
DESCR("minus");
/* enum operators */
-DATA(insert OID = 3516 ( "=" PGNSP PGUID b t t 3500 3500 16 3516 3517 enum_eq eqsel eqjoinsel ));
+DATA(insert OID = 3516 ( "=" PGNSP PGUID b t t 3500 3500 16 3516 3517 enum_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 3517 ( "<>" PGNSP PGUID b f f 3500 3500 16 3517 3516 enum_ne neqsel neqjoinsel ));
+DATA(insert OID = 3517 ( "<>" PGNSP PGUID b f f 3500 3500 16 3517 3516 enum_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 3518 ( "<" PGNSP PGUID b f f 3500 3500 16 3519 3521 enum_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3518 ( "<" PGNSP PGUID b f f 3500 3500 16 3519 3521 enum_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 3519 ( ">" PGNSP PGUID b f f 3500 3500 16 3518 3520 enum_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3519 ( ">" PGNSP PGUID b f f 3500 3500 16 3518 3520 enum_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 3520 ( "<=" PGNSP PGUID b f f 3500 3500 16 3521 3519 enum_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3520 ( "<=" PGNSP PGUID b f f 3500 3500 16 3521 3519 enum_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 3521 ( ">=" PGNSP PGUID b f f 3500 3500 16 3520 3518 enum_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3521 ( ">=" PGNSP PGUID b f f 3500 3500 16 3520 3518 enum_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/*
* tsearch operations
*/
-DATA(insert OID = 3627 ( "<" PGNSP PGUID b f f 3614 3614 16 3632 3631 tsvector_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3627 ( "<" PGNSP PGUID b f f 3614 3614 16 3632 3631 tsvector_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 3628 ( "<=" PGNSP PGUID b f f 3614 3614 16 3631 3632 tsvector_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3628 ( "<=" PGNSP PGUID b f f 3614 3614 16 3631 3632 tsvector_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 3629 ( "=" PGNSP PGUID b t f 3614 3614 16 3629 3630 tsvector_eq eqsel eqjoinsel ));
+DATA(insert OID = 3629 ( "=" PGNSP PGUID b t f 3614 3614 16 3629 3630 tsvector_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 3630 ( "<>" PGNSP PGUID b f f 3614 3614 16 3630 3629 tsvector_ne neqsel neqjoinsel ));
+DATA(insert OID = 3630 ( "<>" PGNSP PGUID b f f 3614 3614 16 3630 3629 tsvector_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 3631 ( ">=" PGNSP PGUID b f f 3614 3614 16 3628 3627 tsvector_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3631 ( ">=" PGNSP PGUID b f f 3614 3614 16 3628 3627 tsvector_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 3632 ( ">" PGNSP PGUID b f f 3614 3614 16 3627 3628 tsvector_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3632 ( ">" PGNSP PGUID b f f 3614 3614 16 3627 3628 tsvector_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 3633 ( "||" PGNSP PGUID b f f 3614 3614 3614 0 0 tsvector_concat - - ));
+DATA(insert OID = 3633 ( "||" PGNSP PGUID b f f 3614 3614 3614 0 0 tsvector_concat - - "---"));
DESCR("concatenate");
-DATA(insert OID = 3636 ( "@@" PGNSP PGUID b f f 3614 3615 16 3637 0 ts_match_vq tsmatchsel tsmatchjoinsel ));
+DATA(insert OID = 3636 ( "@@" PGNSP PGUID b f f 3614 3615 16 3637 0 ts_match_vq tsmatchsel tsmatchjoinsel "---"));
DESCR("text search match");
-DATA(insert OID = 3637 ( "@@" PGNSP PGUID b f f 3615 3614 16 3636 0 ts_match_qv tsmatchsel tsmatchjoinsel ));
+DATA(insert OID = 3637 ( "@@" PGNSP PGUID b f f 3615 3614 16 3636 0 ts_match_qv tsmatchsel tsmatchjoinsel "---"));
DESCR("text search match");
-DATA(insert OID = 3660 ( "@@@" PGNSP PGUID b f f 3614 3615 16 3661 0 ts_match_vq tsmatchsel tsmatchjoinsel ));
+DATA(insert OID = 3660 ( "@@@" PGNSP PGUID b f f 3614 3615 16 3661 0 ts_match_vq tsmatchsel tsmatchjoinsel "---"));
DESCR("deprecated, use @@ instead");
-DATA(insert OID = 3661 ( "@@@" PGNSP PGUID b f f 3615 3614 16 3660 0 ts_match_qv tsmatchsel tsmatchjoinsel ));
+DATA(insert OID = 3661 ( "@@@" PGNSP PGUID b f f 3615 3614 16 3660 0 ts_match_qv tsmatchsel tsmatchjoinsel "---"));
DESCR("deprecated, use @@ instead");
-DATA(insert OID = 3674 ( "<" PGNSP PGUID b f f 3615 3615 16 3679 3678 tsquery_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3674 ( "<" PGNSP PGUID b f f 3615 3615 16 3679 3678 tsquery_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 3675 ( "<=" PGNSP PGUID b f f 3615 3615 16 3678 3679 tsquery_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3675 ( "<=" PGNSP PGUID b f f 3615 3615 16 3678 3679 tsquery_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 3676 ( "=" PGNSP PGUID b t f 3615 3615 16 3676 3677 tsquery_eq eqsel eqjoinsel ));
+DATA(insert OID = 3676 ( "=" PGNSP PGUID b t f 3615 3615 16 3676 3677 tsquery_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 3677 ( "<>" PGNSP PGUID b f f 3615 3615 16 3677 3676 tsquery_ne neqsel neqjoinsel ));
+DATA(insert OID = 3677 ( "<>" PGNSP PGUID b f f 3615 3615 16 3677 3676 tsquery_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 3678 ( ">=" PGNSP PGUID b f f 3615 3615 16 3675 3674 tsquery_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3678 ( ">=" PGNSP PGUID b f f 3615 3615 16 3675 3674 tsquery_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 3679 ( ">" PGNSP PGUID b f f 3615 3615 16 3674 3675 tsquery_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3679 ( ">" PGNSP PGUID b f f 3615 3615 16 3674 3675 tsquery_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 3680 ( "&&" PGNSP PGUID b f f 3615 3615 3615 0 0 tsquery_and - - ));
+DATA(insert OID = 3680 ( "&&" PGNSP PGUID b f f 3615 3615 3615 0 0 tsquery_and - - "---"));
DESCR("AND-concatenate");
-DATA(insert OID = 3681 ( "||" PGNSP PGUID b f f 3615 3615 3615 0 0 tsquery_or - - ));
+DATA(insert OID = 3681 ( "||" PGNSP PGUID b f f 3615 3615 3615 0 0 tsquery_or - - "---"));
DESCR("OR-concatenate");
-DATA(insert OID = 3682 ( "!!" PGNSP PGUID l f f 0 3615 3615 0 0 tsquery_not - - ));
+DATA(insert OID = 3682 ( "!!" PGNSP PGUID l f f 0 3615 3615 0 0 tsquery_not - - "---"));
DESCR("NOT tsquery");
-DATA(insert OID = 3693 ( "@>" PGNSP PGUID b f f 3615 3615 16 3694 0 tsq_mcontains contsel contjoinsel ));
+DATA(insert OID = 3693 ( "@>" PGNSP PGUID b f f 3615 3615 16 3694 0 tsq_mcontains contsel contjoinsel "---"));
DESCR("contains");
-DATA(insert OID = 3694 ( "<@" PGNSP PGUID b f f 3615 3615 16 3693 0 tsq_mcontained contsel contjoinsel ));
+DATA(insert OID = 3694 ( "<@" PGNSP PGUID b f f 3615 3615 16 3693 0 tsq_mcontained contsel contjoinsel "---"));
DESCR("is contained by");
-DATA(insert OID = 3762 ( "@@" PGNSP PGUID b f f 25 25 16 0 0 ts_match_tt contsel contjoinsel ));
+DATA(insert OID = 3762 ( "@@" PGNSP PGUID b f f 25 25 16 0 0 ts_match_tt contsel contjoinsel "---"));
DESCR("text search match");
-DATA(insert OID = 3763 ( "@@" PGNSP PGUID b f f 25 3615 16 0 0 ts_match_tq contsel contjoinsel ));
+DATA(insert OID = 3763 ( "@@" PGNSP PGUID b f f 25 3615 16 0 0 ts_match_tq contsel contjoinsel "---"));
DESCR("text search match");
/* generic record comparison operators */
-DATA(insert OID = 2988 ( "=" PGNSP PGUID b t f 2249 2249 16 2988 2989 record_eq eqsel eqjoinsel ));
+DATA(insert OID = 2988 ( "=" PGNSP PGUID b t f 2249 2249 16 2988 2989 record_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
#define RECORD_EQ_OP 2988
-DATA(insert OID = 2989 ( "<>" PGNSP PGUID b f f 2249 2249 16 2989 2988 record_ne neqsel neqjoinsel ));
+DATA(insert OID = 2989 ( "<>" PGNSP PGUID b f f 2249 2249 16 2989 2988 record_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 2990 ( "<" PGNSP PGUID b f f 2249 2249 16 2991 2993 record_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2990 ( "<" PGNSP PGUID b f f 2249 2249 16 2991 2993 record_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
#define RECORD_LT_OP 2990
-DATA(insert OID = 2991 ( ">" PGNSP PGUID b f f 2249 2249 16 2990 2992 record_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2991 ( ">" PGNSP PGUID b f f 2249 2249 16 2990 2992 record_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
#define RECORD_GT_OP 2991
-DATA(insert OID = 2992 ( "<=" PGNSP PGUID b f f 2249 2249 16 2993 2991 record_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2992 ( "<=" PGNSP PGUID b f f 2249 2249 16 2993 2991 record_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2993 ( ">=" PGNSP PGUID b f f 2249 2249 16 2992 2990 record_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2993 ( ">=" PGNSP PGUID b f f 2249 2249 16 2992 2990 record_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/* byte-oriented tests for identical rows and fast sorting */
-DATA(insert OID = 3188 ( "*=" PGNSP PGUID b t f 2249 2249 16 3188 3189 record_image_eq eqsel eqjoinsel ));
+DATA(insert OID = 3188 ( "*=" PGNSP PGUID b t f 2249 2249 16 3188 3189 record_image_eq eqsel eqjoinsel "mhf"));
DESCR("identical");
-DATA(insert OID = 3189 ( "*<>" PGNSP PGUID b f f 2249 2249 16 3189 3188 record_image_ne neqsel neqjoinsel ));
+DATA(insert OID = 3189 ( "*<>" PGNSP PGUID b f f 2249 2249 16 3189 3188 record_image_ne neqsel neqjoinsel "mhf"));
DESCR("not identical");
-DATA(insert OID = 3190 ( "*<" PGNSP PGUID b f f 2249 2249 16 3191 3193 record_image_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3190 ( "*<" PGNSP PGUID b f f 2249 2249 16 3191 3193 record_image_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 3191 ( "*>" PGNSP PGUID b f f 2249 2249 16 3190 3192 record_image_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3191 ( "*>" PGNSP PGUID b f f 2249 2249 16 3190 3192 record_image_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 3192 ( "*<=" PGNSP PGUID b f f 2249 2249 16 3193 3191 record_image_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3192 ( "*<=" PGNSP PGUID b f f 2249 2249 16 3193 3191 record_image_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 3193 ( "*>=" PGNSP PGUID b f f 2249 2249 16 3192 3190 record_image_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3193 ( "*>=" PGNSP PGUID b f f 2249 2249 16 3192 3190 record_image_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/* generic range type operators */
-DATA(insert OID = 3882 ( "=" PGNSP PGUID b t t 3831 3831 16 3882 3883 range_eq eqsel eqjoinsel ));
+DATA(insert OID = 3882 ( "=" PGNSP PGUID b t t 3831 3831 16 3882 3883 range_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 3883 ( "<>" PGNSP PGUID b f f 3831 3831 16 3883 3882 range_ne neqsel neqjoinsel ));
+DATA(insert OID = 3883 ( "<>" PGNSP PGUID b f f 3831 3831 16 3883 3882 range_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 3884 ( "<" PGNSP PGUID b f f 3831 3831 16 3887 3886 range_lt rangesel scalarltjoinsel ));
+DATA(insert OID = 3884 ( "<" PGNSP PGUID b f f 3831 3831 16 3887 3886 range_lt rangesel scalarltjoinsel "---"));
DESCR("less than");
#define OID_RANGE_LESS_OP 3884
-DATA(insert OID = 3885 ( "<=" PGNSP PGUID b f f 3831 3831 16 3886 3887 range_le rangesel scalarltjoinsel ));
+DATA(insert OID = 3885 ( "<=" PGNSP PGUID b f f 3831 3831 16 3886 3887 range_le rangesel scalarltjoinsel "---"));
DESCR("less than or equal");
#define OID_RANGE_LESS_EQUAL_OP 3885
-DATA(insert OID = 3886 ( ">=" PGNSP PGUID b f f 3831 3831 16 3885 3884 range_ge rangesel scalargtjoinsel ));
+DATA(insert OID = 3886 ( ">=" PGNSP PGUID b f f 3831 3831 16 3885 3884 range_ge rangesel scalargtjoinsel "---"));
DESCR("greater than or equal");
#define OID_RANGE_GREATER_EQUAL_OP 3886
-DATA(insert OID = 3887 ( ">" PGNSP PGUID b f f 3831 3831 16 3884 3885 range_gt rangesel scalargtjoinsel ));
+DATA(insert OID = 3887 ( ">" PGNSP PGUID b f f 3831 3831 16 3884 3885 range_gt rangesel scalargtjoinsel "---"));
DESCR("greater than");
#define OID_RANGE_GREATER_OP 3887
-DATA(insert OID = 3888 ( "&&" PGNSP PGUID b f f 3831 3831 16 3888 0 range_overlaps rangesel areajoinsel ));
+DATA(insert OID = 3888 ( "&&" PGNSP PGUID b f f 3831 3831 16 3888 0 range_overlaps rangesel areajoinsel "---"));
DESCR("overlaps");
#define OID_RANGE_OVERLAP_OP 3888
-DATA(insert OID = 3889 ( "@>" PGNSP PGUID b f f 3831 2283 16 3891 0 range_contains_elem rangesel contjoinsel ));
+DATA(insert OID = 3889 ( "@>" PGNSP PGUID b f f 3831 2283 16 3891 0 range_contains_elem rangesel contjoinsel "---"));
DESCR("contains");
#define OID_RANGE_CONTAINS_ELEM_OP 3889
-DATA(insert OID = 3890 ( "@>" PGNSP PGUID b f f 3831 3831 16 3892 0 range_contains rangesel contjoinsel ));
+DATA(insert OID = 3890 ( "@>" PGNSP PGUID b f f 3831 3831 16 3892 0 range_contains rangesel contjoinsel "---"));
DESCR("contains");
#define OID_RANGE_CONTAINS_OP 3890
-DATA(insert OID = 3891 ( "<@" PGNSP PGUID b f f 2283 3831 16 3889 0 elem_contained_by_range rangesel contjoinsel ));
+DATA(insert OID = 3891 ( "<@" PGNSP PGUID b f f 2283 3831 16 3889 0 elem_contained_by_range rangesel contjoinsel "---"));
DESCR("is contained by");
#define OID_RANGE_ELEM_CONTAINED_OP 3891
-DATA(insert OID = 3892 ( "<@" PGNSP PGUID b f f 3831 3831 16 3890 0 range_contained_by rangesel contjoinsel ));
+DATA(insert OID = 3892 ( "<@" PGNSP PGUID b f f 3831 3831 16 3890 0 range_contained_by rangesel contjoinsel "---"));
DESCR("is contained by");
#define OID_RANGE_CONTAINED_OP 3892
-DATA(insert OID = 3893 ( "<<" PGNSP PGUID b f f 3831 3831 16 3894 0 range_before rangesel scalarltjoinsel ));
+DATA(insert OID = 3893 ( "<<" PGNSP PGUID b f f 3831 3831 16 3894 0 range_before rangesel scalarltjoinsel "---"));
DESCR("is left of");
#define OID_RANGE_LEFT_OP 3893
-DATA(insert OID = 3894 ( ">>" PGNSP PGUID b f f 3831 3831 16 3893 0 range_after rangesel scalargtjoinsel ));
+DATA(insert OID = 3894 ( ">>" PGNSP PGUID b f f 3831 3831 16 3893 0 range_after rangesel scalargtjoinsel "---"));
DESCR("is right of");
#define OID_RANGE_RIGHT_OP 3894
-DATA(insert OID = 3895 ( "&<" PGNSP PGUID b f f 3831 3831 16 0 0 range_overleft rangesel scalarltjoinsel ));
+DATA(insert OID = 3895 ( "&<" PGNSP PGUID b f f 3831 3831 16 0 0 range_overleft rangesel scalarltjoinsel "---"));
DESCR("overlaps or is left of");
#define OID_RANGE_OVERLAPS_LEFT_OP 3895
-DATA(insert OID = 3896 ( "&>" PGNSP PGUID b f f 3831 3831 16 0 0 range_overright rangesel scalargtjoinsel ));
+DATA(insert OID = 3896 ( "&>" PGNSP PGUID b f f 3831 3831 16 0 0 range_overright rangesel scalargtjoinsel "---"));
DESCR("overlaps or is right of");
#define OID_RANGE_OVERLAPS_RIGHT_OP 3896
-DATA(insert OID = 3897 ( "-|-" PGNSP PGUID b f f 3831 3831 16 3897 0 range_adjacent contsel contjoinsel ));
+DATA(insert OID = 3897 ( "-|-" PGNSP PGUID b f f 3831 3831 16 3897 0 range_adjacent contsel contjoinsel "---"));
DESCR("is adjacent to");
-DATA(insert OID = 3898 ( "+" PGNSP PGUID b f f 3831 3831 3831 3898 0 range_union - - ));
+DATA(insert OID = 3898 ( "+" PGNSP PGUID b f f 3831 3831 3831 3898 0 range_union - - "---"));
DESCR("range union");
-DATA(insert OID = 3899 ( "-" PGNSP PGUID b f f 3831 3831 3831 0 0 range_minus - - ));
+DATA(insert OID = 3899 ( "-" PGNSP PGUID b f f 3831 3831 3831 0 0 range_minus - - "---"));
DESCR("range difference");
-DATA(insert OID = 3900 ( "*" PGNSP PGUID b f f 3831 3831 3831 3900 0 range_intersect - - ));
+DATA(insert OID = 3900 ( "*" PGNSP PGUID b f f 3831 3831 3831 3900 0 range_intersect - - "---"));
DESCR("range intersection");
-DATA(insert OID = 3962 ( "->" PGNSP PGUID b f f 114 25 114 0 0 json_object_field - - ));
+DATA(insert OID = 3962 ( "->" PGNSP PGUID b f f 114 25 114 0 0 json_object_field - - "---"));
DESCR("get json object field");
-DATA(insert OID = 3963 ( "->>" PGNSP PGUID b f f 114 25 25 0 0 json_object_field_text - - ));
+DATA(insert OID = 3963 ( "->>" PGNSP PGUID b f f 114 25 25 0 0 json_object_field_text - - "---"));
DESCR("get json object field as text");
-DATA(insert OID = 3964 ( "->" PGNSP PGUID b f f 114 23 114 0 0 json_array_element - - ));
+DATA(insert OID = 3964 ( "->" PGNSP PGUID b f f 114 23 114 0 0 json_array_element - - "---"));
DESCR("get json array element");
-DATA(insert OID = 3965 ( "->>" PGNSP PGUID b f f 114 23 25 0 0 json_array_element_text - - ));
+DATA(insert OID = 3965 ( "->>" PGNSP PGUID b f f 114 23 25 0 0 json_array_element_text - - "---"));
DESCR("get json array element as text");
-DATA(insert OID = 3966 ( "#>" PGNSP PGUID b f f 114 1009 114 0 0 json_extract_path - - ));
+DATA(insert OID = 3966 ( "#>" PGNSP PGUID b f f 114 1009 114 0 0 json_extract_path - - "---"));
DESCR("get value from json with path elements");
-DATA(insert OID = 3967 ( "#>>" PGNSP PGUID b f f 114 1009 25 0 0 json_extract_path_text - - ));
+DATA(insert OID = 3967 ( "#>>" PGNSP PGUID b f f 114 1009 25 0 0 json_extract_path_text - - "---"));
DESCR("get value from json as text with path elements");
-DATA(insert OID = 3211 ( "->" PGNSP PGUID b f f 3802 25 3802 0 0 jsonb_object_field - - ));
+DATA(insert OID = 3211 ( "->" PGNSP PGUID b f f 3802 25 3802 0 0 jsonb_object_field - - "---"));
DESCR("get jsonb object field");
-DATA(insert OID = 3477 ( "->>" PGNSP PGUID b f f 3802 25 25 0 0 jsonb_object_field_text - - ));
+DATA(insert OID = 3477 ( "->>" PGNSP PGUID b f f 3802 25 25 0 0 jsonb_object_field_text - - "---"));
DESCR("get jsonb object field as text");
-DATA(insert OID = 3212 ( "->" PGNSP PGUID b f f 3802 23 3802 0 0 jsonb_array_element - - ));
+DATA(insert OID = 3212 ( "->" PGNSP PGUID b f f 3802 23 3802 0 0 jsonb_array_element - - "---"));
DESCR("get jsonb array element");
-DATA(insert OID = 3481 ( "->>" PGNSP PGUID b f f 3802 23 25 0 0 jsonb_array_element_text - - ));
+DATA(insert OID = 3481 ( "->>" PGNSP PGUID b f f 3802 23 25 0 0 jsonb_array_element_text - - "---"));
DESCR("get jsonb array element as text");
-DATA(insert OID = 3213 ( "#>" PGNSP PGUID b f f 3802 1009 3802 0 0 jsonb_extract_path - - ));
+DATA(insert OID = 3213 ( "#>" PGNSP PGUID b f f 3802 1009 3802 0 0 jsonb_extract_path - - "---"));
DESCR("get value from jsonb with path elements");
-DATA(insert OID = 3206 ( "#>>" PGNSP PGUID b f f 3802 1009 25 0 0 jsonb_extract_path_text - - ));
+DATA(insert OID = 3206 ( "#>>" PGNSP PGUID b f f 3802 1009 25 0 0 jsonb_extract_path_text - - "---"));
DESCR("get value from jsonb as text with path elements");
-DATA(insert OID = 3240 ( "=" PGNSP PGUID b t t 3802 3802 16 3240 3241 jsonb_eq eqsel eqjoinsel ));
+DATA(insert OID = 3240 ( "=" PGNSP PGUID b t t 3802 3802 16 3240 3241 jsonb_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 3241 ( "<>" PGNSP PGUID b f f 3802 3802 16 3241 3240 jsonb_ne neqsel neqjoinsel ));
+DATA(insert OID = 3241 ( "<>" PGNSP PGUID b f f 3802 3802 16 3241 3240 jsonb_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 3242 ( "<" PGNSP PGUID b f f 3802 3802 16 3243 3245 jsonb_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3242 ( "<" PGNSP PGUID b f f 3802 3802 16 3243 3245 jsonb_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 3243 ( ">" PGNSP PGUID b f f 3802 3802 16 3242 3244 jsonb_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3243 ( ">" PGNSP PGUID b f f 3802 3802 16 3242 3244 jsonb_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 3244 ( "<=" PGNSP PGUID b f f 3802 3802 16 3245 3243 jsonb_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3244 ( "<=" PGNSP PGUID b f f 3802 3802 16 3245 3243 jsonb_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 3245 ( ">=" PGNSP PGUID b f f 3802 3802 16 3244 3242 jsonb_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3245 ( ">=" PGNSP PGUID b f f 3802 3802 16 3244 3242 jsonb_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 3246 ( "@>" PGNSP PGUID b f f 3802 3802 16 3250 0 jsonb_contains contsel contjoinsel ));
+DATA(insert OID = 3246 ( "@>" PGNSP PGUID b f f 3802 3802 16 3250 0 jsonb_contains contsel contjoinsel "---"));
DESCR("contains");
-DATA(insert OID = 3247 ( "?" PGNSP PGUID b f f 3802 25 16 0 0 jsonb_exists contsel contjoinsel ));
+DATA(insert OID = 3247 ( "?" PGNSP PGUID b f f 3802 25 16 0 0 jsonb_exists contsel contjoinsel "---"));
DESCR("exists");
-DATA(insert OID = 3248 ( "?|" PGNSP PGUID b f f 3802 1009 16 0 0 jsonb_exists_any contsel contjoinsel ));
+DATA(insert OID = 3248 ( "?|" PGNSP PGUID b f f 3802 1009 16 0 0 jsonb_exists_any contsel contjoinsel "---"));
DESCR("exists any");
-DATA(insert OID = 3249 ( "?&" PGNSP PGUID b f f 3802 1009 16 0 0 jsonb_exists_all contsel contjoinsel ));
+DATA(insert OID = 3249 ( "?&" PGNSP PGUID b f f 3802 1009 16 0 0 jsonb_exists_all contsel contjoinsel "---"));
DESCR("exists all");
-DATA(insert OID = 3250 ( "<@" PGNSP PGUID b f f 3802 3802 16 3246 0 jsonb_contained contsel contjoinsel ));
+DATA(insert OID = 3250 ( "<@" PGNSP PGUID b f f 3802 3802 16 3246 0 jsonb_contained contsel contjoinsel "---"));
DESCR("is contained by");
-DATA(insert OID = 3284 ( "||" PGNSP PGUID b f f 3802 3802 3802 0 0 jsonb_concat - - ));
+DATA(insert OID = 3284 ( "||" PGNSP PGUID b f f 3802 3802 3802 0 0 jsonb_concat - - "---"));
DESCR("concatenate");
-DATA(insert OID = 3285 ( "-" PGNSP PGUID b f f 3802 25 3802 0 0 3302 - - ));
+DATA(insert OID = 3285 ( "-" PGNSP PGUID b f f 3802 25 3802 0 0 3302 - - "---"));
DESCR("delete object field");
-DATA(insert OID = 3286 ( "-" PGNSP PGUID b f f 3802 23 3802 0 0 3303 - - ));
+DATA(insert OID = 3286 ( "-" PGNSP PGUID b f f 3802 23 3802 0 0 3303 - - "---"));
DESCR("delete array element");
-DATA(insert OID = 3287 ( "#-" PGNSP PGUID b f f 3802 1009 3802 0 0 jsonb_delete_path - - ));
+DATA(insert OID = 3287 ( "#-" PGNSP PGUID b f f 3802 1009 3802 0 0 jsonb_delete_path - - "---"));
DESCR("delete path");
/*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 9254f85..9865a9c 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -250,6 +250,7 @@ typedef enum NodeTag
T_MinMaxAggInfo,
T_PlannerParamItem,
T_MVStatisticInfo,
+ T_RestrictStatData,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 1979cdf..b78ee5d 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,12 +15,12 @@
#define RELATION_H
#include "access/sdir.h"
+#include "access/htup.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
#include "storage/block.h"
-
/*
* Relids
* Set of relation identifiers (indexes into the rangetable).
@@ -1341,6 +1341,26 @@ typedef struct RestrictInfo
Selectivity right_bucketsize; /* avg bucketsize of right side */
} RestrictInfo;
+typedef struct bm_mvstat
+{
+ Bitmapset *attrs;
+ MVStatisticInfo *stats;
+ int mvkind;
+} bm_mvstat;
+
+typedef struct RestrictStatData
+{
+ NodeTag type;
+ BoolExprType boolop;
+ Node *clause;
+ Node *mvclause;
+ Node *nonmvclause;
+ List *children;
+ List *mvstats;
+ Bitmapset *mvattrs;
+ List *unusedrinfos;
+} RestrictStatData;
+
/*
* Since mergejoinscansel() is a relatively expensive function, and would
* otherwise be invoked many times while planning a large join tree,
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 6bfd338..24003ae 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -183,13 +183,11 @@ extern Selectivity clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- List *conditions);
+ SpecialJoinInfo *sjinfo);
extern Selectivity clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- List *conditions);
+ SpecialJoinInfo *sjinfo);
#endif /* COST_H */
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index a40c9b1..bb9d68b 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -84,6 +84,7 @@ extern Oid get_commutator(Oid opno);
extern Oid get_negator(Oid opno);
extern RegProcedure get_oprrest(Oid opno);
extern RegProcedure get_oprjoin(Oid opno);
+extern int get_oprmvstat(Oid opno);
extern char *get_func_name(Oid funcid);
extern Oid get_func_namespace(Oid funcid);
extern Oid get_func_rettype(Oid funcid);
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index f2fbc11..a08fd58 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -34,6 +34,9 @@ extern int mvstat_search_type;
#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
+#define MVSTATISTIC_MCV 1
+#define MVSTATISTIC_HIST 2
+#define MVSTATISTIC_FDEP 4
/*
* Functional dependencies, tracking column-level relationships (values
--
1.8.3.1
Hi,
On 07/16/2015 01:51 PM, Kyotaro HORIGUCHI wrote:
Hi, I'd like to show you the modified structure of the
multivariate statistics application logic. Please find the
attached. They apply on your v7 patch.
Sadly I do have some trouble getting it to apply correctly :-(
So for now all my comments are based on just reading the code.
FWIW I've rebased my patch to the current master, it's available on
github as usual:
https://github.com/tvondra/postgres/commits/mvstats
The code to find mv-applicable clauses is moved out of the main
flow of clauselist_selectivity. As I said in the previous mail,
the new function transformRestrictInfoForEstimate (too bad a name,
but just for PoC :) scans the clauselist and generates a
RestrictStatData struct which drives mv-aware selectivity
calculation. This struct isolates MV and non-MV estimation.
The struct RestrictStatData mainly consists of the following
three parts:
- clause to be estimated by the current logic (MV is not applicable)
- clause to be estimated by MV-statistics
- list of child RestrictStatDatas, which are to be run recursively
mvclause_selectivity() is the topmost function where mv stats
works. This structure effectively prevents the main estimation flow
from being broken by modifications to the mvstats part. Although I
haven't measured it, I'm positive the code is far smaller than yours.
I attached two patches to this message. The first one rebases the
v7 patch to the current (maybe) master and the second applies
the refactoring.
I'm a little anxious about performance, but I think this makes the
process of applying mv-stats far clearer. Regtests for mvstats
succeeded as-is except for fdep, which is not implemented in this
patch.
What do you think about this?
I'm not sure, at this point. I'm having a hard time understanding how
exactly the code works - there are pretty much no comments explaining
the implementation, so it takes time to understand the code. This is
especially true about transformRestrictInfoForEstimate which is also
quite long. I understand it's a PoC, but comments would really help.
On a conceptual level, I think the idea to split the estimation into two
phases - enrich the expression tree with nodes with details about stats
etc, and then actually do the estimation in the second phase might be
interesting. Not because it's somehow clearer, but because it gives us a
chance to see the expression tree as a whole, with details about all the
stats (with the current code we process/estimate the tree
incrementally). But I don't really know how useful that would be.
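If I understand the struct correctly, the second phase would then be
a fairly simple recursion over the annotated tree. Something like
this, perhaps (just my sketch of the general idea based on the struct
definition, not code from either patch - mvclause_selectivity() here
is a placeholder for whatever the actual entry point is):

/*
 * Sketch only: combine one component selectivity into the running
 * total, assuming independence between the components.
 */
static Selectivity
combine(BoolExprType boolop, Selectivity total, Selectivity s)
{
    if (boolop == OR_EXPR)
        return total + s - total * s;       /* P(A or B) */
    return total * s;                       /* P(A and B) */
}

static Selectivity
estimate_rsd(PlannerInfo *root, RestrictStatData *rsd)
{
    /* identity element: 0.0 for OR, 1.0 for AND */
    Selectivity total = (rsd->boolop == OR_EXPR) ? 0.0 : 1.0;
    ListCell   *lc;

    /* mv-compatible part, estimated as a whole from the mv stats */
    if (rsd->mvclause != NULL)
        total = combine(rsd->boolop, total,
                        mvclause_selectivity(root, rsd->mvclause,
                                             rsd->mvstats));

    /* the remainder, estimated by the current per-column logic */
    if (rsd->nonmvclause != NULL)
        total = combine(rsd->boolop, total,
                        clause_selectivity(root, rsd->nonmvclause,
                                           0, JOIN_INNER, NULL));

    /* nested AND/OR subtrees, estimated recursively */
    foreach(lc, rsd->children)
        total = combine(rsd->boolop, total,
                        estimate_rsd(root,
                                     (RestrictStatData *) lfirst(lc)));

    return total;
}

If that's roughly the shape, then the question is mostly where the
information needed for conditions and for combining multiple stats
would live in such a tree.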
I don't think the proposed change makes the process somehow clearer. I
know it's a PoC at this point, so I don't expect it to be perfect, but
for me the original code is certainly clearer. Of course, I'm biased as
I wrote the current code, and I (naturally) shaped it to match my ideas
during the development process, and I'm much more familiar with it.
Omitting the support for functional dependencies is a bit unfortunate, I
think. Is that merely to make the PoC simpler, or is there something
that makes it impossible to support that kind of stats?
Another thing that I noticed is that you completely removed the code
that combined multiple stats (and selected the best combination of
stats). In other words, you've reverted to the intermediate single
statistics approach, including removing the improved handling of OR
clauses and conditions. It's a bit difficult to judge the proposed
approach not knowing how well it supports those (quite crucial)
features. What if it can't support some of them, or what if it makes the
code much more complicated (thus defeating the goal of making it more
clear)?
I share your concern about the performance impact - one thing is that
this new code might be slower than the original one, but a more serious
issue IMHO is that the performance impact will happen even for relations
with no multivariate stats at all. The original patch was very careful
about getting ~0% overhead in such cases, and if the new code does not
allow that, I don't see this approach as acceptable. We must not put
additional overhead on people not using multivariate stats.
But I think it's worth exploring this idea a bit more - can you rebase
it to the current patch version (as on github) and add the missing
pieces (functional dependencies, multi-statistics estimation and passing
conditions)?
One more thing - I noticed you extended the pg_operator catalog with an
oprmvstat attribute, used to flag operators that are compatible with
multivariate stats. I'm not happy with the current approach (using
oprrest to make this decision), but I'm not really sure this is a good
solution either. The culprit is that it only answers one of the two
important questions - Is it compatible? How to perform the estimation?
So we'd have to rely on oprrest anyway, when actually performing the
estimation of a clause with "compatible" operator. And we'd have to keep
in sync two places (catalog and checks in file), and we'd have to update
the catalog after improving the implementation (adding support for
another operator).
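For the record, this is how I read the new column: each operator gets
a three-character string ("mhf", "mh-", "---"), one position per
statistics type, presumably mapping to the MVSTATISTIC_* flags from
mvstats.h roughly like this (my interpretation of the catalog entries,
not code from your patch):

/*
 * Sketch: decode an oprmvstat string into a bitmask of the
 * MVSTATISTIC_* flags (a guess at the intended meaning of the
 * three positions, based on the catalog entries above).
 */
static int
decode_oprmvstat(const char *oprmvstat)
{
    int         kinds = 0;

    if (oprmvstat[0] == 'm')        /* usable with MCV lists */
        kinds |= MVSTATISTIC_MCV;
    if (oprmvstat[1] == 'h')        /* usable with histograms */
        kinds |= MVSTATISTIC_HIST;
    if (oprmvstat[2] == 'f')        /* usable with functional deps */
        kinds |= MVSTATISTIC_FDEP;

    return kinds;
}

So equality operators get "mhf", the inequalities get "mh-", and
everything else gets "---".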
kind regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello,
At Sat, 25 Jul 2015 23:09:31 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <55B3FB0B.7000201@2ndquadrant.com>
Hi,
On 07/16/2015 01:51 PM, Kyotaro HORIGUCHI wrote:
Hi, I'd like to show you the modified structure of the
multivariate statistics application logic. Please find the
attached. They apply on your v7 patch.
Sadly I do have some trouble getting it to apply correctly :-(
So for now all my comments are based on just reading the code.
Ah. My modification to rebase to master at the time is probably the
culprit. Sorry for the dirty patch.
# I would have recreated the patch if you had complained before
# struggling with the thing..
The core of the modification is in clausesel.c. I attached the
patched clausesel.c.
FWIW I've rebased my patch to the current master, it's available on
github as usual:
Thanks.
The code to find mv-applicable clauses is moved out of the main
flow of clauselist_selectivity. As I said in the previous mail,
the new function transformRestrictInfoForEstimate (too bad a name,
but just for PoC :) scans the clauselist and generates a
RestrictStatData struct which drives mv-aware selectivity
calculation. This struct isolates MV and non-MV estimation.
The struct RestrictStatData mainly consists of the following
three parts:
- clause to be estimated by the current logic (MV is not applicable)
- clause to be estimated by MV-statistics
- list of child RestrictStatDatas, which are to be run recursively
mvclause_selectivity() is the topmost function where mv stats
works. This structure effectively prevents the main estimation flow
from being broken by modifications to the mvstats part. Although I
haven't measured it, I'm positive the code is far smaller than yours.
I attached two patches to this message. The first one rebases the
v7 patch to the current (maybe) master and the second applies
the refactoring.
I'm a little anxious about performance, but I think this makes the
process of applying mv-stats far clearer. Regtests for mvstats
succeeded as-is except for fdep, which is not implemented in this
patch.
What do you think about this?
I'm not sure, at this point. I'm having a hard time understanding how
exactly the code works - there are pretty much no comments explaining
the implementation, so it takes time to understand the code. This is
especially true about transformRestrictInfoForEstimate which is also
quite long. I understand it's a PoC, but comments would really help.
The patch itself is hardly readable because it's not based on
master but on your last patch plus something.
My concern about the code at the time was the following,
- You embedded the logic of multivariate estimation into
clauselist_selectivity. I think estimation using multivariate
statistics is quite different from the ordinary estimation based
on single-column stats, so the two are logically separable and
we should separate them.
- You are taking a top-down approach that walks the tree to
check the applicability of mv-stats at every step down the
clause tree. If a subtree is found to be mv-applicable, it is
split into two parts - mv-compatible and non-compatible. These
steps require expression tree walking, which looks like it uses
too much CPU.
- You seem to be considering cases where users create many
multivariate statistics on overlapping attribute sets. But that
looks like too much to me. MV-stats are more resource-hungry, so
we can assume they will be used sparingly.
My suggestion in the patch is a bottom-up approach to find the
mv-applicable portion(s) of the expression tree, which is the
planner's basic way of working overall. The approach requires no
repeated runs of the tree walker, that is, pull_varnos. It could
fail to find the 'optimal' solution in complex situations, but it
needs far less calculation for almost the same return (I think..).
Even though it doesn't consider functional dependencies, the
reduction in code shows the efficiency. It does nothing tricky.
On a conceptual level, I think the idea to split the estimation into
two phases - enrich the expression tree with nodes with details about
stats etc, and then actually do the estimation in the second phase
might be interesting. Not because it's somehow clearer, but because it
gives us a chance to see the expression tree as a whole, with details
about all the stats (with the current code we process/estimate the
tree incrementally). But I don't really know how useful that would be.
It is difficult to say which approach is better, since that is
affected by what we consider more important than other things.
However, I am concerned that your code substantially reconstructs
the expression (clause) tree in the midst of processing it. I
believe that should be a separate phase, for simplicity. Of course,
the additional resources required should also be considered, but
they are rather reduced in this case.
I don't think the proposed change makes the process somehow clearer. I
know it's a PoC at this point, so I don't expect it to be perfect, but
for me the original code is certainly clearer. Of course, I'm biased
as I wrote the current code, and I (naturally) shaped it to match my
ideas during the development process, and I'm much more familiar with
it.
Mmm, we need someone else's opinion :) What I think on this point
is described just above... OK, I'll try to describe this in other
words.
The embedded approach increases the state and code paths roughly
on a multiplicative basis. The separate approach adds them on an
additive basis. I think this is the most significant reason why I
feel it is 'clearer'.
Of course, the acceptable complexity differs according to the
fundamental complexity, performance, required memory or something
else, but I feel this is too much complexity for the objective.
Omitting the support for functional dependencies is a bit unfortunate,
I think. Is that merely to make the PoC simpler, or is there something
that makes it impossible to support that kind of stats?
I don't think so. I omitted it simply because it would take more
time to implement.
Another thing that I noticed is that you completely removed the code
that combined multiple stats (and selected the best combination of
stats). In other words, you've reverted to the intermediate single
statistics approach, including removing the improved handling of OR
clauses and conditions.
Yeah, good catch :p I noticed just after submitting the patch
that I retain only one statistic at the second level from the
bottom, but that is easily fixed by changing the pruning timing.
The struct can hold multiple statistics anyway.
And I don't omit the OR case. It is handled along with the AND
case. (In the wrong way?)
It's a bit difficult to judge the proposed
approach not knowing how well it supports those (quite crucial)
features. What if it can't support some of them, or what if it makes the
code much more complicated (thus defeating the goal of making it more
clear)?
OR is supported, and fdep is probably supportable, but all of it
occurs within the function with the entangled name
(transform..something). But I should consider your latest code
more before that.
I share your concern about the performance impact - one thing is that
this new code might be slower than the original one, but a more
serious issue IMHO is that the performance impact will happen even for
relations with no multivariate stats at all. The original patch was
very careful about getting ~0% overhead in such cases,
I don't think so. find_stats runs pull_varnos and
transformRestric.. also uses pull_varnos to bail out at the top
level. They should have almost the same overhead in that case.
and if the new
code does not allow that, I don't see this approach as acceptable. We
must not put additional overhead on people not using multivariate
stats.
But I think it's worth exploring this idea a bit more - can you rebase
it to the current patch version (as on github) and add the missing
pieces (functional dependencies, multi-statistics estimation and
passing conditions)?
With pleasure. Please wait for a while.
One more thing - I noticed you extended the pg_operator catalog with an
oprmvstat attribute, used to flag operators that are compatible with
multivariate stats. I'm not happy with the current approach (using
oprrest to make this decision), but I'm not really sure this is a good
solution either. The culprit is that it only answers one of the two
important questions - Is it compatible? How to perform the estimation?
Honestly speaking, I also don't like this. But checking oprrest is
just as unpleasant.
So we'd have to rely on oprrest anyway, when actually performing the
estimation of a clause with "compatible" operator. And we'd have to
keep in sync two places (catalog and checks in file), and we'd have to
update the catalog after improving the implementation (adding support
for another operator).
Mmm. It depends on what the developers think about the definition
of oprrest. More practically, I'm worried about whether it can be
anything other than eqsel for an equality operator. And the same
for comparison operators.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello Horiguchi-san,
On 07/27/2015 09:04 AM, Kyotaro HORIGUCHI wrote:
Hello,
At Sat, 25 Jul 2015 23:09:31 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <55B3FB0B.7000201@2ndquadrant.com>
Hi,
On 07/16/2015 01:51 PM, Kyotaro HORIGUCHI wrote:
Hi, I'd like to show you the modified structure of the
multivariate statistics application logic. Please find the
attached. They apply on your v7 patch.
Sadly I do have some trouble getting it to apply correctly :-(
So for now all my comments are based on just reading the code.
Ah. My modification to rebase to master at the time is probably
the culprit. Sorry for the dirty patch.
# I would have recreated the patch if you had complained before
# struggling with the thing..
The core of the modification is in clausesel.c. I attached the
patched clausesel.c.
I don't see any attachment. Perhaps you forgot to actually attach it?
My concern about the code at the time was the following,
- You embedded the logic of multivariate estimation into
clauselist_selectivity. I think estimation using multivariate
statistics is quite different from the ordinary estimation based
on single-column stats, so the two are logically separable and
we should separate them.
I don't see them as very different, actually quite the opposite. The two
kinds of statistics are complementary and should naturally coexist.
Perhaps the current code is not perfect and a refactoring would make the
code more readable, but I don't think its primary aim should be to
separate regular and multivariate stats.
- You are taking a top-down approach that walks the tree to
check the applicability of mv-stats at every step down the
clause tree. If a subtree is found to be mv-applicable, it is
split into two parts - mv-compatible and non-compatible. These
steps require expression tree walking, which looks like it uses
too much CPU.
I'm taking top-down approach because that's what the regular stats do,
and also because that's what allows implementing the features that I
think are interesting - ability to combine multiple stats in an
efficient way, pass conditions and such. I think those two features are
very useful and allow very interesting things.
The bottom-up would work too, probably - I mean, we could start from
leaves of the expression tree, and build the largest "subtree"
compatible with multivariate stats and then try to estimate it. I don't
see how we could pass conditions though, which works naturally in the
top-down approach.
Or maybe a combination of both - identify the "compatible" subtrees
first, then perform the top-down phase.
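Regarding conditions - just to be explicit about what I mean by
passing them: when one part of a clause list is estimated separately,
the clauses already being estimated elsewhere can be handed down as
conditions, so the stats can estimate a conditional probability
instead of assuming independence. Roughly, for clauses A and B covered
by the same statistics, P(A AND B) = P(A) * P(B | A), and it's the
P(B | A) part that requires knowing about A while estimating B -
that's exactly the information the conditions carry.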
- You seem to be considering cases where users create many
multivariate statistics on overlapping attribute sets. But that
looks like too much to me. MV-stats are more resource-hungry, so
we can assume they will be used sparingly.
Not really. I don't expect huge numbers of multivariate stats to be
built on the tables.
But I think restricting users to a single multivariate statistic
per table would be a significant limitation. And once you allow
using multiple multivariate statistics for a set of clauses,
supporting overlapping stats is not that difficult.
What it however makes possible is combining multiple "small" stats into
a larger one in a very efficient way - assuming the overlap is
sufficient, of course. But if that's true, you may build multiple small
(and very accurate) stats instead of one huge (and likely inaccurate)
statistic.
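For example, with the syntax from the patch you might build two
overlapping statistics (just an illustration):

ALTER TABLE t ADD STATISTICS ON (a, b, c);
ALTER TABLE t ADD STATISTICS ON (b, c, d);

and let the planner combine them through the shared columns (b, c)
when a query references all four columns.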
This also makes it possible to handle complex combinations of clauses
that are compatible and incompatible with multivariate statistics, by
passing the conditions.
My suggestion in the patch is a bottom-up approach to find the
mv-applicable portion(s) of the expression tree, which is the
planner's basic way of working overall. The approach requires no
repeated runs of the tree walker, that is, pull_varnos. It could
fail to find the 'optimal' solution in complex situations, but it
needs far less calculation for almost the same return (I think..).
Even though it doesn't consider functional dependencies, the
reduction in code shows the efficiency. It does nothing tricky.
OK
On a conceptual level, I think the idea to split the estimation into
two phases - enrich the expression tree with nodes with details about
stats etc, and then actually do the estimation in the second phase
might be interesting. Not because it's somehow clearer, but because it
gives us a chance to see the expression tree as a whole, with details
about all the stats (with the current code we process/estimate the
tree incrementally). But I don't really know how useful that would be.
It is difficult to say which approach is better, since that is
affected by what we consider more important than other things.
However, I am concerned that your code substantially reconstructs
the expression (clause) tree in the midst of processing it. I
believe that should be a separate phase, for simplicity. Of course,
the additional resources required should also be considered, but
they are rather reduced in this case.
What do you mean by "reconstruct the expression tree"? It's true I'm
walking the expression tree top-down, but how is that reconstructing?
I don't think the proposed change makes the process somehow clearer. I
know it's a PoC at this point, so I don't expect it to be perfect, but
for me the original code is certainly clearer. Of course, I'm biased
as I wrote the current code, and I (naturally) shaped it to match my
ideas during the development process, and I'm much more familiar with
it.
Mmm, we need someone else's opinion :) What I think on this point
is described just above... OK, I'll try to describe this in other
words.
I find your comments very valuable. I may not agree with some of them,
but I certainly appreciate your point of view. So thank you very much
for the time you spent reviewing this patch so far!
The embedded approach increases the state and code paths roughly
on a multiplicative basis. The separate approach adds them on an
additive basis. I think this is the most significant reason why I
feel it is 'clearer'.
Of course, the acceptable complexity differs according to the
fundamental complexity, performance, required memory or something
else, but I feel this is too much complexity for the objective.
Yes, I think we might have slightly different objectives in mind.
Regarding the complexity - I am not too worried about spending more CPU
cycles on this, as long as it does not impact the case where people have
no multivariate statistics at all. That's because I expect people to use
this for large DSS/DWH data sets with lots of dependencies in the (often
denormalized) tables and complex conditions - in those cases the
planning difference is negligible, especially if the improved estimates
make the query run in seconds instead of hours.
This is why I was so careful to entirely skip the expensive processing
when there were no multivariate stats, and why I don't like the fact
that your approach makes this skip more difficult (or maybe impossible,
I'm not sure).
It's also true that most OLTP queries (especially the short ones, thus
most impacted by the increase of planning time) use rather short/simple
clause lists, so even the top-down approach should be very cheap.
Omitting the support for functional dependencies is a bit unfortunate,
I think. Is that merely to make the PoC simpler, or is there something
that makes it impossible to support that kind of stats?I don't think so. I ommited it simply because it would more time
to implement.
OK, thanks for confirming this.
Another thing that I noticed is that you completely removed the code
that combined multiple stats (and selected the best combination of
stats). In other words, you've reverted to the intermediate single
statistics approach, including removing the improved handling of OR
clauses and conditions.
Yeah, good catch :p I noticed just after submitting the patch
that I retain only one statistic at the second level from the
bottom, but that is easily fixed by changing the pruning timing.
The struct can hold multiple statistics anyway.
Great!
And I don't omit the OR case. It is handled along with the AND
case. (In the wrong way?)
Oh, I see. I got a bit confused because you've removed the optimization
step (and conditions), and that needs to be handled a bit differently
for the OR clauses.
It's a bit difficult to judge the proposed
approach not knowing how well it supports those (quite crucial)
features. What if it can't support some of them, or what if it makes the
code much more complicated (thus defeating the goal of making it more
clear)?
OR is supported, and fdep is probably supportable, but all of it
occurs within the function with the entangled name
(transform..something). But I should consider your latest code
more before that.
Good. Likewise, I'd like to see more of your approach ;-)
I share your concern about the performance impact - one thing is that
this new code might be slower than the original one, but a more
serious issue IMHO is that the performance impact will happen even for
relations with no multivariate stats at all. The original patch was
very careful about getting ~0% overhead in such cases,
I don't think so. find_stats runs pull_varnos and
transformRestric.. also uses pull_varnos to bail out at the top
level. They should have almost the same overhead in that case.
Understood. As I explained above, I'm not all that concerned about the
performance impact, as long as we make sure it only applies to people
using the multivariate stats.
I also think a combined approach - first a bottom-up step (identifying
the largest compatible subtrees & caching the varnos), then a top-down
step (doing the same optimization as implemented today) might minimize
the performance impact.
and if the new
code does not allow that, I don't see this approach as acceptable. We
must not put additional overhead on people not using multivariate
stats.
But I think it's worth exploring this idea a bit more - can you rebase
it to the current patch version (as on github) and add the missing
pieces (functional dependencies, multi-statistics estimation and
passing conditions)?
With pleasure. Please wait for a while.
Sure. Take your time.
One more thing - I noticed you extended the pg_operator catalog with an
oprmvstat attribute, used to flag operators that are compatible with
multivariate stats. I'm not happy with the current approach (using
oprrest to make this decision), but I'm not really sure this is a good
solution either. The culprit is that it only answers one of the two
important questions - Is it compatible? How to perform the estimation?
Honestly speaking, I also don't like this. But checking oprrest is
just as unpleasant.
The patch is already quite massive, so let's use the same approach as
current stats, and leave this problem for another patch. If we come up
with a great idea, we can work on it, but I see this as a loosely
related annoyance rather than something this patch aims to address.
So we'd have to rely on oprrest anyway, when actually performing the
estimation of a clause with "compatible" operator. And we'd have to
keep in sync two places (catalog and checks in file), and we'd have to
update the catalog after improving the implementation (adding support
for another operator).
Mmm. It depends on what the developers think about the definition
of oprrest. More practically, I'm worried about whether it can be
anything other than eqsel for an equality operator. And the same
for comparison operators.
OTOH if you define a new operator with oprrest=F_EQSEL, you're
effectively saying "It's OK to estimate this using regular eq/lt/gt
operators". If your operator is somehow incompatible with that, you
should not set oprrest=F_EQSEL.
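For example (a made-up operator, just to illustrate the point):

CREATE OPERATOR === (
    LEFTARG = int4,
    RIGHTARG = int4,
    PROCEDURE = int4eq,
    COMMUTATOR = ===,
    RESTRICT = eqsel,
    JOIN = eqjoinsel
);

By picking RESTRICT = eqsel you're already declaring that the operator
behaves like equality for estimation purposes, so treating it as
equality for multivariate stats seems consistent with that.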
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 05/25/2015 11:43 PM, Tomas Vondra wrote:
There are 6 files attached, but only 0002-0006 are actually part of the
multivariate statistics patch itself.
All of these patches are huge. In order to review this in a reasonable
amount of time, we need to do this in several steps. So let's see what
would be the minimal set of these patches that could be reviewed and
committed, while still being useful.
The main patches are:
1. shared infrastructure and functional dependencies
2. clause reduction using functional dependencies
3. multivariate MCV lists
4. multivariate histograms
5. multi-statistics estimation
Would it make sense to commit only patches 1 and 2 first? Would that be
enough to get a benefit from this?
I have some doubts about the clause reduction and functional
dependencies part of this. It seems to treat functional dependency as a
boolean property, but even with the classic zipcode and city case, it's
not always an all or nothing thing. At least in some countries, there
can be zipcodes that span multiple cities. So zipcode=X does not
completely imply city=Y, although there is a strong correlation (if
that's the right term). How strong does the correlation need to be for
this patch to decide that zipcode implies city? I couldn't actually see
a clear threshold stated anywhere.
So rather than treating functional dependence as a boolean, I think it
would make more sense to attach a 0.0-1.0 number to it. That means that
you can't do clause reduction like it's done in this patch, where you
actually remove clauses from the query for cost estimation purposes.
Instead, you need to calculate the selectivity for each clause
independently, but instead of just multiplying the selectivities
together, apply the "dependence factor" to it.
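For example, with a dependence factor f for "zipcode implies city",
the combined selectivity could be computed something like this (just a
sketch of the idea, not a patch):

/*
 * f = 1.0 means zipcode fully implies city, f = 0.0 means the
 * columns are independent. With weight f the city clause adds no
 * new restriction (it is implied); with weight (1 - f) we fall
 * back to the independence assumption.
 */
Selectivity
selectivity_with_dependence(Selectivity s_zipcode,
                            Selectivity s_city,
                            double f)
{
    return s_zipcode * (f + (1.0 - f) * s_city);
}

That way a dependency that holds for, say, 99% of the rows still
behaves almost like a full implication, without removing any clause.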
Does that make sense? I haven't really looked at the MCV, histogram and
"multi-statistics estimation" patches yet. Do those patches make the
clause reduction patch obsolete? Should we forget about the clause
reduction and functional dependency patch, and focus on those later
patches instead?
- Heikki
Hello, I certainly attached the file this time.
At Mon, 27 Jul 2015 23:54:08 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <55B6A880.3050801@2ndquadrant.com>
The core of the modification is in clausesel.c. I attached the
patched clausesel.c.
I don't see any attachment. Perhaps you forgot to actually attach it?
Very sorry to have forgotten to attach it. I attached a new
patch applicable on the head of the mvstats branch of your
repository.
My concern about the code at the time was the following,
- You embedded the logic of multivariate estimation into
clauselist_selectivity. I think estimation using multivariate
statistics is quite different from the ordinary estimation based
on single-column stats, so the two are logically separable and
we should separate them.
I don't see them as very different, actually quite the opposite. The
two kinds of statistics are complementary and should naturally
coexist. Perhaps the current code is not perfect and a refactoring
would make the code more readable, but I don't think its primary aim
should be to separate regular and multivariate stats.
- You are taking a top-down approach that walks the tree to
check the applicability of mv-stats at every step down the
clause tree. If a subtree is found to be mv-applicable, it is
split into two parts - mv-compatible and non-compatible. These
steps require expression tree walking, which looks like it uses
too much CPU.
I'm taking the top-down approach because that's what the regular stats
do, and also because that's what allows implementing the features that
I think are interesting - the ability to combine multiple stats in an
efficient way, pass conditions and such. I think those two features
are very useful and allow very interesting things.
The bottom-up would work too, probably - I mean, we could start from
the leaves of the expression tree, build the largest "subtree"
compatible with multivariate stats and then try to estimate it. I
don't see how we could pass conditions though, which works naturally
in the top-down approach.
By the way, 'conditions' looks to mean what is received by the
parameter of clause(list)_selectivity with the same name. But it
is always NIL. Looking at the comment for collect_mv_attnum, it is
prepared for 'multitable statistics'. If so, I think it's better
removed from the current patch, because it is useless now.
Or maybe a combination of both - identify the "compatible" subtrees
first, then perform the top-down phase.- You look to be considering the cases when users create many
multivariate statistics on attribute sets having
duplications. But it looks too-much for me. MV-stats are more
resource-eating so we can assume the minimum usage of that.Not really. I don't expect huge numbers of multivariate stats to be
built on the tables.But I think restricting the users to use a single multivariate
statistics per table would be a significant limitation. And once you
allow using multiple multivariate statistics for a set of clauses,
supporting overlapping stats is not that difficult.
What it does make possible, however, is combining multiple "small"
stats into a larger one in a very efficient way - assuming the
overlap is sufficient, of course. But if that's true, you may build
multiple small (and very accurate) stats instead of one huge (or
very inaccurate) statistics.
This also makes it possible to handle complex combinations of clauses
that are compatible and incompatible with multivariate statistics, by
passing the conditions.
My suggestion in the patch is a bottom-up approach to find
mv-applicable portion(s) of the expression tree, which is how the
planner works in general. The approach requires no repetitive runs
of a tree walker, that is, pull_varnos. It could fail to find the
'optimal' solution in complex situations, but it needs far less
calculation for almost the same return (I think...).
Even though it doesn't consider functional dependencies yet, the
reduction in code size shows the efficiency. It does nothing tricky.
OK
The functional dependency code looks immature in both the
detection phase and the application phase, in comparison to MCV and
histogram. In addition, as the comment in dependencies.c says, fdeps
are less significant (than MCV/HIST) because they are usually
carefully avoided, and should be noticed and considered when
designing the application or the whole system.
Insisting on applying them all at once doesn't seem like a good
strategy to adopt this early.
Or perhaps it might be better to register the dependency itself,
rather than registering incomplete information (only the set of
columns involved in the relationship) and trying to detect the
relationship from the given values. I suppose those who can register
the column set know the precise nature of the dependency in advance.
On a conceptual level, I think the idea to split the estimation into
two phases - enrich the expression tree with nodes with details about
stats etc, and then actually do the estimation in the second phase
might be interesting. Not because it's somehow clearer, but because it
gives us a chance to see the expression tree as a whole, with details
about all the stats (with the current code we process/estimate the
tree incrementally). But I don't really know how useful that would be.
It is difficult to say which approach is better, since that is
affected by what we consider more important. However, I am concerned
that your code substantially reconstructs the expression (clause)
tree in the midst of processing it. I believe that should be a
separate phase, for simplicity. Of course, the additional resources
required should also be considered, but they are rather modest in
this case.
What do you mean by "reconstruct the expression tree"? It's true I'm
walking the expression tree top-down, but how is that reconstructing?
For example, clauselist_mv_split does. It separates mvclauses from
the original clauselist, applies mv-stats at once, and (perhaps) lets
the rest be processed via the 'normal' route. I called this
"reconstruction", which I tried to do explicitly and separately.
I don't think the proposed change makes the process somehow clearer. I
know it's a PoC at this point, so I don't expect it to be perfect, but
for me the original code is certainly clearer. Of course, I'm biased
as I wrote the current code, and I (naturally) shaped it to match my
ideas during the development process, and I'm much more familiar with
it.
Mmm, we need someone else's opinion :) What I think on this point
is described just above... OK, I'll try to describe it in other
words.
I find your comments very valuable. I may not agree with some of them,
but I certainly appreciate your point of view. So thank you very much
for the time you spent reviewing this patch so far!
Yeah, thank you for your patience and kindness.
The embedded approach increases the state and code paths roughly
multiplicatively, while the separate approach adds to them
additively. I think this is the most significant reason why I feel
it is 'clear'.
Of course, the acceptable complexity differs according to the
fundamental complexity, performance, required memory or something
else, but I feel it is too much complexity for the objective.
Yes, I think we might have slightly different objectives in mind.
Sure! Now I understand what the point is.
Regarding the complexity - I am not too worried about spending more
CPU cycles on this, as long as it does not impact the case where
people have no multivariate statistics at all. That's because I expect
people to use this for large DSS/DWH data sets with lots of
dependencies in the (often denormalized) tables and complex conditions
- in those cases the planning difference is negligible, especially if
the improved estimates make the query run in seconds instead of hours.
I share that vision. If that is the case, the mv-stats route
should not intrude on the existing non-mv-stats route. I feel you
have intruded into clauselist_selectivity all the more.
If that is the case, my mv-distinct code has a different objective
from yours. It aims to prevent the misestimates from multicolumn
correlations that occur more commonly in OLTP usage.
This is why I was so careful to entirely skip the expensive processing
when there were no multivariate stats, and why I don't like the fact
that your approach makes this skip more difficult (or maybe
impossible, I'm not sure).
My code skips it entirely if transformRestrictionForEstimate returns
NULL, and runs clauselist_selectivity as usual. I think that is
almost the same as yours.
However, if that is the concern, I believe we should not only skip
the calculation but also hide the additional code blocks, which
overwhelm the normal route. That is one of the major objectives of
my approach.
It's also true that most OLTP queries (especially the short ones, thus
most impacted by the increase of planning time) use rather
short/simple clause lists, so even the top-down approach should be
very cheap.
Omitting the support for functional dependencies is a bit unfortunate,
I think. Is that merely to make the PoC simpler, or is there something
that makes it impossible to support that kind of stats?
I don't think so. I omitted it simply because it would take more time
to implement.
OK, thanks for confirming this.
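(For context, the reduction that functional dependencies enable -
this is the example from the comment block the patch removes from
clausesel.c: if the value of 'a' determines the value of 'b', then
for

   WHERE (a = 1) AND (b = 2)

the clause on 'b' adds no information, so

   P[(a = 1) & (b = 2)] = P[(a = 1)]

assuming 2 is the 'b' value implied by a = 1; otherwise the result
is empty.)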
Another thing that I noticed is that you completely removed the code
that combined multiple stats (and selected the best combination of
stats). In other words, you've reverted to the intermediate single
statistics approach, including removing the improved handling of OR
clauses and conditions.
Yeah, good catch :p I noticed just after submitting the patch
that I retain only one statistics at the second level from the
bottom, but it is easily fixed by changing the pruning timing. The
struct can hold multiple statistics anyway.
Great!
But sorry - I found that considering multiple stats at every level
cannot be done without exhaustively searching combinations among
child clauses, and it needs an additional data structure. It needs
more thought... As mentioned later, top-down might be more suitable
for this optimization.
And I don't omit the OR case. It is handled along with the AND
case (in the wrong way?).
Oh, I see. I got a bit confused because you've removed the
optimization step (and conditions), and that needs to be handled a bit
differently for the OR clauses.
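To illustrate the AND/OR handling being discussed, here is a tiny
standalone sketch mirroring merge_selectivity() from the attached
patch (the typedefs and the main() driver are mine, inlined only so
it compiles outside the backend):

#include <stdio.h>

typedef double Selectivity;
typedef enum { AND_EXPR, OR_EXPR } BoolExprType;

/*
 * AND-ed estimates combine as a product (independence assumption),
 * OR-ed estimates as s1 + s2 - s1*s2 (inclusion-exclusion, again
 * assuming independence).
 */
static Selectivity
merge_selectivity(Selectivity s1, Selectivity s2, BoolExprType op)
{
	if (op == AND_EXPR)
		return s1 * s2;
	return s1 + s2 - s1 * s2;
}

int
main(void)
{
	/* two clauses, each matching 1% of the rows */
	printf("AND: %g\n", merge_selectivity(0.01, 0.01, AND_EXPR)); /* 0.0001 */
	printf("OR:  %g\n", merge_selectivity(0.01, 0.01, OR_EXPR));  /* 0.0199 */
	return 0;
}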
Sorry to have forced you to read an inapplicable patch :p
It's a bit difficult to judge the proposed approach without knowing
how well it supports those (quite crucial) features. What if it
can't support some of them, or what if it makes the code much more
complicated (thus defeating the goal of making it clearer)?
OR is supported, and fdeps are probably supportable, but all of this
happens within the function with the tangled name
(transform..something). But I should give more consideration to your
latest code before that.
Good. Likewise, I'd like to see more of your approach ;-)
I share your concern about the performance impact - one thing is that
this new code might be slower than the original one, but a more
serious issue IMHO is that the performance impact will happen even for
relations with no multivariate stats at all. The original patch was
very careful about getting ~0% overhead in such cases,
I don't think so. find_stats runs pull_varnos, and
transformRestric.. also uses pull_varnos to bail out at the top
level. They should have almost the same overhead in that case.
Understood. As I explained above, I'm not all that concerned about the
performance impact, as long as we make sure it only applies to people
using the multivariate stats.
I also think a combined approach - first a bottom-up step (identifying
the largest compatible subtrees & caching the varnos), then a top-down
step (doing the same optimization as implemented today) might minimize
the performance impact.
I am almost reaching the same conclusion.
and if the new
code does not allow that, I don't see this approach as acceptable. We
must not put additional overhead on people not using multivariate
stats.
But I think it's worth exploring this idea a bit more - can you rebase
it onto the current patch version (as on github) and add the missing
pieces (functional dependencies, multi-statistics estimation and
passing conditions)?
With pleasure. Please wait for a while.
Sure. Take your time.
One more thing - I noticed you extended the pg_operator catalog with an
oprmvstat attribute, used to flag operators that are compatible with
multivariate stats. I'm not happy with the current approach (using
oprrest to make this decision), but I'm not really sure this is a good
solution either. The culprit is that it only answers one of the two
important questions - Is it compatible? How do we perform the estimation?
Honestly speaking, I also don't like this. But checking oprrest is
just about as unpleasant.
The patch is already quite massive, so let's use the same approach as
current stats, and leave this problem for another patch. If we come up
with a great idea, we can work on it, but I see this as a loosely
related annoyance rather than something this patch aims to address.
Agreed.
So we'd have to rely on oprrest anyway when actually performing the
estimation of a clause with a "compatible" operator. And we'd have to
keep two places in sync (the catalog and the checks in the file), and
we'd have to update the catalog after improving the implementation
(adding support for another operator).
Mmm. It depends on what the developers think about the definition
of oprrest. More practically, I'm worried about whether it can ever
be anything other than eqsel for an equality operator. And the same
for comparison operators.
OTOH if you define a new operator with oprrest=F_EQSEL, you're
effectively saying "It's OK to estimate this using regular eq/lt/gt
operators". If your operator is somehow incompatible with that, you
should not set oprrest=F_EQSEL.
In contrast, some function other than F_EQSEL might be compatible
with mv-statistics.
For all that, it's not my main concern. Although I think they really
are effectively the same, I'm uneasy about using a field apparently
not intended (or suited) to distinguish this kind of operator
property.
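To make the oprrest-based check concrete, a minimal sketch of a
hypothetical helper (the name is mine, not from the patch), deciding
mv-compatibility from the operator's restriction estimator the way
the current code does:

#include "postgres.h"
#include "utils/fmgroids.h"
#include "utils/lsyscache.h"

/*
 * Treat an operator as mv-compatible if its restriction estimator
 * is one of the estimators used by the ordinary equality and
 * comparison operators.
 */
static bool
mv_compatible_operator(Oid opno)
{
	switch (get_oprrest(opno))
	{
		case F_EQSEL:
		case F_SCALARLTSEL:
		case F_SCALARGTSEL:
			return true;
		default:
			return false;
	}
}

The oprmvstat approach in the attached patch would replace this kind
of check with an explicit catalog flag, at the cost of keeping the
two places in sync.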
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0001-Modify-the-estimate-path-to-be-bottom-up-processing.patch (text/x-patch; charset=us-ascii)
From 69da94afdd35ed3469dfe9793db38d895adf2b1e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 30 Jul 2015 18:16:30 +0900
Subject: [PATCH] Modify the estimate path to be bottom-up processing.
---
src/backend/catalog/pg_operator.c | 6 +
src/backend/optimizer/path/clausesel.c | 4134 +++++++-------------------------
src/backend/optimizer/path/costsize.c | 23 +-
src/backend/optimizer/util/orclauses.c | 4 +-
src/backend/utils/adt/selfuncs.c | 17 +-
src/backend/utils/cache/lsyscache.c | 40 +
src/include/catalog/pg_operator.h | 1550 ++++++------
src/include/nodes/nodes.h | 1 +
src/include/nodes/relation.h | 22 +-
src/include/optimizer/cost.h | 6 +-
src/include/utils/lsyscache.h | 1 +
src/include/utils/mvstats.h | 3 +
12 files changed, 1693 insertions(+), 4114 deletions(-)
diff --git a/src/backend/catalog/pg_operator.c b/src/backend/catalog/pg_operator.c
index 072f530..dea39d3 100644
--- a/src/backend/catalog/pg_operator.c
+++ b/src/backend/catalog/pg_operator.c
@@ -251,6 +251,9 @@ OperatorShellMake(const char *operatorName,
values[Anum_pg_operator_oprrest - 1] = ObjectIdGetDatum(InvalidOid);
values[Anum_pg_operator_oprjoin - 1] = ObjectIdGetDatum(InvalidOid);
+ /* XXXX: How should this be implemented? */
+ values[Anum_pg_operator_oprmvstat - 1] = CStringGetTextDatum("---");
+
/*
* open pg_operator
*/
@@ -508,6 +511,9 @@ OperatorCreate(const char *operatorName,
values[Anum_pg_operator_oprrest - 1] = ObjectIdGetDatum(restrictionId);
values[Anum_pg_operator_oprjoin - 1] = ObjectIdGetDatum(joinId);
+ /* XXXX: How should this be implemented? */
+ values[Anum_pg_operator_oprmvstat - 1] = CStringGetTextDatum("---");
+
pg_operator_desc = heap_open(OperatorRelationId, RowExclusiveLock);
/*
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 9bb5b3f..b8bb9f3 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -46,13 +46,6 @@ typedef struct RangeQueryClause
Selectivity hibound; /* Selectivity of a var < something clause */
} RangeQueryClause;
-static Selectivity clauselist_selectivity_or(PlannerInfo *root,
- List *clauses,
- int varRelid,
- JoinType jointype,
- SpecialJoinInfo *sjinfo,
- List *conditions);
-
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
@@ -60,38 +53,6 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_MCV 0x02
#define MV_CLAUSE_TYPE_HIST 0x04
-static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
- Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
- int type);
-
-static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
- Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo,
- int type);
-
-static Bitmapset *clause_mv_get_attnums(PlannerInfo *root, Node *clause);
-
-static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
- Oid varRelid, List *stats,
- SpecialJoinInfo *sjinfo);
-
-static List *clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
- List *clauses, Oid varRelid,
- List **mvclauses, MVStatisticInfo *mvstats, int types);
-
-static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
- MVStatisticInfo *mvstats, List *clauses,
- List *conditions, bool is_or);
-
-static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
- MVStatisticInfo *mvstats,
- List *clauses, List *conditions,
- bool is_or, bool *fullmatch,
- Selectivity *lowsel);
-static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
- MVStatisticInfo *mvstats,
- List *clauses, List *conditions,
- bool is_or);
-
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
int nmatches, char * matches,
@@ -104,79 +65,11 @@ static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
int nmatches, char * matches,
bool is_or);
-/*
- * Describes a combination of multiple statistics to cover attributes
- * referenced by the clauses. The array 'stats' (with nstats elements)
- * lists attributes (in the order as they are applied), and number of
- * clause attributes covered by this solution.
- *
- * choose_mv_statistics_exhaustive() uses this to track both the current
- * and the best solutions, while walking through the state of possible
- * combination.
- */
-typedef struct mv_solution_t {
- int nclauses; /* number of clauses covered */
- int nconditions; /* number of conditions covered */
- int nstats; /* number of stats applied */
- int *stats; /* stats (in the apply order) */
-} mv_solution_t;
-
-static List *choose_mv_statistics(PlannerInfo *root,
- List *mvstats,
- List *clauses, List *conditions,
- Oid varRelid,
- SpecialJoinInfo *sjinfo);
-
-static List *filter_clauses(PlannerInfo *root, Oid varRelid,
- SpecialJoinInfo *sjinfo, int type,
- List *stats, List *clauses,
- Bitmapset **attnums);
-
-static List *filter_stats(List *stats, Bitmapset *new_attnums,
- Bitmapset *all_attnums);
-
-static Bitmapset **make_stats_attnums(MVStatisticInfo *mvstats,
- int nmvstats);
-
-static MVStatisticInfo *make_stats_array(List *stats, int *nmvstats);
-
-static List* filter_redundant_stats(List *stats,
- List *clauses, List *conditions);
-
-static Node** make_clauses_array(List *clauses, int *nclauses);
-
-static Bitmapset ** make_clauses_attnums(PlannerInfo *root, Oid varRelid,
- SpecialJoinInfo *sjinfo, int type,
- Node **clauses, int nclauses);
-
-static bool* make_cover_map(Bitmapset **stats_attnums, int nmvstats,
- Bitmapset **clauses_attnums, int nclauses);
-
-static bool has_stats(List *stats, int type);
-
-static List * find_stats(PlannerInfo *root, List *clauses,
- Oid varRelid, Index *relid);
-
-static Bitmapset* fdeps_collect_attnums(List *stats);
-
-static int *make_idx_to_attnum_mapping(Bitmapset *attnums);
-static int *make_attnum_to_idx_mapping(Bitmapset *attnums);
-
-static bool *build_adjacency_matrix(List *stats, Bitmapset *attnums,
- int *idx_to_attnum, int *attnum_to_idx);
-
-static void multiply_adjacency_matrix(bool *matrix, int natts);
-
static List* fdeps_reduce_clauses(List *clauses,
Bitmapset *attnums, bool *matrix,
int *idx_to_attnum, int *attnum_to_idx,
Index relid);
-static Bitmapset *fdeps_filter_clauses(PlannerInfo *root,
- List *clauses, Bitmapset *deps_attnums,
- List **reduced_clauses, List **deps_clauses,
- Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo);
-
static Bitmapset * get_varattnos(Node * node, Index relid);
int mvstat_search_type = MVSTAT_SEARCH_GREEDY;
@@ -188,397 +81,41 @@ int mvstat_search_type = MVSTAT_SEARCH_GREEDY;
#define UPDATE_RESULT(m,r,isor) \
(m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
+typedef enum mv_selec_status
+{
+ NORMAL,
+ FULL_MATCH,
+ FAILURE
+} mv_selec_status;
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
+/***************/
+RestrictStatData *
+transformRestrictInfoForEstimate(PlannerInfo *root, List *clauses, int varRelid, SpecialJoinInfo *sjinfo);
/*
- * clauselist_selectivity -
- * Compute the selectivity of an implicitly-ANDed list of boolean
- * expression clauses. The list can be empty, in which case 1.0
- * must be returned. List elements may be either RestrictInfos
- * or bare expression clauses --- the former is preferred since
- * it allows caching of results.
- *
- * See clause_selectivity() for the meaning of the additional parameters.
- *
- * Our basic approach is to take the product of the selectivities of the
- * subclauses. However, that's only right if the subclauses have independent
- * probabilities, and in reality they are often NOT independent. So,
- * we want to be smarter where we can.
- *
- * Currently, the only extra smarts we have is to recognize "range queries",
- * such as "x > 34 AND x < 42". Clauses are recognized as possible range
- * query components if they are restriction opclauses whose operators have
- * scalarltsel() or scalargtsel() as their restriction selectivity estimator.
- * We pair up clauses of this form that refer to the same variable. An
- * unpairable clause of this kind is simply multiplied into the selectivity
- * product in the normal way. But when we find a pair, we know that the
- * selectivities represent the relative positions of the low and high bounds
- * within the column's range, so instead of figuring the selectivity as
- * hisel * losel, we can figure it as hisel + losel - 1. (To visualize this,
- * see that hisel is the fraction of the range below the high bound, while
- * losel is the fraction above the low bound; so hisel can be interpreted
- * directly as a 0..1 value but we need to convert losel to 1-losel before
- * interpreting it as a value. Then the available range is 1-losel to hisel.
- * However, this calculation double-excludes nulls, so really we need
- * hisel + losel + null_frac - 1.)
- *
- * If either selectivity is exactly DEFAULT_INEQ_SEL, we forget this equation
- * and instead use DEFAULT_RANGE_INEQ_SEL. The same applies if the equation
- * yields an impossible (negative) result.
- *
- * A free side-effect is that we can recognize redundant inequalities such
- * as "x < 4 AND x < 5"; only the tighter constraint will be counted.
- *
- * Of course this is all very dependent on the behavior of
- * scalarltsel/scalargtsel; perhaps some day we can generalize the approach.
- *
- *
- * Multivariate statististics
- * --------------------------
- * This also uses multivariate stats to estimate combinations of
- * conditions, in a way (a) maximizing the estimate accuracy by using
- * as many stats as possible, and (b) minimizing the overhead,
- * especially when there are no suitable multivariate stats (so if you
- * are not using multivariate stats, there's no additional overhead).
- *
- * The following checks are performed (in this order), and the optimizer
- * falls back to regular stats on the first 'false'.
- *
- * NOTE: This explains how this works with all the patches applied, not
- * just the functional dependencies.
- *
- * (0) check if there are multivariate stats on the relation
- *
- * If no, just skip all the following steps (directly to the
- * original code).
- *
- * (1) check how many attributes are there in conditions compatible
- * with functional dependencies
- *
- * Only simple equality clauses are considered compatible with
- * functional dependencies (and that's unlikely to change, because
- * that's the only case when functional dependencies are useful).
- *
- * If there are no conditions that might be handled by multivariate
- * stats, or if the conditions reference just a single column, it
- * makes no sense to use functional dependencies, so skip to (4).
- *
- * (2) reduce the clauses using functional dependencies
- *
- * This simply attempts to 'reduce' the clauses by applying functional
- * dependencies. For example if there are two clauses:
- *
- * WHERE (a = 1) AND (b = 2)
- *
- * and we know that 'a' determines the value of 'b', we may remove
- * the second condition (b = 2) when computing the selectivity.
- * This is of course tricky - see mvstats/dependencies.c for details.
- *
- * After the reduction, step (1) is to be repeated.
- *
- * (3) check how many attributes are there in conditions compatible
- * with MCV lists and histograms
- *
- * What conditions are compatible with multivariate stats is decided
- * by clause_is_mv_compatible(). At this moment, only conditions
- * of the form "column operator constant" (for simple comparison
- * operators), IS [NOT] NULL and some AND/OR clauses are considered
- * compatible with multivariate statistics.
- *
- * Again, see clause_is_mv_compatible() for details.
- *
- * (4) check how many attributes are there in conditions compatible
- * with MCV lists and histograms
- *
- * If there are no conditions that might be handled by MCV lists
- * or histograms, or if the conditions reference just a single
- * column, it makes no sense to continue, so just skip to (7).
- *
- * (5) choose the stats matching the most columns
- *
- * If there are multiple instances of multivariate statistics (e.g.
- * built on different sets of columns), we choose the stats covering
- * the most columns from step (1). It may happen that all available
- * stats match just a single column - for example with conditions
- *
- * WHERE a = 1 AND b = 2
- *
- * and statistics built on (a,c) and (b,c). In such case just fall
- * back to the regular stats because it makes no sense to use the
- * multivariate statistics.
- *
- * For more details about how exactly we choose the stats, see
- * choose_mv_statistics().
- *
- * (6) use the multivariate stats to estimate matching clauses
- *
- * (7) estimate the remaining clauses using the regular statistics
+ * and_clause_selectivity -
*/
-Selectivity
-clauselist_selectivity(PlannerInfo *root,
+static Selectivity
+and_clause_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- List *conditions)
+ SpecialJoinInfo *sjinfo)
{
Selectivity s1 = 1.0;
RangeQueryClause *rqlist = NULL;
ListCell *l;
- /* processing mv stats */
- Index relid = InvalidOid;
-
- /* attributes in mv-compatible clauses */
- Bitmapset *mvattnums = NULL;
- List *stats = NIL;
-
- /* use clauses (not conditions), because those are always non-empty */
- stats = find_stats(root, clauses, varRelid, &relid);
-
- /*
- * If there's exactly one clause, then no use in trying to match up
- * pairs, or matching multivariate statistics, so just go directly
- * to clause_selectivity().
- */
- if (list_length(clauses) == 1)
- return clause_selectivity(root, (Node *) linitial(clauses),
- varRelid, jointype, sjinfo, conditions);
-
- /*
- * Check that there are some stats with functional dependencies
- * built (by walking the stats list). We're going to find that
- * anyway when trying to apply the functional dependencies, but
- * this is probably a tad faster.
- */
- if (has_stats(stats, MV_CLAUSE_TYPE_FDEP))
- {
- /*
- * Collect attributes referenced by mv-compatible clauses (looking
- * for clauses compatible with functional dependencies for now).
- */
- mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
- MV_CLAUSE_TYPE_FDEP);
-
- /*
- * If there are mv-compatible clauses, referencing at least two
- * different columns (otherwise it makes no sense to use mv stats),
- * try to reduce the clauses using functional dependencies, and
- * recollect the attributes from the reduced list.
- *
- * We don't need to select a single statistics for this - we can
- * apply all the functional dependencies we have.
- */
- if (bms_num_members(mvattnums) >= 2)
- clauses = clauselist_apply_dependencies(root, clauses, varRelid,
- stats, sjinfo);
- }
-
- /*
- * Check that there are statistics with MCV list or histogram.
- * If not, we don't need to waste time with the optimization.
- */
- if (has_stats(stats, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
- {
- /*
- * Recollect attributes from mv-compatible clauses (maybe we've
- * removed so many clauses we have a single mv-compatible attnum).
- * From now on we're only interested in MCV-compatible clauses.
- */
- mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
- (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
-
- /*
- * If there still are at least two columns, we'll try to select
- * a suitable combination of multivariate stats. If there are
- * multiple combinations, we'll try to choose the best one.
- * See choose_mv_statistics for more details.
- */
- if (bms_num_members(mvattnums) >= 2)
- {
- int k;
- ListCell *s;
-
- /*
- * Copy the list of conditions, so that we can build a list
- * of local conditions (and keep the original intact, for
- * the other clauses at the same level).
- */
- List *conditions_local = list_copy(conditions);
-
- /* find the best combination of statistics */
- List *solution = choose_mv_statistics(root, stats,
- clauses, conditions,
- varRelid, sjinfo);
-
- /* we have a good solution (list of stats) */
- foreach (s, solution)
- {
- MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
-
- /* clauses compatible with multi-variate stats */
- List *mvclauses = NIL;
- List *mvclauses_new = NIL;
- List *mvclauses_conditions = NIL;
- Bitmapset *stat_attnums = NULL;
-
- /* build attnum bitmapset for this statistics */
- for (k = 0; k < mvstat->stakeys->dim1; k++)
- stat_attnums = bms_add_member(stat_attnums,
- mvstat->stakeys->values[k]);
-
- /*
- * Append the compatible conditions (passed from above)
- * to mvclauses_conditions.
- */
- foreach (l, conditions)
- {
- Node *c = (Node*)lfirst(l);
- Bitmapset *tmp = clause_mv_get_attnums(root, c);
-
- if (bms_is_subset(tmp, stat_attnums))
- mvclauses_conditions
- = lappend(mvclauses_conditions, c);
-
- bms_free(tmp);
- }
-
- /* split the clauselist into regular and mv-clauses
- *
- * We keep the list of clauses (we don't remove the
- * clauses yet, because we want to use the clauses
- * as conditions of other clauses).
- *
- * FIXME Do this only once, i.e. filter the clauses
- * once (selecting clauses covered by at least
- * one statistics) and then convert them into
- * smaller per-statistics lists of conditions
- * and estimated clauses.
- */
- clauselist_mv_split(root, sjinfo, clauses,
- varRelid, &mvclauses, mvstat,
- (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
-
- /*
- * We've chosen the statistics to match the clauses, so
- * each statistics from the solution should have at least
- * one new clause (not covered by the previous stats).
- */
- Assert(mvclauses != NIL);
-
- /*
- * Mvclauses now contains only clauses compatible
- * with the currently selected stats, but we have to
- * split that into conditions (already matched by
- * the previous stats), and the new clauses we need
- * to estimate using this stats.
- */
- foreach (l, mvclauses)
- {
- ListCell *p;
- bool covered = false;
- Node *clause = (Node *) lfirst(l);
- Bitmapset *clause_attnums = clause_mv_get_attnums(root, clause);
-
- /*
- * If already covered by previous stats, add it to
- * conditions.
- *
- * TODO Maybe this could be relaxed a bit? Because
- * with complex and/or clauses, this might
- * mean no statistics actually covers such
- * complex clause.
- */
- foreach (p, solution)
- {
- int k;
- Bitmapset *stat_attnums = NULL;
-
- MVStatisticInfo *prev_stat
- = (MVStatisticInfo *)lfirst(p);
-
- /* break if we've ran into current statistic */
- if (prev_stat == mvstat)
- break;
-
- for (k = 0; k < prev_stat->stakeys->dim1; k++)
- stat_attnums = bms_add_member(stat_attnums,
- prev_stat->stakeys->values[k]);
-
- covered = bms_is_subset(clause_attnums, stat_attnums);
-
- bms_free(stat_attnums);
-
- if (covered)
- break;
- }
-
- if (covered)
- mvclauses_conditions
- = lappend(mvclauses_conditions, clause);
- else
- mvclauses_new
- = lappend(mvclauses_new, clause);
- }
-
- /*
- * We need at least one new clause (not just conditions).
- */
- Assert(mvclauses_new != NIL);
-
- /* compute the multivariate stats */
- s1 *= clauselist_mv_selectivity(root, mvstat,
- mvclauses_new,
- mvclauses_conditions,
- false); /* AND */
- }
-
- /*
- * And now finally remove all the mv-compatible clauses.
- *
- * This only repeats the same split as above, but this
- * time we actually use the result list (and feed it to
- * the next call).
- */
- foreach (s, solution)
- {
- /* clauses compatible with multi-variate stats */
- List *mvclauses = NIL;
-
- MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
-
- /* split the list into regular and mv-clauses */
- clauses = clauselist_mv_split(root, sjinfo, clauses,
- varRelid, &mvclauses, mvstat,
- (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
-
- /*
- * Add the clauses to the conditions (to be passed
- * to regular clauses), irrespectedly whether it
- * will be used as a condition or a clause here.
- *
- * We only keep the remaining conditions in the
- * clauses (we keep what clauselist_mv_split returns)
- * so we add each MV condition exactly once.
- */
- conditions_local = list_concat(conditions_local, mvclauses);
- }
-
- /* from now on, work with the 'local' list of conditions */
- conditions = conditions_local;
- }
- }
-
/*
* If there's exactly one clause, then no use in trying to match up
* pairs, so just go directly to clause_selectivity().
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
- varRelid, jointype, sjinfo, conditions);
-
+ varRelid, jointype, sjinfo);
/*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
@@ -591,8 +128,7 @@ clauselist_selectivity(PlannerInfo *root,
Selectivity s2;
/* Always compute the selectivity using clause_selectivity */
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo,
- conditions);
+ s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo);
/*
* Check for being passed a RestrictInfo.
@@ -750,270 +286,333 @@ clauselist_selectivity(PlannerInfo *root,
return s1;
}
-/*
- * Similar to clauselist_selectivity(), but for clauses connected by OR.
- *
- * That means a few differences:
- *
- * - functional dependencies don't apply to OR-clauses
- *
- * - we can't add the previous clauses to conditions
- *
- * - combined selectivities are combined using (s1+s2 - s1*s2)
- * and not as a multiplication (s1*s2)
- *
- * Another way to evaluate this might be turning
- *
- * (a OR b OR c)
- *
- * into
- *
- * NOT ((NOT a) AND (NOT b) AND (NOT c))
- *
- * and computing selectivity of that using clauselist_selectivity().
- * That would allow (a) using the clauselist_selectivity directly and
- * (b) using the previous clauses as conditions. Not sure if it's
- * worth the additional complexity, though.
- *
- * FIXME I'm not entirely sure, but ISTM to me that the clauses might
- * be processed repeatedly - once for each statistics in the
- * solution. E.g. with (a=1 OR b=1 OR c=1) and statistics on
- * [a,b] and [b,c], we can't use [b=1] with both stats, because
- * we can't combine those using conditional probabilities as with
- * AND clauses (no conditions with OR clauses).
- *
- * FIXME Maybe we'll need an alternative choose_mv_statistics for OR
- * clauses, because we can't do so complicated stuff anyway
- * (conditions, etc.). We generally need to split the clauses
- * into multiple disjunct subsets, each estimated separately.
- * So just search for the smallest number of stats, covering the
- * clauses.
- *
- * Or maybe just get rid of all this and use the simple formula
- *
- * s1 + s2 * (s1*s2) formula, which seems to be working
- *
- * quite reasonably.
- */
static Selectivity
-clauselist_selectivity_or(PlannerInfo *root,
- List *clauses,
- int varRelid,
- JoinType jointype,
- SpecialJoinInfo *sjinfo,
- List *conditions)
+clause_mcv_selectivity(PlannerInfo *root, MVStatisticInfo *stats,
+ Node *clause, int *status)
{
- Selectivity s1 = 0.0;
- ListCell *l;
-
- /* processing mv stats */
- Index relid = InvalidOid;
+ MCVList mcvlist = NULL;
+ int nmatches = 0;
+ int nconditions = 0;
+ char *matches = NULL;
+ char *condition_matches = NULL;
+ Selectivity s = 0.0;
+ Selectivity t = 0.0;
+ Selectivity u = 0.0;
+ BoolExpr *expr = (BoolExpr*) clause;
+ bool is_or = or_clause(clause);
+ int i;
+ bool fullmatch;
+ Selectivity lowsel;
- /* attributes in mv-compatible clauses */
- Bitmapset *mvattnums = NULL;
- List *stats = NIL;
+ Assert(IsA(expr, BoolExpr));
+
+ if (!expr || not_clause(clause)) /* For now!! */
+ {
+ *status = FAILURE;
+ return 0.0;
+ }
+ if (!stats->mcv_built)
+ {
+ *status = FAILURE;
+ return 0.0;
+ }
+
+ mcvlist = load_mv_mcvlist(stats->mvoid);
+ Assert (mcvlist != NULL);
+ Assert (mcvlist->nitems > 0);
- /* use clauses (not conditions), because those are always non-empty */
- stats = find_stats(root, clauses, varRelid, &relid);
+ nmatches = mcvlist->nitems;
+ nconditions = mcvlist->nitems;
+ matches = palloc0(sizeof(char) * nmatches);
- /* OR-clauses do not work with functional dependencies */
- if (has_stats(stats, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
- {
- /*
- * Recollect attributes from mv-compatible clauses (maybe we've
- * removed so many clauses we have a single mv-compatible attnum).
- * From now on we're only interested in MCV-compatible clauses.
- */
- mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
- (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+ if (!is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
- /*
- * If there still are at least two columns, we'll try to select
- * a suitable multivariate stats.
- */
- if (bms_num_members(mvattnums) >= 2)
- {
- int k;
- ListCell *s;
+ /* Conditions are treated as AND clause, so match by default. */
+ condition_matches = palloc0(sizeof(char)*nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
- List *solution
- = choose_mv_statistics(root, stats,
- clauses, conditions,
- varRelid, sjinfo);
+ nmatches = update_match_bitmap_mcvlist(root, expr->args,
+ stats->stakeys, mcvlist,
+ (is_or ? 0 : nmatches), matches,
+ &lowsel, &fullmatch, is_or);
- /* we have a good solution stats */
- foreach (s, solution)
- {
- Selectivity s2;
- MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ u += mcvlist->items[i]->frequency;
+
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
- /* clauses compatible with multi-variate stats */
- List *mvclauses = NIL;
- List *mvclauses_new = NIL;
- List *mvclauses_conditions = NIL;
- Bitmapset *stat_attnums = NULL;
+ if (matches[i] != MVSTATS_MATCH_NONE)
+ s += mcvlist->items[i]->frequency;
- /* build attnum bitmapset for this statistics */
- for (k = 0; k < mvstat->stakeys->dim1; k++)
- stat_attnums = bms_add_member(stat_attnums,
- mvstat->stakeys->values[k]);
+ t += mcvlist->items[i]->frequency;
+ }
- /*
- * Append the compatible conditions (passed from above)
- * to mvclauses_conditions.
- */
- foreach (l, conditions)
- {
- Node *c = (Node*)lfirst(l);
- Bitmapset *tmp = clause_mv_get_attnums(root, c);
+ pfree(matches);
+ pfree(condition_matches);
+ pfree(mcvlist);
- if (bms_is_subset(tmp, stat_attnums))
- mvclauses_conditions
- = lappend(mvclauses_conditions, c);
+ if (fullmatch)
+ *status = FULL_MATCH;
- bms_free(tmp);
- }
+ /* mcv_low is omitted for now */
- /* split the clauselist into regular and mv-clauses
- *
- * We keep the list of clauses (we don't remove the
- * clauses yet, because we want to use the clauses
- * as conditions of other clauses).
- *
- * FIXME Do this only once, i.e. filter the clauses
- * once (selecting clauses covered by at least
- * one statistics) and then convert them into
- * smaller per-statistics lists of conditions
- * and estimated clauses.
- */
- clauselist_mv_split(root, sjinfo, clauses,
- varRelid, &mvclauses, mvstat,
- (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
- /*
- * We've chosen the statistics to match the clauses, so
- * each statistics from the solution should have at least
- * one new clause (not covered by the previous stats).
- */
- Assert(mvclauses != NIL);
+ return (s / t) * u;
+}
- /*
- * Mvclauses now contains only clauses compatible
- * with the currently selected stats, but we have to
- * split that into conditions (already matched by
- * the previous stats), and the new clauses we need
- * to estimate using this stats.
- *
- * XXX We'll only use the new clauses, but maybe we
- * should use the conditions too, somehow. We can't
- * use that directly in conditional probability, but
- * maybe we might use them in a different way?
- *
- * If we have a clause (a OR b OR c), then knowing
- * that 'a' is TRUE means (b OR c) can't make the
- * whole clause FALSE.
- *
- * This is pretty much what
- *
- * (a OR b) == NOT ((NOT a) AND (NOT b))
- *
- * implies.
- */
- foreach (l, mvclauses)
- {
- ListCell *p;
- bool covered = false;
- Node *clause = (Node *) lfirst(l);
- Bitmapset *clause_attnums = clause_mv_get_attnums(root, clause);
+static Selectivity
+clause_hist_selectivity(PlannerInfo *root, MVStatisticInfo *stats,
+ Node *clause, int *status)
+{
+ MVSerializedHistogram mvhist = NULL;
+ int nmatches = 0;
+ int nconditions = 0;
+ char *matches = NULL;
+ char *condition_matches = NULL;
+ Selectivity s = 0.0;
+ Selectivity t = 0.0;
+ Selectivity u = 0.0;
+ BoolExpr *expr = (BoolExpr*) clause;
+ bool is_or = or_clause(clause);
+ int i;
- /*
- * If already covered by previous stats, add it to
- * conditions.
- *
- * TODO Maybe this could be relaxed a bit? Because
- * with complex and/or clauses, this might
- * mean no statistics actually covers such
- * complex clause.
- */
- foreach (p, solution)
- {
- int k;
- Bitmapset *stat_attnums = NULL;
+ Assert(IsA(expr, BoolExpr));
- MVStatisticInfo *prev_stat
- = (MVStatisticInfo *)lfirst(p);
+ if (!expr || not_clause(clause)) /* for now */
+ {
+ *status = 0;
+ return 0.0;
+ }
+ if (!stats->hist_built)
+ {
+ *status = 1;
+ return 0.0;
+ }
+ mvhist = load_mv_histogram(stats->mvoid);
+ Assert (mvhist != NULL);
+ Assert (clause != NULL);
- /* break if we've ran into current statistic */
- if (prev_stat == mvstat)
- break;
+ nmatches = mvhist->nbuckets;
+ nconditions = mvhist->nbuckets;
+ matches = palloc0(sizeof(char) * nmatches);
+ if (!is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
- for (k = 0; k < prev_stat->stakeys->dim1; k++)
- stat_attnums = bms_add_member(stat_attnums,
- prev_stat->stakeys->values[k]);
+ /* Conditions are treated as AND clause, so match by default. */
+ condition_matches = palloc0(sizeof(char)*nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
- covered = bms_is_subset(clause_attnums, stat_attnums);
+ update_match_bitmap_histogram(root, expr->args, stats->stakeys, mvhist,
+ (is_or ? 0 : nmatches), matches, is_or);
- bms_free(stat_attnums);
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ float coeff = 1.0;
+ u += mvhist->buckets[i]->ntuples;
- if (covered)
- break;
- }
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+ else if (condition_matches[i] == MVSTATS_MATCH_PARTIAL)
+ coeff = 0.5;
- if (! covered)
- mvclauses_new = lappend(mvclauses_new, clause);
- }
+ t += coeff * mvhist->buckets[i]->ntuples;
- /*
- * We need at least one new clause (not just conditions).
- */
- Assert(mvclauses_new != NIL);
+ if (matches[i] == MVSTATS_MATCH_FULL)
+ s += coeff * mvhist->buckets[i]->ntuples;
+ else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+ s += coeff * 0.5 * mvhist->buckets[i]->ntuples;
+ }
- /* compute the multivariate stats */
- s2 = clauselist_mv_selectivity(root, mvstat,
- mvclauses_new,
- mvclauses_conditions,
- true); /* OR */
+ pfree(matches);
+ pfree(condition_matches);
+ pfree(mvhist);
- s1 = s1 + s2 - s1 * s2;
- }
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
- /*
- * And now finally remove all the mv-compatible clauses.
- *
- * This only repeats the same split as above, but this
- * time we actually use the result list (and feed it to
- * the next call).
- */
- foreach (s, solution)
- {
- /* clauses compatible with multi-variate stats */
- List *mvclauses = NIL;
+ return (s / t) * u;
+}
- MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
+static Selectivity
+apply_mvstats(PlannerInfo *root, Node *clause, bm_mvstat *statent)
+{
+ Selectivity s1 = 0.0;
+ int status;
- /* split the list into regular and mv-clauses */
- clauses = clauselist_mv_split(root, sjinfo, clauses,
- varRelid, &mvclauses, mvstat,
- (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
- }
- }
+ if (statent->mvkind & MVSTATISTIC_MCV)
+ {
+ s1 = clause_mcv_selectivity(root, statent->stats, clause, &status);
+ if (status == FULL_MATCH && s1 > 0.0)
+ return s1;
}
+
+ if (statent->mvkind & MVSTATISTIC_HIST)
+ s1 = s1 + clause_hist_selectivity(root, statent->stats,
+ clause, &status);
- /*
- * Handle the remaining clauses (either using regular statistics,
- * or by multivariate stats at the next level).
- */
- foreach(l, clauses)
- {
- Selectivity s2 = clause_selectivity(root,
- (Node *) lfirst(l),
- varRelid,
- jointype,
- sjinfo,
- conditions);
+ return s1;
+}
+
+static inline Selectivity
+merge_selectivity(Selectivity s1, Selectivity s2, BoolExprType op)
+{
+ if (op == AND_EXPR)
+ s1 = s1 * s2;
+ else
s1 = s1 + s2 - s1 * s2;
+
+ return s1;
+}
+/*
+ * mvclause_selectivity -
+ */
+static Selectivity
+mvclause_selectivity(PlannerInfo *root,
+ RestrictStatData *rstat,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo)
+{
+ Selectivity s1;
+ ListCell *lc;
+
+ if (!rstat->mvclause && !rstat->nonmvclause && !rstat->children)
+ return clause_selectivity(root, rstat->clause, varRelid, jointype,
+ sjinfo);
+
+ if (rstat->boolop == NOT_EXPR)
+ {
+ RestrictStatData *clause =
+ (RestrictStatData *)linitial(rstat->children);
+
+ s1 = 1.0 - mvclause_selectivity(root, clause, varRelid,
+ jointype, sjinfo);
+ return s1;
+ }
+
+ s1 = (rstat->boolop == AND_EXPR ? 1.0 : 0.0);
+
+ if (rstat->nonmvclause)
+ s1 = merge_selectivity(s1,
+ clause_selectivity(root, rstat->nonmvclause,
+ varRelid, jointype, sjinfo),
+ rstat->boolop);
+
+ if (rstat->mvclause)
+ {
+ bm_mvstat *mvs = (bm_mvstat*)linitial(rstat->mvstats);
+ Selectivity s2 = apply_mvstats(root, rstat->mvclause, mvs);
+
+ /* Fall back to ordinary calculation */
+ if (s2 < 0)
+ s2 = clause_selectivity(root, rstat->mvclause, varRelid,
+ jointype, sjinfo);
+ s1 = merge_selectivity(s1, s2, rstat->boolop);
+ }
+
+ foreach(lc, rstat->children)
+ {
+ RestrictStatData *rsd = (RestrictStatData *) lfirst(lc);
+ Assert(IsA(rsd, RestrictStatData));
+
+ s1 = merge_selectivity(s1,
+ mvclause_selectivity(root, rsd, varRelid,
+ jointype, sjinfo),
+ rstat->boolop);
+ }
+
+ return s1;
+}
+
+
+/*
+ * clauselist_selectivity -
+ * Compute the selectivity of an implicitly-ANDed list of boolean
+ * expression clauses. The list can be empty, in which case 1.0
+ * must be returned. List elements may be either RestrictInfos
+ * or bare expression clauses --- the former is preferred since
+ * it allows caching of results.
+ *
+ * See clause_selectivity() for the meaning of the additional parameters.
+ *
+ * Our basic approach is to take the product of the selectivities of the
+ * subclauses. However, that's only right if the subclauses have independent
+ * probabilities, and in reality they are often NOT independent. So,
+ * we want to be smarter where we can.
+ *
+ * Currently, the only extra smarts we have is to recognize "range queries",
+ * such as "x > 34 AND x < 42". Clauses are recognized as possible range
+ * query components if they are restriction opclauses whose operators have
+ * scalarltsel() or scalargtsel() as their restriction selectivity estimator.
+ * We pair up clauses of this form that refer to the same variable. An
+ * unpairable clause of this kind is simply multiplied into the selectivity
+ * product in the normal way. But when we find a pair, we know that the
+ * selectivities represent the relative positions of the low and high bounds
+ * within the column's range, so instead of figuring the selectivity as
+ * hisel * losel, we can figure it as hisel + losel - 1. (To visualize this,
+ * see that hisel is the fraction of the range below the high bound, while
+ * losel is the fraction above the low bound; so hisel can be interpreted
+ * directly as a 0..1 value but we need to convert losel to 1-losel before
+ * interpreting it as a value. Then the available range is 1-losel to hisel.
+ * However, this calculation double-excludes nulls, so really we need
+ * hisel + losel + null_frac - 1.)
+ *
+ * If either selectivity is exactly DEFAULT_INEQ_SEL, we forget this equation
+ * and instead use DEFAULT_RANGE_INEQ_SEL. The same applies if the equation
+ * yields an impossible (negative) result.
+ *
+ * A free side-effect is that we can recognize redundant inequalities such
+ * as "x < 4 AND x < 5"; only the tighter constraint will be counted.
+ *
+ * Of course this is all very dependent on the behavior of
+ * scalarltsel/scalargtsel; perhaps some day we can generalize the approach.
+ *
+ *
+ * Multivariate statistics
+ * --------------------------
+ * This also uses multivariate stats to estimate combinations of
+ * conditions, in a way (a) maximizing the estimate accuracy by using
+ * as many stats as possible, and (b) minimizing the overhead,
+ * especially when there are no suitable multivariate stats (so if you
+ * are not using multivariate stats, there's no additional overhead).
+ *
+ * The following checks are performed (in this order), and the optimizer
+ * falls back to regular stats on the first 'false'.
+ *
+ * NOTE: This explains how this works with all the patches applied, not
+ * just the functional dependencies.
+ *
+ */
+Selectivity
+clauselist_selectivity(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo)
+{
+ Selectivity s1 = 1.0;
+ RestrictStatData *rstat;
+ List *rinfos = clauses;
+
+ /* Reconstruct clauses so that multivariate statistics can be applied */
+ rstat = transformRestrictInfoForEstimate(root, clauses, varRelid, sjinfo);
+
+ if (rstat)
+ {
+ rinfos = rstat->unusedrinfos;
+
+ s1 = mvclause_selectivity(root, rstat, varRelid, jointype, sjinfo);
}
+ s1 = s1 * and_clause_selectivity(root, rinfos, varRelid, jointype, sjinfo);
+
return s1;
}
@@ -1224,8 +823,7 @@ clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- List *conditions)
+ SpecialJoinInfo *sjinfo)
{
Selectivity s1 = 0.5; /* default for any unhandled clause type */
RestrictInfo *rinfo = NULL;
@@ -1355,28 +953,37 @@ clause_selectivity(PlannerInfo *root,
(Node *) get_notclausearg((Expr *) clause),
varRelid,
jointype,
- sjinfo,
- conditions);
+ sjinfo);
}
else if (and_clause(clause))
{
- /* share code with clauselist_selectivity() */
- s1 = clauselist_selectivity(root,
+ s1 = and_clause_selectivity(root,
((BoolExpr *) clause)->args,
varRelid,
jointype,
- sjinfo,
- conditions);
+ sjinfo);
}
else if (or_clause(clause))
{
- /* just call to clauselist_selectivity_or() */
- s1 = clauselist_selectivity_or(root,
- ((BoolExpr *) clause)->args,
- varRelid,
- jointype,
- sjinfo,
- conditions);
+ /*
+ * Selectivities for an OR clause are computed as s1+s2 - s1*s2 to
+ * account for the probable overlap of selected tuple sets.
+ *
+ * XXX is this too conservative?
+ */
+ ListCell *arg;
+
+ s1 = 0.0;
+ foreach(arg, ((BoolExpr *) clause)->args)
+ {
+ Selectivity s2 = clause_selectivity(root,
+ (Node *) lfirst(arg),
+ varRelid,
+ jointype,
+ sjinfo);
+
+ s1 = s1 + s2 - s1 * s2;
+ }
}
else if (is_opclause(clause) || IsA(clause, DistinctExpr))
{
@@ -1469,1895 +1076,51 @@ clause_selectivity(PlannerInfo *root,
jointype,
sjinfo);
}
- else if (IsA(clause, CurrentOfExpr))
- {
- /* CURRENT OF selects at most one row of its table */
- CurrentOfExpr *cexpr = (CurrentOfExpr *) clause;
- RelOptInfo *crel = find_base_rel(root, cexpr->cvarno);
-
- if (crel->tuples > 0)
- s1 = 1.0 / crel->tuples;
- }
- else if (IsA(clause, RelabelType))
- {
- /* Not sure this case is needed, but it can't hurt */
- s1 = clause_selectivity(root,
- (Node *) ((RelabelType *) clause)->arg,
- varRelid,
- jointype,
- sjinfo,
- conditions);
- }
- else if (IsA(clause, CoerceToDomain))
- {
- /* Not sure this case is needed, but it can't hurt */
- s1 = clause_selectivity(root,
- (Node *) ((CoerceToDomain *) clause)->arg,
- varRelid,
- jointype,
- sjinfo,
- conditions);
- }
-
- /* Cache the result if possible */
- if (cacheable)
- {
- if (jointype == JOIN_INNER)
- rinfo->norm_selec = s1;
- else
- rinfo->outer_selec = s1;
- }
-
-#ifdef SELECTIVITY_DEBUG
- elog(DEBUG4, "clause_selectivity: s1 %f", s1);
-#endif /* SELECTIVITY_DEBUG */
-
- return s1;
-}
-
-
-/*
- * Estimate selectivity for the list of MV-compatible clauses, using
- * using a MV statistics (combining a histogram and MCV list).
- *
- * This simply passes the estimation to the MCV list and then to the
- * histogram, if available.
- *
- * TODO Clamp the selectivity by min of the per-clause selectivities
- * (i.e. the selectivity of the most restrictive clause), because
- * that's the maximum we can ever get from ANDed list of clauses.
- * This may probably prevent issues with hitting too many buckets
- * and low precision histograms.
- *
- * TODO We may support some additional conditions, most importantly
- * those matching multiple columns (e.g. "a = b" or "a < b").
- * Ultimately we could track multi-table histograms for join
- * cardinality estimation.
- *
- * TODO Further thoughts on processing equality clauses: Maybe it'd be
- * better to look for stats (with MCV) covered by the equality
- * clauses, because then we have a chance to find an exact match
- * in the MCV list, which is pretty much the best we can do. We may
- * also look at the least frequent MCV item, and use it as a upper
- * boundary for the selectivity (had there been a more frequent
- * item, it'd be in the MCV list).
- *
- * TODO There are several options for 'sanity clamping' the estimates.
- *
- * First, if we have selectivities for each condition, then
- *
- * P(A,B) <= MIN(P(A), P(B))
- *
- * Because additional conditions (connected by AND) can only lower
- * the probability.
- *
- * So we can do some basic sanity checks using the single-variate
- * stats (the ones we have right now).
- *
- * Second, when we have multivariate stats with a MCV list, then
- *
- * (a) if we have a full equality condition (one equality condition
- * on each column) and we found a match in the MCV list, this is
- * the selectivity (and it's supposed to be exact)
- *
- * (b) if we have a full equality condition and we haven't found a
- * match in the MCV list, then the selectivity is below the
- * lowest selectivity in the MCV list
- *
- * (c) if we have a equality condition (not full), we can still
- * search the MCV for matches and use the sum of probabilities
- * as a lower boundary for the histogram (if there are no
- * matches in the MCV list, then we have no boundary)
- *
- * Third, if there are multiple (combinations of) multivariate
- * stats for a set of clauses, we may compute all of them and then
- * somehow aggregate them - e.g. by choosing the minimum, median or
- * average. The stats are susceptible to overestimation (because
- * we take 50% of the bucket for partial matches). Some stats may
- * give better estimates than others, but it's very difficult to
- * say that in advance which one is the best (it depends on the
- * number of buckets, number of additional columns not referenced
- * in the clauses, type of condition etc.).
- *
- * So we may compute them all and then choose a sane aggregation
- * (minimum seems like a good approach). Of course, this may result
- * in longer / more expensive estimation (CPU-wise), but it may be
- * worth it.
- *
- * It's possible to add a GUC choosing whether to do a 'simple'
- * (using a single stats expected to give the best estimate) and
- * 'complex' (combining the multiple estimates).
- *
- * multivariate_estimates = (simple|full)
- *
- * Also, this might be enabled at a table level, by something like
- *
- * ALTER TABLE ... SET STATISTICS (simple|full)
- *
- * Which would make it possible to use this only for the tables
- * where the simple approach does not work.
- *
- * Also, there are ways to optimize this algorithmically. E.g. we
- * may try to get an estimate from a matching MCV list first, and
- * if we happen to get a "full equality match" we may stop computing
- * the estimates from other stats (for this condition) because
- * that's probably the best estimate we can really get.
- *
- * TODO When applying the clauses to the histogram/MCV list, we can do
- * that from the most selective clauses first, because that'll
- * eliminate the buckets/items sooner (so we'll be able to skip
- * them without inspection, which is more expensive). But this
- * requires really knowing the per-clause selectivities in advance,
- * and that's not what we do now.
- *
- * TODO All this is based on the assumption that the statistics represent
- * the necessary dependencies, i.e. that if two colunms are not in
- * the same statistics, there's no dependency. If that's not the
- * case, we may get misestimates, just like before. For example
- * assume we have a table with three columns [a,b,c] with exactly
- * the same values, and statistics on [a,b] and [b,c]. So somthing
- * like this:
- *
- * CREATE TABLE test AS SELECT i, i, i
- FROM generate_series(1,1000);
- *
- * ALTER TABLE test ADD STATISTICS (mcv) ON (a,b);
- * ALTER TABLE test ADD STATISTICS (mcv) ON (b,c);
- *
- * ANALYZE test;
- *
- * EXPLAIN ANALYZE SELECT * FROM test
- * WHERE (a < 10) AND (b < 20) AND (c < 10);
- *
- * The problem here is that the only shared column between the two
- * statistics is 'b' so the probability will be computed like this
- *
- * P[(a < 10) & (b < 20) & (c < 10)]
- * = P[(a < 10) & (b < 20)] * P[(c < 10) | (a < 10) & (b < 20)]
- * = P[(a < 10) & (b < 20)] * P[(c < 10) | (b < 20)]
- *
- * or like this
- *
- * P[(a < 10) & (b < 20) & (c < 10)]
- * = P[(b < 20) & (c < 10)] * P[(a < 10) | (b < 20) & (c < 10)]
- * = P[(b < 20) & (c < 10)] * P[(a < 10) | (b < 20)]
- *
- * In both cases the conditional probabilities will be evaluated as
- * 0.5, because they lack the other column (which would make it 1.0).
- *
- * Theoretically it might be possible to transfer the dependency,
- * e.g. by building bitmap for [a,b] and then combine it with [b,c]
- * by doing something like this:
- *
- * 1) build bitmap on [a,b] using [(a<10) & (b < 20)]
- * 2) for each element in [b,c] check the bitmap
- *
- * But that's certainly nontrivial - for example the statistics may
- * be different (MCV list vs. histogram) and/or the items may not
- * match (e.g. MCV items or histogram buckets will be built
- * differently). Also, for one value of 'b' there might be multiple
- * MCV items (because of the other column values) with different
- * bitmap values (some will match, some won't) - so it's not exactly
- * a bitmap but a partial match.
- *
- * Maybe a hash table with number of matches and mismatches (or
- * maybe sums of frequencies) would work? The step (2) would then
- * look up the values and use that to weight the item somehow.
- *
- * Currently the only solution is to build statistics on all three
- * columns.
- */
-static Selectivity
-clauselist_mv_selectivity(PlannerInfo *root, MVStatisticInfo *mvstats,
- List *clauses, List *conditions, bool is_or)
-{
- bool fullmatch = false;
- Selectivity s1 = 0.0, s2 = 0.0;
-
- /*
- * Lowest frequency in the MCV list (may be used as an upper bound
- * for full equality conditions that did not match any MCV item).
- */
- Selectivity mcv_low = 0.0;
-
- /* TODO Evaluate simple 1D selectivities, use the smallest one as
- * an upper bound, product as lower bound, and sort the
- * clauses in ascending order by selectivity (to optimize the
- * MCV/histogram evaluation).
- */
-
- /* Evaluate the MCV first. */
- s1 = clauselist_mv_selectivity_mcvlist(root, mvstats,
- clauses, conditions, is_or,
- &fullmatch, &mcv_low);
-
- /*
- * If we got a full equality match on the MCV list, we're done (and
- * the estimate is pretty good).
- */
- if (fullmatch && (s1 > 0.0))
- return s1;
-
- /* FIXME if (fullmatch) without matching MCV item, use the mcv_low
- * selectivity as upper bound */
-
- s2 = clauselist_mv_selectivity_histogram(root, mvstats,
- clauses, conditions, is_or);
-
- /* TODO clamp to <= 1.0 (or more strictly, when possible) */
- return s1 + s2;
-}
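
To make the misestimate described in the TODO above concrete, here is a
tiny standalone program (not part of the patch - the 0.5 partial-match
factor is the assumption from that comment, the other numbers come from
its [a,b] / [b,c] example):

#include <stdio.h>

/*
 * Standalone illustration of the [a,b] + [b,c] decomposition from the
 * comment above, using its example table (a = b = c = i, i = 1..1000).
 * The 0.5 factor models the "partial match" produced when a statistics
 * lacks one of the columns; the exact value is an assumption.
 */
int
main(void)
{
    double  p_ab = 9.0 / 1000;      /* P[(a < 10) & (b < 20)] - (a < 10) implies (b < 20) */
    double  p_c_given_b = 0.5;      /* P[(c < 10) | (b < 20)] - partial match only */
    double  true_sel = 9.0 / 1000;  /* all three conditions reduce to (i < 10) */

    printf("estimate = %f\n", p_ab * p_c_given_b);  /* 0.004500 */
    printf("actual   = %f\n", true_sel);            /* 0.009000 */

    /* with statistics on all three columns, the estimate would be 0.009 */
    return 0;
}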
-
-/*
- * Collect attributes from mv-compatible clauses.
- */
-static Bitmapset *
-collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
- Index *relid, SpecialJoinInfo *sjinfo, int types)
-{
- Bitmapset *attnums = NULL;
- ListCell *l;
-
- /*
- * Walk through the clauses and identify the ones we can estimate
- * using multivariate stats, and remember the relid/columns. We'll
- * then cross-check if we have suitable stats, and only if needed
- * we'll split the clauses into multivariate and regular lists.
- *
- * For now we're only interested in RestrictInfo nodes with nested
- * OpExpr, using either a range or equality.
- */
- foreach (l, clauses)
- {
- Node *clause = (Node *) lfirst(l);
-
- /* ignore the result here - we only need the attnums */
- clause_is_mv_compatible(root, clause, varRelid, relid, &attnums,
- sjinfo, types);
- }
-
- /*
- * If there are not at least two attributes referenced by the clause(s),
- * we can throw everything out (as we'll revert to simple stats).
- */
- if (bms_num_members(attnums) <= 1)
- {
- bms_free(attnums);
- attnums = NULL;
- *relid = InvalidOid;
- }
-
- return attnums;
-}
-
-/*
- * Selects the best combination of multivariate statistics, in an
- * exhaustive way, where 'best' means:
- *
- * (a) covering the most attributes (referenced by clauses)
- * (b) using the least number of multivariate stats
- * (c) using the most conditions to exploit dependency
- *
- * There may be other optimality criteria, not considered in the initial
- * implementation (more on that in the 'weaknesses' section).
- *
- * This pretty much splits the probability of clauses (aka selectivity)
- * into a sequence of conditional probabilities, like this
- *
- * P(A,B,C,D) = P(A,B) * P(C|A,B) * P(D|A,B,C)
- *
- * and removing the attributes not referenced by the existing stats,
- * under the assumption that there's no dependency (otherwise the DBA
- * would create the stats).
- *
- * The last criterion means that when we have the choice to compute
- * like this
- *
- * P(A,B,C,D) = P(A,B,C) * P(D|B,C)
- *
- * or like this
- *
- * P(A,B,C,D) = P(A,B,C) * P(D|C)
- *
- * we should use the first option, as that exploits more dependencies.
- *
- * The order of statistics in the solution implicitly determines the
- * order of estimation of clauses, because as we apply a statistics,
- * we always use it to estimate all the clauses covered by it (and
- * then we use those clauses as conditions for the next statistics).
- *
- * Don't call this directly but through choose_mv_statistics().
- *
- *
- * Algorithm
- * ---------
- * The algorithm is a recursive implementation of backtracking, with
- * maximum 'depth' equal to the number of multivariate statistics
- * available on the table.
- *
- * It explores all the possible permutations of the stats.
- *
- * Whenever it considers adding the next statistics, the clauses it
- * matches are divided into 'conditions' (clauses already matched by at
- * least one previous statistics) and clauses that are estimated.
- *
- * Then several checks are performed:
- *
- * (a) The statistics covers at least 2 columns, referenced in the
- * estimated clauses (otherwise multi-variate stats are useless).
- *
- * (b) The statistics covers at least 1 new column, i.e. column not
- * referenced by the already used stats (and the new column has
- * to be referenced by the clauses, of course). Otherwise the
- * statistics would not add any new information.
- *
- * There are some other sanity checks (e.g. that the stats must not be
- * used twice etc.).
- *
- * Finally the new solution is compared to the currently best one, and
- * if it's considered better, it's used instead.
- *
- *
- * Weaknesses
- * ----------
- * The current implementation uses a somewhat simplistic optimality
- * criterion, suffering from the following weaknesses.
- *
- * (a) There may be multiple solutions with the same number of covered
- * attributes and number of statistics (e.g. the same solution but
- * with statistics in a different order). It's unclear which solution
- * is the best one - in a sense all of them are equal.
- *
- * TODO It might be possible to compute estimate for each of those
- * solutions, and then combine them to get the final estimate
- * (e.g. by using average or median).
- *
- * (b) Does not consider that some types of stats are a better match for
- * some types of clauses (e.g. an MCV list is a better match for
- * equality clauses than a histogram).
- *
- * XXX Maybe MCV is almost always better / more accurate?
- *
- * But maybe this is pointless - generally, each column is either
- * a label (whether because of the data type or how it's used is
- * not important), or a value with an ordering that makes sense.
- * So either an MCV list is more appropriate (labels) or a
- * histogram (values with an ordering).
- *
- * Not sure what to do with statistics mixing columns of both
- * types - maybe it'd be better to invent a new type of stats
- * combining MCV list and histogram (keeping a small histogram for
- * each MCV item, and a separate histogram for values not on the
- * MCV list). But that's not implemented at this moment.
- *
- * TODO The algorithm should probably count number of Vars (not just
- * attnums) when computing the 'score' of each solution. Computing
- * the ratio of (num of all vars) / (num of condition vars) as a
- * measure of how well the solution uses conditions might be
- * useful.
- */
-static void
-choose_mv_statistics_exhaustive(PlannerInfo *root, int step,
- int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
- int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
- int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
- bool *cover_map, bool *condition_map, int *ruled_out,
- mv_solution_t *current, mv_solution_t **best)
-{
- int i, j;
-
- Assert(best != NULL);
- Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
-
- CHECK_FOR_INTERRUPTS();
-
- if (current == NULL)
- {
- current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
- current->stats = (int*)palloc0(sizeof(int)*nmvstats);
- current->nstats = 0;
- current->nclauses = 0;
- current->nconditions = 0;
- }
-
- /*
- * Now try to apply each statistics, matching at least two attributes,
- * unless it's already used in one of the previous steps.
- */
- for (i = 0; i < nmvstats; i++)
- {
- int c;
-
- int ncovered_clauses = 0; /* number of covered clauses */
- int ncovered_conditions = 0; /* number of covered conditions */
- int nattnums = 0; /* number of covered attributes */
-
- Bitmapset *all_attnums = NULL;
- Bitmapset *new_attnums = NULL;
-
- /* skip statistics that were already used or eliminated */
- if (ruled_out[i] != -1)
- continue;
-
- /*
- * See if we have clauses covered by this statistics, but not
- * yet covered by any of the preceding ones.
- */
- for (c = 0; c < nclauses; c++)
- {
- bool covered = false;
- Bitmapset *clause_attnums = clauses_attnums[c];
- Bitmapset *tmp = NULL;
-
- /*
- * If this clause is not covered by this stats, we can't
- * use the stats to estimate that at all.
- */
- if (! cover_map[i * nclauses + c])
- continue;
-
- /*
- * Now we know we'll use this clause - either as a condition
- * or as a new clause (the estimated one). So let's add the
- * attributes to the attnums from all the clauses usable with
- * this statistics.
- */
- tmp = bms_union(all_attnums, clause_attnums);
-
- /* free the old bitmap */
- bms_free(all_attnums);
- all_attnums = tmp;
-
- /* let's see if it's covered by any of the previous stats */
- for (j = 0; j < step; j++)
- {
- /* already covered by the previous stats */
- if (cover_map[current->stats[j] * nclauses + c])
- covered = true;
-
- if (covered)
- break;
- }
-
- /* if already covered, continue with the next clause */
- if (covered)
- {
- ncovered_conditions += 1;
- continue;
- }
-
- /*
- * OK, this clause is covered by this statistics (and not by
- * any of the previous ones)
- */
- ncovered_clauses += 1;
-
- /* add the attnums into attnums from 'new clauses' */
- // new_attnums = bms_union(new_attnums, clause_attnums);
- }
-
- /* can't have more new clauses than original clauses */
- Assert(nclauses >= ncovered_clauses);
- Assert(ncovered_clauses >= 0); /* mostly paranoia */
-
- nattnums = bms_num_members(all_attnums);
-
- /* free all the bitmapsets - we don't need them anymore */
- bms_free(all_attnums);
- bms_free(new_attnums);
-
- all_attnums = NULL;
- new_attnums = NULL;
-
- /*
- * Now walk through the conditions covered by this statistics,
- * counting them and collecting their attnums as well.
- */
- for (c = 0; c < nconditions; c++)
- {
- Bitmapset *clause_attnums = conditions_attnums[c];
- Bitmapset *tmp = NULL;
-
- /*
- * If this clause is not covered by this stats, we can't
- * use the stats to estimate that at all.
- */
- if (! condition_map[i * nconditions + c])
- continue;
-
- /* count this as a condition */
- ncovered_conditions += 1;
-
- /*
- * Now we know we'll use this clause - either as a condition
- * or as a new clause (the estimated one). So let's add the
- * attributes to the attnums from all the clauses usable with
- * this statistics.
- */
- tmp = bms_union(all_attnums, clause_attnums);
-
- /* free the old bitmap */
- bms_free(all_attnums);
- all_attnums = tmp;
- }
-
- /*
- * Let's mark the statistics as 'ruled out' - either we'll use
- * it (and proceed to the next step), or it's incompatible.
- */
- ruled_out[i] = step;
-
- /*
- * There are no clauses usable with this statistics (not already
- * covered by some of the previous stats).
- *
- * Similarly, if the clauses only use a single attribute, we
- * can't really use that.
- */
- if ((ncovered_clauses == 0) || (nattnums < 2))
- continue;
-
- /*
- * TODO Not sure if it's possible to add a clause referencing
- * only attributes already covered by previous stats, i.e.
- * introducing only a new dependency and no new attribute.
- * I couldn't come up with an example, though - might be
- * worth adding an assert.
- */
-
- /*
- * got a suitable statistics - let's update the current solution,
- * maybe use it as the best solution
- */
- current->nclauses += ncovered_clauses;
- current->nconditions += ncovered_conditions;
- current->nstats += 1;
- current->stats[step] = i;
-
- /*
- * We can never cover more clauses, or use more stats, than we
- * actually have at the beginning.
- */
- Assert(nclauses >= current->nclauses);
- Assert(nmvstats >= current->nstats);
- Assert(step < nmvstats);
-
- /* we can't get more conditions than clauses and conditions combined
- *
- * FIXME This assert does not work because we count the conditions
- * repeatedly (once for each statistics covering it).
- */
- /* Assert((nconditions + nclauses) >= current->nconditions); */
-
- if (*best == NULL)
- {
- *best = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
- (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
- (*best)->nstats = 0;
- (*best)->nclauses = 0;
- (*best)->nconditions = 0;
- }
-
- /*
- * See if it's better than the current 'best' solution, i.e. covers
- * more clauses, or covers the same number of clauses using fewer
- * statistics (per the optimality criteria in the comment above).
- */
- if ((current->nclauses > (*best)->nclauses) ||
- ((current->nclauses == (*best)->nclauses) &&
- ((current->nstats < (*best)->nstats))))
- {
- (*best)->nstats = current->nstats;
- (*best)->nclauses = current->nclauses;
- (*best)->nconditions = current->nconditions;
- memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
- }
-
- /*
- * The recursion only makes sense if we haven't covered all the
- * attributes (then adding stats is not really possible).
- */
- if ((step + 1) < nmvstats)
- choose_mv_statistics_exhaustive(root, step+1,
- nmvstats, mvstats, stats_attnums,
- nclauses, clauses, clauses_attnums,
- nconditions, conditions, conditions_attnums,
- cover_map, condition_map, ruled_out,
- current, best);
-
- /* reset the last step */
- current->nclauses -= ncovered_clauses;
- current->nconditions -= ncovered_conditions;
- current->nstats -= 1;
- current->stats[step] = 0;
-
- /* mark the statistics as usable again */
- ruled_out[i] = -1;
-
- Assert(current->nclauses >= 0);
- Assert(current->nstats >= 0);
- }
-
- /* reset all statistics as 'incompatible' in this step */
- for (i = 0; i < nmvstats; i++)
- if (ruled_out[i] == step)
- ruled_out[i] = -1;
-
-}
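
For illustration, here is a heavily simplified standalone sketch of the
same backtracking shape (not the patch code - it only tracks a clause
cover map, and ignores conditions, attnum checks and most pruning):

#include <stdio.h>

#define NSTATS   3
#define NCLAUSES 4

/* cover[i][c] == 1 means statistics i covers clause c (made-up data) */
static int cover[NSTATS][NCLAUSES] = {
    {1, 1, 0, 0},   /* stats 0 covers clauses 0,1 */
    {0, 1, 1, 0},   /* stats 1 covers clauses 1,2 */
    {0, 0, 1, 1},   /* stats 2 covers clauses 2,3 */
};

static int best_nclauses = -1;
static int best_nstats = 0;

static void
search(int used[NSTATS], int covered[NCLAUSES], int nstats)
{
    int     i, c, nclauses = 0;

    for (c = 0; c < NCLAUSES; c++)
        nclauses += covered[c];

    /* prefer more covered clauses, then fewer statistics */
    if (nclauses > best_nclauses ||
        (nclauses == best_nclauses && nstats < best_nstats))
    {
        best_nclauses = nclauses;
        best_nstats = nstats;
    }

    for (i = 0; i < NSTATS; i++)
    {
        int     added = 0;
        int     newly[NCLAUSES] = {0};

        if (used[i])
            continue;

        /* which clauses would this statistics cover for the first time? */
        for (c = 0; c < NCLAUSES; c++)
            if (cover[i][c] && !covered[c])
            {
                newly[c] = 1;
                added++;
            }

        if (added == 0)
            continue;       /* adds nothing new, prune this branch */

        /* apply, recurse, undo - the same shape as the function above */
        used[i] = 1;
        for (c = 0; c < NCLAUSES; c++)
            covered[c] |= newly[c];

        search(used, covered, nstats + 1);

        used[i] = 0;
        for (c = 0; c < NCLAUSES; c++)
            covered[c] &= ~newly[c];
    }
}

int
main(void)
{
    int     used[NSTATS] = {0};
    int     covered[NCLAUSES] = {0};

    search(used, covered, 0);
    printf("best: %d clauses using %d stats\n", best_nclauses, best_nstats);
    return 0;
}

With this cover map the search settles on stats 0 and 2, covering all
four clauses with two statistics.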
-
-/*
- * Greedy search for a multivariate solution - a sequence of statistics
- * covering the clauses. This chooses the "best" statistics at each step,
- * so the resulting solution may not be the best solution globally, but
- * this produces the solution in only N steps (where N is the number of
- * statistics), while the exhaustive approach may have to walk through
- * ~N! combinations (although some of those are terminated early).
- *
- * See the comments at choose_mv_statistics_exhaustive() as this does
- * the same thing (but in a different way).
- *
- * Don't call this directly, but through choose_mv_statistics().
- *
- * TODO There are probably other metrics we might use - e.g. using
- * number of columns (num_cond_columns / num_cov_columns), which
- * might work better with a mix of simple and complex clauses.
- *
- * TODO Also the choice at the very first step should be handled
- * in a special way, because there will be 0 conditions at that
- * moment, so there needs to be some other criterion - e.g. using
- * the simplest (or most complex?) clause might be a good idea.
- *
- * TODO We might also select multiple stats using different criteria,
- * and branch the search. This is however tricky, because if we
- * choose k statistics at each step, we get k^N branches to
- * walk through (with N steps). That's not really good with
- * large number of stats (yet better than exhaustive search).
- */
-static void
-choose_mv_statistics_greedy(PlannerInfo *root, int step,
- int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
- int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
- int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
- bool *cover_map, bool *condition_map, int *ruled_out,
- mv_solution_t *current, mv_solution_t **best)
-{
- int i, j;
- int best_stat = -1;
- double gain, max_gain = -1.0;
-
- /*
- * Bitmap tracking which clauses are already covered (by the previous
- * statistics) and may thus serve only as a condition in this step.
- */
- bool *covered_clauses = (bool*)palloc0(nclauses * sizeof(bool));
-
- /*
- * Number of clauses and columns covered by each statistics - this
- * includes both conditions and clauses covered by the statistics for
- * the first time. The number of columns may count some columns
- * repeatedly - if a column is shared by multiple clauses, it will
- * be counted once for each clause (covered by the statistics).
- * So with two clauses [(a=1 OR b=2),(a<2 OR c>1)] the column "a"
- * will be counted twice (if both clauses are covered).
- *
- * The values for ruled-out statistics (those that can't be
- * applied) are not computed, because that'd be pointless.
- */
- int *num_cov_clauses = (int*)palloc0(sizeof(int) * nmvstats);
- int *num_cov_columns = (int*)palloc0(sizeof(int) * nmvstats);
-
- /*
- * Same as above, but this only includes clauses that are already
- * covered by the previous stats (and the current one).
- */
- int *num_cond_clauses = (int*)palloc0(sizeof(int) * nmvstats);
- int *num_cond_columns = (int*)palloc0(sizeof(int) * nmvstats);
-
- /*
- * Number of attributes for each clause.
- *
- * TODO Might be computed in choose_mv_statistics() and then passed
- * here, but then the function would not have the same signature
- * as _exhaustive().
- */
- int *attnum_counts = (int*)palloc0(sizeof(int) * nclauses);
- int *attnum_cond_counts = (int*)palloc0(sizeof(int) * nconditions);
-
- CHECK_FOR_INTERRUPTS();
-
- Assert(best != NULL);
- Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
-
- /* compute attributes (columns) for each clause */
- for (i = 0; i < nclauses; i++)
- attnum_counts[i] = bms_num_members(clauses_attnums[i]);
-
- /* compute attributes (columns) for each condition */
- for (i = 0; i < nconditions; i++)
- attnum_cond_counts[i] = bms_num_members(conditions_attnums[i]);
-
- /* see which clauses are already covered at this point (by previous stats) */
- for (i = 0; i < step; i++)
- for (j = 0; j < nclauses; j++)
- covered_clauses[j] |= (cover_map[current->stats[i] * nclauses + j]);
-
- /* which remaining statistics covers most clauses / uses most conditions? */
- for (i = 0; i < nmvstats; i++)
- {
- Bitmapset *attnums_covered = NULL;
- Bitmapset *attnums_conditions = NULL;
-
- /* skip stats that are already ruled out (either used or inapplicable) */
- if (ruled_out[i] != -1)
- continue;
-
- /* count covered clauses and conditions (for the statistics) */
- for (j = 0; j < nclauses; j++)
- {
- if (cover_map[i * nclauses + j])
- {
- Bitmapset *attnums_new
- = bms_union(attnums_covered, clauses_attnums[j]);
-
- /* get rid of the old bitmap and keep the unified result */
- bms_free(attnums_covered);
- attnums_covered = attnums_new;
-
- num_cov_clauses[i] += 1;
- num_cov_columns[i] += attnum_counts[j];
-
- /* is the clause already covered (i.e. a condition)? */
- if (covered_clauses[j])
- {
- num_cond_clauses[i] += 1;
- num_cond_columns[i] += attnum_counts[j];
- attnums_new = bms_union(attnums_conditions,
- clauses_attnums[j]);
-
- bms_free(attnums_conditions);
- attnums_conditions = attnums_new;
- }
- }
- }
-
- /* if all covered clauses are covered by prev stats (thus conditions) */
- if (num_cov_clauses[i] == num_cond_clauses[i])
- ruled_out[i] = step;
-
- /* same if there are no new attributes */
- else if (bms_num_members(attnums_conditions) == bms_num_members(attnums_covered))
- ruled_out[i] = step;
-
- bms_free(attnums_covered);
- bms_free(attnums_conditions);
-
- /* if the statistics is inapplicable, try the next one */
- if (ruled_out[i] != -1)
- continue;
-
- /* now let's walk through conditions and count the covered */
- for (j = 0; j < nconditions; j++)
- {
- if (condition_map[i * nconditions + j])
- {
- num_cond_clauses[i] += 1;
- num_cond_columns[i] += attnum_cond_counts[j];
- }
- }
-
- /* otherwise see if this improves the interesting metrics */
- gain = num_cond_columns[i] / (double)num_cov_columns[i];
-
- if (gain > max_gain)
- {
- max_gain = gain;
- best_stat = i;
- }
- }
-
- /*
- * Have we found a suitable statistics? Add it to the solution and
- * try the next step.
- */
- if (best_stat != -1)
- {
- /* mark the statistics, so that we skip it in next steps */
- ruled_out[best_stat] = step;
-
- /* allocate current solution if necessary */
- if (current == NULL)
- {
- current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
- current->stats = (int*)palloc0(sizeof(int)*nmvstats);
- current->nstats = 0;
- current->nclauses = 0;
- current->nconditions = 0;
- }
-
- current->nclauses += num_cov_clauses[best_stat];
- current->nconditions += num_cond_clauses[best_stat];
- current->stats[step] = best_stat;
- current->nstats++;
-
- if (*best == NULL)
- {
- (*best) = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
- (*best)->nstats = current->nstats;
- (*best)->nclauses = current->nclauses;
- (*best)->nconditions = current->nconditions;
-
- (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
- memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
- }
- else
- {
- /* see if this is a better solution */
- double current_gain = (double)current->nconditions / current->nclauses;
- double best_gain = (double)(*best)->nconditions / (*best)->nclauses;
-
- if ((current_gain > best_gain) ||
- ((current_gain == best_gain) && (current->nstats < (*best)->nstats)))
- {
- (*best)->nstats = current->nstats;
- (*best)->nclauses = current->nclauses;
- (*best)->nconditions = current->nconditions;
- memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
- }
- }
-
- /*
- * The recursion only makes sense if we haven't covered all the
- * attributes (then adding stats is not really possible).
- */
- if ((step + 1) < nmvstats)
- choose_mv_statistics_greedy(root, step+1,
- nmvstats, mvstats, stats_attnums,
- nclauses, clauses, clauses_attnums,
- nconditions, conditions, conditions_attnums,
- cover_map, condition_map, ruled_out,
- current, best);
-
- /* reset the last step */
- current->nclauses -= num_cov_clauses[best_stat];
- current->nconditions -= num_cond_clauses[best_stat];
- current->nstats -= 1;
- current->stats[step] = 0;
-
- /* mark the statistics as usable again */
- ruled_out[best_stat] = -1;
- }
-
- /* reset all statistics eliminated in this step */
- for (i = 0; i < nmvstats; i++)
- if (ruled_out[i] == step)
- ruled_out[i] = -1;
-
- /* free everything allocated in this step */
- pfree(covered_clauses);
- pfree(attnum_counts);
- pfree(attnum_cond_counts);
- pfree(num_cov_clauses);
- pfree(num_cov_columns);
- pfree(num_cond_clauses);
- pfree(num_cond_columns);
-}
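
Again for illustration, a standalone sketch of just the greedy step
(the counts are made-up numbers, not the patch code):

#include <stdio.h>

/*
 * Standalone sketch of the greedy choice above: given per-statistics
 * counts of covered columns and of columns already available as
 * conditions, pick the statistics with the highest conditions/covered
 * ratio.
 */
int
main(void)
{
    int     num_cov_columns[]  = {4, 3, 5};
    int     num_cond_columns[] = {1, 2, 2};
    int     i, best = -1;
    double  gain, max_gain = -1.0;

    for (i = 0; i < 3; i++)
    {
        /* same metric as the greedy search: reuse as many conditions
         * (already-estimated columns) as possible per covered column */
        gain = num_cond_columns[i] / (double) num_cov_columns[i];
        if (gain > max_gain)
        {
            max_gain = gain;
            best = i;
        }
    }

    printf("chose stats %d (gain %.2f)\n", best, max_gain); /* stats 1, 0.67 */
    return 0;
}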
-
-/*
- * Chooses the combination of statistics, optimal for estimation of
- * a particular clause list.
- *
- * This only handles a 'preparation' shared by the exhaustive and greedy
- * implementations (see the previous methods), mostly trying to reduce
- * the size of the problem (eliminate clauses/statistics that can't be
- * really used in the solution).
- *
- * It also precomputes bitmaps for attributes covered by clauses and
- * statistics, so that we don't need to do that over and over in the
- * actual optimizations (as it's both CPU and memory intensive).
- *
- * TODO This will probably have to consider compatibility of clauses,
- * because 'dependencies' will probably work only with equality
- * clauses.
- *
- * TODO Another way to make the optimization problems smaller might
- * be splitting the statistics into several disjoint subsets, i.e.
- * if we can split the graph of statistics (after the elimination)
- * into multiple components (so that stats in different components
- * share no attributes), we can do the optimization for each
- * component separately.
- *
- * TODO If we could compute what is a "perfect solution" maybe we could
- * terminate the search after reaching ~90% of it? Say, if we knew
- * that we can cover 10 clauses and reuse 8 dependencies, maybe
- * covering 9 clauses and 7 dependencies would be OK?
- */
-static List*
-choose_mv_statistics(PlannerInfo *root, List *stats,
- List *clauses, List *conditions,
- Oid varRelid, SpecialJoinInfo *sjinfo)
-{
- int i;
- mv_solution_t *best = NULL;
- List *result = NIL;
-
- int nmvstats;
- MVStatisticInfo *mvstats;
-
- /* we only work with MCV lists and histograms here */
- int type = (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
-
- bool *clause_cover_map = NULL,
- *condition_cover_map = NULL;
- int *ruled_out = NULL;
-
- /* build bitmapsets for all stats and clauses */
- Bitmapset **stats_attnums;
- Bitmapset **clauses_attnums;
- Bitmapset **conditions_attnums;
-
- int nclauses, nconditions;
- Node ** clauses_array;
- Node ** conditions_array;
-
- /* copy lists, so that we can free them during elimination easily */
- clauses = list_copy(clauses);
- conditions = list_copy(conditions);
- stats = list_copy(stats);
-
- /*
- * Reduce the optimization problem size as much as possible.
- *
- * Eliminate clauses and conditions not covered by any statistics,
- * or statistics not matching at least two attributes (one of them
- * has to be in a regular clause).
- *
- * It's possible that removing a statistics in one iteration
- * eliminates a clause in the next one, so we repeat this until an
- * iteration eliminates no clauses/stats.
- *
- * This can only happen after eliminating a statistics - clauses are
- * eliminated first, so statistics always reflect that.
- */
- while (true)
- {
- List *tmp;
-
- Bitmapset *compatible_attnums = NULL;
- Bitmapset *condition_attnums = NULL;
- Bitmapset *all_attnums = NULL;
-
- /*
- * Clauses
- *
- * Walk through clauses and keep only those covered by at least
- * one of the statistics we still have. We'll also keep info
- * about attnums in clauses (without conditions) so that we can
- * ignore stats covering just conditions (which is pointless).
- */
- tmp = filter_clauses(root, varRelid, sjinfo, type,
- stats, clauses, &compatible_attnums);
-
- /* discard the original list */
- list_free(clauses);
- clauses = tmp;
-
- /*
- * Conditions
- *
- * Walk through clauses and keep only those covered by at least
- * one of the statistics we still have. Also, collect bitmap of
- * attributes so that we can make sure we add at least one new
- * attribute (by comparing with clauses).
- */
- if (conditions != NIL)
- {
- tmp = filter_clauses(root, varRelid, sjinfo, type,
- stats, conditions, &condition_attnums);
-
- /* discard the original list */
- list_free(conditions);
- conditions = tmp;
- }
-
- /* get a union of attnums (from conditions and new clauses) */
- all_attnums = bms_union(compatible_attnums, condition_attnums);
-
- /*
- * Statistics
- *
- * Walk through statistics and only keep those covering at least
- * one new attribute (excluding conditions) and at least two
- * attributes in both clauses and conditions.
- */
- tmp = filter_stats(stats, compatible_attnums, all_attnums);
-
- /* if we've not eliminated anything, terminate */
- if (list_length(stats) == list_length(tmp))
- break;
-
- /* work only with filtered statistics from now */
- list_free(stats);
- stats = tmp;
- }
-
- /* only do the optimization if we have clauses/statistics */
- if ((list_length(stats) == 0) || (list_length(clauses) == 0))
- return NULL;
-
- /* remove redundant stats (stats covered by another stats) */
- stats = filter_redundant_stats(stats, clauses, conditions);
-
- /*
- * TODO We should sort the stats to make the order deterministic,
- * otherwise we may get different estimates on different
- * executions - if there are multiple "equally good" solutions,
- * we'll keep the first solution we see.
- *
- * Sorting by OID probably is not the right solution though,
- * because we'd like it to be somehow reproducible,
- * irrespective of the order of ADD STATISTICS commands.
- * So maybe statkeys?
- */
- mvstats = make_stats_array(stats, &nmvstats);
- stats_attnums = make_stats_attnums(mvstats, nmvstats);
-
- /* collect clauses and bitmap of attnums */
- clauses_array = make_clauses_array(clauses, &nclauses);
- clauses_attnums = make_clauses_attnums(root, varRelid, sjinfo, type,
- clauses_array, nclauses);
-
- /* collect conditions and bitmap of attnums */
- conditions_array = make_clauses_array(conditions, &nconditions);
- conditions_attnums = make_clauses_attnums(root, varRelid, sjinfo, type,
- conditions_array, nconditions);
-
- /*
- * Build bitmaps with info about which clauses/conditions are
- * covered by each statistics (so that we don't need to call the
- * bms_is_subset over and over again).
- */
- clause_cover_map = make_cover_map(stats_attnums, nmvstats,
- clauses_attnums, nclauses);
-
- condition_cover_map = make_cover_map(stats_attnums, nmvstats,
- conditions_attnums, nconditions);
-
- ruled_out = (int*)palloc0(nmvstats * sizeof(int));
-
- /* no stats are ruled out by default */
- for (i = 0; i < nmvstats; i++)
- ruled_out[i] = -1;
-
- /* do the optimization itself */
- if (mvstat_search_type == MVSTAT_SEARCH_EXHAUSTIVE)
- choose_mv_statistics_exhaustive(root, 0,
- nmvstats, mvstats, stats_attnums,
- nclauses, clauses_array, clauses_attnums,
- nconditions, conditions_array, conditions_attnums,
- clause_cover_map, condition_cover_map,
- ruled_out, NULL, &best);
- else
- choose_mv_statistics_greedy(root, 0,
- nmvstats, mvstats, stats_attnums,
- nclauses, clauses_array, clauses_attnums,
- nconditions, conditions_array, conditions_attnums,
- clause_cover_map, condition_cover_map,
- ruled_out, NULL, &best);
-
- /* create a list of statistics from the array */
- if (best != NULL)
- {
- for (i = 0; i < best->nstats; i++)
- {
- MVStatisticInfo *info = makeNode(MVStatisticInfo);
- memcpy(info, &mvstats[best->stats[i]], sizeof(MVStatisticInfo));
- result = lappend(result, info);
- }
- pfree(best);
- }
-
- /* cleanup (maybe leave it up to the memory context?) */
- for (i = 0; i < nmvstats; i++)
- bms_free(stats_attnums[i]);
-
- for (i = 0; i < nclauses; i++)
- bms_free(clauses_attnums[i]);
-
- for (i = 0; i < nconditions; i++)
- bms_free(conditions_attnums[i]);
-
- pfree(stats_attnums);
- pfree(clauses_attnums);
- pfree(conditions_attnums);
-
- pfree(clauses_array);
- pfree(conditions_array);
- pfree(clause_cover_map);
- pfree(condition_cover_map);
- pfree(ruled_out);
- pfree(mvstats);
-
- list_free(clauses);
- list_free(conditions);
- list_free(stats);
-
- return result;
-}
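
Here is a standalone sketch (not the patch code - attnums are bits in
an unsigned int, and it uses GCC's __builtin_popcount) of the reduction
loop above, showing how dropping a statistics in one iteration can
cascade into dropping a clause in the next:

#include <stdio.h>

#define NBITS(x) __builtin_popcount(x)

int
main(void)
{
    unsigned    clauses[] = {0x1, 0x2, 0x8};    /* clauses on columns a, b, d */
    unsigned    stats[]   = {0x3, 0xc};         /* stats on {a,b} and {c,d} */
    int         clause_ok[] = {1, 1, 1};
    int         stat_ok[]   = {1, 1};
    int         i, j, changed = 1;

    while (changed)
    {
        changed = 0;

        /* drop clauses not covered by any remaining statistics */
        for (i = 0; i < 3; i++)
        {
            int     covered = 0;

            if (!clause_ok[i])
                continue;
            for (j = 0; j < 2; j++)
                if (stat_ok[j] && (clauses[i] & ~stats[j]) == 0)
                    covered = 1;
            if (!covered)
            {
                clause_ok[i] = 0;
                changed = 1;
            }
        }

        /* drop statistics covering fewer than two clause attnums */
        for (j = 0; j < 2; j++)
        {
            unsigned    attnums = 0;

            if (!stat_ok[j])
                continue;
            for (i = 0; i < 3; i++)
                if (clause_ok[i])
                    attnums |= (clauses[i] & stats[j]);
            if (NBITS(attnums) < 2)
            {
                stat_ok[j] = 0;
                changed = 1;
            }
        }
    }

    for (i = 0; i < 3; i++)
        printf("clause %d: %s\n", i, clause_ok[i] ? "kept" : "eliminated");
    for (j = 0; j < 2; j++)
        printf("stats %d: %s\n", j, stat_ok[j] ? "kept" : "eliminated");
    return 0;
}

The first iteration drops the {c,d} statistics (only one clause attnum),
which leaves the clause on d uncovered, so the second iteration drops
that clause too - exactly the cascade the loop comment describes.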
-
-
-/*
- * This splits the clauses list into two parts - one containing clauses
- * that will be evaluated using the chosen statistics, and the remaining
- * clauses (either not mv-compatible, or not covered by the chosen stats).
- */
-static List *
-clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
- List *clauses, Oid varRelid, List **mvclauses,
- MVStatisticInfo *mvstats, int types)
-{
- int i;
- ListCell *l;
- List *non_mvclauses = NIL;
-
- /* FIXME is there a better way to get info on int2vector? */
- int2vector * attrs = mvstats->stakeys;
- int numattrs = mvstats->stakeys->dim1;
-
- Bitmapset *mvattnums = NULL;
-
- /* build bitmap of attributes covered by the stats, so we can
- * do bms_is_subset later */
- for (i = 0; i < numattrs; i++)
- mvattnums = bms_add_member(mvattnums, attrs->values[i]);
-
- /* erase the list of mv-compatible clauses */
- *mvclauses = NIL;
-
- foreach (l, clauses)
- {
- bool match = false; /* by default not mv-compatible */
- Bitmapset *attnums = NULL;
- Node *clause = (Node *) lfirst(l);
-
- if (clause_is_mv_compatible(root, clause, varRelid, NULL,
- &attnums, sjinfo, types))
- {
- /* are all the attributes part of the selected stats? */
- if (bms_is_subset(attnums, mvattnums))
- match = true;
- }
-
- /*
- * The clause matches the selected stats, so put it to the list
- * of mv-compatible clauses. Otherwise, keep it in the list of
- * 'regular' clauses (that may be selected later).
- */
- if (match)
- *mvclauses = lappend(*mvclauses, clause);
- else
- non_mvclauses = lappend(non_mvclauses, clause);
- }
-
- /*
- * Return the remaining clauses, to be estimated later using the
- * regular per-column statistics.
- */
- return non_mvclauses;
-
-}
-
-/*
- * Determines whether the clause is compatible with multivariate stats,
- * and if it is, returns some additional information - varno (index
- * into simple_rte_array) and a bitmap of attributes. This is then
- * used to fetch related multivariate statistics.
- *
- * At this moment we only support basic conditions of the form
- *
- * variable OP constant
- *
- * where OP is one of [=,<,<=,>=,>] (which is however determined by
- * looking at the associated function for estimating selectivity, just
- * like with the single-dimensional case).
- *
- * TODO Support 'OR clauses' - shouldn't be all that difficult to
- * evaluate them using multivariate stats.
- */
-static bool
-clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
- Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
- int types)
-{
- Relids clause_relids;
- Relids left_relids;
- Relids right_relids;
-
- if (IsA(clause, RestrictInfo))
- {
- RestrictInfo *rinfo = (RestrictInfo *) clause;
-
- /* Pseudoconstants are not really interesting here. */
- if (rinfo->pseudoconstant)
- return false;
-
- /* get the actual clause from the RestrictInfo (it's not an OR clause) */
- clause = (Node*)rinfo->clause;
-
- /* we don't support join conditions at this moment */
- if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo))
- return false;
-
- clause_relids = rinfo->clause_relids;
- left_relids = rinfo->left_relids;
- right_relids = rinfo->right_relids;
- }
- else if (is_opclause(clause) && list_length(((OpExpr *) clause)->args) == 2)
- {
- left_relids = pull_varnos(get_leftop((Expr*)clause));
- right_relids = pull_varnos(get_rightop((Expr*)clause));
-
- clause_relids = bms_union(left_relids,
- right_relids);
- }
- else
- {
- /* Not a binary opclause, so mark left/right relid sets as empty */
- left_relids = NULL;
- right_relids = NULL;
- /* and get the total relid set the hard way */
- clause_relids = pull_varnos((Node *) clause);
- }
-
- /*
- * Only simple opclauses and IS NULL tests are compatible with
- * multivariate stats at this point.
- */
- if ((is_opclause(clause))
- && (list_length(((OpExpr *) clause)->args) == 2))
- {
- OpExpr *expr = (OpExpr *) clause;
- bool varonleft = true;
- bool ok;
-
- /* is it 'variable op constant' ? */
- ok = (bms_membership(clause_relids) == BMS_SINGLETON) &&
- (is_pseudo_constant_clause_relids(lsecond(expr->args),
- right_relids) ||
- (varonleft = false,
- is_pseudo_constant_clause_relids(linitial(expr->args),
- left_relids)));
-
- if (ok)
- {
- Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
-
- /*
- * Simple variables only - otherwise the planner_rt_fetch seems to fail
- * (return NULL).
- *
- * TODO Maybe using examine_variable() would fix that?
- */
- if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
- return false;
-
- /*
- * Only consider this variable if (varRelid == 0) or when the varno
- * matches varRelid (see explanation at clause_selectivity).
- *
- * FIXME I suspect this may not be really necessary. The (varRelid == 0)
- * part seems to be enforced by treat_as_join_clause().
- */
- if (! ((varRelid == 0) || (varRelid == var->varno)))
- return false;
-
- /* Also skip special varno values, and system attributes ... */
- if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
- return false;
-
- /* Remember the relid of the base relation (we need to pass it out) */
- if (relid != NULL)
- *relid = var->varno;
-
- /*
- * If it's not a "<" or ">" or "=" operator, just ignore the
- * clause. Otherwise note the relid and attnum for the variable.
- * This uses the function for estimating selectivity, not the
- * operator directly (a bit awkward, but well ...).
- */
- switch (get_oprrest(expr->opno))
- {
- case F_SCALARLTSEL:
- case F_SCALARGTSEL:
- /* not compatible with functional dependencies */
- if (types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
- {
- *attnums = bms_add_member(*attnums, var->varattno);
- return (types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
- }
- return false;
-
- case F_EQSEL:
- *attnums = bms_add_member(*attnums, var->varattno);
- return true;
- }
- }
- }
- else if (IsA(clause, NullTest)
- && IsA(((NullTest*)clause)->arg, Var))
- {
- Var * var = (Var*)((NullTest*)clause)->arg;
-
- /*
- * Simple variables only - otherwise the planner_rt_fetch seems to fail
- * (return NULL).
- *
- * TODO Maybe using examine_variable() would fix that?
- */
- if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
- return false;
-
- /*
- * Only consider this variable if (varRelid == 0) or when the varno
- * matches varRelid (see explanation at clause_selectivity).
- *
- * FIXME I suspect this may not be really necessary. The (varRelid == 0)
- * part seems to be enforced by treat_as_join_clause().
- */
- if (! ((varRelid == 0) || (varRelid == var->varno)))
- return false;
-
- /* Also skip special varno values, and system attributes ... */
- if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
- return false;
-
- /* Remember the relid of the base relation (we need to pass it out) */
- if (relid != NULL)
- *relid = var->varno;
-
- *attnums = bms_add_member(*attnums, var->varattno);
-
- return true;
- }
- else if (or_clause(clause) || and_clause(clause))
- {
- /*
- * AND/OR-clauses are supported if all sub-clauses are supported
- *
- * TODO We might support mixed case, where some of the clauses
- * are supported and some are not, and treat all supported
- * subclauses as a single clause, compute its selectivity
- * using mv stats, and compute the total selectivity using
- * the current algorithm.
- *
- * TODO For RestrictInfo above an OR-clause, we might use the
- * orclause with nested RestrictInfo - we won't have to
- * call pull_varnos() for each clause, saving time.
- */
- Bitmapset *tmp = NULL;
- ListCell *l;
- foreach (l, ((BoolExpr*)clause)->args)
- {
- if (! clause_is_mv_compatible(root, (Node*)lfirst(l),
- varRelid, relid, &tmp, sjinfo, types))
- return false;
- }
-
- /* add the attnums from the AND/OR-clause to the set of attnums */
- *attnums = bms_join(*attnums, tmp);
-
- return true;
- }
-
- return false;
-}
-
-
-static Bitmapset *
-clause_mv_get_attnums(PlannerInfo *root, Node *clause)
-{
- Bitmapset * attnums = NULL;
-
- /* Extract clause from restrict info, if needed. */
- if (IsA(clause, RestrictInfo))
- clause = (Node*)((RestrictInfo*)clause)->clause;
-
- /*
- * Only simple opclauses and IS NULL tests are compatible with
- * multivariate stats at this point.
- */
- if ((is_opclause(clause))
- && (list_length(((OpExpr *) clause)->args) == 2))
- {
- OpExpr *expr = (OpExpr *) clause;
-
- if (IsA(linitial(expr->args), Var))
- attnums = bms_add_member(attnums,
- ((Var*)linitial(expr->args))->varattno);
- else
- attnums = bms_add_member(attnums,
- ((Var*)lsecond(expr->args))->varattno);
- }
- else if (IsA(clause, NullTest)
- && IsA(((NullTest*)clause)->arg, Var))
- {
- attnums = bms_add_member(attnums,
- ((Var*)((NullTest*)clause)->arg)->varattno);
- }
- else if (or_clause(clause) || and_clause(clause))
- {
- ListCell *l;
- foreach (l, ((BoolExpr*)clause)->args)
- {
- attnums = bms_join(attnums,
- clause_mv_get_attnums(root, (Node*)lfirst(l)));
- }
- }
-
- return attnums;
-}
-
-/*
- * Performs reduction of clauses using functional dependencies, i.e.
- * removes clauses that are considered redundant. It simply walks
- * through dependencies, and checks whether the dependency 'matches'
- * the clauses, i.e. if there's a clause matching the condition. If yes,
- * all clauses matching the implied part of the dependency are removed
- * from the list.
- *
- * This simply looks at attnums referenced by the clauses, not at the
- * type of the operator (equality, inequality, ...). This may not be the
- * right way to do it - it certainly works best for equalities, which is
- * naturally consistent with functional dependencies (implications).
- * It's not clear that other operators are handled sensibly - for
- * example for inequalities, like
- *
- * WHERE (A >= 10) AND (B <= 20)
- *
- * and a trivial case where [A == B], resulting in a symmetric pair of
- * rules [A => B], [B => A], it's rather clear we can't remove either of
- * those clauses.
- *
- * That only highlights that functional dependencies are most suitable
- * for label-like data, where using non-equality operators is very rare.
- * Using the common city/zipcode example, clauses like
- *
- * (zipcode <= 12345)
- *
- * or
- *
- * (cityname >= 'Washington')
- *
- * are rare. So restricting the reduction to equality should not harm
- * the usefulness / applicability.
- *
- * The other assumption is that the clauses are 'compatible'. For
- * example, with a mismatched zip code and city name, this is unable
- * to identify the discrepancy and still eliminates one of the clauses. The
- * usual approach (multiplying both selectivities) thus produces a more
- * accurate estimate, although mostly by luck - the multiplication
- * comes from assumption of statistical independence of the two
- * conditions (which is not valid in this case), but moves the
- * estimate in the right direction (towards 0%).
- *
- * This might be somewhat improved by cross-checking the selectivities
- * against MCV and/or histogram.
- *
- * The implementation needs to be careful about cyclic rules, i.e. rules
- * like [A => B] and [B => A] at the same time. This must not reduce
- * clauses on both attributes at the same time.
- *
- * Technically we might consider selectivities here too, somehow. E.g.
- * when (A => B) and (B => A), we might use the clauses with minimum
- * selectivity.
- *
- * TODO Consider restricting the reduction to equality clauses. Or maybe
- * use equality classes somehow?
- *
- * TODO Merge this docs to dependencies.c, as it's saying mostly the
- * same things as the comments there.
- *
- * TODO Currently this is applied only to the top-level clauses, but
- * maybe we could apply it to lists at subtrees too, e.g. to the
- * two AND-clauses in
- *
- * (x=1 AND y=2) OR (z=3 AND q=10)
- *
- */
-static List *
-clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
- Oid varRelid, List *stats,
- SpecialJoinInfo *sjinfo)
-{
- List *reduced_clauses = NIL;
- Index relid;
-
- /*
- * matrix of (natts x natts), 1 means x=>y
- *
- * This serves two purposes - first, it merges dependencies from all
- * the statistics, second it makes generating all the transitive
- * dependencies easier.
- *
- * We need to build this only for attributes from the dependencies,
- * not for all attributes in the table.
- *
- * We can't do that only for attributes from the clauses, because we
- * want to build transitive dependencies (including those going
- * through attributes not listed in the stats).
- *
- * This only works for A=>B dependencies, not sure how to do that
- * for complex dependencies.
- */
- bool *deps_matrix;
- int deps_natts; /* size of the matrix */
-
- /* mapping attnum <=> matrix index */
- int *deps_idx_to_attnum;
- int *deps_attnum_to_idx;
-
- /* attnums in dependencies and clauses (and intersection) */
- List *deps_clauses = NIL;
- Bitmapset *deps_attnums = NULL;
- Bitmapset *clause_attnums = NULL;
- Bitmapset *intersect_attnums = NULL;
-
- /*
- * Is there at least one statistics with functional dependencies?
- * If not, return the original clauses right away.
- *
- * XXX Isn't this pointless, thanks to exactly the same check in
- * clauselist_selectivity()? Can we trigger the condition here?
- */
- if (! has_stats(stats, MV_CLAUSE_TYPE_FDEP))
- return clauses;
-
- /*
- * Build the dependency matrix, i.e. attribute adjacency matrix,
- * where 1 means (a=>b). Once we have the adjacency matrix, we'll
- * multiply it by itself, to get transitive dependencies.
- *
- * Note: This is pretty much transitive closure from graph theory.
- *
- * First, let's see what attributes are covered by functional
- * dependencies (sides of the adjacency matrix), and also the maximum
- * attribute (the size of the mapping to simple integer indexes).
- */
- deps_attnums = fdeps_collect_attnums(stats);
-
- /*
- * Walk through the clauses - clauses that are (one of)
- *
- * (a) not mv-compatible
- * (b) are using more than a single attnum
- * (c) using an attnum not covered by functional dependencies
- *
- * may be copied directly to the result. The interesting clauses are
- * kept in 'deps_clauses' and will be processed later.
- */
- clause_attnums = fdeps_filter_clauses(root, clauses, deps_attnums,
- &reduced_clauses, &deps_clauses,
- varRelid, &relid, sjinfo);
-
- /*
- * we need at least two clauses referencing two different
- * attributes to do the reduction
- */
- if ((list_length(deps_clauses) < 2) || (bms_num_members(clause_attnums) < 2))
- {
- bms_free(clause_attnums);
- list_free(reduced_clauses);
- list_free(deps_clauses);
-
- return clauses;
- }
-
-
- /*
- * We need at least two matching attributes in the clauses and
- * dependencies, otherwise we can't really reduce anything.
- */
- intersect_attnums = bms_intersect(clause_attnums, deps_attnums);
- if (bms_num_members(intersect_attnums) < 2)
- {
- bms_free(clause_attnums);
- bms_free(deps_attnums);
- bms_free(intersect_attnums);
-
- list_free(deps_clauses);
- list_free(reduced_clauses);
-
- return clauses;
- }
-
- /*
- * Build mapping between matrix indexes and attnums, and then the
- * adjacency matrix itself.
- */
- deps_idx_to_attnum = make_idx_to_attnum_mapping(deps_attnums);
- deps_attnum_to_idx = make_attnum_to_idx_mapping(deps_attnums);
-
- /* build the adjacency matrix */
- deps_matrix = build_adjacency_matrix(stats, deps_attnums,
- deps_idx_to_attnum,
- deps_attnum_to_idx);
-
- deps_natts = bms_num_members(deps_attnums);
-
- /*
- * Multiply the matrix N-times (N = size of the matrix), so that we
- * get all the transitive dependencies. That makes the next step
- * much easier and faster.
- *
- * This is essentially an adjacency matrix from graph theory, and
- * by multiplying it we get transitive edges. We don't really care
- * about the exact number (number of paths between vertices) though,
- * so we can do the multiplication in-place (we don't care whether
- * we found the dependency in this round or in the previous one).
- *
- * Track how many new dependencies were added, and stop when 0, but
- * we can't multiply more than N-times (longest path in the graph).
- */
- multiply_adjacency_matrix(deps_matrix, deps_natts);
-
- /*
- * Walk through the clauses, and see which other clauses we may
- * reduce. The matrix contains all transitive dependencies, which
- * makes this very fast.
- *
- * We have to be careful not to reduce the clause using itself, or
- * reducing all clauses forming a cycle (so we have to skip already
- * eliminated clauses).
- *
- * I'm not sure whether this guarantees finding the best solution,
- * i.e. reducing the most clauses, but it probably does (thanks to
- * having all the transitive dependencies).
- */
- deps_clauses = fdeps_reduce_clauses(deps_clauses,
- deps_attnums, deps_matrix,
- deps_idx_to_attnum,
- deps_attnum_to_idx, relid);
-
- /* join the two lists of clauses */
- reduced_clauses = list_union(reduced_clauses, deps_clauses);
-
- pfree(deps_matrix);
- pfree(deps_idx_to_attnum);
- pfree(deps_attnum_to_idx);
-
- bms_free(deps_attnums);
- bms_free(clause_attnums);
- bms_free(intersect_attnums);
-
- return reduced_clauses;
-}
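
A standalone sketch of the reduction itself (hypothetical zipcode/city
attributes, not the patch code) - note how skipping already-reduced
clauses prevents a cyclic pair of dependencies from removing both
clauses:

#include <stdio.h>

#define NATTS 2

int
main(void)
{
    /* matrix[i][j] == 1 means attribute i determines attribute j */
    int     matrix[NATTS][NATTS] = {
        {0, 1},     /* zipcode => city */
        {0, 0},
    };
    int     clause_attnum[] = {0, 1};   /* WHERE zipcode = ... AND city = ... */
    int     reduced[] = {0, 0};
    int     i, j;

    for (i = 0; i < 2; i++)
        for (j = 0; j < 2; j++)
        {
            if (i == j || reduced[j])
                continue;       /* never reduce using an eliminated clause */
            /* drop clause i if some remaining clause j implies it */
            if (matrix[clause_attnum[j]][clause_attnum[i]])
                reduced[i] = 1;
        }

    for (i = 0; i < 2; i++)
        printf("clause on attribute %d: %s\n", clause_attnum[i],
               reduced[i] ? "reduced" : "kept");
    return 0;
}

Here the clause on "city" is dropped as implied by the clause on
"zipcode"; with both [0 => 1] and [1 => 0] set, only one of the two
clauses would be removed.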
-
-static bool
-has_stats(List *stats, int type)
-{
- ListCell *s;
-
- foreach (s, stats)
- {
- MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
-
- if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
- return true;
-
- if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
- return true;
-
- if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
- return true;
- }
-
- return false;
-}
-
-/*
- * Determine the relid (either from varRelid or from the clauses) and
- * then look up stats using that relid.
- */
-static List *
-find_stats(PlannerInfo *root, List *clauses, Oid varRelid, Index *relid)
-{
- /* unknown relid by default */
- *relid = InvalidOid;
-
- /*
- * First we need to find the relid (index into simple_rel_array).
- * If varRelid is not 0, we already have it, otherwise we have to
- * look it up from the clauses.
- */
- if (varRelid != 0)
- *relid = varRelid;
- else
- {
- Relids relids = pull_varnos((Node*)clauses);
-
- /*
- * We only expect 0 or 1 members in the bitmapset. If there are
- * no vars, we'll get an empty bitmapset, otherwise we'll get the
- * relid as the single member.
- *
- * FIXME For some reason we can get 2 relids here (e.g. \d in
- * psql does that).
- */
- if (bms_num_members(relids) == 1)
- *relid = bms_singleton_member(relids);
-
- bms_free(relids);
- }
-
- /*
- * if we found the relid, we can get the stats from simple_rel_array
- *
- * This only gets stats that are already built, because that's how
- * we load it into RelOptInfo (see get_relation_info), but we don't
- * detoast the whole stats yet. That'll be done later, after we
- * decide which stats to use.
- */
- if (*relid != InvalidOid)
- return root->simple_rel_array[*relid]->mvstatlist;
-
- return NIL;
-}
-
-static Bitmapset*
-fdeps_collect_attnums(List *stats)
-{
- ListCell *lc;
- Bitmapset *attnums = NULL;
-
- foreach (lc, stats)
- {
- int j;
- MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
-
- int2vector *stakeys = info->stakeys;
-
- /* skip stats without functional dependencies built */
- if (! info->deps_built)
- continue;
-
- for (j = 0; j < stakeys->dim1; j++)
- attnums = bms_add_member(attnums, stakeys->values[j]);
- }
-
- return attnums;
-}
-
-
-static int*
-make_idx_to_attnum_mapping(Bitmapset *attnums)
-{
- int attidx = 0;
- int attnum = -1;
-
- int *mapping = (int*)palloc0(bms_num_members(attnums) * sizeof(int));
-
- while ((attnum = bms_next_member(attnums, attnum)) >= 0)
- mapping[attidx++] = attnum;
-
- Assert(attidx == bms_num_members(attnums));
-
- return mapping;
-}
-
-static int*
-make_attnum_to_idx_mapping(Bitmapset *attnums)
-{
- int attidx = 0;
- int attnum = -1;
- int maxattnum = -1;
- int *mapping;
-
- while ((attnum = bms_next_member(attnums, attnum)) >= 0)
- maxattnum = attnum;
-
- mapping = (int*)palloc0((maxattnum+1) * sizeof(int));
-
- attnum = -1;
- while ((attnum = bms_next_member(attnums, attnum)) >= 0)
- mapping[attnum] = attidx++;
-
- Assert(attidx == bms_num_members(attnums));
-
- return mapping;
-}
-
-static bool*
-build_adjacency_matrix(List *stats, Bitmapset *attnums,
- int *idx_to_attnum, int *attnum_to_idx)
-{
- ListCell *lc;
- int natts = bms_num_members(attnums);
- bool *matrix = (bool*)palloc0(natts * natts * sizeof(bool));
-
- foreach (lc, stats)
- {
- int j;
- MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
- MVDependencies dependencies = NULL;
-
- /* skip stats without functional dependencies built */
- if (! stat->deps_built)
- continue;
-
- /* fetch and deserialize dependencies */
- dependencies = load_mv_dependencies(stat->mvoid);
- if (dependencies == NULL)
- {
- elog(WARNING, "failed to deserialize func deps %d", stat->mvoid);
- continue;
- }
-
- /* set matrix[a,b] to 'true' if 'a=>b' */
- for (j = 0; j < dependencies->ndeps; j++)
- {
- int aidx = attnum_to_idx[dependencies->deps[j]->a];
- int bidx = attnum_to_idx[dependencies->deps[j]->b];
-
- /* a => b */
- matrix[aidx * natts + bidx] = true;
- }
- }
-
- return matrix;
-}
-
-static void
-multiply_adjacency_matrix(bool *matrix, int natts)
-{
- int i;
+ else if (IsA(clause, CurrentOfExpr))
+ {
+ /* CURRENT OF selects at most one row of its table */
+ CurrentOfExpr *cexpr = (CurrentOfExpr *) clause;
+ RelOptInfo *crel = find_base_rel(root, cexpr->cvarno);
- for (i = 0; i < natts; i++)
+ if (crel->tuples > 0)
+ s1 = 1.0 / crel->tuples;
+ }
+ else if (IsA(clause, RelabelType))
+ {
+ /* Not sure this case is needed, but it can't hurt */
+ s1 = clause_selectivity(root,
+ (Node *) ((RelabelType *) clause)->arg,
+ varRelid,
+ jointype,
+ sjinfo);
+ }
+ else if (IsA(clause, CoerceToDomain))
{
- int k, l, m;
- int nchanges = 0;
+ /* Not sure this case is needed, but it can't hurt */
+ s1 = clause_selectivity(root,
+ (Node *) ((CoerceToDomain *) clause)->arg,
+ varRelid,
+ jointype,
+ sjinfo);
+ }
- /* k => l */
- for (k = 0; k < natts; k++)
- {
- for (l = 0; l < natts; l++)
- {
- /* we already have this dependency */
- if (matrix[k * natts + l])
- continue;
+ /* Cache the result if possible */
+ if (cacheable)
+ {
+ if (jointype == JOIN_INNER)
+ rinfo->norm_selec = s1;
+ else
+ rinfo->outer_selec = s1;
+ }
- /* we don't really care about the exact value, just 0/1 */
- for (m = 0; m < natts; m++)
- {
- if (matrix[k * natts + m] * matrix[m * natts + l])
- {
- matrix[k * natts + l] = true;
- nchanges += 1;
- break;
- }
- }
- }
- }
+#ifdef SELECTIVITY_DEBUG
+ elog(DEBUG4, "clause_selectivity: s1 %f", s1);
+#endif /* SELECTIVITY_DEBUG */
- /* no transitive dependency added here, so terminate */
- if (nchanges == 0)
- break;
- }
+ return s1;
}
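
For reference, the transitive-closure idea from the removed
multiply_adjacency_matrix() in standalone form (not the patch code):
repeated boolean 'multiplication' of the adjacency matrix until no new
edge appears. With (a => b) and (b => c) it derives (a => c):

#include <stdio.h>

#define NATTS 3

int
main(void)
{
    /* matrix[i][j] == 1 means attribute i determines attribute j */
    int     matrix[NATTS][NATTS] = {
        {0, 1, 0},      /* a => b */
        {0, 0, 1},      /* b => c */
        {0, 0, 0},
    };
    int     i, j, k, nchanges = 1;

    while (nchanges)
    {
        nchanges = 0;
        for (i = 0; i < NATTS; i++)
            for (j = 0; j < NATTS; j++)
            {
                if (matrix[i][j])
                    continue;   /* we already have this dependency */
                for (k = 0; k < NATTS; k++)
                    if (matrix[i][k] && matrix[k][j])
                    {
                        matrix[i][j] = 1;   /* new transitive dependency */
                        nchanges++;
                        break;
                    }
            }
    }

    for (i = 0; i < NATTS; i++)
        for (j = 0; j < NATTS; j++)
            if (matrix[i][j])
                printf("%c => %c\n", 'a' + i, 'a' + j);
    return 0;
}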
+
static List*
fdeps_reduce_clauses(List *clauses, Bitmapset *attnums, bool *matrix,
int *idx_to_attnum, int *attnum_to_idx, Index relid)
@@ -3447,55 +1210,6 @@ fdeps_reduce_clauses(List *clauses, Bitmapset *attnums, bool *matrix,
}
-static Bitmapset *
-fdeps_filter_clauses(PlannerInfo *root,
- List *clauses, Bitmapset *deps_attnums,
- List **reduced_clauses, List **deps_clauses,
- Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo)
-{
- ListCell *lc;
- Bitmapset *clause_attnums = NULL;
-
- foreach (lc, clauses)
- {
- Bitmapset *attnums = NULL;
- Node *clause = (Node *) lfirst(lc);
-
- if (! clause_is_mv_compatible(root, clause, varRelid, relid, &attnums,
- sjinfo, MV_CLAUSE_TYPE_FDEP))
-
- /* clause incompatible with functional dependencies */
- *reduced_clauses = lappend(*reduced_clauses, clause);
-
- else if (bms_num_members(attnums) > 1)
-
- /*
- * clause referencing multiple attributes (strange - shouldn't
- * this be handled by clause_is_mv_compatible directly?)
- */
- *reduced_clauses = lappend(*reduced_clauses, clause);
-
- else if (! bms_is_member(bms_singleton_member(attnums), deps_attnums))
-
- /* clause not covered by the dependencies */
- *reduced_clauses = lappend(*reduced_clauses, clause);
-
- else
- {
- /* ok, clause compatible with existing dependencies */
- Assert(bms_num_members(attnums) == 1);
-
- *deps_clauses = lappend(*deps_clauses, clause);
- clause_attnums = bms_add_member(clause_attnums,
- bms_singleton_member(attnums));
- }
-
- bms_free(attnums);
- }
-
- return clause_attnums;
-}
-
/*
* Pull varattnos from the clauses, similarly to pull_varattnos() but:
*
@@ -3529,162 +1243,6 @@ get_varattnos(Node * node, Index relid)
return result;
}
-/*
- * Estimate selectivity of clauses using a MCV list.
- *
- * If there's no MCV list for the stats, the function returns 0.0.
- *
- * While computing the estimate, the function checks whether all the
- * columns were matched with an equality condition. If that's the case,
- * we can skip processing the histogram, as there can be no rows in
- * it with the same values - all the rows matching the condition are
- * represented by the MCV item. This can only happen with equality
- * on all the attributes.
- *
- * The algorithm works like this:
- *
- * 1) mark all items as 'match'
- * 2) walk through all the clauses
- * 3) for a particular clause, walk through all the items
- * 4) skip items that are already 'no match'
- * 5) check clause for items that still match
- * 6) sum frequencies for items to get selectivity
- *
- * The function also returns the frequency of the least frequent item
- * on the MCV list, which may be useful for clamping estimate from the
- * histogram (all items not present in the MCV list are less frequent).
- * This however seems useful only for cases with conditions on all
- * attributes.
- *
- * TODO This only handles AND-ed clauses, but it might work for OR-ed
- * lists too - it just needs to reverse the logic a bit. I.e. start
- * with 'no match' for all items, and mark the items as a match
- * as the clauses are processed (and skip items that are 'match').
- */
-static Selectivity
-clauselist_mv_selectivity_mcvlist(PlannerInfo *root, MVStatisticInfo *mvstats,
- List *clauses, List *conditions, bool is_or,
- bool *fullmatch, Selectivity *lowsel)
-{
- int i;
- Selectivity s = 0.0;
- Selectivity t = 0.0;
- Selectivity u = 0.0;
-
- MCVList mcvlist = NULL;
-
- int nmatches = 0;
- int nconditions = 0;
-
- /* match/mismatch bitmap for each MCV item */
- char * matches = NULL;
- char * condition_matches = NULL;
-
- Assert(clauses != NIL);
- Assert(list_length(clauses) >= 1);
-
- /* there's no MCV list built yet */
- if (! mvstats->mcv_built)
- return 0.0;
-
- mcvlist = load_mv_mcvlist(mvstats->mvoid);
-
- Assert(mcvlist != NULL);
- Assert(mcvlist->nitems > 0);
-
- /* number of matching MCV items */
- nmatches = mcvlist->nitems;
- nconditions = mcvlist->nitems;
-
- /*
- * Bitmap of bucket matches (mismatch, partial, full).
- *
- * For AND clauses all buckets match (and we'll eliminate them).
- * For OR clauses no buckets match (and we'll add them).
- *
- * We only need to do the memset for AND clauses (for OR clauses
- * it's already set correctly by the palloc0).
- */
- matches = palloc0(sizeof(char) * nmatches);
-
- if (! is_or) /* AND-clause */
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
-
- /* Conditions are treated as AND clause, so match by default. */
- condition_matches = palloc0(sizeof(char) * nconditions);
- memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
-
- /*
- * build the match bitmap for the conditions (conditions are always
- * connected by AND)
- */
- if (conditions != NIL)
- nconditions = update_match_bitmap_mcvlist(root, conditions,
- mvstats->stakeys, mcvlist,
- nconditions, condition_matches,
- lowsel, fullmatch, false);
-
- /*
- * build the match bitmap for the estimated clauses
- *
- * TODO This evaluates the clauses for all MCV items, even those
- * ruled out by the conditions. The final result should be the
- * same, but it might be faster.
- */
- nmatches = update_match_bitmap_mcvlist(root, clauses,
- mvstats->stakeys, mcvlist,
- ((is_or) ? 0 : nmatches), matches,
- lowsel, fullmatch, is_or);
-
- /* sum frequencies for all the matching MCV items */
- for (i = 0; i < mcvlist->nitems; i++)
- {
- /*
- * Find out what part of the data is covered by the MCV list,
- * so that we can 'scale' the selectivity properly (e.g. when
- * only 50% of the sample items got into the MCV, and the rest
- * is either in a histogram, or not covered by stats).
- *
- * TODO This might be handled by keeping a global "frequency"
- * for the whole list, which might save us a bit of time
- * spent on accessing the not-matching part of the MCV list.
- * Although it's likely in a cache, so it's very fast.
- */
- u += mcvlist->items[i]->frequency;
-
- /* skit MCV items not matching the conditions */
- if (condition_matches[i] == MVSTATS_MATCH_NONE)
- continue;
-
- if (matches[i] != MVSTATS_MATCH_NONE)
- s += mcvlist->items[i]->frequency;
-
- t += mcvlist->items[i]->frequency;
- }
-
- pfree(matches);
- pfree(condition_matches);
- pfree(mcvlist);
-
- /* no condition matches */
- if (t == 0.0)
- return (Selectivity)0.0;
-
- return (s / t) * u;
-}
-
-/*
- * Evaluate clauses using the MCV list, and update the match bitmap.
- *
- * The bitmap may be already partially set, so this is really a way to
- * combine results of several clause lists - either when computing
- * conditional probability P(A|B) or a combination of AND/OR clauses.
- *
- * TODO This works with 'bitmap' where each bit is represented as a char,
- * which is slightly wasteful. Instead, we could use a regular
- * bitmap, reducing the size to ~1/8. Another thing is merging the
- * bitmaps using & and |, which might be faster than min/max.
- */
static int
update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -3972,216 +1530,59 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
/* match/mismatch bitmap for each MCV item */
int tmp_nmatches = 0;
- char * tmp_matches = NULL;
-
- Assert(tmp_clauses != NIL);
- Assert(list_length(tmp_clauses) >= 2);
-
- /* number of matching MCV items */
- tmp_nmatches = (or_clause(clause)) ? 0 : mcvlist->nitems;
-
- /* by default none of the MCV items matches the clauses */
- tmp_matches = palloc0(sizeof(char) * mcvlist->nitems);
-
- /* AND clauses assume everything matches, initially */
- if (! or_clause(clause))
- memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
-
- /* build the match bitmap for the OR-clauses */
- tmp_nmatches = update_match_bitmap_mcvlist(root, tmp_clauses,
- stakeys, mcvlist,
- tmp_nmatches, tmp_matches,
- lowsel, fullmatch, or_clause(clause));
-
- /* merge the bitmap into the existing one*/
- for (i = 0; i < mcvlist->nitems; i++)
- {
- /*
- * To AND-merge the bitmaps, a MIN() semantics is used.
- * For OR-merge, use MAX().
- *
- * FIXME this does not decrease the number of matches
- */
- UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
- }
- pfree(tmp_matches);
-
- }
- else
- elog(ERROR, "unknown clause type: %d", clause->type);
- }
-
- /*
- * If all the columns were matched by equality, it's a full match.
- * In this case there can be just a single MCV item, matching the
- * clause (if there were two, both would match the other one).
- */
- *fullmatch = (bms_num_members(eqmatches) == mcvlist->ndimensions);
-
- /* free the allocated pieces */
- if (eqmatches)
- pfree(eqmatches);
-
- return nmatches;
-}
-
-/*
- * Estimate selectivity of clauses using a histogram.
- *
- * If there's no histogram for the stats, the function returns 0.0.
- *
- * The general idea of this method is similar to how MCV lists are
- * processed, except that this introduces the concept of a partial
- * match (MCV only works with full match / mismatch).
- *
- * The algorithm works like this:
- *
- * 1) mark all buckets as 'full match'
- * 2) walk through all the clauses
- * 3) for a particular clause, walk through all the buckets
- * 4) skip buckets that are already 'no match'
- * 5) check clause for buckets that still match (at least partially)
- * 6) sum frequencies for buckets to get selectivity
- *
- * Unlike MCV lists, histograms have a concept of a partial match. In
- * that case we use 1/2 the bucket, to minimize the average error. The
- * MV histograms are usually less detailed than the per-column ones,
- * meaning the sum is often quite high (thanks to combining a lot of
- * "partially hit" buckets).
- *
- * Maybe we could use per-bucket information with number of distinct
- * values it contains (for each dimension), and then use that to correct
- * the estimate (so with 10 distinct values, we'd use 1/10 of the bucket
- * frequency). We might also scale the value depending on the actual
- * ndistinct estimate (not just the values observed in the sample).
- *
- * Another option would be to multiply the selectivities, i.e. if we get
- * 'partial match' for a bucket for multiple conditions, we might use
- * 0.5^k (where k is the number of conditions), instead of 0.5. This
- * probably does not minimize the average error, though.
- *
- * TODO This might use a similar shortcut to MCV lists - count buckets
- * marked as partial/full match, and terminate once this drop to 0.
- * Not sure if it's really worth it - for MCV lists a situation like
- * this is not uncommon, but for histograms it's not that clear.
- */
-static Selectivity
-clauselist_mv_selectivity_histogram(PlannerInfo *root, MVStatisticInfo *mvstats,
- List *clauses, List *conditions, bool is_or)
-{
- int i;
- Selectivity s = 0.0;
- Selectivity t = 0.0;
- Selectivity u = 0.0;
-
- int nmatches = 0;
- int nconditions = 0;
- char *matches = NULL;
- char *condition_matches = NULL;
-
- MVSerializedHistogram mvhist = NULL;
-
- /* there's no histogram */
- if (! mvstats->hist_built)
- return 0.0;
-
- /* There may be no histogram in the stats (check hist_built flag) */
- mvhist = load_mv_histogram(mvstats->mvoid);
-
- Assert (mvhist != NULL);
- Assert (clauses != NIL);
- Assert (list_length(clauses) >= 1);
-
- nmatches = mvhist->nbuckets;
- nconditions = mvhist->nbuckets;
-
- /*
- * Bitmap of bucket matches (mismatch, partial, full).
- *
- * For AND clauses all buckets match (and we'll eliminate them).
- * For OR clauses no buckets match (and we'll add them).
- *
- * We only need to do the memset for AND clauses (for OR clauses
- * it's already set correctly by the palloc0).
- */
- matches = palloc0(sizeof(char) * nmatches);
-
- if (! is_or) /* AND-clause */
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
-
- /* Conditions are treated as AND clause, so match by default. */
- condition_matches = palloc0(sizeof(char)*nconditions);
- memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+ char * tmp_matches = NULL;
- /*
- * build the match bitmap for the conditions (conditions are always
- * connected by AND)
- */
- if (conditions != NIL)
- update_match_bitmap_histogram(root, conditions,
- mvstats->stakeys, mvhist,
- nconditions, condition_matches, false);
+ Assert(tmp_clauses != NIL);
+ Assert(list_length(tmp_clauses) >= 2);
- /*
- * build the match bitmap for the estimated clauses
- *
- * TODO This evaluates the clauses for all buckets, even those
- * ruled out by the conditions. The final result should be
- * the same, but it might be faster.
- */
- update_match_bitmap_histogram(root, clauses,
- mvstats->stakeys, mvhist,
- ((is_or) ? 0 : nmatches), matches,
- is_or);
+ /* number of matching MCV items */
+ tmp_nmatches = (or_clause(clause)) ? 0 : mcvlist->nitems;
- /* now, walk through the buckets and sum the selectivities */
- for (i = 0; i < mvhist->nbuckets; i++)
- {
- float coeff = 1.0;
+ /* by default none of the MCV items matches the clauses */
+ tmp_matches = palloc0(sizeof(char) * mcvlist->nitems);
- /*
- * Find out what part of the data is covered by the histogram,
- * so that we can 'scale' the selectivity properly (e.g. when
- * only 50% of the sample got into the histogram, and the rest
- * is in a MCV list).
- *
- * TODO This might be handled by keeping a global "frequency"
- * for the whole histogram, which might save us some time
- * spent accessing the not-matching part of the histogram.
- * Although it's likely in a cache, so it's very fast.
- */
- u += mvhist->buckets[i]->ntuples;
+ /* AND clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
- /* skip buckets not matching the conditions */
- if (condition_matches[i] == MVSTATS_MATCH_NONE)
- continue;
- else if (condition_matches[i] == MVSTATS_MATCH_PARTIAL)
- coeff = 0.5;
+ /* build the match bitmap for the OR-clauses */
+ tmp_nmatches = update_match_bitmap_mcvlist(root, tmp_clauses,
+ stakeys, mcvlist,
+ tmp_nmatches, tmp_matches,
+ lowsel, fullmatch, or_clause(clause));
- t += coeff * mvhist->buckets[i]->ntuples;
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
+ }
- if (matches[i] == MVSTATS_MATCH_FULL)
- s += coeff * mvhist->buckets[i]->ntuples;
- else if (matches[i] == MVSTATS_MATCH_PARTIAL)
- /*
- * TODO If both conditions and clauses match partially, this
- * will use 0.25 match - not sure if that's the right
- * thing solution, but seems about right.
- */
- s += coeff * 0.5 * mvhist->buckets[i]->ntuples;
+ pfree(tmp_matches);
+
+ }
+ else
+ elog(ERROR, "unknown clause type: %d", clause->type);
}
- /* release the allocated bitmap and deserialized histogram */
- pfree(matches);
- pfree(condition_matches);
- pfree(mvhist);
+ /*
+ * If all the columns were matched by equality, it's a full match.
+ * In this case there can be just a single MCV item, matching the
+ * clause (if there were two, both would match the other one).
+ */
+ *fullmatch = (bms_num_members(eqmatches) == mcvlist->ndimensions);
- /* no condition matches */
- if (t == 0.0)
- return (Selectivity)0.0;
+ /* free the allocated pieces */
+ if (eqmatches)
+ pfree(eqmatches);
- return (s / t) * u;
+ return nmatches;
}
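
For illustration, the AND/OR merge done by UPDATE_RESULT above can be
summarized in a minimal sketch. The constant values and the helper below
are assumptions for illustration only (NONE must be 0 because the bitmaps
come from palloc0; the PARTIAL/FULL values and the helper are hypothetical):

/* assumed ordering of match states: NONE < PARTIAL < FULL */
#define MVSTATS_MATCH_NONE    0
#define MVSTATS_MATCH_PARTIAL 1
#define MVSTATS_MATCH_FULL    2

/* hypothetical helper mirroring UPDATE_RESULT's semantics */
static inline char
merge_match(char a, char b, bool is_or)
{
    /* AND-merge keeps the weaker state (MIN), OR-merge the stronger (MAX) */
    return is_or ? Max(a, b) : Min(a, b);
}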
/*
@@ -4691,362 +2092,479 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
return nmatches;
}
-/*
- * Walk through clauses and keep only those covered by at least
- * one of the statistics.
- */
-static List *
-filter_clauses(PlannerInfo *root, Oid varRelid, SpecialJoinInfo *sjinfo,
- int type, List *stats, List *clauses, Bitmapset **attnums)
+static Node *
+stripRestrictStatData(List *clauses, BoolExprType boolop, Bitmapset **attrs)
{
- ListCell *c;
- ListCell *s;
-
- /* results (list of compatible clauses, attnums) */
- List *rclauses = NIL;
+ Expr *newexpr;
+ ListCell *lc;
- foreach (c, clauses)
+ if (attrs) *attrs = NULL;
+
+ if (list_length(clauses) == 0)
+ newexpr = NULL;
+ else if (list_length(clauses) == 1)
{
- Node *clause = (Node*)lfirst(c);
- Bitmapset *clause_attnums = NULL;
- Index relid;
+ RestrictStatData *rsd = (RestrictStatData *) linitial(clauses);
+ Assert(IsA(rsd, RestrictStatData));
- /*
- * The clause has to be mv-compatible (suitable operators etc.).
- */
- if (! clause_is_mv_compatible(root, clause, varRelid,
- &relid, &clause_attnums, sjinfo, type))
- elog(ERROR, "should not get non-mv-compatible cluase");
+ newexpr = (Expr*)(rsd->clause);
+ if (attrs) *attrs = rsd->mvattrs;
+ }
+ else
+ {
+ BoolExpr *newboolexpr;
+ newboolexpr = makeNode(BoolExpr);
+ newboolexpr->boolop = boolop;
- /* is there a statistics covering this clause? */
- foreach (s, stats)
+ foreach (lc, clauses)
{
- int k, matches = 0;
- MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
-
- for (k = 0; k < stat->stakeys->dim1; k++)
- {
- if (bms_is_member(stat->stakeys->values[k],
- clause_attnums))
- matches += 1;
- }
-
- /*
- * The clause is compatible if all attributes it references
- * are covered by the statistics.
- */
- if (bms_num_members(clause_attnums) == matches)
- {
- *attnums = bms_union(*attnums, clause_attnums);
- rclauses = lappend(rclauses, clause);
- break;
- }
+ RestrictStatData *rsd = (RestrictStatData *) lfirst(lc);
+ Assert(IsA(rsd, RestrictStatData));
+ newboolexpr->args =
+ lappend(newboolexpr->args, rsd->clause);
+ if (attrs)
+ *attrs = bms_add_members(*attrs, rsd->mvattrs);
}
-
- bms_free(clause_attnums);
+ newexpr = (Expr*) newboolexpr;
}
- /* we can't have more compatible conditions than source conditions */
- Assert(list_length(clauses) >= list_length(rclauses));
-
- return rclauses;
+ return (Node*)newexpr;
}
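
As a usage sketch (hypothetical variables: rsd_a and rsd_b are assumed to
be RestrictStatData nodes wrapping the clauses (a = 1) and (b = 2)):

Bitmapset *attrs;
Node *expr = stripRestrictStatData(list_make2(rsd_a, rsd_b),
                                   AND_EXPR, &attrs);
/* expr is now the BoolExpr (a = 1 AND b = 2), and attrs is the
 * union of rsd_a->mvattrs and rsd_b->mvattrs */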
-
-/*
- * Walk through statistics and only keep those covering at least
- * one new attribute (excluding conditions) and at two attributes
- * in both clauses and conditions.
- *
- * This check might be made more strict by checking against individual
- * clauses, because by using the bitmapsets of all attnums we may
- * actually use attnums from clauses that are not covered by the
- * statistics. For example, we may have a condition
- *
- * (a=1 AND b=2)
- *
- * and a new clause
- *
- * (c=1 AND d=1)
- *
- * With only bitmapsets, statistics on [b,c] will pass through this
- * (assuming there are some statistics covering both clases).
- *
- * TODO Do the more strict check.
- */
-static List *
-filter_stats(List *stats, Bitmapset *new_attnums, Bitmapset *all_attnums)
+RestrictStatData *
+transformRestrictInfoForEstimate(PlannerInfo *root, List *clauses,
+ int relid, SpecialJoinInfo *sjinfo)
{
- ListCell *s;
- List *stats_filtered = NIL;
+ static int level = 0;
+ int i = -1;
+ char head[100];
+ RestrictStatData *rdata = makeNode(RestrictStatData);
+ Node *clause;
- foreach (s, stats)
+ memset(head, '.', 100);
+ head[level] = 0;
+
+ if (list_length(clauses) == 1 &&
+ !IsA((Node*)linitial(clauses), RestrictInfo))
{
- int k;
- int matches_new = 0,
- matches_all = 0;
-
- MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
-
- /* see how many attributes the statistics covers */
- for (k = 0; k < stat->stakeys->dim1; k++)
- {
- /* attributes from new clauses */
- if (bms_is_member(stat->stakeys->values[k], new_attnums))
- matches_new += 1;
-
- /* attributes from onditions */
- if (bms_is_member(stat->stakeys->values[k], all_attnums))
- matches_all += 1;
- }
-
- /* check we have enough attributes for this statistics */
- if ((matches_new >= 1) && (matches_all >= 2))
- stats_filtered = lappend(stats_filtered, stat);
+ Assert(relid > 0);
+ clause = (Node*)linitial(clauses);
}
+ else
+ {
+ /* This is the top-level clause list. Convert it to an AND expression. */
+ ListCell *lc;
+ Index clauserelid = 0;
+ Relids relids = pull_varnos((Node*)clauses);
- /* we can't have more useful stats than we had originally */
- Assert(list_length(stats) >= list_length(stats_filtered));
-
- return stats_filtered;
-}
+ if (bms_num_members(relids) != 1)
+ return NULL;
-static MVStatisticInfo *
-make_stats_array(List *stats, int *nmvstats)
-{
- int i;
- ListCell *l;
+ clauserelid = bms_singleton_member(relids);
+ if (relid != 0 && relid != clauserelid)
+ return NULL;
- MVStatisticInfo *mvstats = NULL;
- *nmvstats = list_length(stats);
+ relid = clauserelid;
- mvstats
- = (MVStatisticInfo*)palloc0((*nmvstats) * sizeof(MVStatisticInfo));
+ if (list_length(clauses) == 1)
+ {
+ /*
+ * If the clause list has only one element, it should be a
+ * top-level RestrictInfo.
+ */
+ RestrictInfo *rinfo = (RestrictInfo *) linitial(clauses);
+ Assert(IsA(rinfo, RestrictInfo));
+
+ /* The only RestrictInfo is a join clause. Bail out. */
+ if (rinfo->pseudoconstant ||
+ treat_as_join_clause((Node*)rinfo->clause,
+ rinfo, 0, sjinfo))
+ return NULL;
+
+ clause = (Node*) rinfo->clause;
+ }
+ else
+ {
+ BoolExpr *andexpr = makeNode(BoolExpr);
+ andexpr->boolop = AND_EXPR;
+ foreach (lc, clauses)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+ Assert(IsA(rinfo, RestrictInfo));
+
+ /* stash clauses unrelated to multivariate statistics */
+ if (rinfo->pseudoconstant ||
+ treat_as_join_clause((Node*)rinfo->clause,
+ rinfo, 0, sjinfo))
+ rdata->unusedrinfos = lappend(rdata->unusedrinfos,
+ rinfo);
+ else
+ andexpr->args = lappend(andexpr->args, rinfo->clause);
+ }
+ clause = (Node*)andexpr;
+ }
- i = 0;
- foreach (l, stats)
- {
- MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(l);
- memcpy(&mvstats[i++], stat, sizeof(MVStatisticInfo));
}
- return mvstats;
-}
-
-static Bitmapset **
-make_stats_attnums(MVStatisticInfo *mvstats, int nmvstats)
-{
- int i, j;
- Bitmapset **stats_attnums = NULL;
+ Assert(!IsA(clause, RestrictInfo));
- Assert(nmvstats > 0);
+ rdata->clause = clause;
+ rdata->boolop = AND_EXPR;
- /* build bitmaps of attnums for the stats (easier to compare) */
- stats_attnums = (Bitmapset **)palloc0(nmvstats * sizeof(Bitmapset*));
+ if (and_clause(clause) || or_clause(clause))
+ {
+ BoolExpr *boolexpr = (BoolExpr *)clause;
+ ListCell *lc;
+ List *mvclauses = NIL;
+ List *nonmvclauses = NIL;
+ List *partialclauses = NIL;
+ Bitmapset *resultattrs = NULL;
+ List *resultstats = NIL;
- for (i = 0; i < nmvstats; i++)
- for (j = 0; j < mvstats[i].stakeys->dim1; j++)
- stats_attnums[i]
- = bms_add_member(stats_attnums[i],
- mvstats[i].stakeys->values[j]);
+ rdata->boolop = boolexpr->boolop;
+ ereport(DEBUG1,
+ (errmsg ("%s%s[%d][%d](%d)",
+ head,
+ and_clause(clause)?"AND":
+ (or_clause(clause)?"OR":"NOT"),
+ level, i, list_length(boolexpr->args)),
+ errhidestmt(level)));
- return stats_attnums;
-}
+ /* Recursively process the subexpressions */
+ level++;
+ foreach (lc, (boolexpr->args))
+ {
+ Node *nd = (Node*) lfirst(lc);
+ RestrictStatData *tmpsd;
+ tmpsd = transformRestrictInfoForEstimate(root,
+ list_make1(nd),
+ relid, sjinfo);
+ /*
+ * mvclauses holds the child RestrictStatData nodes that can
+ * potentially be pulled up into this node's mvclause, which is
+ * to be estimated using multivariate statistics.
+ *
+ * partialclauses holds the child RestrictStatData nodes that
+ * cannot be pulled up.
+ *
+ * nonmvclauses holds the child RestrictStatData nodes to be
+ * pulled up into the clause estimated in the normal way.
+ */
+ if (tmpsd->mvattrs)
+ mvclauses = lappend(mvclauses, tmpsd);
+ else if (tmpsd->mvclause)
+ partialclauses = lappend(partialclauses, tmpsd);
+ else
+ nonmvclauses = lappend(nonmvclauses, tmpsd);
+ }
+ level--;
-/*
- * Now let's remove redundant statistics, covering the same columns
- * as some other stats, when restricted to the attributes from
- * remaining clauses.
- *
- * If statistics S1 covers S2 (covers S2 attributes and possibly
- * some more), we can probably remove S2. What actually matters are
- * attributes from covered clauses (not all the attributes). This
- * might however prefer larger, and thus less accurate, statistics.
- *
- * When a redundancy is detected, we simply keep the smaller
- * statistics (less number of columns), on the assumption that it's
- * more accurate and faster to process. That might be incorrect for
- * two reasons - first, the accuracy really depends on number of
- * buckets/MCV items, not the number of columns. Second, we might
- * prefer MCV lists over histograms or something like that.
- */
-static List*
-filter_redundant_stats(List *stats, List *clauses, List *conditions)
-{
- int i, j, nmvstats;
- MVStatisticInfo *mvstats;
- bool *redundant;
- Bitmapset **stats_attnums;
- Bitmapset *varattnos;
- Index relid;
+ if (list_length(mvclauses) == 1)
+ {
+ /*
+ * If this boolean clause has only one mv clause, pull it up for
+ * now.
+ */
+ RestrictStatData *rsd = (RestrictStatData *) linitial(mvclauses);
+ resultattrs = rsd->mvattrs;
+ resultstats = rsd->mvstats;
+ }
+ if (list_length(mvclauses) > 1)
+ {
+ /*
+ * Pick the smallest mv-stats covering as many as possible of the
+ * attributes appearing in the subclauses, then remove the clauses
+ * that are not covered by the selected mv-stats.
+ */
+ int nmvstats = 0;
+ ListCell *lc;
+ bm_mvstat *mvstatslist[16];
+ int maxnattrs = 0;
+ int i;
- Assert(list_length(stats) > 0);
- Assert(list_length(clauses) > 0);
+ /*
+ * Collect all mvstats from all subclauses. The attribute set
+ * should be unique, so use it as the key. There should not be
+ * many stats.
+ */
+ foreach (lc, mvclauses)
+ {
+ RestrictStatData *rsd = (RestrictStatData *) lfirst(lc);
+ Bitmapset *mvattrs = rsd->mvattrs;
+ ListCell *lcs;
- /*
- * We'll convert the list of statistics into an array now, because
- * the reduction of redundant statistics is easier to do that way
- * (we can mark previous stats as redundant, etc.).
- */
- mvstats = make_stats_array(stats, &nmvstats);
- stats_attnums = make_stats_attnums(mvstats, nmvstats);
+ /* make a covering attribute set of all clauses */
+ resultattrs = bms_add_members(resultattrs, mvattrs);
- /* by default, none of the stats is redundant (so palloc0) */
- redundant = palloc0(nmvstats * sizeof(bool));
+ /* pick up new mv stats from lower clauses */
+ foreach (lcs, rsd->mvstats)
+ {
+ bm_mvstat *mvs = (bm_mvstat*) lfirst(lcs);
+ bool found = false;
- /*
- * We only expect a single relid here, and also we should get the
- * same relid from clauses and conditions (but we get it from
- * clauses, because those are certainly non-empty).
- */
- relid = bms_singleton_member(pull_varnos((Node*)clauses));
+ for (i = 0 ; !found && i < nmvstats ; i++)
+ {
+ if (bms_equal(mvstatslist[i]->attrs, mvs->attrs))
+ found = true;
+ }
+ if (!found)
+ {
+ mvstatslist[nmvstats] = mvs;
+ nmvstats++;
+ }
- /*
- * Get the varattnos from both conditions and clauses.
- *
- * This skips system attributes, although that should be impossible
- * thanks to previous filtering out of incompatible clauses.
- *
- * XXX Is that really true?
- */
- varattnos = bms_union(get_varattnos((Node*)clauses, relid),
- get_varattnos((Node*)conditions, relid));
+ /* ignore more than 15(!) stats for a clause */
+ if (nmvstats > 15)
+ break;
+ }
+ }
- for (i = 1; i < nmvstats; i++)
- {
- /* intersect with current statistics */
- Bitmapset *curr = bms_intersect(stats_attnums[i], varattnos);
+ /* Check functional dependency first, maybe.. */
+// if (list_length(mvclauses) == 2)
+// {
+// RestrictStatData *rsd1 =
+// (RestrictStatData *) linitial(mvclauses);
+// RestrictStatData *rsd2 =
+// (RestrictStatData *) lsecond(mvclauses);
+// /* To do more...*/
+// }
- /* walk through 'previous' stats and check redundancy */
- for (j = 0; j < i; j++)
- {
- /* intersect with current statistics */
- Bitmapset *prev;
+ //if (clauseboolop == AND_EXPR && ...
+
+ maxnattrs = 0;
- /* skip stats already identified as redundant */
- if (redundant[j])
- continue;
+ /*
+ * Find stats covering largest number of attributes in this
+ * clause
+ */
+ for (i = 0 ; i < nmvstats ; i++)
+ {
+ Bitmapset *matchattr =
+ bms_intersect(resultattrs, mvstatslist[i]->attrs);
+ int nmatchattrs = bms_num_members(matchattr);
- prev = bms_intersect(stats_attnums[j], varattnos);
+ if (maxnattrs < nmatchattrs)
+ {
+ /* The candidates so far are not the maximum */
+ if (nmvstats - i > 0)
+ memmove(mvstatslist, mvstatslist + i,
+ (nmvstats - i) * sizeof(bm_mvstat*));
+ maxnattrs = nmatchattrs;
+ nmvstats -= i;
+ i = 0; /* Restart from the first */
+ }
+ else if (maxnattrs > nmatchattrs)
+ {
+ /* Remove this stats */
+ if (nmvstats - i - 1 > 0)
+ memmove(mvstatslist + i, mvstatslist + i + 1,
+ (nmvstats - i - 1) * sizeof(bm_mvstat*));
+ nmvstats--;
+ }
+ }
- switch (bms_subset_compare(curr, prev))
+ if (maxnattrs < 2)
+ {
+ /* mv stats don't apply to a single attribute */
+ mvclauses = NIL;
+ nonmvclauses = NIL;
+ resultattrs = NULL;
+ resultstats = NIL;
+ }
+ else
{
- case BMS_EQUAL:
+ /* Consider only the first stats for now. */
+ if (nmvstats > 1)
+ elog(LOG, "Some mv stats are ignored");
+
+ if (!bms_is_subset(resultattrs,
+ mvstatslist[0]->attrs))
+ {
/*
- * Use the smaller one (hopefully more accurate).
- * If both have the same size, use the first one.
+ * move out the clauses that are not covered by the
+ * candidate stats
*/
- if (mvstats[i].stakeys->dim1 >= mvstats[j].stakeys->dim1)
- redundant[i] = TRUE;
- else
- redundant[j] = TRUE;
-
- break;
-
- case BMS_SUBSET1: /* curr is subset of prev */
- redundant[i] = TRUE;
- break;
+ List *old_mvclauses = mvclauses;
+ ListCell *lc;
+ Bitmapset *statsattrs =
+ mvstatslist[0]->attrs;
+ mvclauses = NIL;
- case BMS_SUBSET2: /* prev is subset of curr */
- redundant[j] = TRUE;
- break;
+ foreach(lc, old_mvclauses)
+ {
+ RestrictStatData *rsd = (RestrictStatData *) lfirst(lc);
+ Assert(IsA(rsd, RestrictStatData));
- case BMS_DIFFERENT:
- /* do nothing - keep both stats */
- break;
+ if (bms_is_subset(rsd->mvattrs, statsattrs))
+ mvclauses = lappend(mvclauses, rsd);
+ else
+ nonmvclauses = lappend(nonmvclauses, rsd);
+ }
+ resultattrs = bms_intersect(resultattrs,
+ mvstatslist[0]->attrs);
+ }
+ resultstats = list_make1(mvstatslist[0]);
}
-
- bms_free(prev);
}
- bms_free(curr);
- }
-
- /* can't reduce all statistics (at least one has to remain) */
- Assert(nmvstats > 0);
+ if (bms_num_members(resultattrs) < 2)
+ {
+ /*
+ * make this non-mv if mvclause covers only one mv-attribute.
+ */
+ nonmvclauses = list_concat(nonmvclauses, mvclauses);
+ mvclauses = NULL;
+ resultattrs = NULL;
+ resultstats = NIL;
+ }
- /* now, let's remove the reduced statistics from the arrays */
- list_free(stats);
- stats = NIL;
+ /*
+ * All mvclauses are covered by the candidate stats here.
+ */
+ rdata->mvclause =
+ stripRestrictStatData(mvclauses, rdata->boolop, NULL);
+ rdata->children = partialclauses;
+ rdata->mvattrs = resultattrs;
+ rdata->nonmvclause =
+ stripRestrictStatData(nonmvclauses, rdata->boolop, NULL);
+ rdata->mvstats = resultstats;
- for (i = 0; i < nmvstats; i++)
+ }
+ else if (not_clause(clause))
{
- MVStatisticInfo *info;
-
- pfree(stats_attnums[i]);
+ Node *nd = (Node *) linitial(((BoolExpr*)clause)->args);
+ RestrictStatData *tmpsd;
- if (redundant[i])
- continue;
-
- info = makeNode(MVStatisticInfo);
- memcpy(info, &mvstats[i], sizeof(MVStatisticInfo));
-
- stats = lappend(stats, info);
+ tmpsd = transformRestrictInfoForEstimate(root, list_make1(nd),
+ relid, sjinfo);
+ rdata->children = list_make1(tmpsd);
}
-
- pfree(mvstats);
- pfree(stats_attnums);
- pfree(redundant);
-
- return stats;
-}
-
-static Node**
-make_clauses_array(List *clauses, int *nclauses)
-{
- int i;
- ListCell *l;
-
- Node** clauses_array;
-
- *nclauses = list_length(clauses);
- clauses_array = (Node **)palloc0((*nclauses) * sizeof(Node *));
-
- i = 0;
- foreach (l, clauses)
- clauses_array[i++] = (Node *)lfirst(l);
-
- *nclauses = i;
-
- return clauses_array;
-}
-
-static Bitmapset **
-make_clauses_attnums(PlannerInfo *root, Oid varRelid, SpecialJoinInfo *sjinfo,
- int type, Node **clauses, int nclauses)
-{
- int i;
- Index relid;
- Bitmapset **clauses_attnums
- = (Bitmapset **)palloc0(nclauses * sizeof(Bitmapset *));
-
- for (i = 0; i < nclauses; i++)
+ else if (is_opclause(clause) &&
+ list_length(((OpExpr *) clause)->args) == 2)
{
- Bitmapset * attnums = NULL;
+ Node *varnode = get_leftop((Expr*)clause);
+ Node *nonvarnode = get_rightop((Expr*)clause);
- if (! clause_is_mv_compatible(root, clauses[i], varRelid,
- &relid, &attnums, sjinfo, type))
- elog(ERROR, "should not get non-mv-compatible cluase");
+ /* Put the Var on varnode, if there is one */
+ if (!IsA(varnode, Var))
+ {
+ Node *tmp = nonvarnode;
+ nonvarnode = varnode;
+ varnode = tmp;
+ }
+
+ if (IsA(varnode, Var) && is_pseudo_constant_clause(nonvarnode))
+ {
+ Var *var = (Var *)varnode;
+ List *statslist = root->simple_rel_array[relid]->mvstatlist;
+ Oid opno = ((OpExpr*)clause)->opno;
+ int varmvbitmap = get_oprmvstat(opno);
+
+ if (varmvbitmap &&
+ !IS_SPECIAL_VARNO(var->varno) &&
+ AttrNumberIsForUserDefinedAttr(var->varattno))
+ {
+ List *mvstats = NIL;
+ ListCell *lc;
+ Bitmapset *varattrs = bms_make_singleton(var->varattno);
- clauses_attnums[i] = attnums;
+ /*
+ * Add mv statistics if they are applicable to this expression
+ */
+ foreach (lc, statslist)
+ {
+ int k;
+ MVStatisticInfo *stats = (MVStatisticInfo *) lfirst(lc);
+ Bitmapset *statsattrs = NULL;
+ int statsmvbitmap =
+ (stats->mcv_built ? MVSTATISTIC_MCV : 0) |
+ (stats->hist_built ? MVSTATISTIC_HIST : 0) |
+ (stats->deps_built ? MVSTATISTIC_FDEP : 0);
+
+ for (k = 0 ; k < stats->stakeys->dim1 ; k++)
+ statsattrs = bms_add_member(statsattrs,
+ stats->stakeys->values[k]);
+ /* XXX: Does this work as expected? */
+ if (bms_is_subset(varattrs, statsattrs) &&
+ (statsmvbitmap & varmvbitmap))
+ {
+ bm_mvstat *mvstatsent = palloc0(sizeof(bm_mvstat));
+ mvstatsent->attrs = statsattrs;
+ mvstatsent->stats = stats;
+ mvstatsent->mvkind = statsmvbitmap;
+ mvstats = lappend(mvstats, mvstatsent);
+ }
+ }
+ if (mvstats)
+ {
+ /* MV stats are potentially applicable to this expression */
+ ereport(DEBUG1,
+ (errmsg ("%sMATCH[%d][%d](varno = %d, attno = %d)",
+ head, level, i,
+ var->varno, var->varattno),
+ errhidestmt(level)));
+
+ rdata->mvstats = mvstats;
+ rdata->mvattrs = varattrs;
+ }
+ }
+ }
+ else
+ {
+ ereport(DEBUG1,
+ (errmsg ("%sno match BinOp[%d][%d]: r=%d, l=%d",
+ head, level, i,
+ varnode->type, nonvarnode->type),
+ errhidestmt(level)));
+ }
}
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *expr = (NullTest*)clause;
+ Var *var = (Var *)(expr->arg);
- return clauses_attnums;
-}
-
-static bool*
-make_cover_map(Bitmapset **stats_attnums, int nmvstats,
- Bitmapset **clauses_attnums, int nclauses)
-{
- int i, j;
- bool *cover_map = (bool*)palloc0(nclauses * nmvstats);
+ if (IsA(var, Var) &&
+ !IS_SPECIAL_VARNO(var->varno) &&
+ AttrNumberIsForUserDefinedAttr(var->varattno))
+ {
+ Bitmapset *varattrs = bms_make_singleton(var->varattno);
+ List *mvstats = NIL;
+ ListCell *lc;
- for (i = 0; i < nmvstats; i++)
- for (j = 0; j < nclauses; j++)
- cover_map[i * nclauses + j]
- = bms_is_subset(clauses_attnums[j], stats_attnums[i]);
+ foreach(lc, root->simple_rel_array[relid]->mvstatlist)
+ {
+ MVStatisticInfo *stats = (MVStatisticInfo *) lfirst(lc);
+ Bitmapset *statsattrs = NULL;
+ int k;
+
+ for (k = 0 ; k < stats->stakeys->dim1 ; k++)
+ statsattrs = bms_add_member(statsattrs,
+ stats->stakeys->values[k]);
+ if (bms_is_subset(varattrs, statsattrs))
+ {
+ bm_mvstat *mvstatsent = palloc0(sizeof(bm_mvstat));
+ mvstatsent->stats = stats;
+ mvstatsent->attrs = statsattrs;
+ mvstatsent->mvkind = (MVSTATISTIC_MCV |MVSTATISTIC_HIST);
+ mvstats = lappend(mvstats, mvstatsent);
+ }
+ }
+ if (mvstats)
+ {
+ rdata->mvstats = mvstats;
+ rdata->mvattrs = varattrs;
+ }
+ }
+ }
+ else
+ {
+ ereport(DEBUG1,
+ (errmsg ("%sno match node(%d)[%d][%d]",
+ head, clause->type, level, i),
+ errhidestmt(level)));
+ }
- return cover_map;
+ return rdata;
}
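
To make the intended decomposition concrete, here is a hypothetical
walk-through, assuming multivariate statistics exist on (a, b) only:

/* clauses built from WHERE (a = 1) AND (b = 2) AND (d = 3) */
RestrictStatData *rdata =
    transformRestrictInfoForEstimate(root, clauses, rel->relid, sjinfo);

/* expected decomposition:
 *   rdata->mvclause    = (a = 1 AND b = 2)  -- covered by the stats
 *   rdata->nonmvclause = (d = 3)            -- estimated the usual way
 *   rdata->mvattrs     = {a, b}
 *   rdata->boolop      = AND_EXPR
 */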
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 6837364..7069f60 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3380,8 +3380,7 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo,
- NIL);
+ sjinfo);
/*
* Also get the normal inner-join selectivity of the join clauses.
@@ -3404,8 +3403,7 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
JOIN_INNER,
- &norm_sjinfo,
- NIL);
+ &norm_sjinfo);
/* Avoid leaking a lot of ListCells */
if (jointype == JOIN_ANTI)
@@ -3572,7 +3570,7 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
Node *qual = (Node *) lfirst(l);
/* Note that clause_selectivity will be able to cache its result */
- selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo, NIL);
+ selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo);
}
/* Apply it to the input relation sizes */
@@ -3608,8 +3606,7 @@ set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
rel->baserestrictinfo,
0,
JOIN_INNER,
- NULL,
- NIL);
+ NULL);
rel->rows = clamp_row_est(nrows);
@@ -3646,8 +3643,7 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
allclauses,
rel->relid, /* do not use 0! */
JOIN_INNER,
- NULL,
- NIL);
+ NULL);
nrows = clamp_row_est(nrows);
/* For safety, make sure result is not more than the base estimate */
if (nrows > rel->rows)
@@ -3785,14 +3781,12 @@ calc_joinrel_size_estimate(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo,
- NIL);
+ sjinfo);
pselec = clauselist_selectivity(root,
pushedquals,
0,
jointype,
- sjinfo,
- NIL);
+ sjinfo);
/* Avoid leaking a lot of ListCells */
list_free(joinquals);
@@ -3804,8 +3798,7 @@ calc_joinrel_size_estimate(PlannerInfo *root,
restrictlist,
0,
jointype,
- sjinfo,
- NIL);
+ sjinfo);
pselec = 0.0; /* not used, keep compiler quiet */
}
diff --git a/src/backend/optimizer/util/orclauses.c b/src/backend/optimizer/util/orclauses.c
index e41508b..f0acc14 100644
--- a/src/backend/optimizer/util/orclauses.c
+++ b/src/backend/optimizer/util/orclauses.c
@@ -280,7 +280,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
* saving work later.)
*/
or_selec = clause_selectivity(root, (Node *) or_rinfo,
- 0, JOIN_INNER, NULL, NIL);
+ 0, JOIN_INNER, NULL);
/*
* The clause is only worth adding to the query if it rejects a useful
@@ -342,7 +342,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
/* Compute inner-join size */
orig_selec = clause_selectivity(root, (Node *) join_or_rinfo,
- 0, JOIN_INNER, &sjinfo, NIL);
+ 0, JOIN_INNER, &sjinfo);
/* And hack cached selectivity so join size remains the same */
join_or_rinfo->norm_selec = orig_selec / or_selec;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cba54a4..64b6ae4 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1580,15 +1580,13 @@ booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg,
case IS_NOT_FALSE:
selec = (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo,
- NIL);
+ jointype, sjinfo);
break;
case IS_FALSE:
case IS_NOT_TRUE:
selec = 1.0 - (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo,
- NIL);
+ jointype, sjinfo);
break;
default:
elog(ERROR, "unrecognized booltesttype: %d",
@@ -6208,8 +6206,7 @@ genericcostestimate(PlannerInfo *root,
indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL,
- NIL);
+ NULL);
/*
* If caller didn't give us an estimate, estimate the number of index
@@ -6534,8 +6531,7 @@ btcostestimate(PG_FUNCTION_ARGS)
btreeSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL,
- NIL);
+ NULL);
numIndexTuples = btreeSelectivity * index->rel->tuples;
/*
@@ -7278,8 +7274,7 @@ gincostestimate(PG_FUNCTION_ARGS)
*indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL,
- NIL);
+ NULL);
/* fetch estimated page cost for tablespace containing index */
get_tablespace_page_costs(index->reltablespace,
@@ -7511,7 +7506,7 @@ brincostestimate(PG_FUNCTION_ARGS)
*indexSelectivity =
clauselist_selectivity(root, indexQuals,
path->indexinfo->rel->relid,
- JOIN_INNER, NULL, NIL);
+ JOIN_INNER, NULL);
*indexCorrelation = 1;
/*
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 1dc2932..6e3a0c7 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -44,6 +44,7 @@
#include "utils/rel.h"
#include "utils/syscache.h"
#include "utils/typcache.h"
+#include "utils/mvstats.h"
/* Hook for plugins to get control in get_attavgwidth() */
get_attavgwidth_hook_type get_attavgwidth_hook = NULL;
@@ -1344,6 +1345,45 @@ get_oprjoin(Oid opno)
return (RegProcedure) InvalidOid;
}
+/*
+ * get_oprmvstat
+ *
+ * Returns the mv stats compatibility for computing selectivity.
+ * The return value is a bitwise OR of the MVSTATISTIC_* symbols.
+ */
+int
+get_oprmvstat(Oid opno)
+{
+ HeapTuple tp;
+
+ tp = SearchSysCache1(OPEROID, ObjectIdGetDatum(opno));
+ if (HeapTupleIsValid(tp))
+ {
+ Datum tmp;
+ bool isnull;
+ char *str;
+ int result = 0;
+
+ tmp = SysCacheGetAttr(OPEROID, tp,
+ Anum_pg_operator_oprmvstat, &isnull);
+ if (!isnull)
+ {
+ str = TextDatumGetCString(tmp);
+ if (strlen(str) == 3)
+ {
+ if (str[0] != '-') result |= MVSTATISTIC_MCV;
+ if (str[1] != '-') result |= MVSTATISTIC_HIST;
+ if (str[2] != '-') result |= MVSTATISTIC_FDEP;
+ }
+ }
+ ReleaseSysCache(tp);
+ return result;
+ }
+ else
+ return 0;
+}
+
+
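
For example, a minimal sketch of decoding (per the catalog entries below,
where int4eq is marked "mhf" and int4lt is marked "mh-"):

int kinds = get_oprmvstat(Int4EqualOperator);  /* OID 96, "mhf" */
/* kinds == (MVSTATISTIC_MCV | MVSTATISTIC_HIST | MVSTATISTIC_FDEP) */

kinds = get_oprmvstat(Int4LessOperator);       /* OID 97, "mh-" */
/* kinds == (MVSTATISTIC_MCV | MVSTATISTIC_HIST) */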
/* ---------- FUNCTION CACHE ---------- */
/*
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index 26c9d4e..c75ac72 100644
--- a/src/include/catalog/pg_operator.h
+++ b/src/include/catalog/pg_operator.h
@@ -49,6 +49,9 @@ CATALOG(pg_operator,2617)
regproc oprcode; /* OID of underlying function */
regproc oprrest; /* OID of restriction estimator, or 0 */
regproc oprjoin; /* OID of join estimator, or 0 */
+#ifdef CATALOG_VARLEN /* variable-length fields start here */
+ text oprmvstat; /* MV stat compatibility in '[m-][h-][f-]' */
+#endif
} FormData_pg_operator;
/* ----------------
@@ -63,7 +66,7 @@ typedef FormData_pg_operator *Form_pg_operator;
* ----------------
*/
-#define Natts_pg_operator 14
+#define Natts_pg_operator 15
#define Anum_pg_operator_oprname 1
#define Anum_pg_operator_oprnamespace 2
#define Anum_pg_operator_oprowner 3
@@ -78,6 +81,7 @@ typedef FormData_pg_operator *Form_pg_operator;
#define Anum_pg_operator_oprcode 12
#define Anum_pg_operator_oprrest 13
#define Anum_pg_operator_oprjoin 14
+#define Anum_pg_operator_oprmvstat 15
/* ----------------
* initial contents of pg_operator
@@ -91,1735 +95,1735 @@ typedef FormData_pg_operator *Form_pg_operator;
* for the underlying function.
*/
-DATA(insert OID = 15 ( "=" PGNSP PGUID b t t 23 20 16 416 36 int48eq eqsel eqjoinsel ));
+DATA(insert OID = 15 ( "=" PGNSP PGUID b t t 23 20 16 416 36 int48eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 36 ( "<>" PGNSP PGUID b f f 23 20 16 417 15 int48ne neqsel neqjoinsel ));
+DATA(insert OID = 36 ( "<>" PGNSP PGUID b f f 23 20 16 417 15 int48ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 37 ( "<" PGNSP PGUID b f f 23 20 16 419 82 int48lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 37 ( "<" PGNSP PGUID b f f 23 20 16 419 82 int48lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 76 ( ">" PGNSP PGUID b f f 23 20 16 418 80 int48gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 76 ( ">" PGNSP PGUID b f f 23 20 16 418 80 int48gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 80 ( "<=" PGNSP PGUID b f f 23 20 16 430 76 int48le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 80 ( "<=" PGNSP PGUID b f f 23 20 16 430 76 int48le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 82 ( ">=" PGNSP PGUID b f f 23 20 16 420 37 int48ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 82 ( ">=" PGNSP PGUID b f f 23 20 16 420 37 int48ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 58 ( "<" PGNSP PGUID b f f 16 16 16 59 1695 boollt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 58 ( "<" PGNSP PGUID b f f 16 16 16 59 1695 boollt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 59 ( ">" PGNSP PGUID b f f 16 16 16 58 1694 boolgt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 59 ( ">" PGNSP PGUID b f f 16 16 16 58 1694 boolgt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 85 ( "<>" PGNSP PGUID b f f 16 16 16 85 91 boolne neqsel neqjoinsel ));
+DATA(insert OID = 85 ( "<>" PGNSP PGUID b f f 16 16 16 85 91 boolne neqsel neqjoinsel "mhf"));
DESCR("not equal");
#define BooleanNotEqualOperator 85
-DATA(insert OID = 91 ( "=" PGNSP PGUID b t t 16 16 16 91 85 booleq eqsel eqjoinsel ));
+DATA(insert OID = 91 ( "=" PGNSP PGUID b t t 16 16 16 91 85 booleq eqsel eqjoinsel "mhf"));
DESCR("equal");
#define BooleanEqualOperator 91
-DATA(insert OID = 1694 ( "<=" PGNSP PGUID b f f 16 16 16 1695 59 boolle scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1694 ( "<=" PGNSP PGUID b f f 16 16 16 1695 59 boolle scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1695 ( ">=" PGNSP PGUID b f f 16 16 16 1694 58 boolge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1695 ( ">=" PGNSP PGUID b f f 16 16 16 1694 58 boolge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 92 ( "=" PGNSP PGUID b t t 18 18 16 92 630 chareq eqsel eqjoinsel ));
+DATA(insert OID = 92 ( "=" PGNSP PGUID b t t 18 18 16 92 630 chareq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 93 ( "=" PGNSP PGUID b t t 19 19 16 93 643 nameeq eqsel eqjoinsel ));
+DATA(insert OID = 93 ( "=" PGNSP PGUID b t t 19 19 16 93 643 nameeq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 94 ( "=" PGNSP PGUID b t t 21 21 16 94 519 int2eq eqsel eqjoinsel ));
+DATA(insert OID = 94 ( "=" PGNSP PGUID b t t 21 21 16 94 519 int2eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 95 ( "<" PGNSP PGUID b f f 21 21 16 520 524 int2lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 95 ( "<" PGNSP PGUID b f f 21 21 16 520 524 int2lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 96 ( "=" PGNSP PGUID b t t 23 23 16 96 518 int4eq eqsel eqjoinsel ));
+DATA(insert OID = 96 ( "=" PGNSP PGUID b t t 23 23 16 96 518 int4eq eqsel eqjoinsel "mhf"));
DESCR("equal");
#define Int4EqualOperator 96
-DATA(insert OID = 97 ( "<" PGNSP PGUID b f f 23 23 16 521 525 int4lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 97 ( "<" PGNSP PGUID b f f 23 23 16 521 525 int4lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
#define Int4LessOperator 97
-DATA(insert OID = 98 ( "=" PGNSP PGUID b t t 25 25 16 98 531 texteq eqsel eqjoinsel ));
+DATA(insert OID = 98 ( "=" PGNSP PGUID b t t 25 25 16 98 531 texteq eqsel eqjoinsel "mhf"));
DESCR("equal");
#define TextEqualOperator 98
-DATA(insert OID = 349 ( "||" PGNSP PGUID b f f 2277 2283 2277 0 0 array_append - - ));
+DATA(insert OID = 349 ( "||" PGNSP PGUID b f f 2277 2283 2277 0 0 array_append - - "---"));
DESCR("append element onto end of array");
-DATA(insert OID = 374 ( "||" PGNSP PGUID b f f 2283 2277 2277 0 0 array_prepend - - ));
+DATA(insert OID = 374 ( "||" PGNSP PGUID b f f 2283 2277 2277 0 0 array_prepend - - "---"));
DESCR("prepend element onto front of array");
-DATA(insert OID = 375 ( "||" PGNSP PGUID b f f 2277 2277 2277 0 0 array_cat - - ));
+DATA(insert OID = 375 ( "||" PGNSP PGUID b f f 2277 2277 2277 0 0 array_cat - - "---"));
DESCR("concatenate");
-DATA(insert OID = 352 ( "=" PGNSP PGUID b f t 28 28 16 352 0 xideq eqsel eqjoinsel ));
+DATA(insert OID = 352 ( "=" PGNSP PGUID b f t 28 28 16 352 0 xideq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 353 ( "=" PGNSP PGUID b f f 28 23 16 0 0 xideqint4 eqsel eqjoinsel ));
+DATA(insert OID = 353 ( "=" PGNSP PGUID b f f 28 23 16 0 0 xideqint4 eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 388 ( "!" PGNSP PGUID r f f 20 0 1700 0 0 numeric_fac - - ));
+DATA(insert OID = 388 ( "!" PGNSP PGUID r f f 20 0 1700 0 0 numeric_fac - - "---"));
DESCR("factorial");
-DATA(insert OID = 389 ( "!!" PGNSP PGUID l f f 0 20 1700 0 0 numeric_fac - - ));
+DATA(insert OID = 389 ( "!!" PGNSP PGUID l f f 0 20 1700 0 0 numeric_fac - - "---"));
DESCR("deprecated, use ! instead");
-DATA(insert OID = 385 ( "=" PGNSP PGUID b f t 29 29 16 385 0 cideq eqsel eqjoinsel ));
+DATA(insert OID = 385 ( "=" PGNSP PGUID b f t 29 29 16 385 0 cideq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 386 ( "=" PGNSP PGUID b f t 22 22 16 386 0 int2vectoreq eqsel eqjoinsel ));
+DATA(insert OID = 386 ( "=" PGNSP PGUID b f t 22 22 16 386 0 int2vectoreq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 387 ( "=" PGNSP PGUID b t f 27 27 16 387 402 tideq eqsel eqjoinsel ));
+DATA(insert OID = 387 ( "=" PGNSP PGUID b t f 27 27 16 387 402 tideq eqsel eqjoinsel "mhf"));
DESCR("equal");
#define TIDEqualOperator 387
-DATA(insert OID = 402 ( "<>" PGNSP PGUID b f f 27 27 16 402 387 tidne neqsel neqjoinsel ));
+DATA(insert OID = 402 ( "<>" PGNSP PGUID b f f 27 27 16 402 387 tidne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 2799 ( "<" PGNSP PGUID b f f 27 27 16 2800 2802 tidlt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2799 ( "<" PGNSP PGUID b f f 27 27 16 2800 2802 tidlt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
#define TIDLessOperator 2799
-DATA(insert OID = 2800 ( ">" PGNSP PGUID b f f 27 27 16 2799 2801 tidgt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2800 ( ">" PGNSP PGUID b f f 27 27 16 2799 2801 tidgt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2801 ( "<=" PGNSP PGUID b f f 27 27 16 2802 2800 tidle scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2801 ( "<=" PGNSP PGUID b f f 27 27 16 2802 2800 tidle scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2802 ( ">=" PGNSP PGUID b f f 27 27 16 2801 2799 tidge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2802 ( ">=" PGNSP PGUID b f f 27 27 16 2801 2799 tidge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 410 ( "=" PGNSP PGUID b t t 20 20 16 410 411 int8eq eqsel eqjoinsel ));
+DATA(insert OID = 410 ( "=" PGNSP PGUID b t t 20 20 16 410 411 int8eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 411 ( "<>" PGNSP PGUID b f f 20 20 16 411 410 int8ne neqsel neqjoinsel ));
+DATA(insert OID = 411 ( "<>" PGNSP PGUID b f f 20 20 16 411 410 int8ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 412 ( "<" PGNSP PGUID b f f 20 20 16 413 415 int8lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 412 ( "<" PGNSP PGUID b f f 20 20 16 413 415 int8lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
#define Int8LessOperator 412
-DATA(insert OID = 413 ( ">" PGNSP PGUID b f f 20 20 16 412 414 int8gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 413 ( ">" PGNSP PGUID b f f 20 20 16 412 414 int8gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 414 ( "<=" PGNSP PGUID b f f 20 20 16 415 413 int8le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 414 ( "<=" PGNSP PGUID b f f 20 20 16 415 413 int8le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 415 ( ">=" PGNSP PGUID b f f 20 20 16 414 412 int8ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 415 ( ">=" PGNSP PGUID b f f 20 20 16 414 412 int8ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 416 ( "=" PGNSP PGUID b t t 20 23 16 15 417 int84eq eqsel eqjoinsel ));
+DATA(insert OID = 416 ( "=" PGNSP PGUID b t t 20 23 16 15 417 int84eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 417 ( "<>" PGNSP PGUID b f f 20 23 16 36 416 int84ne neqsel neqjoinsel ));
+DATA(insert OID = 417 ( "<>" PGNSP PGUID b f f 20 23 16 36 416 int84ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 418 ( "<" PGNSP PGUID b f f 20 23 16 76 430 int84lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 418 ( "<" PGNSP PGUID b f f 20 23 16 76 430 int84lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 419 ( ">" PGNSP PGUID b f f 20 23 16 37 420 int84gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 419 ( ">" PGNSP PGUID b f f 20 23 16 37 420 int84gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 420 ( "<=" PGNSP PGUID b f f 20 23 16 82 419 int84le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 420 ( "<=" PGNSP PGUID b f f 20 23 16 82 419 int84le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 430 ( ">=" PGNSP PGUID b f f 20 23 16 80 418 int84ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 430 ( ">=" PGNSP PGUID b f f 20 23 16 80 418 int84ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 439 ( "%" PGNSP PGUID b f f 20 20 20 0 0 int8mod - - ));
+DATA(insert OID = 439 ( "%" PGNSP PGUID b f f 20 20 20 0 0 int8mod - - "---"));
DESCR("modulus");
-DATA(insert OID = 473 ( "@" PGNSP PGUID l f f 0 20 20 0 0 int8abs - - ));
+DATA(insert OID = 473 ( "@" PGNSP PGUID l f f 0 20 20 0 0 int8abs - - "---"));
DESCR("absolute value");
-DATA(insert OID = 484 ( "-" PGNSP PGUID l f f 0 20 20 0 0 int8um - - ));
+DATA(insert OID = 484 ( "-" PGNSP PGUID l f f 0 20 20 0 0 int8um - - "---"));
DESCR("negate");
-DATA(insert OID = 485 ( "<<" PGNSP PGUID b f f 604 604 16 0 0 poly_left positionsel positionjoinsel ));
+DATA(insert OID = 485 ( "<<" PGNSP PGUID b f f 604 604 16 0 0 poly_left positionsel positionjoinsel "---"));
DESCR("is left of");
-DATA(insert OID = 486 ( "&<" PGNSP PGUID b f f 604 604 16 0 0 poly_overleft positionsel positionjoinsel ));
+DATA(insert OID = 486 ( "&<" PGNSP PGUID b f f 604 604 16 0 0 poly_overleft positionsel positionjoinsel "---"));
DESCR("overlaps or is left of");
-DATA(insert OID = 487 ( "&>" PGNSP PGUID b f f 604 604 16 0 0 poly_overright positionsel positionjoinsel ));
+DATA(insert OID = 487 ( "&>" PGNSP PGUID b f f 604 604 16 0 0 poly_overright positionsel positionjoinsel "---"));
DESCR("overlaps or is right of");
-DATA(insert OID = 488 ( ">>" PGNSP PGUID b f f 604 604 16 0 0 poly_right positionsel positionjoinsel ));
+DATA(insert OID = 488 ( ">>" PGNSP PGUID b f f 604 604 16 0 0 poly_right positionsel positionjoinsel "---"));
DESCR("is right of");
-DATA(insert OID = 489 ( "<@" PGNSP PGUID b f f 604 604 16 490 0 poly_contained contsel contjoinsel ));
+DATA(insert OID = 489 ( "<@" PGNSP PGUID b f f 604 604 16 490 0 poly_contained contsel contjoinsel "---"));
DESCR("is contained by");
-DATA(insert OID = 490 ( "@>" PGNSP PGUID b f f 604 604 16 489 0 poly_contain contsel contjoinsel ));
+DATA(insert OID = 490 ( "@>" PGNSP PGUID b f f 604 604 16 489 0 poly_contain contsel contjoinsel "---"));
DESCR("contains");
-DATA(insert OID = 491 ( "~=" PGNSP PGUID b f f 604 604 16 491 0 poly_same eqsel eqjoinsel ));
+DATA(insert OID = 491 ( "~=" PGNSP PGUID b f f 604 604 16 491 0 poly_same eqsel eqjoinsel "mhf"));
DESCR("same as");
-DATA(insert OID = 492 ( "&&" PGNSP PGUID b f f 604 604 16 492 0 poly_overlap areasel areajoinsel ));
+DATA(insert OID = 492 ( "&&" PGNSP PGUID b f f 604 604 16 492 0 poly_overlap areasel areajoinsel "---"));
DESCR("overlaps");
-DATA(insert OID = 493 ( "<<" PGNSP PGUID b f f 603 603 16 0 0 box_left positionsel positionjoinsel ));
+DATA(insert OID = 493 ( "<<" PGNSP PGUID b f f 603 603 16 0 0 box_left positionsel positionjoinsel "---"));
DESCR("is left of");
-DATA(insert OID = 494 ( "&<" PGNSP PGUID b f f 603 603 16 0 0 box_overleft positionsel positionjoinsel ));
+DATA(insert OID = 494 ( "&<" PGNSP PGUID b f f 603 603 16 0 0 box_overleft positionsel positionjoinsel "---"));
DESCR("overlaps or is left of");
-DATA(insert OID = 495 ( "&>" PGNSP PGUID b f f 603 603 16 0 0 box_overright positionsel positionjoinsel ));
+DATA(insert OID = 495 ( "&>" PGNSP PGUID b f f 603 603 16 0 0 box_overright positionsel positionjoinsel "---"));
DESCR("overlaps or is right of");
-DATA(insert OID = 496 ( ">>" PGNSP PGUID b f f 603 603 16 0 0 box_right positionsel positionjoinsel ));
+DATA(insert OID = 496 ( ">>" PGNSP PGUID b f f 603 603 16 0 0 box_right positionsel positionjoinsel "---"));
DESCR("is right of");
-DATA(insert OID = 497 ( "<@" PGNSP PGUID b f f 603 603 16 498 0 box_contained contsel contjoinsel ));
+DATA(insert OID = 497 ( "<@" PGNSP PGUID b f f 603 603 16 498 0 box_contained contsel contjoinsel "---"));
DESCR("is contained by");
-DATA(insert OID = 498 ( "@>" PGNSP PGUID b f f 603 603 16 497 0 box_contain contsel contjoinsel ));
+DATA(insert OID = 498 ( "@>" PGNSP PGUID b f f 603 603 16 497 0 box_contain contsel contjoinsel "---"));
DESCR("contains");
-DATA(insert OID = 499 ( "~=" PGNSP PGUID b f f 603 603 16 499 0 box_same eqsel eqjoinsel ));
+DATA(insert OID = 499 ( "~=" PGNSP PGUID b f f 603 603 16 499 0 box_same eqsel eqjoinsel "mhf"));
DESCR("same as");
-DATA(insert OID = 500 ( "&&" PGNSP PGUID b f f 603 603 16 500 0 box_overlap areasel areajoinsel ));
+DATA(insert OID = 500 ( "&&" PGNSP PGUID b f f 603 603 16 500 0 box_overlap areasel areajoinsel "---"));
DESCR("overlaps");
-DATA(insert OID = 501 ( ">=" PGNSP PGUID b f f 603 603 16 505 504 box_ge areasel areajoinsel ));
+DATA(insert OID = 501 ( ">=" PGNSP PGUID b f f 603 603 16 505 504 box_ge areasel areajoinsel "---"));
DESCR("greater than or equal by area");
-DATA(insert OID = 502 ( ">" PGNSP PGUID b f f 603 603 16 504 505 box_gt areasel areajoinsel ));
+DATA(insert OID = 502 ( ">" PGNSP PGUID b f f 603 603 16 504 505 box_gt areasel areajoinsel "---"));
DESCR("greater than by area");
-DATA(insert OID = 503 ( "=" PGNSP PGUID b f f 603 603 16 503 0 box_eq eqsel eqjoinsel ));
+DATA(insert OID = 503 ( "=" PGNSP PGUID b f f 603 603 16 503 0 box_eq eqsel eqjoinsel "mhf"));
DESCR("equal by area");
-DATA(insert OID = 504 ( "<" PGNSP PGUID b f f 603 603 16 502 501 box_lt areasel areajoinsel ));
+DATA(insert OID = 504 ( "<" PGNSP PGUID b f f 603 603 16 502 501 box_lt areasel areajoinsel "---"));
DESCR("less than by area");
-DATA(insert OID = 505 ( "<=" PGNSP PGUID b f f 603 603 16 501 502 box_le areasel areajoinsel ));
+DATA(insert OID = 505 ( "<=" PGNSP PGUID b f f 603 603 16 501 502 box_le areasel areajoinsel "---"));
DESCR("less than or equal by area");
-DATA(insert OID = 506 ( ">^" PGNSP PGUID b f f 600 600 16 0 0 point_above positionsel positionjoinsel ));
+DATA(insert OID = 506 ( ">^" PGNSP PGUID b f f 600 600 16 0 0 point_above positionsel positionjoinsel "---"));
DESCR("is above");
-DATA(insert OID = 507 ( "<<" PGNSP PGUID b f f 600 600 16 0 0 point_left positionsel positionjoinsel ));
+DATA(insert OID = 507 ( "<<" PGNSP PGUID b f f 600 600 16 0 0 point_left positionsel positionjoinsel "---"));
DESCR("is left of");
-DATA(insert OID = 508 ( ">>" PGNSP PGUID b f f 600 600 16 0 0 point_right positionsel positionjoinsel ));
+DATA(insert OID = 508 ( ">>" PGNSP PGUID b f f 600 600 16 0 0 point_right positionsel positionjoinsel "---"));
DESCR("is right of");
-DATA(insert OID = 509 ( "<^" PGNSP PGUID b f f 600 600 16 0 0 point_below positionsel positionjoinsel ));
+DATA(insert OID = 509 ( "<^" PGNSP PGUID b f f 600 600 16 0 0 point_below positionsel positionjoinsel "---"));
DESCR("is below");
-DATA(insert OID = 510 ( "~=" PGNSP PGUID b f f 600 600 16 510 713 point_eq eqsel eqjoinsel ));
+DATA(insert OID = 510 ( "~=" PGNSP PGUID b f f 600 600 16 510 713 point_eq eqsel eqjoinsel "mhf"));
DESCR("same as");
-DATA(insert OID = 511 ( "<@" PGNSP PGUID b f f 600 603 16 433 0 on_pb contsel contjoinsel ));
+DATA(insert OID = 511 ( "<@" PGNSP PGUID b f f 600 603 16 433 0 on_pb contsel contjoinsel "---"));
DESCR("point inside box");
-DATA(insert OID = 433 ( "@>" PGNSP PGUID b f f 603 600 16 511 0 box_contain_pt contsel contjoinsel ));
+DATA(insert OID = 433 ( "@>" PGNSP PGUID b f f 603 600 16 511 0 box_contain_pt contsel contjoinsel "---"));
DESCR("contains");
-DATA(insert OID = 512 ( "<@" PGNSP PGUID b f f 600 602 16 755 0 on_ppath - - ));
+DATA(insert OID = 512 ( "<@" PGNSP PGUID b f f 600 602 16 755 0 on_ppath - - "---"));
DESCR("point within closed path, or point on open path");
-DATA(insert OID = 513 ( "@@" PGNSP PGUID l f f 0 603 600 0 0 box_center - - ));
+DATA(insert OID = 513 ( "@@" PGNSP PGUID l f f 0 603 600 0 0 box_center - - "---"));
DESCR("center of");
-DATA(insert OID = 514 ( "*" PGNSP PGUID b f f 23 23 23 514 0 int4mul - - ));
+DATA(insert OID = 514 ( "*" PGNSP PGUID b f f 23 23 23 514 0 int4mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 517 ( "<->" PGNSP PGUID b f f 600 600 701 517 0 point_distance - - ));
+DATA(insert OID = 517 ( "<->" PGNSP PGUID b f f 600 600 701 517 0 point_distance - - "---"));
DESCR("distance between");
-DATA(insert OID = 518 ( "<>" PGNSP PGUID b f f 23 23 16 518 96 int4ne neqsel neqjoinsel ));
+DATA(insert OID = 518 ( "<>" PGNSP PGUID b f f 23 23 16 518 96 int4ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 519 ( "<>" PGNSP PGUID b f f 21 21 16 519 94 int2ne neqsel neqjoinsel ));
+DATA(insert OID = 519 ( "<>" PGNSP PGUID b f f 21 21 16 519 94 int2ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 520 ( ">" PGNSP PGUID b f f 21 21 16 95 522 int2gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 520 ( ">" PGNSP PGUID b f f 21 21 16 95 522 int2gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 521 ( ">" PGNSP PGUID b f f 23 23 16 97 523 int4gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 521 ( ">" PGNSP PGUID b f f 23 23 16 97 523 int4gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 522 ( "<=" PGNSP PGUID b f f 21 21 16 524 520 int2le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 522 ( "<=" PGNSP PGUID b f f 21 21 16 524 520 int2le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 523 ( "<=" PGNSP PGUID b f f 23 23 16 525 521 int4le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 523 ( "<=" PGNSP PGUID b f f 23 23 16 525 521 int4le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 524 ( ">=" PGNSP PGUID b f f 21 21 16 522 95 int2ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 524 ( ">=" PGNSP PGUID b f f 21 21 16 522 95 int2ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 525 ( ">=" PGNSP PGUID b f f 23 23 16 523 97 int4ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 525 ( ">=" PGNSP PGUID b f f 23 23 16 523 97 int4ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 526 ( "*" PGNSP PGUID b f f 21 21 21 526 0 int2mul - - ));
+DATA(insert OID = 526 ( "*" PGNSP PGUID b f f 21 21 21 526 0 int2mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 527 ( "/" PGNSP PGUID b f f 21 21 21 0 0 int2div - - ));
+DATA(insert OID = 527 ( "/" PGNSP PGUID b f f 21 21 21 0 0 int2div - - "---"));
DESCR("divide");
-DATA(insert OID = 528 ( "/" PGNSP PGUID b f f 23 23 23 0 0 int4div - - ));
+DATA(insert OID = 528 ( "/" PGNSP PGUID b f f 23 23 23 0 0 int4div - - "---"));
DESCR("divide");
-DATA(insert OID = 529 ( "%" PGNSP PGUID b f f 21 21 21 0 0 int2mod - - ));
+DATA(insert OID = 529 ( "%" PGNSP PGUID b f f 21 21 21 0 0 int2mod - - "---"));
DESCR("modulus");
-DATA(insert OID = 530 ( "%" PGNSP PGUID b f f 23 23 23 0 0 int4mod - - ));
+DATA(insert OID = 530 ( "%" PGNSP PGUID b f f 23 23 23 0 0 int4mod - - "---"));
DESCR("modulus");
-DATA(insert OID = 531 ( "<>" PGNSP PGUID b f f 25 25 16 531 98 textne neqsel neqjoinsel ));
+DATA(insert OID = 531 ( "<>" PGNSP PGUID b f f 25 25 16 531 98 textne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 532 ( "=" PGNSP PGUID b t t 21 23 16 533 538 int24eq eqsel eqjoinsel ));
+DATA(insert OID = 532 ( "=" PGNSP PGUID b t t 21 23 16 533 538 int24eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 533 ( "=" PGNSP PGUID b t t 23 21 16 532 539 int42eq eqsel eqjoinsel ));
+DATA(insert OID = 533 ( "=" PGNSP PGUID b t t 23 21 16 532 539 int42eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 534 ( "<" PGNSP PGUID b f f 21 23 16 537 542 int24lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 534 ( "<" PGNSP PGUID b f f 21 23 16 537 542 int24lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 535 ( "<" PGNSP PGUID b f f 23 21 16 536 543 int42lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 535 ( "<" PGNSP PGUID b f f 23 21 16 536 543 int42lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 536 ( ">" PGNSP PGUID b f f 21 23 16 535 540 int24gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 536 ( ">" PGNSP PGUID b f f 21 23 16 535 540 int24gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 537 ( ">" PGNSP PGUID b f f 23 21 16 534 541 int42gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 537 ( ">" PGNSP PGUID b f f 23 21 16 534 541 int42gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 538 ( "<>" PGNSP PGUID b f f 21 23 16 539 532 int24ne neqsel neqjoinsel ));
+DATA(insert OID = 538 ( "<>" PGNSP PGUID b f f 21 23 16 539 532 int24ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 539 ( "<>" PGNSP PGUID b f f 23 21 16 538 533 int42ne neqsel neqjoinsel ));
+DATA(insert OID = 539 ( "<>" PGNSP PGUID b f f 23 21 16 538 533 int42ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 540 ( "<=" PGNSP PGUID b f f 21 23 16 543 536 int24le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 540 ( "<=" PGNSP PGUID b f f 21 23 16 543 536 int24le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 541 ( "<=" PGNSP PGUID b f f 23 21 16 542 537 int42le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 541 ( "<=" PGNSP PGUID b f f 23 21 16 542 537 int42le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 542 ( ">=" PGNSP PGUID b f f 21 23 16 541 534 int24ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 542 ( ">=" PGNSP PGUID b f f 21 23 16 541 534 int24ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 543 ( ">=" PGNSP PGUID b f f 23 21 16 540 535 int42ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 543 ( ">=" PGNSP PGUID b f f 23 21 16 540 535 int42ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 544 ( "*" PGNSP PGUID b f f 21 23 23 545 0 int24mul - - ));
+DATA(insert OID = 544 ( "*" PGNSP PGUID b f f 21 23 23 545 0 int24mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 545 ( "*" PGNSP PGUID b f f 23 21 23 544 0 int42mul - - ));
+DATA(insert OID = 545 ( "*" PGNSP PGUID b f f 23 21 23 544 0 int42mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 546 ( "/" PGNSP PGUID b f f 21 23 23 0 0 int24div - - ));
+DATA(insert OID = 546 ( "/" PGNSP PGUID b f f 21 23 23 0 0 int24div - - "---"));
DESCR("divide");
-DATA(insert OID = 547 ( "/" PGNSP PGUID b f f 23 21 23 0 0 int42div - - ));
+DATA(insert OID = 547 ( "/" PGNSP PGUID b f f 23 21 23 0 0 int42div - - "---"));
DESCR("divide");
-DATA(insert OID = 550 ( "+" PGNSP PGUID b f f 21 21 21 550 0 int2pl - - ));
+DATA(insert OID = 550 ( "+" PGNSP PGUID b f f 21 21 21 550 0 int2pl - - "---"));
DESCR("add");
-DATA(insert OID = 551 ( "+" PGNSP PGUID b f f 23 23 23 551 0 int4pl - - ));
+DATA(insert OID = 551 ( "+" PGNSP PGUID b f f 23 23 23 551 0 int4pl - - "---"));
DESCR("add");
-DATA(insert OID = 552 ( "+" PGNSP PGUID b f f 21 23 23 553 0 int24pl - - ));
+DATA(insert OID = 552 ( "+" PGNSP PGUID b f f 21 23 23 553 0 int24pl - - "---"));
DESCR("add");
-DATA(insert OID = 553 ( "+" PGNSP PGUID b f f 23 21 23 552 0 int42pl - - ));
+DATA(insert OID = 553 ( "+" PGNSP PGUID b f f 23 21 23 552 0 int42pl - - "---"));
DESCR("add");
-DATA(insert OID = 554 ( "-" PGNSP PGUID b f f 21 21 21 0 0 int2mi - - ));
+DATA(insert OID = 554 ( "-" PGNSP PGUID b f f 21 21 21 0 0 int2mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 555 ( "-" PGNSP PGUID b f f 23 23 23 0 0 int4mi - - ));
+DATA(insert OID = 555 ( "-" PGNSP PGUID b f f 23 23 23 0 0 int4mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 556 ( "-" PGNSP PGUID b f f 21 23 23 0 0 int24mi - - ));
+DATA(insert OID = 556 ( "-" PGNSP PGUID b f f 21 23 23 0 0 int24mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 557 ( "-" PGNSP PGUID b f f 23 21 23 0 0 int42mi - - ));
+DATA(insert OID = 557 ( "-" PGNSP PGUID b f f 23 21 23 0 0 int42mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 558 ( "-" PGNSP PGUID l f f 0 23 23 0 0 int4um - - ));
+DATA(insert OID = 558 ( "-" PGNSP PGUID l f f 0 23 23 0 0 int4um - - "---"));
DESCR("negate");
-DATA(insert OID = 559 ( "-" PGNSP PGUID l f f 0 21 21 0 0 int2um - - ));
+DATA(insert OID = 559 ( "-" PGNSP PGUID l f f 0 21 21 0 0 int2um - - "---"));
DESCR("negate");
-DATA(insert OID = 560 ( "=" PGNSP PGUID b t t 702 702 16 560 561 abstimeeq eqsel eqjoinsel ));
+DATA(insert OID = 560 ( "=" PGNSP PGUID b t t 702 702 16 560 561 abstimeeq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 561 ( "<>" PGNSP PGUID b f f 702 702 16 561 560 abstimene neqsel neqjoinsel ));
+DATA(insert OID = 561 ( "<>" PGNSP PGUID b f f 702 702 16 561 560 abstimene neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 562 ( "<" PGNSP PGUID b f f 702 702 16 563 565 abstimelt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 562 ( "<" PGNSP PGUID b f f 702 702 16 563 565 abstimelt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 563 ( ">" PGNSP PGUID b f f 702 702 16 562 564 abstimegt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 563 ( ">" PGNSP PGUID b f f 702 702 16 562 564 abstimegt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 564 ( "<=" PGNSP PGUID b f f 702 702 16 565 563 abstimele scalarltsel scalarltjoinsel ));
+DATA(insert OID = 564 ( "<=" PGNSP PGUID b f f 702 702 16 565 563 abstimele scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 565 ( ">=" PGNSP PGUID b f f 702 702 16 564 562 abstimege scalargtsel scalargtjoinsel ));
+DATA(insert OID = 565 ( ">=" PGNSP PGUID b f f 702 702 16 564 562 abstimege scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 566 ( "=" PGNSP PGUID b t t 703 703 16 566 567 reltimeeq eqsel eqjoinsel ));
+DATA(insert OID = 566 ( "=" PGNSP PGUID b t t 703 703 16 566 567 reltimeeq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 567 ( "<>" PGNSP PGUID b f f 703 703 16 567 566 reltimene neqsel neqjoinsel ));
+DATA(insert OID = 567 ( "<>" PGNSP PGUID b f f 703 703 16 567 566 reltimene neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 568 ( "<" PGNSP PGUID b f f 703 703 16 569 571 reltimelt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 568 ( "<" PGNSP PGUID b f f 703 703 16 569 571 reltimelt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 569 ( ">" PGNSP PGUID b f f 703 703 16 568 570 reltimegt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 569 ( ">" PGNSP PGUID b f f 703 703 16 568 570 reltimegt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 570 ( "<=" PGNSP PGUID b f f 703 703 16 571 569 reltimele scalarltsel scalarltjoinsel ));
+DATA(insert OID = 570 ( "<=" PGNSP PGUID b f f 703 703 16 571 569 reltimele scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 571 ( ">=" PGNSP PGUID b f f 703 703 16 570 568 reltimege scalargtsel scalargtjoinsel ));
+DATA(insert OID = 571 ( ">=" PGNSP PGUID b f f 703 703 16 570 568 reltimege scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 572 ( "~=" PGNSP PGUID b f f 704 704 16 572 0 tintervalsame eqsel eqjoinsel ));
+DATA(insert OID = 572 ( "~=" PGNSP PGUID b f f 704 704 16 572 0 tintervalsame eqsel eqjoinsel "mhf"));
DESCR("same as");
-DATA(insert OID = 573 ( "<<" PGNSP PGUID b f f 704 704 16 0 0 tintervalct - - ));
+DATA(insert OID = 573 ( "<<" PGNSP PGUID b f f 704 704 16 0 0 tintervalct - - "---"));
DESCR("contains");
-DATA(insert OID = 574 ( "&&" PGNSP PGUID b f f 704 704 16 574 0 tintervalov - - ));
+DATA(insert OID = 574 ( "&&" PGNSP PGUID b f f 704 704 16 574 0 tintervalov - - "---"));
DESCR("overlaps");
-DATA(insert OID = 575 ( "#=" PGNSP PGUID b f f 704 703 16 0 576 tintervalleneq - - ));
+DATA(insert OID = 575 ( "#=" PGNSP PGUID b f f 704 703 16 0 576 tintervalleneq - - "---"));
DESCR("equal by length");
-DATA(insert OID = 576 ( "#<>" PGNSP PGUID b f f 704 703 16 0 575 tintervallenne - - ));
+DATA(insert OID = 576 ( "#<>" PGNSP PGUID b f f 704 703 16 0 575 tintervallenne - - "---"));
DESCR("not equal by length");
-DATA(insert OID = 577 ( "#<" PGNSP PGUID b f f 704 703 16 0 580 tintervallenlt - - ));
+DATA(insert OID = 577 ( "#<" PGNSP PGUID b f f 704 703 16 0 580 tintervallenlt - - "---"));
DESCR("less than by length");
-DATA(insert OID = 578 ( "#>" PGNSP PGUID b f f 704 703 16 0 579 tintervallengt - - ));
+DATA(insert OID = 578 ( "#>" PGNSP PGUID b f f 704 703 16 0 579 tintervallengt - - "---"));
DESCR("greater than by length");
-DATA(insert OID = 579 ( "#<=" PGNSP PGUID b f f 704 703 16 0 578 tintervallenle - - ));
+DATA(insert OID = 579 ( "#<=" PGNSP PGUID b f f 704 703 16 0 578 tintervallenle - - "---"));
DESCR("less than or equal by length");
-DATA(insert OID = 580 ( "#>=" PGNSP PGUID b f f 704 703 16 0 577 tintervallenge - - ));
+DATA(insert OID = 580 ( "#>=" PGNSP PGUID b f f 704 703 16 0 577 tintervallenge - - "---"));
DESCR("greater than or equal by length");
-DATA(insert OID = 581 ( "+" PGNSP PGUID b f f 702 703 702 0 0 timepl - - ));
+DATA(insert OID = 581 ( "+" PGNSP PGUID b f f 702 703 702 0 0 timepl - - "---"));
DESCR("add");
-DATA(insert OID = 582 ( "-" PGNSP PGUID b f f 702 703 702 0 0 timemi - - ));
+DATA(insert OID = 582 ( "-" PGNSP PGUID b f f 702 703 702 0 0 timemi - - "---"));
DESCR("subtract");
-DATA(insert OID = 583 ( "<?>" PGNSP PGUID b f f 702 704 16 0 0 intinterval - - ));
+DATA(insert OID = 583 ( "<?>" PGNSP PGUID b f f 702 704 16 0 0 intinterval - - "---"));
DESCR("is contained by");
-DATA(insert OID = 584 ( "-" PGNSP PGUID l f f 0 700 700 0 0 float4um - - ));
+DATA(insert OID = 584 ( "-" PGNSP PGUID l f f 0 700 700 0 0 float4um - - "---"));
DESCR("negate");
-DATA(insert OID = 585 ( "-" PGNSP PGUID l f f 0 701 701 0 0 float8um - - ));
+DATA(insert OID = 585 ( "-" PGNSP PGUID l f f 0 701 701 0 0 float8um - - "---"));
DESCR("negate");
-DATA(insert OID = 586 ( "+" PGNSP PGUID b f f 700 700 700 586 0 float4pl - - ));
+DATA(insert OID = 586 ( "+" PGNSP PGUID b f f 700 700 700 586 0 float4pl - - "---"));
DESCR("add");
-DATA(insert OID = 587 ( "-" PGNSP PGUID b f f 700 700 700 0 0 float4mi - - ));
+DATA(insert OID = 587 ( "-" PGNSP PGUID b f f 700 700 700 0 0 float4mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 588 ( "/" PGNSP PGUID b f f 700 700 700 0 0 float4div - - ));
+DATA(insert OID = 588 ( "/" PGNSP PGUID b f f 700 700 700 0 0 float4div - - "---"));
DESCR("divide");
-DATA(insert OID = 589 ( "*" PGNSP PGUID b f f 700 700 700 589 0 float4mul - - ));
+DATA(insert OID = 589 ( "*" PGNSP PGUID b f f 700 700 700 589 0 float4mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 590 ( "@" PGNSP PGUID l f f 0 700 700 0 0 float4abs - - ));
+DATA(insert OID = 590 ( "@" PGNSP PGUID l f f 0 700 700 0 0 float4abs - - "---"));
DESCR("absolute value");
-DATA(insert OID = 591 ( "+" PGNSP PGUID b f f 701 701 701 591 0 float8pl - - ));
+DATA(insert OID = 591 ( "+" PGNSP PGUID b f f 701 701 701 591 0 float8pl - - "---"));
DESCR("add");
-DATA(insert OID = 592 ( "-" PGNSP PGUID b f f 701 701 701 0 0 float8mi - - ));
+DATA(insert OID = 592 ( "-" PGNSP PGUID b f f 701 701 701 0 0 float8mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 593 ( "/" PGNSP PGUID b f f 701 701 701 0 0 float8div - - ));
+DATA(insert OID = 593 ( "/" PGNSP PGUID b f f 701 701 701 0 0 float8div - - "---"));
DESCR("divide");
-DATA(insert OID = 594 ( "*" PGNSP PGUID b f f 701 701 701 594 0 float8mul - - ));
+DATA(insert OID = 594 ( "*" PGNSP PGUID b f f 701 701 701 594 0 float8mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 595 ( "@" PGNSP PGUID l f f 0 701 701 0 0 float8abs - - ));
+DATA(insert OID = 595 ( "@" PGNSP PGUID l f f 0 701 701 0 0 float8abs - - "---"));
DESCR("absolute value");
-DATA(insert OID = 596 ( "|/" PGNSP PGUID l f f 0 701 701 0 0 dsqrt - - ));
+DATA(insert OID = 596 ( "|/" PGNSP PGUID l f f 0 701 701 0 0 dsqrt - - "---"));
DESCR("square root");
-DATA(insert OID = 597 ( "||/" PGNSP PGUID l f f 0 701 701 0 0 dcbrt - - ));
+DATA(insert OID = 597 ( "||/" PGNSP PGUID l f f 0 701 701 0 0 dcbrt - - "---"));
DESCR("cube root");
-DATA(insert OID = 1284 ( "|" PGNSP PGUID l f f 0 704 702 0 0 tintervalstart - - ));
+DATA(insert OID = 1284 ( "|" PGNSP PGUID l f f 0 704 702 0 0 tintervalstart - - "---"));
DESCR("start of interval");
-DATA(insert OID = 606 ( "<#>" PGNSP PGUID b f f 702 702 704 0 0 mktinterval - - ));
+DATA(insert OID = 606 ( "<#>" PGNSP PGUID b f f 702 702 704 0 0 mktinterval - - "---"));
DESCR("convert to tinterval");
-DATA(insert OID = 607 ( "=" PGNSP PGUID b t t 26 26 16 607 608 oideq eqsel eqjoinsel ));
+DATA(insert OID = 607 ( "=" PGNSP PGUID b t t 26 26 16 607 608 oideq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 608 ( "<>" PGNSP PGUID b f f 26 26 16 608 607 oidne neqsel neqjoinsel ));
+DATA(insert OID = 608 ( "<>" PGNSP PGUID b f f 26 26 16 608 607 oidne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 609 ( "<" PGNSP PGUID b f f 26 26 16 610 612 oidlt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 609 ( "<" PGNSP PGUID b f f 26 26 16 610 612 oidlt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 610 ( ">" PGNSP PGUID b f f 26 26 16 609 611 oidgt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 610 ( ">" PGNSP PGUID b f f 26 26 16 609 611 oidgt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 611 ( "<=" PGNSP PGUID b f f 26 26 16 612 610 oidle scalarltsel scalarltjoinsel ));
+DATA(insert OID = 611 ( "<=" PGNSP PGUID b f f 26 26 16 612 610 oidle scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 612 ( ">=" PGNSP PGUID b f f 26 26 16 611 609 oidge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 612 ( ">=" PGNSP PGUID b f f 26 26 16 611 609 oidge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 644 ( "<>" PGNSP PGUID b f f 30 30 16 644 649 oidvectorne neqsel neqjoinsel ));
+DATA(insert OID = 644 ( "<>" PGNSP PGUID b f f 30 30 16 644 649 oidvectorne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 645 ( "<" PGNSP PGUID b f f 30 30 16 646 648 oidvectorlt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 645 ( "<" PGNSP PGUID b f f 30 30 16 646 648 oidvectorlt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 646 ( ">" PGNSP PGUID b f f 30 30 16 645 647 oidvectorgt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 646 ( ">" PGNSP PGUID b f f 30 30 16 645 647 oidvectorgt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 647 ( "<=" PGNSP PGUID b f f 30 30 16 648 646 oidvectorle scalarltsel scalarltjoinsel ));
+DATA(insert OID = 647 ( "<=" PGNSP PGUID b f f 30 30 16 648 646 oidvectorle scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 648 ( ">=" PGNSP PGUID b f f 30 30 16 647 645 oidvectorge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 648 ( ">=" PGNSP PGUID b f f 30 30 16 647 645 oidvectorge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 649 ( "=" PGNSP PGUID b t t 30 30 16 649 644 oidvectoreq eqsel eqjoinsel ));
+DATA(insert OID = 649 ( "=" PGNSP PGUID b t t 30 30 16 649 644 oidvectoreq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 613 ( "<->" PGNSP PGUID b f f 600 628 701 0 0 dist_pl - - ));
+DATA(insert OID = 613 ( "<->" PGNSP PGUID b f f 600 628 701 0 0 dist_pl - - "---"));
DESCR("distance between");
-DATA(insert OID = 614 ( "<->" PGNSP PGUID b f f 600 601 701 0 0 dist_ps - - ));
+DATA(insert OID = 614 ( "<->" PGNSP PGUID b f f 600 601 701 0 0 dist_ps - - "---"));
DESCR("distance between");
-DATA(insert OID = 615 ( "<->" PGNSP PGUID b f f 600 603 701 0 0 dist_pb - - ));
+DATA(insert OID = 615 ( "<->" PGNSP PGUID b f f 600 603 701 0 0 dist_pb - - "---"));
DESCR("distance between");
-DATA(insert OID = 616 ( "<->" PGNSP PGUID b f f 601 628 701 0 0 dist_sl - - ));
+DATA(insert OID = 616 ( "<->" PGNSP PGUID b f f 601 628 701 0 0 dist_sl - - "---"));
DESCR("distance between");
-DATA(insert OID = 617 ( "<->" PGNSP PGUID b f f 601 603 701 0 0 dist_sb - - ));
+DATA(insert OID = 617 ( "<->" PGNSP PGUID b f f 601 603 701 0 0 dist_sb - - "---"));
DESCR("distance between");
-DATA(insert OID = 618 ( "<->" PGNSP PGUID b f f 600 602 701 0 0 dist_ppath - - ));
+DATA(insert OID = 618 ( "<->" PGNSP PGUID b f f 600 602 701 0 0 dist_ppath - - "---"));
DESCR("distance between");
-DATA(insert OID = 620 ( "=" PGNSP PGUID b t t 700 700 16 620 621 float4eq eqsel eqjoinsel ));
+DATA(insert OID = 620 ( "=" PGNSP PGUID b t t 700 700 16 620 621 float4eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 621 ( "<>" PGNSP PGUID b f f 700 700 16 621 620 float4ne neqsel neqjoinsel ));
+DATA(insert OID = 621 ( "<>" PGNSP PGUID b f f 700 700 16 621 620 float4ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 622 ( "<" PGNSP PGUID b f f 700 700 16 623 625 float4lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 622 ( "<" PGNSP PGUID b f f 700 700 16 623 625 float4lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 623 ( ">" PGNSP PGUID b f f 700 700 16 622 624 float4gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 623 ( ">" PGNSP PGUID b f f 700 700 16 622 624 float4gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 624 ( "<=" PGNSP PGUID b f f 700 700 16 625 623 float4le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 624 ( "<=" PGNSP PGUID b f f 700 700 16 625 623 float4le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 625 ( ">=" PGNSP PGUID b f f 700 700 16 624 622 float4ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 625 ( ">=" PGNSP PGUID b f f 700 700 16 624 622 float4ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 630 ( "<>" PGNSP PGUID b f f 18 18 16 630 92 charne neqsel neqjoinsel ));
+DATA(insert OID = 630 ( "<>" PGNSP PGUID b f f 18 18 16 630 92 charne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 631 ( "<" PGNSP PGUID b f f 18 18 16 633 634 charlt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 631 ( "<" PGNSP PGUID b f f 18 18 16 633 634 charlt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 632 ( "<=" PGNSP PGUID b f f 18 18 16 634 633 charle scalarltsel scalarltjoinsel ));
+DATA(insert OID = 632 ( "<=" PGNSP PGUID b f f 18 18 16 634 633 charle scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 633 ( ">" PGNSP PGUID b f f 18 18 16 631 632 chargt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 633 ( ">" PGNSP PGUID b f f 18 18 16 631 632 chargt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 634 ( ">=" PGNSP PGUID b f f 18 18 16 632 631 charge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 634 ( ">=" PGNSP PGUID b f f 18 18 16 632 631 charge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 639 ( "~" PGNSP PGUID b f f 19 25 16 0 640 nameregexeq regexeqsel regexeqjoinsel ));
+DATA(insert OID = 639 ( "~" PGNSP PGUID b f f 19 25 16 0 640 nameregexeq regexeqsel regexeqjoinsel "mhf"));
DESCR("matches regular expression, case-sensitive");
#define OID_NAME_REGEXEQ_OP 639
-DATA(insert OID = 640 ( "!~" PGNSP PGUID b f f 19 25 16 0 639 nameregexne regexnesel regexnejoinsel ));
+DATA(insert OID = 640 ( "!~" PGNSP PGUID b f f 19 25 16 0 639 nameregexne regexnesel regexnejoinsel "---"));
DESCR("does not match regular expression, case-sensitive");
-DATA(insert OID = 641 ( "~" PGNSP PGUID b f f 25 25 16 0 642 textregexeq regexeqsel regexeqjoinsel ));
+DATA(insert OID = 641 ( "~" PGNSP PGUID b f f 25 25 16 0 642 textregexeq regexeqsel regexeqjoinsel "mhf"));
DESCR("matches regular expression, case-sensitive");
#define OID_TEXT_REGEXEQ_OP 641
-DATA(insert OID = 642 ( "!~" PGNSP PGUID b f f 25 25 16 0 641 textregexne regexnesel regexnejoinsel ));
+DATA(insert OID = 642 ( "!~" PGNSP PGUID b f f 25 25 16 0 641 textregexne regexnesel regexnejoinsel "---"));
DESCR("does not match regular expression, case-sensitive");
-DATA(insert OID = 643 ( "<>" PGNSP PGUID b f f 19 19 16 643 93 namene neqsel neqjoinsel ));
+DATA(insert OID = 643 ( "<>" PGNSP PGUID b f f 19 19 16 643 93 namene neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 654 ( "||" PGNSP PGUID b f f 25 25 25 0 0 textcat - - ));
+DATA(insert OID = 654 ( "||" PGNSP PGUID b f f 25 25 25 0 0 textcat - - "---"));
DESCR("concatenate");
-DATA(insert OID = 660 ( "<" PGNSP PGUID b f f 19 19 16 662 663 namelt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 660 ( "<" PGNSP PGUID b f f 19 19 16 662 663 namelt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 661 ( "<=" PGNSP PGUID b f f 19 19 16 663 662 namele scalarltsel scalarltjoinsel ));
+DATA(insert OID = 661 ( "<=" PGNSP PGUID b f f 19 19 16 663 662 namele scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 662 ( ">" PGNSP PGUID b f f 19 19 16 660 661 namegt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 662 ( ">" PGNSP PGUID b f f 19 19 16 660 661 namegt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 663 ( ">=" PGNSP PGUID b f f 19 19 16 661 660 namege scalargtsel scalargtjoinsel ));
+DATA(insert OID = 663 ( ">=" PGNSP PGUID b f f 19 19 16 661 660 namege scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 664 ( "<" PGNSP PGUID b f f 25 25 16 666 667 text_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 664 ( "<" PGNSP PGUID b f f 25 25 16 666 667 text_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 665 ( "<=" PGNSP PGUID b f f 25 25 16 667 666 text_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 665 ( "<=" PGNSP PGUID b f f 25 25 16 667 666 text_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 666 ( ">" PGNSP PGUID b f f 25 25 16 664 665 text_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 666 ( ">" PGNSP PGUID b f f 25 25 16 664 665 text_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 667 ( ">=" PGNSP PGUID b f f 25 25 16 665 664 text_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 667 ( ">=" PGNSP PGUID b f f 25 25 16 665 664 text_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 670 ( "=" PGNSP PGUID b t t 701 701 16 670 671 float8eq eqsel eqjoinsel ));
+DATA(insert OID = 670 ( "=" PGNSP PGUID b t t 701 701 16 670 671 float8eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 671 ( "<>" PGNSP PGUID b f f 701 701 16 671 670 float8ne neqsel neqjoinsel ));
+DATA(insert OID = 671 ( "<>" PGNSP PGUID b f f 701 701 16 671 670 float8ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 672 ( "<" PGNSP PGUID b f f 701 701 16 674 675 float8lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 672 ( "<" PGNSP PGUID b f f 701 701 16 674 675 float8lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
#define Float8LessOperator 672
-DATA(insert OID = 673 ( "<=" PGNSP PGUID b f f 701 701 16 675 674 float8le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 673 ( "<=" PGNSP PGUID b f f 701 701 16 675 674 float8le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 674 ( ">" PGNSP PGUID b f f 701 701 16 672 673 float8gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 674 ( ">" PGNSP PGUID b f f 701 701 16 672 673 float8gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 675 ( ">=" PGNSP PGUID b f f 701 701 16 673 672 float8ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 675 ( ">=" PGNSP PGUID b f f 701 701 16 673 672 float8ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 682 ( "@" PGNSP PGUID l f f 0 21 21 0 0 int2abs - - ));
+DATA(insert OID = 682 ( "@" PGNSP PGUID l f f 0 21 21 0 0 int2abs - - "---"));
DESCR("absolute value");
-DATA(insert OID = 684 ( "+" PGNSP PGUID b f f 20 20 20 684 0 int8pl - - ));
+DATA(insert OID = 684 ( "+" PGNSP PGUID b f f 20 20 20 684 0 int8pl - - "---"));
DESCR("add");
-DATA(insert OID = 685 ( "-" PGNSP PGUID b f f 20 20 20 0 0 int8mi - - ));
+DATA(insert OID = 685 ( "-" PGNSP PGUID b f f 20 20 20 0 0 int8mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 686 ( "*" PGNSP PGUID b f f 20 20 20 686 0 int8mul - - ));
+DATA(insert OID = 686 ( "*" PGNSP PGUID b f f 20 20 20 686 0 int8mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 687 ( "/" PGNSP PGUID b f f 20 20 20 0 0 int8div - - ));
+DATA(insert OID = 687 ( "/" PGNSP PGUID b f f 20 20 20 0 0 int8div - - "---"));
DESCR("divide");
-DATA(insert OID = 688 ( "+" PGNSP PGUID b f f 20 23 20 692 0 int84pl - - ));
+DATA(insert OID = 688 ( "+" PGNSP PGUID b f f 20 23 20 692 0 int84pl - - "---"));
DESCR("add");
-DATA(insert OID = 689 ( "-" PGNSP PGUID b f f 20 23 20 0 0 int84mi - - ));
+DATA(insert OID = 689 ( "-" PGNSP PGUID b f f 20 23 20 0 0 int84mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 690 ( "*" PGNSP PGUID b f f 20 23 20 694 0 int84mul - - ));
+DATA(insert OID = 690 ( "*" PGNSP PGUID b f f 20 23 20 694 0 int84mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 691 ( "/" PGNSP PGUID b f f 20 23 20 0 0 int84div - - ));
+DATA(insert OID = 691 ( "/" PGNSP PGUID b f f 20 23 20 0 0 int84div - - "---"));
DESCR("divide");
-DATA(insert OID = 692 ( "+" PGNSP PGUID b f f 23 20 20 688 0 int48pl - - ));
+DATA(insert OID = 692 ( "+" PGNSP PGUID b f f 23 20 20 688 0 int48pl - - "---"));
DESCR("add");
-DATA(insert OID = 693 ( "-" PGNSP PGUID b f f 23 20 20 0 0 int48mi - - ));
+DATA(insert OID = 693 ( "-" PGNSP PGUID b f f 23 20 20 0 0 int48mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 694 ( "*" PGNSP PGUID b f f 23 20 20 690 0 int48mul - - ));
+DATA(insert OID = 694 ( "*" PGNSP PGUID b f f 23 20 20 690 0 int48mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 695 ( "/" PGNSP PGUID b f f 23 20 20 0 0 int48div - - ));
+DATA(insert OID = 695 ( "/" PGNSP PGUID b f f 23 20 20 0 0 int48div - - "---"));
DESCR("divide");
-DATA(insert OID = 818 ( "+" PGNSP PGUID b f f 20 21 20 822 0 int82pl - - ));
+DATA(insert OID = 818 ( "+" PGNSP PGUID b f f 20 21 20 822 0 int82pl - - "---"));
DESCR("add");
-DATA(insert OID = 819 ( "-" PGNSP PGUID b f f 20 21 20 0 0 int82mi - - ));
+DATA(insert OID = 819 ( "-" PGNSP PGUID b f f 20 21 20 0 0 int82mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 820 ( "*" PGNSP PGUID b f f 20 21 20 824 0 int82mul - - ));
+DATA(insert OID = 820 ( "*" PGNSP PGUID b f f 20 21 20 824 0 int82mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 821 ( "/" PGNSP PGUID b f f 20 21 20 0 0 int82div - - ));
+DATA(insert OID = 821 ( "/" PGNSP PGUID b f f 20 21 20 0 0 int82div - - "---"));
DESCR("divide");
-DATA(insert OID = 822 ( "+" PGNSP PGUID b f f 21 20 20 818 0 int28pl - - ));
+DATA(insert OID = 822 ( "+" PGNSP PGUID b f f 21 20 20 818 0 int28pl - - "---"));
DESCR("add");
-DATA(insert OID = 823 ( "-" PGNSP PGUID b f f 21 20 20 0 0 int28mi - - ));
+DATA(insert OID = 823 ( "-" PGNSP PGUID b f f 21 20 20 0 0 int28mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 824 ( "*" PGNSP PGUID b f f 21 20 20 820 0 int28mul - - ));
+DATA(insert OID = 824 ( "*" PGNSP PGUID b f f 21 20 20 820 0 int28mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 825 ( "/" PGNSP PGUID b f f 21 20 20 0 0 int28div - - ));
+DATA(insert OID = 825 ( "/" PGNSP PGUID b f f 21 20 20 0 0 int28div - - "---"));
DESCR("divide");
-DATA(insert OID = 706 ( "<->" PGNSP PGUID b f f 603 603 701 706 0 box_distance - - ));
+DATA(insert OID = 706 ( "<->" PGNSP PGUID b f f 603 603 701 706 0 box_distance - - "---"));
DESCR("distance between");
-DATA(insert OID = 707 ( "<->" PGNSP PGUID b f f 602 602 701 707 0 path_distance - - ));
+DATA(insert OID = 707 ( "<->" PGNSP PGUID b f f 602 602 701 707 0 path_distance - - "---"));
DESCR("distance between");
-DATA(insert OID = 708 ( "<->" PGNSP PGUID b f f 628 628 701 708 0 line_distance - - ));
+DATA(insert OID = 708 ( "<->" PGNSP PGUID b f f 628 628 701 708 0 line_distance - - "---"));
DESCR("distance between");
-DATA(insert OID = 709 ( "<->" PGNSP PGUID b f f 601 601 701 709 0 lseg_distance - - ));
+DATA(insert OID = 709 ( "<->" PGNSP PGUID b f f 601 601 701 709 0 lseg_distance - - "---"));
DESCR("distance between");
-DATA(insert OID = 712 ( "<->" PGNSP PGUID b f f 604 604 701 712 0 poly_distance - - ));
+DATA(insert OID = 712 ( "<->" PGNSP PGUID b f f 604 604 701 712 0 poly_distance - - "---"));
DESCR("distance between");
-DATA(insert OID = 713 ( "<>" PGNSP PGUID b f f 600 600 16 713 510 point_ne neqsel neqjoinsel ));
+DATA(insert OID = 713 ( "<>" PGNSP PGUID b f f 600 600 16 713 510 point_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
/* add translation/rotation/scaling operators for geometric types. - thomas 97/05/10 */
-DATA(insert OID = 731 ( "+" PGNSP PGUID b f f 600 600 600 731 0 point_add - - ));
+DATA(insert OID = 731 ( "+" PGNSP PGUID b f f 600 600 600 731 0 point_add - - "---"));
DESCR("add points (translate)");
-DATA(insert OID = 732 ( "-" PGNSP PGUID b f f 600 600 600 0 0 point_sub - - ));
+DATA(insert OID = 732 ( "-" PGNSP PGUID b f f 600 600 600 0 0 point_sub - - "---"));
DESCR("subtract points (translate)");
-DATA(insert OID = 733 ( "*" PGNSP PGUID b f f 600 600 600 733 0 point_mul - - ));
+DATA(insert OID = 733 ( "*" PGNSP PGUID b f f 600 600 600 733 0 point_mul - - "---"));
DESCR("multiply points (scale/rotate)");
-DATA(insert OID = 734 ( "/" PGNSP PGUID b f f 600 600 600 0 0 point_div - - ));
+DATA(insert OID = 734 ( "/" PGNSP PGUID b f f 600 600 600 0 0 point_div - - "---"));
DESCR("divide points (scale/rotate)");
-DATA(insert OID = 735 ( "+" PGNSP PGUID b f f 602 602 602 735 0 path_add - - ));
+DATA(insert OID = 735 ( "+" PGNSP PGUID b f f 602 602 602 735 0 path_add - - "---"));
DESCR("concatenate");
-DATA(insert OID = 736 ( "+" PGNSP PGUID b f f 602 600 602 0 0 path_add_pt - - ));
+DATA(insert OID = 736 ( "+" PGNSP PGUID b f f 602 600 602 0 0 path_add_pt - - "---"));
DESCR("add (translate path)");
-DATA(insert OID = 737 ( "-" PGNSP PGUID b f f 602 600 602 0 0 path_sub_pt - - ));
+DATA(insert OID = 737 ( "-" PGNSP PGUID b f f 602 600 602 0 0 path_sub_pt - - "---"));
DESCR("subtract (translate path)");
-DATA(insert OID = 738 ( "*" PGNSP PGUID b f f 602 600 602 0 0 path_mul_pt - - ));
+DATA(insert OID = 738 ( "*" PGNSP PGUID b f f 602 600 602 0 0 path_mul_pt - - "---"));
DESCR("multiply (rotate/scale path)");
-DATA(insert OID = 739 ( "/" PGNSP PGUID b f f 602 600 602 0 0 path_div_pt - - ));
+DATA(insert OID = 739 ( "/" PGNSP PGUID b f f 602 600 602 0 0 path_div_pt - - "---"));
DESCR("divide (rotate/scale path)");
-DATA(insert OID = 755 ( "@>" PGNSP PGUID b f f 602 600 16 512 0 path_contain_pt - - ));
+DATA(insert OID = 755 ( "@>" PGNSP PGUID b f f 602 600 16 512 0 path_contain_pt - - "---"));
DESCR("contains");
-DATA(insert OID = 756 ( "<@" PGNSP PGUID b f f 600 604 16 757 0 pt_contained_poly contsel contjoinsel ));
+DATA(insert OID = 756 ( "<@" PGNSP PGUID b f f 600 604 16 757 0 pt_contained_poly contsel contjoinsel "---"));
DESCR("is contained by");
-DATA(insert OID = 757 ( "@>" PGNSP PGUID b f f 604 600 16 756 0 poly_contain_pt contsel contjoinsel ));
+DATA(insert OID = 757 ( "@>" PGNSP PGUID b f f 604 600 16 756 0 poly_contain_pt contsel contjoinsel "---"));
DESCR("contains");
-DATA(insert OID = 758 ( "<@" PGNSP PGUID b f f 600 718 16 759 0 pt_contained_circle contsel contjoinsel ));
+DATA(insert OID = 758 ( "<@" PGNSP PGUID b f f 600 718 16 759 0 pt_contained_circle contsel contjoinsel "---"));
DESCR("is contained by");
-DATA(insert OID = 759 ( "@>" PGNSP PGUID b f f 718 600 16 758 0 circle_contain_pt contsel contjoinsel ));
+DATA(insert OID = 759 ( "@>" PGNSP PGUID b f f 718 600 16 758 0 circle_contain_pt contsel contjoinsel "---"));
DESCR("contains");
-DATA(insert OID = 773 ( "@" PGNSP PGUID l f f 0 23 23 0 0 int4abs - - ));
+DATA(insert OID = 773 ( "@" PGNSP PGUID l f f 0 23 23 0 0 int4abs - - "---"));
DESCR("absolute value");
/* additional operators for geometric types - thomas 1997-07-09 */
-DATA(insert OID = 792 ( "=" PGNSP PGUID b f f 602 602 16 792 0 path_n_eq eqsel eqjoinsel ));
+DATA(insert OID = 792 ( "=" PGNSP PGUID b f f 602 602 16 792 0 path_n_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 793 ( "<" PGNSP PGUID b f f 602 602 16 794 0 path_n_lt - - ));
+DATA(insert OID = 793 ( "<" PGNSP PGUID b f f 602 602 16 794 0 path_n_lt - - "---"));
DESCR("less than");
-DATA(insert OID = 794 ( ">" PGNSP PGUID b f f 602 602 16 793 0 path_n_gt - - ));
+DATA(insert OID = 794 ( ">" PGNSP PGUID b f f 602 602 16 793 0 path_n_gt - - "---"));
DESCR("greater than");
-DATA(insert OID = 795 ( "<=" PGNSP PGUID b f f 602 602 16 796 0 path_n_le - - ));
+DATA(insert OID = 795 ( "<=" PGNSP PGUID b f f 602 602 16 796 0 path_n_le - - "---"));
DESCR("less than or equal");
-DATA(insert OID = 796 ( ">=" PGNSP PGUID b f f 602 602 16 795 0 path_n_ge - - ));
+DATA(insert OID = 796 ( ">=" PGNSP PGUID b f f 602 602 16 795 0 path_n_ge - - "---"));
DESCR("greater than or equal");
-DATA(insert OID = 797 ( "#" PGNSP PGUID l f f 0 602 23 0 0 path_npoints - - ));
+DATA(insert OID = 797 ( "#" PGNSP PGUID l f f 0 602 23 0 0 path_npoints - - "---"));
DESCR("number of points");
-DATA(insert OID = 798 ( "?#" PGNSP PGUID b f f 602 602 16 0 0 path_inter - - ));
+DATA(insert OID = 798 ( "?#" PGNSP PGUID b f f 602 602 16 0 0 path_inter - - "---"));
DESCR("intersect");
-DATA(insert OID = 799 ( "@-@" PGNSP PGUID l f f 0 602 701 0 0 path_length - - ));
+DATA(insert OID = 799 ( "@-@" PGNSP PGUID l f f 0 602 701 0 0 path_length - - "---"));
DESCR("sum of path segment lengths");
-DATA(insert OID = 800 ( ">^" PGNSP PGUID b f f 603 603 16 0 0 box_above_eq positionsel positionjoinsel ));
+DATA(insert OID = 800 ( ">^" PGNSP PGUID b f f 603 603 16 0 0 box_above_eq positionsel positionjoinsel "---"));
DESCR("is above (allows touching)");
-DATA(insert OID = 801 ( "<^" PGNSP PGUID b f f 603 603 16 0 0 box_below_eq positionsel positionjoinsel ));
+DATA(insert OID = 801 ( "<^" PGNSP PGUID b f f 603 603 16 0 0 box_below_eq positionsel positionjoinsel "---"));
DESCR("is below (allows touching)");
-DATA(insert OID = 802 ( "?#" PGNSP PGUID b f f 603 603 16 0 0 box_overlap areasel areajoinsel ));
+DATA(insert OID = 802 ( "?#" PGNSP PGUID b f f 603 603 16 0 0 box_overlap areasel areajoinsel "---"));
DESCR("deprecated, use && instead");
-DATA(insert OID = 803 ( "#" PGNSP PGUID b f f 603 603 603 0 0 box_intersect - - ));
+DATA(insert OID = 803 ( "#" PGNSP PGUID b f f 603 603 603 0 0 box_intersect - - "---"));
DESCR("box intersection");
-DATA(insert OID = 804 ( "+" PGNSP PGUID b f f 603 600 603 0 0 box_add - - ));
+DATA(insert OID = 804 ( "+" PGNSP PGUID b f f 603 600 603 0 0 box_add - - "---"));
DESCR("add point to box (translate)");
-DATA(insert OID = 805 ( "-" PGNSP PGUID b f f 603 600 603 0 0 box_sub - - ));
+DATA(insert OID = 805 ( "-" PGNSP PGUID b f f 603 600 603 0 0 box_sub - - "---"));
DESCR("subtract point from box (translate)");
-DATA(insert OID = 806 ( "*" PGNSP PGUID b f f 603 600 603 0 0 box_mul - - ));
+DATA(insert OID = 806 ( "*" PGNSP PGUID b f f 603 600 603 0 0 box_mul - - "---"));
DESCR("multiply box by point (scale)");
-DATA(insert OID = 807 ( "/" PGNSP PGUID b f f 603 600 603 0 0 box_div - - ));
+DATA(insert OID = 807 ( "/" PGNSP PGUID b f f 603 600 603 0 0 box_div - - "---"));
DESCR("divide box by point (scale)");
-DATA(insert OID = 808 ( "?-" PGNSP PGUID b f f 600 600 16 808 0 point_horiz - - ));
+DATA(insert OID = 808 ( "?-" PGNSP PGUID b f f 600 600 16 808 0 point_horiz - - "---"));
DESCR("horizontally aligned");
-DATA(insert OID = 809 ( "?|" PGNSP PGUID b f f 600 600 16 809 0 point_vert - - ));
+DATA(insert OID = 809 ( "?|" PGNSP PGUID b f f 600 600 16 809 0 point_vert - - "---"));
DESCR("vertically aligned");
-DATA(insert OID = 811 ( "=" PGNSP PGUID b t f 704 704 16 811 812 tintervaleq eqsel eqjoinsel ));
+DATA(insert OID = 811 ( "=" PGNSP PGUID b t f 704 704 16 811 812 tintervaleq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 812 ( "<>" PGNSP PGUID b f f 704 704 16 812 811 tintervalne neqsel neqjoinsel ));
+DATA(insert OID = 812 ( "<>" PGNSP PGUID b f f 704 704 16 812 811 tintervalne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 813 ( "<" PGNSP PGUID b f f 704 704 16 814 816 tintervallt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 813 ( "<" PGNSP PGUID b f f 704 704 16 814 816 tintervallt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 814 ( ">" PGNSP PGUID b f f 704 704 16 813 815 tintervalgt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 814 ( ">" PGNSP PGUID b f f 704 704 16 813 815 tintervalgt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 815 ( "<=" PGNSP PGUID b f f 704 704 16 816 814 tintervalle scalarltsel scalarltjoinsel ));
+DATA(insert OID = 815 ( "<=" PGNSP PGUID b f f 704 704 16 816 814 tintervalle scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 816 ( ">=" PGNSP PGUID b f f 704 704 16 815 813 tintervalge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 816 ( ">=" PGNSP PGUID b f f 704 704 16 815 813 tintervalge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 843 ( "*" PGNSP PGUID b f f 790 700 790 845 0 cash_mul_flt4 - - ));
+DATA(insert OID = 843 ( "*" PGNSP PGUID b f f 790 700 790 845 0 cash_mul_flt4 - - "---"));
DESCR("multiply");
-DATA(insert OID = 844 ( "/" PGNSP PGUID b f f 790 700 790 0 0 cash_div_flt4 - - ));
+DATA(insert OID = 844 ( "/" PGNSP PGUID b f f 790 700 790 0 0 cash_div_flt4 - - "---"));
DESCR("divide");
-DATA(insert OID = 845 ( "*" PGNSP PGUID b f f 700 790 790 843 0 flt4_mul_cash - - ));
+DATA(insert OID = 845 ( "*" PGNSP PGUID b f f 700 790 790 843 0 flt4_mul_cash - - "---"));
DESCR("multiply");
-DATA(insert OID = 900 ( "=" PGNSP PGUID b t f 790 790 16 900 901 cash_eq eqsel eqjoinsel ));
+DATA(insert OID = 900 ( "=" PGNSP PGUID b t f 790 790 16 900 901 cash_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 901 ( "<>" PGNSP PGUID b f f 790 790 16 901 900 cash_ne neqsel neqjoinsel ));
+DATA(insert OID = 901 ( "<>" PGNSP PGUID b f f 790 790 16 901 900 cash_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 902 ( "<" PGNSP PGUID b f f 790 790 16 903 905 cash_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 902 ( "<" PGNSP PGUID b f f 790 790 16 903 905 cash_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 903 ( ">" PGNSP PGUID b f f 790 790 16 902 904 cash_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 903 ( ">" PGNSP PGUID b f f 790 790 16 902 904 cash_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 904 ( "<=" PGNSP PGUID b f f 790 790 16 905 903 cash_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 904 ( "<=" PGNSP PGUID b f f 790 790 16 905 903 cash_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 905 ( ">=" PGNSP PGUID b f f 790 790 16 904 902 cash_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 905 ( ">=" PGNSP PGUID b f f 790 790 16 904 902 cash_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 906 ( "+" PGNSP PGUID b f f 790 790 790 906 0 cash_pl - - ));
+DATA(insert OID = 906 ( "+" PGNSP PGUID b f f 790 790 790 906 0 cash_pl - - "---"));
DESCR("add");
-DATA(insert OID = 907 ( "-" PGNSP PGUID b f f 790 790 790 0 0 cash_mi - - ));
+DATA(insert OID = 907 ( "-" PGNSP PGUID b f f 790 790 790 0 0 cash_mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 908 ( "*" PGNSP PGUID b f f 790 701 790 916 0 cash_mul_flt8 - - ));
+DATA(insert OID = 908 ( "*" PGNSP PGUID b f f 790 701 790 916 0 cash_mul_flt8 - - "---"));
DESCR("multiply");
-DATA(insert OID = 909 ( "/" PGNSP PGUID b f f 790 701 790 0 0 cash_div_flt8 - - ));
+DATA(insert OID = 909 ( "/" PGNSP PGUID b f f 790 701 790 0 0 cash_div_flt8 - - "---"));
DESCR("divide");
-DATA(insert OID = 912 ( "*" PGNSP PGUID b f f 790 23 790 917 0 cash_mul_int4 - - ));
+DATA(insert OID = 912 ( "*" PGNSP PGUID b f f 790 23 790 917 0 cash_mul_int4 - - "---"));
DESCR("multiply");
-DATA(insert OID = 913 ( "/" PGNSP PGUID b f f 790 23 790 0 0 cash_div_int4 - - ));
+DATA(insert OID = 913 ( "/" PGNSP PGUID b f f 790 23 790 0 0 cash_div_int4 - - "---"));
DESCR("divide");
-DATA(insert OID = 914 ( "*" PGNSP PGUID b f f 790 21 790 918 0 cash_mul_int2 - - ));
+DATA(insert OID = 914 ( "*" PGNSP PGUID b f f 790 21 790 918 0 cash_mul_int2 - - "---"));
DESCR("multiply");
-DATA(insert OID = 915 ( "/" PGNSP PGUID b f f 790 21 790 0 0 cash_div_int2 - - ));
+DATA(insert OID = 915 ( "/" PGNSP PGUID b f f 790 21 790 0 0 cash_div_int2 - - "---"));
DESCR("divide");
-DATA(insert OID = 916 ( "*" PGNSP PGUID b f f 701 790 790 908 0 flt8_mul_cash - - ));
+DATA(insert OID = 916 ( "*" PGNSP PGUID b f f 701 790 790 908 0 flt8_mul_cash - - "---"));
DESCR("multiply");
-DATA(insert OID = 917 ( "*" PGNSP PGUID b f f 23 790 790 912 0 int4_mul_cash - - ));
+DATA(insert OID = 917 ( "*" PGNSP PGUID b f f 23 790 790 912 0 int4_mul_cash - - "---"));
DESCR("multiply");
-DATA(insert OID = 918 ( "*" PGNSP PGUID b f f 21 790 790 914 0 int2_mul_cash - - ));
+DATA(insert OID = 918 ( "*" PGNSP PGUID b f f 21 790 790 914 0 int2_mul_cash - - "---"));
DESCR("multiply");
-DATA(insert OID = 3825 ( "/" PGNSP PGUID b f f 790 790 701 0 0 cash_div_cash - - ));
+DATA(insert OID = 3825 ( "/" PGNSP PGUID b f f 790 790 701 0 0 cash_div_cash - - "---"));
DESCR("divide");
-DATA(insert OID = 965 ( "^" PGNSP PGUID b f f 701 701 701 0 0 dpow - - ));
+DATA(insert OID = 965 ( "^" PGNSP PGUID b f f 701 701 701 0 0 dpow - - "---"));
DESCR("exponentiation");
-DATA(insert OID = 966 ( "+" PGNSP PGUID b f f 1034 1033 1034 0 0 aclinsert - - ));
+DATA(insert OID = 966 ( "+" PGNSP PGUID b f f 1034 1033 1034 0 0 aclinsert - - "---"));
DESCR("add/update ACL item");
-DATA(insert OID = 967 ( "-" PGNSP PGUID b f f 1034 1033 1034 0 0 aclremove - - ));
+DATA(insert OID = 967 ( "-" PGNSP PGUID b f f 1034 1033 1034 0 0 aclremove - - "---"));
DESCR("remove ACL item");
-DATA(insert OID = 968 ( "@>" PGNSP PGUID b f f 1034 1033 16 0 0 aclcontains - - ));
+DATA(insert OID = 968 ( "@>" PGNSP PGUID b f f 1034 1033 16 0 0 aclcontains - - "---"));
DESCR("contains");
-DATA(insert OID = 974 ( "=" PGNSP PGUID b f t 1033 1033 16 974 0 aclitemeq eqsel eqjoinsel ));
+DATA(insert OID = 974 ( "=" PGNSP PGUID b f t 1033 1033 16 974 0 aclitemeq eqsel eqjoinsel "mhf"));
DESCR("equal");
/* additional geometric operators - thomas 1997-07-09 */
-DATA(insert OID = 969 ( "@@" PGNSP PGUID l f f 0 601 600 0 0 lseg_center - - ));
+DATA(insert OID = 969 ( "@@" PGNSP PGUID l f f 0 601 600 0 0 lseg_center - - "---"));
DESCR("center of");
-DATA(insert OID = 970 ( "@@" PGNSP PGUID l f f 0 602 600 0 0 path_center - - ));
+DATA(insert OID = 970 ( "@@" PGNSP PGUID l f f 0 602 600 0 0 path_center - - "---"));
DESCR("center of");
-DATA(insert OID = 971 ( "@@" PGNSP PGUID l f f 0 604 600 0 0 poly_center - - ));
+DATA(insert OID = 971 ( "@@" PGNSP PGUID l f f 0 604 600 0 0 poly_center - - "---"));
DESCR("center of");
-DATA(insert OID = 1054 ( "=" PGNSP PGUID b t t 1042 1042 16 1054 1057 bpchareq eqsel eqjoinsel ));
+DATA(insert OID = 1054 ( "=" PGNSP PGUID b t t 1042 1042 16 1054 1057 bpchareq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1055 ( "~" PGNSP PGUID b f f 1042 25 16 0 1056 bpcharregexeq regexeqsel regexeqjoinsel ));
+DATA(insert OID = 1055 ( "~" PGNSP PGUID b f f 1042 25 16 0 1056 bpcharregexeq regexeqsel regexeqjoinsel "mhf"));
DESCR("matches regular expression, case-sensitive");
#define OID_BPCHAR_REGEXEQ_OP 1055
-DATA(insert OID = 1056 ( "!~" PGNSP PGUID b f f 1042 25 16 0 1055 bpcharregexne regexnesel regexnejoinsel ));
+DATA(insert OID = 1056 ( "!~" PGNSP PGUID b f f 1042 25 16 0 1055 bpcharregexne regexnesel regexnejoinsel "---"));
DESCR("does not match regular expression, case-sensitive");
-DATA(insert OID = 1057 ( "<>" PGNSP PGUID b f f 1042 1042 16 1057 1054 bpcharne neqsel neqjoinsel ));
+DATA(insert OID = 1057 ( "<>" PGNSP PGUID b f f 1042 1042 16 1057 1054 bpcharne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1058 ( "<" PGNSP PGUID b f f 1042 1042 16 1060 1061 bpcharlt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1058 ( "<" PGNSP PGUID b f f 1042 1042 16 1060 1061 bpcharlt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1059 ( "<=" PGNSP PGUID b f f 1042 1042 16 1061 1060 bpcharle scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1059 ( "<=" PGNSP PGUID b f f 1042 1042 16 1061 1060 bpcharle scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1060 ( ">" PGNSP PGUID b f f 1042 1042 16 1058 1059 bpchargt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1060 ( ">" PGNSP PGUID b f f 1042 1042 16 1058 1059 bpchargt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1061 ( ">=" PGNSP PGUID b f f 1042 1042 16 1059 1058 bpcharge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1061 ( ">=" PGNSP PGUID b f f 1042 1042 16 1059 1058 bpcharge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/* generic array comparison operators */
-DATA(insert OID = 1070 ( "=" PGNSP PGUID b t t 2277 2277 16 1070 1071 array_eq eqsel eqjoinsel ));
+DATA(insert OID = 1070 ( "=" PGNSP PGUID b t t 2277 2277 16 1070 1071 array_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
#define ARRAY_EQ_OP 1070
-DATA(insert OID = 1071 ( "<>" PGNSP PGUID b f f 2277 2277 16 1071 1070 array_ne neqsel neqjoinsel ));
+DATA(insert OID = 1071 ( "<>" PGNSP PGUID b f f 2277 2277 16 1071 1070 array_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1072 ( "<" PGNSP PGUID b f f 2277 2277 16 1073 1075 array_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1072 ( "<" PGNSP PGUID b f f 2277 2277 16 1073 1075 array_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
#define ARRAY_LT_OP 1072
-DATA(insert OID = 1073 ( ">" PGNSP PGUID b f f 2277 2277 16 1072 1074 array_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1073 ( ">" PGNSP PGUID b f f 2277 2277 16 1072 1074 array_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
#define ARRAY_GT_OP 1073
-DATA(insert OID = 1074 ( "<=" PGNSP PGUID b f f 2277 2277 16 1075 1073 array_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1074 ( "<=" PGNSP PGUID b f f 2277 2277 16 1075 1073 array_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1075 ( ">=" PGNSP PGUID b f f 2277 2277 16 1074 1072 array_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1075 ( ">=" PGNSP PGUID b f f 2277 2277 16 1074 1072 array_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/* date operators */
-DATA(insert OID = 1076 ( "+" PGNSP PGUID b f f 1082 1186 1114 2551 0 date_pl_interval - - ));
+DATA(insert OID = 1076 ( "+" PGNSP PGUID b f f 1082 1186 1114 2551 0 date_pl_interval - - "---"));
DESCR("add");
-DATA(insert OID = 1077 ( "-" PGNSP PGUID b f f 1082 1186 1114 0 0 date_mi_interval - - ));
+DATA(insert OID = 1077 ( "-" PGNSP PGUID b f f 1082 1186 1114 0 0 date_mi_interval - - "---"));
DESCR("subtract");
-DATA(insert OID = 1093 ( "=" PGNSP PGUID b t t 1082 1082 16 1093 1094 date_eq eqsel eqjoinsel ));
+DATA(insert OID = 1093 ( "=" PGNSP PGUID b t t 1082 1082 16 1093 1094 date_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1094 ( "<>" PGNSP PGUID b f f 1082 1082 16 1094 1093 date_ne neqsel neqjoinsel ));
+DATA(insert OID = 1094 ( "<>" PGNSP PGUID b f f 1082 1082 16 1094 1093 date_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1095 ( "<" PGNSP PGUID b f f 1082 1082 16 1097 1098 date_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1095 ( "<" PGNSP PGUID b f f 1082 1082 16 1097 1098 date_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1096 ( "<=" PGNSP PGUID b f f 1082 1082 16 1098 1097 date_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1096 ( "<=" PGNSP PGUID b f f 1082 1082 16 1098 1097 date_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1097 ( ">" PGNSP PGUID b f f 1082 1082 16 1095 1096 date_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1097 ( ">" PGNSP PGUID b f f 1082 1082 16 1095 1096 date_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1098 ( ">=" PGNSP PGUID b f f 1082 1082 16 1096 1095 date_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1098 ( ">=" PGNSP PGUID b f f 1082 1082 16 1096 1095 date_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 1099 ( "-" PGNSP PGUID b f f 1082 1082 23 0 0 date_mi - - ));
+DATA(insert OID = 1099 ( "-" PGNSP PGUID b f f 1082 1082 23 0 0 date_mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 1100 ( "+" PGNSP PGUID b f f 1082 23 1082 2555 0 date_pli - - ));
+DATA(insert OID = 1100 ( "+" PGNSP PGUID b f f 1082 23 1082 2555 0 date_pli - - "---"));
DESCR("add");
-DATA(insert OID = 1101 ( "-" PGNSP PGUID b f f 1082 23 1082 0 0 date_mii - - ));
+DATA(insert OID = 1101 ( "-" PGNSP PGUID b f f 1082 23 1082 0 0 date_mii - - "---"));
DESCR("subtract");
/* time operators */
-DATA(insert OID = 1108 ( "=" PGNSP PGUID b t t 1083 1083 16 1108 1109 time_eq eqsel eqjoinsel ));
+DATA(insert OID = 1108 ( "=" PGNSP PGUID b t t 1083 1083 16 1108 1109 time_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1109 ( "<>" PGNSP PGUID b f f 1083 1083 16 1109 1108 time_ne neqsel neqjoinsel ));
+DATA(insert OID = 1109 ( "<>" PGNSP PGUID b f f 1083 1083 16 1109 1108 time_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1110 ( "<" PGNSP PGUID b f f 1083 1083 16 1112 1113 time_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1110 ( "<" PGNSP PGUID b f f 1083 1083 16 1112 1113 time_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1111 ( "<=" PGNSP PGUID b f f 1083 1083 16 1113 1112 time_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1111 ( "<=" PGNSP PGUID b f f 1083 1083 16 1113 1112 time_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1112 ( ">" PGNSP PGUID b f f 1083 1083 16 1110 1111 time_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1112 ( ">" PGNSP PGUID b f f 1083 1083 16 1110 1111 time_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1113 ( ">=" PGNSP PGUID b f f 1083 1083 16 1111 1110 time_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1113 ( ">=" PGNSP PGUID b f f 1083 1083 16 1111 1110 time_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/* timetz operators */
-DATA(insert OID = 1550 ( "=" PGNSP PGUID b t t 1266 1266 16 1550 1551 timetz_eq eqsel eqjoinsel ));
+DATA(insert OID = 1550 ( "=" PGNSP PGUID b t t 1266 1266 16 1550 1551 timetz_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1551 ( "<>" PGNSP PGUID b f f 1266 1266 16 1551 1550 timetz_ne neqsel neqjoinsel ));
+DATA(insert OID = 1551 ( "<>" PGNSP PGUID b f f 1266 1266 16 1551 1550 timetz_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1552 ( "<" PGNSP PGUID b f f 1266 1266 16 1554 1555 timetz_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1552 ( "<" PGNSP PGUID b f f 1266 1266 16 1554 1555 timetz_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1553 ( "<=" PGNSP PGUID b f f 1266 1266 16 1555 1554 timetz_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1553 ( "<=" PGNSP PGUID b f f 1266 1266 16 1555 1554 timetz_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1554 ( ">" PGNSP PGUID b f f 1266 1266 16 1552 1553 timetz_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1554 ( ">" PGNSP PGUID b f f 1266 1266 16 1552 1553 timetz_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1555 ( ">=" PGNSP PGUID b f f 1266 1266 16 1553 1552 timetz_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1555 ( ">=" PGNSP PGUID b f f 1266 1266 16 1553 1552 timetz_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/* float48 operators */
-DATA(insert OID = 1116 ( "+" PGNSP PGUID b f f 700 701 701 1126 0 float48pl - - ));
+DATA(insert OID = 1116 ( "+" PGNSP PGUID b f f 700 701 701 1126 0 float48pl - - "---"));
DESCR("add");
-DATA(insert OID = 1117 ( "-" PGNSP PGUID b f f 700 701 701 0 0 float48mi - - ));
+DATA(insert OID = 1117 ( "-" PGNSP PGUID b f f 700 701 701 0 0 float48mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 1118 ( "/" PGNSP PGUID b f f 700 701 701 0 0 float48div - - ));
+DATA(insert OID = 1118 ( "/" PGNSP PGUID b f f 700 701 701 0 0 float48div - - "---"));
DESCR("divide");
-DATA(insert OID = 1119 ( "*" PGNSP PGUID b f f 700 701 701 1129 0 float48mul - - ));
+DATA(insert OID = 1119 ( "*" PGNSP PGUID b f f 700 701 701 1129 0 float48mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 1120 ( "=" PGNSP PGUID b t t 700 701 16 1130 1121 float48eq eqsel eqjoinsel ));
+DATA(insert OID = 1120 ( "=" PGNSP PGUID b t t 700 701 16 1130 1121 float48eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1121 ( "<>" PGNSP PGUID b f f 700 701 16 1131 1120 float48ne neqsel neqjoinsel ));
+DATA(insert OID = 1121 ( "<>" PGNSP PGUID b f f 700 701 16 1131 1120 float48ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1122 ( "<" PGNSP PGUID b f f 700 701 16 1133 1125 float48lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1122 ( "<" PGNSP PGUID b f f 700 701 16 1133 1125 float48lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1123 ( ">" PGNSP PGUID b f f 700 701 16 1132 1124 float48gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1123 ( ">" PGNSP PGUID b f f 700 701 16 1132 1124 float48gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1124 ( "<=" PGNSP PGUID b f f 700 701 16 1135 1123 float48le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1124 ( "<=" PGNSP PGUID b f f 700 701 16 1135 1123 float48le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1125 ( ">=" PGNSP PGUID b f f 700 701 16 1134 1122 float48ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1125 ( ">=" PGNSP PGUID b f f 700 701 16 1134 1122 float48ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/* float84 operators */
-DATA(insert OID = 1126 ( "+" PGNSP PGUID b f f 701 700 701 1116 0 float84pl - - ));
+DATA(insert OID = 1126 ( "+" PGNSP PGUID b f f 701 700 701 1116 0 float84pl - - "---"));
DESCR("add");
-DATA(insert OID = 1127 ( "-" PGNSP PGUID b f f 701 700 701 0 0 float84mi - - ));
+DATA(insert OID = 1127 ( "-" PGNSP PGUID b f f 701 700 701 0 0 float84mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 1128 ( "/" PGNSP PGUID b f f 701 700 701 0 0 float84div - - ));
+DATA(insert OID = 1128 ( "/" PGNSP PGUID b f f 701 700 701 0 0 float84div - - "---"));
DESCR("divide");
-DATA(insert OID = 1129 ( "*" PGNSP PGUID b f f 701 700 701 1119 0 float84mul - - ));
+DATA(insert OID = 1129 ( "*" PGNSP PGUID b f f 701 700 701 1119 0 float84mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 1130 ( "=" PGNSP PGUID b t t 701 700 16 1120 1131 float84eq eqsel eqjoinsel ));
+DATA(insert OID = 1130 ( "=" PGNSP PGUID b t t 701 700 16 1120 1131 float84eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1131 ( "<>" PGNSP PGUID b f f 701 700 16 1121 1130 float84ne neqsel neqjoinsel ));
+DATA(insert OID = 1131 ( "<>" PGNSP PGUID b f f 701 700 16 1121 1130 float84ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1132 ( "<" PGNSP PGUID b f f 701 700 16 1123 1135 float84lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1132 ( "<" PGNSP PGUID b f f 701 700 16 1123 1135 float84lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1133 ( ">" PGNSP PGUID b f f 701 700 16 1122 1134 float84gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1133 ( ">" PGNSP PGUID b f f 701 700 16 1122 1134 float84gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1134 ( "<=" PGNSP PGUID b f f 701 700 16 1125 1133 float84le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1134 ( "<=" PGNSP PGUID b f f 701 700 16 1125 1133 float84le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1135 ( ">=" PGNSP PGUID b f f 701 700 16 1124 1132 float84ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1135 ( ">=" PGNSP PGUID b f f 701 700 16 1124 1132 float84ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/* LIKE hacks by Keith Parks. */
-DATA(insert OID = 1207 ( "~~" PGNSP PGUID b f f 19 25 16 0 1208 namelike likesel likejoinsel ));
+DATA(insert OID = 1207 ( "~~" PGNSP PGUID b f f 19 25 16 0 1208 namelike likesel likejoinsel "---"));
DESCR("matches LIKE expression");
#define OID_NAME_LIKE_OP 1207
-DATA(insert OID = 1208 ( "!~~" PGNSP PGUID b f f 19 25 16 0 1207 namenlike nlikesel nlikejoinsel ));
+DATA(insert OID = 1208 ( "!~~" PGNSP PGUID b f f 19 25 16 0 1207 namenlike nlikesel nlikejoinsel "---"));
DESCR("does not match LIKE expression");
-DATA(insert OID = 1209 ( "~~" PGNSP PGUID b f f 25 25 16 0 1210 textlike likesel likejoinsel ));
+DATA(insert OID = 1209 ( "~~" PGNSP PGUID b f f 25 25 16 0 1210 textlike likesel likejoinsel "---"));
DESCR("matches LIKE expression");
#define OID_TEXT_LIKE_OP 1209
-DATA(insert OID = 1210 ( "!~~" PGNSP PGUID b f f 25 25 16 0 1209 textnlike nlikesel nlikejoinsel ));
+DATA(insert OID = 1210 ( "!~~" PGNSP PGUID b f f 25 25 16 0 1209 textnlike nlikesel nlikejoinsel "---"));
DESCR("does not match LIKE expression");
-DATA(insert OID = 1211 ( "~~" PGNSP PGUID b f f 1042 25 16 0 1212 bpcharlike likesel likejoinsel ));
+DATA(insert OID = 1211 ( "~~" PGNSP PGUID b f f 1042 25 16 0 1212 bpcharlike likesel likejoinsel "---"));
DESCR("matches LIKE expression");
#define OID_BPCHAR_LIKE_OP 1211
-DATA(insert OID = 1212 ( "!~~" PGNSP PGUID b f f 1042 25 16 0 1211 bpcharnlike nlikesel nlikejoinsel ));
+DATA(insert OID = 1212 ( "!~~" PGNSP PGUID b f f 1042 25 16 0 1211 bpcharnlike nlikesel nlikejoinsel "---"));
DESCR("does not match LIKE expression");
/* case-insensitive regex hacks */
-DATA(insert OID = 1226 ( "~*" PGNSP PGUID b f f 19 25 16 0 1227 nameicregexeq icregexeqsel icregexeqjoinsel ));
+DATA(insert OID = 1226 ( "~*" PGNSP PGUID b f f 19 25 16 0 1227 nameicregexeq icregexeqsel icregexeqjoinsel "---"));
DESCR("matches regular expression, case-insensitive");
#define OID_NAME_ICREGEXEQ_OP 1226
-DATA(insert OID = 1227 ( "!~*" PGNSP PGUID b f f 19 25 16 0 1226 nameicregexne icregexnesel icregexnejoinsel ));
+DATA(insert OID = 1227 ( "!~*" PGNSP PGUID b f f 19 25 16 0 1226 nameicregexne icregexnesel icregexnejoinsel "---"));
DESCR("does not match regular expression, case-insensitive");
-DATA(insert OID = 1228 ( "~*" PGNSP PGUID b f f 25 25 16 0 1229 texticregexeq icregexeqsel icregexeqjoinsel ));
+DATA(insert OID = 1228 ( "~*" PGNSP PGUID b f f 25 25 16 0 1229 texticregexeq icregexeqsel icregexeqjoinsel "---"));
DESCR("matches regular expression, case-insensitive");
#define OID_TEXT_ICREGEXEQ_OP 1228
-DATA(insert OID = 1229 ( "!~*" PGNSP PGUID b f f 25 25 16 0 1228 texticregexne icregexnesel icregexnejoinsel ));
+DATA(insert OID = 1229 ( "!~*" PGNSP PGUID b f f 25 25 16 0 1228 texticregexne icregexnesel icregexnejoinsel "---"));
DESCR("does not match regular expression, case-insensitive");
-DATA(insert OID = 1234 ( "~*" PGNSP PGUID b f f 1042 25 16 0 1235 bpcharicregexeq icregexeqsel icregexeqjoinsel ));
+DATA(insert OID = 1234 ( "~*" PGNSP PGUID b f f 1042 25 16 0 1235 bpcharicregexeq icregexeqsel icregexeqjoinsel "---"));
DESCR("matches regular expression, case-insensitive");
#define OID_BPCHAR_ICREGEXEQ_OP 1234
-DATA(insert OID = 1235 ( "!~*" PGNSP PGUID b f f 1042 25 16 0 1234 bpcharicregexne icregexnesel icregexnejoinsel ));
+DATA(insert OID = 1235 ( "!~*" PGNSP PGUID b f f 1042 25 16 0 1234 bpcharicregexne icregexnesel icregexnejoinsel "---"));
DESCR("does not match regular expression, case-insensitive");
/* timestamptz operators */
-DATA(insert OID = 1320 ( "=" PGNSP PGUID b t t 1184 1184 16 1320 1321 timestamptz_eq eqsel eqjoinsel ));
+DATA(insert OID = 1320 ( "=" PGNSP PGUID b t t 1184 1184 16 1320 1321 timestamptz_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1321 ( "<>" PGNSP PGUID b f f 1184 1184 16 1321 1320 timestamptz_ne neqsel neqjoinsel ));
+DATA(insert OID = 1321 ( "<>" PGNSP PGUID b f f 1184 1184 16 1321 1320 timestamptz_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1322 ( "<" PGNSP PGUID b f f 1184 1184 16 1324 1325 timestamptz_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1322 ( "<" PGNSP PGUID b f f 1184 1184 16 1324 1325 timestamptz_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1323 ( "<=" PGNSP PGUID b f f 1184 1184 16 1325 1324 timestamptz_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1323 ( "<=" PGNSP PGUID b f f 1184 1184 16 1325 1324 timestamptz_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1324 ( ">" PGNSP PGUID b f f 1184 1184 16 1322 1323 timestamptz_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1324 ( ">" PGNSP PGUID b f f 1184 1184 16 1322 1323 timestamptz_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1325 ( ">=" PGNSP PGUID b f f 1184 1184 16 1323 1322 timestamptz_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1325 ( ">=" PGNSP PGUID b f f 1184 1184 16 1323 1322 timestamptz_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 1327 ( "+" PGNSP PGUID b f f 1184 1186 1184 2554 0 timestamptz_pl_interval - - ));
+DATA(insert OID = 1327 ( "+" PGNSP PGUID b f f 1184 1186 1184 2554 0 timestamptz_pl_interval - - "---"));
DESCR("add");
-DATA(insert OID = 1328 ( "-" PGNSP PGUID b f f 1184 1184 1186 0 0 timestamptz_mi - - ));
+DATA(insert OID = 1328 ( "-" PGNSP PGUID b f f 1184 1184 1186 0 0 timestamptz_mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 1329 ( "-" PGNSP PGUID b f f 1184 1186 1184 0 0 timestamptz_mi_interval - - ));
+DATA(insert OID = 1329 ( "-" PGNSP PGUID b f f 1184 1186 1184 0 0 timestamptz_mi_interval - - "---"));
DESCR("subtract");
/* interval operators */
-DATA(insert OID = 1330 ( "=" PGNSP PGUID b t t 1186 1186 16 1330 1331 interval_eq eqsel eqjoinsel ));
+DATA(insert OID = 1330 ( "=" PGNSP PGUID b t t 1186 1186 16 1330 1331 interval_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1331 ( "<>" PGNSP PGUID b f f 1186 1186 16 1331 1330 interval_ne neqsel neqjoinsel ));
+DATA(insert OID = 1331 ( "<>" PGNSP PGUID b f f 1186 1186 16 1331 1330 interval_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1332 ( "<" PGNSP PGUID b f f 1186 1186 16 1334 1335 interval_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1332 ( "<" PGNSP PGUID b f f 1186 1186 16 1334 1335 interval_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1333 ( "<=" PGNSP PGUID b f f 1186 1186 16 1335 1334 interval_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1333 ( "<=" PGNSP PGUID b f f 1186 1186 16 1335 1334 interval_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1334 ( ">" PGNSP PGUID b f f 1186 1186 16 1332 1333 interval_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1334 ( ">" PGNSP PGUID b f f 1186 1186 16 1332 1333 interval_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1335 ( ">=" PGNSP PGUID b f f 1186 1186 16 1333 1332 interval_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1335 ( ">=" PGNSP PGUID b f f 1186 1186 16 1333 1332 interval_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 1336 ( "-" PGNSP PGUID l f f 0 1186 1186 0 0 interval_um - - ));
+DATA(insert OID = 1336 ( "-" PGNSP PGUID l f f 0 1186 1186 0 0 interval_um - - "---"));
DESCR("negate");
-DATA(insert OID = 1337 ( "+" PGNSP PGUID b f f 1186 1186 1186 1337 0 interval_pl - - ));
+DATA(insert OID = 1337 ( "+" PGNSP PGUID b f f 1186 1186 1186 1337 0 interval_pl - - "---"));
DESCR("add");
-DATA(insert OID = 1338 ( "-" PGNSP PGUID b f f 1186 1186 1186 0 0 interval_mi - - ));
+DATA(insert OID = 1338 ( "-" PGNSP PGUID b f f 1186 1186 1186 0 0 interval_mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 1360 ( "+" PGNSP PGUID b f f 1082 1083 1114 1363 0 datetime_pl - - ));
+DATA(insert OID = 1360 ( "+" PGNSP PGUID b f f 1082 1083 1114 1363 0 datetime_pl - - "---"));
DESCR("convert date and time to timestamp");
-DATA(insert OID = 1361 ( "+" PGNSP PGUID b f f 1082 1266 1184 1366 0 datetimetz_pl - - ));
+DATA(insert OID = 1361 ( "+" PGNSP PGUID b f f 1082 1266 1184 1366 0 datetimetz_pl - - "---"));
DESCR("convert date and time with time zone to timestamp with time zone");
-DATA(insert OID = 1363 ( "+" PGNSP PGUID b f f 1083 1082 1114 1360 0 timedate_pl - - ));
+DATA(insert OID = 1363 ( "+" PGNSP PGUID b f f 1083 1082 1114 1360 0 timedate_pl - - "---"));
DESCR("convert time and date to timestamp");
-DATA(insert OID = 1366 ( "+" PGNSP PGUID b f f 1266 1082 1184 1361 0 timetzdate_pl - - ));
+DATA(insert OID = 1366 ( "+" PGNSP PGUID b f f 1266 1082 1184 1361 0 timetzdate_pl - - "---"));
DESCR("convert time with time zone and date to timestamp with time zone");
-DATA(insert OID = 1399 ( "-" PGNSP PGUID b f f 1083 1083 1186 0 0 time_mi_time - - ));
+DATA(insert OID = 1399 ( "-" PGNSP PGUID b f f 1083 1083 1186 0 0 time_mi_time - - "---"));
DESCR("subtract");
/* additional geometric operators - thomas 97/04/18 */
-DATA(insert OID = 1420 ( "@@" PGNSP PGUID l f f 0 718 600 0 0 circle_center - - ));
+DATA(insert OID = 1420 ( "@@" PGNSP PGUID l f f 0 718 600 0 0 circle_center - - "---"));
DESCR("center of");
-DATA(insert OID = 1500 ( "=" PGNSP PGUID b f f 718 718 16 1500 1501 circle_eq eqsel eqjoinsel ));
+DATA(insert OID = 1500 ( "=" PGNSP PGUID b f f 718 718 16 1500 1501 circle_eq eqsel eqjoinsel "mhf"));
DESCR("equal by area");
-DATA(insert OID = 1501 ( "<>" PGNSP PGUID b f f 718 718 16 1501 1500 circle_ne neqsel neqjoinsel ));
+DATA(insert OID = 1501 ( "<>" PGNSP PGUID b f f 718 718 16 1501 1500 circle_ne neqsel neqjoinsel "mhf"));
DESCR("not equal by area");
-DATA(insert OID = 1502 ( "<" PGNSP PGUID b f f 718 718 16 1503 1505 circle_lt areasel areajoinsel ));
+DATA(insert OID = 1502 ( "<" PGNSP PGUID b f f 718 718 16 1503 1505 circle_lt areasel areajoinsel "---"));
DESCR("less than by area");
-DATA(insert OID = 1503 ( ">" PGNSP PGUID b f f 718 718 16 1502 1504 circle_gt areasel areajoinsel ));
+DATA(insert OID = 1503 ( ">" PGNSP PGUID b f f 718 718 16 1502 1504 circle_gt areasel areajoinsel "---"));
DESCR("greater than by area");
-DATA(insert OID = 1504 ( "<=" PGNSP PGUID b f f 718 718 16 1505 1503 circle_le areasel areajoinsel ));
+DATA(insert OID = 1504 ( "<=" PGNSP PGUID b f f 718 718 16 1505 1503 circle_le areasel areajoinsel "---"));
DESCR("less than or equal by area");
-DATA(insert OID = 1505 ( ">=" PGNSP PGUID b f f 718 718 16 1504 1502 circle_ge areasel areajoinsel ));
+DATA(insert OID = 1505 ( ">=" PGNSP PGUID b f f 718 718 16 1504 1502 circle_ge areasel areajoinsel "---"));
DESCR("greater than or equal by area");
-DATA(insert OID = 1506 ( "<<" PGNSP PGUID b f f 718 718 16 0 0 circle_left positionsel positionjoinsel ));
+DATA(insert OID = 1506 ( "<<" PGNSP PGUID b f f 718 718 16 0 0 circle_left positionsel positionjoinsel "---"));
DESCR("is left of");
-DATA(insert OID = 1507 ( "&<" PGNSP PGUID b f f 718 718 16 0 0 circle_overleft positionsel positionjoinsel ));
+DATA(insert OID = 1507 ( "&<" PGNSP PGUID b f f 718 718 16 0 0 circle_overleft positionsel positionjoinsel "---"));
DESCR("overlaps or is left of");
-DATA(insert OID = 1508 ( "&>" PGNSP PGUID b f f 718 718 16 0 0 circle_overright positionsel positionjoinsel ));
+DATA(insert OID = 1508 ( "&>" PGNSP PGUID b f f 718 718 16 0 0 circle_overright positionsel positionjoinsel "---"));
DESCR("overlaps or is right of");
-DATA(insert OID = 1509 ( ">>" PGNSP PGUID b f f 718 718 16 0 0 circle_right positionsel positionjoinsel ));
+DATA(insert OID = 1509 ( ">>" PGNSP PGUID b f f 718 718 16 0 0 circle_right positionsel positionjoinsel "---"));
DESCR("is right of");
-DATA(insert OID = 1510 ( "<@" PGNSP PGUID b f f 718 718 16 1511 0 circle_contained contsel contjoinsel ));
+DATA(insert OID = 1510 ( "<@" PGNSP PGUID b f f 718 718 16 1511 0 circle_contained contsel contjoinsel "---"));
DESCR("is contained by");
-DATA(insert OID = 1511 ( "@>" PGNSP PGUID b f f 718 718 16 1510 0 circle_contain contsel contjoinsel ));
+DATA(insert OID = 1511 ( "@>" PGNSP PGUID b f f 718 718 16 1510 0 circle_contain contsel contjoinsel "---"));
DESCR("contains");
-DATA(insert OID = 1512 ( "~=" PGNSP PGUID b f f 718 718 16 1512 0 circle_same eqsel eqjoinsel ));
+DATA(insert OID = 1512 ( "~=" PGNSP PGUID b f f 718 718 16 1512 0 circle_same eqsel eqjoinsel "mhf"));
DESCR("same as");
-DATA(insert OID = 1513 ( "&&" PGNSP PGUID b f f 718 718 16 1513 0 circle_overlap areasel areajoinsel ));
+DATA(insert OID = 1513 ( "&&" PGNSP PGUID b f f 718 718 16 1513 0 circle_overlap areasel areajoinsel "---"));
DESCR("overlaps");
-DATA(insert OID = 1514 ( "|>>" PGNSP PGUID b f f 718 718 16 0 0 circle_above positionsel positionjoinsel ));
+DATA(insert OID = 1514 ( "|>>" PGNSP PGUID b f f 718 718 16 0 0 circle_above positionsel positionjoinsel "---"));
DESCR("is above");
-DATA(insert OID = 1515 ( "<<|" PGNSP PGUID b f f 718 718 16 0 0 circle_below positionsel positionjoinsel ));
+DATA(insert OID = 1515 ( "<<|" PGNSP PGUID b f f 718 718 16 0 0 circle_below positionsel positionjoinsel "---"));
DESCR("is below");
-DATA(insert OID = 1516 ( "+" PGNSP PGUID b f f 718 600 718 0 0 circle_add_pt - - ));
+DATA(insert OID = 1516 ( "+" PGNSP PGUID b f f 718 600 718 0 0 circle_add_pt - - "---"));
DESCR("add");
-DATA(insert OID = 1517 ( "-" PGNSP PGUID b f f 718 600 718 0 0 circle_sub_pt - - ));
+DATA(insert OID = 1517 ( "-" PGNSP PGUID b f f 718 600 718 0 0 circle_sub_pt - - "---"));
DESCR("subtract");
-DATA(insert OID = 1518 ( "*" PGNSP PGUID b f f 718 600 718 0 0 circle_mul_pt - - ));
+DATA(insert OID = 1518 ( "*" PGNSP PGUID b f f 718 600 718 0 0 circle_mul_pt - - "---"));
DESCR("multiply");
-DATA(insert OID = 1519 ( "/" PGNSP PGUID b f f 718 600 718 0 0 circle_div_pt - - ));
+DATA(insert OID = 1519 ( "/" PGNSP PGUID b f f 718 600 718 0 0 circle_div_pt - - "---"));
DESCR("divide");
-DATA(insert OID = 1520 ( "<->" PGNSP PGUID b f f 718 718 701 1520 0 circle_distance - - ));
+DATA(insert OID = 1520 ( "<->" PGNSP PGUID b f f 718 718 701 1520 0 circle_distance - - "---"));
DESCR("distance between");
-DATA(insert OID = 1521 ( "#" PGNSP PGUID l f f 0 604 23 0 0 poly_npoints - - ));
+DATA(insert OID = 1521 ( "#" PGNSP PGUID l f f 0 604 23 0 0 poly_npoints - - "---"));
DESCR("number of points");
-DATA(insert OID = 1522 ( "<->" PGNSP PGUID b f f 600 718 701 3291 0 dist_pc - - ));
+DATA(insert OID = 1522 ( "<->" PGNSP PGUID b f f 600 718 701 3291 0 dist_pc - - "---"));
DESCR("distance between");
-DATA(insert OID = 3291 ( "<->" PGNSP PGUID b f f 718 600 701 1522 0 dist_cpoint - - ));
+DATA(insert OID = 3291 ( "<->" PGNSP PGUID b f f 718 600 701 1522 0 dist_cpoint - - "---"));
DESCR("distance between");
-DATA(insert OID = 3276 ( "<->" PGNSP PGUID b f f 600 604 701 3289 0 dist_ppoly - - ));
+DATA(insert OID = 3276 ( "<->" PGNSP PGUID b f f 600 604 701 3289 0 dist_ppoly - - "---"));
DESCR("distance between");
-DATA(insert OID = 3289 ( "<->" PGNSP PGUID b f f 604 600 701 3276 0 dist_polyp - - ));
+DATA(insert OID = 3289 ( "<->" PGNSP PGUID b f f 604 600 701 3276 0 dist_polyp - - "---"));
DESCR("distance between");
-DATA(insert OID = 1523 ( "<->" PGNSP PGUID b f f 718 604 701 0 0 dist_cpoly - - ));
+DATA(insert OID = 1523 ( "<->" PGNSP PGUID b f f 718 604 701 0 0 dist_cpoly - - "---"));
DESCR("distance between");
/* additional geometric operators - thomas 1997-07-09 */
-DATA(insert OID = 1524 ( "<->" PGNSP PGUID b f f 628 603 701 0 0 dist_lb - - ));
+DATA(insert OID = 1524 ( "<->" PGNSP PGUID b f f 628 603 701 0 0 dist_lb - - "---"));
DESCR("distance between");
-DATA(insert OID = 1525 ( "?#" PGNSP PGUID b f f 601 601 16 1525 0 lseg_intersect - - ));
+DATA(insert OID = 1525 ( "?#" PGNSP PGUID b f f 601 601 16 1525 0 lseg_intersect - - "---"));
DESCR("intersect");
-DATA(insert OID = 1526 ( "?||" PGNSP PGUID b f f 601 601 16 1526 0 lseg_parallel - - ));
+DATA(insert OID = 1526 ( "?||" PGNSP PGUID b f f 601 601 16 1526 0 lseg_parallel - - "---"));
DESCR("parallel");
-DATA(insert OID = 1527 ( "?-|" PGNSP PGUID b f f 601 601 16 1527 0 lseg_perp - - ));
+DATA(insert OID = 1527 ( "?-|" PGNSP PGUID b f f 601 601 16 1527 0 lseg_perp - - "---"));
DESCR("perpendicular");
-DATA(insert OID = 1528 ( "?-" PGNSP PGUID l f f 0 601 16 0 0 lseg_horizontal - - ));
+DATA(insert OID = 1528 ( "?-" PGNSP PGUID l f f 0 601 16 0 0 lseg_horizontal - - "---"));
DESCR("horizontal");
-DATA(insert OID = 1529 ( "?|" PGNSP PGUID l f f 0 601 16 0 0 lseg_vertical - - ));
+DATA(insert OID = 1529 ( "?|" PGNSP PGUID l f f 0 601 16 0 0 lseg_vertical - - "---"));
DESCR("vertical");
-DATA(insert OID = 1535 ( "=" PGNSP PGUID b f f 601 601 16 1535 1586 lseg_eq eqsel eqjoinsel ));
+DATA(insert OID = 1535 ( "=" PGNSP PGUID b f f 601 601 16 1535 1586 lseg_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1536 ( "#" PGNSP PGUID b f f 601 601 600 1536 0 lseg_interpt - - ));
+DATA(insert OID = 1536 ( "#" PGNSP PGUID b f f 601 601 600 1536 0 lseg_interpt - - "---"));
DESCR("intersection point");
-DATA(insert OID = 1537 ( "?#" PGNSP PGUID b f f 601 628 16 0 0 inter_sl - - ));
+DATA(insert OID = 1537 ( "?#" PGNSP PGUID b f f 601 628 16 0 0 inter_sl - - "---"));
DESCR("intersect");
-DATA(insert OID = 1538 ( "?#" PGNSP PGUID b f f 601 603 16 0 0 inter_sb - - ));
+DATA(insert OID = 1538 ( "?#" PGNSP PGUID b f f 601 603 16 0 0 inter_sb - - "---"));
DESCR("intersect");
-DATA(insert OID = 1539 ( "?#" PGNSP PGUID b f f 628 603 16 0 0 inter_lb - - ));
+DATA(insert OID = 1539 ( "?#" PGNSP PGUID b f f 628 603 16 0 0 inter_lb - - "---"));
DESCR("intersect");
-DATA(insert OID = 1546 ( "<@" PGNSP PGUID b f f 600 628 16 0 0 on_pl - - ));
+DATA(insert OID = 1546 ( "<@" PGNSP PGUID b f f 600 628 16 0 0 on_pl - - "---"));
DESCR("point on line");
-DATA(insert OID = 1547 ( "<@" PGNSP PGUID b f f 600 601 16 0 0 on_ps - - ));
+DATA(insert OID = 1547 ( "<@" PGNSP PGUID b f f 600 601 16 0 0 on_ps - - "---"));
DESCR("is contained by");
-DATA(insert OID = 1548 ( "<@" PGNSP PGUID b f f 601 628 16 0 0 on_sl - - ));
+DATA(insert OID = 1548 ( "<@" PGNSP PGUID b f f 601 628 16 0 0 on_sl - - "---"));
DESCR("lseg on line");
-DATA(insert OID = 1549 ( "<@" PGNSP PGUID b f f 601 603 16 0 0 on_sb - - ));
+DATA(insert OID = 1549 ( "<@" PGNSP PGUID b f f 601 603 16 0 0 on_sb - - "---"));
DESCR("is contained by");
-DATA(insert OID = 1557 ( "##" PGNSP PGUID b f f 600 628 600 0 0 close_pl - - ));
+DATA(insert OID = 1557 ( "##" PGNSP PGUID b f f 600 628 600 0 0 close_pl - - "---"));
DESCR("closest point to A on B");
-DATA(insert OID = 1558 ( "##" PGNSP PGUID b f f 600 601 600 0 0 close_ps - - ));
+DATA(insert OID = 1558 ( "##" PGNSP PGUID b f f 600 601 600 0 0 close_ps - - "---"));
DESCR("closest point to A on B");
-DATA(insert OID = 1559 ( "##" PGNSP PGUID b f f 600 603 600 0 0 close_pb - - ));
+DATA(insert OID = 1559 ( "##" PGNSP PGUID b f f 600 603 600 0 0 close_pb - - "---"));
DESCR("closest point to A on B");
-DATA(insert OID = 1566 ( "##" PGNSP PGUID b f f 601 628 600 0 0 close_sl - - ));
+DATA(insert OID = 1566 ( "##" PGNSP PGUID b f f 601 628 600 0 0 close_sl - - "---"));
DESCR("closest point to A on B");
-DATA(insert OID = 1567 ( "##" PGNSP PGUID b f f 601 603 600 0 0 close_sb - - ));
+DATA(insert OID = 1567 ( "##" PGNSP PGUID b f f 601 603 600 0 0 close_sb - - "---"));
DESCR("closest point to A on B");
-DATA(insert OID = 1568 ( "##" PGNSP PGUID b f f 628 603 600 0 0 close_lb - - ));
+DATA(insert OID = 1568 ( "##" PGNSP PGUID b f f 628 603 600 0 0 close_lb - - "---"));
DESCR("closest point to A on B");
-DATA(insert OID = 1577 ( "##" PGNSP PGUID b f f 628 601 600 0 0 close_ls - - ));
+DATA(insert OID = 1577 ( "##" PGNSP PGUID b f f 628 601 600 0 0 close_ls - - "---"));
DESCR("closest point to A on B");
-DATA(insert OID = 1578 ( "##" PGNSP PGUID b f f 601 601 600 0 0 close_lseg - - ));
+DATA(insert OID = 1578 ( "##" PGNSP PGUID b f f 601 601 600 0 0 close_lseg - - "---"));
DESCR("closest point to A on B");
-DATA(insert OID = 1583 ( "*" PGNSP PGUID b f f 1186 701 1186 1584 0 interval_mul - - ));
+DATA(insert OID = 1583 ( "*" PGNSP PGUID b f f 1186 701 1186 1584 0 interval_mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 1584 ( "*" PGNSP PGUID b f f 701 1186 1186 1583 0 mul_d_interval - - ));
+DATA(insert OID = 1584 ( "*" PGNSP PGUID b f f 701 1186 1186 1583 0 mul_d_interval - - "---"));
DESCR("multiply");
-DATA(insert OID = 1585 ( "/" PGNSP PGUID b f f 1186 701 1186 0 0 interval_div - - ));
+DATA(insert OID = 1585 ( "/" PGNSP PGUID b f f 1186 701 1186 0 0 interval_div - - "---"));
DESCR("divide");
-DATA(insert OID = 1586 ( "<>" PGNSP PGUID b f f 601 601 16 1586 1535 lseg_ne neqsel neqjoinsel ));
+DATA(insert OID = 1586 ( "<>" PGNSP PGUID b f f 601 601 16 1586 1535 lseg_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1587 ( "<" PGNSP PGUID b f f 601 601 16 1589 1590 lseg_lt - - ));
+DATA(insert OID = 1587 ( "<" PGNSP PGUID b f f 601 601 16 1589 1590 lseg_lt - - "---"));
DESCR("less than by length");
-DATA(insert OID = 1588 ( "<=" PGNSP PGUID b f f 601 601 16 1590 1589 lseg_le - - ));
+DATA(insert OID = 1588 ( "<=" PGNSP PGUID b f f 601 601 16 1590 1589 lseg_le - - "---"));
DESCR("less than or equal by length");
-DATA(insert OID = 1589 ( ">" PGNSP PGUID b f f 601 601 16 1587 1588 lseg_gt - - ));
+DATA(insert OID = 1589 ( ">" PGNSP PGUID b f f 601 601 16 1587 1588 lseg_gt - - "---"));
DESCR("greater than by length");
-DATA(insert OID = 1590 ( ">=" PGNSP PGUID b f f 601 601 16 1588 1587 lseg_ge - - ));
+DATA(insert OID = 1590 ( ">=" PGNSP PGUID b f f 601 601 16 1588 1587 lseg_ge - - "---"));
DESCR("greater than or equal by length");
-DATA(insert OID = 1591 ( "@-@" PGNSP PGUID l f f 0 601 701 0 0 lseg_length - - ));
+DATA(insert OID = 1591 ( "@-@" PGNSP PGUID l f f 0 601 701 0 0 lseg_length - - "---"));
DESCR("distance between endpoints");
-DATA(insert OID = 1611 ( "?#" PGNSP PGUID b f f 628 628 16 1611 0 line_intersect - - ));
+DATA(insert OID = 1611 ( "?#" PGNSP PGUID b f f 628 628 16 1611 0 line_intersect - - "---"));
DESCR("intersect");
-DATA(insert OID = 1612 ( "?||" PGNSP PGUID b f f 628 628 16 1612 0 line_parallel - - ));
+DATA(insert OID = 1612 ( "?||" PGNSP PGUID b f f 628 628 16 1612 0 line_parallel - - "---"));
DESCR("parallel");
-DATA(insert OID = 1613 ( "?-|" PGNSP PGUID b f f 628 628 16 1613 0 line_perp - - ));
+DATA(insert OID = 1613 ( "?-|" PGNSP PGUID b f f 628 628 16 1613 0 line_perp - - "---"));
DESCR("perpendicular");
-DATA(insert OID = 1614 ( "?-" PGNSP PGUID l f f 0 628 16 0 0 line_horizontal - - ));
+DATA(insert OID = 1614 ( "?-" PGNSP PGUID l f f 0 628 16 0 0 line_horizontal - - "---"));
DESCR("horizontal");
-DATA(insert OID = 1615 ( "?|" PGNSP PGUID l f f 0 628 16 0 0 line_vertical - - ));
+DATA(insert OID = 1615 ( "?|" PGNSP PGUID l f f 0 628 16 0 0 line_vertical - - "---"));
DESCR("vertical");
-DATA(insert OID = 1616 ( "=" PGNSP PGUID b f f 628 628 16 1616 0 line_eq eqsel eqjoinsel ));
+DATA(insert OID = 1616 ( "=" PGNSP PGUID b f f 628 628 16 1616 0 line_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1617 ( "#" PGNSP PGUID b f f 628 628 600 1617 0 line_interpt - - ));
+DATA(insert OID = 1617 ( "#" PGNSP PGUID b f f 628 628 600 1617 0 line_interpt - - "---"));
DESCR("intersection point");
/* MAC type */
-DATA(insert OID = 1220 ( "=" PGNSP PGUID b t t 829 829 16 1220 1221 macaddr_eq eqsel eqjoinsel ));
+DATA(insert OID = 1220 ( "=" PGNSP PGUID b t t 829 829 16 1220 1221 macaddr_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1221 ( "<>" PGNSP PGUID b f f 829 829 16 1221 1220 macaddr_ne neqsel neqjoinsel ));
+DATA(insert OID = 1221 ( "<>" PGNSP PGUID b f f 829 829 16 1221 1220 macaddr_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1222 ( "<" PGNSP PGUID b f f 829 829 16 1224 1225 macaddr_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1222 ( "<" PGNSP PGUID b f f 829 829 16 1224 1225 macaddr_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1223 ( "<=" PGNSP PGUID b f f 829 829 16 1225 1224 macaddr_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1223 ( "<=" PGNSP PGUID b f f 829 829 16 1225 1224 macaddr_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1224 ( ">" PGNSP PGUID b f f 829 829 16 1222 1223 macaddr_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1224 ( ">" PGNSP PGUID b f f 829 829 16 1222 1223 macaddr_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1225 ( ">=" PGNSP PGUID b f f 829 829 16 1223 1222 macaddr_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1225 ( ">=" PGNSP PGUID b f f 829 829 16 1223 1222 macaddr_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 3147 ( "~" PGNSP PGUID l f f 0 829 829 0 0 macaddr_not - - ));
+DATA(insert OID = 3147 ( "~" PGNSP PGUID l f f 0 829 829 0 0 macaddr_not - - "---"));
DESCR("bitwise not");
-DATA(insert OID = 3148 ( "&" PGNSP PGUID b f f 829 829 829 0 0 macaddr_and - - ));
+DATA(insert OID = 3148 ( "&" PGNSP PGUID b f f 829 829 829 0 0 macaddr_and - - "---"));
DESCR("bitwise and");
-DATA(insert OID = 3149 ( "|" PGNSP PGUID b f f 829 829 829 0 0 macaddr_or - - ));
+DATA(insert OID = 3149 ( "|" PGNSP PGUID b f f 829 829 829 0 0 macaddr_or - - "---"));
DESCR("bitwise or");
/* INET type (these also support CIDR via implicit cast) */
-DATA(insert OID = 1201 ( "=" PGNSP PGUID b t t 869 869 16 1201 1202 network_eq eqsel eqjoinsel ));
+DATA(insert OID = 1201 ( "=" PGNSP PGUID b t t 869 869 16 1201 1202 network_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1202 ( "<>" PGNSP PGUID b f f 869 869 16 1202 1201 network_ne neqsel neqjoinsel ));
+DATA(insert OID = 1202 ( "<>" PGNSP PGUID b f f 869 869 16 1202 1201 network_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1203 ( "<" PGNSP PGUID b f f 869 869 16 1205 1206 network_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1203 ( "<" PGNSP PGUID b f f 869 869 16 1205 1206 network_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1204 ( "<=" PGNSP PGUID b f f 869 869 16 1206 1205 network_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1204 ( "<=" PGNSP PGUID b f f 869 869 16 1206 1205 network_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1205 ( ">" PGNSP PGUID b f f 869 869 16 1203 1204 network_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1205 ( ">" PGNSP PGUID b f f 869 869 16 1203 1204 network_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1206 ( ">=" PGNSP PGUID b f f 869 869 16 1204 1203 network_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1206 ( ">=" PGNSP PGUID b f f 869 869 16 1204 1203 network_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 931 ( "<<" PGNSP PGUID b f f 869 869 16 933 0 network_sub networksel networkjoinsel ));
+DATA(insert OID = 931 ( "<<" PGNSP PGUID b f f 869 869 16 933 0 network_sub networksel networkjoinsel "---"));
DESCR("is subnet");
#define OID_INET_SUB_OP 931
-DATA(insert OID = 932 ( "<<=" PGNSP PGUID b f f 869 869 16 934 0 network_subeq networksel networkjoinsel ));
+DATA(insert OID = 932 ( "<<=" PGNSP PGUID b f f 869 869 16 934 0 network_subeq networksel networkjoinsel "---"));
DESCR("is subnet or equal");
#define OID_INET_SUBEQ_OP 932
-DATA(insert OID = 933 ( ">>" PGNSP PGUID b f f 869 869 16 931 0 network_sup networksel networkjoinsel ));
+DATA(insert OID = 933 ( ">>" PGNSP PGUID b f f 869 869 16 931 0 network_sup networksel networkjoinsel "---"));
DESCR("is supernet");
#define OID_INET_SUP_OP 933
-DATA(insert OID = 934 ( ">>=" PGNSP PGUID b f f 869 869 16 932 0 network_supeq networksel networkjoinsel ));
+DATA(insert OID = 934 ( ">>=" PGNSP PGUID b f f 869 869 16 932 0 network_supeq networksel networkjoinsel "---"));
DESCR("is supernet or equal");
#define OID_INET_SUPEQ_OP 934
-DATA(insert OID = 3552 ( "&&" PGNSP PGUID b f f 869 869 16 3552 0 network_overlap networksel networkjoinsel ));
+DATA(insert OID = 3552 ( "&&" PGNSP PGUID b f f 869 869 16 3552 0 network_overlap networksel networkjoinsel "---"));
DESCR("overlaps (is subnet or supernet)");
#define OID_INET_OVERLAP_OP 3552
-DATA(insert OID = 2634 ( "~" PGNSP PGUID l f f 0 869 869 0 0 inetnot - - ));
+DATA(insert OID = 2634 ( "~" PGNSP PGUID l f f 0 869 869 0 0 inetnot - - "---"));
DESCR("bitwise not");
-DATA(insert OID = 2635 ( "&" PGNSP PGUID b f f 869 869 869 0 0 inetand - - ));
+DATA(insert OID = 2635 ( "&" PGNSP PGUID b f f 869 869 869 0 0 inetand - - "---"));
DESCR("bitwise and");
-DATA(insert OID = 2636 ( "|" PGNSP PGUID b f f 869 869 869 0 0 inetor - - ));
+DATA(insert OID = 2636 ( "|" PGNSP PGUID b f f 869 869 869 0 0 inetor - - "---"));
DESCR("bitwise or");
-DATA(insert OID = 2637 ( "+" PGNSP PGUID b f f 869 20 869 2638 0 inetpl - - ));
+DATA(insert OID = 2637 ( "+" PGNSP PGUID b f f 869 20 869 2638 0 inetpl - - "---"));
DESCR("add");
-DATA(insert OID = 2638 ( "+" PGNSP PGUID b f f 20 869 869 2637 0 int8pl_inet - - ));
+DATA(insert OID = 2638 ( "+" PGNSP PGUID b f f 20 869 869 2637 0 int8pl_inet - - "---"));
DESCR("add");
-DATA(insert OID = 2639 ( "-" PGNSP PGUID b f f 869 20 869 0 0 inetmi_int8 - - ));
+DATA(insert OID = 2639 ( "-" PGNSP PGUID b f f 869 20 869 0 0 inetmi_int8 - - "---"));
DESCR("subtract");
-DATA(insert OID = 2640 ( "-" PGNSP PGUID b f f 869 869 20 0 0 inetmi - - ));
+DATA(insert OID = 2640 ( "-" PGNSP PGUID b f f 869 869 20 0 0 inetmi - - "---"));
DESCR("subtract");
/* case-insensitive LIKE hacks */
-DATA(insert OID = 1625 ( "~~*" PGNSP PGUID b f f 19 25 16 0 1626 nameiclike iclikesel iclikejoinsel ));
+DATA(insert OID = 1625 ( "~~*" PGNSP PGUID b f f 19 25 16 0 1626 nameiclike iclikesel iclikejoinsel "---"));
DESCR("matches LIKE expression, case-insensitive");
#define OID_NAME_ICLIKE_OP 1625
-DATA(insert OID = 1626 ( "!~~*" PGNSP PGUID b f f 19 25 16 0 1625 nameicnlike icnlikesel icnlikejoinsel ));
+DATA(insert OID = 1626 ( "!~~*" PGNSP PGUID b f f 19 25 16 0 1625 nameicnlike icnlikesel icnlikejoinsel "---"));
DESCR("does not match LIKE expression, case-insensitive");
-DATA(insert OID = 1627 ( "~~*" PGNSP PGUID b f f 25 25 16 0 1628 texticlike iclikesel iclikejoinsel ));
+DATA(insert OID = 1627 ( "~~*" PGNSP PGUID b f f 25 25 16 0 1628 texticlike iclikesel iclikejoinsel "---"));
DESCR("matches LIKE expression, case-insensitive");
#define OID_TEXT_ICLIKE_OP 1627
-DATA(insert OID = 1628 ( "!~~*" PGNSP PGUID b f f 25 25 16 0 1627 texticnlike icnlikesel icnlikejoinsel ));
+DATA(insert OID = 1628 ( "!~~*" PGNSP PGUID b f f 25 25 16 0 1627 texticnlike icnlikesel icnlikejoinsel "---"));
DESCR("does not match LIKE expression, case-insensitive");
-DATA(insert OID = 1629 ( "~~*" PGNSP PGUID b f f 1042 25 16 0 1630 bpchariclike iclikesel iclikejoinsel ));
+DATA(insert OID = 1629 ( "~~*" PGNSP PGUID b f f 1042 25 16 0 1630 bpchariclike iclikesel iclikejoinsel "---"));
DESCR("matches LIKE expression, case-insensitive");
#define OID_BPCHAR_ICLIKE_OP 1629
-DATA(insert OID = 1630 ( "!~~*" PGNSP PGUID b f f 1042 25 16 0 1629 bpcharicnlike icnlikesel icnlikejoinsel ));
+DATA(insert OID = 1630 ( "!~~*" PGNSP PGUID b f f 1042 25 16 0 1629 bpcharicnlike icnlikesel icnlikejoinsel "---"));
DESCR("does not match LIKE expression, case-insensitive");
/* NUMERIC type - OID's 1700-1799 */
-DATA(insert OID = 1751 ( "-" PGNSP PGUID l f f 0 1700 1700 0 0 numeric_uminus - - ));
+DATA(insert OID = 1751 ( "-" PGNSP PGUID l f f 0 1700 1700 0 0 numeric_uminus - - "---"));
DESCR("negate");
-DATA(insert OID = 1752 ( "=" PGNSP PGUID b t t 1700 1700 16 1752 1753 numeric_eq eqsel eqjoinsel ));
+DATA(insert OID = 1752 ( "=" PGNSP PGUID b t t 1700 1700 16 1752 1753 numeric_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1753 ( "<>" PGNSP PGUID b f f 1700 1700 16 1753 1752 numeric_ne neqsel neqjoinsel ));
+DATA(insert OID = 1753 ( "<>" PGNSP PGUID b f f 1700 1700 16 1753 1752 numeric_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1754 ( "<" PGNSP PGUID b f f 1700 1700 16 1756 1757 numeric_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1754 ( "<" PGNSP PGUID b f f 1700 1700 16 1756 1757 numeric_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1755 ( "<=" PGNSP PGUID b f f 1700 1700 16 1757 1756 numeric_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1755 ( "<=" PGNSP PGUID b f f 1700 1700 16 1757 1756 numeric_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1756 ( ">" PGNSP PGUID b f f 1700 1700 16 1754 1755 numeric_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1756 ( ">" PGNSP PGUID b f f 1700 1700 16 1754 1755 numeric_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1757 ( ">=" PGNSP PGUID b f f 1700 1700 16 1755 1754 numeric_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1757 ( ">=" PGNSP PGUID b f f 1700 1700 16 1755 1754 numeric_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 1758 ( "+" PGNSP PGUID b f f 1700 1700 1700 1758 0 numeric_add - - ));
+DATA(insert OID = 1758 ( "+" PGNSP PGUID b f f 1700 1700 1700 1758 0 numeric_add - - "---"));
DESCR("add");
-DATA(insert OID = 1759 ( "-" PGNSP PGUID b f f 1700 1700 1700 0 0 numeric_sub - - ));
+DATA(insert OID = 1759 ( "-" PGNSP PGUID b f f 1700 1700 1700 0 0 numeric_sub - - "---"));
DESCR("subtract");
-DATA(insert OID = 1760 ( "*" PGNSP PGUID b f f 1700 1700 1700 1760 0 numeric_mul - - ));
+DATA(insert OID = 1760 ( "*" PGNSP PGUID b f f 1700 1700 1700 1760 0 numeric_mul - - "---"));
DESCR("multiply");
-DATA(insert OID = 1761 ( "/" PGNSP PGUID b f f 1700 1700 1700 0 0 numeric_div - - ));
+DATA(insert OID = 1761 ( "/" PGNSP PGUID b f f 1700 1700 1700 0 0 numeric_div - - "---"));
DESCR("divide");
-DATA(insert OID = 1762 ( "%" PGNSP PGUID b f f 1700 1700 1700 0 0 numeric_mod - - ));
+DATA(insert OID = 1762 ( "%" PGNSP PGUID b f f 1700 1700 1700 0 0 numeric_mod - - "---"));
DESCR("modulus");
-DATA(insert OID = 1038 ( "^" PGNSP PGUID b f f 1700 1700 1700 0 0 numeric_power - - ));
+DATA(insert OID = 1038 ( "^" PGNSP PGUID b f f 1700 1700 1700 0 0 numeric_power - - "---"));
DESCR("exponentiation");
-DATA(insert OID = 1763 ( "@" PGNSP PGUID l f f 0 1700 1700 0 0 numeric_abs - - ));
+DATA(insert OID = 1763 ( "@" PGNSP PGUID l f f 0 1700 1700 0 0 numeric_abs - - "---"));
DESCR("absolute value");
-DATA(insert OID = 1784 ( "=" PGNSP PGUID b t f 1560 1560 16 1784 1785 biteq eqsel eqjoinsel ));
+DATA(insert OID = 1784 ( "=" PGNSP PGUID b t f 1560 1560 16 1784 1785 biteq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1785 ( "<>" PGNSP PGUID b f f 1560 1560 16 1785 1784 bitne neqsel neqjoinsel ));
+DATA(insert OID = 1785 ( "<>" PGNSP PGUID b f f 1560 1560 16 1785 1784 bitne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1786 ( "<" PGNSP PGUID b f f 1560 1560 16 1787 1789 bitlt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1786 ( "<" PGNSP PGUID b f f 1560 1560 16 1787 1789 bitlt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1787 ( ">" PGNSP PGUID b f f 1560 1560 16 1786 1788 bitgt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1787 ( ">" PGNSP PGUID b f f 1560 1560 16 1786 1788 bitgt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1788 ( "<=" PGNSP PGUID b f f 1560 1560 16 1789 1787 bitle scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1788 ( "<=" PGNSP PGUID b f f 1560 1560 16 1789 1787 bitle scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1789 ( ">=" PGNSP PGUID b f f 1560 1560 16 1788 1786 bitge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1789 ( ">=" PGNSP PGUID b f f 1560 1560 16 1788 1786 bitge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 1791 ( "&" PGNSP PGUID b f f 1560 1560 1560 1791 0 bitand - - ));
+DATA(insert OID = 1791 ( "&" PGNSP PGUID b f f 1560 1560 1560 1791 0 bitand - - "---"));
DESCR("bitwise and");
-DATA(insert OID = 1792 ( "|" PGNSP PGUID b f f 1560 1560 1560 1792 0 bitor - - ));
+DATA(insert OID = 1792 ( "|" PGNSP PGUID b f f 1560 1560 1560 1792 0 bitor - - "---"));
DESCR("bitwise or");
-DATA(insert OID = 1793 ( "#" PGNSP PGUID b f f 1560 1560 1560 1793 0 bitxor - - ));
+DATA(insert OID = 1793 ( "#" PGNSP PGUID b f f 1560 1560 1560 1793 0 bitxor - - "---"));
DESCR("bitwise exclusive or");
-DATA(insert OID = 1794 ( "~" PGNSP PGUID l f f 0 1560 1560 0 0 bitnot - - ));
+DATA(insert OID = 1794 ( "~" PGNSP PGUID l f f 0 1560 1560 0 0 bitnot - - "---"));
DESCR("bitwise not");
-DATA(insert OID = 1795 ( "<<" PGNSP PGUID b f f 1560 23 1560 0 0 bitshiftleft - - ));
+DATA(insert OID = 1795 ( "<<" PGNSP PGUID b f f 1560 23 1560 0 0 bitshiftleft - - "---"));
DESCR("bitwise shift left");
-DATA(insert OID = 1796 ( ">>" PGNSP PGUID b f f 1560 23 1560 0 0 bitshiftright - - ));
+DATA(insert OID = 1796 ( ">>" PGNSP PGUID b f f 1560 23 1560 0 0 bitshiftright - - "---"));
DESCR("bitwise shift right");
-DATA(insert OID = 1797 ( "||" PGNSP PGUID b f f 1562 1562 1562 0 0 bitcat - - ));
+DATA(insert OID = 1797 ( "||" PGNSP PGUID b f f 1562 1562 1562 0 0 bitcat - - "---"));
DESCR("concatenate");
-DATA(insert OID = 1800 ( "+" PGNSP PGUID b f f 1083 1186 1083 1849 0 time_pl_interval - - ));
+DATA(insert OID = 1800 ( "+" PGNSP PGUID b f f 1083 1186 1083 1849 0 time_pl_interval - - "---"));
DESCR("add");
-DATA(insert OID = 1801 ( "-" PGNSP PGUID b f f 1083 1186 1083 0 0 time_mi_interval - - ));
+DATA(insert OID = 1801 ( "-" PGNSP PGUID b f f 1083 1186 1083 0 0 time_mi_interval - - "---"));
DESCR("subtract");
-DATA(insert OID = 1802 ( "+" PGNSP PGUID b f f 1266 1186 1266 2552 0 timetz_pl_interval - - ));
+DATA(insert OID = 1802 ( "+" PGNSP PGUID b f f 1266 1186 1266 2552 0 timetz_pl_interval - - "---"));
DESCR("add");
-DATA(insert OID = 1803 ( "-" PGNSP PGUID b f f 1266 1186 1266 0 0 timetz_mi_interval - - ));
+DATA(insert OID = 1803 ( "-" PGNSP PGUID b f f 1266 1186 1266 0 0 timetz_mi_interval - - "---"));
DESCR("subtract");
-DATA(insert OID = 1804 ( "=" PGNSP PGUID b t f 1562 1562 16 1804 1805 varbiteq eqsel eqjoinsel ));
+DATA(insert OID = 1804 ( "=" PGNSP PGUID b t f 1562 1562 16 1804 1805 varbiteq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1805 ( "<>" PGNSP PGUID b f f 1562 1562 16 1805 1804 varbitne neqsel neqjoinsel ));
+DATA(insert OID = 1805 ( "<>" PGNSP PGUID b f f 1562 1562 16 1805 1804 varbitne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1806 ( "<" PGNSP PGUID b f f 1562 1562 16 1807 1809 varbitlt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1806 ( "<" PGNSP PGUID b f f 1562 1562 16 1807 1809 varbitlt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1807 ( ">" PGNSP PGUID b f f 1562 1562 16 1806 1808 varbitgt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1807 ( ">" PGNSP PGUID b f f 1562 1562 16 1806 1808 varbitgt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1808 ( "<=" PGNSP PGUID b f f 1562 1562 16 1809 1807 varbitle scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1808 ( "<=" PGNSP PGUID b f f 1562 1562 16 1809 1807 varbitle scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1809 ( ">=" PGNSP PGUID b f f 1562 1562 16 1808 1806 varbitge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1809 ( ">=" PGNSP PGUID b f f 1562 1562 16 1808 1806 varbitge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 1849 ( "+" PGNSP PGUID b f f 1186 1083 1083 1800 0 interval_pl_time - - ));
+DATA(insert OID = 1849 ( "+" PGNSP PGUID b f f 1186 1083 1083 1800 0 interval_pl_time - - "---"));
DESCR("add");
-DATA(insert OID = 1862 ( "=" PGNSP PGUID b t t 21 20 16 1868 1863 int28eq eqsel eqjoinsel ));
+DATA(insert OID = 1862 ( "=" PGNSP PGUID b t t 21 20 16 1868 1863 int28eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1863 ( "<>" PGNSP PGUID b f f 21 20 16 1869 1862 int28ne neqsel neqjoinsel ));
+DATA(insert OID = 1863 ( "<>" PGNSP PGUID b f f 21 20 16 1869 1862 int28ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1864 ( "<" PGNSP PGUID b f f 21 20 16 1871 1867 int28lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1864 ( "<" PGNSP PGUID b f f 21 20 16 1871 1867 int28lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1865 ( ">" PGNSP PGUID b f f 21 20 16 1870 1866 int28gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1865 ( ">" PGNSP PGUID b f f 21 20 16 1870 1866 int28gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1866 ( "<=" PGNSP PGUID b f f 21 20 16 1873 1865 int28le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1866 ( "<=" PGNSP PGUID b f f 21 20 16 1873 1865 int28le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1867 ( ">=" PGNSP PGUID b f f 21 20 16 1872 1864 int28ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1867 ( ">=" PGNSP PGUID b f f 21 20 16 1872 1864 int28ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 1868 ( "=" PGNSP PGUID b t t 20 21 16 1862 1869 int82eq eqsel eqjoinsel ));
+DATA(insert OID = 1868 ( "=" PGNSP PGUID b t t 20 21 16 1862 1869 int82eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1869 ( "<>" PGNSP PGUID b f f 20 21 16 1863 1868 int82ne neqsel neqjoinsel ));
+DATA(insert OID = 1869 ( "<>" PGNSP PGUID b f f 20 21 16 1863 1868 int82ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1870 ( "<" PGNSP PGUID b f f 20 21 16 1865 1873 int82lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1870 ( "<" PGNSP PGUID b f f 20 21 16 1865 1873 int82lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1871 ( ">" PGNSP PGUID b f f 20 21 16 1864 1872 int82gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1871 ( ">" PGNSP PGUID b f f 20 21 16 1864 1872 int82gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1872 ( "<=" PGNSP PGUID b f f 20 21 16 1867 1871 int82le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1872 ( "<=" PGNSP PGUID b f f 20 21 16 1867 1871 int82le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1873 ( ">=" PGNSP PGUID b f f 20 21 16 1866 1870 int82ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1873 ( ">=" PGNSP PGUID b f f 20 21 16 1866 1870 int82ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 1874 ( "&" PGNSP PGUID b f f 21 21 21 1874 0 int2and - - ));
+DATA(insert OID = 1874 ( "&" PGNSP PGUID b f f 21 21 21 1874 0 int2and - - "---"));
DESCR("bitwise and");
-DATA(insert OID = 1875 ( "|" PGNSP PGUID b f f 21 21 21 1875 0 int2or - - ));
+DATA(insert OID = 1875 ( "|" PGNSP PGUID b f f 21 21 21 1875 0 int2or - - "---"));
DESCR("bitwise or");
-DATA(insert OID = 1876 ( "#" PGNSP PGUID b f f 21 21 21 1876 0 int2xor - - ));
+DATA(insert OID = 1876 ( "#" PGNSP PGUID b f f 21 21 21 1876 0 int2xor - - "---"));
DESCR("bitwise exclusive or");
-DATA(insert OID = 1877 ( "~" PGNSP PGUID l f f 0 21 21 0 0 int2not - - ));
+DATA(insert OID = 1877 ( "~" PGNSP PGUID l f f 0 21 21 0 0 int2not - - "---"));
DESCR("bitwise not");
-DATA(insert OID = 1878 ( "<<" PGNSP PGUID b f f 21 23 21 0 0 int2shl - - ));
+DATA(insert OID = 1878 ( "<<" PGNSP PGUID b f f 21 23 21 0 0 int2shl - - "---"));
DESCR("bitwise shift left");
-DATA(insert OID = 1879 ( ">>" PGNSP PGUID b f f 21 23 21 0 0 int2shr - - ));
+DATA(insert OID = 1879 ( ">>" PGNSP PGUID b f f 21 23 21 0 0 int2shr - - "---"));
DESCR("bitwise shift right");
-DATA(insert OID = 1880 ( "&" PGNSP PGUID b f f 23 23 23 1880 0 int4and - - ));
+DATA(insert OID = 1880 ( "&" PGNSP PGUID b f f 23 23 23 1880 0 int4and - - "---"));
DESCR("bitwise and");
-DATA(insert OID = 1881 ( "|" PGNSP PGUID b f f 23 23 23 1881 0 int4or - - ));
+DATA(insert OID = 1881 ( "|" PGNSP PGUID b f f 23 23 23 1881 0 int4or - - "---"));
DESCR("bitwise or");
-DATA(insert OID = 1882 ( "#" PGNSP PGUID b f f 23 23 23 1882 0 int4xor - - ));
+DATA(insert OID = 1882 ( "#" PGNSP PGUID b f f 23 23 23 1882 0 int4xor - - "---"));
DESCR("bitwise exclusive or");
-DATA(insert OID = 1883 ( "~" PGNSP PGUID l f f 0 23 23 0 0 int4not - - ));
+DATA(insert OID = 1883 ( "~" PGNSP PGUID l f f 0 23 23 0 0 int4not - - "---"));
DESCR("bitwise not");
-DATA(insert OID = 1884 ( "<<" PGNSP PGUID b f f 23 23 23 0 0 int4shl - - ));
+DATA(insert OID = 1884 ( "<<" PGNSP PGUID b f f 23 23 23 0 0 int4shl - - "---"));
DESCR("bitwise shift left");
-DATA(insert OID = 1885 ( ">>" PGNSP PGUID b f f 23 23 23 0 0 int4shr - - ));
+DATA(insert OID = 1885 ( ">>" PGNSP PGUID b f f 23 23 23 0 0 int4shr - - "---"));
DESCR("bitwise shift right");
-DATA(insert OID = 1886 ( "&" PGNSP PGUID b f f 20 20 20 1886 0 int8and - - ));
+DATA(insert OID = 1886 ( "&" PGNSP PGUID b f f 20 20 20 1886 0 int8and - - "---"));
DESCR("bitwise and");
-DATA(insert OID = 1887 ( "|" PGNSP PGUID b f f 20 20 20 1887 0 int8or - - ));
+DATA(insert OID = 1887 ( "|" PGNSP PGUID b f f 20 20 20 1887 0 int8or - - "---"));
DESCR("bitwise or");
-DATA(insert OID = 1888 ( "#" PGNSP PGUID b f f 20 20 20 1888 0 int8xor - - ));
+DATA(insert OID = 1888 ( "#" PGNSP PGUID b f f 20 20 20 1888 0 int8xor - - "---"));
DESCR("bitwise exclusive or");
-DATA(insert OID = 1889 ( "~" PGNSP PGUID l f f 0 20 20 0 0 int8not - - ));
+DATA(insert OID = 1889 ( "~" PGNSP PGUID l f f 0 20 20 0 0 int8not - - "---"));
DESCR("bitwise not");
-DATA(insert OID = 1890 ( "<<" PGNSP PGUID b f f 20 23 20 0 0 int8shl - - ));
+DATA(insert OID = 1890 ( "<<" PGNSP PGUID b f f 20 23 20 0 0 int8shl - - "---"));
DESCR("bitwise shift left");
-DATA(insert OID = 1891 ( ">>" PGNSP PGUID b f f 20 23 20 0 0 int8shr - - ));
+DATA(insert OID = 1891 ( ">>" PGNSP PGUID b f f 20 23 20 0 0 int8shr - - "---"));
DESCR("bitwise shift right");
-DATA(insert OID = 1916 ( "+" PGNSP PGUID l f f 0 20 20 0 0 int8up - - ));
+DATA(insert OID = 1916 ( "+" PGNSP PGUID l f f 0 20 20 0 0 int8up - - "---"));
DESCR("unary plus");
-DATA(insert OID = 1917 ( "+" PGNSP PGUID l f f 0 21 21 0 0 int2up - - ));
+DATA(insert OID = 1917 ( "+" PGNSP PGUID l f f 0 21 21 0 0 int2up - - "---"));
DESCR("unary plus");
-DATA(insert OID = 1918 ( "+" PGNSP PGUID l f f 0 23 23 0 0 int4up - - ));
+DATA(insert OID = 1918 ( "+" PGNSP PGUID l f f 0 23 23 0 0 int4up - - "---"));
DESCR("unary plus");
-DATA(insert OID = 1919 ( "+" PGNSP PGUID l f f 0 700 700 0 0 float4up - - ));
+DATA(insert OID = 1919 ( "+" PGNSP PGUID l f f 0 700 700 0 0 float4up - - "---"));
DESCR("unary plus");
-DATA(insert OID = 1920 ( "+" PGNSP PGUID l f f 0 701 701 0 0 float8up - - ));
+DATA(insert OID = 1920 ( "+" PGNSP PGUID l f f 0 701 701 0 0 float8up - - "---"));
DESCR("unary plus");
-DATA(insert OID = 1921 ( "+" PGNSP PGUID l f f 0 1700 1700 0 0 numeric_uplus - - ));
+DATA(insert OID = 1921 ( "+" PGNSP PGUID l f f 0 1700 1700 0 0 numeric_uplus - - "---"));
DESCR("unary plus");
/* bytea operators */
-DATA(insert OID = 1955 ( "=" PGNSP PGUID b t t 17 17 16 1955 1956 byteaeq eqsel eqjoinsel ));
+DATA(insert OID = 1955 ( "=" PGNSP PGUID b t t 17 17 16 1955 1956 byteaeq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 1956 ( "<>" PGNSP PGUID b f f 17 17 16 1956 1955 byteane neqsel neqjoinsel ));
+DATA(insert OID = 1956 ( "<>" PGNSP PGUID b f f 17 17 16 1956 1955 byteane neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 1957 ( "<" PGNSP PGUID b f f 17 17 16 1959 1960 bytealt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1957 ( "<" PGNSP PGUID b f f 17 17 16 1959 1960 bytealt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 1958 ( "<=" PGNSP PGUID b f f 17 17 16 1960 1959 byteale scalarltsel scalarltjoinsel ));
+DATA(insert OID = 1958 ( "<=" PGNSP PGUID b f f 17 17 16 1960 1959 byteale scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 1959 ( ">" PGNSP PGUID b f f 17 17 16 1957 1958 byteagt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1959 ( ">" PGNSP PGUID b f f 17 17 16 1957 1958 byteagt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 1960 ( ">=" PGNSP PGUID b f f 17 17 16 1958 1957 byteage scalargtsel scalargtjoinsel ));
+DATA(insert OID = 1960 ( ">=" PGNSP PGUID b f f 17 17 16 1958 1957 byteage scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2016 ( "~~" PGNSP PGUID b f f 17 17 16 0 2017 bytealike likesel likejoinsel ));
+DATA(insert OID = 2016 ( "~~" PGNSP PGUID b f f 17 17 16 0 2017 bytealike likesel likejoinsel "---"));
DESCR("matches LIKE expression");
#define OID_BYTEA_LIKE_OP 2016
-DATA(insert OID = 2017 ( "!~~" PGNSP PGUID b f f 17 17 16 0 2016 byteanlike nlikesel nlikejoinsel ));
+DATA(insert OID = 2017 ( "!~~" PGNSP PGUID b f f 17 17 16 0 2016 byteanlike nlikesel nlikejoinsel "---"));
DESCR("does not match LIKE expression");
-DATA(insert OID = 2018 ( "||" PGNSP PGUID b f f 17 17 17 0 0 byteacat - - ));
+DATA(insert OID = 2018 ( "||" PGNSP PGUID b f f 17 17 17 0 0 byteacat - - "---"));
DESCR("concatenate");
/* timestamp operators */
-DATA(insert OID = 2060 ( "=" PGNSP PGUID b t t 1114 1114 16 2060 2061 timestamp_eq eqsel eqjoinsel ));
+DATA(insert OID = 2060 ( "=" PGNSP PGUID b t t 1114 1114 16 2060 2061 timestamp_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 2061 ( "<>" PGNSP PGUID b f f 1114 1114 16 2061 2060 timestamp_ne neqsel neqjoinsel ));
+DATA(insert OID = 2061 ( "<>" PGNSP PGUID b f f 1114 1114 16 2061 2060 timestamp_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 2062 ( "<" PGNSP PGUID b f f 1114 1114 16 2064 2065 timestamp_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2062 ( "<" PGNSP PGUID b f f 1114 1114 16 2064 2065 timestamp_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2063 ( "<=" PGNSP PGUID b f f 1114 1114 16 2065 2064 timestamp_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2063 ( "<=" PGNSP PGUID b f f 1114 1114 16 2065 2064 timestamp_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2064 ( ">" PGNSP PGUID b f f 1114 1114 16 2062 2063 timestamp_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2064 ( ">" PGNSP PGUID b f f 1114 1114 16 2062 2063 timestamp_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2065 ( ">=" PGNSP PGUID b f f 1114 1114 16 2063 2062 timestamp_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2065 ( ">=" PGNSP PGUID b f f 1114 1114 16 2063 2062 timestamp_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2066 ( "+" PGNSP PGUID b f f 1114 1186 1114 2553 0 timestamp_pl_interval - - ));
+DATA(insert OID = 2066 ( "+" PGNSP PGUID b f f 1114 1186 1114 2553 0 timestamp_pl_interval - - "---"));
DESCR("add");
-DATA(insert OID = 2067 ( "-" PGNSP PGUID b f f 1114 1114 1186 0 0 timestamp_mi - - ));
+DATA(insert OID = 2067 ( "-" PGNSP PGUID b f f 1114 1114 1186 0 0 timestamp_mi - - "---"));
DESCR("subtract");
-DATA(insert OID = 2068 ( "-" PGNSP PGUID b f f 1114 1186 1114 0 0 timestamp_mi_interval - - ));
+DATA(insert OID = 2068 ( "-" PGNSP PGUID b f f 1114 1186 1114 0 0 timestamp_mi_interval - - "---"));
DESCR("subtract");
/* character-by-character (not collation order) comparison operators for character types */
-DATA(insert OID = 2314 ( "~<~" PGNSP PGUID b f f 25 25 16 2318 2317 text_pattern_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2314 ( "~<~" PGNSP PGUID b f f 25 25 16 2318 2317 text_pattern_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2315 ( "~<=~" PGNSP PGUID b f f 25 25 16 2317 2318 text_pattern_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2315 ( "~<=~" PGNSP PGUID b f f 25 25 16 2317 2318 text_pattern_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2317 ( "~>=~" PGNSP PGUID b f f 25 25 16 2315 2314 text_pattern_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2317 ( "~>=~" PGNSP PGUID b f f 25 25 16 2315 2314 text_pattern_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2318 ( "~>~" PGNSP PGUID b f f 25 25 16 2314 2315 text_pattern_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2318 ( "~>~" PGNSP PGUID b f f 25 25 16 2314 2315 text_pattern_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2326 ( "~<~" PGNSP PGUID b f f 1042 1042 16 2330 2329 bpchar_pattern_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2326 ( "~<~" PGNSP PGUID b f f 1042 1042 16 2330 2329 bpchar_pattern_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2327 ( "~<=~" PGNSP PGUID b f f 1042 1042 16 2329 2330 bpchar_pattern_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2327 ( "~<=~" PGNSP PGUID b f f 1042 1042 16 2329 2330 bpchar_pattern_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2329 ( "~>=~" PGNSP PGUID b f f 1042 1042 16 2327 2326 bpchar_pattern_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2329 ( "~>=~" PGNSP PGUID b f f 1042 1042 16 2327 2326 bpchar_pattern_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2330 ( "~>~" PGNSP PGUID b f f 1042 1042 16 2326 2327 bpchar_pattern_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2330 ( "~>~" PGNSP PGUID b f f 1042 1042 16 2326 2327 bpchar_pattern_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
/* crosstype operations for date vs. timestamp and timestamptz */
-DATA(insert OID = 2345 ( "<" PGNSP PGUID b f f 1082 1114 16 2375 2348 date_lt_timestamp scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2345 ( "<" PGNSP PGUID b f f 1082 1114 16 2375 2348 date_lt_timestamp scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2346 ( "<=" PGNSP PGUID b f f 1082 1114 16 2374 2349 date_le_timestamp scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2346 ( "<=" PGNSP PGUID b f f 1082 1114 16 2374 2349 date_le_timestamp scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2347 ( "=" PGNSP PGUID b t f 1082 1114 16 2373 2350 date_eq_timestamp eqsel eqjoinsel ));
+DATA(insert OID = 2347 ( "=" PGNSP PGUID b t f 1082 1114 16 2373 2350 date_eq_timestamp eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 2348 ( ">=" PGNSP PGUID b f f 1082 1114 16 2372 2345 date_ge_timestamp scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2348 ( ">=" PGNSP PGUID b f f 1082 1114 16 2372 2345 date_ge_timestamp scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2349 ( ">" PGNSP PGUID b f f 1082 1114 16 2371 2346 date_gt_timestamp scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2349 ( ">" PGNSP PGUID b f f 1082 1114 16 2371 2346 date_gt_timestamp scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2350 ( "<>" PGNSP PGUID b f f 1082 1114 16 2376 2347 date_ne_timestamp neqsel neqjoinsel ));
+DATA(insert OID = 2350 ( "<>" PGNSP PGUID b f f 1082 1114 16 2376 2347 date_ne_timestamp neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 2358 ( "<" PGNSP PGUID b f f 1082 1184 16 2388 2361 date_lt_timestamptz scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2358 ( "<" PGNSP PGUID b f f 1082 1184 16 2388 2361 date_lt_timestamptz scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2359 ( "<=" PGNSP PGUID b f f 1082 1184 16 2387 2362 date_le_timestamptz scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2359 ( "<=" PGNSP PGUID b f f 1082 1184 16 2387 2362 date_le_timestamptz scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2360 ( "=" PGNSP PGUID b t f 1082 1184 16 2386 2363 date_eq_timestamptz eqsel eqjoinsel ));
+DATA(insert OID = 2360 ( "=" PGNSP PGUID b t f 1082 1184 16 2386 2363 date_eq_timestamptz eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 2361 ( ">=" PGNSP PGUID b f f 1082 1184 16 2385 2358 date_ge_timestamptz scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2361 ( ">=" PGNSP PGUID b f f 1082 1184 16 2385 2358 date_ge_timestamptz scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2362 ( ">" PGNSP PGUID b f f 1082 1184 16 2384 2359 date_gt_timestamptz scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2362 ( ">" PGNSP PGUID b f f 1082 1184 16 2384 2359 date_gt_timestamptz scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2363 ( "<>" PGNSP PGUID b f f 1082 1184 16 2389 2360 date_ne_timestamptz neqsel neqjoinsel ));
+DATA(insert OID = 2363 ( "<>" PGNSP PGUID b f f 1082 1184 16 2389 2360 date_ne_timestamptz neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 2371 ( "<" PGNSP PGUID b f f 1114 1082 16 2349 2374 timestamp_lt_date scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2371 ( "<" PGNSP PGUID b f f 1114 1082 16 2349 2374 timestamp_lt_date scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2372 ( "<=" PGNSP PGUID b f f 1114 1082 16 2348 2375 timestamp_le_date scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2372 ( "<=" PGNSP PGUID b f f 1114 1082 16 2348 2375 timestamp_le_date scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2373 ( "=" PGNSP PGUID b t f 1114 1082 16 2347 2376 timestamp_eq_date eqsel eqjoinsel ));
+DATA(insert OID = 2373 ( "=" PGNSP PGUID b t f 1114 1082 16 2347 2376 timestamp_eq_date eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 2374 ( ">=" PGNSP PGUID b f f 1114 1082 16 2346 2371 timestamp_ge_date scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2374 ( ">=" PGNSP PGUID b f f 1114 1082 16 2346 2371 timestamp_ge_date scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2375 ( ">" PGNSP PGUID b f f 1114 1082 16 2345 2372 timestamp_gt_date scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2375 ( ">" PGNSP PGUID b f f 1114 1082 16 2345 2372 timestamp_gt_date scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2376 ( "<>" PGNSP PGUID b f f 1114 1082 16 2350 2373 timestamp_ne_date neqsel neqjoinsel ));
+DATA(insert OID = 2376 ( "<>" PGNSP PGUID b f f 1114 1082 16 2350 2373 timestamp_ne_date neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 2384 ( "<" PGNSP PGUID b f f 1184 1082 16 2362 2387 timestamptz_lt_date scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2384 ( "<" PGNSP PGUID b f f 1184 1082 16 2362 2387 timestamptz_lt_date scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2385 ( "<=" PGNSP PGUID b f f 1184 1082 16 2361 2388 timestamptz_le_date scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2385 ( "<=" PGNSP PGUID b f f 1184 1082 16 2361 2388 timestamptz_le_date scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2386 ( "=" PGNSP PGUID b t f 1184 1082 16 2360 2389 timestamptz_eq_date eqsel eqjoinsel ));
+DATA(insert OID = 2386 ( "=" PGNSP PGUID b t f 1184 1082 16 2360 2389 timestamptz_eq_date eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 2387 ( ">=" PGNSP PGUID b f f 1184 1082 16 2359 2384 timestamptz_ge_date scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2387 ( ">=" PGNSP PGUID b f f 1184 1082 16 2359 2384 timestamptz_ge_date scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2388 ( ">" PGNSP PGUID b f f 1184 1082 16 2358 2385 timestamptz_gt_date scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2388 ( ">" PGNSP PGUID b f f 1184 1082 16 2358 2385 timestamptz_gt_date scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2389 ( "<>" PGNSP PGUID b f f 1184 1082 16 2363 2386 timestamptz_ne_date neqsel neqjoinsel ));
+DATA(insert OID = 2389 ( "<>" PGNSP PGUID b f f 1184 1082 16 2363 2386 timestamptz_ne_date neqsel neqjoinsel "mhf"));
DESCR("not equal");
/* crosstype operations for timestamp vs. timestamptz */
-DATA(insert OID = 2534 ( "<" PGNSP PGUID b f f 1114 1184 16 2544 2537 timestamp_lt_timestamptz scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2534 ( "<" PGNSP PGUID b f f 1114 1184 16 2544 2537 timestamp_lt_timestamptz scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2535 ( "<=" PGNSP PGUID b f f 1114 1184 16 2543 2538 timestamp_le_timestamptz scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2535 ( "<=" PGNSP PGUID b f f 1114 1184 16 2543 2538 timestamp_le_timestamptz scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2536 ( "=" PGNSP PGUID b t f 1114 1184 16 2542 2539 timestamp_eq_timestamptz eqsel eqjoinsel ));
+DATA(insert OID = 2536 ( "=" PGNSP PGUID b t f 1114 1184 16 2542 2539 timestamp_eq_timestamptz eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 2537 ( ">=" PGNSP PGUID b f f 1114 1184 16 2541 2534 timestamp_ge_timestamptz scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2537 ( ">=" PGNSP PGUID b f f 1114 1184 16 2541 2534 timestamp_ge_timestamptz scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2538 ( ">" PGNSP PGUID b f f 1114 1184 16 2540 2535 timestamp_gt_timestamptz scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2538 ( ">" PGNSP PGUID b f f 1114 1184 16 2540 2535 timestamp_gt_timestamptz scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2539 ( "<>" PGNSP PGUID b f f 1114 1184 16 2545 2536 timestamp_ne_timestamptz neqsel neqjoinsel ));
+DATA(insert OID = 2539 ( "<>" PGNSP PGUID b f f 1114 1184 16 2545 2536 timestamp_ne_timestamptz neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 2540 ( "<" PGNSP PGUID b f f 1184 1114 16 2538 2543 timestamptz_lt_timestamp scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2540 ( "<" PGNSP PGUID b f f 1184 1114 16 2538 2543 timestamptz_lt_timestamp scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2541 ( "<=" PGNSP PGUID b f f 1184 1114 16 2537 2544 timestamptz_le_timestamp scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2541 ( "<=" PGNSP PGUID b f f 1184 1114 16 2537 2544 timestamptz_le_timestamp scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2542 ( "=" PGNSP PGUID b t f 1184 1114 16 2536 2545 timestamptz_eq_timestamp eqsel eqjoinsel ));
+DATA(insert OID = 2542 ( "=" PGNSP PGUID b t f 1184 1114 16 2536 2545 timestamptz_eq_timestamp eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 2543 ( ">=" PGNSP PGUID b f f 1184 1114 16 2535 2540 timestamptz_ge_timestamp scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2543 ( ">=" PGNSP PGUID b f f 1184 1114 16 2535 2540 timestamptz_ge_timestamp scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 2544 ( ">" PGNSP PGUID b f f 1184 1114 16 2534 2541 timestamptz_gt_timestamp scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2544 ( ">" PGNSP PGUID b f f 1184 1114 16 2534 2541 timestamptz_gt_timestamp scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2545 ( "<>" PGNSP PGUID b f f 1184 1114 16 2539 2542 timestamptz_ne_timestamp neqsel neqjoinsel ));
+DATA(insert OID = 2545 ( "<>" PGNSP PGUID b f f 1184 1114 16 2539 2542 timestamptz_ne_timestamp neqsel neqjoinsel "mhf"));
DESCR("not equal");
/* formerly-missing interval + datetime operators */
-DATA(insert OID = 2551 ( "+" PGNSP PGUID b f f 1186 1082 1114 1076 0 interval_pl_date - - ));
+DATA(insert OID = 2551 ( "+" PGNSP PGUID b f f 1186 1082 1114 1076 0 interval_pl_date - - "---"));
DESCR("add");
-DATA(insert OID = 2552 ( "+" PGNSP PGUID b f f 1186 1266 1266 1802 0 interval_pl_timetz - - ));
+DATA(insert OID = 2552 ( "+" PGNSP PGUID b f f 1186 1266 1266 1802 0 interval_pl_timetz - - "---"));
DESCR("add");
-DATA(insert OID = 2553 ( "+" PGNSP PGUID b f f 1186 1114 1114 2066 0 interval_pl_timestamp - - ));
+DATA(insert OID = 2553 ( "+" PGNSP PGUID b f f 1186 1114 1114 2066 0 interval_pl_timestamp - - "---"));
DESCR("add");
-DATA(insert OID = 2554 ( "+" PGNSP PGUID b f f 1186 1184 1184 1327 0 interval_pl_timestamptz - - ));
+DATA(insert OID = 2554 ( "+" PGNSP PGUID b f f 1186 1184 1184 1327 0 interval_pl_timestamptz - - "---"));
DESCR("add");
-DATA(insert OID = 2555 ( "+" PGNSP PGUID b f f 23 1082 1082 1100 0 integer_pl_date - - ));
+DATA(insert OID = 2555 ( "+" PGNSP PGUID b f f 23 1082 1082 1100 0 integer_pl_date - - "---"));
DESCR("add");
/* new operators for Y-direction rtree opfamilies */
-DATA(insert OID = 2570 ( "<<|" PGNSP PGUID b f f 603 603 16 0 0 box_below positionsel positionjoinsel ));
+DATA(insert OID = 2570 ( "<<|" PGNSP PGUID b f f 603 603 16 0 0 box_below positionsel positionjoinsel "---"));
DESCR("is below");
-DATA(insert OID = 2571 ( "&<|" PGNSP PGUID b f f 603 603 16 0 0 box_overbelow positionsel positionjoinsel ));
+DATA(insert OID = 2571 ( "&<|" PGNSP PGUID b f f 603 603 16 0 0 box_overbelow positionsel positionjoinsel "---"));
DESCR("overlaps or is below");
-DATA(insert OID = 2572 ( "|&>" PGNSP PGUID b f f 603 603 16 0 0 box_overabove positionsel positionjoinsel ));
+DATA(insert OID = 2572 ( "|&>" PGNSP PGUID b f f 603 603 16 0 0 box_overabove positionsel positionjoinsel "---"));
DESCR("overlaps or is above");
-DATA(insert OID = 2573 ( "|>>" PGNSP PGUID b f f 603 603 16 0 0 box_above positionsel positionjoinsel ));
+DATA(insert OID = 2573 ( "|>>" PGNSP PGUID b f f 603 603 16 0 0 box_above positionsel positionjoinsel "---"));
DESCR("is above");
-DATA(insert OID = 2574 ( "<<|" PGNSP PGUID b f f 604 604 16 0 0 poly_below positionsel positionjoinsel ));
+DATA(insert OID = 2574 ( "<<|" PGNSP PGUID b f f 604 604 16 0 0 poly_below positionsel positionjoinsel "---"));
DESCR("is below");
-DATA(insert OID = 2575 ( "&<|" PGNSP PGUID b f f 604 604 16 0 0 poly_overbelow positionsel positionjoinsel ));
+DATA(insert OID = 2575 ( "&<|" PGNSP PGUID b f f 604 604 16 0 0 poly_overbelow positionsel positionjoinsel "---"));
DESCR("overlaps or is below");
-DATA(insert OID = 2576 ( "|&>" PGNSP PGUID b f f 604 604 16 0 0 poly_overabove positionsel positionjoinsel ));
+DATA(insert OID = 2576 ( "|&>" PGNSP PGUID b f f 604 604 16 0 0 poly_overabove positionsel positionjoinsel "---"));
DESCR("overlaps or is above");
-DATA(insert OID = 2577 ( "|>>" PGNSP PGUID b f f 604 604 16 0 0 poly_above positionsel positionjoinsel ));
+DATA(insert OID = 2577 ( "|>>" PGNSP PGUID b f f 604 604 16 0 0 poly_above positionsel positionjoinsel "---"));
DESCR("is above");
-DATA(insert OID = 2589 ( "&<|" PGNSP PGUID b f f 718 718 16 0 0 circle_overbelow positionsel positionjoinsel ));
+DATA(insert OID = 2589 ( "&<|" PGNSP PGUID b f f 718 718 16 0 0 circle_overbelow positionsel positionjoinsel "---"));
DESCR("overlaps or is below");
-DATA(insert OID = 2590 ( "|&>" PGNSP PGUID b f f 718 718 16 0 0 circle_overabove positionsel positionjoinsel ));
+DATA(insert OID = 2590 ( "|&>" PGNSP PGUID b f f 718 718 16 0 0 circle_overabove positionsel positionjoinsel "---"));
DESCR("overlaps or is above");
/* overlap/contains/contained for arrays */
-DATA(insert OID = 2750 ( "&&" PGNSP PGUID b f f 2277 2277 16 2750 0 arrayoverlap arraycontsel arraycontjoinsel ));
+DATA(insert OID = 2750 ( "&&" PGNSP PGUID b f f 2277 2277 16 2750 0 arrayoverlap arraycontsel arraycontjoinsel "---"));
DESCR("overlaps");
#define OID_ARRAY_OVERLAP_OP 2750
-DATA(insert OID = 2751 ( "@>" PGNSP PGUID b f f 2277 2277 16 2752 0 arraycontains arraycontsel arraycontjoinsel ));
+DATA(insert OID = 2751 ( "@>" PGNSP PGUID b f f 2277 2277 16 2752 0 arraycontains arraycontsel arraycontjoinsel "---"));
DESCR("contains");
#define OID_ARRAY_CONTAINS_OP 2751
-DATA(insert OID = 2752 ( "<@" PGNSP PGUID b f f 2277 2277 16 2751 0 arraycontained arraycontsel arraycontjoinsel ));
+DATA(insert OID = 2752 ( "<@" PGNSP PGUID b f f 2277 2277 16 2751 0 arraycontained arraycontsel arraycontjoinsel "---"));
DESCR("is contained by");
#define OID_ARRAY_CONTAINED_OP 2752
/* capturing operators to preserve pre-8.3 behavior of text concatenation */
-DATA(insert OID = 2779 ( "||" PGNSP PGUID b f f 25 2776 25 0 0 textanycat - - ));
+DATA(insert OID = 2779 ( "||" PGNSP PGUID b f f 25 2776 25 0 0 textanycat - - "---"));
DESCR("concatenate");
-DATA(insert OID = 2780 ( "||" PGNSP PGUID b f f 2776 25 25 0 0 anytextcat - - ));
+DATA(insert OID = 2780 ( "||" PGNSP PGUID b f f 2776 25 25 0 0 anytextcat - - "---"));
DESCR("concatenate");
/* obsolete names for contains/contained-by operators; remove these someday */
-DATA(insert OID = 2860 ( "@" PGNSP PGUID b f f 604 604 16 2861 0 poly_contained contsel contjoinsel ));
+DATA(insert OID = 2860 ( "@" PGNSP PGUID b f f 604 604 16 2861 0 poly_contained contsel contjoinsel "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2861 ( "~" PGNSP PGUID b f f 604 604 16 2860 0 poly_contain contsel contjoinsel ));
+DATA(insert OID = 2861 ( "~" PGNSP PGUID b f f 604 604 16 2860 0 poly_contain contsel contjoinsel "---"));
DESCR("deprecated, use @> instead");
-DATA(insert OID = 2862 ( "@" PGNSP PGUID b f f 603 603 16 2863 0 box_contained contsel contjoinsel ));
+DATA(insert OID = 2862 ( "@" PGNSP PGUID b f f 603 603 16 2863 0 box_contained contsel contjoinsel "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2863 ( "~" PGNSP PGUID b f f 603 603 16 2862 0 box_contain contsel contjoinsel ));
+DATA(insert OID = 2863 ( "~" PGNSP PGUID b f f 603 603 16 2862 0 box_contain contsel contjoinsel "---"));
DESCR("deprecated, use @> instead");
-DATA(insert OID = 2864 ( "@" PGNSP PGUID b f f 718 718 16 2865 0 circle_contained contsel contjoinsel ));
+DATA(insert OID = 2864 ( "@" PGNSP PGUID b f f 718 718 16 2865 0 circle_contained contsel contjoinsel "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2865 ( "~" PGNSP PGUID b f f 718 718 16 2864 0 circle_contain contsel contjoinsel ));
+DATA(insert OID = 2865 ( "~" PGNSP PGUID b f f 718 718 16 2864 0 circle_contain contsel contjoinsel "---"));
DESCR("deprecated, use @> instead");
-DATA(insert OID = 2866 ( "@" PGNSP PGUID b f f 600 603 16 0 0 on_pb - - ));
+DATA(insert OID = 2866 ( "@" PGNSP PGUID b f f 600 603 16 0 0 on_pb - - "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2867 ( "@" PGNSP PGUID b f f 600 602 16 2868 0 on_ppath - - ));
+DATA(insert OID = 2867 ( "@" PGNSP PGUID b f f 600 602 16 2868 0 on_ppath - - "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2868 ( "~" PGNSP PGUID b f f 602 600 16 2867 0 path_contain_pt - - ));
+DATA(insert OID = 2868 ( "~" PGNSP PGUID b f f 602 600 16 2867 0 path_contain_pt - - "---"));
DESCR("deprecated, use @> instead");
-DATA(insert OID = 2869 ( "@" PGNSP PGUID b f f 600 604 16 2870 0 pt_contained_poly - - ));
+DATA(insert OID = 2869 ( "@" PGNSP PGUID b f f 600 604 16 2870 0 pt_contained_poly - - "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2870 ( "~" PGNSP PGUID b f f 604 600 16 2869 0 poly_contain_pt - - ));
+DATA(insert OID = 2870 ( "~" PGNSP PGUID b f f 604 600 16 2869 0 poly_contain_pt - - "---"));
DESCR("deprecated, use @> instead");
-DATA(insert OID = 2871 ( "@" PGNSP PGUID b f f 600 718 16 2872 0 pt_contained_circle - - ));
+DATA(insert OID = 2871 ( "@" PGNSP PGUID b f f 600 718 16 2872 0 pt_contained_circle - - "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2872 ( "~" PGNSP PGUID b f f 718 600 16 2871 0 circle_contain_pt - - ));
+DATA(insert OID = 2872 ( "~" PGNSP PGUID b f f 718 600 16 2871 0 circle_contain_pt - - "---"));
DESCR("deprecated, use @> instead");
-DATA(insert OID = 2873 ( "@" PGNSP PGUID b f f 600 628 16 0 0 on_pl - - ));
+DATA(insert OID = 2873 ( "@" PGNSP PGUID b f f 600 628 16 0 0 on_pl - - "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2874 ( "@" PGNSP PGUID b f f 600 601 16 0 0 on_ps - - ));
+DATA(insert OID = 2874 ( "@" PGNSP PGUID b f f 600 601 16 0 0 on_ps - - "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2875 ( "@" PGNSP PGUID b f f 601 628 16 0 0 on_sl - - ));
+DATA(insert OID = 2875 ( "@" PGNSP PGUID b f f 601 628 16 0 0 on_sl - - "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2876 ( "@" PGNSP PGUID b f f 601 603 16 0 0 on_sb - - ));
+DATA(insert OID = 2876 ( "@" PGNSP PGUID b f f 601 603 16 0 0 on_sb - - "---"));
DESCR("deprecated, use <@ instead");
-DATA(insert OID = 2877 ( "~" PGNSP PGUID b f f 1034 1033 16 0 0 aclcontains - - ));
+DATA(insert OID = 2877 ( "~" PGNSP PGUID b f f 1034 1033 16 0 0 aclcontains - - "---"));
DESCR("deprecated, use @> instead");
/* uuid operators */
-DATA(insert OID = 2972 ( "=" PGNSP PGUID b t t 2950 2950 16 2972 2973 uuid_eq eqsel eqjoinsel ));
+DATA(insert OID = 2972 ( "=" PGNSP PGUID b t t 2950 2950 16 2972 2973 uuid_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 2973 ( "<>" PGNSP PGUID b f f 2950 2950 16 2973 2972 uuid_ne neqsel neqjoinsel ));
+DATA(insert OID = 2973 ( "<>" PGNSP PGUID b f f 2950 2950 16 2973 2972 uuid_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 2974 ( "<" PGNSP PGUID b f f 2950 2950 16 2975 2977 uuid_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2974 ( "<" PGNSP PGUID b f f 2950 2950 16 2975 2977 uuid_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 2975 ( ">" PGNSP PGUID b f f 2950 2950 16 2974 2976 uuid_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2975 ( ">" PGNSP PGUID b f f 2950 2950 16 2974 2976 uuid_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 2976 ( "<=" PGNSP PGUID b f f 2950 2950 16 2977 2975 uuid_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2976 ( "<=" PGNSP PGUID b f f 2950 2950 16 2977 2975 uuid_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2977 ( ">=" PGNSP PGUID b f f 2950 2950 16 2976 2974 uuid_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2977 ( ">=" PGNSP PGUID b f f 2950 2950 16 2976 2974 uuid_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/* pg_lsn operators */
-DATA(insert OID = 3222 ( "=" PGNSP PGUID b t t 3220 3220 16 3222 3223 pg_lsn_eq eqsel eqjoinsel ));
+DATA(insert OID = 3222 ( "=" PGNSP PGUID b t t 3220 3220 16 3222 3223 pg_lsn_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 3223 ( "<>" PGNSP PGUID b f f 3220 3220 16 3223 3222 pg_lsn_ne neqsel neqjoinsel ));
+DATA(insert OID = 3223 ( "<>" PGNSP PGUID b f f 3220 3220 16 3223 3222 pg_lsn_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 3224 ( "<" PGNSP PGUID b f f 3220 3220 16 3225 3227 pg_lsn_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3224 ( "<" PGNSP PGUID b f f 3220 3220 16 3225 3227 pg_lsn_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 3225 ( ">" PGNSP PGUID b f f 3220 3220 16 3224 3226 pg_lsn_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3225 ( ">" PGNSP PGUID b f f 3220 3220 16 3224 3226 pg_lsn_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 3226 ( "<=" PGNSP PGUID b f f 3220 3220 16 3227 3225 pg_lsn_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3226 ( "<=" PGNSP PGUID b f f 3220 3220 16 3227 3225 pg_lsn_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 3227 ( ">=" PGNSP PGUID b f f 3220 3220 16 3226 3224 pg_lsn_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3227 ( ">=" PGNSP PGUID b f f 3220 3220 16 3226 3224 pg_lsn_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 3228 ( "-" PGNSP PGUID b f f 3220 3220 1700 0 0 pg_lsn_mi - - ));
+DATA(insert OID = 3228 ( "-" PGNSP PGUID b f f 3220 3220 1700 0 0 pg_lsn_mi - - "---"));
DESCR("minus");
/* enum operators */
-DATA(insert OID = 3516 ( "=" PGNSP PGUID b t t 3500 3500 16 3516 3517 enum_eq eqsel eqjoinsel ));
+DATA(insert OID = 3516 ( "=" PGNSP PGUID b t t 3500 3500 16 3516 3517 enum_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 3517 ( "<>" PGNSP PGUID b f f 3500 3500 16 3517 3516 enum_ne neqsel neqjoinsel ));
+DATA(insert OID = 3517 ( "<>" PGNSP PGUID b f f 3500 3500 16 3517 3516 enum_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 3518 ( "<" PGNSP PGUID b f f 3500 3500 16 3519 3521 enum_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3518 ( "<" PGNSP PGUID b f f 3500 3500 16 3519 3521 enum_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 3519 ( ">" PGNSP PGUID b f f 3500 3500 16 3518 3520 enum_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3519 ( ">" PGNSP PGUID b f f 3500 3500 16 3518 3520 enum_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 3520 ( "<=" PGNSP PGUID b f f 3500 3500 16 3521 3519 enum_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3520 ( "<=" PGNSP PGUID b f f 3500 3500 16 3521 3519 enum_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 3521 ( ">=" PGNSP PGUID b f f 3500 3500 16 3520 3518 enum_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3521 ( ">=" PGNSP PGUID b f f 3500 3500 16 3520 3518 enum_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/*
* tsearch operations
*/
-DATA(insert OID = 3627 ( "<" PGNSP PGUID b f f 3614 3614 16 3632 3631 tsvector_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3627 ( "<" PGNSP PGUID b f f 3614 3614 16 3632 3631 tsvector_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 3628 ( "<=" PGNSP PGUID b f f 3614 3614 16 3631 3632 tsvector_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3628 ( "<=" PGNSP PGUID b f f 3614 3614 16 3631 3632 tsvector_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 3629 ( "=" PGNSP PGUID b t f 3614 3614 16 3629 3630 tsvector_eq eqsel eqjoinsel ));
+DATA(insert OID = 3629 ( "=" PGNSP PGUID b t f 3614 3614 16 3629 3630 tsvector_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 3630 ( "<>" PGNSP PGUID b f f 3614 3614 16 3630 3629 tsvector_ne neqsel neqjoinsel ));
+DATA(insert OID = 3630 ( "<>" PGNSP PGUID b f f 3614 3614 16 3630 3629 tsvector_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 3631 ( ">=" PGNSP PGUID b f f 3614 3614 16 3628 3627 tsvector_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3631 ( ">=" PGNSP PGUID b f f 3614 3614 16 3628 3627 tsvector_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 3632 ( ">" PGNSP PGUID b f f 3614 3614 16 3627 3628 tsvector_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3632 ( ">" PGNSP PGUID b f f 3614 3614 16 3627 3628 tsvector_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 3633 ( "||" PGNSP PGUID b f f 3614 3614 3614 0 0 tsvector_concat - - ));
+DATA(insert OID = 3633 ( "||" PGNSP PGUID b f f 3614 3614 3614 0 0 tsvector_concat - - "---"));
DESCR("concatenate");
-DATA(insert OID = 3636 ( "@@" PGNSP PGUID b f f 3614 3615 16 3637 0 ts_match_vq tsmatchsel tsmatchjoinsel ));
+DATA(insert OID = 3636 ( "@@" PGNSP PGUID b f f 3614 3615 16 3637 0 ts_match_vq tsmatchsel tsmatchjoinsel "---"));
DESCR("text search match");
-DATA(insert OID = 3637 ( "@@" PGNSP PGUID b f f 3615 3614 16 3636 0 ts_match_qv tsmatchsel tsmatchjoinsel ));
+DATA(insert OID = 3637 ( "@@" PGNSP PGUID b f f 3615 3614 16 3636 0 ts_match_qv tsmatchsel tsmatchjoinsel "---"));
DESCR("text search match");
-DATA(insert OID = 3660 ( "@@@" PGNSP PGUID b f f 3614 3615 16 3661 0 ts_match_vq tsmatchsel tsmatchjoinsel ));
+DATA(insert OID = 3660 ( "@@@" PGNSP PGUID b f f 3614 3615 16 3661 0 ts_match_vq tsmatchsel tsmatchjoinsel "---"));
DESCR("deprecated, use @@ instead");
-DATA(insert OID = 3661 ( "@@@" PGNSP PGUID b f f 3615 3614 16 3660 0 ts_match_qv tsmatchsel tsmatchjoinsel ));
+DATA(insert OID = 3661 ( "@@@" PGNSP PGUID b f f 3615 3614 16 3660 0 ts_match_qv tsmatchsel tsmatchjoinsel "---"));
DESCR("deprecated, use @@ instead");
-DATA(insert OID = 3674 ( "<" PGNSP PGUID b f f 3615 3615 16 3679 3678 tsquery_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3674 ( "<" PGNSP PGUID b f f 3615 3615 16 3679 3678 tsquery_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 3675 ( "<=" PGNSP PGUID b f f 3615 3615 16 3678 3679 tsquery_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3675 ( "<=" PGNSP PGUID b f f 3615 3615 16 3678 3679 tsquery_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 3676 ( "=" PGNSP PGUID b t f 3615 3615 16 3676 3677 tsquery_eq eqsel eqjoinsel ));
+DATA(insert OID = 3676 ( "=" PGNSP PGUID b t f 3615 3615 16 3676 3677 tsquery_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 3677 ( "<>" PGNSP PGUID b f f 3615 3615 16 3677 3676 tsquery_ne neqsel neqjoinsel ));
+DATA(insert OID = 3677 ( "<>" PGNSP PGUID b f f 3615 3615 16 3677 3676 tsquery_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 3678 ( ">=" PGNSP PGUID b f f 3615 3615 16 3675 3674 tsquery_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3678 ( ">=" PGNSP PGUID b f f 3615 3615 16 3675 3674 tsquery_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 3679 ( ">" PGNSP PGUID b f f 3615 3615 16 3674 3675 tsquery_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3679 ( ">" PGNSP PGUID b f f 3615 3615 16 3674 3675 tsquery_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 3680 ( "&&" PGNSP PGUID b f f 3615 3615 3615 0 0 tsquery_and - - ));
+DATA(insert OID = 3680 ( "&&" PGNSP PGUID b f f 3615 3615 3615 0 0 tsquery_and - - "---"));
DESCR("AND-concatenate");
-DATA(insert OID = 3681 ( "||" PGNSP PGUID b f f 3615 3615 3615 0 0 tsquery_or - - ));
+DATA(insert OID = 3681 ( "||" PGNSP PGUID b f f 3615 3615 3615 0 0 tsquery_or - - "---"));
DESCR("OR-concatenate");
-DATA(insert OID = 3682 ( "!!" PGNSP PGUID l f f 0 3615 3615 0 0 tsquery_not - - ));
+DATA(insert OID = 3682 ( "!!" PGNSP PGUID l f f 0 3615 3615 0 0 tsquery_not - - "---"));
DESCR("NOT tsquery");
-DATA(insert OID = 3693 ( "@>" PGNSP PGUID b f f 3615 3615 16 3694 0 tsq_mcontains contsel contjoinsel ));
+DATA(insert OID = 3693 ( "@>" PGNSP PGUID b f f 3615 3615 16 3694 0 tsq_mcontains contsel contjoinsel "---"));
DESCR("contains");
-DATA(insert OID = 3694 ( "<@" PGNSP PGUID b f f 3615 3615 16 3693 0 tsq_mcontained contsel contjoinsel ));
+DATA(insert OID = 3694 ( "<@" PGNSP PGUID b f f 3615 3615 16 3693 0 tsq_mcontained contsel contjoinsel "---"));
DESCR("is contained by");
-DATA(insert OID = 3762 ( "@@" PGNSP PGUID b f f 25 25 16 0 0 ts_match_tt contsel contjoinsel ));
+DATA(insert OID = 3762 ( "@@" PGNSP PGUID b f f 25 25 16 0 0 ts_match_tt contsel contjoinsel "---"));
DESCR("text search match");
-DATA(insert OID = 3763 ( "@@" PGNSP PGUID b f f 25 3615 16 0 0 ts_match_tq contsel contjoinsel ));
+DATA(insert OID = 3763 ( "@@" PGNSP PGUID b f f 25 3615 16 0 0 ts_match_tq contsel contjoinsel "---"));
DESCR("text search match");
/* generic record comparison operators */
-DATA(insert OID = 2988 ( "=" PGNSP PGUID b t f 2249 2249 16 2988 2989 record_eq eqsel eqjoinsel ));
+DATA(insert OID = 2988 ( "=" PGNSP PGUID b t f 2249 2249 16 2988 2989 record_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
#define RECORD_EQ_OP 2988
-DATA(insert OID = 2989 ( "<>" PGNSP PGUID b f f 2249 2249 16 2989 2988 record_ne neqsel neqjoinsel ));
+DATA(insert OID = 2989 ( "<>" PGNSP PGUID b f f 2249 2249 16 2989 2988 record_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 2990 ( "<" PGNSP PGUID b f f 2249 2249 16 2991 2993 record_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2990 ( "<" PGNSP PGUID b f f 2249 2249 16 2991 2993 record_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
#define RECORD_LT_OP 2990
-DATA(insert OID = 2991 ( ">" PGNSP PGUID b f f 2249 2249 16 2990 2992 record_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2991 ( ">" PGNSP PGUID b f f 2249 2249 16 2990 2992 record_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
#define RECORD_GT_OP 2991
-DATA(insert OID = 2992 ( "<=" PGNSP PGUID b f f 2249 2249 16 2993 2991 record_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 2992 ( "<=" PGNSP PGUID b f f 2249 2249 16 2993 2991 record_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 2993 ( ">=" PGNSP PGUID b f f 2249 2249 16 2992 2990 record_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 2993 ( ">=" PGNSP PGUID b f f 2249 2249 16 2992 2990 record_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/* byte-oriented tests for identical rows and fast sorting */
-DATA(insert OID = 3188 ( "*=" PGNSP PGUID b t f 2249 2249 16 3188 3189 record_image_eq eqsel eqjoinsel ));
+DATA(insert OID = 3188 ( "*=" PGNSP PGUID b t f 2249 2249 16 3188 3189 record_image_eq eqsel eqjoinsel "mhf"));
DESCR("identical");
-DATA(insert OID = 3189 ( "*<>" PGNSP PGUID b f f 2249 2249 16 3189 3188 record_image_ne neqsel neqjoinsel ));
+DATA(insert OID = 3189 ( "*<>" PGNSP PGUID b f f 2249 2249 16 3189 3188 record_image_ne neqsel neqjoinsel "mhf"));
DESCR("not identical");
-DATA(insert OID = 3190 ( "*<" PGNSP PGUID b f f 2249 2249 16 3191 3193 record_image_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3190 ( "*<" PGNSP PGUID b f f 2249 2249 16 3191 3193 record_image_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 3191 ( "*>" PGNSP PGUID b f f 2249 2249 16 3190 3192 record_image_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3191 ( "*>" PGNSP PGUID b f f 2249 2249 16 3190 3192 record_image_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 3192 ( "*<=" PGNSP PGUID b f f 2249 2249 16 3193 3191 record_image_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3192 ( "*<=" PGNSP PGUID b f f 2249 2249 16 3193 3191 record_image_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 3193 ( "*>=" PGNSP PGUID b f f 2249 2249 16 3192 3190 record_image_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3193 ( "*>=" PGNSP PGUID b f f 2249 2249 16 3192 3190 record_image_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
/* generic range type operators */
-DATA(insert OID = 3882 ( "=" PGNSP PGUID b t t 3831 3831 16 3882 3883 range_eq eqsel eqjoinsel ));
+DATA(insert OID = 3882 ( "=" PGNSP PGUID b t t 3831 3831 16 3882 3883 range_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 3883 ( "<>" PGNSP PGUID b f f 3831 3831 16 3883 3882 range_ne neqsel neqjoinsel ));
+DATA(insert OID = 3883 ( "<>" PGNSP PGUID b f f 3831 3831 16 3883 3882 range_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 3884 ( "<" PGNSP PGUID b f f 3831 3831 16 3887 3886 range_lt rangesel scalarltjoinsel ));
+DATA(insert OID = 3884 ( "<" PGNSP PGUID b f f 3831 3831 16 3887 3886 range_lt rangesel scalarltjoinsel "---"));
DESCR("less than");
#define OID_RANGE_LESS_OP 3884
-DATA(insert OID = 3885 ( "<=" PGNSP PGUID b f f 3831 3831 16 3886 3887 range_le rangesel scalarltjoinsel ));
+DATA(insert OID = 3885 ( "<=" PGNSP PGUID b f f 3831 3831 16 3886 3887 range_le rangesel scalarltjoinsel "---"));
DESCR("less than or equal");
#define OID_RANGE_LESS_EQUAL_OP 3885
-DATA(insert OID = 3886 ( ">=" PGNSP PGUID b f f 3831 3831 16 3885 3884 range_ge rangesel scalargtjoinsel ));
+DATA(insert OID = 3886 ( ">=" PGNSP PGUID b f f 3831 3831 16 3885 3884 range_ge rangesel scalargtjoinsel "---"));
DESCR("greater than or equal");
#define OID_RANGE_GREATER_EQUAL_OP 3886
-DATA(insert OID = 3887 ( ">" PGNSP PGUID b f f 3831 3831 16 3884 3885 range_gt rangesel scalargtjoinsel ));
+DATA(insert OID = 3887 ( ">" PGNSP PGUID b f f 3831 3831 16 3884 3885 range_gt rangesel scalargtjoinsel "---"));
DESCR("greater than");
#define OID_RANGE_GREATER_OP 3887
-DATA(insert OID = 3888 ( "&&" PGNSP PGUID b f f 3831 3831 16 3888 0 range_overlaps rangesel areajoinsel ));
+DATA(insert OID = 3888 ( "&&" PGNSP PGUID b f f 3831 3831 16 3888 0 range_overlaps rangesel areajoinsel "---"));
DESCR("overlaps");
#define OID_RANGE_OVERLAP_OP 3888
-DATA(insert OID = 3889 ( "@>" PGNSP PGUID b f f 3831 2283 16 3891 0 range_contains_elem rangesel contjoinsel ));
+DATA(insert OID = 3889 ( "@>" PGNSP PGUID b f f 3831 2283 16 3891 0 range_contains_elem rangesel contjoinsel "---"));
DESCR("contains");
#define OID_RANGE_CONTAINS_ELEM_OP 3889
-DATA(insert OID = 3890 ( "@>" PGNSP PGUID b f f 3831 3831 16 3892 0 range_contains rangesel contjoinsel ));
+DATA(insert OID = 3890 ( "@>" PGNSP PGUID b f f 3831 3831 16 3892 0 range_contains rangesel contjoinsel "---"));
DESCR("contains");
#define OID_RANGE_CONTAINS_OP 3890
-DATA(insert OID = 3891 ( "<@" PGNSP PGUID b f f 2283 3831 16 3889 0 elem_contained_by_range rangesel contjoinsel ));
+DATA(insert OID = 3891 ( "<@" PGNSP PGUID b f f 2283 3831 16 3889 0 elem_contained_by_range rangesel contjoinsel "---"));
DESCR("is contained by");
#define OID_RANGE_ELEM_CONTAINED_OP 3891
-DATA(insert OID = 3892 ( "<@" PGNSP PGUID b f f 3831 3831 16 3890 0 range_contained_by rangesel contjoinsel ));
+DATA(insert OID = 3892 ( "<@" PGNSP PGUID b f f 3831 3831 16 3890 0 range_contained_by rangesel contjoinsel "---"));
DESCR("is contained by");
#define OID_RANGE_CONTAINED_OP 3892
-DATA(insert OID = 3893 ( "<<" PGNSP PGUID b f f 3831 3831 16 3894 0 range_before rangesel scalarltjoinsel ));
+DATA(insert OID = 3893 ( "<<" PGNSP PGUID b f f 3831 3831 16 3894 0 range_before rangesel scalarltjoinsel "---"));
DESCR("is left of");
#define OID_RANGE_LEFT_OP 3893
-DATA(insert OID = 3894 ( ">>" PGNSP PGUID b f f 3831 3831 16 3893 0 range_after rangesel scalargtjoinsel ));
+DATA(insert OID = 3894 ( ">>" PGNSP PGUID b f f 3831 3831 16 3893 0 range_after rangesel scalargtjoinsel "---"));
DESCR("is right of");
#define OID_RANGE_RIGHT_OP 3894
-DATA(insert OID = 3895 ( "&<" PGNSP PGUID b f f 3831 3831 16 0 0 range_overleft rangesel scalarltjoinsel ));
+DATA(insert OID = 3895 ( "&<" PGNSP PGUID b f f 3831 3831 16 0 0 range_overleft rangesel scalarltjoinsel "---"));
DESCR("overlaps or is left of");
#define OID_RANGE_OVERLAPS_LEFT_OP 3895
-DATA(insert OID = 3896 ( "&>" PGNSP PGUID b f f 3831 3831 16 0 0 range_overright rangesel scalargtjoinsel ));
+DATA(insert OID = 3896 ( "&>" PGNSP PGUID b f f 3831 3831 16 0 0 range_overright rangesel scalargtjoinsel "---"));
DESCR("overlaps or is right of");
#define OID_RANGE_OVERLAPS_RIGHT_OP 3896
-DATA(insert OID = 3897 ( "-|-" PGNSP PGUID b f f 3831 3831 16 3897 0 range_adjacent contsel contjoinsel ));
+DATA(insert OID = 3897 ( "-|-" PGNSP PGUID b f f 3831 3831 16 3897 0 range_adjacent contsel contjoinsel "---"));
DESCR("is adjacent to");
-DATA(insert OID = 3898 ( "+" PGNSP PGUID b f f 3831 3831 3831 3898 0 range_union - - ));
+DATA(insert OID = 3898 ( "+" PGNSP PGUID b f f 3831 3831 3831 3898 0 range_union - - "---"));
DESCR("range union");
-DATA(insert OID = 3899 ( "-" PGNSP PGUID b f f 3831 3831 3831 0 0 range_minus - - ));
+DATA(insert OID = 3899 ( "-" PGNSP PGUID b f f 3831 3831 3831 0 0 range_minus - - "---"));
DESCR("range difference");
-DATA(insert OID = 3900 ( "*" PGNSP PGUID b f f 3831 3831 3831 3900 0 range_intersect - - ));
+DATA(insert OID = 3900 ( "*" PGNSP PGUID b f f 3831 3831 3831 3900 0 range_intersect - - "---"));
DESCR("range intersection");
-DATA(insert OID = 3962 ( "->" PGNSP PGUID b f f 114 25 114 0 0 json_object_field - - ));
+DATA(insert OID = 3962 ( "->" PGNSP PGUID b f f 114 25 114 0 0 json_object_field - - "---"));
DESCR("get json object field");
-DATA(insert OID = 3963 ( "->>" PGNSP PGUID b f f 114 25 25 0 0 json_object_field_text - - ));
+DATA(insert OID = 3963 ( "->>" PGNSP PGUID b f f 114 25 25 0 0 json_object_field_text - - "---"));
DESCR("get json object field as text");
-DATA(insert OID = 3964 ( "->" PGNSP PGUID b f f 114 23 114 0 0 json_array_element - - ));
+DATA(insert OID = 3964 ( "->" PGNSP PGUID b f f 114 23 114 0 0 json_array_element - - "---"));
DESCR("get json array element");
-DATA(insert OID = 3965 ( "->>" PGNSP PGUID b f f 114 23 25 0 0 json_array_element_text - - ));
+DATA(insert OID = 3965 ( "->>" PGNSP PGUID b f f 114 23 25 0 0 json_array_element_text - - "---"));
DESCR("get json array element as text");
-DATA(insert OID = 3966 ( "#>" PGNSP PGUID b f f 114 1009 114 0 0 json_extract_path - - ));
+DATA(insert OID = 3966 ( "#>" PGNSP PGUID b f f 114 1009 114 0 0 json_extract_path - - "---"));
DESCR("get value from json with path elements");
-DATA(insert OID = 3967 ( "#>>" PGNSP PGUID b f f 114 1009 25 0 0 json_extract_path_text - - ));
+DATA(insert OID = 3967 ( "#>>" PGNSP PGUID b f f 114 1009 25 0 0 json_extract_path_text - - "---"));
DESCR("get value from json as text with path elements");
-DATA(insert OID = 3211 ( "->" PGNSP PGUID b f f 3802 25 3802 0 0 jsonb_object_field - - ));
+DATA(insert OID = 3211 ( "->" PGNSP PGUID b f f 3802 25 3802 0 0 jsonb_object_field - - "---"));
DESCR("get jsonb object field");
-DATA(insert OID = 3477 ( "->>" PGNSP PGUID b f f 3802 25 25 0 0 jsonb_object_field_text - - ));
+DATA(insert OID = 3477 ( "->>" PGNSP PGUID b f f 3802 25 25 0 0 jsonb_object_field_text - - "---"));
DESCR("get jsonb object field as text");
-DATA(insert OID = 3212 ( "->" PGNSP PGUID b f f 3802 23 3802 0 0 jsonb_array_element - - ));
+DATA(insert OID = 3212 ( "->" PGNSP PGUID b f f 3802 23 3802 0 0 jsonb_array_element - - "---"));
DESCR("get jsonb array element");
-DATA(insert OID = 3481 ( "->>" PGNSP PGUID b f f 3802 23 25 0 0 jsonb_array_element_text - - ));
+DATA(insert OID = 3481 ( "->>" PGNSP PGUID b f f 3802 23 25 0 0 jsonb_array_element_text - - "---"));
DESCR("get jsonb array element as text");
-DATA(insert OID = 3213 ( "#>" PGNSP PGUID b f f 3802 1009 3802 0 0 jsonb_extract_path - - ));
+DATA(insert OID = 3213 ( "#>" PGNSP PGUID b f f 3802 1009 3802 0 0 jsonb_extract_path - - "---"));
DESCR("get value from jsonb with path elements");
-DATA(insert OID = 3206 ( "#>>" PGNSP PGUID b f f 3802 1009 25 0 0 jsonb_extract_path_text - - ));
+DATA(insert OID = 3206 ( "#>>" PGNSP PGUID b f f 3802 1009 25 0 0 jsonb_extract_path_text - - "---"));
DESCR("get value from jsonb as text with path elements");
-DATA(insert OID = 3240 ( "=" PGNSP PGUID b t t 3802 3802 16 3240 3241 jsonb_eq eqsel eqjoinsel ));
+DATA(insert OID = 3240 ( "=" PGNSP PGUID b t t 3802 3802 16 3240 3241 jsonb_eq eqsel eqjoinsel "mhf"));
DESCR("equal");
-DATA(insert OID = 3241 ( "<>" PGNSP PGUID b f f 3802 3802 16 3241 3240 jsonb_ne neqsel neqjoinsel ));
+DATA(insert OID = 3241 ( "<>" PGNSP PGUID b f f 3802 3802 16 3241 3240 jsonb_ne neqsel neqjoinsel "mhf"));
DESCR("not equal");
-DATA(insert OID = 3242 ( "<" PGNSP PGUID b f f 3802 3802 16 3243 3245 jsonb_lt scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3242 ( "<" PGNSP PGUID b f f 3802 3802 16 3243 3245 jsonb_lt scalarltsel scalarltjoinsel "mh-"));
DESCR("less than");
-DATA(insert OID = 3243 ( ">" PGNSP PGUID b f f 3802 3802 16 3242 3244 jsonb_gt scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3243 ( ">" PGNSP PGUID b f f 3802 3802 16 3242 3244 jsonb_gt scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than");
-DATA(insert OID = 3244 ( "<=" PGNSP PGUID b f f 3802 3802 16 3245 3243 jsonb_le scalarltsel scalarltjoinsel ));
+DATA(insert OID = 3244 ( "<=" PGNSP PGUID b f f 3802 3802 16 3245 3243 jsonb_le scalarltsel scalarltjoinsel "mh-"));
DESCR("less than or equal");
-DATA(insert OID = 3245 ( ">=" PGNSP PGUID b f f 3802 3802 16 3244 3242 jsonb_ge scalargtsel scalargtjoinsel ));
+DATA(insert OID = 3245 ( ">=" PGNSP PGUID b f f 3802 3802 16 3244 3242 jsonb_ge scalargtsel scalargtjoinsel "mh-"));
DESCR("greater than or equal");
-DATA(insert OID = 3246 ( "@>" PGNSP PGUID b f f 3802 3802 16 3250 0 jsonb_contains contsel contjoinsel ));
+DATA(insert OID = 3246 ( "@>" PGNSP PGUID b f f 3802 3802 16 3250 0 jsonb_contains contsel contjoinsel "---"));
DESCR("contains");
-DATA(insert OID = 3247 ( "?" PGNSP PGUID b f f 3802 25 16 0 0 jsonb_exists contsel contjoinsel ));
+DATA(insert OID = 3247 ( "?" PGNSP PGUID b f f 3802 25 16 0 0 jsonb_exists contsel contjoinsel "---"));
DESCR("exists");
-DATA(insert OID = 3248 ( "?|" PGNSP PGUID b f f 3802 1009 16 0 0 jsonb_exists_any contsel contjoinsel ));
+DATA(insert OID = 3248 ( "?|" PGNSP PGUID b f f 3802 1009 16 0 0 jsonb_exists_any contsel contjoinsel "---"));
DESCR("exists any");
-DATA(insert OID = 3249 ( "?&" PGNSP PGUID b f f 3802 1009 16 0 0 jsonb_exists_all contsel contjoinsel ));
+DATA(insert OID = 3249 ( "?&" PGNSP PGUID b f f 3802 1009 16 0 0 jsonb_exists_all contsel contjoinsel "---"));
DESCR("exists all");
-DATA(insert OID = 3250 ( "<@" PGNSP PGUID b f f 3802 3802 16 3246 0 jsonb_contained contsel contjoinsel ));
+DATA(insert OID = 3250 ( "<@" PGNSP PGUID b f f 3802 3802 16 3246 0 jsonb_contained contsel contjoinsel "---"));
DESCR("is contained by");
-DATA(insert OID = 3284 ( "||" PGNSP PGUID b f f 3802 3802 3802 0 0 jsonb_concat - - ));
+DATA(insert OID = 3284 ( "||" PGNSP PGUID b f f 3802 3802 3802 0 0 jsonb_concat - - "---"));
DESCR("concatenate");
-DATA(insert OID = 3285 ( "-" PGNSP PGUID b f f 3802 25 3802 0 0 3302 - - ));
+DATA(insert OID = 3285 ( "-" PGNSP PGUID b f f 3802 25 3802 0 0 3302 - - "---"));
DESCR("delete object field");
-DATA(insert OID = 3286 ( "-" PGNSP PGUID b f f 3802 23 3802 0 0 3303 - - ));
+DATA(insert OID = 3286 ( "-" PGNSP PGUID b f f 3802 23 3802 0 0 3303 - - "---"));
DESCR("delete array element");
-DATA(insert OID = 3287 ( "#-" PGNSP PGUID b f f 3802 1009 3802 0 0 jsonb_delete_path - - ));
+DATA(insert OID = 3287 ( "#-" PGNSP PGUID b f f 3802 1009 3802 0 0 jsonb_delete_path - - "---"));
DESCR("delete path");
/*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 5a8d0ee..03935af 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -250,6 +250,7 @@ typedef enum NodeTag
T_MinMaxAggInfo,
T_PlannerParamItem,
T_MVStatisticInfo,
+ T_RestrictStatData,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 1979cdf..b78ee5d 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,12 +15,12 @@
#define RELATION_H
#include "access/sdir.h"
+#include "access/htup.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
#include "storage/block.h"
-
/*
* Relids
* Set of relation identifiers (indexes into the rangetable).
@@ -1341,6 +1341,26 @@ typedef struct RestrictInfo
Selectivity right_bucketsize; /* avg bucketsize of right side */
} RestrictInfo;
+typedef struct bm_mvstat
+{
+ Bitmapset *attrs;
+ MVStatisticInfo *stats;
+ int mvkind;
+} bm_mvstat;
+
+typedef struct RestrictStatData
+{
+ NodeTag type;
+ BoolExprType boolop;
+ Node *clause;
+ Node *mvclause;
+ Node *nonmvclause;
+ List *children;
+ List *mvstats;
+ Bitmapset *mvattrs;
+ List *unusedrinfos;
+} RestrictStatData;
+
/*
* Since mergejoinscansel() is a relatively expensive function, and would
* otherwise be invoked many times while planning a large join tree,
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 1445f3f..dd43e45 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -184,13 +184,11 @@ extern Selectivity clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- List *conditions);
+ SpecialJoinInfo *sjinfo);
extern Selectivity clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo,
- List *conditions);
+ SpecialJoinInfo *sjinfo);
#endif /* COST_H */
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 9711538..6a9bec9 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -84,6 +84,7 @@ extern Oid get_commutator(Oid opno);
extern Oid get_negator(Oid opno);
extern RegProcedure get_oprrest(Oid opno);
extern RegProcedure get_oprjoin(Oid opno);
+extern int get_oprmvstat(Oid opno);
extern char *get_func_name(Oid funcid);
extern Oid get_func_namespace(Oid funcid);
extern Oid get_func_rettype(Oid funcid);
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index f2fbc11..a08fd58 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -34,6 +34,9 @@ extern int mvstat_search_type;
#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
+#define MVSTATISTIC_MCV 1
+#define MVSTATISTIC_HIST 2
+#define MVSTATISTIC_FDEP 4
/*
* Functional dependencies, tracking column-level relationships (values
--
1.8.3.1
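A note on the new last column in the pg_operator DATA lines above: each
operator now carries a three-character string, which (judging from the
new MVSTATISTIC_* flags in mvstats.h and the get_oprmvstat() declaration
in lsyscache.h) presumably encodes the kinds of multivariate statistics
the operator can be estimated with. A hypothetical decoder, written only
to illustrate the assumed meaning - this is not code from the patch:

static int
decode_oprmvstat(const char *str)
{
    /* Assumed meaning of the per-operator string, e.g. "mhf" or "mh-":
     * 'm' = usable with MCV lists, 'h' = usable with histograms,
     * 'f' = usable with functional dependencies, '-' = not usable.
     * Uses the MVSTATISTIC_* flags added to mvstats.h above. */
    int kinds = 0;

    if (str[0] == 'm')
        kinds |= MVSTATISTIC_MCV;
    if (str[1] == 'h')
        kinds |= MVSTATISTIC_HIST;
    if (str[2] == 'f')
        kinds |= MVSTATISTIC_FDEP;
    return kinds;
}

Consistently with that reading, the equality operators above are marked
"mhf", the inequalities "mh-", and operators with no selectivity
functions "---".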
Hi,
On 07/30/2015 10:21 AM, Heikki Linnakangas wrote:
On 05/25/2015 11:43 PM, Tomas Vondra wrote:
There are 6 files attached, but only 0002-0006 are actually part of the
multivariate statistics patch itself.
All of these patches are huge. In order to review this in a reasonable
amount of time, we need to do this in several steps. So let's see what
would be the minimal set of these patches that could be reviewed and
committed, while still being useful.
The main patches are:
1. shared infrastructure and functional dependencies
2. clause reduction using functional dependencies
3. multivariate MCV lists
4. multivariate histograms
5. multi-statistics estimation
Would it make sense to commit only patches 1 and 2 first? Would that be
enough to get a benefit from this?
I agree that the patch can't be reviewed as a single chunk - that was
the idea when I split the original (single chunk) patch into multiple
smaller pieces.
And yes, I believe committing pieces 1&2 might be enough to get
something useful, which can then be improved by adding the "usual" MCV
and histogram stats on top of that.
I have some doubts about the clause reduction and functional
dependencies part of this. It seems to treat functional dependency as
a boolean property, but even with the classic zipcode and city case,
it's not always an all or nothing thing. At least in some countries,
there can be zipcodes that span multiple cities. So zipcode=X does
not completely imply city=Y, although there is a strong correlation
(if that's the right term). How strong does the correlation need to
be for this patch to decide that zipcode implies city? I couldn't
actually see a clear threshold stated anywhere.
So rather than treating functional dependence as a boolean, I think
it would make more sense to put a 0.0-1.0 number to it. That means
that you can't do clause reduction like it's done in this patch,
where you actually remove clauses from the query for cost esimation
purposes. Instead, you need to calculate the selectivity for each
clause independently, but instead of just multiplying the
selectivities together, apply the "dependence factor" to it.
Does that make sense? I haven't really looked at the MCV, histogram
and "multi-statistics estimation" patches yet. Do those patches make
the clause reduction patch obsolete? Should we forget about the
clause reduction and functional dependency patch, and focus on those
later patches instead?
Perhaps. It's true that most real-world data sets are not 100% valid
with respect to functional dependencies - either because of natural
imperfections (multiple cities with the same ZIP code) or just noise in
the data (incorrect entries ...). And it's even mentioned in the code
comments somewhere, I guess.
But there are two main reasons why I chose not to extend the functional
dependencies with the [0.0-1.0] value you propose.
Firstly, functional dependencies were meant to be the simplest possible
implementation, illustrating how the "infrastructure" is supposed to
work (which is the main topic of the first patch).
Secondly, all kinds of statistics are "simplifications" of the actual
data. So I think it's not incorrect to ignore the exceptions up to some
threshold.
I also don't think this will make the estimates globally better. Let's
say you have 1% of rows that contradict the functional dependency - you
may either ignore them and have good estimates for 99% of the values and
incorrect estimates for 1%, or tweak the rule a bit and make the
estimates worse for 99% (and possibly better for 1%).
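To make the proposal concrete, I understand it as something like the
following sketch (entirely hypothetical - the function name and the
exact combination formula are mine, not code from any of the patches):

typedef double Selectivity;     /* stand-in for the planner typedef */

/*
 * Combine the selectivities of clauses on A and B under a "soft"
 * functional dependency (A => B) with a degree in [0.0, 1.0].
 */
static Selectivity
dependency_combine(Selectivity sel_a, Selectivity sel_b, double degree)
{
    return sel_a * (degree + (1.0 - degree) * sel_b);
}

With degree = 1.0 this returns sel_a (the clause on B is fully implied,
which is exactly the clause reduction the patch does), and with
degree = 0.0 it degrades to sel_a * sel_b, i.e. the independence
assumption. So the boolean behavior is just a special case.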
That being said, I'm not against improving the functional dependencies.
I already do have some improvements on my TODO - like for example
dependencies on more columns (not just A=>B but [A,B]=>C and such), but
I think we should not squash this into those two patches.
And yet another point - ISTM these cases might easily be handled better
by the statistics based on ndistinct coefficients, as proposed by
Kyotaro-san some time ago. That is, compute and track
ndistinct(A) * ndistinct(B) / ndistinct(A,B)
for all pairs of columns (or possibly larger groups). That seems to be
similar to the coefficient you propose.
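For illustration, the coefficient for a pair of columns can be computed
with an ad-hoc query like this one (hypothetical, not part of any patch;
assumes a table t with columns a and b):

SELECT (COUNT(DISTINCT a)::numeric * COUNT(DISTINCT b))
       / COUNT(DISTINCT (a, b)) AS ndistinct_coefficient
FROM t;

Values close to 1.0 suggest the columns are more or less independent;
the larger the value, the stronger the relationship between the columns.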
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello,
On 07/30/2015 01:26 PM, Kyotaro HORIGUCHI wrote:
Hello, I certainly attached the file this time.
At Mon, 27 Jul 2015 23:54:08 +0200, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote in <55B6A880.3050801@2ndquadrant.com>
The bottom-up would work too, probably - I mean, we could start from
leaves of the expression tree, and build the largest "subtree"
compatible with multivariate stats and then try to estimate it. I
don't see how we could pass conditions though, which works naturally
in the top-down approach.
By the way, the 'condition' looks to mean what will be received
by the parameter of clause(list)_selectivity with the same
name. But it is always NIL. Looking at the comment for
collect_mv_attnum, it is prepared for 'multitable statistics'. If
so, I think it's better removed from the current patch, because
it is useless now.
I don't think so. Conditions certainly are not meant for multitable
statistics only (I don't see any comment suggesting that at
collect_mv_attnums), but are actually used with the current code.
For example try this:
create table t (a int, b int, c int);
insert into t select i/100, i/100, i/100
from generate_series(1,100000) s(i);
alter table t add statistics (mcv) on (a,b);
analyze t;
select * from t where a<10 and b < 10 and (a < 50 or b < 50 or c < 50);
What will happen when estimating this query is this:
(1) clauselist_selectivity is called, and sees a list of three clauses:
(a<10)
(b<10)
(a<50 OR b<50 OR c<50)
But there's only a single statistics on columns [a,b] so at this
point we can process only the first two clauses. So we'll do that,
computing
P(a<10, b<10)
and we'll pass the OR-clause to the clause_selectivity() call, along
with the two already estimated clauses as conditions.
(2) clause_selectivity will receive (a<50 OR b<50 OR c<50) as a clause
to estimate, and the two clauses as conditions, computing
P(a<50 OR b<50 OR c<50 | a<10, b<10)
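The final estimate is then simply the product of the two, i.e.
P(a<10, b<10) * P(a<50 OR b<50 OR c<50 | a<10, b<10).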
The current estimate for the OR-clause is off, but I believe that's a
bug in the current implementation of clauselist_selectivity_or(), and
we've already discussed that some time ago.
The functional dependency code looks immature in both the
detection phase and application phase in comparison to MCV and
histogram. In addition to that, as the comment in dependencies.c
says, fdep is not as significant (as MCV/HIST) because it is
usually carefully avoided, and should be noticed and considered in
the design of the application or the whole system.
The code is certainly imperfect and needs improvements, no doubt about
that. I have certainly spent much more time on MCV/histograms.
I'm not sure about stating that functional dependencies are less
significant than MCV/HIST (I don't see any such statement in
dependencies.c). I might have thought that initially, when I opted to
implement fdeps as the simplest possible type of statistics, but I think
it's quite practical, actually.
I however disagree about the last point - it's true that in many cases
the databases are carefully normalized, which mostly makes functional
dependencies irrelevant. But this is only true for OLTP systems, while
the primary target of the patch is DSS/DWH systems. And in those
systems denormalization is a very common practice.
So I don't think fdeps are completely irrelevant - it's quite useful in
some scenarios, actually. Similarly to the ndistinct coefficient stats
that you proposed, for example.
Persisting in applying them all at once doesn't seem to be a good
strategy to adopt at this early stage.
Why?
Or perhaps it might be better to register the dependency itself
rather than registering incomplete information (only the set of columns
involved in the relationship) and trying to detect the relationship
from the given values. I suppose those who can register the
columnset know the precise nature of the dependency in advance.
I don't see how that could be done? I mean, you only have the constants
supplied in the query - how could you verify the functional dependency
based on just those values (or even decide the direction)?
What do you mean by "reconstruct the expression tree"? It's true I'm
walking the expression tree top-down, but how is that reconstructing?
For example clauselist_mv_split does. It separates mvclauses from
the original clauselist, applies mv-stats at once, and (perhaps) lets
the rest be processed in the 'normal' route. I called this
'reconstruction', which I tried to do explicitly and separately.
Ah, I see. Thanks for the explanation. I wouldn't call this
"reconstruction" though - I merely need to track which clauses to
estimate using multivariate stats (and which need to be estimated using
the regular stats). That's pretty much what RestrictStatData does, no?
I find your comments very valuable. I may not agree with some of
them, but I certainly appreciate your point of view. So thank you
very much for the time you spent reviewing this patch so far!

Yeah, thank you for your patience and kindness.
Likewise. It's very frustrating trying to understand complex code
written by someone else, and I appreciate your effort.
Regarding the complexity - I am not too worried about spending
more CPU cycles on this, as long as it does not impact the case
where people have no multivariate statistics at all. That's because
I expect people to use this for large DSS/DWH data sets with lots
of dependencies in the (often denormalized) tables and complex
conditions - in those cases the planning difference is negligible,
especially if the improved estimates make the query run in seconds
instead of hours.

I share the vision with you. If that is the case, the mv-stats
route should not intrude on the existing non-mv-stats route. I
feel you have intruded into clauselist_selectivity too much.

If that is the case, my mv-distinct code has a different objective
from yours. It aims to prevent misestimates from multicolumn
correlations, which occur more commonly in OLTP usage.
OK. Let's see if we can make it work for both use cases.
This is why I was so careful to entirely skip the expensive
processing when where were no multivariate stats, and why I don't
like the fact that your approach makes this skip more difficult (or
maybe impossible, I'm not sure).

My code totally skips this if transformRestrictionForEstimate returns
NULL, and runs clauselist_selectivity as usual. I think it behaves
almost the same as yours.
Ah, OK. Perhaps I missed that as I've had trouble applying the patch.
However, I believe we should not only skip the calculation but also
hide the additional code blocks, which currently overwhelm the normal
route. That is one of the major objectives of my approach.
My main concern at this point was planning time, so skipping the
calculation should be enough I believe. Hiding the additional code
blocks is a matter of aesthetics, and we can address that by moving it
to a separate method or such.
But sorry - I found that considering multiple stats at every level
cannot be done without exhaustively searching combinations among the
child clauses, and it needs an additional data structure. It needs
more thought. As mentioned later, top-down might be more suitable for
this optimization.
Do you think a combined approach - first bottom-up preprocessing, then
top-down optimization (using the results of the first phase to speed
things up) - might work?
Understood. As I explained above, I'm not all that concerned about
the performance impact, as long as we make sure it only applies to
people using the multivariate stats.

I also think a combined approach - first a bottom-up step
(identifying the largest compatible subtrees & caching the varnos),
then a top-down step (doing the same optimization as implemented
today) - might minimize the performance impact.

I'm almost reaching the same conclusion.
Ah, so the answer to my last question is "yes". Now we only need to
actually code it ;-)
kind regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 07/30/2015 03:55 PM, Tomas Vondra wrote:
On 07/30/2015 10:21 AM, Heikki Linnakangas wrote:
I have some doubts about the clause reduction and functional
dependencies part of this. It seems to treat functional dependency as
a boolean property, but even with the classic zipcode and city case,
it's not always an all or nothing thing. At least in some countries,
there can be zipcodes that span multiple cities. So zipcode=X does
not completely imply city=Y, although there is a strong correlation
(if that's the right term). How strong does the correlation need to
be for this patch to decide that zipcode implies city? I couldn't
actually see a clear threshold stated anywhere.

So rather than treating functional dependence as a boolean, I think
it would make more sense to put a 0.0-1.0 number to it. That means
that you can't do clause reduction like it's done in this patch,
where you actually remove clauses from the query for cost estimation
purposes. Instead, you need to calculate the selectivity for each
clause independently, but instead of just multiplying the
selectivities together, apply the "dependence factor" to it.

Does that make sense? I haven't really looked at the MCV, histogram
and "multi-statistics estimation" patches yet. Do those patches make
the clause reduction patch obsolete? Should we forget about the
clause reduction and functional dependency patch, and focus on those
later patches instead?

Perhaps. It's true that most real-world data sets are not 100% valid
with respect to functional dependencies - either because of natural
imperfections (multiple cities with the same ZIP code) or just noise in
the data (incorrect entries ...). And it's even mentioned in the code
comments somewhere, I guess.

But there are two main reasons why I chose not to extend the functional
dependencies with the [0.0-1.0] value you propose.

Firstly, functional dependencies were meant to be the simplest possible
implementation, illustrating how the "infrastructure" is supposed to
work (which is the main topic of the first patch).

Secondly, all kinds of statistics are "simplifications" of the actual
data. So I think it's not incorrect to ignore the exceptions up to some
threshold.
The problem with a threshold is that around that threshold, even a small
change in the data set can drastically change the produced estimates.
For example, imagine that we know from the stats that zip code implies
city. But then someone adds a single row to the table with an odd zip
code & city combination, which pushes the estimator over the threshold,
and the columns are no longer considered dependent, and the estimates
are now completely different. We should avoid steep cliffs like that.
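For illustration only (this formula is not from the patch), one smooth
way to apply such a degree f to two per-clause selectivities s1 and s2
would be

    s = s1 * (f + (1 - f) * s2)

which degenerates to s1 for f = 1.0 (perfect dependency) and to the
independence assumption s1 * s2 for f = 0.0, with no cliff in between.
A quick way to play with the numbers:

select s1 * (f + (1 - f) * s2) as combined
  from (values (0.01, 0.01, 0.9)) as v(s1, s2, f);

With f = 0.9 this gives 0.00901, instead of 0.0001 under independence.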
BTW, what is the threshold in the current patch?
- Heikki
Hi,
On 07/30/2015 06:58 PM, Heikki Linnakangas wrote:
The problem with a threshold is that around that threshold, even a
small change in the data set can drastically change the produced
estimates. For example, imagine that we know from the stats that zip
code implies city. But then someone adds a single row to the table
with an odd zip code & city combination, which pushes the estimator
over the threshold, and the columns are no longer considered
dependent, and the estimates are now completely different. We should
avoid steep cliffs like that.

BTW, what is the threshold in the current patch?
There's not a simple threshold - the algorithm mining the functional
dependencies is a bit more complicated. I tried to explain it in the
comment before build_mv_dependencies (in dependencies.c), but let me
briefly summarize it here.
To mine dependency [A => B], build_mv_dependencies does this:
(1) sort the sample by {A,B}
(2) split the sample into groups with the same value of A
(3) for each group, decide if it's consistent with the dependency
(a) if the group is too small (less than 3 rows), ignore it
(b) if the group is consistent, update
    n_supporting
    n_supporting_rows
(c) if the group is inconsistent, update
    n_contradicting
    n_contradicting_rows
(4) decide whether the dependency is "valid" by checking
n_supporting_rows >= n_contradicting_rows * 10
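For illustration, here is a rough SQL analogue of that decision rule,
mining [a => b] on the example table t from earlier in the thread. It
runs on the whole table instead of the ANALYZE sample, and treats a
group as consistent when it contains a single value of b:

select sum(case when ndist = 1 then cnt else 0 end) as n_supporting_rows,
       sum(case when ndist > 1 then cnt else 0 end) as n_contradicting_rows
  from (select a, count(distinct b) as ndist, count(*) as cnt
          from t
         group by a
        having count(*) >= 3) g;

The dependency is then accepted when the first count is at least 10x
the second one.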
The limit is rather arbitrary and yes - I can imagine a more complex
condition (e.g. looking at average number of tuples per group etc.), but
I haven't looked into that - the point was to use something very simple,
only to illustrate the infrastructure.
I think we might come up with some elaborate way of associating "degree"
with the functional dependency, but at that point we really lose the
simplicity, and also make it indistinguishable from the remaining
statistics (because it won't be possible to reduce the clauses like
this, before performing the regular estimation). Which is exactly what
makes the functional dependencies so neat and efficient, so I'm not
overly enthusiastic about doing that.
What seems more interesting is implementing the ndistinct coefficient
instead, as proposed by Kyotaro-san - that seems to have the nice
"smooth" behavior you desire, while keeping the simplicity.
Both statistics types (functional dependencies and ndistinct coeff) have
one weak point, though - they somehow assume the queries use
"compatible" values. For example if you use a query with
WHERE city = 'New York' AND zip = 'zip for Detroit'
they can't detect cases like this, because those statistics types are
oblivious to individual values. I don't see this as a fatal flaw, though
- it's rather a consequence of the nature of the stats. And I tend to
look at the functional dependencies the same way.
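Using the example table t from earlier in the thread (where a = b = c
on every row, so [a => b] is a perfect dependency), the issue is easy
to demonstrate:

select * from t where a = 1 and b = 999;

This returns no rows, but reducing the clauses per the dependency would
drop (b = 999) and estimate roughly P(a = 1), i.e. about 100 rows.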
If you need stats without these "issues" you'll have to use MCV list or
a histogram. Trying to fix the simple statistics types is futile, IMHO.
regards
Tomas
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jul 31, 2015 at 6:28 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
[series of arguments]
If you need stats without these "issues" you'll have to use MCV list or a
histogram. Trying to fix the simple statistics types is futile, IMHO.
Patch is marked as returned with feedback. There have been in-depth
discussions and reviews as well.
--
Michael
Tomas,
attached is v7 of the multivariate stats patch. The main improvement is
major refactoring of the clausesel.c portion - splitting the awfully
long spaghetti-style functions into smaller pieces, making it much more
understandable etc.
So presumably v7 handles varlena attributes as well, yes? I have a
destruction test case for correlated column stats, so I'd like to test
your patch on it.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Hi,
On 09/24/2015 06:43 PM, Josh Berkus wrote:
Tomas,
attached is v7 of the multivariate stats patch. The main improvement is
major refactoring of the clausesel.c portion - splitting the awfully
long spaghetti-style functions into smaller pieces, making it much more
understandable etc.

So presumably v7 handles varlena attributes as well, yes? I have a
destruction test case for correlated column stats, so I'd like to test
your patch on it.
Yes, it should handle varlena OK. Let me know if you need help with
that, and I'd like to hear feedback - whether it fixed your test case or
not, etc.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
attached is v8 of the multivariate statistics patch (or rather a patch
series). The patch currently has 7 parts, but 0001 is just a fix of the
pull_varnos issue (possibly incorrect/temporary), and 0007 is just an
attempt to add the "multicolumn distinctness" (experimental for now).
There are three noteworthy changes:
1) Correct estimation of OR-clauses - this turned out to be a rather
minor change, thanks to simply transforming the OR-clauses to
AND-clauses, see clauselist_selectivity_or() for details.
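(In other words, something along the lines of the usual identity
P(A OR B) = 1 - P(NOT A AND NOT B) - see the function for the exact
handling. Under independence, for example:

select 1 - (1 - s1) * (1 - s2) as p_or
  from (values (0.1, 0.2)) as v(s1, s2);

which gives 0.28 = 0.1 + 0.2 - 0.1 * 0.2.)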
2) Abandoning the ALTER TABLE ... ADD STATISTICS syntax and instead
adding separate commands CREATE STATISTICS / DROP STATISTICS, as
proposed in the "multicolumn distinctness" thread:
/messages/by-id/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
This seems a better approach than the ALTER TABLE one - not only does
it nicely fix the grammar issues, it also naturally extends to
multi-table statistics (even though we don't know exactly how those
should work yet).
The syntax is this:
CREATE STATISTICS name ON table (columns) WITH (options);
DROP STATISTICS name;
and the 'name' is optional (and if absent, should be generated just
like for indexes, but that's not implemented yet).
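For example, with the 'dependencies' type implemented in part 0002 of
the series:

CREATE STATISTICS s1 ON t (a, b) WITH (dependencies);
DROP STATISTICS s1;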
The remaining question is how unique the statistics name should be.
My initial plan was to make it unique within a table, but that of
course does not work well with the DROP STATISTICS (it'd have to
specify the table name also), and it'd also not work with statistics
on multiple tables (which is one of the reasons for abandoning ALTER
TABLE stuff).
So I think it should be unique across tables. Statistics are hardly
a global object, so it should be unique within a schema. I thought
that simply using the schema of the table would work, but that of
course breaks with multiple tables in different schemas. So the only
solution seems to be an explicit schema for statistics.
3) I've also started hacking on adding the "multicolumn distinctness"
proposed by Horiguchi-san, but I haven't really got that working. It
seems to be a bit more complicated than I anticipated because of the
"only equality conditions" restriction. So the 0007 patch only
really adds basic syntax and trivial build.
I do have a bunch of ideas/questions about this statistics type. For
example, should we compute just a single coefficient or the exact
combination of columns specified in CREATE STATISTICS, or perhaps
for some additional subsets? I.e. with
CREATE STATISTICS ON t (a,b,c) WITH (ndistinct);
should we compute just the coefficient for (a,b,c), or maybe also
for (a,b), (b,c) and (a,c)? For N columns there are O(2^N) such
combinations, but perhaps it's acceptable.
Having the coefficient for just the single combination specified in
CREATE STATISTICS makes the estimation difficult when some of the
columns are not specified. For example, with coefficient just for
(a,b,c), what should happen for (WHERE a=1 AND b=2)?
Should we simply ignore the statistics, or apply it anyway and
somehow compensate for the missing columns?
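For the record, one plausible reading of the coefficient - just an
illustration, the exact definition is up to the 0007 part - compares
the product of per-column distinct counts with the combined distinct
count:

select (count(distinct a)::numeric * count(distinct b) * count(distinct c))
         / count(distinct (a, b, c)) as ndistinct_coeff
  from t;

On perfectly correlated data (like the example table t) this is very
large, while for independent columns (with enough rows) it stays close
to 1.0.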
I've also started working on something like a paper, hopefully
explaining the ideas and implementation more clearly and consistently
than possible on a mailing list (thanks to charts, figures and such).
It's available here (both the .tex source and .pdf with the current
version):
https://bitbucket.org/tvondra/mvstats-paper/src
It's not exactly short (~30 pages), and it's certainly incomplete with
plenty of TODO notes, but hopefully it's already useful and not entirely
bogus.
Comments and questions are welcome - both to the patch and paper.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-teach-pull_-varno-varattno-_walker-about-RestrictInf.patch (text/x-diff)
From 537ef6c3889754aa9566cae21421371c345143d7 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Tue, 28 Apr 2015 19:56:33 +0200
Subject: [PATCH 1/7] teach pull_(varno|varattno)_walker about RestrictInfo
otherwise pull_varnos fails when processing OR clauses
---
src/backend/optimizer/util/var.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index 32038ce..141a491 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -197,6 +197,13 @@ pull_varnos_walker(Node *node, pull_varnos_context *context)
context->sublevels_up--;
return result;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo*)node;
+ context->varnos = bms_add_members(context->varnos,
+ rinfo->clause_relids);
+ return false;
+ }
return expression_tree_walker(node, pull_varnos_walker,
(void *) context);
}
@@ -245,6 +252,15 @@ pull_varattnos_walker(Node *node, pull_varattnos_context *context)
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *)node;
+
+ return expression_tree_walker((Node*)rinfo->clause,
+ pull_varattnos_walker,
+ (void*) context);
+ }
+
/* Should not find an unplanned subquery */
Assert(!IsA(node, Query));
--
2.1.0
0002-shared-infrastructure-and-functional-dependencies.patch (text/x-diff)
From cdbb6d854fc59b576603c25f4567aab831e3d5b3 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 19:51:48 +0100
Subject: [PATCH 2/7] shared infrastructure and functional dependencies
Basic infrastructure shared by all kinds of multivariate
stats, most importantly:
- adds a new system catalog (pg_mv_statistic)
- CREATE STATISTICS name ON table (columns) WITH (options)
- DROP STATISTICS name
- implementation of functional dependencies (the simplest
type of multivariate statistics)
- building functional dependencies in ANALYZE
- updates regression tests (new catalog etc.)
This does not include any changes to the optimizer, i.e.
it does not influence the query planning (subject to
follow-up patches).
The current implementation requires a valid 'ltopr' for
the columns, so that we can sort the sample rows in various
ways, both in this patch and other kinds of statistics.
Maybe this restriction could be relaxed in the future,
requiring just 'eqopr' in case of stats not sorting the
data (e.g. functional dependencies and MCV lists).
Maybe some of the stats (functional dependencies and MCV
list with limited functionality) might be made to work
with hashes of the values, which is sufficient for equality
comparisons. But the queries would require the equality
operator anyway, so it's not really a weaker requirement.
The hashes might reduce space requirements, though.
The algorithm detecting the dependencies is rather simple
and probably needs improvements, so that it detects more
complicated dependencies, and also validation of the math.
The name 'functional dependencies' is more correct (than
'association rules') as it's exactly the name used in
relational theory (esp. Normal Forms) for tracking
column-level dependencies.
The multivariate statistics are automatically removed in
two situations
(a) after a DROP TABLE (obviously)
(b) after ALTER TABLE ... DROP COLUMN, if the statistics
would be defined on less than 2 columns (remaining)
If there are at least 2 columns remaining, we keep
the statistics but perform cleanup on the next ANALYZE.
The dropped columns are removed from stakeys, and the new
statistics is built on the smaller set.
We can't do this at DROP COLUMN, because that'd leave us
with invalid statistics, or we'd have to throw it away
although we can still use it. This lazy approach lets us
use the statistics although some of the columns are dead.
This also adds a simple list of statistics to \d in psql.
---
src/backend/catalog/Makefile | 1 +
src/backend/catalog/dependency.c | 11 +-
src/backend/catalog/heap.c | 102 +++++
src/backend/catalog/namespace.c | 49 +++
src/backend/catalog/objectaddress.c | 22 +
src/backend/catalog/system_views.sql | 11 +
src/backend/commands/Makefile | 6 +-
src/backend/commands/analyze.c | 21 +
src/backend/commands/dropcmds.c | 4 +
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/statscmds.c | 299 ++++++++++++++
src/backend/commands/tablecmds.c | 8 +-
src/backend/nodes/copyfuncs.c | 16 +
src/backend/nodes/outfuncs.c | 18 +
src/backend/optimizer/util/plancat.c | 63 +++
src/backend/parser/gram.y | 71 +++-
src/backend/tcop/utility.c | 11 +
src/backend/utils/Makefile | 2 +-
src/backend/utils/cache/relcache.c | 59 +++
src/backend/utils/cache/syscache.c | 23 ++
src/backend/utils/mvstats/Makefile | 17 +
src/backend/utils/mvstats/common.c | 356 ++++++++++++++++
src/backend/utils/mvstats/common.h | 75 ++++
src/backend/utils/mvstats/dependencies.c | 638 +++++++++++++++++++++++++++++
src/bin/psql/describe.c | 42 ++
src/include/catalog/dependency.h | 5 +-
src/include/catalog/heap.h | 1 +
src/include/catalog/indexing.h | 7 +
src/include/catalog/namespace.h | 2 +
src/include/catalog/pg_mv_statistic.h | 71 ++++
src/include/catalog/pg_proc.h | 5 +
src/include/catalog/toasting.h | 1 +
src/include/commands/defrem.h | 4 +
src/include/nodes/nodes.h | 2 +
src/include/nodes/parsenodes.h | 11 +
src/include/nodes/relation.h | 28 ++
src/include/parser/kwlist.h | 2 +-
src/include/utils/mvstats.h | 69 ++++
src/include/utils/rel.h | 4 +
src/include/utils/relcache.h | 1 +
src/include/utils/syscache.h | 2 +
src/test/regress/expected/rules.out | 8 +
src/test/regress/expected/sanity_check.out | 1 +
43 files changed, 2139 insertions(+), 13 deletions(-)
create mode 100644 src/backend/commands/statscmds.c
create mode 100644 src/backend/utils/mvstats/Makefile
create mode 100644 src/backend/utils/mvstats/common.c
create mode 100644 src/backend/utils/mvstats/common.h
create mode 100644 src/backend/utils/mvstats/dependencies.c
create mode 100644 src/include/catalog/pg_mv_statistic.h
create mode 100644 src/include/utils/mvstats.h
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 25130ec..058b8a9 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -32,6 +32,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
+ pg_mv_statistic.h \
pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index efca34c..32a9ee3 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -39,6 +39,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -159,7 +160,8 @@ static const Oid object_classes[] = {
ExtensionRelationId, /* OCLASS_EXTENSION */
EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
PolicyRelationId, /* OCLASS_POLICY */
- TransformRelationId /* OCLASS_TRANSFORM */
+ TransformRelationId, /* OCLASS_TRANSFORM */
+ MvStatisticRelationId /* OCLASS_STATISTICS */
};
@@ -1271,6 +1273,10 @@ doDeletion(const ObjectAddress *object, int flags)
DropTransformById(object->objectId);
break;
+ case OCLASS_STATISTICS:
+ RemoveStatisticsById(object->objectId);
+ break;
+
default:
elog(ERROR, "unrecognized object class: %u",
object->classId);
@@ -2414,6 +2420,9 @@ getObjectClass(const ObjectAddress *object)
case TransformRelationId:
return OCLASS_TRANSFORM;
+
+ case MvStatisticRelationId:
+ return OCLASS_STATISTICS;
}
/* shouldn't get here */
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 04c4f8f..5176f86 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -46,6 +46,7 @@
#include "catalog/pg_constraint.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_tablespace.h"
@@ -1612,7 +1613,10 @@ RemoveAttributeById(Oid relid, AttrNumber attnum)
heap_close(attr_rel, RowExclusiveLock);
if (attnum > 0)
+ {
RemoveStatistics(relid, attnum);
+ RemoveMVStatistics(relid, attnum);
+ }
relation_close(rel, NoLock);
}
@@ -1840,6 +1844,11 @@ heap_drop_with_catalog(Oid relid)
RemoveStatistics(relid, 0);
/*
+ * delete multi-variate statistics
+ */
+ RemoveMVStatistics(relid, 0);
+
+ /*
* delete attribute tuples
*/
DeleteAttributeTuples(relid);
@@ -2695,6 +2704,99 @@ RemoveStatistics(Oid relid, AttrNumber attnum)
/*
+ * RemoveMVStatistics --- remove entries in pg_mv_statistic for a rel
+ *
+ * If attnum is zero, remove all entries for rel; else remove only the one(s)
+ * for that column.
+ */
+void
+RemoveMVStatistics(Oid relid, AttrNumber attnum)
+{
+ Relation pgmvstatistic;
+ TupleDesc tupdesc = NULL;
+ SysScanDesc scan;
+ ScanKeyData key;
+ HeapTuple tuple;
+
+ /*
+ * When dropping a column, we'll drop statistics that would be left
+ * with a single remaining (undropped) column. To do that, we need the tuple
+ * descriptor.
+ *
+ * We already have the relation locked (as we're running ALTER
+ * TABLE ... DROP COLUMN), so we'll just get the descriptor here.
+ */
+ if (attnum != 0)
+ {
+ Relation rel = relation_open(relid, NoLock);
+
+ /* multivariate stats are supported on tables and matviews */
+ if (rel->rd_rel->relkind == RELKIND_RELATION ||
+ rel->rd_rel->relkind == RELKIND_MATVIEW)
+ tupdesc = RelationGetDescr(rel);
+
+ relation_close(rel, NoLock);
+ }
+
+ if (tupdesc == NULL)
+ return;
+
+ pgmvstatistic = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ ScanKeyInit(&key,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ scan = systable_beginscan(pgmvstatistic,
+ MvStatisticRelidIndexId,
+ true, NULL, 1, &key);
+
+ /* we must loop even when attnum != 0, in case of inherited stats */
+ while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+ {
+ bool delete = true;
+
+ if (attnum != 0)
+ {
+ Datum adatum;
+ bool isnull;
+ int i;
+ int ncolumns = 0;
+ ArrayType *arr;
+ int16 *attnums;
+
+ /* get the columns */
+ adatum = SysCacheGetAttr(MVSTATOID, tuple,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+ attnums = (int16*)ARR_DATA_PTR(arr);
+
+ for (i = 0; i < ARR_DIMS(arr)[0]; i++)
+ {
+ /* count the column unless it has been / is being dropped */
+ if ((! tupdesc->attrs[attnums[i]-1]->attisdropped) &&
+ (attnums[i] != attnum))
+ ncolumns += 1;
+ }
+
+ /* delete if there are less than two attributes */
+ delete = (ncolumns < 2);
+ }
+
+ if (delete)
+ simple_heap_delete(pgmvstatistic, &tuple->t_self);
+ }
+
+ systable_endscan(scan);
+
+ heap_close(pgmvstatistic, RowExclusiveLock);
+}
+
+
+/*
* RelationTruncateIndexes - truncate all indexes associated
* with the heap relation to zero tuples.
*
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 6644c6f..178f565 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -4201,3 +4201,52 @@ pg_is_other_temp_schema(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(isOtherTempNamespace(oid));
}
+
+Oid
+get_statistics_oid(List *names, bool missing_ok)
+{
+ char *schemaname;
+ char *stats_name;
+ Oid namespaceId;
+ Oid stats_oid = InvalidOid;
+ ListCell *l;
+
+ /* deconstruct the name list */
+ DeconstructQualifiedName(names, &schemaname, &stats_name);
+
+ if (schemaname)
+ {
+ /* use exact schema given */
+ namespaceId = LookupExplicitNamespace(schemaname, missing_ok);
+ if (missing_ok && !OidIsValid(namespaceId))
+ stats_oid = InvalidOid;
+ else
+ stats_oid = GetSysCacheOid1(MVSTATNAME,
+ PointerGetDatum(stats_name));
+ }
+ else
+ {
+ /* search for it in search path */
+ recomputeNamespacePath();
+
+ foreach(l, activeSearchPath)
+ {
+ namespaceId = lfirst_oid(l);
+
+ if (namespaceId == myTempNamespace)
+ continue; /* do not look in temp namespace */
+ stats_oid = GetSysCacheOid1(MVSTATNAME,
+ PointerGetDatum(stats_name));
+ if (OidIsValid(stats_oid))
+ break;
+ }
+ }
+
+ if (!OidIsValid(stats_oid) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("statistics \"%s\" does not exist",
+ NameListToString(names))));
+
+ return stats_oid;
+}
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index e44d7d0..b2bcf1f 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -37,6 +37,7 @@
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
#include "catalog/pg_largeobject_metadata.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_opfamily.h"
@@ -436,9 +437,22 @@ static const ObjectPropertyType ObjectProperty[] =
Anum_pg_type_typacl,
ACL_KIND_TYPE,
true
+ },
+ {
+ MvStatisticRelationId,
+ MvStatisticOidIndexId,
+ MVSTATOID,
+ MVSTATNAME,
+ Anum_pg_mv_statistic_staname,
+ InvalidAttrNumber, /* FIXME probably should have namespace */
+ InvalidAttrNumber, /* XXX same owner as relation */
+ InvalidAttrNumber, /* no ACL (same as relation) */
+ -1, /* no ACL */
+ true
}
};
+
/*
* This struct maps the string object types as returned by
* getObjectTypeDescription into ObjType enum values. Note that some enum
@@ -911,6 +925,11 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
address = get_object_address_defacl(objname, objargs,
missing_ok);
break;
+ case OBJECT_STATISTICS:
+ address.classId = MvStatisticRelationId;
+ address.objectId = get_statistics_oid(objname, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, in case it thinks elog might return */
@@ -2183,6 +2202,9 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser")));
break;
+ case OBJECT_STATISTICS:
+ /* FIXME do the right owner checks here */
+ break;
default:
elog(ERROR, "unrecognized object type: %d",
(int) objtype);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 536c805..e3f3387 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -158,6 +158,17 @@ CREATE VIEW pg_indexes AS
LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
+CREATE VIEW pg_mv_stats AS
+ SELECT
+ N.nspname AS schemaname,
+ C.relname AS tablename,
+ S.staname AS staname,
+ S.stakeys AS attnums,
+ length(S.stadeps) as depsbytes,
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
+ LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
+
CREATE VIEW pg_stats WITH (security_barrier) AS
SELECT
nspname AS schemaname,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index b1ac704..5151001 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -18,8 +18,8 @@ OBJS = aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
policy.o portalcmds.o prepare.o proclang.o \
- schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
- variable.o view.o
+ schemacmds.o seclabel.o sequence.o statscmds.o \
+ tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o \
+ user.o vacuum.o vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index ddb68ab..fa18903 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -27,6 +27,7 @@
#include "catalog/indexing.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "commands/dbcommands.h"
#include "commands/tablecmds.h"
@@ -55,7 +56,11 @@
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "utils/mvstats.h"
+#include "access/sysattr.h"
/* Per-index data for ANALYZE */
typedef struct AnlIndexData
@@ -460,6 +465,19 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
* all analyzable columns. We use a lower bound of 100 rows to avoid
* possible overflow in Vitter's algorithm. (Note: that will also be the
* target in the corner case where there are no analyzable columns.)
+ *
+ * FIXME This sample sizing is mostly OK when computing stats for
+ * individual columns, but when computing multivariate stats
+ * (histograms, mcv, ...) it's rather
+ * insufficient. For stats on multiple columns / complex stats
+ * we need larger sample sizes, because we need to build more
+ * detailed stats (more MCV items / histogram buckets) to get
+ * good accuracy. Maybe using a sample proportional to the table
+ * (say, 0.5% - 1%) instead of a fixed size would be more
+ * appropriate. Also, this should be
+ * bound to the requested statistics size - e.g. number of MCV
+ * items or histogram buckets should require several sample
+ * rows per item/bucket (so the sample should be k*size).
*/
targrows = 100;
for (i = 0; i < attr_cnt; i++)
@@ -562,6 +580,9 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
update_attstats(RelationGetRelid(Irel[ind]), false,
thisdata->attr_cnt, thisdata->vacattrstats);
}
+
+ /* Build multivariate stats (if there are any). */
+ build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
}
/*
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index f04f4f5..7d6318d 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -292,6 +292,10 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
msg = gettext_noop("schema \"%s\" does not exist, skipping");
name = NameListToString(objname);
break;
+ case OBJECT_STATISTICS:
+ msg = gettext_noop("statistics \"%s\" does not exist, skipping");
+ name = NameListToString(objname);
+ break;
case OBJECT_TSPARSER:
if (!schema_does_not_exist_skipping(objname, &msg, &name))
{
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 3d1cb0b..baea9dd 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -110,6 +110,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"SCHEMA", true},
{"SEQUENCE", true},
{"SERVER", true},
+ {"STATISTICS", true},
{"TABLE", true},
{"TABLESPACE", false},
{"TRANSFORM", true},
@@ -1106,6 +1107,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_RULE:
case OBJECT_SCHEMA:
case OBJECT_SEQUENCE:
+ case OBJECT_STATISTICS:
case OBJECT_TABCONSTRAINT:
case OBJECT_TABLE:
case OBJECT_TRANSFORM:
@@ -1167,6 +1169,7 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
case OCLASS_POLICY:
+ case OCLASS_STATISTICS:
return true;
}
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
new file mode 100644
index 0000000..3790082
--- /dev/null
+++ b/src/backend/commands/statscmds.c
@@ -0,0 +1,299 @@
+/*-------------------------------------------------------------------------
+ *
+ * statscmds.c
+ * Commands for creating and altering multivariate statistics
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/statscmds.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/multixact.h"
+#include "access/reloptions.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "catalog/catalog.h"
+#include "catalog/dependency.h"
+#include "catalog/heap.h"
+#include "catalog/index.h"
+#include "catalog/indexing.h"
+#include "catalog/namespace.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_constraint.h"
+#include "catalog/pg_depend.h"
+#include "catalog/pg_foreign_table.h"
+#include "catalog/pg_inherits.h"
+#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
+#include "catalog/pg_namespace.h"
+#include "catalog/pg_opclass.h"
+#include "catalog/pg_tablespace.h"
+#include "catalog/pg_trigger.h"
+#include "catalog/pg_type.h"
+#include "catalog/pg_type_fn.h"
+#include "catalog/storage.h"
+#include "catalog/toasting.h"
+#include "commands/cluster.h"
+#include "commands/comment.h"
+#include "commands/defrem.h"
+#include "commands/event_trigger.h"
+#include "commands/policy.h"
+#include "commands/sequence.h"
+#include "commands/tablecmds.h"
+#include "commands/tablespace.h"
+#include "commands/trigger.h"
+#include "commands/typecmds.h"
+#include "commands/user.h"
+#include "executor/executor.h"
+#include "foreign/foreign.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/parsenodes.h"
+#include "optimizer/clauses.h"
+#include "optimizer/planner.h"
+#include "parser/parse_clause.h"
+#include "parser/parse_coerce.h"
+#include "parser/parse_collate.h"
+#include "parser/parse_expr.h"
+#include "parser/parse_oper.h"
+#include "parser/parse_relation.h"
+#include "parser/parse_type.h"
+#include "parser/parse_utilcmd.h"
+#include "parser/parser.h"
+#include "pgstat.h"
+#include "rewrite/rewriteDefine.h"
+#include "rewrite/rewriteHandler.h"
+#include "rewrite/rewriteManip.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "storage/lock.h"
+#include "storage/predicate.h"
+#include "storage/smgr.h"
+#include "utils/acl.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/inval.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/relcache.h"
+#include "utils/ruleutils.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+#include "utils/typcache.h"
+#include "utils/mvstats.h"
+
+
+/* used for sorting the attnums in ExecCreateStatistics */
+static int compare_int16(const void *a, const void *b)
+{
+ return memcmp(a, b, sizeof(int16));
+}
+
+/*
+ * Implements the CREATE STATISTICS name ON table (columns) WITH (options)
+ *
+ * TODO Check that the types support sort, although maybe we can live
+ * without it (and only build MCV list / association rules).
+ *
+ * TODO This should probably check for duplicate stats (i.e. same
+ * keys, same options). Although maybe it's useful to have
+ * multiple stats on the same columns with different options
+ * (say, a detailed MCV-only stats for some queries, histogram
+ * for others, etc.)
+ */
+ObjectAddress
+CreateStatistics(CreateStatsStmt *stmt)
+{
+ int i, j;
+ ListCell *l;
+ int16 attnums[INDEX_MAX_KEYS];
+ int numcols = 0;
+ ObjectAddress address = InvalidObjectAddress;
+ NameData staname;
+ Oid statoid;
+
+ HeapTuple htup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ int2vector *stakeys;
+ Relation mvstatrel;
+ Relation rel;
+ ObjectAddress parentobject, childobject;
+
+ /* by default build nothing */
+ bool build_dependencies = false;
+
+ Assert(IsA(stmt, CreateStatsStmt));
+
+ rel = heap_openrv(stmt->relation, AccessExclusiveLock);
+
+ /* transform the column names to attnum values */
+
+ foreach(l, stmt->keys)
+ {
+ char *attname = strVal(lfirst(l));
+ HeapTuple atttuple;
+
+ atttuple = SearchSysCacheAttName(RelationGetRelid(rel), attname);
+
+ if (!HeapTupleIsValid(atttuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" referenced in statistics does not exist",
+ attname)));
+
+ /* more than MVSTATS_MAX_DIMENSIONS columns not allowed */
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("cannot have more than %d keys in a statistics",
+ MVSTATS_MAX_DIMENSIONS)));
+
+ attnums[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->attnum;
+ ReleaseSysCache(atttuple);
+ numcols++;
+ }
+
+ /*
+ * Check the lower bound (at least 2 columns), the upper bound was
+ * already checked in the loop.
+ */
+ if (numcols < 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("multivariate stats require 2 or more columns")));
+
+ /* look for duplicities */
+ for (i = 0; i < numcols; i++)
+ for (j = 0; j < numcols; j++)
+ if ((i != j) && (attnums[i] == attnums[j]))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("duplicate column name in statistics definition")));
+
+ /* parse the statistics options */
+ foreach (l, stmt->options)
+ {
+ DefElem *opt = (DefElem*)lfirst(l);
+
+ if (strcmp(opt->defname, "dependencies") == 0)
+ build_dependencies = defGetBoolean(opt);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized STATISTICS option \"%s\"",
+ opt->defname)));
+ }
+
+ /* check that at least some statistics were requested */
+ if (! build_dependencies)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies) was requested")));
+
+ /* sort the attnums and build int2vector */
+ qsort(attnums, numcols, sizeof(int16), compare_int16);
+ stakeys = buildint2vector(attnums, numcols);
+
+ namestrcpy(&staname, stmt->statsname);
+
+ /*
+ * Okay, let's create the pg_mv_statistic entry.
+ */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ /* no stats collected yet, so just the keys */
+ values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
+ values[Anum_pg_mv_statistic_staname -1] = NameGetDatum(&staname);
+
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+
+ values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+
+ nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* insert the tuple into pg_mv_statistic */
+ mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ htup = heap_form_tuple(mvstatrel->rd_att, values, nulls);
+
+ simple_heap_insert(mvstatrel, htup);
+
+ CatalogUpdateIndexes(mvstatrel, htup);
+
+ statoid = HeapTupleGetOid(htup);
+
+ heap_freetuple(htup);
+
+
+ /*
+ * Store a dependency too, so that statistics are dropped on DROP TABLE
+ */
+ parentobject.classId = RelationRelationId;
+ parentobject.objectId = ObjectIdGetDatum(RelationGetRelid(rel));
+ parentobject.objectSubId = 0;
+ childobject.classId = MvStatisticRelationId;
+ childobject.objectId = statoid;
+ childobject.objectSubId = 0;
+
+ recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
+
+
+ heap_close(mvstatrel, RowExclusiveLock);
+
+ relation_close(rel, NoLock);
+
+ /*
+ * Invalidate relcache so that others see the new statistics.
+ */
+ CacheInvalidateRelcache(rel);
+
+ ObjectAddressSet(address, MvStatisticRelationId, statoid);
+
+ return address;
+}
+
+
+/*
+ * Implements the DROP STATISTICS
+ *
+ * DROP STATISTICS stats_name ON table_name
+ *
+ * The first one requires an exact match, the second one just drops
+ * all the statistics on a table.
+ */
+void
+RemoveStatisticsById(Oid statsOid)
+{
+ Relation relation;
+ HeapTuple tup;
+
+ /*
+ * Delete the pg_mv_statistic tuple.
+ */
+ relation = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ tup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(statsOid));
+ if (!HeapTupleIsValid(tup)) /* should not happen */
+ elog(ERROR, "cache lookup failed for statistics %u", statsOid);
+
+ simple_heap_delete(relation, &tup->t_self);
+
+ ReleaseSysCache(tup);
+
+ heap_close(relation, RowExclusiveLock);
+}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 56fed4d..f86d716 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -35,6 +35,7 @@
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_tablespace.h"
@@ -93,7 +94,7 @@
#include "utils/syscache.h"
#include "utils/tqual.h"
#include "utils/typcache.h"
-
+#include "utils/mvstats.h"
/*
* ON COMMIT action list
@@ -141,8 +142,9 @@ static List *on_commits = NIL;
#define AT_PASS_ADD_COL 5 /* ADD COLUMN */
#define AT_PASS_ADD_INDEX 6 /* ADD indexes */
#define AT_PASS_ADD_CONSTR 7 /* ADD constraints, defaults */
-#define AT_PASS_MISC 8 /* other stuff */
-#define AT_NUM_PASSES 9
+#define AT_PASS_ADD_STATS 8 /* ADD statistics */
+#define AT_PASS_MISC 9 /* other stuff */
+#define AT_NUM_PASSES 10
typedef struct AlteredTableInfo
{
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index ba04b72..0ca2d35 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4118,6 +4118,19 @@ _copyAlterPolicyStmt(const AlterPolicyStmt *from)
return newnode;
}
+static CreateStatsStmt *
+_copyCreateStatsStmt(const CreateStatsStmt *from)
+{
+ CreateStatsStmt *newnode = makeNode(CreateStatsStmt);
+
+ COPY_STRING_FIELD(statsname);
+ COPY_NODE_FIELD(relation);
+ COPY_NODE_FIELD(keys);
+ COPY_NODE_FIELD(options);
+
+ return newnode;
+}
+
/* ****************************************************************
* pg_list.h copy functions
* ****************************************************************
@@ -4965,6 +4978,9 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_CreateStatsStmt:
+ retval = _copyCreateStatsStmt(from);
+ break;
case T_FuncWithArgs:
retval = _copyFuncWithArgs(from);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 63fae82..cae21d0 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1939,6 +1939,21 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
}
static void
+_outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
+{
+ WRITE_NODE_TYPE("MVSTATISTICINFO");
+
+ /* NB: this isn't a complete set of fields */
+ WRITE_OID_FIELD(mvoid);
+
+ /* enabled statistics */
+ WRITE_BOOL_FIELD(deps_enabled);
+
+ /* built/available statistics */
+ WRITE_BOOL_FIELD(deps_built);
+}
+
+static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
/*
@@ -3358,6 +3373,9 @@ _outNode(StringInfo str, const void *obj)
case T_PlannerParamItem:
_outPlannerParamItem(str, obj);
break;
+ case T_MVStatisticInfo:
+ _outMVStatisticInfo(str, obj);
+ break;
case T_CreateStmt:
_outCreateStmt(str, obj);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 9442e5f..60fd57f 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -27,6 +27,7 @@
#include "catalog/catalog.h"
#include "catalog/dependency.h"
#include "catalog/heap.h"
+#include "catalog/pg_mv_statistic.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -39,7 +40,9 @@
#include "parser/parsetree.h"
#include "rewrite/rewriteManip.h"
#include "storage/bufmgr.h"
+#include "utils/builtins.h"
#include "utils/lsyscache.h"
+#include "utils/syscache.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
@@ -93,6 +96,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
Relation relation;
bool hasindex;
List *indexinfos = NIL;
+ List *stainfos = NIL;
/*
* We need not lock the relation since it was already locked, either by
@@ -381,6 +385,65 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
rel->indexlist = indexinfos;
+ if (true)
+ {
+ List *mvstatoidlist;
+ ListCell *l;
+
+ mvstatoidlist = RelationGetMVStatList(relation);
+
+ foreach(l, mvstatoidlist)
+ {
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ Oid mvoid = lfirst_oid(l);
+ Form_pg_mv_statistic mvstat;
+ MVStatisticInfo *info;
+
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (! HeapTupleIsValid(htup))
+ continue;
+
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ /* unavailable stats are not interesting for the planner */
+ if (mvstat->deps_built)
+ {
+ info = makeNode(MVStatisticInfo);
+
+ info->mvoid = mvoid;
+ info->rel = rel;
+
+ /* enabled statistics */
+ info->deps_enabled = mvstat->deps_enabled;
+
+ /* built/available statistics */
+ info->deps_built = mvstat->deps_built;
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ info->stakeys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+
+ stainfos = lcons(info, stainfos);
+ }
+
+ ReleaseSysCache(htup);
+ }
+
+ list_free(mvstatoidlist);
+ }
+
+ rel->mvstatlist = stainfos;
+
/* Grab foreign-table info using the relcache, while we have it */
if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
{
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index c4bed8a..5446870 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -241,7 +241,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
- CreateSchemaStmt CreateSeqStmt CreateStmt CreateTableSpaceStmt
+ CreateSchemaStmt CreateSeqStmt CreateStmt CreateStatsStmt CreateTableSpaceStmt
CreateFdwStmt CreateForeignServerStmt CreateForeignTableStmt
CreateAssertStmt CreateTransformStmt CreateTrigStmt CreateEventTrigStmt
CreateUserStmt CreateUserMappingStmt CreateRoleStmt CreatePolicyStmt
@@ -375,6 +375,12 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <node> group_by_item empty_grouping_set rollup_clause cube_clause
%type <node> grouping_sets_clause
+%type <list> OptStatsOptions
+%type <str> opt_stats_name stats_name stats_options_name
+%type <node> stats_options_arg
+%type <defelt> stats_options_elem
+%type <list> stats_options_list
+
%type <list> opt_fdw_options fdw_options
%type <defelt> fdw_option
@@ -809,6 +815,7 @@ stmt :
| CreateSchemaStmt
| CreateSeqStmt
| CreateStmt
+ | CreateStatsStmt
| CreateTableSpaceStmt
| CreateTransformStmt
| CreateTrigStmt
@@ -3436,6 +3443,65 @@ OptConsTableSpace: USING INDEX TABLESPACE name { $$ = $4; }
ExistingIndex: USING INDEX index_name { $$ = $3; }
;
+/*****************************************************************************
+ *
+ * QUERY :
+ * CREATE STATISTICS stats_name ON relname (columns) WITH (options)
+ *
+ *****************************************************************************/
+
+
+CreateStatsStmt: CREATE STATISTICS opt_stats_name ON qualified_name '(' columnList ')' OptStatsOptions
+ {
+ CreateStatsStmt *n = makeNode(CreateStatsStmt);
+ n->statsname = $3;
+ n->relation = $5;
+ n->keys = $7;
+ n->options = $9;
+ $$ = (Node *)n;
+ }
+ ;
+
+opt_stats_name:
+ stats_name { $$ = $1; }
+ | /*EMPTY*/ { $$ = NULL; }
+ ;
+
+stats_name: ColId { $$ = $1; };
+
+OptStatsOptions:
+ WITH '(' stats_options_list ')' { $$ = $3; }
+ | /*EMPTY*/ { $$ = NIL; }
+ ;
+
+stats_options_list:
+ stats_options_elem
+ {
+ $$ = list_make1($1);
+ }
+ | stats_options_list ',' stats_options_elem
+ {
+ $$ = lappend($1, $3);
+ }
+ ;
+
+stats_options_elem:
+ stats_options_name stats_options_arg
+ {
+ $$ = makeDefElem($1, $2);
+ }
+ ;
+
+stats_options_name:
+ NonReservedWord { $$ = $1; }
+ ;
+
+stats_options_arg:
+ opt_boolean_or_string { $$ = (Node *) makeString($1); }
+ | NumericOnly { $$ = (Node *) $1; }
+ | /* EMPTY */ { $$ = NULL; }
+ ;
+
/*****************************************************************************
*
@@ -5621,6 +5687,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH DICTIONARY { $$ = OBJECT_TSDICTIONARY; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
+ | STATISTICS { $$ = OBJECT_STATISTICS; }
;
any_name_list:
@@ -13860,7 +13927,6 @@ unreserved_keyword:
| STANDALONE_P
| START
| STATEMENT
- | STATISTICS
| STDIN
| STDOUT
| STORAGE
@@ -14077,6 +14143,7 @@ reserved_keyword:
| SELECT
| SESSION_USER
| SOME
+ | STATISTICS
| SYMMETRIC
| TABLE
| THEN
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index e81bbc6..7029278 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1520,6 +1520,10 @@ ProcessUtilitySlow(Node *parsetree,
address = ExecSecLabelStmt((SecLabelStmt *) parsetree);
break;
+ case T_CreateStatsStmt: /* CREATE STATISTICS */
+ address = CreateStatistics((CreateStatsStmt *) parsetree);
+ break;
+
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(parsetree));
@@ -2160,6 +2164,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_TRANSFORM:
tag = "DROP TRANSFORM";
break;
+ case OBJECT_STATISTICS:
+ tag = "DROP STATISTICS";
+ break;
default:
tag = "???";
}
@@ -2527,6 +2534,10 @@ CreateCommandTag(Node *parsetree)
tag = "EXECUTE";
break;
+ case T_CreateStatsStmt:
+ tag = "CREATE STATISTICS";
+ break;
+
case T_DeallocateStmt:
{
DeallocateStmt *stmt = (DeallocateStmt *) parsetree;
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..eba0352 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,7 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr mvstats resowner sort time
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 6b0c0b7..b6473bb 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -47,6 +47,7 @@
#include "catalog/pg_auth_members.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_database.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_proc.h"
@@ -3922,6 +3923,62 @@ RelationGetIndexList(Relation relation)
return result;
}
+
+List *
+RelationGetMVStatList(Relation relation)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result;
+ List *oldlist;
+ MemoryContext oldcxt;
+
+ /* Quick exit if we already computed the list. */
+ if (relation->rd_mvstatvalid != 0)
+ return list_copy(relation->rd_mvstatlist);
+
+ /*
+ * We build the list we intend to return (in the caller's context) while
+ * doing the scan. After successfully completing the scan, we copy that
+ * list into the relcache entry. This avoids cache-context memory leakage
+ * if we get some sort of error partway through.
+ */
+ result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(RelationGetRelid(relation)));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ /* TODO maybe include only already built statistics? */
+ result = insert_ordered_oid(result, HeapTupleGetOid(htup));
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* Now save a copy of the completed list in the relcache entry. */
+ oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+ oldlist = relation->rd_mvstatlist;
+ relation->rd_mvstatlist = list_copy(result);
+
+ relation->rd_mvstatvalid = true;
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Don't leak the old list, if there is one */
+ list_free(oldlist);
+
+ return result;
+}
+
/*
* insert_ordered_oid
* Insert a new Oid into a sorted list of Oids, preserving ordering
@@ -4891,6 +4948,8 @@ load_relcache_init_file(bool shared)
rel->rd_indexattr = NULL;
rel->rd_keyattr = NULL;
rel->rd_idattr = NULL;
+ rel->rd_mvstatvalid = false;
+ rel->rd_mvstatlist = NIL;
rel->rd_createSubid = InvalidSubTransactionId;
rel->rd_newRelfilenodeSubid = InvalidSubTransactionId;
rel->rd_amcache = NULL;
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index efce7b9..ced92cd 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -43,6 +43,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_language.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -501,6 +502,28 @@ static const struct cachedesc cacheinfo[] = {
},
4
},
+ {MvStatisticRelationId, /* MVSTATNAME */
+ MvStatisticNameIndexId,
+ 1,
+ {
+ Anum_pg_mv_statistic_staname,
+ 0,
+ 0,
+ 0
+ },
+ 4
+ },
+ {MvStatisticRelationId, /* MVSTATOID */
+ MvStatisticOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 4
+ },
{NamespaceRelationId, /* NAMESPACENAME */
NamespaceNameIndexId,
1,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
new file mode 100644
index 0000000..099f1ed
--- /dev/null
+++ b/src/backend/utils/mvstats/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/mvstats
+#
+# IDENTIFICATION
+# src/backend/utils/mvstats/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/mvstats
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = common.o dependencies.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
new file mode 100644
index 0000000..a755c49
--- /dev/null
+++ b/src/backend/utils/mvstats/common.c
@@ -0,0 +1,356 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.c
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+static List* list_mv_stats(Oid relid);
+
+
+/*
+ * Compute requested multivariate stats, using the rows sampled for the
+ * plain (single-column) stats.
+ *
+ * This fetches a list of stats from pg_mv_statistic, computes the stats
+ * and serializes them back into the catalog (as bytea values).
+ */
+void
+build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats)
+{
+ ListCell *lc;
+ List *mvstats;
+
+ TupleDesc tupdesc = RelationGetDescr(onerel);
+
+ /*
+ * Fetch defined MV groups from pg_mv_statistic, and then compute
+ * the MV statistics (functional dependencies for now).
+ */
+ mvstats = list_mv_stats(RelationGetRelid(onerel));
+
+ foreach (lc, mvstats)
+ {
+ int j;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
+ MVDependencies deps = NULL;
+
+ VacAttrStats **stats = NULL;
+ int numatts = 0;
+
+ /* int2 vector of attnums the stats should be computed on */
+ int2vector * attrs = stat->stakeys;
+
+ /* see how many of the columns are not dropped */
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ numatts += 1;
+
+ /* if there are dropped attributes, build a filtered int2vector */
+ if (numatts != attrs->dim1)
+ {
+ int16 *tmp = palloc0(numatts * sizeof(int16));
+ int attnum = 0;
+
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ tmp[attnum++] = attrs->values[j];
+
+ pfree(attrs);
+ attrs = buildint2vector(tmp, numatts);
+ }
+
+ /* filter only the interesting vacattrstats records */
+ stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /* check allowed number of dimensions */
+ Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Analyze functional dependencies of columns.
+ */
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
+
+ /* store the functional dependencies in the catalog */
+ update_mv_stats(stat->mvoid, deps, attrs);
+ }
+}
+
+/*
+ * Lookup the VacAttrStats info for the selected columns, with indexes
+ * matching the attrs vector (to make it easy to work with when
+ * computing multivariate stats).
+ */
+static VacAttrStats **
+lookup_var_attr_stats(int2vector *attrs, int natts, VacAttrStats **vacattrstats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ VacAttrStats **stats = (VacAttrStats**)palloc0(numattrs * sizeof(VacAttrStats*));
+
+ /* lookup VacAttrStats info for the requested columns (same attnum) */
+ for (i = 0; i < numattrs; i++)
+ {
+ stats[i] = NULL;
+ for (j = 0; j < natts; j++)
+ {
+ if (attrs->values[i] == vacattrstats[j]->tupattnum)
+ {
+ stats[i] = vacattrstats[j];
+ break;
+ }
+ }
+
+ /*
+ * Check that we found the info, that the attnum matches, and
+ * that the requested 'lt' operator is available.
+ */
+ Assert(stats[i] != NULL);
+ Assert(stats[i]->tupattnum == attrs->values[i]);
+
+ /* FIXME This is a rather ugly way to check for 'ltopr' (which
+ * is only defined for 'scalar' attributes).
+ */
+ Assert(((StdAnalyzeData *)stats[i]->extra_data)->ltopr != InvalidOid);
+ }
+
+ return stats;
+}
+
+/*
+ * Fetch list of MV stats defined on a table, without the actual data
+ * for histograms, MCV lists etc.
+ */
+static List*
+list_mv_stats(Oid relid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ Form_pg_mv_statistic stats = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ info->mvoid = HeapTupleGetOid(htup);
+ info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_built = stats->deps_built;
+
+ result = lappend(result, info);
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which served as inspiration for this function)? */
+
+ return result;
+}
+
+void
+update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+{
+ HeapTuple stup,
+ oldtup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ bool replaces[Natts_pg_mv_statistic];
+
+ Relation sd = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ memset(nulls, 1, Natts_pg_mv_statistic * sizeof(bool));
+ memset(replaces, 0, Natts_pg_mv_statistic * sizeof(bool));
+ memset(values, 0, Natts_pg_mv_statistic * sizeof(Datum));
+
+ /*
+ * Construct a new pg_mv_statistic tuple - replace only the
+ * dependencies, depending on whether they were actually computed.
+ */
+ if (dependencies != NULL)
+ {
+ nulls[Anum_pg_mv_statistic_stadeps -1] = false;
+ values[Anum_pg_mv_statistic_stadeps - 1]
+ = PointerGetDatum(serialize_mv_dependencies(dependencies));
+ }
+
+ /* always replace the value (either by bytea or NULL) */
+ replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* always change the availability flags */
+ nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_stakeys-1] = false;
+
+ /* use the new attnums, in case we removed some dropped ones */
+ replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_stakeys -1] = true;
+
+ values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
+
+ /* Is there already a pg_mv_statistic tuple for this statistics OID? */
+ oldtup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ if (HeapTupleIsValid(oldtup))
+ {
+ /* Yes, replace it */
+ stup = heap_modify_tuple(oldtup,
+ RelationGetDescr(sd),
+ values,
+ nulls,
+ replaces);
+ ReleaseSysCache(oldtup);
+ simple_heap_update(sd, &stup->t_self, stup);
+ }
+ else
+ elog(ERROR, "invalid pg_mv_statistic record (oid=%d)", mvoid);
+
+ /* update indexes too */
+ CatalogUpdateIndexes(sd, stup);
+
+ heap_freetuple(stup);
+
+ heap_close(sd, RowExclusiveLock);
+}
+
+/* multi-variate stats comparator */
+
+/*
+ * qsort_arg comparator for sorting Datums (MV stats)
+ *
+ * This does not maintain the tupnoLink array.
+ */
+int
+compare_scalars_simple(const void *a, const void *b, void *arg)
+{
+ Datum da = *(Datum*)a;
+ Datum db = *(Datum*)b;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting data when partitioning a MV bucket
+ */
+int
+compare_scalars_partition(const void *a, const void *b, void *arg)
+{
+ Datum da = ((ScalarItem*)a)->value;
+ Datum db = ((ScalarItem*)b)->value;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/* initialize multi-dimensional sort */
+MultiSortSupport
+multi_sort_init(int ndims)
+{
+ MultiSortSupport mss;
+
+ Assert(ndims >= 2);
+
+ mss = (MultiSortSupport)palloc0(offsetof(MultiSortSupportData, ssup)
+ + sizeof(SortSupportData)*ndims);
+
+ mss->ndims = ndims;
+
+ return mss;
+}
+
+/*
+ * add sort info for dimension 'dim' (index into vacattrstats) to mss,
+ * at position 'sortdim'
+ */
+void
+multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats)
+{
+ /* first, lookup StdAnalyzeData for the dimension (attribute) */
+ SortSupportData ssup;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)vacattrstats[dim]->extra_data;
+
+ Assert(mss != NULL);
+ Assert(sortdim < mss->ndims);
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup);
+
+ mss->ssup[sortdim] = ssup;
+}
+
+/* compare all the dimensions in the selected order */
+int
+multi_sort_compare(const void *a, const void *b, void *arg)
+{
+ int i;
+ SortItem *ia = (SortItem*)a;
+ SortItem *ib = (SortItem*)b;
+
+ MultiSortSupport mss = (MultiSortSupport)arg;
+
+ for (i = 0; i < mss->ndims; i++)
+ {
+ int compare;
+
+ compare = ApplySortComparator(ia->values[i], ia->isnull[i],
+ ib->values[i], ib->isnull[i],
+ &mss->ssup[i]);
+
+ if (compare != 0)
+ return compare;
+
+ }
+
+ /* equal by default */
+ return 0;
+}
+
+/* compare selected dimension */
+int
+multi_sort_compare_dim(int dim, const SortItem *a, const SortItem *b,
+ MultiSortSupport mss)
+{
+ return ApplySortComparator(a->values[dim], a->isnull[dim],
+ b->values[dim], b->isnull[dim],
+ &mss->ssup[dim]);
+}
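
To illustrate the multi-sort API defined above, here is a minimal
usage sketch (not part of the patch - it merely mirrors how
build_mv_dependencies drives the API, and assumes 'vacattrstats'
entries with a valid 'ltopr'):

    /* sort first by column A (dimension 0), then column B (dimension 1) */
    MultiSortSupport mss = multi_sort_init(2);

    multi_sort_add_dimension(mss, 0, dima, vacattrstats);
    multi_sort_add_dimension(mss, 1, dimb, vacattrstats);

    /* 'items' is a SortItem array with values[] / isnull[] filled in */
    qsort_arg((void *) items, numrows, sizeof(SortItem),
              multi_sort_compare, mss);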
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
new file mode 100644
index 0000000..6d5465b
--- /dev/null
+++ b/src/backend/utils/mvstats/common.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.h
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/tuptoaster.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_mv_statistic.h"
+#include "foreign/fdwapi.h"
+#include "postmaster/autovacuum.h"
+#include "storage/lmgr.h"
+#include "utils/datum.h"
+#include "utils/sortsupport.h"
+#include "utils/syscache.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "access/sysattr.h"
+
+#include "utils/mvstats.h"
+
+/* FIXME private structure copied from analyze.c */
+
+typedef struct
+{
+ Oid eqopr; /* '=' operator for datatype, if any */
+ Oid eqfunc; /* and associated function */
+ Oid ltopr; /* '<' operator for datatype, if any */
+} StdAnalyzeData;
+
+typedef struct
+{
+ Datum value; /* a data value */
+ int tupno; /* position index for tuple it came from */
+} ScalarItem;
+
+/* multi-sort */
+typedef struct MultiSortSupportData {
+ int ndims; /* number of dimensions supported by the sort */
+ SortSupportData ssup[1]; /* sort support data for each dimension */
+} MultiSortSupportData;
+
+typedef MultiSortSupportData* MultiSortSupport;
+
+typedef struct SortItem {
+ Datum *values;
+ bool *isnull;
+} SortItem;
+
+MultiSortSupport multi_sort_init(int ndims);
+
+void multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats);
+
+int multi_sort_compare(const void *a, const void *b, void *arg);
+
+int multi_sort_compare_dim(int dim, const SortItem *a,
+ const SortItem *b, MultiSortSupport mss);
+
+/* comparators, used when constructing multivariate stats */
+int compare_scalars_simple(const void *a, const void *b, void *arg);
+int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
new file mode 100644
index 0000000..84b6561
--- /dev/null
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -0,0 +1,638 @@
+/*-------------------------------------------------------------------------
+ *
+ * dependencies.c
+ * POSTGRES multivariate functional dependencies
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/dependencies.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Mine functional dependencies between columns, in the form (A => B),
+ * meaning that a value in column 'A' determines value in 'B'. A simple
+ * artificial example may be a table created like this
+ *
+ * CREATE TABLE deptest (a INT, b INT)
+ * AS SELECT i, i/10 FROM generate_series(1,100000) s(i);
+ *
+ * Clearly, once we know the value for 'A' we can easily determine the
+ * value of 'B' by dividing (A/10). A more practical example may be
+ * addresses, where (ZIP code => city name), i.e. once we know the ZIP,
+ * we probably know which city it belongs to. Larger cities usually have
+ * multiple ZIP codes, so the dependency can't be reversed.
+ *
+ * Functional dependencies are a concept well described in relational
+ * theory, especially in definition of normalization and "normal forms".
+ * Wikipedia has a nice definition of a functional dependency [1]:
+ *
+ * In a given table, an attribute Y is said to have a functional
+ * dependency on a set of attributes X (written X -> Y) if and only
+ * if each X value is associated with precisely one Y value. For
+ * example, in an "Employee" table that includes the attributes
+ * "Employee ID" and "Employee Date of Birth", the functional
+ * dependency {Employee ID} -> {Employee Date of Birth} would hold.
+ * It follows from the previous two sentences that each {Employee ID}
+ * is associated with precisely one {Employee Date of Birth}.
+ *
+ * [1] http://en.wikipedia.org/wiki/Database_normalization
+ *
+ * Most datasets could be normalized not to contain any such functional
+ * dependencies, but sometimes that's not practical. In some cases it's
+ * actually a conscious choice to model the dataset in a denormalized
+ * way, either for performance or to make querying easier.
+ *
+ * The current implementation supports only dependencies between two
+ * columns, but that is merely a simplification in this initial patch.
+ * It's certainly useful to mine for dependencies involving multiple
+ * columns on the 'left' side, i.e. in the condition of the dependency.
+ * That is, dependencies [A,B] => C and so on.
+ *
+ * TODO The implementation may/should be smart enough not to mine both
+ * [A => B] and [A,C => B], because the second dependency is a
+ * consequence of the first one (if values of A determine values
+ * of B, adding another column won't change that). The ANALYZE
+ * should first analyze 1:1 dependencies, then 2:1 dependencies
+ * (and skip the already identified ones), etc.
+ *
+ * For example the dependency [city name => zip code] is much weaker
+ * than [city name, state name => zip code], because there may be
+ * multiple cities with the same name in various states. It's not
+ * perfect though - there are probably cities with the same name within
+ * the same state, but hopefully that's a relatively rare occurrence.
+ * More about this in the section about dependency mining.
+ *
+ * Handling multiple columns on the right side is not necessary, as such
+ * dependencies may be decomposed into a set of dependencies with
+ * the same meaning, one for each column on the right side. For example
+ *
+ * A => [B,C]
+ *
+ * is exactly the same as
+ *
+ * (A => B) & (A => C).
+ *
+ * Of course, storing (A => [B, C]) may be more efficient than storing
+ * the two dependencies (A => B) and (A => C) separately.
+ *
+ *
+ * Dependency mining (ANALYZE)
+ * ---------------------------
+ *
+ * The current build algorithm is rather simple - for each pair [A,B] of
+ * columns, the data are sorted lexicographically (first by A, then B),
+ * and then a number of metrics is computed by walking the sorted data.
+ *
+ * In general the algorithm counts distinct values of A (forming groups
+ * thanks to the sorting), supporting or contradicting the hypothesis
+ * that A => B (i.e. that values of B are predetermined by A). If there
+ * are multiple values of B for a single value of A, it's counted as
+ * contradicting.
+ *
+ * A group may be neither supporting nor contradicting. To be counted as
+ * supporting, the group has to have at least min_group_size(=3) rows.
+ * Smaller 'supporting' groups are counted as neutral.
+ *
+ * Finally, the number of rows in supporting and contradicting groups is
+ * compared, and if there is at least 10x more supporting rows, the
+ * dependency is considered valid.
+ *
+ *
+ * Real-world datasets are imperfect - there may be errors (e.g. due to
+ * data-entry mistakes), or factually correct records, yet contradicting
+ * the dependency (e.g. when a city splits into two, but both keep the
+ * same ZIP code). A strict ANALYZE implementation (where the functional
+ * dependencies are identified) would ignore dependencies on such noisy
+ * data, making the approach unusable in practice.
+ *
+ * The proposed implementation attempts to handle such noisy cases
+ * gracefully, by tolerating a small number of contradicting cases.
+ *
+ * In the future this might also perform some sort of test and decide
+ * whether it's worth building any other kind of multivariate stats,
+ * or whether the dependencies sufficiently describe the data. Or at
+ * least not build the MCV list / histogram on the implied columns.
+ * Such reduction would however make the 'verification' (see the next
+ * section) impossible.
+ *
+ *
+ * Clause reduction (planner/optimizer)
+ * ------------------------------------
+ *
+ * Applying the dependencies is quite simple - given a list of clauses,
+ * try to apply all the dependencies. For example given clause list
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d < 100)
+ *
+ * and dependencies [a=>b] and [a=>d], this may be reduced to
+ *
+ * (a = 1) AND (c = 1) AND (d < 100)
+ *
+ * The (d<100) can't be reduced as it's not an equality clause, so the
+ * dependency [a=>d] can't be applied.
+ *
+ * See clauselist_apply_dependencies() for more details.
+ *
+ * The problem with the reduction is that the query may use conditions
+ * that are not redundant, but in fact contradictory - e.g. the user
+ * may search for a ZIP code and a city name not matching the ZIP code.
+ *
+ * In such cases the condition on the city name is not redundant but
+ * contradictory (making the result empty), and removing it while
+ * estimating the cardinality will make the estimate worse.
+ *
+ * The current estimation assuming independence (and multiplying the
+ * selectivities) works better in this case, but only by utter luck.
+ *
+ * In some cases this might be verified using the other multivariate
+ * statistics - MCV lists and histograms. For MCV lists the verification
+ * might be very simple - peek into the list to see whether there are
+ * any items matching the clause on the 'A' column (e.g. ZIP code), and
+ * if such an item is found, check whether the 'B' column matches the
+ * other clause. If it does not, the clauses are contradictory. We can't
+ * really conclude anything if no such item is found, except maybe
+ * restricting the selectivity using the MCV data (e.g. using min/max
+ * selectivity, or something like that).
+ *
+ * With histograms, it might work similarly - we can't check the values
+ * directly (because histograms use buckets, unlike MCV lists, which
+ * store the actual values). So we can only observe the buckets matching the
+ * clauses - if those buckets have very low frequency, it probably means
+ * the two clauses are incompatible.
+ *
+ * It's unclear what 'low frequency' is, but if one of the clauses is
+ * implied (automatically true because of the other clause), then
+ *
+ * selectivity[clause(A)] = selectivity[clause(A) & clause(B)]
+ *
+ * So we might compute selectivity of the first clause (on the column
+ * A in dependency [A=>B]) - for example using regular statistics.
+ * And then check if the selectivity computed from the histogram is
+ * about the same (or significantly lower).
+ *
+ * The problem is that histograms work well only when the data ordering
+ * matches the natural meaning. For values that serve as labels - like
+ * city names or ZIP codes, or even generated IDs, histograms really
+ * don't work all that well. For example sorting cities by name won't
+ * match the sorting of ZIP codes, rendering the histogram unusable.
+ *
+ * MCV lists are probably going to work much better, because they don't
+ * really assume any sort of ordering, which also makes them more
+ * appropriate for label-like data.
+ *
+ * TODO Support dependencies with multiple columns on left/right.
+ *
+ * TODO Investigate using histogram and MCV list to confirm the
+ * functional dependencies.
+ *
+ * TODO Investigate statistical testing of the distribution (to decide
+ * whether it makes sense to build the histogram/MCV list).
+ *
+ * TODO Using a min/max of selectivities would probably make more sense
+ * for the associated columns.
+ *
+ * TODO Consider eliminating the implied columns from the histogram and
+ * MCV lists (but maybe that's not a good idea, because that'd make
+ * it impossible to use these stats for non-equality clauses and
+ * also it wouldn't be possible to use the stats for verification
+ * of the dependencies as proposed in another TODO).
+ *
+ * TODO This builds a complete set of dependencies, i.e. including
+ * transitive dependencies - if we identify [A => B] and [B => C],
+ * we're likely to identify [A => C] too. It might be better to
+ * keep only the minimal set of dependencies, i.e. prune all the
+ * dependencies that we can recreate by transitivity.
+ *
+ * There are two conceptual ways to do that:
+ *
+ * (a) generate all the rules, and then prune the rules that may
+ * be recreated by combining other dependencies, or
+ *
+ * (b) performing the 'is combination of other dependencies' check
+ * before actually doing the work
+ *
+ * The second option has the advantage that we don't really need
+ * to perform the sort/count. It's not sufficient alone, though,
+ * because we may discover the dependencies in the wrong order.
+ * For example [A => B], [A => C] and then [B => C]. None of those
+ * dependencies is a combination of the already known ones, yet
+ * [A => C] is a combination of [A => B] and [B => C].
+ *
+ * FIXME Not sure the current NULL handling makes much sense. We assume
+ * that NULL is 0, so it's handled like a regular value
+ * (NULL == NULL), so all NULLs in a single column form a single
+ * group. Maybe that's not the right thing to do, especially with
+ * equality conditions - in that case NULLs are irrelevant. So
+ * maybe the right solution would be to just ignore NULL values?
+ *
+ * However simply "ignoring" the NULL values does not seem like
+ * a good idea - imagine columns A and B, where for each value of
+ * A, values in B are constant (same for the whole group) or NULL.
+ * Let's say only 10% of B values in each group are not NULL. Then
+ * ignoring the NULL values will result in 10x misestimate (and
+ * it's trivial to construct arbitrary errors). So maybe handling
+ * NULL values just like a regular value is the right thing here.
+ *
+ * Or maybe NULL values should be treated differently on each side
+ * of the dependency? E.g. as ignored on the left (condition) and
+ * as regular values on the right - this seems consistent with how
+ * equality clauses work, as equality clause means 'NOT NULL'.
+ * So if we say [A => B] then it may also imply "NOT NULL" on the
+ * right side.
+ */
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ /* result */
+ int ndeps = 0;
+ MVDependencies dependencies = NULL;
+ MultiSortSupport mss = multi_sort_init(2); /* 2 dimensions for now */
+
+ /* TODO Maybe this should be somehow related to the number of
+ * distinct values in the two columns we're currently analyzing.
+ * Assuming the distribution is uniform, we can estimate the
+ * average group size and use it as a threshold. Or something
+ * like that. Seems better than a static approach.
+ */
+ int min_group_size = 3;
+
+ /* dimension indexes we'll check for associations [a => b] */
+ int dima, dimb;
+
+ /*
+ * We'll reuse the same array for all the 2-column combinations.
+ *
+ * It's possible to sort the sample rows directly, but this seemed
+ * somewhat simpler / less error-prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * 2);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * 2);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * 2];
+ items[i].isnull = &isnull[i * 2];
+ }
+
+ Assert(numattrs >= 2);
+
+ /*
+ * Evaluate all possible combinations of [A => B], using a simple algorithm:
+ *
+ * (a) sort the data by [A,B]
+ * (b) split the data into groups by A (new group whenever a value changes)
+ * (c) count different values in the B column (again, value changes)
+ *
+ * TODO It should be rather simple to merge [A => B] and [A => C] into
+ * [A => B,C]. Just keep A constant, collect all the "implied" columns
+ * and you're done.
+ */
+ for (dima = 0; dima < numattrs; dima++)
+ {
+ /* prepare the sort function for the first dimension */
+ multi_sort_add_dimension(mss, 0, dima, stats);
+
+ for (dimb = 0; dimb < numattrs; dimb++)
+ {
+ SortItem current;
+
+ /* number of groups supporting / contradicting the dependency */
+ int n_supporting = 0;
+ int n_contradicting = 0;
+
+ /* counters valid within a group */
+ int group_size = 0;
+ int n_violations = 0;
+
+ int n_supporting_rows = 0;
+ int n_contradicting_rows = 0;
+
+ /* make sure the columns are different (skip A => A) */
+ if (dima == dimb)
+ continue;
+
+ /* prepare the sort function for the second dimension */
+ multi_sort_add_dimension(mss, 1, dimb, stats);
+
+ /* reset the values and isnull flags */
+ memset(values, 0, sizeof(Datum) * numrows * 2);
+ memset(isnull, 0, sizeof(bool) * numrows * 2);
+
+ /* accumulate all the data for both columns into an array and sort it */
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values[0]
+ = heap_getattr(rows[i], attrs->values[dima],
+ stats[dima]->tupDesc, &items[i].isnull[0]);
+
+ items[i].values[1]
+ = heap_getattr(rows[i], attrs->values[dimb],
+ stats[dimb]->tupDesc, &items[i].isnull[1]);
+ }
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Walk through the array, split it into groups according to
+ * the A value, and count distinct values in the other one.
+ * If there's a single B value for the whole group, we count
+ * it as supporting the association, otherwise we count it
+ * as contradicting.
+ *
+ * Furthermore we require a group to have at least a certain
+ * number of rows to be considered useful for supporting the
+ * dependency. A contradicting group, however, counts regardless of size.
+ */
+
+ /* start with values from the first row */
+ current = items[0];
+ group_size = 1;
+
+ for (i = 1; i < numrows; i++)
+ {
+ /* end of the group */
+ if (multi_sort_compare_dim(0, &items[i], &current, mss) != 0)
+ {
+ /*
+ * If there are no contradicting rows, count it as
+ * supporting (otherwise contradicting), but only if
+ * the group is large enough.
+ *
+ * The requirement of a minimum group size makes it
+ * impossible to identify [unique,unique] cases, but
+ * that's probably a different case. This is more
+ * about [zip => city] associations etc.
+ *
+ * If there are violations, count the group/rows as
+ * a violation.
+ *
+ * It may be neither, if the group is too small (does
+ * not contain at least min_group_size rows).
+ */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations > 0)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /* current values start a new group */
+ n_violations = 0;
+ group_size = 0;
+ }
+ /* mismatch of a B value is contradicting */
+ else if (multi_sort_compare_dim(1, &items[i], &current, mss) != 0)
+ {
+ n_violations += 1;
+ }
+
+ current = items[i];
+ group_size += 1;
+ }
+
+ /* handle the last group (just like above) */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /*
+ * See if the number of rows supporting the association is at least
+ * 10x the number of rows violating the hypothetical dependency.
+ *
+ * TODO This is rather arbitrary limit - I guess it's possible to do
+ * some math to come up with a better rule (e.g. testing a hypothesis
+ * 'this is due to randomness'). We can create a contingency table
+ * from the values and use it for testing. Possibly only when
+ * there are no contradicting rows?
+ *
+ * TODO Also, if (a => b) and (b => a) at the same time, it pretty much
+ * means there's a 1:1 relation (or one is a 'label'), making the
+ * conditions rather redundant. Although it's possible that the
+ * query uses incompatible combination of values.
+ */
+ if (n_supporting_rows > (n_contradicting_rows * 10))
+ {
+ if (dependencies == NULL)
+ {
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+ dependencies->magic = MVSTAT_DEPS_MAGIC;
+ }
+ else
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData, deps)
+ + sizeof(MVDependency) * (dependencies->ndeps + 1));
+
+ /* append the new dependency to the list */
+ dependencies->deps[ndeps] = (MVDependency)palloc0(sizeof(MVDependencyData));
+ dependencies->deps[ndeps]->a = attrs->values[dima];
+ dependencies->deps[ndeps]->b = attrs->values[dimb];
+
+ dependencies->ndeps = (++ndeps);
+ }
+ }
+ }
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+ pfree(stats);
+ pfree(mss);
+
+ return dependencies;
+}
+
+/*
+ * Store the dependencies into a bytea, so that it can be stored in the
+ * pg_mv_statistic catalog.
+ *
+ * Currently this only supports simple two-column rules, and stores them
+ * as a sequence of attnum pairs. In the future, this needs to be made
+ * more complex to support multiple columns on both sides of the
+ * implication (using AND on left, OR on right).
+ */
+bytea *
+serialize_mv_dependencies(MVDependencies dependencies)
+{
+ int i;
+
+ /* we need to store the struct header, plus 2 * int16 per dependency */
+ Size len = VARHDRSZ + offsetof(MVDependenciesData, deps)
+ + dependencies->ndeps * (sizeof(int16) * 2);
+
+ bytea * output = (bytea*)palloc0(len);
+
+ char * tmp = VARDATA(output);
+
+ SET_VARSIZE(output, len);
+
+ /* first, store the number of dimensions / items */
+ memcpy(tmp, dependencies, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ /* walk through the dependencies and copy both columns into the bytea */
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ memcpy(tmp, &(dependencies->deps[i]->a), sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(tmp, &(dependencies->deps[i]->b), sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return output;
+}
+
+/*
+ * Reads serialized dependencies into MVDependencies structure.
+ */
+MVDependencies
+deserialize_mv_dependencies(bytea * data)
+{
+ int i;
+ Size expected_size;
+ MVDependencies dependencies;
+ char *tmp;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVDependenciesData,deps))
+ elog(ERROR, "invalid MVDependencies size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVDependenciesData,deps));
+
+ /* read the MVDependencies header */
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(dependencies, tmp, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ if (dependencies->magic != MVSTAT_DEPS_MAGIC)
+ {
+ pfree(dependencies);
+ elog(WARNING, "not a MV Dependencies (magic number mismatch)");
+ return NULL;
+ }
+
+ Assert(dependencies->ndeps > 0);
+
+ /* what bytea size do we expect for those parameters */
+ expected_size = offsetof(MVDependenciesData,deps) +
+ dependencies->ndeps * sizeof(int16) * 2;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid dependencies size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* allocate space for the dependency pointers */
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData,deps)
+ + (dependencies->ndeps * sizeof(MVDependency)));
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ dependencies->deps[i] = (MVDependency)palloc0(sizeof(MVDependencyData));
+
+ memcpy(&(dependencies->deps[i]->a), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(&(dependencies->deps[i]->b), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return dependencies;
+}
+
+/* print some basic info about dependencies (number of dependencies) */
+Datum
+pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ result = palloc0(128);
+ snprintf(result, 128, "dependencies=%d", dependencies->ndeps);
+
+ /* FIXME free the deserialized data (pfree is not enough) */
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* print the dependencies
+ *
+ * TODO Would be nice if this knew the actual column names (instead of
+ * the attnums).
+ *
+ * FIXME This is ugly and does not properly check the lengths and
+ * strcpy/snprintf return values. Needs to be fixed.
+ */
+Datum
+pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
+{
+ int i = 0;
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result = NULL;
+ int len = 0;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ MVDependency dependency = dependencies->deps[i];
+ char buffer[128];
+
+ int tmp = snprintf(buffer, 128, "%s%d => %d",
+ ((i == 0) ? "" : ", "), dependency->a, dependency->b);
+
+ if (tmp < 127)
+ {
+ if (result == NULL)
+ result = palloc0(len + tmp + 1);
+ else
+ result = repalloc(result, len + tmp + 1);
+
+ strcpy(result + len, buffer);
+ len += tmp;
+ }
+ }
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
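
To make the supporting/contradicting bookkeeping in
build_mv_dependencies more concrete, consider a small worked example
(illustrative numbers only). With min_group_size = 3 and the sorted
(A,B) pairs

    (1,1) (1,1) (1,1) (2,5) (2,5) (2,6) (3,7) (3,7) (3,7) (3,7)

the algorithm sees three groups:

    A=1: 3 rows, single B value, size >= 3  => supporting (3 rows)
    A=2: 3 rows, B changes once             => contradicting (3 rows)
    A=3: 4 rows, single B value             => supporting (4 rows)

so n_supporting_rows = 7 and n_contradicting_rows = 3. The dependency
[A => B] is accepted only if 7 > 10 * 3, which fails here, so it gets
rejected.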
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index bb59bc2..f6d60ad 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2104,6 +2104,48 @@ describeOneTableDetails(const char *schemaname,
PQclear(result);
}
+ /* print any multivariate statistics */
+ if (pset.sversion >= 90500)
+ {
+ printfPQExpBuffer(&buf,
+ "SELECT oid, staname, stakeys,\n"
+ " deps_enabled,\n"
+ " deps_built,\n"
+ " (SELECT string_agg(attname::text,', ')\n"
+ " FROM ((SELECT unnest(stakeys) AS attnum) s\n"
+ " JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
+ "FROM pg_mv_statistic stat WHERE starelid = '%s' ORDER BY 1;",
+ oid);
+
+ result = PSQLexec(buf.data);
+ if (!result)
+ goto error_return;
+ else
+ tuples = PQntuples(result);
+
+ if (tuples > 0)
+ {
+ printTableAddFooter(&cont, _("Statistics:"));
+ for (i = 0; i < tuples; i++)
+ {
+ printfPQExpBuffer(&buf, " ");
+
+ /* statistics name */
+ appendPQExpBuffer(&buf, "%s ", PQgetvalue(result, i, 1));
+
+ /* options */
+ if (!strcmp(PQgetvalue(result, i, 3), "t"))
+ appendPQExpBuffer(&buf, "(dependencies)");
+
+ appendPQExpBuffer(&buf, " ON (%s)",
+ PQgetvalue(result, i, 5));
+
+ printTableAddFooter(&cont, buf.data);
+ }
+ }
+ PQclear(result);
+ }
+
/* print rules */
if (tableinfo.hasrules && tableinfo.relkind != 'm')
{
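
With the describe.c change above, \d output for a table with
multivariate statistics gains a footer along these lines (a sketch
only - the statistics name is invented):

    Statistics:
        test_stats (dependencies) ON (a, b, c)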
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index fbcf904..9a5c397 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -153,10 +153,11 @@ typedef enum ObjectClass
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
OCLASS_POLICY, /* pg_policy */
- OCLASS_TRANSFORM /* pg_transform */
+ OCLASS_TRANSFORM, /* pg_transform */
+ OCLASS_STATISTICS /* pg_mv_statistics */
} ObjectClass;
-#define LAST_OCLASS OCLASS_TRANSFORM
+#define LAST_OCLASS OCLASS_STATISTICS
/* in dependency.c */
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index e6ac394..36debeb 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -119,6 +119,7 @@ extern void RemoveAttrDefault(Oid relid, AttrNumber attnum,
DropBehavior behavior, bool complain, bool internal);
extern void RemoveAttrDefaultById(Oid attrdefId);
extern void RemoveStatistics(Oid relid, AttrNumber attnum);
+extern void RemoveMVStatistics(Oid relid, AttrNumber attnum);
extern Form_pg_attribute SystemAttributeDefinition(AttrNumber attno,
bool relhasoids);
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index c38958d..e171ae6 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -173,6 +173,13 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
#define LargeObjectMetadataOidIndexId 2996
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_oid_index, 3380, on pg_mv_statistic using btree(oid oid_ops));
+#define MvStatisticOidIndexId 3380
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_name_index, 3997, on pg_mv_statistic using btree(staname name_ops));
+#define MvStatisticNameIndexId 3997
+DECLARE_INDEX(pg_mv_statistic_relid_index, 3379, on pg_mv_statistic using btree(starelid oid_ops));
+#define MvStatisticRelidIndexId 3379
+
DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
#define NamespaceNameIndexId 2684
DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index b6ad934..9bb59f9 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -137,6 +137,8 @@ extern Oid get_collation_oid(List *collname, bool missing_ok);
extern Oid get_conversion_oid(List *conname, bool missing_ok);
extern Oid FindDefaultConversionProc(int32 for_encoding, int32 to_encoding);
+extern Oid get_statistics_oid(List *names, bool missing_ok);
+
/* initialization & transaction cleanup code */
extern void InitializeSearchPath(void);
extern void AtEOXact_Namespace(bool isCommit, bool parallel);
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
new file mode 100644
index 0000000..8c33a92
--- /dev/null
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -0,0 +1,71 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_mv_statistic.h
+ * definition of the system "multivariate statistic" relation (pg_mv_statistic)
+ * along with the relation's initial contents.
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_mv_statistic.h
+ *
+ * NOTES
+ * the genbki.pl script reads this file and generates .bki
+ * information from the DATA() statements.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_MV_STATISTIC_H
+#define PG_MV_STATISTIC_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_mv_statistic definition. cpp turns this into
+ * typedef struct FormData_pg_mv_statistic
+ * ----------------
+ */
+#define MvStatisticRelationId 3381
+
+CATALOG(pg_mv_statistic,3381)
+{
+ /* These fields form the unique key for the entry: */
+ Oid starelid; /* relation containing attributes */
+ NameData staname; /* statistics name */
+
+ /* statistics requested to build */
+ bool deps_enabled; /* analyze dependencies? */
+
+ /* statistics that are available (if requested) */
+ bool deps_built; /* dependencies were built */
+
+ /* variable-length fields start here, but we allow direct access to stakeys */
+ int2vector stakeys; /* array of column keys */
+
+#ifdef CATALOG_VARLEN
+ bytea stadeps; /* dependencies (serialized) */
+#endif
+
+} FormData_pg_mv_statistic;
+
+/* ----------------
+ * Form_pg_mv_statistic corresponds to a pointer to a tuple with
+ * the format of pg_mv_statistic relation.
+ * ----------------
+ */
+typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
+
+/* ----------------
+ * compiler constants for pg_mv_statistic
+ * ----------------
+ */
+#define Natts_pg_mv_statistic 6
+#define Anum_pg_mv_statistic_starelid 1
+#define Anum_pg_mv_statistic_staname 2
+#define Anum_pg_mv_statistic_deps_enabled 3
+#define Anum_pg_mv_statistic_deps_built 4
+#define Anum_pg_mv_statistic_stakeys 5
+#define Anum_pg_mv_statistic_stadeps 6
+
+#endif /* PG_MV_STATISTIC_H */
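
As a sketch of how a consumer is expected to read this catalog, given
the OID of a pg_mv_statistic row (not part of the patch - it merely
combines the syscache entries added earlier with the deserialization
function from dependencies.c):

    HeapTuple htup;
    Form_pg_mv_statistic stat;
    MVDependencies deps = NULL;
    bool isnull;
    Datum val;

    htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
    if (!HeapTupleIsValid(htup))
        elog(ERROR, "cache lookup failed for statistics %u", mvoid);

    stat = (Form_pg_mv_statistic) GETSTRUCT(htup);
    if (stat->deps_built)
    {
        /* stadeps is a varlena column, so fetch it through the syscache */
        val = SysCacheGetAttr(MVSTATOID, htup,
                              Anum_pg_mv_statistic_stadeps, &isnull);
        if (!isnull)
            deps = deserialize_mv_dependencies(DatumGetByteaP(val));
    }
    ReleaseSysCache(htup);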
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index d8640db..85c638d 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2739,6 +2739,11 @@ DESCR("current user privilege on any column by rel name");
DATA(insert OID = 3029 ( has_any_column_privilege PGNSP PGUID 12 10 0 0 0 f f f f t f s s 2 0 16 "26 25" _null_ _null_ _null_ _null_ _null_ has_any_column_privilege_id _null_ _null_ _null_ ));
DESCR("current user privilege on any column by rel oid");
+DATA(insert OID = 3998 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies info");
+DATA(insert OID = 3999 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies show");
+
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
DATA(insert OID = 1929 ( pg_stat_get_tuples_returned PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_tuples_returned _null_ _null_ _null_ ));
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index fb2f035..b7c878d 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -49,6 +49,7 @@ extern void BootstrapToastTable(char *relName,
DECLARE_TOAST(pg_attrdef, 2830, 2831);
DECLARE_TOAST(pg_constraint, 2832, 2833);
DECLARE_TOAST(pg_description, 2834, 2835);
+DECLARE_TOAST(pg_mv_statistic, 3577, 3578);
DECLARE_TOAST(pg_proc, 2836, 2837);
DECLARE_TOAST(pg_rewrite, 2838, 2839);
DECLARE_TOAST(pg_seclabel, 3598, 3599);
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index adae296..3adb956 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -75,6 +75,10 @@ extern ObjectAddress DefineOperator(List *names, List *parameters);
extern void RemoveOperatorById(Oid operOid);
extern ObjectAddress AlterOperator(AlterOperatorStmt *stmt);
+/* commands/statscmds.c */
+extern ObjectAddress CreateStatistics(CreateStatsStmt *stmt);
+extern void RemoveStatisticsById(Oid statsOid);
+
/* commands/aggregatecmds.c */
extern ObjectAddress DefineAggregate(List *name, List *args, bool oldstyle,
List *parameters, const char *queryString);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 603edd3..ece0776 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -251,6 +251,7 @@ typedef enum NodeTag
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
+ T_MVStatisticInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
@@ -381,6 +382,7 @@ typedef enum NodeTag
T_CreatePolicyStmt,
T_AlterPolicyStmt,
T_CreateTransformStmt,
+ T_CreateStatsStmt,
/*
* TAGS FOR PARSE TREE NODES (parsenodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 9142e94..3650897 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -596,6 +596,16 @@ typedef struct ColumnDef
int location; /* parse location, or -1 if none/unknown */
} ColumnDef;
+typedef struct CreateStatsStmt
+{
+ NodeTag type;
+ char *statsname; /* name of new statistics, or NULL for default */
+ RangeVar *relation; /* relation to build statistics on */
+ List *keys; /* String nodes naming referenced column(s) */
+ List *options; /* list of DefElem nodes */
+} CreateStatsStmt;
+
+
/*
* TableLikeClause - CREATE TABLE ( ... LIKE ... ) clause
*/
@@ -1405,6 +1415,7 @@ typedef enum ObjectType
OBJECT_RULE,
OBJECT_SCHEMA,
OBJECT_SEQUENCE,
+ OBJECT_STATISTICS,
OBJECT_TABCONSTRAINT,
OBJECT_TABLE,
OBJECT_TABLESPACE,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5393005..baa0c88 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -479,6 +479,7 @@ typedef struct RelOptInfo
List *lateral_vars; /* LATERAL Vars and PHVs referenced by rel */
Relids lateral_referencers; /* rels that reference me laterally */
List *indexlist; /* list of IndexOptInfo */
+ List *mvstatlist; /* list of MVStatisticInfo */
BlockNumber pages; /* size estimates derived from pg_class */
double tuples;
double allvisfrac;
@@ -573,6 +574,33 @@ typedef struct IndexOptInfo
bool amhasgetbitmap; /* does AM have amgetbitmap interface? */
} IndexOptInfo;
+/*
+ * MVStatisticInfo
+ * Information about multivariate stats for planning/optimization
+ *
+ * This contains information about which columns are covered by the
+ * statistics (stakeys), which options were requested while adding the
+ * statistics (*_enabled), and which kinds of statistics were actually
+ * built and are available for the optimizer (*_built).
+ */
+typedef struct MVStatisticInfo
+{
+ NodeTag type;
+
+ Oid mvoid; /* OID of the statistics row */
+ RelOptInfo *rel; /* back-link to the statistics' table */
+
+ /* enabled statistics */
+ bool deps_enabled; /* functional dependencies enabled */
+
+ /* built/available statistics */
+ bool deps_built; /* functional dependencies built */
+
+ /* columns in the statistics (attnums) */
+ int2vector *stakeys; /* attnums of the columns covered */
+
+} MVStatisticInfo;
+
/*
* EquivalenceClasses
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 812ca83..daefcef 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -361,7 +361,7 @@ PG_KEYWORD("stable", STABLE, UNRESERVED_KEYWORD)
PG_KEYWORD("standalone", STANDALONE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("start", START, UNRESERVED_KEYWORD)
PG_KEYWORD("statement", STATEMENT, UNRESERVED_KEYWORD)
-PG_KEYWORD("statistics", STATISTICS, UNRESERVED_KEYWORD)
+PG_KEYWORD("statistics", STATISTICS, RESERVED_KEYWORD)
PG_KEYWORD("stdin", STDIN, UNRESERVED_KEYWORD)
PG_KEYWORD("stdout", STDOUT, UNRESERVED_KEYWORD)
PG_KEYWORD("storage", STORAGE, UNRESERVED_KEYWORD)
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
new file mode 100644
index 0000000..411cd16
--- /dev/null
+++ b/src/include/utils/mvstats.h
@@ -0,0 +1,69 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvstats.h
+ * Multivariate statistics and selectivity estimation functions.
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/mvstats.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MVSTATS_H
+#define MVSTATS_H
+
+#include "commands/vacuum.h"
+
+
+#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
+
+/* An association rule, tracking the [a => b] dependency.
+ *
+ * TODO Make this work with multiple columns on both sides.
+ */
+typedef struct MVDependencyData {
+ int16 a;
+ int16 b;
+} MVDependencyData;
+
+typedef MVDependencyData* MVDependency;
+
+typedef struct MVDependenciesData {
+ uint32 magic; /* magic constant marker */
+ int32 ndeps; /* number of dependencies */
+ MVDependency deps[1]; /* XXX why not a pointer? */
+} MVDependenciesData;
+
+typedef MVDependenciesData* MVDependencies;
+
+#define MVSTAT_DEPS_MAGIC 0xB4549A2C /* marks serialized bytea */
+#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
+
+/*
+ * TODO Maybe fetching the histogram/MCV list separately is inefficient?
+ * Consider adding a single `fetch_stats` method, fetching all
+ * stats specified using flags (or something like that).
+ */
+
+bytea * serialize_mv_dependencies(MVDependencies dependencies);
+
+/* serialization/deserialization of stats to/from bytea */
+MVDependencies deserialize_mv_dependencies(bytea * data);
+
+/* FIXME these probably belong somewhere else (not to stats operations) */
+extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats);
+
+void update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs);
+
+#endif
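
One note on the 'deps[1]' member above: it acts as a trailing
flexible array, so every allocation has to size the struct explicitly,
the way build_mv_dependencies and deserialize_mv_dependencies do. A
minimal sketch of that pattern (illustrative only):

    /* room for the header plus n dependency pointers */
    MVDependencies d = (MVDependencies) palloc0(
        offsetof(MVDependenciesData, deps) + n * sizeof(MVDependency));

    d->magic = MVSTAT_DEPS_MAGIC;
    d->ndeps = n;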
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 8a55a09..4d6edb6 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -79,6 +79,7 @@ typedef struct RelationData
bool rd_isvalid; /* relcache entry is valid */
char rd_indexvalid; /* state of rd_indexlist: 0 = not valid, 1 =
* valid, 2 = temporarily forced */
+ bool rd_mvstatvalid; /* is rd_mvstatlist valid? */
/*
* rd_createSubid is the ID of the highest subtransaction the rel has
@@ -111,6 +112,9 @@ typedef struct RelationData
List *rd_indexlist; /* list of OIDs of indexes on relation */
Oid rd_oidindex; /* OID of unique index on OID, if any */
Oid rd_replidindex; /* OID of replica identity index, if any */
+
+ /* data managed by RelationGetMVStatList: */
+ List *rd_mvstatlist; /* list of OIDs of multivariate stats */
/* data managed by RelationGetIndexAttrBitmap: */
Bitmapset *rd_indexattr; /* identifies columns used in indexes */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 6953281..77efeff 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -38,6 +38,7 @@ extern void RelationClose(Relation relation);
* Routines to compute/retrieve additional cached information
*/
extern List *RelationGetIndexList(Relation relation);
+extern List *RelationGetMVStatList(Relation relation);
extern Oid RelationGetOidIndex(Relation relation);
extern Oid RelationGetReplicaIndex(Relation relation);
extern List *RelationGetIndexExpressions(Relation relation);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 18404e2..bff702e 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -66,6 +66,8 @@ enum SysCacheIdentifier
INDEXRELID,
LANGNAME,
LANGOID,
+ MVSTATNAME,
+ MVSTATOID,
NAMESPACENAME,
NAMESPACEOID,
OPERNAMENSP,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 80374e4..428b1e8 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1365,6 +1365,14 @@ pg_matviews| SELECT n.nspname AS schemaname,
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
WHERE (c.relkind = 'm'::"char");
+pg_mv_stats| SELECT n.nspname AS schemaname,
+ c.relname AS tablename,
+ s.stakeys AS attnums,
+ length(s.stadeps) AS depsbytes,
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ FROM ((pg_mv_statistic s
+ JOIN pg_class c ON ((c.oid = s.starelid)))
+ LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
pg_policies| SELECT n.nspname AS schemaname,
c.relname AS tablename,
pol.polname AS policyname,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index eb0bc88..92a0d8a 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -113,6 +113,7 @@ pg_inherits|t
pg_language|t
pg_largeobject|t
pg_largeobject_metadata|t
+pg_mv_statistic|t
pg_namespace|t
pg_opclass|t
pg_operator|t
--
2.1.0
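
For a quick check after ANALYZE, the pg_mv_stats system view (visible
in the rules.out diff above) can be queried directly. An output sketch
(the values are invented):

SELECT tablename, attnums, depsinfo FROM pg_mv_stats;

 tablename | attnums | depsinfo
-----------+---------+----------------
 test      | 1 2 3   | dependencies=6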
Attachment: 0003-clause-reduction-using-functional-dependencies.patch (text/x-diff)
>From 039046f31843f2747a4fef4ed49b830b492ee459 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 19:42:18 +0200
Subject: [PATCH 3/7] clause reduction using functional dependencies
During planning, use functional dependencies to decide which
clauses to skip during cardinality estimation. Initial and
rather simplistic implementation.
This only works with regular WHERE clauses, not with join
clauses.
Note: The clause_is_mv_compatible() needs to identify the
relation (so that we can fetch the list of multivariate stats
by OID). planner_rt_fetch() seems like the appropriate way to
get the relation OID, but apparently it only works with simple
vars. Maybe examine_variable() would make this work with more
complex vars too?
Includes regression tests analyzing functional dependencies
(part of ANALYZE) on several datasets (no dependencies, no
transitive dependencies, ...).
Checks that a query with conditions on two columns, where one (B)
is functionally dependent on the other one (A), correctly ignores
the clause on (B) and chooses a bitmap index scan instead of a plain
index scan (which is what it chooses otherwise, thanks to the
independence assumption).
Note: Functional dependencies only work with equality clauses,
not with inequalities etc.
---
src/backend/optimizer/path/clausesel.c | 912 +++++++++++++++++++++++++-
src/backend/utils/mvstats/common.c | 5 +-
src/backend/utils/mvstats/dependencies.c | 24 +
src/include/utils/mvstats.h | 16 +-
src/test/regress/expected/mv_dependencies.out | 172 +++++
src/test/regress/parallel_schedule | 3 +
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_dependencies.sql | 150 +++++
8 files changed, 1278 insertions(+), 5 deletions(-)
create mode 100644 src/test/regress/expected/mv_dependencies.out
create mode 100644 src/test/regress/sql/mv_dependencies.sql
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 6ce2726..c7f17e3 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -14,14 +14,19 @@
*/
#include "postgres.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/plancat.h"
+#include "optimizer/var.h"
#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
+#include "utils/mvstats.h"
#include "utils/selfuncs.h"
+#include "utils/typcache.h"
/*
@@ -41,6 +46,44 @@ typedef struct RangeQueryClause
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
+#define MV_CLAUSE_TYPE_FDEP 0x01
+
+static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
+ Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo);
+
+static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
+ Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo);
+
+static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Oid varRelid, List *stats,
+ SpecialJoinInfo *sjinfo);
+
+static bool has_stats(List *stats, int type);
+
+static List * find_stats(PlannerInfo *root, List *clauses,
+ Oid varRelid, Index *relid);
+
+static Bitmapset* fdeps_collect_attnums(List *stats);
+
+static int *make_idx_to_attnum_mapping(Bitmapset *attnums);
+static int *make_attnum_to_idx_mapping(Bitmapset *attnums);
+
+static bool *build_adjacency_matrix(List *stats, Bitmapset *attnums,
+ int *idx_to_attnum, int *attnum_to_idx);
+
+static void multiply_adjacency_matrix(bool *matrix, int natts);
+
+static List* fdeps_reduce_clauses(List *clauses,
+ Bitmapset *attnums, bool *matrix,
+ int *idx_to_attnum, int *attnum_to_idx,
+ Index relid);
+
+static Bitmapset *fdeps_filter_clauses(PlannerInfo *root,
+ List *clauses, Bitmapset *deps_attnums,
+ List **reduced_clauses, List **deps_clauses,
+ Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo);
+
+static Bitmapset * get_varattnos(Node * node, Index relid);
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
@@ -60,7 +103,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
* subclauses. However, that's only right if the subclauses have independent
* probabilities, and in reality they are often NOT independent. So,
* we want to be smarter where we can.
-
+ *
* Currently, the only extra smarts we have is to recognize "range queries",
* such as "x > 34 AND x < 42". Clauses are recognized as possible range
* query components if they are restriction opclauses whose operators have
@@ -87,6 +130,88 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
*
* Of course this is all very dependent on the behavior of
* scalarltsel/scalargtsel; perhaps some day we can generalize the approach.
+ *
+ *
+ * Multivariate statistics
+ * -----------------------
+ * This also uses multivariate stats to estimate combinations of
+ * conditions, in a way (a) maximizing the estimate accuracy by using
+ * as many stats as possible, and (b) minimizing the overhead,
+ * especially when there are no suitable multivariate stats (so if you
+ * are not using multivariate stats, there's no additional overhead).
+ *
+ * The following checks are performed (in this order), and the optimizer
+ * falls back to regular stats on the first 'false'.
+ *
+ * NOTE: This explains how this works with all the patches applied, not
+ * just the functional dependencies.
+ *
+ * (0) check if there are multivariate stats on the relation
+ *
+ * If no, just skip all the following steps (directly to the
+ * original code).
+ *
+ * (1) check how many attributes there are in conditions compatible
+ * with functional dependencies
+ *
+ * Only simple equality clauses are considered compatible with
+ * functional dependencies (and that's unlikely to change, because
+ * that's the only case when functional dependencies are useful).
+ *
+ * If there are no conditions that might be handled by multivariate
+ * stats, or if the conditions reference just a single column, it
+ * makes no sense to use functional dependencies, so skip to (4).
+ *
+ * (2) reduce the clauses using functional dependencies
+ *
+ * This simply attempts to 'reduce' the clauses by applying functional
+ * dependencies. For example if there are two clauses:
+ *
+ * WHERE (a = 1) AND (b = 2)
+ *
+ * and we know that 'a' determines the value of 'b', we may remove
+ * the second condition (b = 2) when computing the selectivity.
+ * This is of course tricky - see mvstats/dependencies.c for details.
+ *
+ * After the reduction, step (1) is to be repeated.
+ *
+ * (3) check which conditions are compatible with MCV lists and
+ * histograms
+ *
+ * What conditions are compatible with multivariate stats is decided
+ * by clause_is_mv_compatible(). At this moment, only conditions
+ * of the form "column operator constant" (for simple comparison
+ * operators), IS [NOT] NULL and some AND/OR clauses are considered
+ * compatible with multivariate statistics.
+ *
+ * Again, see clause_is_mv_compatible() for details.
+ *
+ * (4) check how many attributes there are in conditions compatible
+ * with MCV lists and histograms
+ *
+ * If there are no conditions that might be handled by MCV lists
+ * or histograms, or if the conditions reference just a single
+ * column, it makes no sense to continue, so just skip to (7).
+ *
+ * (5) choose the stats matching the most columns
+ *
+ * If there are multiple instances of multivariate statistics (e.g.
+ * built on different sets of columns), we choose the stats covering
+ * the most columns from step (1). It may happen that all available
+ * stats match just a single column - for example with conditions
+ *
+ * WHERE a = 1 AND b = 2
+ *
+ * and statistics built on (a,c) and (b,c). In such a case just fall
+ * back to the regular stats because it makes no sense to use the
+ * multivariate statistics.
+ *
+ * For more details about how exactly we choose the stats, see
+ * choose_mv_statistics().
+ *
+ * (6) use the multivariate stats to estimate matching clauses
+ *
+ * (7) estimate the remaining clauses using the regular statistics
*/
Selectivity
clauselist_selectivity(PlannerInfo *root,
@@ -99,6 +224,16 @@ clauselist_selectivity(PlannerInfo *root,
RangeQueryClause *rqlist = NULL;
ListCell *l;
+ /* processing mv stats */
+ Oid relid = InvalidOid;
+
+ /* attributes in mv-compatible clauses */
+ Bitmapset *mvattnums = NULL;
+ List *stats = NIL;
+
+ /* use clauses (not conditions), because those are always non-empty */
+ stats = find_stats(root, clauses, varRelid, &relid);
+
/*
* If there's exactly one clause, then no use in trying to match up pairs,
* so just go directly to clause_selectivity().
@@ -108,6 +243,31 @@ clauselist_selectivity(PlannerInfo *root,
varRelid, jointype, sjinfo);
/*
+ * Check that there are some stats with functional dependencies
+ * built (by walking the stats list). We're going to find that
+ * anyway when trying to apply the functional dependencies, but
+ * this is probably a tad faster.
+ */
+ if (has_stats(stats, MV_CLAUSE_TYPE_FDEP))
+ {
+ /* collect attributes referenced by mv-compatible clauses */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo);
+
+ /*
+ * If there are mv-compatible clauses, referencing at least two
+ * different columns (otherwise it makes no sense to use mv stats),
+ * try to reduce the clauses using functional dependencies, and
+ * recollect the attributes from the reduced list.
+ *
+ * We don't need to select a single statistics for this - we can
+ * apply all the functional dependencies we have.
+ */
+ if (bms_num_members(mvattnums) >= 2)
+ clauses = clauselist_apply_dependencies(root, clauses, varRelid,
+ stats, sjinfo);
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -763,3 +923,753 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ */
+static Bitmapset *
+collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
+ Index *relid, SpecialJoinInfo *sjinfo)
+{
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(l);
+
+ /* ignore the result for now - we only need the info */
+ if (clause_is_mv_compatible(root, clause, varRelid, relid, &attnum, sjinfo))
+ attnums = bms_add_member(attnums, attnum);
+ }
+
+ /*
+ * If there are not at least two attributes referenced by the clause(s),
+ * we can throw everything out (as we'll revert to simple stats).
+ */
+ if (bms_num_members(attnums) <= 1)
+ {
+ if (attnums != NULL)
+ pfree(attnums);
+ attnums = NULL;
+ *relid = InvalidOid;
+ }
+
+ return attnums;
+}
+
+/*
+ * Determines whether the clause is compatible with multivariate stats,
+ * and if it is, returns some additional information - varno (index
+ * into simple_rte_array) and the attnum of the variable. This is then
+ * used to fetch related multivariate statistics.
+ *
+ * At this moment we only support basic conditions of the form
+ *
+ * variable OP constant
+ *
+ * where OP is one of [=,<,<=,>=,>] (determined by looking at the
+ * function associated with the operator for selectivity estimation,
+ * just like in the single-dimensional case).
+ *
+ * TODO Support 'OR clauses' - shouldn't be all that difficult to
+ * evaluate them using multivariate stats.
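+ *
+ * For illustration (examples made up, matching the checks below):
+ *
+ *     a = 1        compatible (Var op Const, estimated by eqsel)
+ *     1 = a        compatible (Const op Var, handled via varonleft)
+ *     a = b        not compatible (two Vars, treated as a join clause)
+ *     (a + 1) = 2  not compatible (left side is not a simple Var)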
+ */
+static bool
+clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
+ Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo)
+{
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ /* Pseudoconstants are not really interesting here. */
+ if (rinfo->pseudoconstant)
+ return false;
+
+ /* no support for OR clauses at this point */
+ if (rinfo->orclause)
+ return false;
+
+ /* get the actual clause from the RestrictInfo (it's not an OR clause) */
+ clause = (Node*)rinfo->clause;
+
+ /* only simple opclauses are compatible with multivariate stats */
+ if (! is_opclause(clause))
+ return false;
+
+ /* we don't support join conditions at this moment */
+ if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo))
+ return false;
+
+ /* is it 'variable op constant' ? */
+ if (list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+
+ ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
+ (is_pseudo_constant_clause_relids(lsecond(expr->args),
+ rinfo->right_relids) ||
+ (varonleft = false,
+ is_pseudo_constant_clause_relids(linitial(expr->args),
+ rinfo->left_relids)));
+
+ if (ok)
+ {
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+ * (return NULL).
+ *
+ * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
+
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not be really necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
+
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
+
+ *relid = var->varno;
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ * This looks at the function used for estimating selectivity, not
+ * at the operator directly (a bit awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_EQSEL:
+ *attnum = var->varattno;
+ return true;
+ }
+ }
+ }
+ }
+
+ return false;
+
+}
+
+/*
+ * Performs reduction of clauses using functional dependencies, i.e.
+ * removes clauses that are considered redundant. It simply walks
+ * through dependencies, and checks whether the dependency 'matches'
+ * the clauses, i.e. if there's a clause matching the condition. If yes,
+ * all clauses matching the implied part of the dependency are removed
+ * from the list.
+ *
+ * This simply looks at attnums referenced by the clauses, not at the
+ * type of the operator (equality, inequality, ...). This may not be the
+ * right way to do it - it certainly works best for equalities, which is
+ * naturally consistent with functional dependencies (implications).
+ * It's not clear that other operators are handled sensibly - for
+ * example for inequalities, like
+ *
+ * WHERE (A >= 10) AND (B <= 20)
+ *
+ * and a trivial case where [A == B], resulting in a symmetric pair of
+ * rules [A => B] and [B => A], it's rather clear we can't remove either of
+ * those clauses.
+ *
+ * That only highlights that functional dependencies are most suitable
+ * for label-like data, where using non-equality operators is very rare.
+ * Using the common city/zipcode example, clauses like
+ *
+ * (zipcode <= 12345)
+ *
+ * or
+ *
+ * (cityname >= 'Washington')
+ *
+ * are rare. So restricting the reduction to equality should not harm
+ * the usefulness / applicability.
+ *
+ * Another caveat is that this assumes 'compatible' clauses. For
+ * example with a mismatched zip code and city name, this is unable
+ * to identify the discrepancy and still eliminates one of the clauses. The
+ * usual approach (multiplying both selectivities) thus produces a more
+ * accurate estimate, although mostly by luck - the multiplication
+ * comes from the assumption of statistical independence of the two
+ * conditions (which is not valid in this case), but moves the
+ * estimate in the right direction (towards 0%).
+ *
+ * This might be somewhat improved by cross-checking the selectivities
+ * against MCV and/or histogram.
+ *
+ * The implementation needs to be careful about cyclic rules, i.e. rules
+ * like [A => B] and [B => A] at the same time. This must not reduce
+ * clauses on both attributes at the same time.
+ *
+ * Technically we might consider selectivities here too, somehow. E.g.
+ * when (A => B) and (B => A), we might use the clause with the
+ * minimum selectivity.
+ *
+ * TODO Consider restricting the reduction to equality clauses. Or maybe
+ * use equality classes somehow?
+ *
+ * TODO Merge this docs to dependencies.c, as it's saying mostly the
+ * same things as the comments there.
+ *
+ * TODO Currently this is applied only to the top-level clauses, but
+ * maybe we could apply it to lists at subtrees too, e.g. to the
+ * two AND-clauses in
+ *
+ * (x=1 AND y=2) OR (z=3 AND q=10)
+ *
+ */
+static List *
+clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Oid varRelid, List *stats,
+ SpecialJoinInfo *sjinfo)
+{
+ List *reduced_clauses = NIL;
+ Index relid;
+
+ /*
+ * matrix of (natts x natts), 1 means x=>y
+ *
+ * This serves two purposes - first, it merges dependencies from all
+ * the statistics, second it makes generating all the transitive
+ * dependencies easier.
+ *
+ * We need to build this only for attributes from the dependencies,
+ * not for all attributes in the table.
+ *
+ * We can't do that only for attributes from the clauses, because we
+ * want to build transitive dependencies (including those going
+ * through attributes not listed in the stats).
+ *
+ * This only works for A=>B dependencies, not sure how to do that
+ * for complex dependencies.
+ */
+ bool *deps_matrix;
+ int deps_natts; /* size of the matrix */
+
+ /* mapping attnum <=> matrix index */
+ int *deps_idx_to_attnum;
+ int *deps_attnum_to_idx;
+
+ /* attnums in dependencies and clauses (and intersection) */
+ List *deps_clauses = NIL;
+ Bitmapset *deps_attnums = NULL;
+ Bitmapset *clause_attnums = NULL;
+ Bitmapset *intersect_attnums = NULL;
+
+ /*
+ * Is there at least one statistics with functional dependencies?
+ * If not, return the original clauses right away.
+ *
+ * XXX Isn't this pointless, thanks to exactly the same check in
+ * clauselist_selectivity()? Can we trigger the condition here?
+ */
+ if (! has_stats(stats, MV_CLAUSE_TYPE_FDEP))
+ return clauses;
+
+ /*
+ * Build the dependency matrix, i.e. attribute adjacency matrix,
+ * where 1 means (a=>b). Once we have the adjacency matrix, we'll
+ * multiply it by itself, to get transitive dependencies.
+ *
+ * Note: This is pretty much transitive closure from graph theory.
+ *
+ * First, let's see what attributes are covered by functional
+ * dependencies (sides of the adjacency matrix), and also the maximum
+ * attribute (size of the mapping to simple integer indexes).
+ */
+ deps_attnums = fdeps_collect_attnums(stats);
+
+ /*
+ * Walk through the clauses - clauses that are (one of)
+ *
+ * (a) not mv-compatible
+ * (b) using more than a single attnum
+ * (c) using an attnum not covered by functional dependencies
+ *
+ * may be copied directly to the result. The interesting clauses are
+ * kept in 'deps_clauses' and will be processed later.
+ */
+ clause_attnums = fdeps_filter_clauses(root, clauses, deps_attnums,
+ &reduced_clauses, &deps_clauses,
+ varRelid, &relid, sjinfo);
+
+ /*
+ * we need at least two clauses referencing two different attributes
+ * to do the reduction
+ */
+ if ((list_length(deps_clauses) < 2) || (bms_num_members(clause_attnums) < 2))
+ {
+ bms_free(clause_attnums);
+ list_free(reduced_clauses);
+ list_free(deps_clauses);
+
+ return clauses;
+ }
+
+
+ /*
+ * We need at least two matching attributes in the clauses and
+ * dependencies, otherwise we can't really reduce anything.
+ */
+ intersect_attnums = bms_intersect(clause_attnums, deps_attnums);
+ if (bms_num_members(intersect_attnums) < 2)
+ {
+ bms_free(clause_attnums);
+ bms_free(deps_attnums);
+ bms_free(intersect_attnums);
+
+ list_free(deps_clauses);
+ list_free(reduced_clauses);
+
+ return clauses;
+ }
+
+ /*
+ * Build mapping between matrix indexes and attnums, and then the
+ * adjacency matrix itself.
+ */
+ deps_idx_to_attnum = make_idx_to_attnum_mapping(deps_attnums);
+ deps_attnum_to_idx = make_attnum_to_idx_mapping(deps_attnums);
+
+ /* build the adjacency matrix */
+ deps_matrix = build_adjacency_matrix(stats, deps_attnums,
+ deps_idx_to_attnum,
+ deps_attnum_to_idx);
+
+ deps_natts = bms_num_members(deps_attnums);
+
+ /*
+ * Multiply the matrix N-times (N = size of the matrix), so that we
+ * get all the transitive dependencies. That makes the next step
+ * much easier and faster.
+ *
+ * This is essentially an adjacency matrix from graph theory, and
+ * by multiplying it we get transitive edges. We don't really care
+ * about the exact number (number of paths between vertices) though,
+ * so we can do the multiplication in-place (we don't care whether
+ * we found the dependency in this round or in the previous one).
+ *
+ * Track how many new dependencies were added, and stop when 0, but
+ * we can't multiply more than N-times (longest path in the graph).
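+ *
+ * A small illustration (not from the patch): with dependencies
+ * (a => b) and (b => c), the initial matrix has [a,b] and [b,c]
+ * set. The first pass sets [a,c] (the path through 'b'), the next
+ * pass adds nothing, so the loop terminates early.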
+ */
+ multiply_adjacency_matrix(deps_matrix, deps_natts);
+
+ /*
+ * Walk through the clauses, and see which other clauses we may
+ * reduce. The matrix contains all transitive dependencies, which
+ * makes this very fast.
+ *
+ * We have to be careful not to reduce the clause using itself, or
+ * reducing all clauses forming a cycle (so we have to skip already
+ * eliminated clauses).
+ *
+ * I'm not sure whether this guarantees finding the best solution,
+ * i.e. reducing the most clauses, but it probably does (thanks to
+ * having all the transitive dependencies).
+ */
+ deps_clauses = fdeps_reduce_clauses(deps_clauses,
+ deps_attnums, deps_matrix,
+ deps_idx_to_attnum,
+ deps_attnum_to_idx, relid);
+
+ /* join the two lists of clauses */
+ reduced_clauses = list_union(reduced_clauses, deps_clauses);
+
+ pfree(deps_matrix);
+ pfree(deps_idx_to_attnum);
+ pfree(deps_attnum_to_idx);
+
+ bms_free(deps_attnums);
+ bms_free(clause_attnums);
+ bms_free(intersect_attnums);
+
+ return reduced_clauses;
+}
+
+static bool
+has_stats(List *stats, int type)
+{
+ ListCell *s;
+
+ foreach (s, stats)
+ {
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Determine the relid (either from varRelid or from the clauses) and
+ * then look up stats using the relid.
+ */
+static List *
+find_stats(PlannerInfo *root, List *clauses, Oid varRelid, Index *relid)
+{
+ /* unknown relid by default */
+ *relid = InvalidOid;
+
+ /*
+ * First we need to find the relid (index into simple_rel_array).
+ * If varRelid is not 0, we already have it, otherwise we have to
+ * look it up from the clauses.
+ */
+ if (varRelid != 0)
+ *relid = varRelid;
+ else
+ {
+ Relids relids = pull_varnos((Node*)clauses);
+
+ /*
+ * We only expect 0 or 1 members in the bitmapset. If there are
+ * no vars, we'll get an empty bitmapset, otherwise we'll get the
+ * relid as the single member.
+ *
+ * FIXME For some reason we can get 2 relids here (e.g. \d in
+ * psql does that).
+ */
+ if (bms_num_members(relids) == 1)
+ *relid = bms_singleton_member(relids);
+
+ bms_free(relids);
+ }
+
+ /*
+ * if we found the relid, we can get the stats from simple_rel_array
+ *
+ * This only gets stats that are already built, because that's how
+ * we load it into RelOptInfo (see get_relation_info), but we don't
+ * detoast the whole stats yet. That'll be done later, after we
+ * decide which stats to use.
+ */
+ if (*relid != InvalidOid)
+ return root->simple_rel_array[*relid]->mvstatlist;
+
+ return NIL;
+}
+
+static Bitmapset*
+fdeps_collect_attnums(List *stats)
+{
+ ListCell *lc;
+ Bitmapset *attnums = NULL;
+
+ foreach (lc, stats)
+ {
+ int j;
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ int2vector *stakeys = info->stakeys;
+
+ /* skip stats without functional dependencies built */
+ if (! info->deps_built)
+ continue;
+
+ for (j = 0; j < stakeys->dim1; j++)
+ attnums = bms_add_member(attnums, stakeys->values[j]);
+ }
+
+ return attnums;
+}
+
+
+static int*
+make_idx_to_attnum_mapping(Bitmapset *attnums)
+{
+ int attidx = 0;
+ int attnum = -1;
+
+ int *mapping = (int*)palloc0(bms_num_members(attnums) * sizeof(int));
+
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ mapping[attidx++] = attnum;
+
+ Assert(attidx == bms_num_members(attnums));
+
+ return mapping;
+}
+
+static int*
+make_attnum_to_idx_mapping(Bitmapset *attnums)
+{
+ int attidx = 0;
+ int attnum = -1;
+ int maxattnum = -1;
+ int *mapping;
+
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ maxattnum = attnum;
+
+ mapping = (int*)palloc0((maxattnum+1) * sizeof(int));
+
+ attnum = -1;
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ mapping[attnum] = attidx++;
+
+ Assert(attidx == bms_num_members(attnums));
+
+ return mapping;
+}
+
+static bool*
+build_adjacency_matrix(List *stats, Bitmapset *attnums,
+ int *idx_to_attnum, int *attnum_to_idx)
+{
+ ListCell *lc;
+ int natts = bms_num_members(attnums);
+ bool *matrix = (bool*)palloc0(natts * natts * sizeof(bool));
+
+ foreach (lc, stats)
+ {
+ int j;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
+ MVDependencies dependencies = NULL;
+
+ /* skip stats without functional dependencies built */
+ if (! stat->deps_built)
+ continue;
+
+ /* fetch and deserialize dependencies */
+ dependencies = load_mv_dependencies(stat->mvoid);
+ if (dependencies == NULL)
+ {
+ elog(WARNING, "failed to deserialize func deps %d", stat->mvoid);
+ continue;
+ }
+
+ /* set matrix[a,b] to 'true' if 'a=>b' */
+ for (j = 0; j < dependencies->ndeps; j++)
+ {
+ int aidx = attnum_to_idx[dependencies->deps[j]->a];
+ int bidx = attnum_to_idx[dependencies->deps[j]->b];
+
+ /* a => b */
+ matrix[aidx * natts + bidx] = true;
+ }
+ }
+
+ return matrix;
+}
+
+static void
+multiply_adjacency_matrix(bool *matrix, int natts)
+{
+ int i;
+
+ for (i = 0; i < natts; i++)
+ {
+ int k, l, m;
+ int nchanges = 0;
+
+ /* k => l */
+ for (k = 0; k < natts; k++)
+ {
+ for (l = 0; l < natts; l++)
+ {
+ /* we already have this dependency */
+ if (matrix[k * natts + l])
+ continue;
+
+ /* we don't really care about the exact value, just 0/1 */
+ for (m = 0; m < natts; m++)
+ {
+ if (matrix[k * natts + m] * matrix[m * natts + l])
+ {
+ matrix[k * natts + l] = true;
+ nchanges += 1;
+ break;
+ }
+ }
+ }
+ }
+
+ /* no transitive dependency added here, so terminate */
+ if (nchanges == 0)
+ break;
+ }
+}
+
+static List*
+fdeps_reduce_clauses(List *clauses, Bitmapset *attnums, bool *matrix,
+ int *idx_to_attnum, int *attnum_to_idx, Index relid)
+{
+ int i;
+ ListCell *lc;
+ List *reduced_clauses = NIL;
+
+ int nmvclauses; /* size of the arrays */
+ bool *reduced;
+ AttrNumber *mvattnums;
+ Node **mvclauses;
+
+ int natts = bms_num_members(attnums);
+
+ /*
+ * Preallocate space for all clauses (the list only contains
+ * compatible clauses at this point). This makes it somewhat easier
+ * to access the stats / attnums randomly.
+ *
+ * XXX This assumes each clause references exactly one Var, so the
+ * arrays are sized accordingly - for functional dependencies
+ * this is safe, because it only works with Var=Const.
+ */
+ mvclauses = (Node**)palloc0(list_length(clauses) * sizeof(Node*));
+ mvattnums = (AttrNumber*)palloc0(list_length(clauses) * sizeof(AttrNumber));
+ reduced = (bool*)palloc0(list_length(clauses) * sizeof(bool));
+
+ /* fill the arrays */
+ nmvclauses = 0;
+ foreach (lc, clauses)
+ {
+ Node * clause = (Node*)lfirst(lc);
+ Bitmapset * clause_attnums = get_varattnos(clause, relid);
+
+ mvclauses[nmvclauses] = clause;
+ mvattnums[nmvclauses] = bms_singleton_member(clause_attnums);
+ nmvclauses++;
+ }
+
+ Assert(nmvclauses == list_length(clauses));
+
+ /* now try to reduce the clauses (using the dependencies) */
+ for (i = 0; i < nmvclauses; i++)
+ {
+ int j;
+
+ /* not covered by dependencies */
+ if (! bms_is_member(mvattnums[i], attnums))
+ continue;
+
+ /* this clause was already reduced, so let's skip it */
+ if (reduced[i])
+ continue;
+
+ /* walk the potentially 'implied' clauses */
+ for (j = 0; j < nmvclauses; j++)
+ {
+ int aidx, bidx;
+
+ /* not covered by dependencies */
+ if (! bms_is_member(mvattnums[j], attnums))
+ continue;
+
+ aidx = attnum_to_idx[mvattnums[i]];
+ bidx = attnum_to_idx[mvattnums[j]];
+
+ /* can't reduce the clause by itself, or if already reduced */
+ if ((i == j) || reduced[j])
+ continue;
+
+ /* mark the clause as reduced (if aidx => bidx) */
+ reduced[j] = matrix[aidx * natts + bidx];
+ }
+ }
+
+ /* now walk through the clauses, and keep only those not reduced */
+ for (i = 0; i < nmvclauses; i++)
+ if (! reduced[i])
+ reduced_clauses = lappend(reduced_clauses, mvclauses[i]);
+
+ pfree(reduced);
+ pfree(mvclauses);
+ pfree(mvattnums);
+
+ return reduced_clauses;
+}
+
+
+static Bitmapset *
+fdeps_filter_clauses(PlannerInfo *root,
+ List *clauses, Bitmapset *deps_attnums,
+ List **reduced_clauses, List **deps_clauses,
+ Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo)
+{
+ ListCell *lc;
+ Bitmapset *clause_attnums = NULL;
+
+ foreach (lc, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(lc);
+
+ if (! clause_is_mv_compatible(root, clause, varRelid, relid,
+ &attnum, sjinfo))
+
+ /* clause incompatible with functional dependencies */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else if (! bms_is_member(attnum, deps_attnums))
+
+ /* clause not covered by the dependencies */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else
+ {
+ *deps_clauses = lappend(*deps_clauses, clause);
+ clause_attnums = bms_add_member(clause_attnums, attnum);
+ }
+ }
+
+ return clause_attnums;
+}
+
+/*
+ * Pull varattnos from the clauses, similarly to pull_varattnos() but:
+ *
+ * (a) only get attributes for a particular relation (relid)
+ * (b) ignore system attributes (we can't build stats on them anyway)
+ *
+ * This makes it possible to directly compare the result with attnum
+ * values from pg_attribute etc.
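+ *
+ * For example, pull_varattnos() stores varattno 'k' as the bitmap
+ * member (k - FirstLowInvalidHeapAttributeNumber); adding the offset
+ * back below recovers the plain attnum.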
+ */
+static Bitmapset *
+get_varattnos(Node * node, Index relid)
+{
+ int k;
+ Bitmapset *varattnos = NULL;
+ Bitmapset *result = NULL;
+
+ /* get the varattnos */
+ pull_varattnos(node, relid, &varattnos);
+
+ k = -1;
+ while ((k = bms_next_member(varattnos, k)) >= 0)
+ {
+ if (k + FirstLowInvalidHeapAttributeNumber > 0)
+ result = bms_add_member(result,
+ k + FirstLowInvalidHeapAttributeNumber);
+ }
+
+ bms_free(varattnos);
+
+ return result;
+}
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index a755c49..bd200bc 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -84,7 +84,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
/*
* Analyze functional dependencies of columns.
*/
- deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (stat->deps_enabled)
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
/* store the histogram / MCV list in the catalog */
update_mv_stats(stat->mvoid, deps, attrs);
@@ -163,6 +164,7 @@ list_mv_stats(Oid relid)
info->mvoid = HeapTupleGetOid(htup);
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
result = lappend(result, info);
@@ -274,6 +276,7 @@ compare_scalars_partition(const void *a, const void *b, void *arg)
return ApplySortComparator(da, false, db, false, ssup);
}
+
/* initialize multi-dimensional sort */
MultiSortSupport
multi_sort_init(int ndims)
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
index 84b6561..0a08d12 100644
--- a/src/backend/utils/mvstats/dependencies.c
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -636,3 +636,27 @@ pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(result));
}
+
+MVDependencies
+load_mv_dependencies(Oid mvoid)
+{
+ bool isnull = false;
+ Datum deps;
+
+ /* Fetch the pg_mv_statistic tuple for the statistics with this OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->deps_enabled && mvstat->deps_built);
+#endif
+
+ deps = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stadeps, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_dependencies(DatumGetByteaP(deps));
+}
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 411cd16..02a7dda 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -16,12 +16,20 @@
#include "commands/vacuum.h"
+/*
+ * Degree to which an MCV item / histogram bucket matches a clause.
+ * This is then considered when computing the selectivity.
+ */
+#define MVSTATS_MATCH_NONE 0 /* no match at all */
+#define MVSTATS_MATCH_PARTIAL 1 /* partial match */
+#define MVSTATS_MATCH_FULL 2 /* full match */
#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
-/* An associative rule, tracking [a => b] dependency.
- *
- * TODO Make this work with multiple columns on both sides.
+
+/*
+ * Functional dependencies, tracking column-level relationships (values
+ * in one column determine values in another one).
*/
typedef struct MVDependencyData {
int16 a;
@@ -47,6 +55,8 @@ typedef MVDependenciesData* MVDependencies;
* stats specified using flags (or something like that).
*/
+MVDependencies load_mv_dependencies(Oid mvoid);
+
bytea * serialize_mv_dependencies(MVDependencies dependencies);
/* deserialization of stats (serialization is private to analyze) */
diff --git a/src/test/regress/expected/mv_dependencies.out b/src/test/regress/expected/mv_dependencies.out
new file mode 100644
index 0000000..e759997
--- /dev/null
+++ b/src/test/regress/expected/mv_dependencies.out
@@ -0,0 +1,172 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s1 ON functional_dependencies (unknown_column) WITH (dependencies);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s1 ON functional_dependencies (a) WITH (dependencies);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a,a) WITH (dependencies);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a, a, b) WITH (dependencies);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- correct command
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (dependencies);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+ QUERY PLAN
+---------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE functional_dependencies;
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s2 ON functional_dependencies (a, b, c) WITH (dependencies);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+DROP TABLE functional_dependencies;
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s3 ON functional_dependencies (a, b, c, d) WITH (dependencies);
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+----------------------------------------
+ t | t | 2 => 1, 3 => 1, 3 => 2, 4 => 1, 4 => 2
+(1 row)
+
+DROP TABLE functional_dependencies;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index b1bc7c7..81484f1 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -110,3 +110,6 @@ test: event_trigger
# run stats by itself because its delay may be insufficient under heavy load
test: stats
+
+# run tests of multivariate stats
+test: mv_dependencies
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index ade9ef1..14ea574 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -161,3 +161,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: mv_dependencies
diff --git a/src/test/regress/sql/mv_dependencies.sql b/src/test/regress/sql/mv_dependencies.sql
new file mode 100644
index 0000000..48dea4d
--- /dev/null
+++ b/src/test/regress/sql/mv_dependencies.sql
@@ -0,0 +1,150 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s1 ON functional_dependencies (unknown_column) WITH (dependencies);
+
+-- single column
+CREATE STATISTICS s1 ON functional_dependencies (a) WITH (dependencies);
+
+-- single column, duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a,a) WITH (dependencies);
+
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a, a, b) WITH (dependencies);
+
+-- unknown option
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (unknown_option);
+
+-- correct command
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (dependencies);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+
+DROP TABLE functional_dependencies;
+
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s2 ON functional_dependencies (a, b, c) WITH (dependencies);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+
+DROP TABLE functional_dependencies;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s3 ON functional_dependencies (a, b, c, d) WITH (dependencies);
+
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+DROP TABLE functional_dependencies;
--
2.1.0
Attachment: 0004-multivariate-MCV-lists.patch (text/x-diff)
From 1ce724f8813c5e680be3b845a6a8d2d3cf8f3560 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 16:52:15 +0200
Subject: [PATCH 4/7] multivariate MCV lists
- extends the pg_mv_statistic catalog (add 'mcv' fields)
- building the MCV lists during ANALYZE
- simple estimation while planning the queries
Includes regression tests, mostly equal to regression tests for
functional dependencies.
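
A minimal usage sketch (hypothetical table, statistics name and item
count; the exact limits depend on MVSTAT_MCVLIST_MIN_ITEMS and
MVSTAT_MCVLIST_MAX_ITEMS, see statscmds.c below):

    CREATE TABLE t (a INT, b INT);
    CREATE STATISTICS s4 ON t (a, b) WITH (mcv, max_mcv_items = 1024);
    ANALYZE t;

    -- inspect the built MCV list via the extended pg_mv_stats view
    SELECT staname, mcvbytes, mcvinfo FROM pg_mv_stats;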
---
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/statscmds.c | 45 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 1079 ++++++++++++++++++++++++++--
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/common.c | 104 ++-
src/backend/utils/mvstats/common.h | 11 +-
src/backend/utils/mvstats/mcv.c | 1237 ++++++++++++++++++++++++++++++++
src/bin/psql/describe.c | 25 +-
src/include/catalog/pg_mv_statistic.h | 19 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 69 +-
src/test/regress/expected/mv_mcv.out | 207 ++++++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_mcv.sql | 178 +++++
19 files changed, 2898 insertions(+), 101 deletions(-)
create mode 100644 src/backend/utils/mvstats/mcv.c
create mode 100644 src/test/regress/expected/mv_mcv.out
create mode 100644 src/test/regress/sql/mv_mcv.sql
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e3f3387..6482aa7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -165,7 +165,9 @@ CREATE VIEW pg_mv_stats AS
S.staname AS staname,
S.stakeys AS attnums,
length(S.stadeps) as depsbytes,
- pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
+ length(S.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 3790082..f730253 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -134,7 +134,13 @@ CreateStatistics(CreateStatsStmt *stmt)
ObjectAddress parentobject, childobject;
/* by default build nothing */
- bool build_dependencies = false;
+ bool build_dependencies = false,
+ build_mcv = false;
+
+ int32 max_mcv_items = -1;
+
+ /* options required because of other options */
+ bool require_mcv = false;
Assert(IsA(stmt, CreateStatsStmt));
@@ -191,6 +197,29 @@ CreateStatistics(CreateStatsStmt *stmt)
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "mcv") == 0)
+ build_mcv = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_mcv_items") == 0)
+ {
+ max_mcv_items = defGetInt32(opt);
+
+ /* this option requires 'mcv' to be enabled */
+ require_mcv = true;
+
+ /* sanity check */
+ if (max_mcv_items < MVSTAT_MCVLIST_MIN_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items must be at least %d",
+ MVSTAT_MCVLIST_MIN_ITEMS)));
+
+ else if (max_mcv_items > MVSTAT_MCVLIST_MAX_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items is %d",
+ MVSTAT_MCVLIST_MAX_ITEMS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -199,10 +228,16 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! build_dependencies)
+ if (! (build_dependencies || build_mcv))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies, mcv) was requested")));
+
+ /* now do some checking of the options */
+ if (require_mcv && (! build_mcv))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies) was requested")));
+ errmsg("option 'mcv' is required by other option(s)")));
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
@@ -223,8 +258,12 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+ values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+
+ values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+ nulls[Anum_pg_mv_statistic_stamcv -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index cae21d0..0f58199 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1948,9 +1948,11 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
+ WRITE_BOOL_FIELD(mcv_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
+ WRITE_BOOL_FIELD(mcv_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index c7f17e3..f122045 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -15,6 +15,7 @@
#include "postgres.h"
#include "access/sysattr.h"
+#include "catalog/pg_collation.h"
#include "catalog/pg_operator.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
@@ -47,17 +48,38 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
#define MV_CLAUSE_TYPE_FDEP 0x01
+#define MV_CLAUSE_TYPE_MCV 0x02
static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
- Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo);
+ Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
+ int type);
static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
- Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo);
+ Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo,
+ int type);
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Oid varRelid, List *stats,
SpecialJoinInfo *sjinfo);
+static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
+
+static List *clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
+ List *clauses, Oid varRelid,
+ List **mvclauses, MVStatisticInfo *mvstats, int types);
+
+static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
+static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats,
+ bool *fullmatch, Selectivity *lowsel);
+
+static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, List *clauses,
@@ -85,6 +107,13 @@ static Bitmapset *fdeps_filter_clauses(PlannerInfo *root,
static Bitmapset * get_varattnos(Node * node, Index relid);
+/* used for merging bitmaps - AND (min), OR (max) */
+#define MAX(x, y) (((x) > (y)) ? (x) : (y))
+#define MIN(x, y) (((x) < (y)) ? (x) : (y))
+
+#define UPDATE_RESULT(m,r,isor) \
+ (m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -250,8 +279,12 @@ clauselist_selectivity(PlannerInfo *root,
*/
if (has_stats(stats, MV_CLAUSE_TYPE_FDEP))
{
- /* collect attributes referenced by mv-compatible clauses */
- mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo);
+ /*
+ * Collect attributes referenced by mv-compatible clauses (looking
+ * for clauses compatible with functional dependencies for now).
+ */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
+ MV_CLAUSE_TYPE_FDEP);
/*
* If there are mv-compatible clauses, referencing at least two
@@ -268,6 +301,48 @@ clauselist_selectivity(PlannerInfo *root,
}
/*
+ * Check that there are statistics with an MCV list. If not, we don't
+ * need to waste time with the optimization.
+ */
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV))
+ {
+ /*
+ * Recollect attributes from mv-compatible clauses (maybe we've
+ * removed so many clauses we have a single mv-compatible attnum).
+ * From now on we're only interested in MCV-compatible clauses.
+ */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
+ MV_CLAUSE_TYPE_MCV);
+
+ /*
+ * If there still are at least two columns, we'll try to select
+ * suitable multivariate statistics.
+ */
+ if (bms_num_members(mvattnums) >= 2)
+ {
+ /* see choose_mv_statistics() for details */
+ MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+
+ if (mvstat != NULL) /* we have a matching stats */
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ /* split the clauselist into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, sjinfo, clauses,
+ varRelid, &mvclauses, mvstat,
+ MV_CLAUSE_TYPE_MCV);
+
+ /* we've chosen the statistics to match the clauses */
+ Assert(mvclauses != NIL);
+
+ /* compute the multivariate stats */
+ s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ }
+ }
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -924,12 +999,129 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Estimate selectivity for the list of MV-compatible clauses, using
+ * multivariate statistics (combining a histogram and MCV list).
+ *
+ * This simply passes the estimation to the MCV list and then to the
+ * histogram, if available.
+ *
+ * TODO Clamp the selectivity by min of the per-clause selectivities
+ * (i.e. the selectivity of the most restrictive clause), because
+ * that's the maximum we can ever get from an ANDed list of clauses.
+ * This should help prevent issues with hitting too many buckets
+ * and low-precision histograms.
+ *
+ * TODO We may support some additional conditions, most importantly
+ * those matching multiple columns (e.g. "a = b" or "a < b").
+ * Ultimately we could track multi-table histograms for join
+ * cardinality estimation.
+ *
+ * TODO Further thoughts on processing equality clauses: Maybe it'd be
+ * better to look for stats (with MCV) covered by the equality
+ * clauses, because then we have a chance to find an exact match
+ * in the MCV list, which is pretty much the best we can do. We may
+ * also look at the least frequent MCV item, and use it as an upper
+ * boundary for the selectivity (had there been a more frequent
+ * item, it'd be in the MCV list).
+ *
+ * TODO There are several options for 'sanity clamping' the estimates.
+ *
+ * First, if we have selectivities for each condition, then
+ *
+ * P(A,B) <= MIN(P(A), P(B))
+ *
+ * Because additional conditions (connected by AND) can only lower
+ * the probability.
+ *
+ * So we can do some basic sanity checks using the single-variate
+ * stats (the ones we have right now).
+ *
+ * Second, when we have multivariate stats with a MCV list, then
+ *
+ * (a) if we have a full equality condition (one equality condition
+ * on each column) and we found a match in the MCV list, this is
+ * the selectivity (and it's supposed to be exact)
+ *
+ * (b) if we have a full equality condition and we haven't found a
+ * match in the MCV list, then the selectivity is below the
+ * lowest selectivity in the MCV list
+ *
+ * (c) if we have an equality condition (not full), we can still
+ * search the MCV for matches and use the sum of probabilities
+ * as a lower boundary for the histogram (if there are no
+ * matches in the MCV list, then we have no boundary)
+ *
+ * Third, if there are multiple (combinations of) multivariate
+ * stats for a set of clauses, we may compute all of them and then
+ * somehow aggregate them - e.g. by choosing the minimum, median or
+ * average. The stats are susceptible to overestimation (because
+ * we take 50% of the bucket for partial matches). Some stats may
+ * give better estimates than others, but it's very difficult to
+ * say that in advance which one is the best (it depends on the
+ * number of buckets, number of additional columns not referenced
+ * in the clauses, type of condition etc.).
+ *
+ * So we may compute them all and then choose a sane aggregation
+ * (minimum seems like a good approach). Of course, this may result
+ * in longer / more expensive estimation (CPU-wise), but it may be
+ * worth it.
+ *
+ * It's possible to add a GUC choosing between a 'simple' approach
+ * (using a single statistics expected to give the best estimate)
+ * and a 'complex' one (combining the multiple estimates).
+ *
+ * multivariate_estimates = (simple|full)
+ *
+ * Also, this might be enabled at a table level, by something like
+ *
+ * ALTER TABLE ... SET STATISTICS (simple|full)
+ *
+ * Which would make it possible to use this only for the tables
+ * where the simple approach does not work.
+ *
+ * Also, there are ways to optimize this algorithmically. E.g. we
+ * may try to get an estimate from a matching MCV list first, and
+ * if we happen to get a "full equality match" we may stop computing
+ * the estimates from other stats (for this condition) because
+ * that's probably the best estimate we can really get.
+ *
+ * TODO When applying the clauses to the histogram/MCV list, we can do
+ * that from the most selective clauses first, because that'll
+ * eliminate the buckets/items sooner (so we'll be able to skip
+ * them without inspection, which is more expensive). But this
+ * requires really knowing the per-clause selectivities in advance,
+ * and that's not what we do now.
+ */
+static Selectivity
+clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+{
+ bool fullmatch = false;
+
+ /*
+ * Lowest frequency in the MCV list (may be used as an upper bound
+ * for full equality conditions that did not match any MCV item).
+ */
+ Selectivity mcv_low = 0.0;
+
+ /* TODO Evaluate simple 1D selectivities, use the smallest one as
+ * an upper bound, product as lower bound, and sort the
+ * clauses in ascending order by selectivity (to optimize the
+ * MCV/histogram evaluation).
+ */
+
+ /* Evaluate the MCV selectivity */
+ return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ &fullmatch, &mcv_low);
+}
+
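For illustration, the clamping TODO above might end up looking roughly
like this sketch (not part of the attached patch; it simply reuses the
existing clause_selectivity() entry point to get per-clause estimates):

    /*
     * Sketch: clamp a multivariate estimate by the most restrictive
     * clause, because P(A and B) <= Min(P(A), P(B)).
     */
    static Selectivity
    clamp_by_min_selectivity(PlannerInfo *root, List *clauses,
                             int varRelid, JoinType jointype,
                             SpecialJoinInfo *sjinfo,
                             Selectivity mv_estimate)
    {
        ListCell   *l;
        Selectivity min_sel = 1.0;

        foreach(l, clauses)
        {
            Selectivity s = clause_selectivity(root, (Node *) lfirst(l),
                                               varRelid, jointype, sjinfo);

            min_sel = Min(min_sel, s);
        }

        return Min(mv_estimate, min_sel);
    }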
/*
* Collect attributes from mv-compatible clauses.
*/
static Bitmapset *
collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
- Index *relid, SpecialJoinInfo *sjinfo)
+ Index *relid, SpecialJoinInfo *sjinfo, int types)
{
Bitmapset *attnums = NULL;
ListCell *l;
@@ -945,12 +1137,11 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
*/
foreach (l, clauses)
{
- AttrNumber attnum;
Node *clause = (Node *) lfirst(l);
- /* ignore the result for now - we only need the info */
- if (clause_is_mv_compatible(root, clause, varRelid, relid, &attnum, sjinfo))
- attnums = bms_add_member(attnums, attnum);
+ /* ignore the result here - we only need the attnums */
+ clause_is_mv_compatible(root, clause, varRelid, relid, &attnums,
+ sjinfo, types);
}
/*
@@ -969,6 +1160,188 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
}
/*
+ * We're looking for statistics matching at least 2 attributes,
+ * referenced in the clauses compatible with multivariate statistics.
+ * The current selection criterion is very simple - we choose the
+ * statistics referencing the most attributes.
+ *
+ * If there are multiple statistics referencing the same number of
+ * columns (from the clauses), the one with fewer source columns
+ * (as listed in ADD STATISTICS when creating the statistics) wins.
+ * Otherwise the first one wins.
+ *
+ * This is a very simple criterion, and it has several weaknesses:
+ *
+ * (a) does not consider the accuracy of the statistics
+ *
+ * If there are two histograms built on the same set of columns,
+ * but one has 100 buckets and the other one has 1000 buckets (thus
+ * likely providing better estimates), this is not currently
+ * considered.
+ *
+ * (b) does not consider the type of statistics
+ *
+ * If there are three statistics - one containing just a MCV list,
+ * another one with just a histogram and a third one with both,
+ * this is not considered.
+ *
+ * (c) does not consider the number of clauses
+ *
+ * As explained, only the number of referenced attributes counts,
+ * so if there are multiple clauses on a single attribute, this
+ * still counts as a single attribute.
+ *
+ * (d) does not consider type of condition
+ *
+ * Some clauses may work better with some statistics - for example
+ * equality clauses probably work better with MCV lists than with
+ * histograms. But IS [NOT] NULL conditions may often work better
+ * with histograms (thanks to NULL-buckets).
+ *
+ * So for example with five WHERE conditions
+ *
+ * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
+ *
+ * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be
+ * selected as it references the most columns.
+ *
+ * Once we have selected the multivariate statistics, we split the list
+ * of clauses into two parts - conditions that are compatible with the
+ * selected stats, and conditions that will be estimated using
+ * simple statistics.
+ *
+ * From the example above, conditions
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
+ *
+ * will be estimated using the multivariate statistics (a,b,c,d) while
+ * the last condition (e = 1) will get estimated using the regular ones.
+ *
+ * There are various alternative selection criteria (e.g. counting
+ * conditions instead of just referenced attributes), but eventually
+ * the best option should be to combine multiple statistics. But that's
+ * much harder to do correctly.
+ *
+ * TODO Select multiple statistics and combine them when computing
+ * the estimate.
+ *
+ * TODO This will probably have to consider compatibility of clauses,
+ * because 'dependencies' will probably work only with equality
+ * clauses.
+ */
+static MVStatisticInfo *
+choose_mv_statistics(List *stats, Bitmapset *attnums)
+{
+ int i;
+ ListCell *lc;
+
+ MVStatisticInfo *choice = NULL;
+
+ int current_matches = 1; /* goal #1: maximize */
+ int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+
+ /*
+	 * Walk through the list of statistics, and for each one count the
+	 * attributes it shares with the clauses (encoded in the 'attnums'
+	 * bitmap).
+ */
+ foreach (lc, stats)
+ {
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ /* columns matching this statistics */
+ int matches = 0;
+
+ int2vector * attrs = info->stakeys;
+ int numattrs = attrs->dim1;
+
+ /* skip dependencies-only stats */
+ if (! info->mcv_built)
+ continue;
+
+		/* count columns covered by these statistics */
+ for (i = 0; i < numattrs; i++)
+ if (bms_is_member(attrs->values[i], attnums))
+ matches++;
+
+ /*
+		 * Use this statistics if it improves the number of matches, or
+		 * if it matches the same number of attributes but is smaller.
+ */
+ if ((matches > current_matches) ||
+ ((matches == current_matches) && (current_dims > numattrs)))
+ {
+ choice = info;
+ current_matches = matches;
+ current_dims = numattrs;
+ }
+ }
+
+ return choice;
+}
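
For example, with statistics created as

    ALTER TABLE t ADD STATISTICS ON (a, b, c);
    ALTER TABLE t ADD STATISTICS ON (a, b, c, d, e);

and clauses referencing only (a, b, c), both statistics match three
attributes, so choose_mv_statistics() picks the first one, as it is
built on fewer source columns.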
+
+/*
+ * This splits the clause list into two parts - one containing clauses
+ * that will be evaluated using the chosen statistics, and the remaining
+ * clauses (either not mv-compatible, or not covered by the statistics).
+ */
+static List *
+clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
+ List *clauses, Oid varRelid, List **mvclauses,
+ MVStatisticInfo *mvstats, int types)
+{
+ int i;
+ ListCell *l;
+ List *non_mvclauses = NIL;
+
+ /* FIXME is there a better way to get info on int2vector? */
+ int2vector * attrs = mvstats->stakeys;
+ int numattrs = mvstats->stakeys->dim1;
+
+ Bitmapset *mvattnums = NULL;
+
+ /* build bitmap of attributes covered by the stats, so we can
+ * do bms_is_subset later */
+ for (i = 0; i < numattrs; i++)
+ mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+
+ /* erase the list of mv-compatible clauses */
+ *mvclauses = NIL;
+
+ foreach (l, clauses)
+ {
+ bool match = false; /* by default not mv-compatible */
+ Bitmapset *attnums = NULL;
+ Node *clause = (Node *) lfirst(l);
+
+ if (clause_is_mv_compatible(root, clause, varRelid, NULL,
+ &attnums, sjinfo, types))
+ {
+ /* are all the attributes part of the selected stats? */
+ if (bms_is_subset(attnums, mvattnums))
+ match = true;
+ }
+
+ /*
+ * The clause matches the selected stats, so put it to the list
+ * of mv-compatible clauses. Otherwise, keep it in the list of
+ * 'regular' clauses (that may be selected later).
+ */
+ if (match)
+ *mvclauses = lappend(*mvclauses, clause);
+ else
+ non_mvclauses = lappend(non_mvclauses, clause);
+ }
+
+ /*
+	 * Return the remaining clauses, to be estimated the regular way
+	 * (they are not compatible with the chosen statistics).
+	 */
+	return non_mvclauses;
+}
+
+/*
* Determines whether the clause is compatible with multivariate stats,
* and if it is, returns some additional information - varno (index
* into simple_rte_array) and a bitmap of attributes. This is then
@@ -987,8 +1360,12 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
*/
static bool
clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
- Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo)
+ Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
+ int types)
{
+ Relids clause_relids;
+ Relids left_relids;
+ Relids right_relids;
if (IsA(clause, RestrictInfo))
{
@@ -998,82 +1375,176 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
if (rinfo->pseudoconstant)
return false;
- /* no support for OR clauses at this point */
- if (rinfo->orclause)
- return false;
-
/* get the actual clause from the RestrictInfo */
clause = (Node*)rinfo->clause;
- /* only simple opclauses are compatible with multivariate stats */
- if (! is_opclause(clause))
- return false;
-
/* we don't support join conditions at this moment */
if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo))
return false;
+ clause_relids = rinfo->clause_relids;
+ left_relids = rinfo->left_relids;
+ right_relids = rinfo->right_relids;
+ }
+ else if (is_opclause(clause) && list_length(((OpExpr *) clause)->args) == 2)
+ {
+ left_relids = pull_varnos(get_leftop((Expr*)clause));
+ right_relids = pull_varnos(get_rightop((Expr*)clause));
+
+ clause_relids = bms_union(left_relids,
+ right_relids);
+ }
+ else
+ {
+ /* Not a binary opclause, so mark left/right relid sets as empty */
+ left_relids = NULL;
+ right_relids = NULL;
+ /* and get the total relid set the hard way */
+ clause_relids = pull_varnos((Node *) clause);
+ }
+
+ /*
+ * Only simple opclauses and IS NULL tests are compatible with
+ * multivariate stats at this point.
+ */
+ if ((is_opclause(clause))
+ && (list_length(((OpExpr *) clause)->args) == 2))
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+
/* is it 'variable op constant' ? */
- if (list_length(((OpExpr *) clause)->args) == 2)
+
+ ok = (bms_membership(clause_relids) == BMS_SINGLETON) &&
+ (is_pseudo_constant_clause_relids(lsecond(expr->args),
+ right_relids) ||
+ (varonleft = false,
+ is_pseudo_constant_clause_relids(linitial(expr->args),
+ left_relids)));
+
+ if (ok)
{
- OpExpr *expr = (OpExpr *) clause;
- bool varonleft = true;
- bool ok;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
- ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
- (is_pseudo_constant_clause_relids(lsecond(expr->args),
- rinfo->right_relids) ||
- (varonleft = false,
- is_pseudo_constant_clause_relids(linitial(expr->args),
- rinfo->left_relids)));
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+			 * (returns NULL).
+			 *
+			 * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
- if (ok)
- {
- Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not be really necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
- /*
- * Simple variables only - otherwise the planner_rt_fetch seems to fail
- * (return NULL).
- *
- * TODO Maybe use examine_variable() would fix that?
- */
- if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
- return false;
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
- /*
- * Only consider this variable if (varRelid == 0) or when the varno
- * matches varRelid (see explanation at clause_selectivity).
- *
- * FIXME I suspect this may not be really necessary. The (varRelid == 0)
- * part seems to be enforced by treat_as_join_clause().
- */
- if (! ((varRelid == 0) || (varRelid == var->varno)))
- return false;
+ /* Lookup info about the base relation (we need to pass the OID out) */
+ if (relid != NULL)
+ *relid = var->varno;
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+			 * This looks at the function for estimating selectivity, not at
+			 * the operator directly (a bit awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_SCALARLTSEL:
+ case F_SCALARGTSEL:
+ /* not compatible with functional dependencies */
+ if (types & MV_CLAUSE_TYPE_MCV)
+ {
+ *attnums = bms_add_member(*attnums, var->varattno);
+						return true;
+ }
+ return false;
+
+ case F_EQSEL:
+ *attnums = bms_add_member(*attnums, var->varattno);
+ return true;
+ }
+ }
+ }
+ else if (IsA(clause, NullTest)
+ && IsA(((NullTest*)clause)->arg, Var))
+ {
+ Var * var = (Var*)((NullTest*)clause)->arg;
+
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+		 * (returns NULL).
+		 *
+		 * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
+
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not be really necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
- /* Also skip special varno values, and system attributes ... */
- if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
- return false;
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
+ /* Lookup info about the base relation (we need to pass the OID out) */
+ if (relid != NULL)
*relid = var->varno;
- /*
- * If it's not a "<" or ">" or "=" operator, just ignore the
- * clause. Otherwise note the relid and attnum for the variable.
- * This uses the function for estimating selectivity, ont the
- * operator directly (a bit awkward, but well ...).
- */
- switch (get_oprrest(expr->opno))
- {
- case F_EQSEL:
- *attnum = var->varattno;
- return true;
- }
- }
+ *attnums = bms_add_member(*attnums, var->varattno);
+
+ return true;
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /*
+ * AND/OR-clauses are supported if all sub-clauses are supported
+ *
+ * TODO We might support mixed case, where some of the clauses
+ * are supported and some are not, and treat all supported
+		 *		subclauses as a single clause, compute its selectivity
+ * using mv stats, and compute the total selectivity using
+ * the current algorithm.
+ *
+ * TODO For RestrictInfo above an OR-clause, we might use the
+ * orclause with nested RestrictInfo - we won't have to
+ * call pull_varnos() for each clause, saving time.
+ */
+ Bitmapset *tmp = NULL;
+ ListCell *l;
+ foreach (l, ((BoolExpr*)clause)->args)
+ {
+ if (! clause_is_mv_compatible(root, (Node*)lfirst(l),
+ varRelid, relid, &tmp, sjinfo, types))
+ return false;
}
+
+		/* add the attnums from the AND/OR-clause to the set of attnums */
+ *attnums = bms_join(*attnums, tmp);
+
+ return true;
}
return false;
-
}
/*
@@ -1322,6 +1793,9 @@ has_stats(List *stats, int type)
if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
return true;
+
+ if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
+ return true;
}
return false;
@@ -1617,25 +2091,39 @@ fdeps_filter_clauses(PlannerInfo *root,
foreach (lc, clauses)
{
- AttrNumber attnum;
+ Bitmapset *attnums = NULL;
Node *clause = (Node *) lfirst(lc);
- if (! clause_is_mv_compatible(root, clause, varRelid, relid,
- &attnum, sjinfo))
+ if (! clause_is_mv_compatible(root, clause, varRelid, relid, &attnums,
+ sjinfo, MV_CLAUSE_TYPE_FDEP))
/* clause incompatible with functional dependencies */
*reduced_clauses = lappend(*reduced_clauses, clause);
- else if (! bms_is_member(attnum, deps_attnums))
+ else if (bms_num_members(attnums) > 1)
+
+ /*
+			 * clause referencing multiple attributes (strange - shouldn't
+			 * this be handled by clause_is_mv_compatible directly?)
+ */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else if (! bms_is_member(bms_singleton_member(attnums), deps_attnums))
/* clause not covered by the dependencies */
*reduced_clauses = lappend(*reduced_clauses, clause);
else
{
+ /* ok, clause compatible with existing dependencies */
+ Assert(bms_num_members(attnums) == 1);
+
*deps_clauses = lappend(*deps_clauses, clause);
- clause_attnums = bms_add_member(clause_attnums, attnum);
+ clause_attnums = bms_add_member(clause_attnums,
+ bms_singleton_member(attnums));
}
+
+ bms_free(attnums);
}
return clause_attnums;
@@ -1673,3 +2161,454 @@ get_varattnos(Node * node, Index relid)
return result;
}
+
+/*
+ * Estimate selectivity of clauses using a MCV list.
+ *
+ * If there's no MCV list for the stats, the function returns 0.0.
+ *
+ * While computing the estimate, the function checks whether all the
+ * columns were matched with an equality condition. If that's the case,
+ * we can skip processing the histogram, as there can be no rows in
+ * it with the same values - all the rows matching the condition are
+ * represented by the MCV item. This can only happen with equality
+ * on all the attributes.
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all items as 'match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the items
+ * 4) skip items that are already 'no match'
+ * 5) check clause for items that still match
+ * 6) sum frequencies for items to get selectivity
+ *
+ * The function also returns the frequency of the least frequent item
+ * on the MCV list, which may be useful for clamping the estimate from
+ * the histogram (all items not in the MCV list are less frequent).
+ * This however seems useful only for cases with conditions on all
+ * attributes.
+ *
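+ * For example, with clauses (a = 1) AND (b < 2) the first clause
+ * may rule out most of the MCV items, and the second clause is then
+ * evaluated only on the items that survived - the result is the sum
+ * of frequencies of items matching both clauses.
+ *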
+ * TODO This only handles AND-ed clauses, but it might work for OR-ed
+ * lists too - it just needs to reverse the logic a bit. I.e. start
+ * with 'no match' for all items, and mark the items as a match
+ * as the clauses are processed (and skip items that are 'match').
+ */
+static Selectivity
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats, bool *fullmatch,
+ Selectivity *lowsel)
+{
+ int i;
+ Selectivity s = 0.0;
+ Selectivity u = 0.0;
+
+ MCVList mcvlist = NULL;
+ int nmatches = 0;
+
+ /* match/mismatch bitmap for each MCV item */
+ char * matches = NULL;
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 2);
+
+ /* there's no MCV list built yet */
+ if (! mvstats->mcv_built)
+ return 0.0;
+
+ mcvlist = load_mv_mcvlist(mvstats->mvoid);
+
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* by default all the MCV items match the clauses fully */
+ matches = palloc0(sizeof(char) * mcvlist->nitems);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
+
+ /* number of matching MCV items */
+ nmatches = mcvlist->nitems;
+
+ nmatches = update_match_bitmap_mcvlist(root, clauses,
+ mvstats->stakeys, mcvlist,
+ nmatches, matches,
+ lowsel, fullmatch, false);
+
+ /* sum frequencies for all the matching MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* used to 'scale' for MCV lists not covering all tuples */
+ u += mcvlist->items[i]->frequency;
+
+ if (matches[i] != MVSTATS_MATCH_NONE)
+ s += mcvlist->items[i]->frequency;
+ }
+
+ pfree(matches);
+ pfree(mcvlist);
+
+ return s*u;
+}
+
+/*
+ * Evaluate clauses using the MCV list, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * TODO This works with 'bitmap' where each bit is represented as a char,
+ * which is slightly wasteful. Instead, we could use a regular
+ * bitmap, reducing the size to ~1/8. Another thing is merging the
+ * bitmaps using & and |, which might be faster than min/max.
+ */
+static int
+update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ Bitmapset *eqmatches = NULL; /* attributes with equality matches */
+
+ /* The bitmap may be partially built. */
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mcvlist->nitems);
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+	/* The result can no longer change - no matches left (for AND-ed
+	 * clauses), or everything already matches (for OR-ed clauses). */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ return nmatches;
+
+ /* frequency of the lowest MCV item */
+ *lowsel = 1.0;
+
+ /*
+ * Loop through the list of clauses, and for each of them evaluate
+ * all the MCV items not yet eliminated by the preceding clauses.
+ *
+ * FIXME This would probably deserve a refactoring, I guess. Unify
+ * the two loops and put the checks inside, or something like
+ * that.
+ */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+		/* if the result can no longer change, we can stop */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+		/* it's either an OpExpr, a NullTest, or an AND/OR clause */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ /* operator */
+ FmgrInfo opproc;
+
+ /* get procedure computing operator selectivity */
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+
+ FmgrInfo ltproc, gtproc;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ /*
+ * TODO Fetch only when really needed (probably for equality only)
+ * TODO Technically either lt/gt is sufficient.
+ *
+ * FIXME The code in analyze.c creates histograms only for types
+ * with enough ordering (by calling get_sort_group_operators).
+ * Is this the same assumption, i.e. are we certain that we
+ * get the ltproc/gtproc every time we ask? Or are there types
+ * where get_sort_group_operators returns ltopr and here we
+ * get nothing?
+ */
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype,
+ TYPECACHE_EQ_OPR | TYPECACHE_LT_OPR | TYPECACHE_GT_OPR);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), <proc);
+ fmgr_info(get_opcode(typecache->gt_opr), >proc);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ bool mismatch = false;
+ MCVItem item = mcvlist->items[i];
+
+ /*
+					 * find the lowest selectivity in the MCV list
+					 * FIXME Maybe not the best place to do this (it runs for every clause).
+ */
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+
+ /*
+ * If there are no more matches (AND) or no remaining unmatched
+ * items (OR), we can stop processing this clause.
+ */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /*
+ * For AND-lists, we can also mark NULL items as 'no match' (and
+ * then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && item->isnull[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /* skip MCV items that were already ruled out */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* TODO consider bsearch here (list is sorted by values)
+ * TODO handle other operators too (LT, GT)
+ * TODO identify "full match" when the clauses fully
+ * match the whole MCV list (so that checking the
+ * histogram is not needed)
+ */
+ if (oprrest == F_EQSEL)
+ {
+ /*
+ * We don't care about isgt in equality, because it does not
+ * matter whether it's (var = const) or (const = var).
+ */
+ bool match = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ if (match)
+ eqmatches = bms_add_member(eqmatches, idx);
+
+ mismatch = (! match);
+ }
+ else if (oprrest == F_SCALARLTSEL) /* column < constant */
+ {
+
+ if (! isgt) /* (var < const) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ } /* (get_oprrest(expr->opno) == F_SCALARLTSEL) */
+ else /* (const < var) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ item->values[idx],
+ cst->constvalue));
+ }
+ }
+ else if (oprrest == F_SCALARGTSEL) /* column > constant */
+ {
+
+ if (! isgt) /* (var > const) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+ }
+ else /* (const > var) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in
+ * that case we can skip the bucket, because there's no overlap).
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ item->values[idx],
+ cst->constvalue));
+ }
+
+ } /* (get_oprrest(expr->opno) == F_SCALARGTSEL) */
+
+ /* XXX The conditions on matches[i] are not needed, as we
+ * skip MCV items that can't become true/false, depending
+ * on the current flag. See beginning of the loop over
+ * MCV items.
+ */
+
+ if ((is_or) && (matches[i] == MVSTATS_MATCH_NONE) && (! mismatch))
+ {
+ /* OR - was MATCH_NONE, but will be MATCH_FULL */
+ matches[i] = MVSTATS_MATCH_FULL;
+ ++nmatches;
+ continue;
+ }
+ else if ((! is_or) && (matches[i] == MVSTATS_MATCH_FULL) && mismatch)
+ {
+						/* AND - was MATCH_FULL, but will be MATCH_NONE */
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ continue;
+ }
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = mcvlist->items[i];
+
+ /*
+				 * find the lowest selectivity in the MCV list
+				 * FIXME Maybe not the best place to do this (it runs for every clause).
+ */
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+
+ /* if there are no more matches, we can stop processing this clause */
+ if (nmatches == 0)
+ break;
+
+ /* skip MCV items that were already ruled out */
+ if (matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /* if the clause mismatches the MCV item, set it as MATCH_NONE */
+ if (((expr->nulltesttype == IS_NULL) && (! mcvlist->items[i]->isnull[idx])) ||
+ ((expr->nulltesttype == IS_NOT_NULL) && (mcvlist->items[i]->isnull[idx])))
+ {
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ }
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each MCV item */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching MCV items */
+ or_nmatches = mcvlist->nitems;
+
+ /* by default none of the MCV items matches the clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+				/* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+			/* build the match bitmap for the sub-clauses */
+ or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ stakeys, mcvlist,
+ or_nmatches, or_matches,
+ lowsel, fullmatch, or_clause(clause));
+
+			/* merge the bitmap into the existing one */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ {
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+ }
+
+ /*
+ * If all the columns were matched by equality, it's a full match.
+	 * In this case there can be just a single matching MCV item (two
+	 * distinct items cannot both match the same set of equality clauses).
+ */
+ *fullmatch = (bms_num_members(eqmatches) == mcvlist->ndimensions);
+
+ /* free the allocated pieces */
+ if (eqmatches)
+ pfree(eqmatches);
+
+ return nmatches;
+}
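
To illustrate the merge semantics at the end (a sketch only - the
actual UPDATE_RESULT macro is defined elsewhere in the patch), with
MVSTATS_MATCH_NONE < MVSTATS_MATCH_FULL the merge behaves like

    /* AND-merge keeps the minimum, OR-merge keeps the maximum */
    #define UPDATE_RESULT(m, r, is_or) \
        ((m) = (is_or) ? Max((m), (r)) : Min((m), (r)))

so an AND-merge can only demote items to 'no match', and an OR-merge
can only promote them to 'match'.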
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 60fd57f..0da7ad9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -410,7 +410,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built)
+ if (mvstat->deps_built || mvstat->mcv_built)
{
info = makeNode(MVStatisticInfo);
@@ -419,9 +419,11 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
+ info->mcv_enabled = mvstat->mcv_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
+ info->mcv_built = mvstat->mcv_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 099f1ed..f9bf10c 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o
+OBJS = common.o dependencies.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index bd200bc..d1da714 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -16,12 +16,14 @@
#include "common.h"
+#include "utils/array.h"
+
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
- int natts, VacAttrStats **vacattrstats);
+ int natts,
+ VacAttrStats **vacattrstats);
static List* list_mv_stats(Oid relid);
-
/*
* Compute requested multivariate stats, using the rows sampled for the
* plain (single-column) stats.
@@ -49,6 +51,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int j;
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
+ MCVList mcvlist = NULL;
+ int numrows_filtered = 0;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -87,8 +91,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ /* build the MCV list */
+ if (stat->mcv_enabled)
+ mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, attrs);
+ update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
}
}
@@ -166,6 +174,8 @@ list_mv_stats(Oid relid)
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
+ info->mcv_enabled = stats->mcv_enabled;
+ info->mcv_built = stats->mcv_built;
result = lappend(result, info);
}
@@ -180,8 +190,56 @@ list_mv_stats(Oid relid)
return result;
}
+
+/*
+ * Find attnums of MV stats using the mvoid.
+ */
+int2vector*
+find_mv_attnums(Oid mvoid, Oid *relid)
+{
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ HeapTuple htup;
+ int2vector *keys;
+
+	/* Fetch the pg_mv_statistic tuple for the given mvoid. */
+ htup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+ /* starelid */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_starelid, &isnull);
+ Assert(!isnull);
+
+ *relid = DatumGetObjectId(adatum);
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ keys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+ ReleaseSysCache(htup);
+
+	/* TODO maybe save the list into relcache, as in RelationGetIndexList
+	 * (which served as an inspiration for this one)? */
+
+ return keys;
+}
+
+
void
-update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+update_mv_stats(Oid mvoid,
+ MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -206,18 +264,29 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
= PointerGetDatum(serialize_mv_dependencies(dependencies));
}
+ if (mcvlist != NULL)
+ {
+ bytea * data = serialize_mv_mcvlist(mcvlist, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stamcv -1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+ replaces[Anum_pg_mv_statistic_stamcv -1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
@@ -246,6 +315,21 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
heap_close(sd, RowExclusiveLock);
}
+
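+/*
+ * Translate the attribute number into a dimension index within the
+ * statistics, i.e. count the stakeys lower than varattno (stakeys
+ * is expected to be sorted).
+ */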
+int
+mv_get_index(AttrNumber varattno, int2vector * stakeys)
+{
+ int i, idx = 0;
+ for (i = 0; i < stakeys->dim1; i++)
+ {
+ if (stakeys->values[i] < varattno)
+ idx += 1;
+ else
+ break;
+ }
+ return idx;
+}
+
/* multi-variate stats comparator */
/*
@@ -256,11 +340,15 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
int
compare_scalars_simple(const void *a, const void *b, void *arg)
{
- Datum da = *(Datum*)a;
- Datum db = *(Datum*)b;
- SortSupport ssup= (SortSupport) arg;
+ return compare_datums_simple(*(Datum*)a,
+ *(Datum*)b,
+ (SortSupport)arg);
+}
- return ApplySortComparator(da, false, db, false, ssup);
+int
+compare_datums_simple(Datum a, Datum b, SortSupport ssup)
+{
+ return ApplySortComparator(a, false, b, false, ssup);
}
/*
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
index 6d5465b..f4309f7 100644
--- a/src/backend/utils/mvstats/common.h
+++ b/src/backend/utils/mvstats/common.h
@@ -46,7 +46,15 @@ typedef struct
Datum value; /* a data value */
int tupno; /* position index for tuple it came from */
} ScalarItem;
-
+
+/* (de)serialization info */
+typedef struct DimensionInfo {
+ int nvalues; /* number of deduplicated values */
+ int nbytes; /* number of bytes (serialized) */
+ int typlen; /* pg_type.typlen */
+ bool typbyval; /* pg_type.typbyval */
+} DimensionInfo;
+
/* multi-sort */
typedef struct MultiSortSupportData {
int ndims; /* number of dimensions supported by the multi-sort */
@@ -71,5 +79,6 @@ int multi_sort_compare_dim(int dim, const SortItem *a,
const SortItem *b, MultiSortSupport mss);
/* comparators, used when constructing multivariate stats */
+int compare_datums_simple(Datum a, Datum b, SortSupport ssup);
int compare_scalars_simple(const void *a, const void *b, void *arg);
int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/mcv.c b/src/backend/utils/mvstats/mcv.c
new file mode 100644
index 0000000..670dbda
--- /dev/null
+++ b/src/backend/utils/mvstats/mcv.c
@@ -0,0 +1,1237 @@
+/*-------------------------------------------------------------------------
+ *
+ * mcv.c
+ * POSTGRES multivariate MCV lists
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mcv.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+
+/*
+ * Multivariate MCVs (most-common values lists) are a straightforward
+ * extension of regular MCV list, tracking combinations of values for
+ * several attributes (columns), including NULL flags, and frequency
+ * of the combination.
+ *
+ * For columns with small number of distinct values, this works quite
+ * well and may represent the distribution very accurately. For columns
+ * with large number of distinct values (e.g. stored as FLOAT), this
+ * does not work that well. Especially if the distribution is mostly
+ * uniform, with no very common combinations.
+ *
+ * If we can represent the distribution as a MCV list, we can estimate
+ * some clauses (e.g. equality clauses) much accurately than using
+ * histograms for example.
+ *
+ * Another benefit of MCV lists (compared to histograms) is that they
+ * don't require sorting of the values, so that they work better for
+ * data types that either don't support sorting at all, or when the
+ * sorting does not really match the meaning. For example we know how to
+ * sort strings, but it's unlikely to make much sense for city names.
+ *
+ *
+ * Hashed MCV (not yet implemented)
+ * --------------------------------
+ * By restricting to MCV list and equality conditions, we may use hash
+ * values instead of the long varlena values. This significantly reduces
+ * the storage requirements, and we can still use it to estimate the
+ * equality conditions (assuming the collisions are rare enough).
+ *
+ * This however complicates matching the columns to available stats, as
+ * it requires matching clauses (not columns) to stats. And it may get
+ * quite complex - e.g. what if there are multiple clauses, each
+ * compatible with different stats subset?
+ *
+ *
+ * Selectivity estimation
+ * ----------------------
+ * The estimation, implemented in clauselist_mv_selectivity_mcvlist(),
+ * is quite simple in principle - walk through the MCV items and sum
+ * frequencies of all the items that match all the clauses.
+ *
+ * The current implementation uses MCV lists to estimate these types
+ * of clauses (think of WHERE conditions):
+ *
+ * (a) equality clauses WHERE (a = 1) AND (b = 2)
+ * (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ * (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ * (d) OR clauses WHERE (a < 1) OR (b >= 2)
+ *
+ * It's possible to add more clauses, for example:
+ *
+ * (e) multi-var clauses WHERE (a > b)
+ *
+ * and so on. These are tasks for the future, not yet implemented.
+ *
+ *
+ * Estimating equality clauses
+ * ---------------------------
+ * When computing selectivity estimate for equality clauses
+ *
+ * (a = 1) AND (b = 2)
+ *
+ * we can do this estimate pretty exactly assuming that two conditions
+ * are met:
+ *
+ * (1) there's an equality condition on each attribute
+ *
+ * (2) we find a matching item in the MCV list
+ *
+ * In that case we know the MCV item represents all the tuples matching
+ * the clauses, and the selectivity estimate is complete. This is what
+ * we call 'full match'.
+ *
+ * When only (1) holds, but there's no matching MCV item, we don't know
+ * whether there are no such rows or just are not very frequent. We can
+ * however use the frequency of the least frequent MCV item as an upper
+ * bound for the selectivity.
+ *
+ * If the equality conditions match only a subset of the attributes
+ * the MCV list is built on, we can't get a full match - we may get
+ * multiple MCV items matching the clauses, and even if we get a
+ * single match, there may be rows that did not get into the MCV
+ * list. But in this case we can still use the frequency of the least
+ * frequent MCV item to clamp the 'additional' selectivity not
+ * accounted for by the matching items.
+ *
+ * If there's no histogram, because the MCV list approximates the
+ * distribution accurately (not because the histogram was disabled),
+ * it does not really matter whether there are equality conditions on
+ * all the columns - we can do pretty accurate estimation using the MCV.
+ *
+ * TODO For a combination of equality conditions (not full-match case)
+ * we probably can clamp the selectivity by the minimum of
+ * selectivities for each condition. For example if we know the
+ * number of distinct values for each column, we can use 1/ndistinct
+ * as a per-column estimate. Or rather 1/ndistinct + selectivity
+ * derived from the MCV list.
+ *
+ * If we know the estimate of number of combinations of the columns
+ * (i.e. ndistinct(A,B)), we may estimate the average frequency of
+ * items in the remaining 10% as [10% / ndistinct(A,B)].
+ *
+ *
+ * Bounding estimates
+ * ------------------
+ * In general the MCV lists may not provide estimates as accurate as
+ * for the full-match equality case, but may provide some useful
+ * lower/upper boundaries for the estimation error.
+ *
+ * With equality clauses we can do a few more tricks to narrow this
+ * error range (see the previous section and TODO), but with inequality
+ * clauses (or generally non-equality clauses), it's rather difficult.
+ * There's nothing like a 'full match' - we have to consider both the
+ * MCV items and the remaining part every time. We can't use the minimum
+ * selectivity of MCV items, as the clauses may match multiple items.
+ *
+ * For example with a MCV list on columns (A, B), covering 90% of the
+ * table (computed while building the MCV list), about ~10% of the table
+ * is not represented by the MCV list. So even if the conditions match
+ * all the remaining rows (not represented by the MCV items), we can't
+ * get selectivity higher than those 10%. We may use 1/2 the remaining
+ * selectivity as an estimate (minimizing average error).
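+ *
+ * For example, if the matching MCV items sum to 15% and the MCV list
+ * covers 90% of the table, the actual selectivity lies between 15%
+ * and 25%, and using half of the remaining 10% yields 20%.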
+ *
+ * TODO Most of these ideas (error limiting) are not yet implemented.
+ *
+ *
+ * General TODO
+ * ------------
+ *
+ * FIXME Use max_mcv_items from ALTER TABLE ADD STATISTICS command.
+ *
+ * TODO Add support for clauses referencing multiple columns (a < b).
+ *
+ * TODO It's possible to build a special case of MCV list, storing not
+ * the actual values but only a 32/64-bit hash. This is only useful
+ * for estimating equality clauses and for large varlena types,
+ * which are very impractical for plain MCV list because of size.
+ * But for those data types we really want just the equality
+ * clauses, so it's actually a good solution.
+ *
+ * TODO Currently there's no logic to consider building only a MCV list
+ * (and not building the histogram at all), except for doing this
+ * decision manually in ADD STATISTICS.
+ */
+
+/*
+ * Each serialized item needs to store (in this order):
+ *
+ * - indexes (ndim * sizeof(uint16))
+ * - null flags (ndim * sizeof(bool))
+ * - frequency (sizeof(double))
+ *
+ * So in total:
+ *
+ * ndim * (sizeof(uint16) + sizeof(bool)) + sizeof(double)
+ *
+ * e.g. 3 * (2 + 1) + 8 = 17 bytes for ndim = 3.
+ */
+#define ITEM_SIZE(ndims) \
+ (ndims * (sizeof(uint16) + sizeof(bool)) + sizeof(double))
+
+/* pointers into a flat serialized item of ITEM_SIZE(n) bytes */
+#define ITEM_INDEXES(item) ((uint16*)item)
+#define ITEM_NULLS(item,ndims) ((bool*)(ITEM_INDEXES(item) + ndims))
+#define ITEM_FREQUENCY(item,ndims) ((double*)(ITEM_NULLS(item,ndims) + ndims))
+
+/*
+ * Builds MCV list from sample rows, and removes rows represented by
+ * the MCV list from the sample (the number of remaining sample rows is
+ * returned by the numrows_filtered parameter).
+ *
+ * The method is quite simple - in short it does about these steps:
+ *
+ * (1) sort the data (default collation, '<' for the data type)
+ *
+ * (2) count distinct groups, decide how many to keep
+ *
+ * (3) build the MCV list using the threshold determined in (2)
+ *
+ * (4) remove rows represented by the MCV from the sample
+ *
+ * For more details, see the comments in the code.
+ *
+ * FIXME Single-dimensional MCV is sorted by frequency (descending). We
+ * should do that too, because when walking through the list we
+ * want to check the most frequent items first.
+ *
+ * TODO We're using Datum (8B) even for narrower data types (e.g. int4
+ * or float4). Maybe we could save some space here, but the bytea
+ * compression should handle it just fine.
+ *
+ * TODO This probably should not use the ndistinct directly (as
+ * computed from the sample), but rather an estimate of the number
+ * of distinct values in the table, no?
+ */
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ int ndistinct = 0;
+ int mcv_threshold = 0;
+ int count = 0;
+ int nitems = 0;
+
+ MCVList mcvlist = NULL;
+
+ /* Sort by multiple columns (using array of SortSupport) */
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * Preallocate space for all the items as a single chunk, and point
+ * the items to the appropriate parts of the array.
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * numattrs);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * numattrs);
+
+ /* keep all the rows by default (as if there was no MCV list) */
+ *numrows_filtered = numrows;
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* load the values/null flags from sample rows */
+ for (j = 0; j < numrows; j++)
+ for (i = 0; i < numattrs; i++)
+ items[j].values[i] = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+
+ /* prepare the sort functions for all the attributes */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* do the sort, using the multi-sort */
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Count the number of distinct groups - just walk through the
+ * sorted list and count the number of key changes. We use this to
+ * determine the threshold (125% of the average frequency).
+ */
+ ndistinct = 1;
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ ndistinct += 1;
+
+ /*
+ * Determine how many groups actually exceed the threshold, and then
+ * walk the array again and collect them into an array. We'll always
+ * require at least 4 rows per group.
+ *
+ * But if we can fit all the distinct values in the MCV list (i.e.
+	 * if there are fewer distinct groups than MVSTAT_MCVLIST_MAX_ITEMS),
+ * we'll require only 2 rows per group.
+ *
+ * TODO For now the threshold is the same as in the single-column
+ * case (average + 25%), but maybe that's worth revisiting
+ * for the multivariate case.
+ *
+ * TODO We can do this only if we believe we got all the distinct
+ * values of the table.
+ *
+ * FIXME This should really reference mcv_max_items (from catalog)
+ * instead of the constant MVSTAT_MCVLIST_MAX_ITEMS.
+ */
+ mcv_threshold = 1.25 * numrows / ndistinct;
+ mcv_threshold = (mcv_threshold < 4) ? 4 : mcv_threshold;
+
+ if (ndistinct <= MVSTAT_MCVLIST_MAX_ITEMS)
+ mcv_threshold = 2;
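+
+	/*
+	 * For example, with numrows = 30000 and ndistinct = 100, the average
+	 * group has 300 rows, so the initial threshold is 1.25 * 300 = 375.
+	 */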
+
+ /*
+ * Walk through the sorted data again, and see how many groups
+ * reach the mcv_threshold (and become an item in the MCV list).
+ */
+ count = 1;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or new group, so check if we exceed mcv_threshold */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* group hits the threshold, count the group as MCV item */
+ if (count >= mcv_threshold)
+ nitems += 1;
+
+ count = 1;
+ }
+ else /* within group, so increase the number of items */
+ count += 1;
+ }
+
+ /* we know the number of MCV list items, so let's build the list */
+ if (nitems > 0)
+ {
+ /* allocate the MCV list structure, set parameters we know */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ mcvlist->magic = MVSTAT_MCV_MAGIC;
+ mcvlist->type = MVSTAT_MCV_TYPE_BASIC;
+ mcvlist->ndimensions = numattrs;
+ mcvlist->nitems = nitems;
+
+ /*
+		 * Preallocate Datum/isnull arrays - not as a single chunk, as
+		 * we'll pass this outside this method and thus it needs to be
+		 * easy to pfree() the data (we wouldn't know where the arrays
+		 * start).
+ *
+ * TODO Maybe the reasoning that we can't allocate a single
+ * piece because we're passing it out is bogus? Who'd
+ * free a single item of the MCV list, anyway?
+ *
+ * TODO Maybe with a proper encoding (stuffing all the values
+ * into a list-level array, this will be untrue)?
+ */
+ mcvlist->items = (MCVItem*)palloc0(sizeof(MCVItem)*nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ mcvlist->items[i] = (MCVItem)palloc0(sizeof(MCVItemData));
+ mcvlist->items[i]->values = (Datum*)palloc0(sizeof(Datum)*numattrs);
+ mcvlist->items[i]->isnull = (bool*)palloc0(sizeof(bool)*numattrs);
+ }
+
+ /*
+ * Repeat the same loop as above, but this time copy the data
+ * into the MCV list (for items exceeding the threshold).
+ *
+ * TODO Maybe we could simply remember indexes of the last item
+ * in each group (from the previous loop)?
+ */
+ count = 1;
+ nitems = 0;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or a new group */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* count the MCV item if exceeding the threshold (and copy into the array) */
+ if (count >= mcv_threshold)
+ {
+ /* just pointer to the proper place in the list */
+ MCVItem item = mcvlist->items[nitems];
+
+					/* copy values from the last item of the _previous_ group */
+ memcpy(item->values, items[(i-1)].values, sizeof(Datum) * numattrs);
+ memcpy(item->isnull, items[(i-1)].isnull, sizeof(bool) * numattrs);
+
+ /* and finally the group frequency */
+ item->frequency = (double)count / numrows;
+
+ /* next item */
+ nitems += 1;
+ }
+
+ count = 1;
+ }
+ else /* same group, just increase the number of items */
+ count += 1;
+ }
+
+ /* make sure the loops are consistent */
+ Assert(nitems == mcvlist->nitems);
+
+ /*
+ * Remove the rows matching the MCV list (i.e. keep only rows
+ * that are not represented by the MCV list).
+ *
+ * FIXME This implementation is rather naive, effectively O(N^2).
+ * As the MCV list grows, the check will take longer and
+ * longer. And as the number of sampled rows increases (by
+ * increasing statistics target), it will take longer and
+ * longer. One option is to sort the MCV items first and
+ * then perform a binary search.
+ *
+ * A better option would be keeping the ID of the row in
+ * the sort item, and then just walk through the items and
+ * mark rows to remove (in a bitmap of the same size).
+ * There's not space for that in SortItem at this moment,
+ * but it's trivial to add 'private' pointer, or just
+ * using another structure with extra field (starting with
+ * SortItem, so that the comparators etc. still work).
+ *
+ * Another option is to use the sorted array of items
+ * (because that's how we sorted the source data), and
+ * simply do a bsearch() into it. If we find a matching
+ * item, the row belongs to the MCV list.
+ */
+ if (nitems == ndistinct) /* all rows are covered by MCV items */
+ *numrows_filtered = 0;
+ else /* (nitems < ndistinct) && (nitems > 0) */
+ {
+ int nfiltered = 0;
+ HeapTuple *rows_filtered = (HeapTuple*)palloc0(sizeof(HeapTuple) * numrows);
+
+ /* used for the searches */
+			SortItem item, mcvitem;
+
+ item.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ item.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /*
+ * FIXME we don't need to allocate this, we can reference
+ * the MCV item directly ...
+ */
+ mcvitem.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ mcvitem.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* walk through the tuples, compare the values to MCV items */
+ for (i = 0; i < numrows; i++)
+ {
+ bool match = false;
+
+ /* collect the key values from the row */
+ for (j = 0; j < numattrs; j++)
+ item.values[j] = heap_getattr(rows[i], attrs->values[j],
+ stats[j]->tupDesc, &item.isnull[j]);
+
+ /* scan through the MCV list for matches */
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ /*
+ * TODO Create a SortItem/MCVItem comparator so that
+ * we don't need to do memcpy() like crazy.
+ */
+ memcpy(mcvitem.values, mcvlist->items[j]->values,
+ numattrs * sizeof(Datum));
+ memcpy(mcvitem.isnull, mcvlist->items[j]->isnull,
+ numattrs * sizeof(bool));
+
+ if (multi_sort_compare(&item, &mcvitem, mss) == 0)
+ {
+ match = true;
+ break;
+ }
+ }
+
+ /* if no match in the MCV list, copy the row into the filtered ones */
+ if (! match)
+ memcpy(&rows_filtered[nfiltered++], &rows[i], sizeof(HeapTuple));
+ }
+
+ /* replace the rows and remember how many rows we kept */
+ memcpy(rows, rows_filtered, sizeof(HeapTuple) * nfiltered);
+ *numrows_filtered = nfiltered;
+
+ /* free all the data used here */
+ pfree(rows_filtered);
+ pfree(item.values);
+ pfree(item.isnull);
+ pfree(mcvitem.values);
+ pfree(mcvitem.isnull);
+ }
+ }
+
+ pfree(values);
+ pfree(items);
+ pfree(isnull);
+
+ return mcvlist;
+}
+
+
+/* fetch the MCV list (as a bytea) from the pg_mv_statistic catalog */
+MCVList
+load_mv_mcvlist(Oid mvoid)
+{
+ bool isnull = false;
+ Datum mcvlist;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* Fetch the pg_mv_statistic tuple for the given statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->mcv_enabled && mvstat->mcv_built);
+#endif
+
+ mcvlist = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stamcv, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_mcvlist(DatumGetByteaP(mcvlist));
+}
+
+/* print some basic info about the MCV list
+ *
+ * TODO Add info about what part of the table this covers.
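+ *
+ * Typically called on the stamcv column, as in the regression
+ * tests:
+ *
+ *     SELECT pg_mv_stats_mcvlist_info(stamcv)
+ *       FROM pg_mv_statistic;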
+ */
+Datum
+pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MCVList mcvlist = deserialize_mv_mcvlist(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nitems=%d", mcvlist->nitems);
+
+ pfree(mcvlist);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* used to pass context into bsearch() */
+static SortSupport ssup_private = NULL;
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Serialize MCV list into a bytea value. The basic algorithm is simple:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ * (a) collect all (non-NULL) attribute values from all MCV items
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all MCV list items
+ * (a) replace values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, because we're mixing
+ * different datatypes, and we don't know what equality means for them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ * We'll use uint16 values for the indexes in step (3), as we don't
+ * allow more than 8k MCV items (see MVSTAT_MCVLIST_MAX_ITEMS),
+ * although a uint16 could index up to 65535 items.
+ *
+ * We don't really expect compression as high as with histograms,
+ * because we're not doing any bucket splits etc. (which is the source
+ * of high redundancy there), but we need to do it anyway as we need
+ * to serialize varlena values etc. We might invent another way to
+ * serialize MCV lists, but let's keep it consistent.
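+ *
+ * As a small (made-up) example, with ndims=2 and MCV items
+ *
+ *     {1, 'a'}   frequency 0.50
+ *     {1, 'b'}   frequency 0.30
+ *     {2, 'a'}   frequency 0.20
+ *
+ * step (1) produces the deduplicated arrays [1, 2] and ['a', 'b'],
+ * and step (3) stores the items as the index pairs (0,0), (0,1)
+ * and (1,0), each with its null flags and frequency.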
+ *
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
+bytea *
+serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i, j;
+ int ndims = mcvlist->ndimensions;
+ int itemsize = ITEM_SIZE(ndims);
+
+ Size total_length = 0;
+
+ char *item = palloc0(itemsize);
+
+ /* serialized items (indexes into arrays, etc.) */
+ bytea *output;
+ char *data = NULL;
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /* allocate space for all values, including NULLs (won't use them) */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * mcvlist->nitems);
+
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ if (! mcvlist->items[j]->isnull[i]) /* skip NULL values */
+ {
+ values[i][counts[i]] = mcvlist->items[j]->values[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* do not exceed UINT16_MAX */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typbyval || (info[i].typlen > 0))
+ /* passed by value, or by reference with a fixed length */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so strlen + 1 byte for the \0 terminator (as serialized) */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j])) + 1;
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * MCV list, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nitems (4B)
+ * - info (ndim * sizeof(DimensionInfo))
+ * - arrays of values for each dimension
+ * - serialized items (nitems * itemsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data.
+ */
+ total_length = (sizeof(int32) + offsetof(MCVListData, items)
+ + ndims * sizeof(DimensionInfo)
+ + mcvlist->nitems * itemsize);
+
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 1MB */
+ if (total_length > 1024 * 1024)
+ elog(ERROR, "serialized MCV exceeds 1MB (%ld)", total_length);
+
+ /* allocate space for the serialized MCV list, set header fields */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write to */
+ data = VARDATA(output);
+
+ memcpy(data, mcvlist, offsetof(MCVListData, items));
+ data += offsetof(MCVListData, items);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - itemsize);
+
+ /* reset the values for each item */
+ memset(item, 0, itemsize);
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! mcvlist->items[i]->isnull[j])
+ {
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ v = (Datum*)bsearch(&mcvlist->items[i]->values[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ ITEM_INDEXES(item)[j] = (v - values[j]);
+
+ /* check the index is within expected bounds */
+ Assert(ITEM_INDEXES(item)[j] >= 0);
+ Assert(ITEM_INDEXES(item)[j] < info[j].nvalues);
+ }
+ }
+
+ /* copy the null flags and the frequency into the item */
+ memcpy(ITEM_NULLS(item, ndims),
+ mcvlist->items[i]->isnull, sizeof(bool) * ndims);
+ memcpy(ITEM_FREQUENCY(item, ndims),
+ &mcvlist->items[i]->frequency, sizeof(double));
+
+ /* copy the item into the array */
+ memcpy(data, item, itemsize);
+
+ data += itemsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ return output;
+}
+
+/*
+ * Inverse to serialize_mv_mcvlist() - see the comment there.
+ *
+ * We'll do full deserialization, because we don't really expect high
+ * duplication of values, so caching would not be as efficient as with
+ * histograms.
+ */
+MCVList
+deserialize_mv_mcvlist(bytea * data)
+{
+ int i, j;
+ Size expected_size;
+ MCVList mcvlist;
+ char *tmp;
+
+ int ndims, nitems, itemsize;
+ DimensionInfo *info = NULL;
+
+ uint16 *indexes = NULL;
+ Datum **values = NULL;
+
+ /* local allocation buffer (used only for deserialization) */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ /* buffer used for the result */
+ int rbufflen;
+ char *rbuff;
+ char *rptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MCVListData,items))
+ elog(ERROR, "invalid MCV Size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MCVListData,items));
+
+ /* read the MCV list header */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(mcvlist, tmp, offsetof(MCVListData,items));
+ tmp += offsetof(MCVListData,items);
+
+ if (mcvlist->magic != MVSTAT_MCV_MAGIC)
+ elog(ERROR, "invalid MCV magic %d (expected %dd)",
+ mcvlist->magic, MVSTAT_MCV_MAGIC);
+
+ if (mcvlist->type != MVSTAT_MCV_TYPE_BASIC)
+ elog(ERROR, "invalid MCV type %d (expected %dd)",
+ mcvlist->type, MVSTAT_MCV_TYPE_BASIC);
+
+ nitems = mcvlist->nitems;
+ ndims = mcvlist->ndimensions;
+ itemsize = ITEM_SIZE(ndims);
+
+ Assert(nitems > 0);
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * What size do we expect with these parameters? It's incomplete,
+ * as we have yet to add the sizes of the value arrays (from the
+ * DimensionInfo records).
+ */
+ expected_size = offsetof(MCVListData,items) +
+ ndims * sizeof(DimensionInfo) +
+ (nitems * itemsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /*
+ * We'll allocate one large chunk of memory for the intermediate
+ * data, needed only for deserializing the MCV list, and we'll use
+ * a local dense allocation within it to minimize palloc overhead.
+ *
+ * Let's see how much space we'll actually need, and also include
+ * space for the array with pointers.
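+ *
+ * The buffer layout is simply:
+ *
+ *     Datum *values[ndims]    (pointers to per-dimension arrays)
+ *     Datum values of dim 0   (skipped when reusing serialized data)
+ *     ...
+ *     Datum values of dim (ndims-1)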
+ */
+ bufflen = sizeof(Datum*) * ndims; /* space for pointers */
+
+ for (i = 0; i < ndims; i++)
+ /* for full-size byval types, we reuse the serialized value */
+ if (! (info[i].typbyval && info[i].typlen == sizeof(Datum)))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+
+ buff = palloc(bufflen);
+ ptr = buff;
+
+ values = (Datum**)buff;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers into the original data array (for types
+ * not passed by value), so if someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mv_mcvlist(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. This needs to
+ * copy the pieces.
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum - simply reuse the array */
+ if (info[i].typlen == sizeof(Datum))
+ {
+ values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value from the serialized array */
+ memcpy(&values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ }
+ else
+ {
+ /* all the by-reference data need a chunk from the buffer */
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ /* passed by reference, but fixed length (name, tid, ...) */
+ if (info[i].typlen > 0)
+ {
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ /* we should exhaust the buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ /* allocate space for the MCV items in a single piece */
+ rbufflen = (sizeof(MCVItem) + sizeof(MCVItemData) +
+ sizeof(Datum)*ndims + sizeof(bool)*ndims) * nitems;
+
+ rbuff = palloc(rbufflen);
+ rptr = rbuff;
+
+ mcvlist->items = (MCVItem*)rbuff;
+ rptr += (sizeof(MCVItem) * nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ MCVItem item = (MCVItem)rptr;
+ rptr += (sizeof(MCVItemData));
+
+ item->values = (Datum*)rptr;
+ rptr += (sizeof(Datum)*ndims);
+
+ item->isnull = (bool*)rptr;
+ rptr += (sizeof(bool) *ndims);
+
+ /* just point to the right place */
+ indexes = ITEM_INDEXES(tmp);
+
+ memcpy(item->isnull, ITEM_NULLS(tmp, ndims), sizeof(bool) * ndims);
+ memcpy(&item->frequency, ITEM_FREQUENCY(tmp, ndims), sizeof(double));
+
+#ifdef USE_ASSERT_CHECKING
+ for (j = 0; j < ndims; j++)
+ Assert(indexes[j] <= UINT16_MAX);
+#endif
+
+ /* translate the values */
+ for (j = 0; j < ndims; j++)
+ if (! item->isnull[j])
+ item->values[j] = values[j][indexes[j]];
+
+ mcvlist->items[i] = item;
+
+ tmp += ITEM_SIZE(ndims);
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+ }
+
+ /* check that we processed all the data */
+ Assert(tmp == (char*)data + VARSIZE_ANY(data));
+
+ /* release the temporary buffer */
+ pfree(buff);
+
+ return mcvlist;
+}
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
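+ *
+ * Callers have to set it right before the bsearch() call, the way
+ * serialize_mv_mcvlist() does ('key' being the Datum to look up):
+ *
+ *     ssup_private = &ssup[j];
+ *     v = (Datum*)bsearch(&key, values[j], info[j].nvalues,
+ *                         sizeof(Datum), bsearch_comparator);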
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
+
+/*
+ * SRF with details about items of an MCV list:
+ *
+ * - item ID (0...nitems)
+ * - values (string array)
+ * - null flags (boolean array)
+ * - frequency (double precision)
+ *
+ * The input is the OID of the statistics, and there are no rows
+ * returned if the statistics contains no MCV list.
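+ *
+ * Example usage (the OID has to reference a pg_mv_statistic row
+ * with a built MCV list):
+ *
+ *     SELECT * FROM pg_mv_mcv_items(
+ *         (SELECT oid FROM pg_mv_statistic WHERE staname = 's1'));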
+ */
+PG_FUNCTION_INFO_V1(pg_mv_mcv_items);
+
+Datum
+pg_mv_mcv_items(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MCVList mcvlist;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ mcvlist = load_mv_mcvlist(PG_GETARG_OID(0));
+
+ funcctx->user_fctx = mcvlist;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = mcvlist->nitems;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples
+ * from raw C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MCVList mcvlist;
+ MCVItem item;
+
+ mcvlist = (MCVList)funcctx->user_fctx;
+
+ Assert(call_cntr < mcvlist->nitems);
+
+ item = mcvlist->items[call_cntr];
+
+ stakeys = find_mv_attnums(PG_GETARG_OID(0), &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple.
+ * This should be an array of C strings which will
+ * be processed later by the type input functions.
+ */
+ values = (char **) palloc(4 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+
+ /* frequency */
+ values[3] = (char *) palloc(64 * sizeof(char));
+
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * mcvlist->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * mcvlist->ndimensions);
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* item ID */
+
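+ /*
+ * Build the text arrays, e.g. "{1, 2, 3}" for the values and
+ * "{f, f, f}" for the NULL flags, one element per dimension.
+ */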
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ Datum val, valout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == mcvlist->ndimensions-1)
+ format = "%s, %s}";
+
+ val = item->values[i];
+ valout = FunctionCall1(&fmgrinfo[i], val);
+
+ snprintf(buff, 1024, format, values[1], DatumGetPointer(valout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], item->isnull[i] ? "t" : "f");
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[3], 64, "%f", item->frequency); /* frequency */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index f6d60ad..cd0ed01 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2109,8 +2109,9 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, staname, stakeys,\n"
- " deps_enabled,\n"
- " deps_built,\n"
+ " deps_enabled, mcv_enabled,\n"
+ " deps_built, mcv_built,\n"
+ " mcv_max_items,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
@@ -2128,6 +2129,8 @@ describeOneTableDetails(const char *schemaname,
printTableAddFooter(&cont, _("Statistics:"));
for (i = 0; i < tuples; i++)
{
+ bool first = true;
+
printfPQExpBuffer(&buf, " ");
/* statistics name */
@@ -2135,10 +2138,22 @@ describeOneTableDetails(const char *schemaname,
/* options */
if (!strcmp(PQgetvalue(result, i, 3), "t"))
- appendPQExpBuffer(&buf, "(dependencies)");
+ {
+ appendPQExpBuffer(&buf, "(dependencies");
+ first = false;
+ }
+
+ if (!strcmp(PQgetvalue(result, i, 4), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", mcv");
+ else
+ appendPQExpBuffer(&buf, "(mcv");
+ first = false;
+ }
- appendPQExpBuffer(&buf, " ON (%s)",
- PQgetvalue(result, i, 7));
+ appendPQExpBuffer(&buf, ") ON (%s)",
+ PQgetvalue(result, i, 9));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index 8c33a92..7be6223 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -36,15 +36,21 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
+ bool mcv_enabled; /* build MCV list? */
+
+ /* MCV size */
+ int32 mcv_max_items; /* max MCV items */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
+ bool mcv_built; /* MCV list was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
+ bytea stamcv; /* MCV list (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -60,12 +66,17 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_attrdef
* ----------------
*/
-#define Natts_pg_mv_statistic 6
+
+#define Natts_pg_mv_statistic 10
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_deps_enabled 3
-#define Anum_pg_mv_statistic_deps_built 4
-#define Anum_pg_mv_statistic_stakeys 5
-#define Anum_pg_mv_statistic_stadeps 6
+#define Anum_pg_mv_statistic_mcv_enabled 4
+#define Anum_pg_mv_statistic_mcv_max_items 5
+#define Anum_pg_mv_statistic_deps_built 6
+#define Anum_pg_mv_statistic_mcv_built 7
+#define Anum_pg_mv_statistic_stakeys 8
+#define Anum_pg_mv_statistic_stadeps 9
+#define Anum_pg_mv_statistic_stamcv 10
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 85c638d..b16f2a9 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2743,6 +2743,10 @@ DATA(insert OID = 3998 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0
DESCR("multivariate stats: functional dependencies info");
DATA(insert OID = 3999 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
DESCR("multivariate stats: functional dependencies show");
+DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_mcvlist_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: MCV list info");
+DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
+DESCR("details about MCV list items");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index baa0c88..7f2dc8a 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -592,9 +592,11 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
+ bool mcv_enabled; /* MCV list enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
+ bool mcv_built; /* MCV list built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 02a7dda..b028192 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -50,30 +50,89 @@ typedef MVDependenciesData* MVDependencies;
#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
/*
+ * Multivariate MCV (most-common value) lists
+ *
+ * A straightforward extension of MCV items - i.e. a list (array) of
+ * combinations of attribute values, together with a frequency and
+ * null flags.
+ */
+typedef struct MCVItemData {
+ double frequency; /* frequency of this combination */
+ bool *isnull; /* flags of NULL values (up to 32 columns) */
+ Datum *values; /* variable-length (ndimensions) */
+} MCVItemData;
+
+typedef MCVItemData *MCVItem;
+
+/* multivariate MCV list - essentially an array of MCV items */
+typedef struct MCVListData {
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of MCV list (BASIC) */
+ uint32 ndimensions; /* number of dimensions */
+ uint32 nitems; /* number of MCV items in the array */
+ MCVItem *items; /* array of MCV items */
+} MCVListData;
+
+typedef MCVListData *MCVList;
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_MCV_MAGIC 0xE1A651C2 /* marks serialized bytea */
+#define MVSTAT_MCV_TYPE_BASIC 1 /* basic MCV list type */
+
+/*
+ * Limits used for mcv_max_items option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_MCVLIST_MIN_ITEMS, and we cannot
+ * have more than MVSTAT_MCVLIST_MAX_ITEMS items.
+ *
+ * This is just a boundary for the 'max' threshold - the actual list
+ * may of course contain fewer items than MVSTAT_MCVLIST_MIN_ITEMS.
+ */
+#define MVSTAT_MCVLIST_MIN_ITEMS 128 /* min items in MCV list */
+#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
*/
MVDependencies load_mv_dependencies(Oid mvoid);
+MCVList load_mv_mcvlist(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
+bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
+MCVList deserialize_mv_mcvlist(bytea * data);
+
+/*
+ * Returns index of the attribute number within the vector (i.e. a
+ * dimension within the stats).
+ */
+int mv_get_index(AttrNumber varattno, int2vector * stakeys);
+
+int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
/* FIXME this probably belongs somewhere else (not to operations stats) */
extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_mcv_items(PG_FUNCTION_ARGS);
MVDependencies
-build_mv_dependencies(int numrows, HeapTuple *rows,
- int2vector *attrs,
- VacAttrStats **stats);
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats);
+
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered);
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
- int natts, VacAttrStats **vacattrstats);
+ int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, int2vector *attrs);
+void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats);
#endif
diff --git a/src/test/regress/expected/mv_mcv.out b/src/test/regress/expected/mv_mcv.out
new file mode 100644
index 0000000..4958390
--- /dev/null
+++ b/src/test/regress/expected/mv_mcv.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s1 ON mcv_list (unknown_column) WITH (mcv);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s1 ON mcv_list (a) WITH (mcv);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s1 ON mcv_list (a, a) WITH (mcv);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON mcv_list (a, a, b) WITH (mcv);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing MCV statistics
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (dependencies, max_mcv_items 200);
+ERROR: option 'mcv' is required by other options(s)
+-- invalid max_mcv_items value / too low
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items 10);
+ERROR: max number of MCV items must be at least 128
+-- invalid max_mcv_items value / too high
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items 10000);
+ERROR: max number of MCV items is 8192
+-- correct command
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s2 ON mcv_list (a, b, c) WITH (mcv);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s3 ON mcv_list (a, b, c, d) WITH (mcv);
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1200
+(1 row)
+
+DROP TABLE mcv_list;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 428b1e8..50715db 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1369,7 +1369,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
c.relname AS tablename,
s.stakeys AS attnums,
length(s.stadeps) AS depsbytes,
- pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
+ length(s.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 81484f1..838c12b 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,4 +112,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies
+test: mv_dependencies mv_mcv
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 14ea574..d97a0ec 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -162,3 +162,4 @@ test: xml
test: event_trigger
test: stats
test: mv_dependencies
+test: mv_mcv
diff --git a/src/test/regress/sql/mv_mcv.sql b/src/test/regress/sql/mv_mcv.sql
new file mode 100644
index 0000000..16d82cf
--- /dev/null
+++ b/src/test/regress/sql/mv_mcv.sql
@@ -0,0 +1,178 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s1 ON mcv_list (unknown_column) WITH (mcv);
+
+-- single column
+CREATE STATISTICS s1 ON mcv_list (a) WITH (mcv);
+
+-- single column, duplicated
+CREATE STATISTICS s1 ON mcv_list (a, a) WITH (mcv);
+
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON mcv_list (a, a, b) WITH (mcv);
+
+-- unknown option
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (unknown_option);
+
+-- missing MCV statistics
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (dependencies, max_mcv_items 200);
+
+-- invalid max_mcv_items value / too low
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items 10);
+
+-- invalid max_mcv_items value / too high
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items 10000);
+
+-- correct command
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+
+DROP TABLE mcv_list;
+
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s2 ON mcv_list (a, b, c) WITH (mcv);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mcv_list;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s3 ON mcv_list (a, b, c, d) WITH (mcv);
+
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+DROP TABLE mcv_list;
--
2.1.0
0005-multivariate-histograms.patch
>From 133193c6e1546d2b3a595c04c0213400ea3c7990 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 20:18:24 +0100
Subject: [PATCH 5/7] multivariate histograms
- extends the pg_mv_statistic catalog (adds 'hist' fields)
- builds the histograms during ANALYZE
- adds simple estimation while planning queries
Includes regression tests mostly equal to those for functional
dependencies / MCV lists.
---
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/statscmds.c | 44 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 718 ++++++++-
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/common.c | 37 +-
src/backend/utils/mvstats/histogram.c | 2316 ++++++++++++++++++++++++++++
src/bin/psql/describe.c | 17 +-
src/include/catalog/pg_mv_statistic.h | 25 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 136 +-
src/test/regress/expected/mv_histogram.out | 207 +++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_histogram.sql | 176 +++
18 files changed, 3662 insertions(+), 39 deletions(-)
create mode 100644 src/backend/utils/mvstats/histogram.c
create mode 100644 src/test/regress/expected/mv_histogram.out
create mode 100644 src/test/regress/sql/mv_histogram.sql
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6482aa7..cb6eff3 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -167,7 +167,9 @@ CREATE VIEW pg_mv_stats AS
length(S.stadeps) as depsbytes,
pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
length(S.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo,
+ length(S.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index f730253..68e1685 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -135,12 +135,15 @@ CreateStatistics(CreateStatsStmt *stmt)
/* by default build nothing */
bool build_dependencies = false,
- build_mcv = false;
+ build_mcv = false,
+ build_histogram = false;
- int32 max_mcv_items = -1;
+ int32 max_buckets = -1,
+ max_mcv_items = -1;
/* options required because of other options */
- bool require_mcv = false;
+ bool require_mcv = false,
+ require_histogram = false;
Assert(IsA(stmt, CreateStatsStmt));
@@ -220,6 +223,29 @@ CreateStatistics(CreateStatsStmt *stmt)
MVSTAT_MCVLIST_MAX_ITEMS)));
}
+ else if (strcmp(opt->defname, "histogram") == 0)
+ build_histogram = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_buckets") == 0)
+ {
+ max_buckets = defGetInt32(opt);
+
+ /* this option requires 'histogram' to be enabled */
+ require_histogram = true;
+
+ /* sanity check */
+ if (max_buckets < MVSTAT_HIST_MIN_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is %d",
+ MVSTAT_HIST_MIN_BUCKETS)));
+
+ else if (max_buckets > MVSTAT_HIST_MAX_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("maximum number of buckets is %d",
+ MVSTAT_HIST_MAX_BUCKETS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -228,10 +254,10 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv))
+ if (! (build_dependencies || build_mcv || build_histogram))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -239,6 +265,11 @@ CreateStatistics(CreateStatsStmt *stmt)
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("option 'mcv' is required by other options(s)")));
+ if (require_histogram && (! build_histogram))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option 'histogram' is required by other options(s)")));
+
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
stakeys = buildint2vector(attnums, numcols);
@@ -259,11 +290,14 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+ values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
+ values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
+ nulls[Anum_pg_mv_statistic_stahist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 0f58199..46463cc 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1949,10 +1949,12 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
WRITE_BOOL_FIELD(mcv_enabled);
+ WRITE_BOOL_FIELD(hist_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
WRITE_BOOL_FIELD(mcv_built);
+ WRITE_BOOL_FIELD(hist_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index f122045..6c99f02 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -49,6 +49,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
+#define MV_CLAUSE_TYPE_HIST 0x04
static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
@@ -73,6 +74,8 @@ static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
List *clauses, MVStatisticInfo *mvstats,
bool *fullmatch, Selectivity *lowsel);
+static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -80,6 +83,12 @@ static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
Selectivity *lowsel, bool *fullmatch,
bool is_or);
+static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, List *clauses,
@@ -114,6 +123,7 @@ static Bitmapset * get_varattnos(Node * node, Index relid);
#define UPDATE_RESULT(m,r,isor) \
(m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -304,7 +314,7 @@ clauselist_selectivity(PlannerInfo *root,
* Check that there are statistics with MCV list. If not, we don't
* need to waste time with the optimization.
*/
- if (has_stats(stats, MV_CLAUSE_TYPE_MCV))
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
{
/*
* Recollect attributes from mv-compatible clauses (maybe we've
@@ -312,7 +322,7 @@ clauselist_selectivity(PlannerInfo *root,
* From now on we're only interested in MCV-compatible clauses.
*/
mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
- MV_CLAUSE_TYPE_MCV);
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
/*
* If there still are at least two columns, we'll try to select
@@ -331,7 +341,7 @@ clauselist_selectivity(PlannerInfo *root,
/* split the clauselist into regular and mv-clauses */
clauses = clauselist_mv_split(root, sjinfo, clauses,
varRelid, &mvclauses, mvstat,
- MV_CLAUSE_TYPE_MCV);
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
/* we've chosen the histogram to match the clauses */
Assert(mvclauses != NIL);
@@ -1098,6 +1108,7 @@ static Selectivity
clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
{
bool fullmatch = false;
+ Selectivity s1 = 0.0, s2 = 0.0;
/*
* Lowest frequency in the MCV list (may be used as an upper bound
@@ -1111,9 +1122,24 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
* MCV/histogram evaluation).
*/
- /* Evaluate the MCV selectivity */
- return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ /* Evaluate the MCV first. */
+ s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
&fullmatch, &mcv_low);
+
+ /*
+ * If we got a full equality match on the MCV list, we're done (and
+ * the estimate is pretty good).
+ */
+ if (fullmatch && (s1 > 0.0))
+ return s1;
+
+ /* FIXME if (fullmatch) without matching MCV item, use the mcv_low
+ * selectivity as upper bound */
+
+ s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+
+ /* TODO clamp to <= 1.0 (or more strictly, when possible) */
+ return s1 + s2;
}
/*
@@ -1255,7 +1281,7 @@ choose_mv_statistics(List *stats, Bitmapset *attnums)
int numattrs = attrs->dim1;
/* skip dependencies-only stats */
- if (! info->mcv_built)
+ if (! (info->mcv_built || info->hist_built))
continue;
/* count columns covered by the histogram */
@@ -1415,7 +1441,6 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
bool ok;
/* is it 'variable op constant' ? */
-
ok = (bms_membership(clause_relids) == BMS_SINGLETON) &&
(is_pseudo_constant_clause_relids(lsecond(expr->args),
right_relids) ||
@@ -1465,10 +1490,10 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
case F_SCALARLTSEL:
case F_SCALARGTSEL:
/* not compatible with functional dependencies */
- if (types & MV_CLAUSE_TYPE_MCV)
+ if (types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
{
*attnums = bms_add_member(*attnums, var->varattno);
- return (types & MV_CLAUSE_TYPE_MCV);
+ return (types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
}
return false;
@@ -1796,6 +1821,9 @@ has_stats(List *stats, int type)
if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
return true;
+
+ if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
+ return true;
}
return false;
@@ -2612,3 +2640,675 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
return nmatches;
}
+
+/*
+ * Estimate selectivity of clauses using a histogram.
+ *
+ * If there's no histogram for the stats, the function returns 0.0.
+ *
+ * The general idea of this method is similar to how MCV lists are
+ * processed, except that this introduces the concept of a partial
+ * match (MCV only works with full match / mismatch).
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all buckets as 'full match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the buckets
+ * 4) skip buckets that are already 'no match'
+ * 5) check clause for buckets that still match (at least partially)
+ * 6) sum frequencies for buckets to get selectivity
+ *
+ * Unlike MCV lists, histograms have a concept of a partial match. In
+ * that case we use 1/2 the bucket, to minimize the average error. The
+ * MV histograms are usually less detailed than the per-column ones,
+ * meaning the sum is often quite high (thanks to combining a lot of
+ * "partially hit" buckets).
+ *
+ * Maybe we could use per-bucket information with number of distinct
+ * values it contains (for each dimension), and then use that to correct
+ * the estimate (so with 10 distinct values, we'd use 1/10 of the bucket
+ * frequency). We might also scale the value depending on the actual
+ * ndistinct estimate (not just the values observed in the sample).
+ *
+ * Another option would be to multiply the selectivities, i.e. if we get
+ * 'partial match' for a bucket for multiple conditions, we might use
+ * 0.5^k (where k is the number of conditions), instead of 0.5. This
+ * probably does not minimize the average error, though.
+ *
+ * TODO This might use a similar shortcut to MCV lists - count buckets
+ * marked as partial/full match, and terminate once this drops to 0.
+ * Not sure if it's really worth it - for MCV lists a situation like
+ * this is not uncommon, but for histograms it's not that clear.
+ */
+static Selectivity
+clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats)
+{
+ int i;
+ Selectivity s = 0.0;
+ Selectivity u = 0.0;
+
+ int nmatches = 0;
+ char *matches = NULL;
+
+ MVSerializedHistogram mvhist = NULL;
+
+ /* there's no histogram */
+ if (! mvstats->hist_built)
+ return 0.0;
+
+ /* There may be no histogram in the stats (check hist_built flag) */
+ mvhist = load_mv_histogram(mvstats->mvoid);
+
+ Assert (mvhist != NULL);
+ Assert (clauses != NIL);
+ Assert (list_length(clauses) >= 2);
+
+ /*
+ * Bitmap of bucket matches (mismatch, partial, full). By default
+ * all buckets fully match, and the clauses then eliminate them.
+ */
+ matches = palloc0(sizeof(char) * mvhist->nbuckets);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+
+ nmatches = mvhist->nbuckets;
+
+ /* build the match bitmap */
+ update_match_bitmap_histogram(root, clauses,
+ mvstats->stakeys, mvhist,
+ nmatches, matches, false);
+
+ /* now, walk through the buckets and sum the selectivities */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * Find out what part of the data is covered by the histogram,
+ * so that we can 'scale' the selectivity properly (e.g. when
+ * only 50% of the sample got into the histogram, and the rest
+ * is in a MCV list).
+ *
+ * TODO This might be handled by keeping a global "frequency"
+ * for the whole histogram, which might save us some time
+ * spent accessing the not-matching part of the histogram.
+ * Although it's likely in a cache, so it's very fast.
+ */
+ u += mvhist->buckets[i]->ntuples;
+
+ if (matches[i] == MVSTATS_MATCH_FULL)
+ s += mvhist->buckets[i]->ntuples;
+ else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+ s += 0.5 * mvhist->buckets[i]->ntuples;
+ }
+
+ /* release the allocated bitmap and deserialized histogram */
+ pfree(matches);
+ pfree(mvhist);
+
+ return s * u;
+}
+
+/*
+ * Evaluate clauses using the histogram, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * Note: This is not a simple bitmap in the sense that there are more
+ * than two possible values for each item - no match, partial
+ * match and full match. So we need 2 bits per item.
+ *
+ * TODO This works with 'bitmap' where each item is represented as a
+ * char, which is slightly wasteful. Instead, we could use a bitmap
+ * with 2 bits per item, reducing the size to ~1/4. By using values
+ * 0, 1 and 3 (instead of 0, 1 and 2), the operations (merging etc.)
+ * might be performed just like for simple bitmap by using & and |,
+ * which might be faster than min/max.
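+ *
+ * For example, with the proposed encoding (0 = no match, 1 = partial
+ * match, 3 = full match), AND-merge becomes (a & b), e.g.
+ * 3 & 1 = 1 (partial), and OR-merge becomes (a | b), e.g.
+ * 3 | 1 = 3 (full), i.e. the same results as Min() and Max().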
+ */
+static int
+update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ /*
+ * Used for caching function calls, only once per deduplicated value.
+ *
+ * We may have up to (2 * nbuckets) values per dimension. It's
+ * probably overkill, but let's allocate that once for all clauses,
+ * to minimize overhead.
+ *
+ * Also, we only need two bits per value, but this allocates a byte
+ * per value. Might be worth optimizing.
+ *
+ * 0x00 - not yet called
+ * 0x01 - called, result is 'false'
+ * 0x03 - called, result is 'true'
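+ *
+ * so "already called" is (cache != 0x00), and the cached boolean
+ * result is extracted as (cache & 0x02).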
+ */
+ char *callcache = palloc(mvhist->nbuckets);
+
+ Assert(mvhist != NULL);
+ Assert(mvhist->nbuckets > 0);
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mvhist->nbuckets);
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+
+ /* loop through the clauses and do the estimation */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* it's either an operator clause, a NullTest, or an AND/OR clause */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ FmgrInfo opproc; /* operator */
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ /* reset the cache (per clause) */
+ memset(callcache, 0, mvhist->nbuckets);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+ FmgrInfo ltproc;
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ /*
+ * TODO Fetch only when really needed (probably for equality only)
+ *
+ * TODO Technically either lt/gt is sufficient.
+ *
+ * FIXME The code in analyze.c creates histograms only for types
+ * with enough ordering (by calling get_sort_group_operators).
+ * Is this the same assumption, i.e. are we certain that we
+ * get the ltproc/gtproc every time we ask? Or are there types
+ * where get_sort_group_operators returns ltopr and here we
+ * get nothing?
+ */
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_EQ_OPR | TYPECACHE_LT_OPR
+ | TYPECACHE_GT_OPR);
+
+ /* lookup dimension for the attribute */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), <proc);
+
+ /*
+ * Check this for all buckets that still have "true" in the bitmap
+ *
+ * We already know the clauses use suitable operators (because that's
+ * how we filtered them).
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ bool tmp;
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /* histogram boundaries */
+ Datum minval, maxval;
+
+ /* values from the call cache */
+ char mincached, maxcached;
+
+ /*
+ * For AND-lists, we can also mark NULL buckets as 'no match'
+ * (and then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && bucket->nullsonly[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match).
+ * We can't really do anything about the MATCH_PARTIAL buckets.
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* lookup the values and cache of function calls */
+ minval = mvhist->values[idx][bucket->min[idx]];
+ maxval = mvhist->values[idx][bucket->max[idx]];
+
+ mincached = callcache[bucket->min[idx]];
+ maxcached = callcache[bucket->max[idx]];
+
+ /*
+ * TODO Maybe it's possible to add here a similar optimization
+ * as for the MCV lists:
+ *
+ * (nmatches == 0) && AND-list => all eliminated (FALSE)
+ * (nmatches == N) && OR-list => all eliminated (TRUE)
+ *
+ * But it's more complex because of the partial matches.
+ */
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ *
+ * TODO I'm really unsure whether the handling of the 'isgt' flag (that is, clauses
+ * with reverse order of variable/constant) is correct. I wouldn't
+ * be surprised if there was some mixup. Using the lt/gt operators
+ * instead of messing with the opproc could make it simpler.
+ * It would however be using a different operator than the query,
+ * although it's not any shadier than using the selectivity function
+ * as is done currently.
+ *
+ * FIXME Once the min/max values are deduplicated, we can easily minimize
+ * the number of calls to the comparator (assuming we keep the
+ * deduplicated structure). See the note on compression at MVBucket
+ * serialize/deserialize methods.
+ */
+ switch (oprrest)
+ {
+ case F_SCALARLTSEL: /* column < constant */
+
+ if (! isgt) /* (var < const) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ minval));
+
+ /*
+ * Update the cache, but with the inverse value, as we keep the
+ * cache for calls with (minval, constvalue).
+ */
+ callcache[bucket->min[idx]] = (tmp) ? 0x01 : 0x03;
+ }
+ else
+ tmp = !(mincached & 0x02); /* get call result from the cache (inverse) */
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the upper boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ maxval));
+
+ /*
+ * Update the cache, but with the inverse value, as we keep the
+ * cache for calls with (minval, constvalue).
+ */
+ callcache[bucket->max[idx]] = (tmp) ? 0x01 : 0x03;
+ }
+ else
+ tmp = !(maxcached & 0x02); /* extract the result (reverse) */
+
+ if (tmp) /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+
+ }
+ else /* (const < var) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ maxval,
+ cst->constvalue));
+
+ /* Update the cache. */
+ callcache[bucket->max[idx]] = (tmp) ? 0x03 : 0x01;
+ }
+ else
+ tmp = (maxcached & 0x02); /* extract the result */
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the lower boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ minval,
+ cst->constvalue));
+
+ /* Update the cache. */
+ callcache[bucket->min[idx]] = (tmp) ? 0x03 : 0x01;
+ }
+ else
+ tmp = (mincached & 0x02); /* extract the result */
+
+ if (tmp) /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ }
+ break;
+
+ case F_SCALARGTSEL: /* column > constant */
+
+ if (! isgt) /* (var > const) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ maxval));
+
+ /*
+ * Update the cache, but with the inverse value, as we keep the
+ * cache for calls with (val, constvalue).
+ */
+ callcache[bucket->max[idx]] = (tmp) ? 0x01 : 0x03;
+ }
+ else
+ tmp = !(maxcached & 0x02); /* extract the result */
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the lower boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ minval));
+
+ /*
+ * Update the cache, but with the inverse value, as we keep the
+ * cache for calls with (val, constvalue).
+ */
+ callcache[bucket->min[idx]] = (tmp) ? 0x01 : 0x03;
+ }
+ else
+ tmp = !(mincached & 0x02); /* extract the result */
+
+ if (tmp)
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ }
+ else /* (const > var) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in
+ * that case we can skip the bucket, because there's no overlap).
+ */
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ minval,
+ cst->constvalue));
+
+ /* Update the cache. */
+ callcache[bucket->min[idx]] = (tmp) ? 0x03 : 0x01;
+ }
+ else
+ tmp = (mincached & 0x02); /* extract the result */
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the upper boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ maxval,
+ cst->constvalue));
+
+ /* Update the cache. */
+ callcache[bucket->max[idx]] = (tmp) ? 0x03 : 0x01;
+ }
+ else
+ tmp = (maxcached & 0x02); /* extract the result */
+
+ if (tmp)
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ }
+ break;
+
+ case F_EQSEL:
+
+ /*
+ * We only check whether the value is within the bucket, using the lt/gt
+ * operators fetched from type cache.
+ *
+ * TODO We'll use the default 50% estimate, but that's probably way off
+ * if there are multiple distinct values. Consider tweaking this a
+ * somehow, e.g. using only a part inversely proportional to the
+ * estimated number of distinct values in the bucket.
+ *
+ * TODO This does not handle inclusion flags at the moment, thus counting
+ * some buckets twice (when hitting the boundary).
+ *
+ * TODO Optimization is that if max[i] == min[i], it's effectively a MCV
+ * item and we can count the whole bucket as a complete match (thus
+ * using 100% bucket selectivity and not just 50%).
+ *
+ * TODO Technically some buckets may "degenerate" into single-value
+ * buckets (not necessarily for all the dimensions) - maybe this
+ * is better than keeping a separate MCV list (multi-dimensional).
+ * Update: Actually, that's unlikely to be better than a separate
+ * MCV list for two reasons - first, it requires ~2x the space
+ * (because of storing lower/upper boundaries) and second because
+ * the buckets are ranges - depending on the partitioning algorithm
+ * it may not even degenerate into a (min=max) bucket. For example
+ * the current partitioning algorithm never does that.
+ */
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(<proc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ minval));
+
+ /* Update the cache. */
+ callcache[bucket->min[idx]] = (tmp) ? 0x03 : 0x01;
+ }
+ else
+ tmp = (mincached & 0x02); /* extract the result */
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(<proc,
+ DEFAULT_COLLATION_OID,
+ maxval,
+ cst->constvalue));
+
+ /* Update the cache. */
+ callcache[bucket->max[idx]] = (tmp) ? 0x03 : 0x01;
+ }
+ else
+ tmp = (maxcached & 0x02); /* extract the result */
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+
+ break;
+ }
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the buckets and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining buckets that might possibly match.
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match)
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* if the clause mismatches the bucket, set it to MATCH_NONE */
+ if ((expr->nulltesttype == IS_NULL)
+ && (! bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+
+ else if ((expr->nulltesttype == IS_NOT_NULL) &&
+ (bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each bucket */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching buckets */
+ or_nmatches = mvhist->nbuckets;
+
+ /* by default none of the buckets matches the clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the OR-clauses */
+ or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ stakeys, mvhist,
+ or_nmatches, or_matches, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
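+ * E.g. AND-merging a FULL and a PARTIAL match yields PARTIAL,
+ * while OR-merging a NONE and a PARTIAL match also yields PARTIAL.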
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+
+ /* free the call cache */
+ pfree(callcache);
+
+#ifdef DEBUG_MVHIST
+ debug_histogram_matches(mvhist, matches);
+#endif
+
+ return nmatches;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 0da7ad9..9aded52 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -410,7 +410,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built || mvstat->mcv_built)
+ if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built)
{
info = makeNode(MVStatisticInfo);
@@ -420,10 +420,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
info->mcv_enabled = mvstat->mcv_enabled;
+ info->hist_enabled = mvstat->hist_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
info->mcv_built = mvstat->mcv_built;
+ info->hist_built = mvstat->hist_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index f9bf10c..9dbb3b6 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o mcv.o
+OBJS = common.o dependencies.o histogram.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index d1da714..ffb76f4 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -13,11 +13,11 @@
*
*-------------------------------------------------------------------------
*/
+#include "postgres.h"
+#include "utils/array.h"
#include "common.h"
-#include "utils/array.h"
-
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
int natts,
VacAttrStats **vacattrstats);
@@ -52,7 +52,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
- int numrows_filtered = 0;
+ MVHistogram histogram = NULL;
+ int numrows_filtered = numrows;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -95,8 +96,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+ /* build a multivariate histogram on the columns */
+ if ((numrows_filtered > 0) && (stat->hist_enabled))
+ histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
+ update_mv_stats(stat->mvoid, deps, mcvlist, histogram, attrs, stats);
}
}
@@ -176,6 +181,8 @@ list_mv_stats(Oid relid)
info->deps_built = stats->deps_built;
info->mcv_enabled = stats->mcv_enabled;
info->mcv_built = stats->mcv_built;
+ info->hist_enabled = stats->hist_enabled;
+ info->hist_built = stats->hist_built;
result = lappend(result, info);
}
@@ -190,7 +197,6 @@ list_mv_stats(Oid relid)
return result;
}
-
/*
* Find attnums of MV stats using the mvoid.
*/
@@ -236,9 +242,16 @@ find_mv_attnums(Oid mvoid, Oid *relid)
}
+/*
+ * FIXME This adds statistics, but we need to drop statistics when the
+ * table is dropped. Not sure what to do when a column is dropped.
+ * Either we can (a) remove all stats on that column, (b) remove
+ * the column from defined stats and force rebuild, (c) remove the
+ * column on next ANALYZE. Or maybe something else?
+ */
void
update_mv_stats(Oid mvoid,
- MVDependencies dependencies, MCVList mcvlist,
+ MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
@@ -271,22 +284,34 @@ update_mv_stats(Oid mvoid,
values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
}
+ if (histogram != NULL)
+ {
+ bytea * data = serialize_mv_histogram(histogram, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stahist-1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stahist - 1]
+ = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
+ replaces[Anum_pg_mv_statistic_stahist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
+ nulls[Anum_pg_mv_statistic_hist_built-1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_hist_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
+ values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
diff --git a/src/backend/utils/mvstats/histogram.c b/src/backend/utils/mvstats/histogram.c
new file mode 100644
index 0000000..933700f
--- /dev/null
+++ b/src/backend/utils/mvstats/histogram.c
@@ -0,0 +1,2316 @@
+/*-------------------------------------------------------------------------
+ *
+ * histogram.c
+ * POSTGRES multivariate histograms
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/histogram.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+#include <math.h>
+
+/*
+ * Multivariate histograms
+ * -----------------------
+ *
+ * Histograms are a collection of buckets, represented by n-dimensional
+ * rectangles. Each rectangle is delimited by a min/max value in each
+ * dimension, stored in an array, so that the bucket includes values
+ * fulfilling condition
+ *
+ * min[i] <= value[i] <= max[i]
+ *
+ * where 'i' is the dimension. In 1D this corresponds to a simple
+ * interval, in 2D to a rectangle, and in 3D to a block. If you can
+ * imagine this in 4D, congrats!
+ *
+ * In addition to the boundaries, each bucket tracks additional details:
+ *
+ * * frequency (fraction of tuples it matches)
+ * * whether the boundaries are inclusive or exclusive
+ * * whether the dimension contains only NULL values
+ * * number of distinct values in each dimension (for building)
+ *
+ * and possibly some additional information.
+ *
+ * We do expect to support multiple histogram types, with different
+ * features etc. The 'type' field is used to identify those types.
+ * Technically some histogram types might use completely different
+ * bucket representation, but that's not expected at the moment.
+ *
+ * Although the current implementation builds non-overlapping buckets,
+ * the code does not (and should not) rely on the non-overlapping
+ * nature - there are interesting types of histograms / histogram
+ * building algorithms producing overlapping buckets.
+ *
+ *
+ * NULL handling (create_null_buckets)
+ * -----------------------------------
+ * Another thing worth mentioning is handling of NULL values. It would
+ * be quite difficult to work with buckets containing NULL and non-NULL
+ * values for a single dimension. To work around this, the initial step
+ * in building a histogram is building a set of 'NULL-buckets', i.e.
+ * buckets with one or more NULL-only dimensions.
+ *
+ * After that, no buckets are mixing NULL and non-NULL values in one
+ * dimension, and the actual histogram building starts. As that only
+ * splits the buckets into smaller ones, the resulting buckets can't
+ * mix NULL and non-NULL values either.
+ *
+ * The maximum number of NULL-buckets is determined by the number of
+ * attributes the histogram is built on. For N-dimensional histogram,
+ * the maximum number of NULL-buckets is 2^N. So for 8 attributes
+ * (which is the current value of MVSTATS_MAX_DIMENSIONS), there may be
+ * up to 256 NULL-buckets.
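+ *
+ * For example with 2 columns there may be up to four such buckets,
+ * one per combination of NULL/non-NULL dimensions: (NULL, NULL),
+ * (NULL, non-NULL), (non-NULL, NULL) and (non-NULL, non-NULL).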
+ *
+ * Those buckets are only built if needed - if there are no NULL values
+ * in the data, no such buckets are built.
+ *
+ *
+ * Estimating selectivity
+ * ----------------------
+ * With histograms, we always "match" a whole bucket, not individual
+ * rows (or values), irrespective of the type of clause. Therefore we
+ * can't use the optimizations for equality clauses, as in MCV lists.
+ *
+ * The current implementation uses histograms to estimate these types
+ * of clauses (think of WHERE conditions):
+ *
+ * (a) equality clauses WHERE (a = 1) AND (b = 2)
+ * (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ * (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ * (d) OR-clauses WHERE (a = 1) OR (b = 2)
+ *
+ * It's possible to add more clauses, for example:
+ *
+ * (e) multi-var clauses WHERE (a > b)
+ *
+ * and so on. These are tasks for the future, not yet implemented.
+ *
+ * When used on low-cardinality data, histograms usually perform
+ * considerably worse than MCV lists (which are a good fit for this
+ * kind of data). This is especially true on categorical data, where
+ * ordering of the values is mostly unrelated to meaning of the data,
+ * as proper ordering is crucial for histograms.
+ *
+ * On high-cardinality data the histograms are usually a better choice,
+ * because MCV lists can't represent the distribution accurately enough.
+ *
+ * By evaluating a clause on a bucket, we may get one of three results:
+ *
+ * (a) FULL_MATCH - The bucket definitely matches the clause.
+ *
+ * (b) PARTIAL_MATCH - The bucket matches the clause, but not
+ * necessarily all the tuples it represents.
+ *
+ * (c) NO_MATCH - The bucket definitely does not match the clause.
+ *
+ * This may be illustrated using a range [1, 5], which is essentially
+ * a 1-D bucket. With clause
+ *
+ * WHERE (a < 10) => FULL_MATCH (all range values are below
+ * 10, so the whole bucket matches)
+ *
+ * WHERE (a < 3) => PARTIAL_MATCH (there may be values matching
+ * the clause, but we don't know how many)
+ *
+ * WHERE (a < 0) => NO_MATCH (the whole range is above 0, so
+ * no values from the bucket can match)
+ *
+ * Some clauses may produce only some of those results - for example
+ * equality clauses may never produce FULL_MATCH as we always hit only
+ * part of the bucket (we can't match both boundaries at the same time).
+ * This results in less accurate estimates compared to MCV lists, where
+ * we can hit an MCV item exactly (there's no PARTIAL match in MCV).
+ *
+ * There are clauses that may not produce any PARTIAL_MATCH results.
+ * A nice example of that is 'IS [NOT] NULL' clause, which either
+ * matches the bucket completely (FULL_MATCH) or not at all (NO_MATCH),
+ * thanks to how the NULL-buckets are constructed.
+ *
+ * Computing the total selectivity estimate is trivial - simply sum
+ * selectivities from all the FULL_MATCH and PARTIAL_MATCH buckets (but
+ * multiply the PARTIAL_MATCH buckets by 0.5 to minimize average error).
+ *
+ *
+ * Serialization
+ * -------------
+ * After building, the histogram is serialized into a more efficient
+ * form (dedup boundary values etc.). See serialize_mv_histogram() for
+ * more details about how it's done.
+ *
+ * Serialized histograms are marked with 'magic' constant, to make it
+ * easier to check the bytea value really is a serialized histogram.
+ *
+ * In the serialized form, values for each dimension are deduplicated,
+ * and referenced using an uint16 index. This saves a lot of space,
+ * because every time we split a bucket, we introduce a single new
+ * boundary value (to split the bucket by the selected dimension), but
+ * we actually copy all the boundary values for all dimensions. So for
+ * a histogram with 4 dimensions and 1000 buckets, we do have
+ *
+ * 1000 * 4 * 2 = 8000
+ *
+ * boundary values, but many of them are actually duplicated because
+ * the histogram started with a single bucket (8 boundary values) and
+ * then there were 999 splits (each introducing 1 new value):
+ *
+ * 8 + 999 = 1007
+ *
+ * So that's quite a large difference. Let's assume the Datum values are
+ * 8 bytes each. Storing the raw histogram would take ~ 64 kB, while
+ * with deduplication it's only ~18 kB.
+ *
+ * The difference may be removed by the transparent bytea compression,
+ * but the deduplication is also used to optimize the estimation. It's
+ * possible to process the deduplicated values, and then use this as
+ * a cache to minimize the actual function calls while checking the
+ * buckets. This significantly reduces the number of calls to the
+ * (often quite expensive) operator functions etc.
+ *
+ *
+ * The current limit on number of buckets (16384) is mostly arbitrary,
+ * but set so that it makes sure we don't exceed the number of distinct
+ * values indexable by uint16. In practice we could handle more buckets,
+ * because we index each dimension independently, and we do the splits
+ * over multiple dimensions.
+ *
+ * Histograms with more than 16k buckets are quite expensive to build
+ * and process, so the current limit is somewhat reasonable.
+ *
+ * The actual number of buckets is also related to statistics target,
+ * because we require MIN_BUCKET_ROWS (10) tuples per bucket before
+ * a split, so we can't have more than (2 * 300 * target / 10) buckets.
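+ * With the default statistics target (100) this means at most
+ * 2 * 300 * 100 / 10 = 6000 buckets.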
+ *
+ *
+ * TODO Maybe the distinct stats (both for combination of all columns
+ * and for combinations of various subsets of columns) should be
+ * moved to a separate structure (next to histogram/MCV/...) to
+ * make it useful even without a histogram computed etc.
+ *
+ * This would actually make mvcoeff (proposed by Kyotaro Horiguchi
+ * in [1]) possible. Seems like a good way to estimate GROUP BY
+ * cardinality, and also some other cases, pointed out by Kyotaro:
+ *
+ * [1] http://www.postgresql.org/message-id/20150515.152936.83796179.horiguchi.kyotaro@lab.ntt.co.jp
+ *
+ * This is not implemented at the moment, though. Also, Kyotaro's
+ * patch only works with pairs of columns, but maybe tracking all
+ * the combinations would be useful to handle more complex
+ * conditions. It only seems to handle equalities, though (but for
+ * GROUP BY estimation that's not a big deal).
+ */
+
+static MVBucket create_initial_mv_bucket(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+static MVBucket select_bucket_to_partition(int nbuckets, MVBucket * buckets);
+
+static MVBucket partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues);
+
+static MVBucket copy_mv_bucket(MVBucket bucket, uint32 ndimensions);
+
+static void update_bucket_ndistinct(MVBucket bucket, int2vector *attrs,
+ VacAttrStats ** stats);
+
+static void update_dimension_ndistinct(MVBucket bucket, int dimension,
+ int2vector *attrs,
+ VacAttrStats ** stats,
+ bool update_boundaries);
+
+static void create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats);
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Each serialized bucket needs to store (in this order):
+ *
+ * - number of tuples (float)
+ * - min inclusive flags (ndim * sizeof(bool))
+ * - max inclusive flags (ndim * sizeof(bool))
+ * - null dimension flags (ndim * sizeof(bool))
+ * - min boundary indexes (ndim * sizeof(uint16))
+ * - max boundary indexes (ndim * sizeof(uint16))
+ *
+ * So in total (the macro reserves a bit of slack in the index part):
+ *
+ * ndims * (4 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float)
+ */
+#define BUCKET_SIZE(ndims) \
+ (ndims * (4 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float))
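+
+/* e.g. BUCKET_SIZE(4) = 4 * (4 * 2 + 3 * 1) + 4 = 48 bytes per bucket */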
+
+/* pointers into a flat serialized bucket of BUCKET_SIZE(n) bytes */
+#define BUCKET_NTUPLES(b) ((float*)b)
+#define BUCKET_MIN_INCL(b,n) ((bool*)(b + sizeof(float)))
+#define BUCKET_MAX_INCL(b,n) (BUCKET_MIN_INCL(b,n) + n)
+#define BUCKET_NULLS_ONLY(b,n) (BUCKET_MAX_INCL(b,n) + n)
+#define BUCKET_MIN_INDEXES(b,n) ((uint16*)(BUCKET_NULLS_ONLY(b,n) + n))
+#define BUCKET_MAX_INDEXES(b,n) ((BUCKET_MIN_INDEXES(b,n) + n))
+
+/* can't split bucket with less than 10 rows */
+#define MIN_BUCKET_ROWS 10
+
+/*
+ * Data used while building the histogram.
+ */
+typedef struct HistogramBuildData {
+
+ float ndistinct; /* frequency of distinct values */
+
+ HeapTuple *rows; /* array of sample rows */
+ uint32 numrows; /* number of sample rows (array size) */
+
+ /*
+ * Number of distinct values in each dimension. This is used when
+ * building the histogram (and is not serialized/deserialized).
+ */
+ uint32 *ndistincts;
+
+} HistogramBuildData;
+
+typedef HistogramBuildData *HistogramBuild;
+
+/*
+ * Building a multivariate histogram. In short, it first creates a
+ * single bucket containing all the rows, and then repeatedly splits
+ * it, each time searching for the bucket / dimension most in need of
+ * a split.
+ *
+ * The current criterion is rather simple, chosen so that the algorithm
+ * produces buckets with about equal frequency and regular size.
+ *
+ * See the discussion at select_bucket_to_partition and partition_bucket
+ * for more details about the algorithm.
+ *
+ * The current algorithm works like this:
+ *
+ * build NULL-buckets (create_null_buckets)
+ *
+ * while [not reaching maximum number of buckets]
+ *
+ * choose bucket to partition (largest bucket)
+ * if no bucket to partition
+ * terminate the algorithm
+ *
+ * choose bucket dimension to partition (largest dimension)
+ * split the bucket into two buckets
+ */
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ int *ndistvalues;
+ Datum **distvalues;
+
+ MVHistogram histogram = (MVHistogram)palloc0(sizeof(MVHistogramData));
+
+ HeapTuple * rows_copy = (HeapTuple*)palloc0(numrows * sizeof(HeapTuple));
+ memcpy(rows_copy, rows, sizeof(HeapTuple) * numrows);
+
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ histogram->ndimensions = numattrs;
+
+ histogram->magic = MVSTAT_HIST_MAGIC;
+ histogram->type = MVSTAT_HIST_TYPE_BASIC;
+ histogram->nbuckets = 1;
+
+ /* create max buckets (better than repalloc for short-lived objects) */
+ histogram->buckets
+ = (MVBucket*)palloc0(MVSTAT_HIST_MAX_BUCKETS * sizeof(MVBucket));
+
+ /* create the initial bucket, covering the whole sample set */
+ histogram->buckets[0]
+ = create_initial_mv_bucket(numrows, rows_copy, attrs, stats);
+
+ /*
+ * Collect info on distinct values in each dimension (used later
+ * to select dimension to partition).
+ */
+ ndistvalues = (int*)palloc0(sizeof(int) * numattrs);
+ distvalues = (Datum**)palloc0(sizeof(Datum*) * numattrs);
+
+ for (i = 0; i < numattrs; i++)
+ {
+ int j;
+ int nvals;
+ Datum *tmp;
+
+ SortSupportData ssup;
+ StdAnalyzeData *mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ nvals = 0;
+ tmp = (Datum*)palloc0(sizeof(Datum) * numrows);
+
+ for (j = 0; j < numrows; j++)
+ {
+ bool isnull;
+
+ /* fetch the attribute value for this sample row */
+ Datum value = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &isnull);
+
+ if (isnull)
+ continue;
+
+ tmp[nvals++] = value;
+ }
+
+ /* do the sort and stuff only if there are non-NULL values */
+ if (nvals > 0)
+ {
+ /* sort the array of values */
+ qsort_arg((void *) tmp, nvals, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /* count distinct values */
+ ndistvalues[i] = 1;
+ for (j = 1; j < nvals; j++)
+ if (compare_scalars_simple(&tmp[j], &tmp[j-1], &ssup) != 0)
+ ndistvalues[i] += 1;
+
+ /* FIXME allocate only needed space (count ndistinct first) */
+ distvalues[i] = (Datum*)palloc0(sizeof(Datum) * ndistvalues[i]);
+
+ /* now collect distinct values into the array */
+ distvalues[i][0] = tmp[0];
+ ndistvalues[i] = 1;
+
+ for (j = 1; j < nvals; j++)
+ {
+ if (compare_scalars_simple(&tmp[j], &tmp[j-1], &ssup) != 0)
+ {
+ distvalues[i][ndistvalues[i]] = tmp[j];
+ ndistvalues[i] += 1;
+ }
+ }
+ }
+
+ pfree(tmp);
+ }
+
+ /*
+ * The initial bucket may contain NULL values, so we have to create
+ * buckets with NULL-only dimensions.
+ *
+ * FIXME We may need up to 2^ndims buckets - check that there are
+ * enough buckets (MVSTAT_HIST_MAX_BUCKETS >= 2^ndims).
+ */
+ create_null_buckets(histogram, 0, attrs, stats);
+
+ while (histogram->nbuckets < MVSTAT_HIST_MAX_BUCKETS)
+ {
+ MVBucket bucket = select_bucket_to_partition(histogram->nbuckets,
+ histogram->buckets);
+
+ /* no more buckets to partition */
+ if (bucket == NULL)
+ break;
+
+ histogram->buckets[histogram->nbuckets]
+ = partition_bucket(bucket, attrs, stats,
+ ndistvalues, distvalues);
+
+ histogram->nbuckets += 1;
+ }
+
+ /* finalize the frequencies etc. */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ HistogramBuild build_data
+ = ((HistogramBuild)histogram->buckets[i]->build_data);
+
+ /*
+ * The frequency has to be computed from the whole sample, in
+ * case some of the rows were used for MCV (and thus are missing
+ * from the histogram).
+ */
+ histogram->buckets[i]->ntuples
+ = (build_data->numrows * 1.0) / numrows_total;
+ }
+
+ return histogram;
+}
+
+/* fetch the histogram (as a bytea) from the pg_mv_statistic catalog */
+MVSerializedHistogram
+load_mv_histogram(Oid mvoid)
+{
+ bool isnull = false;
+ Datum histogram;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* Fetch the pg_mv_statistic tuple for the given statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->hist_enabled && mvstat->hist_built);
+#endif
+
+ histogram = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stahist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_histogram(DatumGetByteaP(histogram));
+}
+
+/* print some basic info about the histogram */
+Datum
+pg_mv_stats_histogram_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVSerializedHistogram hist = deserialize_mv_histogram(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nbuckets=%d", hist->nbuckets);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+
+/* used to pass context into bsearch() */
+static SortSupport ssup_private = NULL;
+
+/*
+ * Serialize the MV histogram into a bytea value. The basic algorithm
+ * is simple, and mostly mimics the MCV serialization:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ * (a) collect all (non-NULL) attribute values from all buckets
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all buckets
+ * (a) replace min/max values with indexes into the arrays
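+ *
+ * For example, if the deduplicated array for some dimension is
+ * {10, 20, 30}, a bucket with min = 20 and max = 30 in that
+ * dimension stores the indexes 1 and 2 instead of the two values.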
+ *
+ * Each attribute has to be processed separately, because we're mixing
+ * different datatypes, and we don't know what equality means for them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ * We'll use 16-bit (uint16) values for the indexes in step (3), as
+ * we don't allow more than 16k buckets in the histogram
+ * (MVSTAT_HIST_MAX_BUCKETS), so the number of distinct boundary
+ * values per dimension always fits into uint16. Most of the high
+ * index bytes will be 0x00, so the varlena compression should
+ * handle the result nicely anyway.
+ *
+ *
+ * Deduplication in serialization
+ * ------------------------------
+ * The deduplication is very effective and important here, because every
+ * time we split a bucket, we keep all the boundary values, except for
+ * the dimension that was used for the split. Another way to look at
+ * this is that each split introduces 1 new value (the value used to do
+ * the split). A histogram with M buckets was created by (M-1) splits
+ * of the initial bucket, and each bucket has 2*N boundary values. So
+ * assuming the initial bucket does not have any 'collapsed' dimensions,
+ * the number of distinct values is
+ *
+ * (2*N + (M-1))
+ *
+ * but the total number of boundary values is
+ *
+ * 2*N*M
+ *
+ * which is clearly much higher. For a histogram on two columns, with
+ * 1024 buckets, it's 1027 vs. 4096. Of course, we're not saving all
+ * the difference (because we'll use 32-bit indexes into the values).
+ * But with large values (e.g. stored as varlena), this saves a lot.
+ *
+ * An interesting feature is that the total number of distinct values
+ * does not really grow with the number of dimensions, except for the
+ * size of the initial bucket. After that it only depends on number of
+ * buckets (i.e. number of splits).
+ *
+ * XXX Of course this only holds for the current histogram building
+ * algorithm. Algorithms doing the splits differently (e.g.
+ * producing overlapping buckets) may behave differently.
+ *
+ * TODO This only confirms we can use the uint16 indexes. The worst
+ * that could happen is if all the splits happened by a single
+ * dimension. To exhaust the uint16 this would require ~64k
+ * splits (needs to be reflected in MVSTAT_HIST_MAX_BUCKETS).
+ *
+ * TODO We don't need to use a separate boolean for each flag, instead
+ * use a single char and set bits.
+ *
+ * TODO We might get a bit better compression by considering the actual
+ * data type length. The current implementation treats all data
+ * types passed by value as requiring 8B, but for INT it's actually
+ * just 4B etc.
+ *
+ * OTOH this is only related to the lookup table, and most of the
+ * space is occupied by the buckets (with uint16 indexes).
+ *
+ *
+ * Varlena compression
+ * -------------------
+ * This encoding may prevent automatic varlena compression (similarly
+ * to JSONB), because the first part of the serialized bytea will be
+ * an array of unique values (although sorted), and pglz decides
+ * whether to compress by trying to compress the first part (~1kB or
+ * so), which is likely to compress poorly due to the lack of
+ * repetition.
+ *
+ * One possible cure to that might be storing the buckets first, and
+ * then the deduplicated arrays. The buckets might be better suited
+ * for compression.
+ *
+ * On the other hand the encoding scheme is a context-aware compression,
+ * usually compressing to ~30% (or less, with large data types). So the
+ * lack of pglz compression may be OK.
+ *
+ * XXX But maybe we don't really want to compress this, to save on
+ * planning time?
+ *
+ * TODO Try storing the buckets / deduplicated arrays in reverse order,
+ * measure impact on compression.
+ *
+ *
+ * Deserialization
+ * ---------------
+ * The deserialization is currently implemented so that it reconstructs
+ * the histogram back into the same structures - this involves quite
+ * a few memcpy() and palloc() calls, but maybe we could create a special
+ * structure for the serialized histogram, and access the data directly,
+ * without the unpacking.
+ *
+ * Not only would it save some memory and CPU time, but it might
+ * actually work better with CPU caches (not polluting them).
+ *
+ * TODO Try to keep the compressed form, instead of deserializing it to
+ * MVHistogram/MVBucket.
+ *
+ *
+ * General TODOs
+ * -------------
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
+bytea *
+serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i = 0, j = 0;
+ Size total_length = 0;
+
+ bytea *output = NULL;
+ char *data = NULL;
+
+ int nbuckets = histogram->nbuckets;
+ int ndims = histogram->ndimensions;
+
+ /* allocated for serialized bucket data */
+ int bucketsize = BUCKET_SIZE(ndims);
+ char *bucket = palloc0(bucketsize);
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension separately */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /*
+ * Allocate space for all min/max values, including NULLs
+ * (we won't use them, but we don't know how many are there),
+ * and then collect all non-NULL values.
+ */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * nbuckets * 2);
+
+ for (j = 0; j < histogram->nbuckets; j++)
+ {
+ /* skip buckets where this dimension is NULL-only */
+ if (! histogram->buckets[j]->nullsonly[i])
+ {
+ values[i][counts[i]] = histogram->buckets[j]->min[i];
+ counts[i] += 1;
+
+ values[i][counts[i]] = histogram->buckets[j]->max[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* make sure we fit into uint16 */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typlen > 0)
+ /* byval or byref, but with fixed length (name, tid, ...) */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so simply strlen */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j]));
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * histogram, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nbuckets (4B)
+ * - info (ndim * sizeof(DimensionInfo)
+ * - arrays of values for each dimension
+ * - serialized buckets (nbuckets * bucketsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data (and buckets).
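+ *
+ * For example (illustrative), a 2-dimensional histogram with 1024
+ * buckets needs 20 + 2 * sizeof(DimensionInfo) bytes for the header,
+ * 1024 * BUCKET_SIZE(2) bytes for the buckets, and then the space
+ * for the deduplicated value arrays.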
+ */
+ total_length = (sizeof(int32) + offsetof(MVHistogramData, buckets)
+ + ndims * sizeof(DimensionInfo)
+ + nbuckets * bucketsize);
+
+ /* account for the deduplicated data */
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 10MB */
+ if (total_length > (10 * 1024 * 1024))
+ elog(ERROR, "serialized histogram exceeds 10MB (%ld > %d)",
+ total_length, (10 * 1024 * 1024));
+
+ /* allocate space for the serialized histogram list, set header */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write data */
+ data = VARDATA(output);
+
+ memcpy(data, histogram, offsetof(MVHistogramData, buckets));
+ data += offsetof(MVHistogramData, buckets);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typlen > 0)
+ {
+ /* passed by value or by reference, but fixed length */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the histogram buckets */
+ for (i = 0; i < nbuckets; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - bucketsize);
+
+ /* reset the values for each item */
+ memset(bucket, 0, bucketsize);
+
+ *BUCKET_NTUPLES(bucket) = histogram->buckets[i]->ntuples;
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! histogram->buckets[i]->nullsonly[j])
+ {
+ uint16 idx;
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ /* min boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->min[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MIN_INDEXES(bucket, ndims)[j] = idx;
+
+ /* max boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->max[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MAX_INDEXES(bucket, ndims)[j] = idx;
+ }
+ }
+
+ /* copy flags (nulls, min/max inclusive) */
+ memcpy(BUCKET_NULLS_ONLY(bucket, ndims),
+ histogram->buckets[i]->nullsonly, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MIN_INCL(bucket, ndims),
+ histogram->buckets[i]->min_inclusive, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MAX_INCL(bucket, ndims),
+ histogram->buckets[i]->max_inclusive, sizeof(bool) * ndims);
+
+ /* copy the item into the array */
+ memcpy(data, bucket, bucketsize);
+
+ data += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ /* FIXME free the values/counts arrays here */
+
+ return output;
+}
+
+/*
+ * Returns histogram in a partially-serialized form (keeps the boundary
+ * values deduplicated, so that it's possible to optimize the estimation
+ * part by caching function call results between buckets etc.).
+ */
+MVSerializedHistogram
+deserialize_mv_histogram(bytea * data)
+{
+ int i = 0, j = 0;
+
+ Size expected_size;
+ char *tmp = NULL;
+
+ MVSerializedHistogram histogram;
+ DimensionInfo *info;
+
+ int nbuckets;
+ int ndims;
+ int bucketsize;
+
+ /* temporary deserialization buffer */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVSerializedHistogramData,buckets))
+ elog(ERROR, "invalid histogram size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVSerializedHistogramData,buckets));
+
+ /* read the histogram header */
+ histogram
+ = (MVSerializedHistogram)palloc(sizeof(MVSerializedHistogramData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(histogram, tmp, offsetof(MVSerializedHistogramData, buckets));
+ tmp += offsetof(MVSerializedHistogramData, buckets);
+
+ if (histogram->magic != MVSTAT_HIST_MAGIC)
+ elog(ERROR, "invalid histogram magic %d (expected %dd)",
+ histogram->magic, MVSTAT_HIST_MAGIC);
+
+ if (histogram->type != MVSTAT_HIST_TYPE_BASIC)
+ elog(ERROR, "invalid histogram type %d (expected %dd)",
+ histogram->type, MVSTAT_HIST_TYPE_BASIC);
+
+ nbuckets = histogram->nbuckets;
+ ndims = histogram->ndimensions;
+ bucketsize = BUCKET_SIZE(ndims);
+
+ Assert((nbuckets > 0) && (nbuckets <= MVSTAT_HIST_MAX_BUCKETS));
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * What size do we expect with those parameters (it's incomplete,
+ * as we have yet to count the array sizes from the DimensionInfo
+ * records).
+ */
+ expected_size = offsetof(MVSerializedHistogramData,buckets) +
+ ndims * sizeof(DimensionInfo) +
+ (nbuckets * bucketsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /* now let's allocate a single buffer for all the values and counts */
+
+ bufflen = (sizeof(int) + sizeof(Datum*)) * ndims;
+ for (i = 0; i < ndims; i++)
+ {
+ /* don't allocate space for byval types, matching Datum */
+ if (! (info[i].typbyval && (info[i].typlen == sizeof(Datum))))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+ }
+
+ /* also, include space for the result, tracking the buckets */
+ bufflen += nbuckets * (
+ sizeof(MVSerializedBucket) + /* bucket pointer */
+ sizeof(MVSerializedBucketData)); /* bucket data */
+
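+ /*
+ * Buffer layout: per-dimension value counts, per-dimension pointer
+ * arrays, the Datum arrays themselves (only for types that can't
+ * reuse the serialized data in place), the array of bucket pointers,
+ * and finally the bucket structs.
+ */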
+ buff = palloc(bufflen);
+ ptr = buff;
+
+ histogram->nvalues = (int*)ptr;
+ ptr += (sizeof(int) * ndims);
+
+ histogram->values = (Datum**)ptr;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mcv_list(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. This needs to
+ * copy the pieces.
+ *
+ * TODO same as in MCV deserialization / consider moving to common.c
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ histogram->nvalues[i] = info[i].nvalues;
+
+ if (info[i].typbyval && info[i].typlen == sizeof(Datum))
+ {
+ /* passed by value / Datum - simply reuse the array */
+ histogram->values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ /* all the varlena data need a chunk from the buffer */
+ histogram->values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ if (info[i].typbyval)
+ {
+ /* passed by value, but smaller than Datum */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value from the serialized data */
+ memcpy(&histogram->values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ histogram->buckets = (MVSerializedBucket*)ptr;
+ ptr += (sizeof(MVSerializedBucket) * nbuckets);
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ MVSerializedBucket bucket = (MVSerializedBucket)ptr;
+ ptr += sizeof(MVSerializedBucketData);
+
+ bucket->ntuples = *BUCKET_NTUPLES(tmp);
+ bucket->nullsonly = BUCKET_NULLS_ONLY(tmp, ndims);
+ bucket->min_inclusive = BUCKET_MIN_INCL(tmp, ndims);
+ bucket->max_inclusive = BUCKET_MAX_INCL(tmp, ndims);
+
+ bucket->min = BUCKET_MIN_INDEXES(tmp, ndims);
+ bucket->max = BUCKET_MAX_INDEXES(tmp, ndims);
+
+ histogram->buckets[i] = bucket;
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+
+ tmp += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((tmp - VARDATA(data)) == expected_size);
+
+ /* we should exhaust the output buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ return histogram;
+}
+
+/*
+ * Build the initial bucket, which will be then split into smaller ones.
+ */
+static MVBucket
+create_initial_mv_bucket(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+ HistogramBuild data = NULL;
+
+ /* TODO allocate bucket as a single piece, including all the fields. */
+ MVBucket bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ Assert(numrows > 0);
+ Assert(rows != NULL);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /* allocate the per-dimension arrays */
+
+ /* flags for null-only dimensions */
+ bucket->nullsonly = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ bucket->min_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+ bucket->max_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* lower/upper boundaries */
+ bucket->min = (Datum*)palloc0(numattrs * sizeof(Datum));
+ bucket->max = (Datum*)palloc0(numattrs * sizeof(Datum));
+
+ /* build-data */
+ data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /* number of distinct values (per dimension) */
+ data->ndistincts = (uint32*)palloc0(numattrs * sizeof(uint32));
+
+ /* all the sample rows fall into the initial bucket */
+ data->numrows = numrows;
+ data->rows = rows;
+
+ bucket->build_data = data;
+
+ /*
+ * Update the number of distinct value combinations in the bucket
+ * (which we use when selecting the bucket to partition), and then
+ * the number of distinct values for each dimension (which we use
+ * when choosing which dimension to split).
+ */
+ update_bucket_ndistinct(bucket, attrs, stats);
+
+ /* Update ndistinct (and also set min/max) for all dimensions. */
+ for (i = 0; i < numattrs; i++)
+ update_dimension_ndistinct(bucket, i, attrs, stats, true);
+
+ return bucket;
+}
+
+/*
+ * Choose the bucket to partition next.
+ *
+ * The current criterion is rather simple, chosen so that the algorithm
+ * produces buckets with about equal frequency and regular size. We
+ * select the bucket with the highest number of distinct values, and
+ * then split it by the longest dimension.
+ *
+ * The distinct values are uniformly mapped to [0,1] interval, and this
+ * is used to compute length of the value range.
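+ *
+ * For illustration (hypothetical numbers): if a dimension has 1000
+ * deduplicated distinct values and the bucket boundaries fall at
+ * indexes 120 and 370 of that array, the normalized length of that
+ * dimension is (370 - 120) / 1000 = 0.25.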
+ *
+ * NOTE: This is not the same array used for deduplication, as this
+ * contains values for all the tuples from the sample, not just
+ * the boundary values.
+ *
+ * Returns either pointer to the bucket selected to be partitioned,
+ * or NULL if there are no buckets that may be split (i.e. all buckets
+ * contain a single distinct value).
+ *
+ * TODO Consider other partitioning criteria (v-optimal, maxdiff etc.).
+ * For example use the "bucket volume" (product of dimension
+ * lengths) to select the bucket.
+ *
+ * We need buckets containing about the same number of tuples (so
+ * about the same frequency), as that limits the error when we
+ * match the bucket partially (in that case we use 1/2 of the bucket).
+ *
+ * We also need buckets with "regular" size, i.e. not "narrow" in
+ * some dimensions and "wide" in the others, because that makes
+ * partial matches more likely and increases the estimation error,
+ * especially when the clauses match many buckets partially. This
+ * is especially serious for OR-clauses, because in that case any
+ * of them may add the bucket as a (partial) match. With AND-clauses
+ * all the clauses have to match the bucket, which makes this issue
+ * somewhat less pressing.
+ *
+ * For example this table:
+ *
+ * CREATE TABLE t AS SELECT i AS a, i AS b
+ * FROM generate_series(1,1000000) s(i);
+ * ALTER TABLE t ADD STATISTICS (histogram) ON (a,b);
+ * ANALYZE t;
+ *
+ * It's a very specific (and perhaps artificial) example, because
+ * every bucket always has exactly the same number of distinct
+ * values in all dimensions, which makes the partitioning tricky.
+ *
+ * Then:
+ *
+ * SELECT * FROM t WHERE a < 10 AND b < 10;
+ *
+ * is estimated to return ~120 rows, while in reality it returns 9.
+ *
+ * QUERY PLAN
+ * ----------------------------------------------------------------
+ * Seq Scan on t (cost=0.00..19425.00 rows=117 width=8)
+ * (actual time=0.185..270.774 rows=9 loops=1)
+ * Filter: ((a < 10) AND (b < 10))
+ * Rows Removed by Filter: 999991
+ *
+ * while the query using OR clauses is estimated like this:
+ *
+ * QUERY PLAN
+ * ----------------------------------------------------------------
+ * Seq Scan on t (cost=0.00..19425.00 rows=8100 width=8)
+ * (actual time=0.118..189.919 rows=9 loops=1)
+ * Filter: ((a < 10) OR (b < 10))
+ * Rows Removed by Filter: 999991
+ *
+ * which is clearly much worse. This happens because the histogram
+ * contains buckets like this:
+ *
+ * bucket 592 [3 30310] [30134 30593] => [0.000233]
+ *
+ * i.e. the length of "a" dimension is (30310-3)=30307, while the
+ * length of "b" is (30593-30134)=459. So the "b" dimension is much
+ * narrower than "a". Of course, there are buckets where "b" is the
+ * wider dimension.
+ *
+ * This is partially mitigated by selecting the "longest" dimension
+ * in partition_bucket() but that only happens after we already
+ * selected the bucket. So if we never select the bucket, we can't
+ * really fix it there.
+ *
+ * The other reason why this particular example behaves so poorly
+ * is due to the way we split the partition in partition_bucket().
+ * Currently we attempt to divide the bucket into two parts with
+ * the same number of sampled tuples (frequency), but that does not
+ * work well when all the tuples are squashed on one end of the
+ * bucket (e.g. exactly at the diagonal, as a=b). In that case we
+ * split the bucket into a tiny bucket on the diagonal, and a huge
+ * remaining part of the bucket, which is still going to be narrow
+ * and we're unlikely to fix that.
+ *
+ * So perhaps we need two partitioning strategies - one aiming to
+ * split buckets with high frequency (number of sampled rows), the
+ * other aiming to split "large" buckets. And alternating between
+ * them, somehow.
+ *
+ * TODO Allowing the bucket to degenerate to a single combination of
+ * values makes it a rather strange MCV list. Maybe we should use
+ * a higher lower boundary, or make the selection criteria more
+ * complex (e.g. consider the number of rows in the bucket, etc.).
+ *
+ * That however is different from buckets 'degenerated' only for
+ * some dimensions (e.g. half of them), which is perfectly
+ * appropriate for statistics on a combination of low and high
+ * cardinality columns.
+ */
+static MVBucket
+select_bucket_to_partition(int nbuckets, MVBucket * buckets)
+{
+ int i;
+ int numrows = 0;
+ MVBucket bucket = NULL;
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ HistogramBuild data = (HistogramBuild)buckets[i]->build_data;
+ /* if the number of rows is higher, use this bucket */
+ if ((data->ndistinct > 2) &&
+ (data->numrows > numrows) &&
+ (data->numrows >= MIN_BUCKET_ROWS))
+ {
+ bucket = buckets[i];
+ numrows = data->numrows;
+ }
+ }
+
+ /* may be NULL if there are no buckets eligible for partitioning */
+ return bucket;
+}
+
+/*
+ * A simple bucket partitioning implementation - we choose the longest
+ * bucket dimension, measured using the array of distinct values built
+ * at the very beginning of the build.
+ *
+ * We map all the distinct values to a [0,1] interval, uniformly
+ * distributed, and then use this to measure length. It's essentially
+ * a number of distinct values within the range, normalized to [0,1].
+ *
+ * Then we choose a 'middle' value splitting the bucket into two parts
+ * with roughly the same frequency.
+ *
+ * This splits the bucket by tweaking the existing one, and returning
+ * the new bucket (essentially shrinking the existing one in-place and
+ * returning the other "half" as a new bucket). The caller is responsible
+ * for adding the new bucket into the list of buckets.
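+ *
+ * A sketch of the intended build loop (hypothetical caller, using
+ * the functions in this file):
+ *
+ * bucket = select_bucket_to_partition(nbuckets, buckets);
+ * if (bucket != NULL)
+ * buckets[nbuckets++] = partition_bucket(bucket, attrs, stats,
+ * ndistvalues, distvalues);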
+ *
+ * There are multiple histogram options, centered around the partitioning
+ * criteria, specifying both how to choose a bucket and the dimension
+ * most in need of a split. For a nice summary and general overview, see
+ * "rK-Hist : an R-Tree based histogram for multi-dimensional selectivity
+ * estimation" thesis by J. A. Lopez, Concordia University, p.34-37 (and
+ * possibly p. 32-34 for explanation of the terms).
+ *
+ * TODO It requires care to prevent splitting only one dimension and not
+ * splitting another one at all (which might happen easily in case
+ * of strongly dependent columns - e.g. y=x). The current algorithm
+ * minimizes this, but may still happen for perfectly dependent
+ * examples (when all the dimensions have equal length, the first
+ * one will be selected).
+ *
+ * TODO Should probably consider statistics target for the columns (e.g.
+ * to split dimensions with higher statistics target more frequently).
+ */
+static MVBucket
+partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues)
+{
+ int i;
+ int dimension;
+ int numattrs = attrs->dim1;
+
+ Datum split_value;
+ MVBucket new_bucket;
+ HistogramBuild new_data;
+
+ /* needed for sort, when looking for the split value */
+ bool isNull;
+ int nvalues = 0;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ StdAnalyzeData * mystats = NULL;
+ ScalarItem * values = (ScalarItem*)palloc0(data->numrows * sizeof(ScalarItem));
+ SortSupportData ssup;
+
+ /* looking for the split value */
+ int nrows = 1; /* number of rows below current value */
+ double delta;
+
+ /* needed when splitting the values */
+ HeapTuple * oldrows = data->rows;
+ int oldnrows = data->numrows;
+
+ /*
+ * We can't split buckets with a single distinct value (this also
+ * disqualifies NULL-only dimensions). Also, there have to be multiple
+ * sample rows (otherwise there couldn't be multiple distinct values).
+ */
+ Assert(data->ndistinct > 1);
+ Assert(data->numrows > 1);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Look for the next dimension to split.
+ */
+ delta = 0.0;
+ dimension = -1;
+
+ for (i = 0; i < numattrs; i++)
+ {
+ Datum *a, *b;
+
+ /* can't split NULL-only dimension */
+ if (bucket->nullsonly[i])
+ continue;
+
+ /* can't split dimension with a single distinct value */
+ if (data->ndistincts[i] <= 1)
+ continue;
+
+ mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ /* sort support for the bsearch_comparator */
+ ssup_private = &ssup;
+
+ /* search for min boundary in the distinct list */
+ a = (Datum*)bsearch(&bucket->min[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), bsearch_comparator);
+
+ b = (Datum*)bsearch(&bucket->max[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), bsearch_comparator);
+
+ /* if this dimension is 'longer', partition by it */
+ if (((b-a)*1.0 / ndistvalues[i]) > delta)
+ {
+ delta = ((b-a)*1.0 / ndistvalues[i]);
+ dimension = i;
+ }
+ }
+
+ /*
+ * If we haven't found a dimension here, we've done something
+ * wrong in select_bucket_to_partition.
+ */
+ Assert(dimension != -1);
+
+ /*
+ * Walk through the selected dimension, collect and sort the values
+ * and then choose the value to use as the new boundary.
+ */
+ mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (i = 0; i < data->numrows; i++)
+ {
+ /* remember the index of the sample row, to make the partitioning simpler */
+ values[nvalues].value = heap_getattr(data->rows[i], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+ values[nvalues].tupno = i;
+
+ /* no NULL values allowed here (we don't do splits by null-only dimensions) */
+ Assert(!isNull);
+
+ nvalues++;
+ }
+
+ /* sort the array of values */
+ qsort_arg((void *) values, nvalues, sizeof(ScalarItem),
+ compare_scalars_partition, (void *) &ssup);
+
+ /*
+ * We know there are data->ndistincts[dimension] distinct values
+ * in this dimension, and we want to split it in half, so walk
+ * through the array and stop once we see (ndistinct/2) values.
+ *
+ * We always choose the "next" value, i.e. (n/2+1)-th distinct value,
+ * and use it as an exclusive upper boundary (and inclusive lower
+ * boundary).
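+ *
+ * A worked example (hypothetical values): given the sorted values
+ * {1,1,1,2,2,3,3,3,3,3} (numrows = 10), the candidate boundaries
+ * are i=3 (value 2, distance |3-5| = 2) and i=5 (value 3, distance
+ * |5-5| = 0), so value 3 becomes the split value - the first 5 rows
+ * stay in the original bucket, the remaining 5 go to the new one.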
+ *
+ * TODO Maybe we should use "average" of the two middle distinct
+ * values (at least for even distinct counts), but that would
+ * require being able to do an average (which does not work
+ * for non-arithmetic types).
+ *
+ * TODO Another option is to look for a split that'd give about
+ * 50% tuples (not distinct values) in each partition. That
+ * might work better when there are a few very frequent
+ * values, and many rare ones.
+ */
+ delta = fabs(data->numrows);
+ split_value = values[0].value;
+
+ for (i = 1; i < data->numrows; i++)
+ {
+ if (values[i].value != values[i-1].value)
+ {
+ /* are we closer to splitting the bucket in half? */
+ if (fabs(i - data->numrows/2.0) < delta)
+ {
+ /* let's assume we'll use this value for the split */
+ split_value = values[i].value;
+ delta = fabs(i - data->numrows/2.0);
+ nrows = i;
+ }
+ }
+ }
+
+ Assert(nrows > 0);
+ Assert(nrows < data->numrows);
+
+ /* create the new bucket as an (incomplete) copy of the one being partitioned */
+ new_bucket = copy_mv_bucket(bucket, numattrs);
+ new_data = (HistogramBuild)new_bucket->build_data;
+
+ /*
+ * Do the actual split of the chosen dimension, using the split value as the
+ * upper bound for the existing bucket, and lower bound for the new one.
+ */
+ bucket->max[dimension] = split_value;
+ new_bucket->min[dimension] = split_value;
+
+ bucket->max_inclusive[dimension] = false;
+ new_bucket->min_inclusive[dimension] = true;
+
+ /*
+ * Redistribute the sample tuples using the 'ScalarItem->tupno'
+ * index. We know 'nrows' rows should remain in the original
+ * bucket and the rest goes to the new one.
+ */
+
+ data->rows = (HeapTuple*)palloc0(nrows * sizeof(HeapTuple));
+ new_data->rows = (HeapTuple*)palloc0((oldnrows - nrows) * sizeof(HeapTuple));
+
+ data->numrows = nrows;
+ new_data->numrows = (oldnrows - nrows);
+
+ /*
+ * The first nrows should go to the first bucket, the rest should
+ * go to the new one. Use the tupno field to get the actual HeapTuple
+ * row from the original array of sample rows.
+ */
+ for (i = 0; i < nrows; i++)
+ memcpy(&data->rows[i], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ for (i = nrows; i < oldnrows; i++)
+ memcpy(&new_data->rows[i-nrows], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(new_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each partition.
+ */
+ for (i = 0; i < numattrs; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(new_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+ pfree(values);
+
+ return new_bucket;
+}
+
+/*
+ * Copy a histogram bucket. The copy does not include the build-time
+ * data, i.e. sampled rows etc.
+ */
+static MVBucket
+copy_mv_bucket(MVBucket bucket, uint32 ndimensions)
+{
+ /* TODO allocate as a single piece (including all the fields) */
+ MVBucket new_bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+ HistogramBuild data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /*
+ * Copy only the attributes that will stay the same after the split;
+ * the rest will be recomputed afterwards.
+ */
+
+ /* allocate the per-dimension arrays */
+ new_bucket->nullsonly = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ new_bucket->min_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+ new_bucket->max_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* lower/upper boundaries */
+ new_bucket->min = (Datum*)palloc0(ndimensions * sizeof(Datum));
+ new_bucket->max = (Datum*)palloc0(ndimensions * sizeof(Datum));
+
+ /* copy data */
+ memcpy(new_bucket->nullsonly, bucket->nullsonly, ndimensions * sizeof(bool));
+
+ memcpy(new_bucket->min_inclusive, bucket->min_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->min, bucket->min, ndimensions*sizeof(Datum));
+
+ memcpy(new_bucket->max_inclusive, bucket->max_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->max, bucket->max, ndimensions*sizeof(Datum));
+
+ /* allocate and copy the interesting part of the build data */
+ data->ndistincts = (uint32*)palloc0(ndimensions * sizeof(uint32));
+
+ new_bucket->build_data = data;
+
+ return new_bucket;
+}
+
+/*
+ * Counts the number of distinct value combinations in the bucket. The
+ * values are collected into an array of SortItems, sorted using the
+ * multi-column sort support (one comparator per dimension), and then
+ * compared pairwise to count the distinct combinations.
+ *
+ * TODO This might evaluate and store the distinct counts for all
+ * possible attribute combinations. The assumption is this might be
+ * useful for estimating things like GROUP BY cardinalities (e.g.
+ * in cases when some buckets contain a lot of low-frequency
+ * combinations, and other buckets contain few high-frequency ones).
+ *
+ * But it's unclear whether it's worth the price. Computing this
+ * is actually quite cheap, because it may be evaluated at the very
+ * end, when the buckets are rather small (so sorting it in 2^N ways
+ * is not a big deal). Assuming the partitioning algorithm does not
+ * use these values to do the decisions, of course (the current
+ * algorithm does not).
+ *
+ * The overhead with storing, fetching and parsing the data is more
+ * concerning - adding 2^N values per bucket (even if it's just
+ * a 1B or 2B value) would significantly bloat the histogram, and
+ * thus the impact on optimizer. Which is not really desirable.
+ *
+ * TODO This only updates the ndistinct for the sample (or bucket), but
+ * we eventually need an estimate of the total number of distinct
+ * values in the dataset. It's possible to either use the current
+ * 1D approach (i.e., if it's more than 10% of the sample, assume
+ * it's proportional to the number of rows). Or it's possible to
+ * implement the estimator suggested in the article, supposedly
+ * giving 'optimal' estimates (w.r.t. probability of error).
+ */
+static void
+update_bucket_ndistinct(MVBucket bucket, int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ int numrows = data->numrows;
+
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * We could collect this while the caller walks through all the
+ * attributes (as it is, heap_getattr ends up being called twice).
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(numrows * sizeof(Datum) * numattrs);
+ bool *isnull = (bool*)palloc0(numrows * sizeof(bool) * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* prepare the sort functions for all the dimensions */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* collect the values */
+ for (i = 0; i < numrows; i++)
+ for (j = 0; j < numattrs; j++)
+ items[i].values[j]
+ = heap_getattr(data->rows[i], attrs->values[j],
+ stats[j]->tupDesc, &items[i].isnull[j]);
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ data->ndistinct = 1;
+
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ data->ndistinct += 1;
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+}
+
+/*
+ * Count distinct values per bucket dimension.
+ */
+static void
+update_dimension_ndistinct(MVBucket bucket, int dimension, int2vector *attrs,
+ VacAttrStats ** stats, bool update_boundaries)
+{
+ int j;
+ int nvalues = 0;
+ bool isNull;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ Datum * values = (Datum*)palloc0(data->numrows * sizeof(Datum));
+ SortSupportData ssup;
+
+ StdAnalyzeData * mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* we may already know this is a NULL-only dimension */
+ if (bucket->nullsonly[dimension])
+ data->ndistincts[dimension] = 1;
+
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (j = 0; j < data->numrows; j++)
+ {
+ values[nvalues] = heap_getattr(data->rows[j], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+
+ /* ignore NULL values */
+ if (! isNull)
+ nvalues++;
+ }
+
+ /* there's always at least 1 distinct value (may be NULL) */
+ data->ndistincts[dimension] = 1;
+
+ /*
+ * If there are only NULL values in the column, mark the dimension
+ * as NULL-only and bail out.
+ */
+ if (nvalues == 0)
+ {
+ pfree(values);
+ bucket->nullsonly[dimension] = true;
+ return;
+ }
+
+ /* sort the array of values (pass-by-value datums) */
+ qsort_arg((void *) values, nvalues, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /*
+ * Update min/max boundaries to the smallest bounding box. Generally, this
+ * needs to be done only when constructing the initial bucket.
+ */
+ if (update_boundaries)
+ {
+ /* store the min/max values */
+ bucket->min[dimension] = values[0];
+ bucket->min_inclusive[dimension] = true;
+
+ bucket->max[dimension] = values[nvalues-1];
+ bucket->max_inclusive[dimension] = true;
+ }
+
+ /*
+ * Walk through the array and count distinct values by comparing
+ * succeeding values.
+ *
+ * FIXME This only works for pass-by-value types (i.e. not VARCHARs
+ * etc.). Although thanks to the deduplication it might work
+ * even for those types (equal values will get the same item
+ * in the deduplicated array).
+ */
+ for (j = 1; j < nvalues; j++)
+ {
+ if (values[j] != values[j-1])
+ data->ndistincts[dimension] += 1;
+ }
+
+ pfree(values);
+}
+
+/*
+ * A properly built histogram must not contain buckets mixing NULL and
+ * non-NULL values in a single dimension. Each dimension is either
+ * marked as 'nulls only', and thus contains only NULL values, or
+ * it contains no NULL values at all.
+ *
+ * Therefore, if the sample contains NULL values in any of the columns,
+ * it's necessary to build those NULL-buckets. This is done in an
+ * iterative way using this algorithm, operating on a single bucket:
+ *
+ * (1) Check that all dimensions are well-formed (not mixing NULL
+ * and non-NULL values).
+ *
+ * (2) If all dimensions are well-formed, terminate.
+ *
+ * (3) If the dimension contains only NULL values, but is not
+ * marked as NULL-only, mark it as NULL-only and run the
+ * algorithm again (on this bucket).
+ *
+ * (4) If the dimension mixes NULL and non-NULL values, split the
+ * bucket into two parts - one with NULL values, one with
+ * non-NULL values (replacing the current one). Then run
+ * the algorithm on both buckets.
+ *
+ * This is executed in a recursive manner, but the number of executions
+ * should be quite low - limited by the number of NULL-buckets. Also,
+ * in each branch the number of nested calls is limited by the number
+ * of dimensions (attributes) of the histogram.
+ *
+ * At the end, there should be buckets with no mixed dimensions. The
+ * number of buckets produced by this algorithm is rather limited - with
+ * N dimensions, there may be only 2^N such buckets (each dimension may
+ * be either NULL or non-NULL). So with 8 dimensions (current value of
+ * MVSTATS_MAX_DIMENSIONS) there may be only 256 such buckets.
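+ *
+ * For illustration (a hypothetical two-column case): a bucket mixing
+ * NULL and non-NULL values in both dimensions is gradually split
+ * into up to 2^2 = 4 buckets - (NULL, NULL), (NULL, non-NULL),
+ * (non-NULL, NULL) and (non-NULL, non-NULL).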
+ *
+ * After this, a 'regular' bucket-split algorithm shall run, further
+ * optimizing the histogram.
+ */
+static void
+create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int null_dim = -1;
+ int null_count = 0;
+ bool null_found = false;
+ MVBucket bucket, null_bucket;
+ int null_idx, curr_idx;
+ HistogramBuild data, null_data;
+
+ /* remember original values from the bucket */
+ int numrows;
+ HeapTuple *oldrows = NULL;
+
+ Assert(bucket_idx < histogram->nbuckets);
+ Assert(histogram->ndimensions == attrs->dim1);
+
+ bucket = histogram->buckets[bucket_idx];
+ data = (HistogramBuild)bucket->build_data;
+
+ numrows = data->numrows;
+ oldrows = data->rows;
+
+ /*
+ * Walk through all rows / dimensions, and stop once we find NULL
+ * in a dimension not yet marked as NULL-only.
+ */
+ for (i = 0; i < data->numrows; i++)
+ {
+ /*
+ * FIXME We don't need to start from the first attribute
+ * here - we can start from the last known dimension.
+ */
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ /* Is this a NULL-only dimension? If yes, skip. */
+ if (bucket->nullsonly[j])
+ continue;
+
+ /* found a NULL in that dimension? */
+ if (heap_attisnull(data->rows[i], attrs->values[j]))
+ {
+ null_found = true;
+ null_dim = j;
+ break;
+ }
+ }
+
+ /* terminate if we found attribute with NULL values */
+ if (null_found)
+ break;
+ }
+
+ /* no regular dimension contains NULL values => we're done */
+ if (! null_found)
+ return;
+
+ /* walk through the rows again, count NULL values in 'null_dim' */
+ for (i = 0; i < data->numrows; i++)
+ {
+ if (heap_attisnull(data->rows[i], attrs->values[null_dim]))
+ null_count += 1;
+ }
+
+ Assert(null_count <= data->numrows);
+
+ /*
+ * If (null_count == numrows) the dimension already is NULL-only,
+ * but is not yet marked as such. It's enough to mark it and
+ * repeat the process recursively (until we run out of dimensions).
+ */
+ if (null_count == data->numrows)
+ {
+ bucket->nullsonly[null_dim] = true;
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+ return;
+ }
+
+ /*
+ * We have to split the bucket into two - one with NULL values in
+ * the dimension, one with non-NULL values. We don't need to sort
+ * the data or anything, but otherwise it's similar to what's done
+ * in partition_bucket().
+ */
+
+ /* create bucket with NULL-only dimension 'dim' */
+ null_bucket = copy_mv_bucket(bucket, histogram->ndimensions);
+ null_data = (HistogramBuild)null_bucket->build_data;
+
+ /* remember the current array info */
+ oldrows = data->rows;
+ numrows = data->numrows;
+
+ /* we'll keep non-NULL values in the current bucket */
+ data->numrows = (numrows - null_count);
+ data->rows
+ = (HeapTuple*)palloc0(data->numrows * sizeof(HeapTuple));
+
+ /* and the NULL values will go to the new one */
+ null_data->numrows = null_count;
+ null_data->rows
+ = (HeapTuple*)palloc0(null_data->numrows * sizeof(HeapTuple));
+
+ /* mark the dimension as NULL-only (in the new bucket) */
+ null_bucket->nullsonly[null_dim] = true;
+
+ /* walk through the sample rows and distribute them accordingly */
+ null_idx = 0;
+ curr_idx = 0;
+ for (i = 0; i < numrows; i++)
+ {
+ if (heap_attisnull(oldrows[i], attrs->values[null_dim]))
+ /* NULL => copy to the new bucket */
+ memcpy(&null_data->rows[null_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ else
+ memcpy(&data->rows[curr_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ }
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(null_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each
+ * bucket (NULL is not a value, so 0, and the other bucket got
+ * all the ndistinct values).
+ */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(null_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+
+ /* add the NULL bucket to the histogram */
+ histogram->buckets[histogram->nbuckets++] = null_bucket;
+
+ /*
+ * And now run the function recursively on both buckets (the new
+ * one first, because the call may change number of buckets, and
+ * it's used as an index).
+ */
+ create_null_buckets(histogram, (histogram->nbuckets-1), attrs, stats);
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+}
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
+
+/*
+ * SRF with details about buckets of a histogram:
+ *
+ * - bucket ID (0...nbuckets)
+ * - min values (string array)
+ * - max values (string array)
+ * - nulls only (boolean array)
+ * - min inclusive flags (boolean array)
+ * - max inclusive flags (boolean array)
+ * - frequency (double precision)
+ *
+ * The input is the OID of the statistics, and there are no rows
+ * returned if the statistics contains no histogram (or if there's no
+ * statistics for the OID).
+ *
+ * The second parameter (type) determines what values will be returned
+ * in the (minvals,maxvals). There are three possible values:
+ *
+ * 0 (actual values)
+ * -----------------
+ * - prints actual values
+ * - using the output function of the data type (as string)
+ * - handy for investigating the histogram
+ *
+ * 1 (distinct index)
+ * ------------------
+ * - prints index of the distinct value (into the serialized array)
+ * - makes it easier to spot neighbor buckets, etc.
+ * - handy for plotting the histogram
+ *
+ * 2 (normalized distinct index)
+ * -----------------------------
+ * - prints index of the distinct value, but normalized into [0,1]
+ * - similar to 1, but shows how 'long' the bucket range is
+ * - handy for plotting the histogram
+ *
+ * When plotting the histogram, be careful as the (1) and (2) options
+ * skew the lengths by distributing the distinct values uniformly. For
+ * data types without a clear meaning of 'distance' (e.g. strings) that
+ * is not a big deal, but for numbers it may be confusing.
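+ *
+ * A usage sketch (the statistics OID comes from pg_mv_statistic):
+ *
+ * SELECT * FROM pg_mv_histogram_buckets(
+ * (SELECT oid FROM pg_mv_statistic LIMIT 1), 0);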
+ */
+PG_FUNCTION_INFO_V1(pg_mv_histogram_buckets);
+
+Datum
+pg_mv_histogram_buckets(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ Oid mvoid = PG_GETARG_OID(0);
+ int otype = PG_GETARG_INT32(1);
+
+ if ((otype < 0) || (otype > 2))
+ elog(ERROR, "invalid output type specified");
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MVSerializedHistogram histogram;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ histogram = load_mv_histogram(mvoid);
+
+ funcctx->user_fctx = histogram;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = histogram->nbuckets;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples
+ * from raw C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+ double bucket_size = 1.0;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MVSerializedHistogram histogram;
+ MVSerializedBucket bucket;
+
+ histogram = (MVSerializedHistogram)funcctx->user_fctx;
+
+ Assert(call_cntr < histogram->nbuckets);
+
+ bucket = histogram->buckets[call_cntr];
+
+ stakeys = find_mv_attnums(mvoid, &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple.
+ * This should be an array of C strings which will
+ * be processed later by the type input functions.
+ */
+ values = (char **) palloc(9 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+ values[3] = (char *) palloc0(1024 * sizeof(char));
+ values[4] = (char *) palloc0(1024 * sizeof(char));
+ values[5] = (char *) palloc0(1024 * sizeof(char));
+
+ values[6] = (char *) palloc(64 * sizeof(char));
+ values[7] = (char *) palloc(64 * sizeof(char));
+ values[8] = (char *) palloc(64 * sizeof(char));
+
+ /* we need to do this only when printing the actual values */
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * histogram->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * histogram->ndimensions);
+
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* bucket ID */
+
+ /*
+ * Print the min/max boundaries in the requested format - either the
+ * actual values (using the output function of the attribute type),
+ * the indexes into the deduplicated arrays, or the indexes
+ * normalized to [0,1]. The deduplicated values are sorted, so the
+ * bare indexes are quite useful on their own.
+ */
+
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bucket_size *= (bucket->max[i] - bucket->min[i]) * 1.0
+ / (histogram->nvalues[i]-1);
+
+ /* print the actual values, i.e. use output function etc. */
+ if (otype == 0)
+ {
+ Datum minval, maxval;
+ Datum minout, maxout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %s}";
+
+ minval = histogram->values[i][bucket->min[i]];
+ minout = FunctionCall1(&fmgrinfo[i], minval);
+
+ maxval = histogram->values[i][bucket->max[i]];
+ maxout = FunctionCall1(&fmgrinfo[i], maxval);
+
+ snprintf(buff, 1024, format, values[1], DatumGetPointer(minout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], DatumGetPointer(maxout));
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+ else if (otype == 1)
+ {
+ format = "%s, %d";
+ if (i == 0)
+ format = "{%s%d";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %d}";
+
+ snprintf(buff, 1024, format, values[1], bucket->min[i]);
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], bucket->max[i]);
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+ else
+ {
+ format = "%s, %f";
+ if (i == 0)
+ format = "{%s%f";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %f}";
+
+ snprintf(buff, 1024, format, values[1],
+ bucket->min[i] * 1.0 / (histogram->nvalues[i]-1));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2],
+ bucket->max[i] * 1.0 / (histogram->nvalues[i]-1));
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %s}";
+
+ snprintf(buff, 1024, format, values[3], bucket->nullsonly[i] ? "t" : "f");
+ strncpy(values[3], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[4], bucket->min_inclusive[i] ? "t" : "f");
+ strncpy(values[4], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[5], bucket->max_inclusive[i] ? "t" : "f");
+ strncpy(values[5], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[6], 64, "%f", bucket->ntuples); /* frequency */
+ snprintf(values[7], 64, "%f", bucket->ntuples / bucket_size); /* density */
+ snprintf(values[8], 64, "%f", bucket_size); /* bucket_size */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (not strictly necessary, but tidy) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+ pfree(values[4]);
+ pfree(values[5]);
+ pfree(values[6]);
+ pfree(values[7]);
+ pfree(values[8]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
+
+#ifdef DEBUG_MVHIST
+/*
+ * prints debugging info about matched histogram buckets (full/partial)
+ *
+ * XXX Currently works only for INT data type.
+ */
+void
+debug_histogram_matches(MVSerializedHistogram mvhist, char *matches)
+{
+ int i, j;
+
+ float ffull = 0, fpartial = 0;
+ int nfull = 0, npartial = 0;
+
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ char ranges[1024];
+
+ if (! matches[i])
+ continue;
+
+ /* increment the counters */
+ nfull += (matches[i] == MVSTATS_MATCH_FULL) ? 1 : 0;
+ npartial += (matches[i] == MVSTATS_MATCH_PARTIAL) ? 1 : 0;
+
+ /* and also update the frequencies */
+ ffull += (matches[i] == MVSTATS_MATCH_FULL) ? bucket->ntuples : 0;
+ fpartial += (matches[i] == MVSTATS_MATCH_PARTIAL) ? bucket->ntuples : 0;
+
+ memset(ranges, 0, sizeof(ranges));
+
+ /* build ranges for all the dimensions (append to the buffer) */
+ for (j = 0; j < mvhist->ndimensions; j++)
+ {
+ int len = strlen(ranges);
+
+ snprintf(ranges + len, sizeof(ranges) - len, " [%d %d]",
+ DatumGetInt32(mvhist->values[j][bucket->min[j]]),
+ DatumGetInt32(mvhist->values[j][bucket->max[j]]));
+ }
+
+ elog(WARNING, "bucket %d %s => %d [%f]", i, ranges, matches[i], bucket->ntuples);
+ }
+
+ elog(WARNING, "full=%f partial=%f (%f)", ffull, fpartial, (ffull + 0.5 * fpartial));
+}
+#endif
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index cd0ed01..c630f96 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2109,9 +2109,9 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, staname, stakeys,\n"
- " deps_enabled, mcv_enabled,\n"
- " deps_built, mcv_built,\n"
- " mcv_max_items,\n"
+ " deps_enabled, mcv_enabled, hist_enabled,\n"
+ " deps_built, mcv_built, hist_built,\n"
+ " mcv_max_items, hist_max_buckets,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
@@ -2152,8 +2152,17 @@ describeOneTableDetails(const char *schemaname,
first = false;
}
+ if (!strcmp(PQgetvalue(result, i, 5), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", histogram");
+ else
+ appendPQExpBuffer(&buf, "(histogram");
+ first = false;
+ }
+
appendPQExpBuffer(&buf, ") ON (%s)",
- PQgetvalue(result, i, 9));
+ PQgetvalue(result, i, 10));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index 7be6223..df6a61c 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -37,13 +37,16 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
+ bool hist_enabled; /* build histogram? */
- /* MCV size */
+ /* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
+ int32 hist_max_buckets; /* max histogram buckets */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
+ bool hist_built; /* histogram was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -51,6 +54,7 @@ CATALOG(pg_mv_statistic,3381)
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
+ bytea stahist; /* MV histogram (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -66,17 +70,20 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_attrdef
* ----------------
*/
-
-#define Natts_pg_mv_statistic 10
+#define Natts_pg_mv_statistic 14
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_deps_enabled 3
#define Anum_pg_mv_statistic_mcv_enabled 4
-#define Anum_pg_mv_statistic_mcv_max_items 5
-#define Anum_pg_mv_statistic_deps_built 6
-#define Anum_pg_mv_statistic_mcv_built 7
-#define Anum_pg_mv_statistic_stakeys 8
-#define Anum_pg_mv_statistic_stadeps 9
-#define Anum_pg_mv_statistic_stamcv 10
+#define Anum_pg_mv_statistic_hist_enabled 5
+#define Anum_pg_mv_statistic_mcv_max_items 6
+#define Anum_pg_mv_statistic_hist_max_buckets 7
+#define Anum_pg_mv_statistic_deps_built 8
+#define Anum_pg_mv_statistic_mcv_built 9
+#define Anum_pg_mv_statistic_hist_built 10
+#define Anum_pg_mv_statistic_stakeys 11
+#define Anum_pg_mv_statistic_stadeps 12
+#define Anum_pg_mv_statistic_stamcv 13
+#define Anum_pg_mv_statistic_stahist 14
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index b16f2a9..9d20db5 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2747,6 +2747,10 @@ DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f
DESCR("multi-variate statistics: MCV list info");
DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
DESCR("details about MCV list items");
+DATA(insert OID = 3375 ( pg_mv_stats_histogram_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_histogram_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: histogram info");
+DATA(insert OID = 3374 ( pg_mv_histogram_buckets PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 2 0 2249 "26 23" "{26,23,23,1009,1009,1000,1000,1000,701,701,701}" "{i,i,o,o,o,o,o,o,o,o,o}" "{oid,otype,index,minvals,maxvals,nullsonly,mininclusive,maxinclusive,frequency,density,bucket_size}" _null_ _null_ pg_mv_histogram_buckets _null_ _null_ _null_ ));
+DESCR("details about histogram buckets");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 7f2dc8a..3706525 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -593,10 +593,12 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
bool mcv_enabled; /* MCV list enabled */
+ bool hist_enabled; /* histogram enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
bool mcv_built; /* MCV list built */
+ bool hist_built; /* histogram built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index b028192..aa07000 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -91,6 +91,123 @@ typedef MCVListData *MCVList;
#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
/*
+ * Multivariate histograms
+ */
+typedef struct MVBucketData {
+
+ /* Frequency of this bucket. */
+ float ntuples; /* frequency of tuples */
+
+ /* Information about dimensions containing only NULL values. */
+ bool *nullsonly;
+
+ /* lower boundaries - values and information about the inequalities */
+ Datum *min;
+ bool *min_inclusive;
+
+ /* upper boundaries - values and information about the inequalities */
+ Datum *max;
+ bool *max_inclusive;
+
+ /* used when building the histogram (not serialized/deserialized) */
+ void *build_data;
+
+} MVBucketData;
+
+typedef MVBucketData *MVBucket;
+
+
+typedef struct MVHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ MVBucket *buckets; /* array of buckets */
+
+} MVHistogramData;
+
+typedef MVHistogramData *MVHistogram;
+
+/*
+ * Histogram in a partially serialized form, with deduplicated boundary
+ * values etc. Each bucket stores uint16 indexes into the per-dimension
+ * arrays of deduplicated boundary values (nvalues/values in
+ * MVSerializedHistogramData), instead of the Datum values themselves.
+ */
+
+typedef struct MVSerializedBucketData {
+
+ /* Frequency of this bucket. */
+ float ntuples; /* frequency of tuples */
+
+ /* Information about dimensions containing only NULL values. */
+ bool *nullsonly;
+
+ /* indexes of lower boundaries - values and information about the
+ * inequalities (exclusive vs. inclusive) */
+ uint16 *min;
+ bool *min_inclusive;
+
+ /* indexes of upper boundaries - values and information about the
+ * inequalities (exclusive vs. inclusive) */
+ uint16 *max;
+ bool *max_inclusive;
+
+} MVSerializedBucketData;
+
+typedef MVSerializedBucketData *MVSerializedBucket;
+
+typedef struct MVSerializedHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ /*
+ * Keep this at the same offset as in MVHistogramData, because the
+ * deserialization relies on it.
+ */
+ MVSerializedBucket *buckets; /* array of buckets */
+
+ /*
+ * serialized boundary values, one array per dimension, deduplicated
+ * (the min/max indexes point into these arrays)
+ */
+ int *nvalues;
+ Datum **values;
+
+} MVSerializedHistogramData;
+
+typedef MVSerializedHistogramData *MVSerializedHistogram;
+
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_HIST_MAGIC 0x7F8C5670 /* marks serialized bytea */
+#define MVSTAT_HIST_TYPE_BASIC 1 /* basic histogram type */
+
+/*
+ * Limits used for max_buckets option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_HIST_MIN_BUCKETS, and we cannot
+ * have more than MVSTAT_HIST_MAX_BUCKETS buckets.
+ *
+ * This is just a boundary for the 'max' threshold - the actual
+ * histogram may use fewer buckets than MVSTAT_HIST_MAX_BUCKETS.
+ *
+ * TODO The MVSTAT_HIST_MIN_BUCKETS should be related to the number of
+ * attributes (MVSTATS_MAX_DIMENSIONS) because of NULL-buckets.
+ * There should be at least 2^N buckets, otherwise we may be unable
+ * to build the NULL buckets.
+ */
+#define MVSTAT_HIST_MIN_BUCKETS 128 /* min number of buckets */
+#define MVSTAT_HIST_MAX_BUCKETS 16384 /* max number of buckets */
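+
+/*
+ * A usage sketch, based on the option syntax exercised by the
+ * regression tests (table and statistics names are hypothetical):
+ *
+ * CREATE STATISTICS s ON t (a, b) WITH (histogram, max_buckets 1024);
+ */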
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
@@ -98,20 +215,25 @@ typedef MCVListData *MCVList;
MVDependencies load_mv_dependencies(Oid mvoid);
MCVList load_mv_mcvlist(Oid mvoid);
+MVSerializedHistogram load_mv_histogram(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
VacAttrStats **stats);
+bytea * serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
MCVList deserialize_mv_mcvlist(bytea * data);
+MVSerializedHistogram deserialize_mv_histogram(bytea * data);
/*
* Returns index of the attribute number within the vector (i.e. a
* dimension within the stats).
*/
int mv_get_index(AttrNumber varattno, int2vector * stakeys);
int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
@@ -120,6 +242,8 @@ extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_mcvlist_items(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_histogram_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_histogram_buckets(PG_FUNCTION_ARGS);
MVDependencies
build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
@@ -129,10 +253,20 @@ MCVList
build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int *numrows_filtered);
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total);
+
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+void update_mv_stats(Oid relid, MVDependencies dependencies,
+ MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats);
+#ifdef DEBUG_MVHIST
+extern void debug_histogram_matches(MVSerializedHistogram mvhist, char *matches);
+#endif
+
#endif
diff --git a/src/test/regress/expected/mv_histogram.out b/src/test/regress/expected/mv_histogram.out
new file mode 100644
index 0000000..c3c5216
--- /dev/null
+++ b/src/test/regress/expected/mv_histogram.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s1 ON mv_histogram (unknown_column) WITH (histogram);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s1 ON mv_histogram (a) WITH (histogram);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s1 ON mv_histogram (a, a) WITH (histogram);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON mv_histogram (a, a, b) WITH (histogram);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing histogram statistics
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (dependencies, max_buckets 200);
+ERROR: option 'histogram' is required by other options(s)
+-- invalid max_buckets value / too low
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (mcv, max_buckets 10);
+ERROR: minimum number of buckets is 128
+-- invalid max_buckets value / too high
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (mcv, max_buckets 100000);
+ERROR: maximum number of buckets is 16384
+-- correct command
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (histogram);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s2 ON mv_histogram (a, b, c) WITH (histogram);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s3 ON mv_histogram (a, b, c, d) WITH (histogram);
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+DROP TABLE mv_histogram;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 50715db..b08f977 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1371,7 +1371,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
length(s.stadeps) AS depsbytes,
pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
length(s.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo,
+ length(s.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(s.stahist) AS histinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 838c12b..fbed683 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,4 +112,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies mv_mcv
+test: mv_dependencies mv_mcv mv_histogram
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index d97a0ec..c60c0b2 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -163,3 +163,4 @@ test: event_trigger
test: stats
test: mv_dependencies
test: mv_mcv
+test: mv_histogram
diff --git a/src/test/regress/sql/mv_histogram.sql b/src/test/regress/sql/mv_histogram.sql
new file mode 100644
index 0000000..0ac21b8
--- /dev/null
+++ b/src/test/regress/sql/mv_histogram.sql
@@ -0,0 +1,176 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s1 ON mv_histogram (unknown_column) WITH (histogram);
+
+-- single column
+CREATE STATISTICS s1 ON mv_histogram (a) WITH (histogram);
+
+-- single column, duplicated
+CREATE STATISTICS s1 ON mv_histogram (a, a) WITH (histogram);
+
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON mv_histogram (a, a, b) WITH (histogram);
+
+-- unknown option
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (unknown_option);
+
+-- missing histogram statistics
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (dependencies, max_buckets 200);
+
+-- invalid max_buckets value / too low
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (mcv, max_buckets 10);
+
+-- invalid max_buckets value / too high
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (mcv, max_buckets 100000);
+
+-- correct command
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (histogram);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+
+DROP TABLE mv_histogram;
+
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s2 ON mv_histogram (a, b, c) WITH (histogram);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mv_histogram;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s3 ON mv_histogram (a, b, c, d) WITH (histogram);
+
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+DROP TABLE mv_histogram;
--
2.1.0
Attachment: 0006-multi-statistics-estimation.patch (text/x-diff)
From f1b003fb1eabc654e102718945fc785da4a7f023 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Fri, 6 Feb 2015 01:42:38 +0100
Subject: [PATCH 6/7] multi-statistics estimation
The general idea is that a probability (which
is what selectivity is) can be split into a product of
conditional probabilities like this:
P(A & B & C) = P(A & B) * P(C|A & B)
If we assume that C and B are conditionally independent (given A),
the last part may be simplified like this
P(A & B & C) = P(A & B) * P(C|A)
so we only need probabilities on [A,B] and [C,A] to compute
the original probability.
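
A quick worked example with made-up numbers: suppose columns A and B
are perfectly correlated, so P(A) = P(B) = P(A & B) = 0.1, and
P(C|A) = P(C) = 0.5. The decomposition gives

    P(A & B & C) = P(A & B) * P(C|A) = 0.1 * 0.5 = 0.05

while the plain independence assumption would give
P(A) * P(B) * P(C) = 0.1 * 0.1 * 0.5 = 0.005, a 10x underestimate.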
The implementation works in the other direction, though.
We know what probability P(A & B & C) we need to compute,
and also what statistics are available.
So we search for a combination of statistics, covering
the clauses in an optimal way (most clauses covered, most
dependencies exploited).
There are two possible approaches - exhaustive and greedy.
The exhaustive one walks through all permutations of
stats using dynamic programming, so it's guaranteed to
find the optimal solution, but it soon gets very slow as
it's roughly O(N!). The dynamic programming may improve
that a bit, but it's still far too expensive for large
numbers of statistics (on a single table).
The greedy algorithm is very simple - in every step it chooses
the locally best statistics. That may not yield the best solution
globally (but maybe it does?), but it only needs N steps
to find the solution, so it's very fast (processing the
selected stats is usually way more expensive).
There's a GUC for selecting the search algorithm
mvstat_search = {'greedy', 'exhaustive'}
The default value is 'greedy' as that's much safer (with
respect to runtime). See choose_mv_statistics().
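
For example, to try the exhaustive search instead (a sketch;
mvstat_search is the GUC added by this patch):

    SET mvstat_search = 'exhaustive';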
Once we have found a sequence of statistics, we apply
them to the clauses using the conditional probabilities.
We process the selected stats one by one, and for each
we select the estimated clauses and conditions. See
clauselist_selectivity() for more details.
Limitations
-----------
It's still true that each clause at a given level has to
be covered by a single MV statistics. So with this query
WHERE (clause1) AND (clause2) AND (clause3 OR clause4)
each parenthesized clause has to be covered by a single
multivariate statistics.
Clauses not covered by a single statistics at this level
will be passed to clause_selectivity() but this will treat
them as a collection of simpler clauses (connected by AND
or OR), and the clauses from the previous level will be
used as conditions.
So using the same example, the last clause will be passed
to clause_selectivity() with 'clause1' and 'clause2' as
conditions, and it will be processed using multivariate
stats if possible.
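
For illustration (a sketch with hypothetical statistics on (a,b)):

    WHERE (a = 1) AND (b = 2) AND (a < 10 OR d > 0)

here the OR-clause references column d, not covered by the (a,b)
statistics, so it is passed to clause_selectivity() with (a = 1)
and (b = 2) as conditions, and its parts may still be estimated
using the statistics where they cover the attributes.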
The other limitation is that all the expressions have to
be mv-compatible, i.e. there can't be a mix of mv-compatible
and incompatible expressions. If this is violated, the clause
may be passed to the next level (just like with a list of
clauses not covered by a single statistics), which splits it
into clauses handled by multivariate stats and clauses handled
by regular statistics.
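
For instance (hypothetical), in

    WHERE (a = 1) AND (b = 2 OR some_func(c) > 0)

the function call is not mv-compatible (only "variable OP constant"
conditions are supported at this point), so the whole OR-clause is
passed to the next level and split up there.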
rework clauselist_selectivity_or to handle OR-clauses correctly
---------------------------------------------------------------
We might invent a completely new set of functions here, resembling
clauselist_selectivity but adapting the ideas to OR-clauses.
But luckily we know that each OR-clause
(a OR b OR c)
may be rewritten as an equivalent AND-clause using negation:
NOT ((NOT a) AND (NOT b) AND (NOT c))
And that's something we can pass to clauselist_selectivity.
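
A quick sanity check with made-up independent selectivities
sel(a) = 0.2 and sel(b) = 0.3:

    1 - (1 - 0.2) * (1 - 0.3) = 1 - 0.56 = 0.44

which matches the old formula s1 + s2 - s1*s2 = 0.2 + 0.3 - 0.06.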
histogram call cache
--------------------
The call cache was removed because it did not initially work
well with OR clauses, but that was just a stupid thinko in the
implementation. This patch re-adds it, hopefully correctly.
The code in update_match_bitmap_histogram() is overly complex;
the branches handling the various inequality cases are redundant.
This needs to be simplified somehow.
---
contrib/file_fdw/file_fdw.c | 3 +-
contrib/postgres_fdw/postgres_fdw.c | 6 +-
src/backend/optimizer/path/clausesel.c | 2224 +++++++++++++++++++++++++++-----
src/backend/optimizer/path/costsize.c | 23 +-
src/backend/optimizer/util/orclauses.c | 4 +-
src/backend/utils/adt/selfuncs.c | 17 +-
src/backend/utils/misc/guc.c | 20 +
src/include/optimizer/cost.h | 6 +-
src/include/utils/mvstats.h | 8 +
9 files changed, 2003 insertions(+), 308 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 83bbfa1..1d1571c 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -954,7 +954,8 @@ estimate_size(PlannerInfo *root, RelOptInfo *baserel,
baserel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9a014d4..7d09fe3 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -454,7 +454,8 @@ postgresGetForeignRelSize(PlannerInfo *root,
fpinfo->local_conds,
baserel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
@@ -1836,7 +1837,8 @@ estimate_path_cost_size(PlannerInfo *root,
local_join_conds,
baserel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
local_sel *= fpinfo->local_conds_sel;
rows = clamp_row_est(rows * local_sel);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 6c99f02..8d15d3c8 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -29,6 +29,8 @@
#include "utils/selfuncs.h"
#include "utils/typcache.h"
+#include "miscadmin.h"
+
/*
* Data structure for accumulating info about possible range-query
@@ -44,6 +46,13 @@ typedef struct RangeQueryClause
Selectivity hibound; /* Selectivity of a var < something clause */
} RangeQueryClause;
+static Selectivity clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
+
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
@@ -59,23 +68,29 @@ static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo,
int type);
+static Bitmapset *clause_mv_get_attnums(PlannerInfo *root, Node *clause);
+
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Oid varRelid, List *stats,
SpecialJoinInfo *sjinfo);
-static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
-
static List *clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
List *clauses, Oid varRelid,
List **mvclauses, MVStatisticInfo *mvstats, int types);
static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats, List *clauses,
+ List *conditions, bool is_or);
+
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats,
- bool *fullmatch, Selectivity *lowsel);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or, bool *fullmatch,
+ Selectivity *lowsel);
static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -89,11 +104,59 @@ static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
int nmatches, char * matches,
bool is_or);
+/*
+ * Describes a combination of multiple statistics used to cover the
+ * attributes referenced by the clauses. The array 'stats' (with nstats
+ * elements) lists the statistics in the order they are applied, and
+ * the counters track how many clauses and conditions the solution
+ * covers.
+ *
+ * choose_mv_statistics_exhaustive() uses this to track both the current
+ * and the best solutions, while walking through the space of possible
+ * combinations.
+ */
+typedef struct mv_solution_t {
+ int nclauses; /* number of clauses covered */
+ int nconditions; /* number of conditions covered */
+ int nstats; /* number of stats applied */
+ int *stats; /* stats (in the apply order) */
+} mv_solution_t;
+
+static List *choose_mv_statistics(PlannerInfo *root,
+ List *mvstats,
+ List *clauses, List *conditions,
+ Oid varRelid,
+ SpecialJoinInfo *sjinfo);
+
+static List *filter_clauses(PlannerInfo *root, Oid varRelid,
+ SpecialJoinInfo *sjinfo, int type,
+ List *stats, List *clauses,
+ Bitmapset **attnums);
+
+static List *filter_stats(List *stats, Bitmapset *new_attnums,
+ Bitmapset *all_attnums);
+
+static Bitmapset **make_stats_attnums(MVStatisticInfo *mvstats,
+ int nmvstats);
+
+static MVStatisticInfo *make_stats_array(List *stats, int *nmvstats);
+
+static List* filter_redundant_stats(List *stats,
+ List *clauses, List *conditions);
+
+static Node** make_clauses_array(List *clauses, int *nclauses);
+
+static Bitmapset ** make_clauses_attnums(PlannerInfo *root, Oid varRelid,
+ SpecialJoinInfo *sjinfo, int type,
+ Node **clauses, int nclauses);
+
+static bool* make_cover_map(Bitmapset **stats_attnums, int nmvstats,
+ Bitmapset **clauses_attnums, int nclauses);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, List *clauses,
Oid varRelid, Index *relid);
-
+
static Bitmapset* fdeps_collect_attnums(List *stats);
static int *make_idx_to_attnum_mapping(Bitmapset *attnums);
@@ -116,6 +179,8 @@ static Bitmapset *fdeps_filter_clauses(PlannerInfo *root,
static Bitmapset * get_varattnos(Node * node, Index relid);
+int mvstat_search_type = MVSTAT_SEARCH_GREEDY;
+
/* used for merging bitmaps - AND (min), OR (max) */
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
#define MIN(x, y) (((x) < (y)) ? (x) : (y))
@@ -257,14 +322,15 @@ clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 1.0;
RangeQueryClause *rqlist = NULL;
ListCell *l;
/* processing mv stats */
- Oid relid = InvalidOid;
+ Index relid = InvalidOid;
/* attributes in mv-compatible clauses */
Bitmapset *mvattnums = NULL;
@@ -274,12 +340,13 @@ clauselist_selectivity(PlannerInfo *root,
stats = find_stats(root, clauses, varRelid, &relid);
/*
- * If there's exactly one clause, then no use in trying to match up pairs,
- * so just go directly to clause_selectivity().
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, or matching multivariate statistics, so just go directly
+ * to clause_selectivity().
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
- varRelid, jointype, sjinfo);
+ varRelid, jointype, sjinfo, conditions);
/*
* Check that there are some stats with functional dependencies
@@ -311,8 +378,8 @@ clauselist_selectivity(PlannerInfo *root,
}
/*
- * Check that there are statistics with MCV list. If not, we don't
- * need to waste time with the optimization.
+ * Check that there are statistics with MCV list or histogram.
+ * If not, we don't need to waste time with the optimization.
*/
if (has_stats(stats, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
{
@@ -326,33 +393,194 @@ clauselist_selectivity(PlannerInfo *root,
/*
* If there still are at least two columns, we'll try to select
- * a suitable multivariate stats.
+ * a suitable combination of multivariate stats. If there are
+ * multiple combinations, we'll try to choose the best one.
+ * See choose_mv_statistics for more details.
*/
if (bms_num_members(mvattnums) >= 2)
{
- /* see choose_mv_statistics() for details */
- MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+ int k;
+ ListCell *s;
+
+ /*
+ * Copy the list of conditions, so that we can build a list
+ * of local conditions (and keep the original intact, for
+ * the other clauses at the same level).
+ */
+ List *conditions_local = list_copy(conditions);
+
+ /* find the best combination of statistics */
+ List *solution = choose_mv_statistics(root, stats,
+ clauses, conditions,
+ varRelid, sjinfo);
- if (mvstat != NULL) /* we have a matching stats */
+ /* we have a good solution (list of stats) */
+ foreach (s, solution)
{
+ MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
+
/* clauses compatible with multi-variate stats */
List *mvclauses = NIL;
+ List *mvclauses_new = NIL;
+ List *mvclauses_conditions = NIL;
+ Bitmapset *stat_attnums = NULL;
- /* split the clauselist into regular and mv-clauses */
- clauses = clauselist_mv_split(root, sjinfo, clauses,
+ /* build attnum bitmapset for this statistics */
+ for (k = 0; k < mvstat->stakeys->dim1; k++)
+ stat_attnums = bms_add_member(stat_attnums,
+ mvstat->stakeys->values[k]);
+
+ /*
+ * Append the compatible conditions (passed from above)
+ * to mvclauses_conditions.
+ */
+ foreach (l, conditions)
+ {
+ Node *c = (Node*)lfirst(l);
+ Bitmapset *tmp = clause_mv_get_attnums(root, c);
+
+ if (bms_is_subset(tmp, stat_attnums))
+ mvclauses_conditions
+ = lappend(mvclauses_conditions, c);
+
+ bms_free(tmp);
+ }
+
+ /* split the clauselist into regular and mv-clauses
+ *
+ * We keep the list of clauses (we don't remove the
+ * clauses yet, because we want to use the clauses
+ * as conditions of other clauses).
+ *
+ * FIXME Do this only once, i.e. filter the clauses
+ * once (selecting clauses covered by at least
+ * one statistics) and then convert them into
+ * smaller per-statistics lists of conditions
+ * and estimated clauses.
+ */
+ clauselist_mv_split(root, sjinfo, clauses,
varRelid, &mvclauses, mvstat,
(MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
- /* we've chosen the histogram to match the clauses */
+ /*
+ * We've chosen the statistics to match the clauses, so
+ * each statistics from the solution should have at least
+ * one new clause (not covered by the previous stats).
+ */
Assert(mvclauses != NIL);
+ /*
+ * Mvclauses now contains only clauses compatible
+ * with the currently selected stats, but we have to
+ * split that into conditions (already matched by
+ * the previous stats), and the new clauses we need
+ * to estimate using this stats.
+ */
+ foreach (l, mvclauses)
+ {
+ ListCell *p;
+ bool covered = false;
+ Node *clause = (Node *) lfirst(l);
+ Bitmapset *clause_attnums = clause_mv_get_attnums(root, clause);
+
+ /*
+ * If already covered by previous stats, add it to
+ * conditions.
+ *
+ * TODO Maybe this could be relaxed a bit? Because
+ * with complex and/or clauses, this might
+ * mean no statistics actually covers such
+ * complex clause.
+ */
+ foreach (p, solution)
+ {
+ int k;
+ Bitmapset *stat_attnums = NULL;
+
+ MVStatisticInfo *prev_stat
+ = (MVStatisticInfo *)lfirst(p);
+
+ /* break if we've run into the current statistics */
+ if (prev_stat == mvstat)
+ break;
+
+ for (k = 0; k < prev_stat->stakeys->dim1; k++)
+ stat_attnums = bms_add_member(stat_attnums,
+ prev_stat->stakeys->values[k]);
+
+ covered = bms_is_subset(clause_attnums, stat_attnums);
+
+ bms_free(stat_attnums);
+
+ if (covered)
+ break;
+ }
+
+ if (covered)
+ mvclauses_conditions
+ = lappend(mvclauses_conditions, clause);
+ else
+ mvclauses_new
+ = lappend(mvclauses_new, clause);
+ }
+
+ /*
+ * We need at least one new clause (not just conditions).
+ */
+ Assert(mvclauses_new != NIL);
+
/* compute the multivariate stats */
- s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ s1 *= clauselist_mv_selectivity(root, mvstat,
+ mvclauses_new,
+ mvclauses_conditions,
+ false); /* AND */
+ }
+
+ /*
+ * And now finally remove all the mv-compatible clauses.
+ *
+ * This only repeats the same split as above, but this
+ * time we actually use the result list (and feed it to
+ * the next call).
+ */
+ foreach (s, solution)
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
+
+ /* split the list into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, sjinfo, clauses,
+ varRelid, &mvclauses, mvstat,
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+
+ /*
+ * Add the clauses to the conditions (to be passed
+ * to the regular clauses), irrespective of whether
+ * they were used as conditions or estimated clauses
+ * here.
+ *
+ * We only keep the remaining clauses in the list
+ * (what clauselist_mv_split returns), so each MV
+ * clause is added as a condition exactly once.
+ */
+ conditions_local = list_concat(conditions_local, mvclauses);
}
+
+ /* from now on, work with the 'local' list of conditions */
+ conditions = conditions_local;
}
}
/*
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, so just go directly to clause_selectivity().
+ */
+ if (list_length(clauses) == 1)
+ return clause_selectivity(root, (Node *) linitial(clauses),
+ varRelid, jointype, sjinfo, conditions);
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -364,7 +592,8 @@ clauselist_selectivity(PlannerInfo *root,
Selectivity s2;
/* Always compute the selectivity using clause_selectivity */
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo);
+ s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo,
+ conditions);
/*
* Check for being passed a RestrictInfo.
@@ -523,6 +752,55 @@ clauselist_selectivity(PlannerInfo *root,
}
/*
+ * Similar to clauselist_selectivity(), but for OR-clauses. We can't
+ * simply apply exactly the same logic as to AND-clauses, because there
+ * are a few key differences:
+ *
+ * - functional dependencies don't really apply to OR-clauses
+ *
+ * - clauselist_selectivity() works by decomposing the selectivity
+ * into conditional selectivities (probabilities), but that can be
+ * done only for AND-clauses. That means problems with applying
+ * multiple statistics (and reusing clauses as conditions, etc.).
+ *
+ * We might invent a completely new set of functions here, resembling
+ * clauselist_selectivity but adapting the ideas to OR-clauses.
+ *
+ * But luckily we know that each OR-clause
+ *
+ * (a OR b OR c)
+ *
+ * may be rewritten as an equivalent AND-clause using negation:
+ *
+ * NOT ((NOT a) AND (NOT b) AND (NOT c))
+ *
+ * And that's something we can pass to clauselist_selectivity.
+ */
+static Selectivity
+clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
+{
+ List *args = NIL;
+ ListCell *l;
+ Expr *expr;
+
+ /* (NOT ...) */
+ foreach (l, clauses)
+ args = lappend(args, makeBoolExpr(NOT_EXPR, list_make1(lfirst(l)), -1));
+
+ /* ((NOT ...) AND (NOT ...)) */
+ expr = makeBoolExpr(AND_EXPR, args, -1);
+
+ /* NOT (... AND ...) */
+ return 1.0 - clauselist_selectivity(root, list_make1(expr), varRelid,
+ jointype, sjinfo, conditions);
+}
+
+/*
* addRangeClause --- add a new range clause for clauselist_selectivity
*
* Here is where we try to match up pairs of range-query clauses
@@ -729,7 +1007,8 @@ clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 0.5; /* default for any unhandled clause type */
RestrictInfo *rinfo = NULL;
@@ -849,7 +1128,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) get_notclausearg((Expr *) clause),
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (and_clause(clause))
{
@@ -858,29 +1138,18 @@ clause_selectivity(PlannerInfo *root,
((BoolExpr *) clause)->args,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (or_clause(clause))
{
- /*
- * Selectivities for an OR clause are computed as s1+s2 - s1*s2 to
- * account for the probable overlap of selected tuple sets.
- *
- * XXX is this too conservative?
- */
- ListCell *arg;
-
- s1 = 0.0;
- foreach(arg, ((BoolExpr *) clause)->args)
- {
- Selectivity s2 = clause_selectivity(root,
- (Node *) lfirst(arg),
- varRelid,
- jointype,
- sjinfo);
-
- s1 = s1 + s2 - s1 * s2;
- }
+ /* just call to clauselist_selectivity_or() */
+ s1 = clauselist_selectivity_or(root,
+ ((BoolExpr *) clause)->args,
+ varRelid,
+ jointype,
+ sjinfo,
+ conditions);
}
else if (is_opclause(clause) || IsA(clause, DistinctExpr))
{
@@ -970,7 +1239,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((RelabelType *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (IsA(clause, CoerceToDomain))
{
@@ -979,7 +1249,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((CoerceToDomain *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else
{
@@ -1103,9 +1374,67 @@ clause_selectivity(PlannerInfo *root,
* them without inspection, which is more expensive). But this
* requires really knowing the per-clause selectivities in advance,
* and that's not what we do now.
+ *
+ * TODO All this is based on the assumption that the statistics represent
+ *      the necessary dependencies, i.e. that if two columns are not in
+ *      the same statistics, there's no dependency. If that's not the
+ *      case, we may get misestimates, just like before. For example
+ *      assume we have a table with three columns [a,b,c] with exactly
+ *      the same values, and statistics on [a,b] and [b,c]. So something
+ *      like this:
+ *
+ *      CREATE TABLE test AS SELECT i AS a, i AS b, i AS c
+ *                             FROM generate_series(1,1000) s(i);
+ *
+ * ALTER TABLE test ADD STATISTICS (mcv) ON (a,b);
+ * ALTER TABLE test ADD STATISTICS (mcv) ON (b,c);
+ *
+ * ANALYZE test;
+ *
+ * EXPLAIN ANALYZE SELECT * FROM test
+ * WHERE (a < 10) AND (b < 20) AND (c < 10);
+ *
+ * The problem here is that the only shared column between the two
+ * statistics is 'b' so the probability will be computed like this
+ *
+ * P[(a < 10) & (b < 20) & (c < 10)]
+ * = P[(a < 10) & (b < 20)] * P[(c < 10) | (a < 10) & (b < 20)]
+ * = P[(a < 10) & (b < 20)] * P[(c < 10) | (b < 20)]
+ *
+ * or like this
+ *
+ * P[(a < 10) & (b < 20) & (c < 10)]
+ * = P[(b < 20) & (c < 10)] * P[(a < 10) | (b < 20) & (c < 10)]
+ * = P[(b < 20) & (c < 10)] * P[(a < 10) | (b < 20)]
+ *
+ * In both cases the conditional probabilities will be evaluated as
+ * 0.5, because they lack the other column (which would make it 1.0).
+ *
+ * Theoretically it might be possible to transfer the dependency,
+ * e.g. by building bitmap for [a,b] and then combine it with [b,c]
+ * by doing something like this:
+ *
+ * 1) build bitmap on [a,b] using [(a<10) & (b < 20)]
+ * 2) for each element in [b,c] check the bitmap
+ *
+ * But that's certainly nontrivial - for example the statistics may
+ * be different (MCV list vs. histogram) and/or the items may not
+ * match (e.g. MCV items or histogram buckets will be built
+ * differently). Also, for one value of 'b' there might be multiple
+ * MCV items (because of the other column values) with different
+ * bitmap values (some will match, some won't) - so it's not exactly
+ * a bitmap but a partial match.
+ *
+ * Maybe a hash table with number of matches and mismatches (or
+ * maybe sums of frequencies) would work? The step (2) would then
+ * lookup the values and use that to weight the item somehow.
+ *
+ * Currently the only solution is to build statistics on all three
+ * columns.
*/
static Selectivity
-clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+clauselist_mv_selectivity(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
bool fullmatch = false;
Selectivity s1 = 0.0, s2 = 0.0;
@@ -1123,7 +1452,8 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
*/
/* Evaluate the MCV first. */
- s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ s1 = clauselist_mv_selectivity_mcvlist(root, mvstats,
+ clauses, conditions, is_or,
&fullmatch, &mcv_low);
/*
@@ -1136,7 +1466,8 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
/* FIXME if (fullmatch) without matching MCV item, use the mcv_low
* selectivity as upper bound */
- s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+ s2 = clauselist_mv_selectivity_histogram(root, mvstats,
+ clauses, conditions, is_or);
/* TODO clamp to <= 1.0 (or more strictly, when possible) */
return s1 + s2;
@@ -1176,8 +1507,7 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
*/
if (bms_num_members(attnums) <= 1)
{
- if (attnums != NULL)
- pfree(attnums);
+ bms_free(attnums);
attnums = NULL;
*relid = InvalidOid;
}
@@ -1186,202 +1516,931 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
}
/*
- * We're looking for statistics matching at least 2 attributes,
- * referenced in the clauses compatible with multivariate statistics.
- * The current selection criteria is very simple - we choose the
- * statistics referencing the most attributes.
+ * Selects the best combination of multivariate statistics, in an
+ * exhaustive way, where 'best' means:
*
- * If there are multiple statistics referencing the same number of
- * columns (from the clauses), the one with less source columns
- * (as listed in the ADD STATISTICS when creating the statistics) wins.
- * Other wise the first one wins.
+ * (a) covering the most attributes (referenced by clauses)
+ * (b) using the least number of multivariate stats
+ * (c) using the most conditions to exploit dependency
*
- * This is a very simple criteria, and has several weaknesses:
+ * There may be other optimality criteria, not considered in the initial
+ * implementation (more on that in the 'Weaknesses' section below).
*
- * (a) does not consider the accuracy of the statistics
+ * This pretty much splits the probability of clauses (aka selectivity)
+ * into a sequence of conditional probabilities, like this
*
- * If there are two histograms built on the same set of columns,
- * but one has 100 buckets and the other one has 1000 buckets (thus
- * likely providing better estimates), this is not currently
- * considered.
+ * P(A,B,C,D) = P(A,B) * P(C|A,B) * P(D|A,B,C)
*
- * (b) does not consider the type of statistics
+ * and removing the attributes not referenced by the existing stats,
+ * under the assumption that there's no dependency (otherwise the DBA
+ * would create the stats).
*
- * If there are three statistics - one containing just a MCV list,
- * another one with just a histogram and a third one with both,
- * this is not considered.
+ * The last criterion means that when we have the choice to compute like
+ * this
*
- * (c) does not consider the number of clauses
+ * P(A,B,C,D) = P(A,B,C) * P(D|B,C)
*
- * As explained, only the number of referenced attributes counts,
- * so if there are multiple clauses on a single attribute, this
- * still counts as a single attribute.
+ * or like this
*
- * (d) does not consider type of condition
+ * P(A,B,C,D) = P(A,B,C) * P(D|C)
*
- * Some clauses may work better with some statistics - for example
- * equality clauses probably work better with MCV lists than with
- * histograms. But IS [NOT] NULL conditions may often work better
- * with histograms (thanks to NULL-buckets).
+ * we should use the first option, as that exploits more dependencies.
*
- * So for example with five WHERE conditions
+ * The order of statistics in the solution implicitly determines the
+ * order of estimation of clauses, because as we apply a statistics,
+ * we always use it to estimate all the clauses covered by it (and
+ * then we use those clauses as conditions for the next statistics).
*
- * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
+ * Don't call this directly but through choose_mv_statistics().
*
- * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be
- * selected as it references the most columns.
*
- * Once we have selected the multivariate statistics, we split the list
- * of clauses into two parts - conditions that are compatible with the
- * selected stats, and conditions are estimated using simple statistics.
+ * Algorithm
+ * ---------
+ * The algorithm is a recursive implementation of backtracking, with
+ * maximum 'depth' equal to the number of multi-variate statistics
+ * available on the table.
*
- * From the example above, conditions
+ * It explores all the possible permutations of the stats.
+ *
+ * Whenever it considers adding the next statistics, the clauses it
+ * matches are divided into 'conditions' (clauses already matched by at
+ * least one previous statistics) and clauses that are estimated.
*
- * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
+ * Then several checks are performed:
*
- * will be estimated using the multivariate statistics (a,b,c,d) while
- * the last condition (e = 1) will get estimated using the regular ones.
+ * (a) The statistics covers at least 2 columns, referenced in the
+ * estimated clauses (otherwise multi-variate stats are useless).
*
- * There are various alternative selection criteria (e.g. counting
- * conditions instead of just referenced attributes), but eventually
- * the best option should be to combine multiple statistics. But that's
- * much harder to do correctly.
+ * (b) The statistics covers at least 1 new column, i.e. a column not
+ *     referenced by the already used stats (and the new column has
+ *     to be referenced by the clauses, of course). Otherwise the
+ * statistics would not add any new information.
*
- * TODO Select multiple statistics and combine them when computing
- * the estimate.
+ * There are some other sanity checks (e.g. that the stats must not be
+ * used twice etc.).
*
- * TODO This will probably have to consider compatibility of clauses,
- * because 'dependencies' will probably work only with equality
- * clauses.
+ * Finally the new solution is compared to the currently best one, and
+ * if it's considered better, it's used instead.
+ *
+ *
+ * Weaknesses
+ * ----------
+ * The current implementation uses somewhat simple optimality criteria,
+ * suffering from the following weaknesses.
+ *
+ * (a) There may be multiple solutions with the same number of covered
+ * attributes and number of statistics (e.g. the same solution but
+ * with statistics in a different order). It's unclear which solution
+ * is the best one - in a sense all of them are equal.
+ *
+ * TODO It might be possible to compute estimate for each of those
+ * solutions, and then combine them to get the final estimate
+ * (e.g. by using average or median).
+ *
+ * (b) Does not consider that some types of stats are a better match for
+ *     some types of clauses (e.g. an MCV list is a better match for
+ *     equality clauses than a histogram).
+ *
+ * XXX Maybe MCV is almost always better / more accurate?
+ *
+ * But maybe this is pointless - generally, each column is either
+ * a label (no matter whether that's due to the data type or how
+ * it's used), or a value with an ordering that makes sense. So
+ * either an MCV list is more appropriate (labels) or a histogram
+ * (ordered values).
+ *
+ * Not sure what to do with statistics mixing columns of
+ * both types - maybe it'd be better to invent a new type of stats
+ * combining MCV list and histogram (keeping a small histogram for
+ * each MCV item, and a separate histogram for values not on the
+ * MCV list). But that's not implemented at this moment.
+ *
+ * TODO The algorithm should probably count number of Vars (not just
+ * attnums) when computing the 'score' of each solution. Computing
+ * the ratio of (num of all vars) / (num of condition vars) as a
+ * measure of how well the solution uses conditions might be
+ * useful.
*/
-static MVStatisticInfo *
-choose_mv_statistics(List *stats, Bitmapset *attnums)
+static void
+choose_mv_statistics_exhaustive(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
{
- int i;
- ListCell *lc;
+ int i, j;
- MVStatisticInfo *choice = NULL;
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
- int current_matches = 1; /* goal #1: maximize */
- int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+ CHECK_FOR_INTERRUPTS();
+
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
/*
- * Walk through the statistics (simple array with nmvstats elements)
- * and for each one count the referenced attributes (encoded in
- * the 'attnums' bitmap).
+ * Now try to apply each statistics, matching at least two attributes,
+ * unless it's already used in one of the previous steps.
*/
- foreach (lc, stats)
+ for (i = 0; i < nmvstats; i++)
{
- MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+ int c;
- /* columns matching this statistics */
- int matches = 0;
+ int ncovered_clauses = 0; /* number of covered clauses */
+ int ncovered_conditions = 0; /* number of covered conditions */
+ int nattnums = 0; /* number of covered attributes */
- int2vector * attrs = info->stakeys;
- int numattrs = attrs->dim1;
+ Bitmapset *all_attnums = NULL;
+ Bitmapset *new_attnums = NULL;
- /* skip dependencies-only stats */
- if (! (info->mcv_built || info->hist_built))
+ /* skip statistics that were already used or eliminated */
+ if (ruled_out[i] != -1)
continue;
- /* count columns covered by the histogram */
- for (i = 0; i < numattrs; i++)
- if (bms_is_member(attrs->values[i], attnums))
- matches++;
-
/*
- * Use this statistics when it improves the number of matches or
- * when it matches the same number of attributes but is smaller.
+ * See if we have clauses covered by this statistics, but not
+ * yet covered by any of the preceding ones.
*/
- if ((matches > current_matches) ||
- ((matches == current_matches) && (current_dims > numattrs)))
+ for (c = 0; c < nclauses; c++)
{
- choice = info;
- current_matches = matches;
- current_dims = numattrs;
- }
- }
+ bool covered = false;
+ Bitmapset *clause_attnums = clauses_attnums[c];
+ Bitmapset *tmp = NULL;
- return choice;
-}
+ /*
+ * If this clause is not covered by this stats, we can't
+ * use the stats to estimate that at all.
+ */
+ if (! cover_map[i * nclauses + c])
+ continue;
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add the
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
-/*
- * This splits the clauses list into two parts - one containing clauses
- * that will be evaluated using the chosen statistics, and the remaining
- * clauses (either non-mvcompatible, or not related to the histogram).
- */
-static List *
-clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
- List *clauses, Oid varRelid, List **mvclauses,
- MVStatisticInfo *mvstats, int types)
-{
- int i;
- ListCell *l;
- List *non_mvclauses = NIL;
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
- /* FIXME is there a better way to get info on int2vector? */
- int2vector * attrs = mvstats->stakeys;
- int numattrs = mvstats->stakeys->dim1;
+ /* let's see if it's covered by any of the previous stats */
+ for (j = 0; j < step; j++)
+ {
+ /* already covered by the previous stats */
+ if (cover_map[current->stats[j] * nclauses + c])
+ covered = true;
- Bitmapset *mvattnums = NULL;
+ if (covered)
+ break;
+ }
- /* build bitmap of attributes covered by the stats, so we can
- * do bms_is_subset later */
- for (i = 0; i < numattrs; i++)
- mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+ /* if already covered, continue with the next clause */
+ if (covered)
+ {
+ ncovered_conditions += 1;
+ continue;
+ }
- /* erase the list of mv-compatible clauses */
- *mvclauses = NIL;
+ /*
+ * OK, this clause is covered by this statistics (and not by
+ * any of the previous ones)
+ */
+ ncovered_clauses += 1;
- foreach (l, clauses)
- {
- bool match = false; /* by default not mv-compatible */
- Bitmapset *attnums = NULL;
- Node *clause = (Node *) lfirst(l);
+ /* add the attnums into attnums from 'new clauses' */
+ // new_attnums = bms_union(new_attnums, clause_attnums);
+ }
- if (clause_is_mv_compatible(root, clause, varRelid, NULL,
- &attnums, sjinfo, types))
+ /* can't have more new clauses than original clauses */
+ Assert(nclauses >= ncovered_clauses);
+ Assert(ncovered_clauses >= 0); /* mostly paranoia */
+
+ nattnums = bms_num_members(all_attnums);
+
+ /* free all the bitmapsets - we don't need them anymore */
+ bms_free(all_attnums);
+ bms_free(new_attnums);
+
+ all_attnums = NULL;
+ new_attnums = NULL;
+
+ /*
+ * Now walk through the conditions (passed from above) and
+ * count those covered by this statistics.
+ */
+ for (c = 0; c < nconditions; c++)
{
- /* are all the attributes part of the selected stats? */
- if (bms_is_subset(attnums, mvattnums))
- match = true;
+ Bitmapset *clause_attnums = conditions_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this clause is not covered by this stats, we can't
+ * use the stats to estimate that at all.
+ */
+ if (! condition_map[i * nconditions + c])
+ continue;
+
+ /* count this as a condition */
+ ncovered_conditions += 1;
+
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add the
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
}
/*
- * The clause matches the selected stats, so put it to the list
- * of mv-compatible clauses. Otherwise, keep it in the list of
- * 'regular' clauses (that may be selected later).
+ * Let's mark the statistics as 'ruled out' - either we'll use
+ * it (and proceed to the next step), or it's incompatible.
*/
- if (match)
- *mvclauses = lappend(*mvclauses, clause);
- else
- non_mvclauses = lappend(non_mvclauses, clause);
- }
+ ruled_out[i] = step;
- /*
- * Perform regular estimation using the clauses incompatible
- * with the chosen histogram (or MV stats in general).
- */
- return non_mvclauses;
+ /*
+ * There are no clauses usable with this statistics (not already
+ * covered by some of the previous stats).
+ *
+ * Similarly, if the clauses only use a single attribute, we
+ * can't really use that.
+ */
+ if ((ncovered_clauses == 0) || (nattnums < 2))
+ continue;
-}
+ /*
+ * TODO Not sure if it's possible to add a clause referencing
+ * only attributes already covered by previous stats?
+ * Introducing only some new dependency, not a new
+ * attribute. Couldn't come up with an example, though.
+ * Might be worth adding some assert.
+ */
-/*
- * Determines whether the clause is compatible with multivariate stats,
- * and if it is, returns some additional information - varno (index
- * into simple_rte_array) and a bitmap of attributes. This is then
- * used to fetch related multivariate statistics.
- *
- * At this moment we only support basic conditions of the form
- *
- * variable OP constant
- *
- * where OP is one of [=,<,<=,>=,>] (which is however determined by
- * looking at the associated function for estimating selectivity, just
- * like with the single-dimensional case).
- *
- * TODO Support 'OR clauses' - shouldn't be all that difficult to
+ /*
+ * got a suitable statistics - let's update the current solution,
+ * maybe use it as the best solution
+ */
+ current->nclauses += ncovered_clauses;
+ current->nconditions += ncovered_conditions;
+ current->nstats += 1;
+ current->stats[step] = i;
+
+ /*
+ * We can never cover more clauses, or use more stats that we
+ * actually have at the beginning.
+ */
+ Assert(nclauses >= current->nclauses);
+ Assert(nmvstats >= current->nstats);
+ Assert(step < nmvstats);
+
+ /* we can't get more conditions than clauses and conditions combined
+ *
+ * FIXME This assert does not work because we count the conditions
+ * repeatedly (once for each statistics covering it).
+ */
+ /* Assert((nconditions + nclauses) >= current->nconditions); */
+
+ if (*best == NULL)
+ {
+ *best = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ (*best)->nstats = 0;
+ (*best)->nclauses = 0;
+ (*best)->nconditions = 0;
+ }
+
+ /* see if it's better than the current 'best' solution */
+ if ((current->nclauses > (*best)->nclauses) ||
+ ((current->nclauses == (*best)->nclauses) &&
+ ((current->nstats > (*best)->nstats))))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+
+ /*
+ * The recursion only makes sense if we haven't covered all the
+ * attributes (then adding stats is not really possible).
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_exhaustive(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= ncovered_clauses;
+ current->nconditions -= ncovered_conditions;
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[i] = -1;
+
+ Assert(current->nclauses >= 0);
+ Assert(current->nstats >= 0);
+ }
+
+ /* reset all statistics eliminated in this step back to 'usable' */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+}
+
+/*
+ * Greedy search for a multivariate solution - a sequence of statistics
+ * covering the clauses. This chooses the "best" statistics at each step,
+ * so the resulting solution may not be the best solution globally, but
+ * this produces the solution in only N steps (where N is the number of
+ * statistics), while the exhaustive approach may have to walk through
+ * ~N! combinations (although some of those are terminated early).
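+ *
+ * For a rough sense of scale (illustrative numbers only): with 8
+ * statistics the exhaustive search may visit up to ~8! = 40320
+ * orderings, while the greedy search performs at most 8 steps.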
+ *
+ * See the comments at choose_mv_statistics_exhaustive() as this does
+ * the same thing (but in a different way).
+ *
+ * Don't call this directly, but through choose_mv_statistics().
+ *
+ * TODO There are probably other metrics we might use - e.g. using
+ * number of columns (num_cond_columns / num_cov_columns), which
+ * might work better with a mix of simple and complex clauses.
+ *
+ * TODO Also the choice at the very first step should be handled
+ * in a special way, because there will be 0 conditions at that
+ * moment, so there needs to be some other criteria - e.g. using
+ * the simplest (or most complex?) clause might be a good idea.
+ *
+ * TODO We might also select multiple stats using different criteria,
+ * and branch the search. This is however tricky, because if we
+ * choose k statistics at each step, we get k^N branches to
+ * walk through (with N steps). That's not really good with
+ * large number of stats (yet better than exhaustive search).
+ */
+static void
+choose_mv_statistics_greedy(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
+ int best_stat = -1;
+ double gain, max_gain = -1.0;
+
+ /*
+ * Bitmap tracking which clauses are already covered (by the previous
+ * statistics) and may thus serve only as a condition in this step.
+ */
+ bool *covered_clauses = (bool*)palloc0(nclauses);
+
+ /*
+ * Number of clauses and columns covered by each statistics - this
+ * includes both conditions and clauses covered by the statistics for
+ * the first time. The number of columns may count some columns
+ * repeatedly - if a column is shared by multiple clauses, it will
+ * be counted once for each clause (covered by the statistics).
+ * So with two clauses [(a=1 OR b=2),(a<2 OR c>1)] the column "a"
+ * will be counted twice (if both clauses are covered).
+ *
+ * The values for ruled-out statistics (those that can't be
+ * applied) are not computed, because that'd be pointless.
+ */
+ int *num_cov_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cov_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Same as above, but this only includes clauses that are already
+ * covered by the previous stats (and the current one).
+ */
+ int *num_cond_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cond_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Number of attributes for each clause.
+ *
+ * TODO Might be computed in choose_mv_statistics() and then passed
+ * here, but then the function would not have the same signature
+ * as _exhaustive().
+ */
+ int *attnum_counts = (int*)palloc0(sizeof(int) * nclauses);
+ int *attnum_cond_counts = (int*)palloc0(sizeof(int) * nconditions);
+
+ CHECK_FOR_INTERRUPTS();
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ /* compute attributes (columns) for each clause */
+ for (i = 0; i < nclauses; i++)
+ attnum_counts[i] = bms_num_members(clauses_attnums[i]);
+
+ /* compute attributes (columns) for each condition */
+ for (i = 0; i < nconditions; i++)
+ attnum_cond_counts[i] = bms_num_members(conditions_attnums[i]);
+
+ /* see which clauses are already covered at this point (by previous stats) */
+ for (i = 0; i < step; i++)
+ for (j = 0; j < nclauses; j++)
+ covered_clauses[j] |= (cover_map[current->stats[i] * nclauses + j]);
+
+ /* which remaining statistics covers most clauses / uses most conditions? */
+ for (i = 0; i < nmvstats; i++)
+ {
+ Bitmapset *attnums_covered = NULL;
+ Bitmapset *attnums_conditions = NULL;
+
+ /* skip stats that are already ruled out (either used or inapplicable) */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* count covered clauses and conditions (for the statistics) */
+ for (j = 0; j < nclauses; j++)
+ {
+ if (cover_map[i * nclauses + j])
+ {
+ Bitmapset *attnums_new
+ = bms_union(attnums_covered, clauses_attnums[j]);
+
+ /* get rid of the old bitmap and keep the unified result */
+ bms_free(attnums_covered);
+ attnums_covered = attnums_new;
+
+ num_cov_clauses[i] += 1;
+ num_cov_columns[i] += attnum_counts[j];
+
+ /* is the clause already covered (i.e. a condition)? */
+ if (covered_clauses[j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_counts[j];
+ attnums_new = bms_union(attnums_conditions,
+ clauses_attnums[j]);
+
+ bms_free(attnums_conditions);
+ attnums_conditions = attnums_new;
+ }
+ }
+ }
+
+ /* if all covered clauses are covered by prev stats (thus conditions) */
+ if (num_cov_clauses[i] == num_cond_clauses[i])
+ ruled_out[i] = step;
+
+ /* same if there are no new attributes */
+ else if (bms_num_members(attnums_conditions) == bms_num_members(attnums_covered))
+ ruled_out[i] = step;
+
+ bms_free(attnums_covered);
+ bms_free(attnums_conditions);
+
+ /* if the statistics is inapplicable, try the next one */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* now let's walk through conditions and count the covered */
+ for (j = 0; j < nconditions; j++)
+ {
+ if (condition_map[i * nconditions + j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_cond_counts[j];
+ }
+ }
+
+ /* otherwise see if this statistics improves the gain metric */
+ gain = num_cond_columns[i] / (double)num_cov_columns[i];
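+
+ /*
+  * Worked example (made-up numbers): with num_cov_columns = 4 and
+  * num_cond_columns = 3, the gain is 3/4 = 0.75; a statistics that
+  * reuses no conditions at all has gain 0.0.
+  */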
+
+ if (gain > max_gain)
+ {
+ max_gain = gain;
+ best_stat = i;
+ }
+ }
+
+ /*
+ * Have we found a suitable statistics? Add it to the solution and
+ * try next step.
+ */
+ if (best_stat != -1)
+ {
+ /* mark the statistics, so that we skip it in next steps */
+ ruled_out[best_stat] = step;
+
+ /* allocate current solution if necessary */
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ current->nclauses += num_cov_clauses[best_stat];
+ current->nconditions += num_cond_clauses[best_stat];
+ current->stats[step] = best_stat;
+ current->nstats++;
+
+ if (*best == NULL)
+ {
+ (*best) = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ else
+ {
+ /* see if this is a better solution */
+ double current_gain = (double)current->nconditions / current->nclauses;
+ double best_gain = (double)(*best)->nconditions / (*best)->nclauses;
+
+ if ((current_gain > best_gain) ||
+ ((current_gain == best_gain) && (current->nstats < (*best)->nstats)))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ }
+
+ /*
+ * The recursion only makes sense if we haven't covered all the
+ * attributes (then adding stats is not really possible).
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_greedy(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= num_cov_clauses[best_stat];
+ current->nconditions -= num_cond_clauses[best_stat];
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[best_stat] = -1;
+ }
+
+ /* reset all statistics eliminated in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+ /* free everything allocated in this step */
+ pfree(covered_clauses);
+ pfree(attnum_counts);
+ pfree(num_cov_clauses);
+ pfree(num_cov_columns);
+ pfree(num_cond_clauses);
+ pfree(num_cond_columns);
+}
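
To make the gain metric used by the greedy step concrete, here is a
minimal standalone sketch (not part of the patch - the counts are made
up). Each remaining statistics is scored by the share of already-covered
('condition') columns among all columns of the clauses it covers, and
the highest score wins:

#include <stdio.h>

int
main(void)
{
    /* per-statistics counts, as accumulated by the greedy step */
    int     num_cond_columns[] = {2, 4, 1}; /* columns in covered conditions */
    int     num_cov_columns[] = {4, 5, 3};  /* columns in all covered clauses */
    int     nmvstats = 3;

    int     i;
    int     best_stat = -1;
    double  max_gain = -1.0;

    for (i = 0; i < nmvstats; i++)
    {
        /* prefer stats that reuse many conditions per covered column */
        double  gain = num_cond_columns[i] / (double) num_cov_columns[i];

        if (gain > max_gain)
        {
            max_gain = gain;
            best_stat = i;
        }
    }

    /* prints "best stat: 1 (gain 0.80)" */
    printf("best stat: %d (gain %.2f)\n", best_stat, max_gain);
    return 0;
}
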
+
+/*
+ * Chooses the combination of statistics optimal for estimating
+ * a particular clause list.
+ *
+ * This only handles the 'preparation' phase shared by the exhaustive
+ * and greedy implementations (see the previous functions), mostly
+ * trying to reduce the size of the problem (eliminating clauses and
+ * statistics that can't really be used in the solution).
+ *
+ * It also precomputes bitmaps for attributes covered by clauses and
+ * statistics, so that we don't need to do that over and over in the
+ * actual optimizations (as it's both CPU and memory intensive).
+ *
+ * TODO This will probably have to consider compatibility of clauses,
+ * because 'dependencies' will probably work only with equality
+ * clauses.
+ *
+ * TODO Another way to make the optimization problems smaller might
+ * be splitting the statistics into several disjoint subsets, i.e.
+ * if we can split the graph of statistics (after the elimination)
+ * into multiple components (so that stats in different components
+ * share no attributes), we can do the optimization for each
+ * component separately.
+ *
+ * TODO If we could compute what is a "perfect solution" maybe we could
+ * terminate the search after reaching ~90% of it? Say, if we knew
+ * that we can cover 10 clauses and reuse 8 dependencies, maybe
+ * covering 9 clauses and 7 dependencies would be OK?
+ */
+static List*
+choose_mv_statistics(PlannerInfo *root, List *stats,
+ List *clauses, List *conditions,
+ Oid varRelid, SpecialJoinInfo *sjinfo)
+{
+ int i;
+ mv_solution_t *best = NULL;
+ List *result = NIL;
+
+ int nmvstats;
+ MVStatisticInfo *mvstats;
+
+ /* we only work with MCV lists and histograms here */
+ int type = (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+
+ bool *clause_cover_map = NULL,
+ *condition_cover_map = NULL;
+ int *ruled_out = NULL;
+
+ /* build bitmapsets for all stats and clauses */
+ Bitmapset **stats_attnums;
+ Bitmapset **clauses_attnums;
+ Bitmapset **conditions_attnums;
+
+ int nclauses, nconditions;
+ Node ** clauses_array;
+ Node ** conditions_array;
+
+ /* copy lists, so that we can free them during elimination easily */
+ clauses = list_copy(clauses);
+ conditions = list_copy(conditions);
+ stats = list_copy(stats);
+
+ /*
+ * Reduce the optimization problem size as much as possible.
+ *
+ * Eliminate clauses and conditions not covered by any statistics,
+ * or statistics not matching at least two attributes (one of them
+ * has to be in a regular clause).
+ *
+ * It's possible that removing a statistics in one iteration
+ * eliminates a clause in the next one, so we'll repeat this until we
+ * eliminate no clauses/stats in that iteration.
+ *
+ * This can only happen after eliminating a statistics - clauses are
+ * eliminated first, so statistics always reflect that.
+ */
+ while (true)
+ {
+ List *tmp;
+
+ Bitmapset *compatible_attnums = NULL;
+ Bitmapset *condition_attnums = NULL;
+ Bitmapset *all_attnums = NULL;
+
+ /*
+ * Clauses
+ *
+ * Walk through clauses and keep only those covered by at least
+ * one of the statistics we still have. We'll also keep info
+ * about attnums in clauses (without conditions) so that we can
+ * ignore stats covering just conditions (which is pointless).
+ */
+ tmp = filter_clauses(root, varRelid, sjinfo, type,
+ stats, clauses, &compatible_attnums);
+
+ /* discard the original list */
+ list_free(clauses);
+ clauses = tmp;
+
+ /*
+ * Conditions
+ *
+ * Walk through clauses and keep only those covered by at least
+ * one of the statistics we still have. Also, collect bitmap of
+ * attributes so that we can make sure we add at least one new
+ * attribute (by comparing with clauses).
+ */
+ if (conditions != NIL)
+ {
+ tmp = filter_clauses(root, varRelid, sjinfo, type,
+ stats, conditions, &condition_attnums);
+
+ /* discard the original list */
+ list_free(conditions);
+ conditions = tmp;
+ }
+
+ /* get a union of attnums (from conditions and new clauses) */
+ all_attnums = bms_union(compatible_attnums, condition_attnums);
+
+ /*
+ * Statistics
+ *
+ * Walk through statistics and only keep those covering at least
+ * one new attribute (excluding conditions) and at least two attributes
+ * in both clauses and conditions.
+ */
+ tmp = filter_stats(stats, compatible_attnums, all_attnums);
+
+ /* if we've not eliminated anything, terminate */
+ if (list_length(stats) == list_length(tmp))
+ break;
+
+ /* work only with filtered statistics from now */
+ list_free(stats);
+ stats = tmp;
+ }
+
+ /* only do the optimization if we have clauses/statistics */
+ if ((list_length(stats) == 0) || (list_length(clauses) == 0))
+ return NULL;
+
+ /* remove redundant stats (stats covered by another stats) */
+ stats = filter_redundant_stats(stats, clauses, conditions);
+
+ /*
+ * TODO We should sort the stats to make the order deterministic,
+ * otherwise we may get different estimates on different
+ * executions - if there are multiple "equally good" solutions,
+ * we'll keep the first solution we see.
+ *
+ * Sorting by OID probably is not the right solution though,
+ * because we'd like it to be somehow reproducible,
+ * irrespective of the order of ADD STATISTICS commands.
+ * So maybe statkeys?
+ */
+ mvstats = make_stats_array(stats, &nmvstats);
+ stats_attnums = make_stats_attnums(mvstats, nmvstats);
+
+ /* collect clauses and a bitmap of attnums */
+ clauses_array = make_clauses_array(clauses, &nclauses);
+ clauses_attnums = make_clauses_attnums(root, varRelid, sjinfo, type,
+ clauses_array, nclauses);
+
+ /* collect conditions and bitmap of attnums */
+ conditions_array = make_clauses_array(conditions, &nconditions);
+ conditions_attnums = make_clauses_attnums(root, varRelid, sjinfo, type,
+ conditions_array, nconditions);
+
+ /*
+ * Build bitmaps with info about which clauses/conditions are
+ * covered by each statistics (so that we don't need to call the
+ * bms_is_subset over and over again).
+ */
+ clause_cover_map = make_cover_map(stats_attnums, nmvstats,
+ clauses_attnums, nclauses);
+
+ condition_cover_map = make_cover_map(stats_attnums, nmvstats,
+ conditions_attnums, nconditions);
+
+ ruled_out = (int*)palloc0(nmvstats * sizeof(int));
+
+ /* no stats are ruled out by default */
+ for (i = 0; i < nmvstats; i++)
+ ruled_out[i] = -1;
+
+ /* do the optimization itself */
+ if (mvstat_search_type == MVSTAT_SEARCH_EXHAUSTIVE)
+ choose_mv_statistics_exhaustive(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+ else
+ choose_mv_statistics_greedy(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+
+ /* create a list of statistics from the array */
+ if (best != NULL)
+ {
+ for (i = 0; i < best->nstats; i++)
+ {
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[best->stats[i]], sizeof(MVStatisticInfo));
+ result = lappend(result, info);
+ }
+ pfree(best);
+ }
+
+ /* cleanup (maybe leave it up to the memory context?) */
+ for (i = 0; i < nmvstats; i++)
+ bms_free(stats_attnums[i]);
+
+ for (i = 0; i < nclauses; i++)
+ bms_free(clauses_attnums[i]);
+
+ for (i = 0; i < nconditions; i++)
+ bms_free(conditions_attnums[i]);
+
+ pfree(stats_attnums);
+ pfree(clauses_attnums);
+ pfree(conditions_attnums);
+
+ pfree(clauses_array);
+ pfree(conditions_array);
+ pfree(clause_cover_map);
+ pfree(condition_cover_map);
+ pfree(ruled_out);
+ pfree(mvstats);
+
+ list_free(clauses);
+ list_free(conditions);
+ list_free(stats);
+
+ return result;
+}
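
The reduction loop at the top of this function is probably easier to
see on a toy example. The following standalone sketch (not part of the
patch) uses plain bit masks as stand-ins for the attnum bitmapsets and
implements just the two elimination rules - drop clauses covered by no
remaining statistics, drop statistics matching fewer than two
attributes of the remaining clauses - repeated until a fixpoint:

#include <stdio.h>

#define NCLAUSES 4
#define NSTATS 3

static int
count_bits(unsigned x)
{
    int n = 0;
    while (x) { n += (x & 1); x >>= 1; }
    return n;
}

int
main(void)
{
    /* bit k set = "references column k" (stand-in for bitmapsets) */
    unsigned clauses[NCLAUSES] = {0x03, 0x04, 0x18, 0x20};
    unsigned stats[NSTATS] = {0x07, 0x18, 0x40};
    int c, s, changed = 1;

    while (changed)
    {
        changed = 0;

        /* keep only clauses fully covered by some remaining statistics */
        for (c = 0; c < NCLAUSES; c++)
        {
            int covered = 0;

            if (clauses[c] == 0)
                continue;

            for (s = 0; s < NSTATS; s++)
                if (stats[s] != 0 && (clauses[c] & ~stats[s]) == 0)
                    covered = 1;

            if (!covered)
            {
                clauses[c] = 0;
                changed = 1;
            }
        }

        /* keep only stats matching >= 2 attributes of remaining clauses */
        for (s = 0; s < NSTATS; s++)
        {
            unsigned live = 0;

            if (stats[s] == 0)
                continue;

            for (c = 0; c < NCLAUSES; c++)
                live |= clauses[c];

            if (count_bits(stats[s] & live) < 2)
            {
                stats[s] = 0;
                changed = 1;
            }
        }
    }

    /* prints: stat 0 kept / stat 1 kept / stat 2 eliminated */
    for (s = 0; s < NSTATS; s++)
        printf("stat %d %s\n", s, (stats[s] != 0) ? "kept" : "eliminated");

    return 0;
}
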
+
+
+/*
+ * This splits the clauses list into two parts - one containing clauses
+ * that will be evaluated using the chosen statistics, and the remaining
+ * clauses (either non-mvcompatible, or not related to the histogram).
+ */
+static List *
+clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
+ List *clauses, Oid varRelid, List **mvclauses,
+ MVStatisticInfo *mvstats, int types)
+{
+ int i;
+ ListCell *l;
+ List *non_mvclauses = NIL;
+
+ /* FIXME is there a better way to get info on int2vector? */
+ int2vector * attrs = mvstats->stakeys;
+ int numattrs = mvstats->stakeys->dim1;
+
+ Bitmapset *mvattnums = NULL;
+
+ /* build bitmap of attributes covered by the stats, so we can
+ * do bms_is_subset later */
+ for (i = 0; i < numattrs; i++)
+ mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+
+ /* erase the list of mv-compatible clauses */
+ *mvclauses = NIL;
+
+ foreach (l, clauses)
+ {
+ bool match = false; /* by default not mv-compatible */
+ Bitmapset *attnums = NULL;
+ Node *clause = (Node *) lfirst(l);
+
+ if (clause_is_mv_compatible(root, clause, varRelid, NULL,
+ &attnums, sjinfo, types))
+ {
+ /* are all the attributes part of the selected stats? */
+ if (bms_is_subset(attnums, mvattnums))
+ match = true;
+ }
+
+ /*
+ * The clause matches the selected stats, so put it to the list
+ * of mv-compatible clauses. Otherwise, keep it in the list of
+ * 'regular' clauses (that may be selected later).
+ */
+ if (match)
+ *mvclauses = lappend(*mvclauses, clause);
+ else
+ non_mvclauses = lappend(non_mvclauses, clause);
+ }
+
+ /*
+ * Perform regular estimation using the clauses incompatible
+ * with the chosen histogram (or MV stats in general).
+ */
+ return non_mvclauses;
+
+}
+
+/*
+ * Determines whether the clause is compatible with multivariate stats,
+ * and if it is, returns some additional information - varno (index
+ * into simple_rte_array) and a bitmap of attributes. This is then
+ * used to fetch related multivariate statistics.
+ *
+ * At this moment we only support basic conditions of the form
+ *
+ * variable OP constant
+ *
+ * where OP is one of [=,<,<=,>=,>] (which is however determined by
+ * looking at the associated function for estimating selectivity, just
+ * like with the single-dimensional case).
+ *
+ * TODO Support 'OR clauses' - shouldn't be all that difficult to
* evaluate them using multivariate stats.
*/
static bool
@@ -1539,10 +2598,10 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
return true;
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/*
- * AND/OR-clauses are supported if all sub-clauses are supported
+ * AND/OR/NOT-clauses are supported if all sub-clauses are supported
*
* TODO We might support mixed case, where some of the clauses
* are supported and some are not, and treat all supported
@@ -1552,7 +2611,10 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
*
* TODO For RestrictInfo above an OR-clause, we might use the
* orclause with nested RestrictInfo - we won't have to
- * call pull_varnos() for each clause, saving time.
+ * call pull_varnos() for each clause, saving time.
+ *
+ * TODO Perhaps this needs a bit more thought for functional
+ * dependencies? Those don't quite work for NOT cases.
*/
Bitmapset *tmp = NULL;
ListCell *l;
@@ -1572,6 +2634,51 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
return false;
}
+
+static Bitmapset *
+clause_mv_get_attnums(PlannerInfo *root, Node *clause)
+{
+ Bitmapset * attnums = NULL;
+
+ /* Extract clause from restrict info, if needed. */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /*
+ * Only simple opclauses and IS NULL tests are compatible with
+ * multivariate stats at this point.
+ */
+ if ((is_opclause(clause))
+ && (list_length(((OpExpr *) clause)->args) == 2))
+ {
+ OpExpr *expr = (OpExpr *) clause;
+
+ if (IsA(linitial(expr->args), Var))
+ attnums = bms_add_member(attnums,
+ ((Var*)linitial(expr->args))->varattno);
+ else
+ attnums = bms_add_member(attnums,
+ ((Var*)lsecond(expr->args))->varattno);
+ }
+ else if (IsA(clause, NullTest)
+ && IsA(((NullTest*)clause)->arg, Var))
+ {
+ attnums = bms_add_member(attnums,
+ ((Var*)((NullTest*)clause)->arg)->varattno);
+ }
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
+ {
+ ListCell *l;
+ foreach (l, ((BoolExpr*)clause)->args)
+ {
+ attnums = bms_join(attnums,
+ clause_mv_get_attnums(root, (Node*)lfirst(l)));
+ }
+ }
+
+ return attnums;
+}
+
/*
* Performs reduction of clauses using functional dependencies, i.e.
* removes clauses that are considered redundant. It simply walks
@@ -2223,22 +3330,26 @@ get_varattnos(Node * node, Index relid)
* as the clauses are processed (and skip items that are 'match').
*/
static Selectivity
-clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats, bool *fullmatch,
- Selectivity *lowsel)
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or,
+ bool *fullmatch, Selectivity *lowsel)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
MCVList mcvlist = NULL;
+
int nmatches = 0;
+ int nconditions = 0;
/* match/mismatch bitmap for each MCV item */
char * matches = NULL;
+ char * condition_matches = NULL;
Assert(clauses != NIL);
- Assert(list_length(clauses) >= 2);
+ Assert(list_length(clauses) >= 1);
/* there's no MCV list built yet */
if (! mvstats->mcv_built)
@@ -2249,32 +3360,85 @@ clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
Assert(mcvlist != NULL);
Assert(mcvlist->nitems > 0);
- /* by default all the MCV items match the clauses fully */
- matches = palloc0(sizeof(char) * mcvlist->nitems);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
-
/* number of matching MCV items */
nmatches = mcvlist->nitems;
+ nconditions = mcvlist->nitems;
+ /*
+ * Bitmap of bucket matches (mismatch, partial, full).
+ *
+ * For AND clauses all buckets match (and we'll eliminate them).
+ * For OR clauses no buckets match (and we'll add them).
+ *
+ * We only need to do the memset for AND clauses (for OR clauses
+ * it's already set correctly by the palloc0).
+ */
+ matches = palloc0(sizeof(char) * nmatches);
+
+ if (! is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+
+ /* Conditions are treated as an AND clause, so they match by default. */
+ condition_matches = palloc0(sizeof(char) * nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+
+ /*
+ * build the match bitmap for the conditions (conditions are always
+ * connected by AND)
+ */
+ if (conditions != NIL)
+ nconditions = update_match_bitmap_mcvlist(root, conditions,
+ mvstats->stakeys, mcvlist,
+ nconditions, condition_matches,
+ lowsel, fullmatch, false);
+
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all MCV items, even those
+ * ruled out by the conditions. The final result should be the
+ * same, but it might be faster.
+ */
nmatches = update_match_bitmap_mcvlist(root, clauses,
mvstats->stakeys, mcvlist,
- nmatches, matches,
- lowsel, fullmatch, false);
+ ((is_or) ? 0 : nmatches), matches,
+ lowsel, fullmatch, is_or);
/* sum frequencies for all the matching MCV items */
for (i = 0; i < mcvlist->nitems; i++)
{
- /* used to 'scale' for MCV lists not covering all tuples */
+ /*
+ * Find out what part of the data is covered by the MCV list,
+ * so that we can 'scale' the selectivity properly (e.g. when
+ * only 50% of the sample items got into the MCV, and the rest
+ * is either in a histogram, or not covered by stats).
+ *
+ * TODO This might be handled by keeping a global "frequency"
+ * for the whole list, which might save us a bit of time
+ * spent on accessing the not-matching part of the MCV list.
+ * Although it's likely in a cache, so it's very fast.
+ */
u += mcvlist->items[i]->frequency;
+ /* skip MCV items not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
if (matches[i] != MVSTATS_MATCH_NONE)
s += mcvlist->items[i]->frequency;
+
+ t += mcvlist->items[i]->frequency;
}
pfree(matches);
+ pfree(condition_matches);
pfree(mcvlist);
- return s*u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
+ return (s / t) * u;
}
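
The final formula deserves a small worked example. The sketch below
(standalone, not part of the patch; frequencies and match flags are
made up) computes s, t and u the same way - s/t is the conditional
selectivity P(clauses | conditions), scaled by u, the fraction of the
data the MCV list covers:

#include <stdio.h>

int
main(void)
{
    /* toy MCV item frequencies and (pre-computed) match flags */
    double freq[4] = {0.30, 0.20, 0.10, 0.05};
    int cond_match[4] = {1, 1, 0, 1};
    int clause_match[4] = {1, 0, 0, 1};

    double s = 0.0, t = 0.0, u = 0.0;
    int i;

    for (i = 0; i < 4; i++)
    {
        u += freq[i];           /* total frequency covered by the MCV */

        if (!cond_match[i])
            continue;           /* ruled out by the conditions */

        t += freq[i];           /* matches the conditions */

        if (clause_match[i])
            s += freq[i];       /* matches conditions and clauses */
    }

    /* prints "selectivity = 0.4136" */
    printf("selectivity = %.4f\n", (t == 0.0) ? 0.0 : (s / t) * u);
    return 0;
}
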
/*
@@ -2567,64 +3731,57 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
}
}
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/* AND/OR clause, with all clauses compatible with the selected MV stat */
int i;
- BoolExpr *orclause = ((BoolExpr*)clause);
- List *orclauses = orclause->args;
+ List *tmp_clauses = ((BoolExpr*)clause)->args;
/* match/mismatch bitmap for each MCV item */
- int or_nmatches = 0;
- char * or_matches = NULL;
+ int tmp_nmatches = 0;
+ char * tmp_matches = NULL;
- Assert(orclauses != NIL);
- Assert(list_length(orclauses) >= 2);
+ Assert(tmp_clauses != NIL);
+ Assert((list_length(tmp_clauses) >= 2) || (not_clause(clause) && (list_length(tmp_clauses)==1)));
/* number of matching MCV items */
- or_nmatches = mcvlist->nitems;
+ tmp_nmatches = (or_clause(clause)) ? 0 : mcvlist->nitems;
/* by default none of the MCV items matches the clauses */
- or_matches = palloc0(sizeof(char) * or_nmatches);
+ tmp_matches = palloc0(sizeof(char) * mcvlist->nitems);
- if (or_clause(clause))
- {
- /* OR clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
- or_nmatches = 0;
- }
- else
- {
- /* AND clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
- }
+ /* AND (and NOT) clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
/* build the match bitmap for the OR-clauses */
- or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ tmp_nmatches = update_match_bitmap_mcvlist(root, tmp_clauses,
stakeys, mcvlist,
- or_nmatches, or_matches,
+ tmp_nmatches, tmp_matches,
lowsel, fullmatch, or_clause(clause));
/* merge the bitmap into the existing one*/
for (i = 0; i < mcvlist->nitems; i++)
{
+ /* if this is a NOT clause, we need to invert the results first */
+ if (not_clause(clause))
+ tmp_matches[i] = (MVSTATS_MATCH_FULL - tmp_matches[i]);
+
/*
* To AND-merge the bitmaps, a MIN() semantics is used.
* For OR-merge, use MAX().
*
* FIXME this does not decrease the number of matches
*/
- UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
}
- pfree(or_matches);
+ pfree(tmp_matches);
}
else
- {
elog(ERROR, "unknown clause type: %d", clause->type);
- }
}
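
The UPDATE_RESULT merging used above boils down to MIN() for AND and
MAX() for OR, with NOT handled by inverting the sub-clause result
first. A standalone sketch (not part of the patch; the numeric values
of the MVSTATS_MATCH_* constants are my assumption, chosen so that the
ordering NONE < PARTIAL < FULL holds):

#include <stdio.h>

#define MATCH_NONE      0
#define MATCH_PARTIAL   1
#define MATCH_FULL      2

/* AND-merge = MIN, OR-merge = MAX (mirrors UPDATE_RESULT) */
#define MERGE(dst, src, is_or) \
    ((dst) = (is_or) ? (((dst) > (src)) ? (dst) : (src)) \
                     : (((dst) < (src)) ? (dst) : (src)))

int
main(void)
{
    char matches[3] = {MATCH_FULL, MATCH_PARTIAL, MATCH_NONE};
    char sub_matches[3] = {MATCH_NONE, MATCH_FULL, MATCH_PARTIAL};
    int i;

    for (i = 0; i < 3; i++)
    {
        /* a NOT clause first inverts the sub-clause result */
        char inverted = MATCH_FULL - sub_matches[i];

        /* AND-merge the inverted bitmap into the outer one */
        MERGE(matches[i], inverted, 0);

        /* prints 2, 0, 0 */
        printf("item %d: %d\n", i, matches[i]);
    }
    return 0;
}
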
/*
@@ -2682,15 +3839,18 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
* this is not uncommon, but for histograms it's not that clear.
*/
static Selectivity
-clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats)
+clauselist_mv_selectivity_histogram(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
int nmatches = 0;
+ int nconditions = 0;
char *matches = NULL;
+ char *condition_matches = NULL;
MVSerializedHistogram mvhist = NULL;
@@ -2701,27 +3861,57 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
/* There may be no histogram in the stats (check hist_built flag) */
mvhist = load_mv_histogram(mvstats->mvoid);
- Assert (mvhist != NULL);
- Assert (clauses != NIL);
- Assert (list_length(clauses) >= 2);
+ Assert (mvhist != NULL);
+ Assert (clauses != NIL);
+ Assert (list_length(clauses) >= 1);
+
+ nmatches = mvhist->nbuckets;
+ nconditions = mvhist->nbuckets;
+
+ /*
+ * Bitmap of bucket matches (mismatch, partial, full).
+ *
+ * For AND clauses all buckets match (and we'll eliminate them).
+ * For OR clauses no buckets match (and we'll add them).
+ *
+ * We only need to do the memset for AND clauses (for OR clauses
+ * it's already set correctly by the palloc0).
+ */
+ matches = palloc0(sizeof(char) * nmatches);
+
+ if (! is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+
+ /* Conditions are treated as an AND clause, so they match by default. */
+ condition_matches = palloc0(sizeof(char)*nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
/*
- * Bitmap of bucket matches (mismatch, partial, full). by default
- * all buckets fully match (and we'll eliminate them).
+ * build the match bitmap for the conditions (conditions are always
+ * connected by AND)
*/
- matches = palloc0(sizeof(char) * mvhist->nbuckets);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
-
- nmatches = mvhist->nbuckets;
+ if (conditions != NIL)
+ update_match_bitmap_histogram(root, conditions,
+ mvstats->stakeys, mvhist,
+ nconditions, condition_matches, false);
- /* build the match bitmap */
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all buckets, even those
+ * ruled out by the conditions. The final result should be
+ * the same, but it might be faster.
+ */
update_match_bitmap_histogram(root, clauses,
mvstats->stakeys, mvhist,
- nmatches, matches, false);
+ ((is_or) ? 0 : nmatches), matches,
+ is_or);
/* now, walk through the buckets and sum the selectivities */
for (i = 0; i < mvhist->nbuckets; i++)
{
+ float coeff = 1.0;
+
/*
* Find out what part of the data is covered by the histogram,
* so that we can 'scale' the selectivity properly (e.g. when
@@ -2735,17 +3925,35 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
*/
u += mvhist->buckets[i]->ntuples;
+ /* skip buckets not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+ else if (condition_matches[i] == MVSTATS_MATCH_PARTIAL)
+ coeff = 0.5;
+
+ t += coeff * mvhist->buckets[i]->ntuples;
+
if (matches[i] == MVSTATS_MATCH_FULL)
- s += mvhist->buckets[i]->ntuples;
+ s += coeff * mvhist->buckets[i]->ntuples;
else if (matches[i] == MVSTATS_MATCH_PARTIAL)
- s += 0.5 * mvhist->buckets[i]->ntuples;
+ /*
+ * TODO If both conditions and clauses match partially, this
+ * will use a 0.25 match - not sure if that's the right
+ * solution, but it seems about right.
+ */
+ s += coeff * 0.5 * mvhist->buckets[i]->ntuples;
}
/* release the allocated bitmap and deserialized histogram */
pfree(matches);
+ pfree(condition_matches);
pfree(mvhist);
- return s * u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
+ return (s / t) * u;
}
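
The 0.5 / 0.25 weighting of partial matches is easiest to check on toy
data. A standalone sketch (not part of the patch; bucket frequencies
and match levels are made up) - a partially matching condition halves
a bucket's contribution, a partially matching clause halves it again:

#include <stdio.h>

#define MATCH_NONE      0
#define MATCH_PARTIAL   1
#define MATCH_FULL      2

int
main(void)
{
    /* toy bucket frequencies and (pre-computed) match levels */
    double ntuples[3] = {0.1, 0.2, 0.3};
    int cond[3] = {MATCH_FULL, MATCH_PARTIAL, MATCH_NONE};
    int clause[3] = {MATCH_PARTIAL, MATCH_PARTIAL, MATCH_FULL};

    double s = 0.0, t = 0.0, u = 0.0;
    int i;

    for (i = 0; i < 3; i++)
    {
        double coeff = 1.0;

        u += ntuples[i];

        if (cond[i] == MATCH_NONE)
            continue;
        else if (cond[i] == MATCH_PARTIAL)
            coeff = 0.5;        /* partially matching condition */

        t += coeff * ntuples[i];

        if (clause[i] == MATCH_FULL)
            s += coeff * ntuples[i];
        else if (clause[i] == MATCH_PARTIAL)
            s += coeff * 0.5 * ntuples[i];  /* 0.25 if both partial */
    }

    /* prints "selectivity = 0.3000" */
    printf("selectivity = %.4f\n", (t == 0.0) ? 0.0 : (s / t) * u);
    return 0;
}
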
/*
@@ -2775,7 +3983,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
{
int i;
ListCell * l;
-
+
/*
* Used for caching function calls, only once per deduplicated value.
*
@@ -2818,7 +4026,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
FmgrInfo opproc; /* operator */
fmgr_info(get_opcode(expr->opno), &opproc);
-
+
/* reset the cache (per clause) */
memset(callcache, 0, mvhist->nbuckets);
@@ -2870,7 +4078,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
/* histogram boundaries */
Datum minval, maxval;
-
+
/* values from the call cache */
char mincached, maxcached;
@@ -2959,7 +4167,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
}
/*
- * Now check whether the upper boundary is below the constant (in that
+ * Now check whether the constant is below the upper boundary (in that
* case it's a partial match).
*/
if (! maxcached)
@@ -2978,8 +4186,32 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
else
tmp = !(maxcached & 0x02); /* extract the result (reverse) */
- if (tmp) /* partial match */
+ if (tmp)
+ {
+ /* partial match */
UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ continue;
+ }
+
+
+ /*
+ * And finally check whether the constant is above the upper
+ * boundary (in that case it's a full match).
+ *
+ * XXX We need to do this because of the OR clauses (which start with no
+ * matches and we incrementally add more and more matches), but maybe
+ * we don't need to do the check and can just do UPDATE_RESULT?
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ maxval,
+ cst->constvalue));
+
+ if (tmp)
+ {
+ /* full match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_FULL, is_or);
+ }
}
else /* (const < var) */
@@ -3018,15 +4250,36 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
DEFAULT_COLLATION_OID,
minval,
cst->constvalue));
-
/* Update the cache. */
callcache[bucket->min[idx]] = (tmp) ? 0x03 : 0x01;
- }
+ }
else
tmp = (mincached & 0x02); /* extract the result */
- if (tmp) /* partial match */
+ if (tmp)
+ {
+ /* partial match */
UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the constant is below the lower boundary (in
+ * that case it's a full match).
+ *
+ * XXX We need to do this because of the OR clauses (which start with no
+ * matches and we incrementally add more and more matches), but maybe
+ * we don't need to do the check and can just do UPDATE_RESULT?
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ minval));
+
+ if (tmp)
+ /* full match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_FULL, is_or);
+
}
break;
@@ -3082,8 +4335,29 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
tmp = !(mincached & 0x02); /* extract the result */
if (tmp)
+ {
/* partial match */
UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the lower boundary is below the constant (in that
+ * case it's a full match).
+ *
+ * XXX We need to do this because of the OR clauses (which start with no
+ * matches and we incrementally add more and more matches), but maybe
+ * we don't need to do the check and can just do UPDATE_RESULT?
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ minval,
+ cst->constvalue));
+
+ if (tmp)
+ /* full match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_FULL, is_or);
+
}
else /* (const > var) */
{
@@ -3129,8 +4403,30 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
tmp = (maxcached & 0x02); /* extract the result */
if (tmp)
+ {
/* partial match */
UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the upper boundary is below the constant (in that
+ * case it's a full match).
+ *
+ * XXX We need to do this because of the OR clauses (which start with no
+ * matches and we incrementally add more and more matches), but maybe
+ * we don't need to do the check and can just do UPDATE_RESULT?
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ maxval));
+
+ if (tmp)
+ /* full match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_FULL, is_or);
+
}
break;
@@ -3195,6 +4491,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
else
tmp = (maxcached & 0x02); /* extract the result */
+
if (tmp)
{
/* no match */
@@ -3246,64 +4543,57 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
}
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/* AND/OR clause, with all clauses compatible with the selected MV stat */
int i;
- BoolExpr *orclause = ((BoolExpr*)clause);
- List *orclauses = orclause->args;
+ List *tmp_clauses = ((BoolExpr*)clause)->args;
/* match/mismatch bitmap for each bucket */
- int or_nmatches = 0;
- char * or_matches = NULL;
+ int tmp_nmatches = 0;
+ char * tmp_matches = NULL;
- Assert(orclauses != NIL);
- Assert(list_length(orclauses) >= 2);
+ Assert(tmp_clauses != NIL);
+ Assert((list_length(tmp_clauses) >= 2) || (not_clause(clause) && (list_length(tmp_clauses)==1)));
/* number of matching buckets */
- or_nmatches = mvhist->nbuckets;
+ tmp_nmatches = (or_clause(clause)) ? 0 : mvhist->nbuckets;
- /* by default none of the buckets matches the clauses */
- or_matches = palloc0(sizeof(char) * or_nmatches);
+ /* by default none of the buckets matches the clauses (OR clause) */
+ tmp_matches = palloc0(sizeof(char) * mvhist->nbuckets);
- if (or_clause(clause))
- {
- /* OR clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
- or_nmatches = 0;
- }
- else
- {
- /* AND clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
- }
+ /* but AND (and NOT) clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
/* build the match bitmap for the OR-clauses */
- or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ tmp_nmatches = update_match_bitmap_histogram(root, tmp_clauses,
stakeys, mvhist,
- or_nmatches, or_matches, or_clause(clause));
+ tmp_nmatches, tmp_matches, or_clause(clause));
/* merge the bitmap into the existing one*/
for (i = 0; i < mvhist->nbuckets; i++)
{
+ /* if this is a NOT clause, we need to invert the results first */
+ if (not_clause(clause))
+ tmp_matches[i] = (MVSTATS_MATCH_FULL - tmp_matches[i]);
+
/*
* To AND-merge the bitmaps, a MIN() semantics is used.
* For OR-merge, use MAX().
*
* FIXME this does not decrease the number of matches
*/
- UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
}
- pfree(or_matches);
-
+ pfree(tmp_matches);
}
else
elog(ERROR, "unknown clause type: %d", clause->type);
}
- /* free the call cache */
pfree(callcache);
#ifdef DEBUG_MVHIST
@@ -3312,3 +4602,363 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
return nmatches;
}
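
The bucket classification that the boundary checks above implement can
be summarized on plain integers. This standalone sketch (not part of
the patch) replaces the fmgr operator calls with ordinary comparisons
and glosses over the inclusive/exclusive handling of equal boundaries,
but shows the three outcomes for a 'var < const' clause:

#include <stdio.h>

typedef struct
{
    int minval;
    int maxval;
} Bucket;

/* classify one bucket against "var < const" */
static const char *
classify(Bucket b, int cst)
{
    if (cst > b.maxval)
        return "full";      /* whole bucket below the constant */
    if (cst > b.minval)
        return "partial";   /* constant falls inside the bucket */
    return "none";          /* whole bucket above the constant */
}

int
main(void)
{
    Bucket buckets[3] = {{0, 9}, {10, 19}, {20, 29}};
    int i, cst = 15;

    /* prints: full match, partial match, none match */
    for (i = 0; i < 3; i++)
        printf("bucket [%d,%d]: %s match\n",
               buckets[i].minval, buckets[i].maxval,
               classify(buckets[i], cst));
    return 0;
}
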
+
+/*
+ * Walk through clauses and keep only those covered by at least
+ * one of the statistics.
+ */
+static List *
+filter_clauses(PlannerInfo *root, Oid varRelid, SpecialJoinInfo *sjinfo,
+ int type, List *stats, List *clauses, Bitmapset **attnums)
+{
+ ListCell *c;
+ ListCell *s;
+
+ /* results (list of compatible clauses, attnums) */
+ List *rclauses = NIL;
+
+ foreach (c, clauses)
+ {
+ Node *clause = (Node*)lfirst(c);
+ Bitmapset *clause_attnums = NULL;
+ Index relid;
+
+ /*
+ * The clause has to be mv-compatible (suitable operators etc.).
+ */
+ if (! clause_is_mv_compatible(root, clause, varRelid,
+ &relid, &clause_attnums, sjinfo, type))
+ elog(ERROR, "should not get a non-mv-compatible clause");
+
+ /* is there a statistics covering this clause? */
+ foreach (s, stats)
+ {
+ int k, matches = 0;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ for (k = 0; k < stat->stakeys->dim1; k++)
+ {
+ if (bms_is_member(stat->stakeys->values[k],
+ clause_attnums))
+ matches += 1;
+ }
+
+ /*
+ * The clause is compatible if all attributes it references
+ * are covered by the statistics.
+ */
+ if (bms_num_members(clause_attnums) == matches)
+ {
+ *attnums = bms_union(*attnums, clause_attnums);
+ rclauses = lappend(rclauses, clause);
+ break;
+ }
+ }
+
+ bms_free(clause_attnums);
+ }
+
+ /* we can't have more compatible clauses than source clauses */
+ Assert(list_length(clauses) >= list_length(rclauses));
+
+ return rclauses;
+}
+
+
+/*
+ * Walk through statistics and only keep those covering at least
+ * one new attribute (excluding conditions) and at least two attributes
+ * in both clauses and conditions.
+ *
+ * This check might be made more strict by checking against individual
+ * clauses, because by using the bitmapsets of all attnums we may
+ * actually use attnums from clauses that are not covered by the
+ * statistics. For example, we may have a condition
+ *
+ * (a=1 AND b=2)
+ *
+ * and a new clause
+ *
+ * (c=1 AND d=1)
+ *
+ * With only bitmapsets, statistics on [b,c] will pass through this
+ * (assuming there are some statistics covering both clauses).
+ *
+ * TODO Do the more strict check.
+ */
+static List *
+filter_stats(List *stats, Bitmapset *new_attnums, Bitmapset *all_attnums)
+{
+ ListCell *s;
+ List *stats_filtered = NIL;
+
+ foreach (s, stats)
+ {
+ int k;
+ int matches_new = 0,
+ matches_all = 0;
+
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ /* see how many attributes the statistics covers */
+ for (k = 0; k < stat->stakeys->dim1; k++)
+ {
+ /* attributes from new clauses */
+ if (bms_is_member(stat->stakeys->values[k], new_attnums))
+ matches_new += 1;
+
+ /* attributes from conditions */
+ if (bms_is_member(stat->stakeys->values[k], all_attnums))
+ matches_all += 1;
+ }
+
+ /* check we have enough attributes for this statistics */
+ if ((matches_new >= 1) && (matches_all >= 2))
+ stats_filtered = lappend(stats_filtered, stat);
+ }
+
+ /* we can't have more useful stats than we had originally */
+ Assert(list_length(stats) >= list_length(stats_filtered));
+
+ return stats_filtered;
+}
+
+static MVStatisticInfo *
+make_stats_array(List *stats, int *nmvstats)
+{
+ int i;
+ ListCell *l;
+
+ MVStatisticInfo *mvstats = NULL;
+ *nmvstats = list_length(stats);
+
+ mvstats
+ = (MVStatisticInfo*)palloc0((*nmvstats) * sizeof(MVStatisticInfo));
+
+ i = 0;
+ foreach (l, stats)
+ {
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(l);
+ memcpy(&mvstats[i++], stat, sizeof(MVStatisticInfo));
+ }
+
+ return mvstats;
+}
+
+static Bitmapset **
+make_stats_attnums(MVStatisticInfo *mvstats, int nmvstats)
+{
+ int i, j;
+ Bitmapset **stats_attnums = NULL;
+
+ Assert(nmvstats > 0);
+
+ /* build bitmaps of attnums for the stats (easier to compare) */
+ stats_attnums = (Bitmapset **)palloc0(nmvstats * sizeof(Bitmapset*));
+
+ for (i = 0; i < nmvstats; i++)
+ for (j = 0; j < mvstats[i].stakeys->dim1; j++)
+ stats_attnums[i]
+ = bms_add_member(stats_attnums[i],
+ mvstats[i].stakeys->values[j]);
+
+ return stats_attnums;
+}
+
+
+/*
+ * Now let's remove redundant statistics, covering the same columns
+ * as some other stats, when restricted to the attributes from
+ * remaining clauses.
+ *
+ * If statistics S1 covers S2 (covers S2 attributes and possibly
+ * some more), we can probably remove S2. What actually matters are
+ * attributes from covered clauses (not all the attributes). This
+ * might however prefer larger, and thus less accurate, statistics.
+ *
+ * When a redundancy is detected, we simply keep the smaller
+ * statistics (fewer columns), on the assumption that it's
+ * more accurate and faster to process. That might be incorrect for
+ * two reasons - first, the accuracy really depends on number of
+ * buckets/MCV items, not the number of columns. Second, we might
+ * prefer MCV lists over histograms or something like that.
+ */
+static List*
+filter_redundant_stats(List *stats, List *clauses, List *conditions)
+{
+ int i, j, nmvstats;
+
+ MVStatisticInfo *mvstats;
+ bool *redundant;
+ Bitmapset **stats_attnums;
+ Bitmapset *varattnos;
+ Index relid;
+
+ Assert(list_length(stats) > 0);
+ Assert(list_length(clauses) > 0);
+
+ /*
+ * We'll convert the list of statistics into an array now, because
+ * the reduction of redundant statistics is easier to do that way
+ * (we can mark previous stats as redundant, etc.).
+ */
+ mvstats = make_stats_array(stats, &nmvstats);
+ stats_attnums = make_stats_attnums(mvstats, nmvstats);
+
+ /* by default, none of the stats is redundant (so palloc0) */
+ redundant = palloc0(nmvstats * sizeof(bool));
+
+ /*
+ * We only expect a single relid here, and also we should get the
+ * same relid from clauses and conditions (but we get it from
+ * clauses, because those are certainly non-empty).
+ */
+ relid = bms_singleton_member(pull_varnos((Node*)clauses));
+
+ /*
+ * Get the varattnos from both conditions and clauses.
+ *
+ * This skips system attributes, although that should be impossible
+ * thanks to previous filtering out of incompatible clauses.
+ *
+ * XXX Is that really true?
+ */
+ varattnos = bms_union(get_varattnos((Node*)clauses, relid),
+ get_varattnos((Node*)conditions, relid));
+
+ for (i = 1; i < nmvstats; i++)
+ {
+ /* intersect with current statistics */
+ Bitmapset *curr = bms_intersect(stats_attnums[i], varattnos);
+
+ /* walk through 'previous' stats and check redundancy */
+ for (j = 0; j < i; j++)
+ {
+ /* intersect with current statistics */
+ Bitmapset *prev;
+
+ /* skip stats already identified as redundant */
+ if (redundant[j])
+ continue;
+
+ prev = bms_intersect(stats_attnums[j], varattnos);
+
+ switch (bms_subset_compare(curr, prev))
+ {
+ case BMS_EQUAL:
+ /*
+ * Use the smaller one (hopefully more accurate).
+ * If both have the same size, use the first one.
+ */
+ if (mvstats[i].stakeys->dim1 >= mvstats[j].stakeys->dim1)
+ redundant[i] = TRUE;
+ else
+ redundant[j] = TRUE;
+
+ break;
+
+ case BMS_SUBSET1: /* curr is subset of prev */
+ redundant[i] = TRUE;
+ break;
+
+ case BMS_SUBSET2: /* prev is subset of curr */
+ redundant[j] = TRUE;
+ break;
+
+ case BMS_DIFFERENT:
+ /* do nothing - keep both stats */
+ break;
+ }
+
+ bms_free(prev);
+ }
+
+ bms_free(curr);
+ }
+
+ /* can't reduce all statistics (at least one has to remain) */
+ Assert(nmvstats > 0);
+
+ /* now, let's remove the reduced statistics from the arrays */
+ list_free(stats);
+ stats = NIL;
+
+ for (i = 0; i < nmvstats; i++)
+ {
+ MVStatisticInfo *info;
+
+ pfree(stats_attnums[i]);
+
+ if (redundant[i])
+ continue;
+
+ info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[i], sizeof(MVStatisticInfo));
+
+ stats = lappend(stats, info);
+ }
+
+ pfree(mvstats);
+ pfree(stats_attnums);
+ pfree(redundant);
+
+ return stats;
+}
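
The redundancy check is essentially bms_subset_compare on the
statistics' attnum sets restricted to the clause attributes, with a
size tiebreak for the EQUAL case. A standalone sketch (not part of the
patch; bit masks stand in for bitmapsets):

#include <stdio.h>

typedef enum { EQUAL, SUBSET1, SUBSET2, DIFFERENT } SubsetResult;

static int
count_bits(unsigned x)
{
    int n = 0;
    while (x) { n += (x & 1); x >>= 1; }
    return n;
}

/* same semantics as bms_subset_compare, on plain masks */
static SubsetResult
subset_compare(unsigned a, unsigned b)
{
    if (a == b)
        return EQUAL;
    if ((a & ~b) == 0)
        return SUBSET1;     /* a is a subset of b */
    if ((b & ~a) == 0)
        return SUBSET2;     /* b is a subset of a */
    return DIFFERENT;
}

int
main(void)
{
    unsigned varattnos = 0x0F;              /* columns used by clauses */
    unsigned stats[3] = {0x33, 0x03, 0x0C}; /* stats' column sets */
    int redundant[3] = {0, 0, 0};
    int i, j;

    for (i = 1; i < 3; i++)
        for (j = 0; j < i; j++)
        {
            if (redundant[j])
                continue;

            switch (subset_compare(stats[i] & varattnos,
                                   stats[j] & varattnos))
            {
                case EQUAL:
                    /* keep the smaller statistics (fewer columns) */
                    if (count_bits(stats[i]) >= count_bits(stats[j]))
                        redundant[i] = 1;
                    else
                        redundant[j] = 1;
                    break;
                case SUBSET1:
                    redundant[i] = 1;
                    break;
                case SUBSET2:
                    redundant[j] = 1;
                    break;
                case DIFFERENT:
                    break;
            }
        }

    /* prints: stat 0 redundant / stat 1 kept / stat 2 kept */
    for (i = 0; i < 3; i++)
        printf("stat %d: %s\n", i, redundant[i] ? "redundant" : "kept");
    return 0;
}
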
+
+static Node**
+make_clauses_array(List *clauses, int *nclauses)
+{
+ int i;
+ ListCell *l;
+
+ Node** clauses_array;
+
+ *nclauses = list_length(clauses);
+ clauses_array = (Node **)palloc0((*nclauses) * sizeof(Node *));
+
+ i = 0;
+ foreach (l, clauses)
+ clauses_array[i++] = (Node *)lfirst(l);
+
+ *nclauses = i;
+
+ return clauses_array;
+}
+
+static Bitmapset **
+make_clauses_attnums(PlannerInfo *root, Oid varRelid, SpecialJoinInfo *sjinfo,
+ int type, Node **clauses, int nclauses)
+{
+ int i;
+ Index relid;
+ Bitmapset **clauses_attnums
+ = (Bitmapset **)palloc0(nclauses * sizeof(Bitmapset *));
+
+ for (i = 0; i < nclauses; i++)
+ {
+ Bitmapset * attnums = NULL;
+
+ if (! clause_is_mv_compatible(root, clauses[i], varRelid,
+ &relid, &attnums, sjinfo, type))
+ elog(ERROR, "should not get a non-mv-compatible clause");
+
+ clauses_attnums[i] = attnums;
+ }
+
+ return clauses_attnums;
+}
+
+static bool*
+make_cover_map(Bitmapset **stats_attnums, int nmvstats,
+ Bitmapset **clauses_attnums, int nclauses)
+{
+ int i, j;
+ bool *cover_map = (bool*)palloc0(nclauses * nmvstats * sizeof(bool));
+
+ for (i = 0; i < nmvstats; i++)
+ for (j = 0; j < nclauses; j++)
+ cover_map[i * nclauses + j]
+ = bms_is_subset(clauses_attnums[j], stats_attnums[i]);
+
+ return cover_map;
+}
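
For completeness, the cover map is just a flattened 2D boolean array -
one row per statistics, one column per clause, indexed as
i * nclauses + j. A standalone sketch (not part of the patch; masks
stand in for bitmapsets):

#include <stdio.h>

int
main(void)
{
    int nmvstats = 2, nclauses = 3;
    unsigned stat_attnums[2] = {0x03, 0x06};
    unsigned clause_attnums[3] = {0x01, 0x02, 0x04};
    char cover_map[2 * 3];
    int i, j;

    /* clause j is covered by stats i if its attnums are a subset */
    for (i = 0; i < nmvstats; i++)
        for (j = 0; j < nclauses; j++)
            cover_map[i * nclauses + j]
                = ((clause_attnums[j] & ~stat_attnums[i]) == 0);

    for (i = 0; i < nmvstats; i++)
        for (j = 0; j < nclauses; j++)
            printf("stat %d covers clause %d: %d\n",
                   i, j, cover_map[i * nclauses + j]);
    return 0;
}
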
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 990486c..9e001ee 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3431,7 +3431,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/*
* Also get the normal inner-join selectivity of the join clauses.
@@ -3454,7 +3455,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
JOIN_INNER,
- &norm_sjinfo);
+ &norm_sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
if (jointype == JOIN_ANTI)
@@ -3621,7 +3623,7 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
Node *qual = (Node *) lfirst(l);
/* Note that clause_selectivity will be able to cache its result */
- selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo);
+ selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo, NIL);
}
/* Apply it to the input relation sizes */
@@ -3657,7 +3659,8 @@ set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
rel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
rel->rows = clamp_row_est(nrows);
@@ -3694,7 +3697,8 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
allclauses,
rel->relid, /* do not use 0! */
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
/* For safety, make sure result is not more than the base estimate */
if (nrows > rel->rows)
@@ -3832,12 +3836,14 @@ calc_joinrel_size_estimate(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = clauselist_selectivity(root,
pushedquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
list_free(joinquals);
@@ -3849,7 +3855,8 @@ calc_joinrel_size_estimate(PlannerInfo *root,
restrictlist,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = 0.0; /* not used, keep compiler quiet */
}
diff --git a/src/backend/optimizer/util/orclauses.c b/src/backend/optimizer/util/orclauses.c
index f0acc14..e41508b 100644
--- a/src/backend/optimizer/util/orclauses.c
+++ b/src/backend/optimizer/util/orclauses.c
@@ -280,7 +280,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
* saving work later.)
*/
or_selec = clause_selectivity(root, (Node *) or_rinfo,
- 0, JOIN_INNER, NULL);
+ 0, JOIN_INNER, NULL, NIL);
/*
* The clause is only worth adding to the query if it rejects a useful
@@ -342,7 +342,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
/* Compute inner-join size */
orig_selec = clause_selectivity(root, (Node *) join_or_rinfo,
- 0, JOIN_INNER, &sjinfo);
+ 0, JOIN_INNER, &sjinfo, NIL);
/* And hack cached selectivity so join size remains the same */
join_or_rinfo->norm_selec = orig_selec / or_selec;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 15121bc..7341cd6 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1625,13 +1625,15 @@ booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg,
case IS_NOT_FALSE:
selec = (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
case IS_FALSE:
case IS_NOT_TRUE:
selec = 1.0 - (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
default:
elog(ERROR, "unrecognized booltesttype: %d",
@@ -6257,7 +6259,8 @@ genericcostestimate(PlannerInfo *root,
indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/*
* If caller didn't give us an estimate, estimate the number of index
@@ -6582,7 +6585,8 @@ btcostestimate(PG_FUNCTION_ARGS)
btreeSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
numIndexTuples = btreeSelectivity * index->rel->tuples;
/*
@@ -7343,7 +7347,8 @@ gincostestimate(PG_FUNCTION_ARGS)
*indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/* fetch estimated page cost for tablespace containing index */
get_tablespace_page_costs(index->reltablespace,
@@ -7575,7 +7580,7 @@ brincostestimate(PG_FUNCTION_ARGS)
*indexSelectivity =
clauselist_selectivity(root, indexQuals,
path->indexinfo->rel->relid,
- JOIN_INNER, NULL);
+ JOIN_INNER, NULL, NIL);
*indexCorrelation = 1;
/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a185749..909c2c7 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -75,6 +75,7 @@
#include "utils/bytea.h"
#include "utils/guc_tables.h"
#include "utils/memutils.h"
+#include "utils/mvstats.h"
#include "utils/pg_locale.h"
#include "utils/plancache.h"
#include "utils/portal.h"
@@ -380,6 +381,15 @@ static const struct config_enum_entry huge_pages_options[] = {
};
/*
+ * Search algorithm for multivariate stats.
+ */
+static const struct config_enum_entry mvstat_search_options[] = {
+ {"greedy", MVSTAT_SEARCH_GREEDY, false},
+ {"exhaustive", MVSTAT_SEARCH_EXHAUSTIVE, false},
+ {NULL, 0, false}
+};
+
+/*
* Options for enum values stored in other modules
*/
extern const struct config_enum_entry wal_level_options[];
@@ -3672,6 +3682,16 @@ static struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"mvstat_search", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Sets the algorithm used for combining multivariate stats."),
+ NULL
+ },
+ &mvstat_search_type,
+ MVSTAT_SEARCH_GREEDY, mvstat_search_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ac21a3a..2431751 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -191,11 +191,13 @@ extern Selectivity clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
extern Selectivity clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
#endif /* COST_H */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index aa07000..9fd1314 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -16,6 +16,14 @@
#include "commands/vacuum.h"
+typedef enum MVStatSearchType
+{
+ MVSTAT_SEARCH_EXHAUSTIVE, /* exhaustive search */
+ MVSTAT_SEARCH_GREEDY /* greedy search */
+} MVStatSearchType;
+
+extern int mvstat_search_type;
+
/*
* Degree of how much MCV item / histogram bucket matches a clause.
* This is then considered when computing the selectivity.
--
2.1.0
Attachment: 0007-initial-version-of-ndistinct-conefficient-statistics.patch (text/x-diff)
From 55211180c650c22924a4b0a261e7d36ec83c0d8c Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Wed, 23 Dec 2015 02:07:58 +0100
Subject: [PATCH 7/7] initial version of ndistinct conefficient statistics
---
src/backend/commands/statscmds.c | 11 ++-
src/backend/optimizer/path/clausesel.c | 7 ++
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/common.c | 20 ++++-
src/backend/utils/mvstats/mvdist.c | 147 +++++++++++++++++++++++++++++++++
src/include/catalog/pg_mv_statistic.h | 26 +++---
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 6 ++
9 files changed, 208 insertions(+), 17 deletions(-)
create mode 100644 src/backend/utils/mvstats/mvdist.c
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 68e1685..de140b4 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -136,7 +136,8 @@ CreateStatistics(CreateStatsStmt *stmt)
/* by default build nothing */
bool build_dependencies = false,
build_mcv = false,
- build_histogram = false;
+ build_histogram = false,
+ build_ndistinct = false;
int32 max_buckets = -1,
max_mcv_items = -1;
@@ -200,6 +201,8 @@ CreateStatistics(CreateStatsStmt *stmt)
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "ndistinct") == 0)
+ build_ndistinct = defGetBoolean(opt);
else if (strcmp(opt->defname, "mcv") == 0)
build_mcv = defGetBoolean(opt);
else if (strcmp(opt->defname, "max_mcv_items") == 0)
@@ -254,10 +257,10 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv || build_histogram))
+ if (! (build_dependencies || build_mcv || build_histogram || build_ndistinct))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram, ndistinct) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -291,6 +294,7 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
+ values[Anum_pg_mv_statistic_ndist_enabled-1] = BoolGetDatum(build_ndistinct);
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
@@ -298,6 +302,7 @@ CreateStatistics(CreateStatsStmt *stmt)
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
nulls[Anum_pg_mv_statistic_stahist -1] = true;
+ nulls[Anum_pg_mv_statistic_standist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 8d15d3c8..c717f96 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -59,6 +59,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
#define MV_CLAUSE_TYPE_HIST 0x04
+#define MV_CLAUSE_TYPE_NDIST 0x08
static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
@@ -377,6 +378,9 @@ clauselist_selectivity(PlannerInfo *root,
stats, sjinfo);
}
+ if (has_stats(stats, MV_CLAUSE_TYPE_NDIST))
+ elog(WARNING, "has ndistinct coefficient stats");
+
/*
* Check that there are statistics with MCV list or histogram.
* If not, we don't need to waste time with the optimization.
@@ -2931,6 +2935,9 @@ has_stats(List *stats, int type)
if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
return true;
+
+ if ((type & MV_CLAUSE_TYPE_NDIST) && stat->ndist_built)
+ return true;
}
return false;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 9aded52..f4edfe6 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -410,7 +410,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built)
+ if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built || mvstat->ndist_built)
{
info = makeNode(MVStatisticInfo);
@@ -421,11 +421,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
info->deps_enabled = mvstat->deps_enabled;
info->mcv_enabled = mvstat->mcv_enabled;
info->hist_enabled = mvstat->hist_enabled;
+ info->ndist_enabled = mvstat->ndist_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
info->mcv_built = mvstat->mcv_built;
info->hist_built = mvstat->hist_built;
+ info->ndist_built = mvstat->ndist_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 9dbb3b6..d4b88e9 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o histogram.o mcv.o
+OBJS = common.o dependencies.o histogram.o mcv.o mvdist.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index ffb76f4..c42ca8f 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -53,6 +53,7 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
MVHistogram histogram = NULL;
+ double ndist = -1;
int numrows_filtered = numrows;
VacAttrStats **stats = NULL;
@@ -92,6 +93,9 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (stat->ndist_enabled)
+ ndist = build_mv_ndistinct(numrows, rows, attrs, stats);
+
/* build the MCV list */
if (stat->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
@@ -101,7 +105,7 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, mcvlist, histogram, attrs, stats);
+ update_mv_stats(stat->mvoid, deps, mcvlist, histogram, ndist, attrs, stats);
}
}
@@ -183,6 +187,8 @@ list_mv_stats(Oid relid)
info->mcv_built = stats->mcv_built;
info->hist_enabled = stats->hist_enabled;
info->hist_built = stats->hist_built;
+ info->ndist_enabled = stats->ndist_enabled;
+ info->ndist_built = stats->ndist_built;
result = lappend(result, info);
}
@@ -252,7 +258,7 @@ find_mv_attnums(Oid mvoid, Oid *relid)
void
update_mv_stats(Oid mvoid,
MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
- int2vector *attrs, VacAttrStats **stats)
+ double ndistcoeff, int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -292,26 +298,36 @@ update_mv_stats(Oid mvoid,
= PointerGetDatum(data);
}
+ if (ndistcoeff > 1.0)
+ {
+ nulls[Anum_pg_mv_statistic_standist -1] = false;
+ values[Anum_pg_mv_statistic_standist-1] = Float8GetDatum(ndistcoeff);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
replaces[Anum_pg_mv_statistic_stahist-1] = true;
+ replaces[Anum_pg_mv_statistic_standist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
nulls[Anum_pg_mv_statistic_hist_built-1] = false;
+ nulls[Anum_pg_mv_statistic_ndist_built-1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_ndist_built-1] = true;
replaces[Anum_pg_mv_statistic_hist_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
+ values[Anum_pg_mv_statistic_ndist_built-1] = BoolGetDatum(ndistcoeff > 1.0);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
diff --git a/src/backend/utils/mvstats/mvdist.c b/src/backend/utils/mvstats/mvdist.c
new file mode 100644
index 0000000..6df7411
--- /dev/null
+++ b/src/backend/utils/mvstats/mvdist.c
@@ -0,0 +1,147 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvdist.c
+ * POSTGRES multivariate distinct coefficients
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mvdist.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Compute the ndistinct coefficient for a combination of columns,
+ * i.e. the product of the per-column ndistinct estimates divided by
+ * the ndistinct estimate for the whole combination (all computed
+ * from the sample rows).
+ */
+double
+build_mv_ndistinct(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ MultiSortSupport mss = multi_sort_init(numattrs);
+ int ndistinct;
+ double result;
+
+ /*
+ * It's possible to sort the sample rows directly, but this seemed
+ * somewhat simpler / less error prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * numattrs);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ Assert(numattrs >= 2);
+
+ for (i = 0; i < numattrs; i++)
+ {
+ /* prepare the sort function for this dimension */
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* accumulate all the data into the array and sort it */
+ for (j = 0; j < numrows; j++)
+ {
+ items[j].values[i]
+ = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+ }
+ }
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /* count number of distinct combinations */
+
+ ndistinct = 1;
+ for (i = 1; i < numrows; i++)
+ {
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ ndistinct++;
+ }
+
+ result = 1 / (double)ndistinct;	/* i.e. 1 / ndistinct(a,b,...) */
+
+ /*
+ * Now count distinct values for each attribute separately, and
+ * incrementally compute the coefficient, i.e. the ratio
+ *
+ *    (ndistinct(a) * ndistinct(b) * ...) / ndistinct(a,b,...)
+ */
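+ /*
+ * For example (hypothetical numbers, for illustration only): with
+ * ndistinct(a) = 100 and ndistinct(b) = 100, perfectly correlated
+ * columns have ndistinct(a,b) = 100 and thus a coefficient of 100,
+ * while independent columns have ndistinct(a,b) close to 10000 and
+ * thus a coefficient close to 1.0.
+ */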
+ for (i = 0; i < numattrs; i++)
+ {
+ SortSupportData ssup;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup);
+
+ memset(values, 0, sizeof(Datum) * numrows);
+
+ /* accumulate all the data into the array and sort it */
+ for (j = 0; j < numrows; j++)
+ {
+ bool isnull;
+ values[j] = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &isnull);
+ }
+
+ qsort_arg((void *)values, numrows, sizeof(Datum),
+ compare_scalars_simple, &ssup);
+
+ ndistinct = 1;
+ for (j = 1; j < numrows; j++)
+ {
+ if (compare_scalars_simple(&values[j], &values[j-1], &ssup) != 0)
+ ndistinct++;
+ }
+
+ result *= ndistinct;
+ }
+
+ return result;
+}
+
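+/*
+ * load_mv_ndistinct
+ *		Load the ndistinct coefficient for a pg_mv_statistic entry.
+ */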
+double
+load_mv_ndistinct(Oid mvoid)
+{
+ bool isnull = false;
+ Datum ndist;
+
+ /* fetch the pg_mv_statistic tuple for this statistics OID */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->ndist_enabled && mvstat->ndist_built);
+#endif
+
+ ndist = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_standist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return DatumGetFloat8(ndist);
+}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index df6a61c..fb9ee22 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -38,6 +38,7 @@ CATALOG(pg_mv_statistic,3381)
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
bool hist_enabled; /* build histogram? */
+ bool ndist_enabled; /* build ndist coefficient? */
/* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
@@ -47,6 +48,7 @@ CATALOG(pg_mv_statistic,3381)
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
bool hist_built; /* histogram was built */
+ bool ndist_built; /* ndistinct coeff built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -55,6 +57,7 @@ CATALOG(pg_mv_statistic,3381)
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
bytea stahist; /* MV histogram (serialized) */
+ float8 standist; /* ndistinct coefficient */
#endif
} FormData_pg_mv_statistic;
@@ -70,20 +73,23 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_attrdef
* ----------------
*/
-#define Natts_pg_mv_statistic 14
+#define Natts_pg_mv_statistic 17
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_deps_enabled 3
#define Anum_pg_mv_statistic_mcv_enabled 4
#define Anum_pg_mv_statistic_hist_enabled 5
-#define Anum_pg_mv_statistic_mcv_max_items 6
-#define Anum_pg_mv_statistic_hist_max_buckets 7
-#define Anum_pg_mv_statistic_deps_built 8
-#define Anum_pg_mv_statistic_mcv_built 9
-#define Anum_pg_mv_statistic_hist_built 10
-#define Anum_pg_mv_statistic_stakeys 11
-#define Anum_pg_mv_statistic_stadeps 12
-#define Anum_pg_mv_statistic_stamcv 13
-#define Anum_pg_mv_statistic_stahist 14
+#define Anum_pg_mv_statistic_ndist_enabled 6
+#define Anum_pg_mv_statistic_mcv_max_items 7
+#define Anum_pg_mv_statistic_hist_max_buckets 8
+#define Anum_pg_mv_statistic_deps_built 9
+#define Anum_pg_mv_statistic_mcv_built 10
+#define Anum_pg_mv_statistic_hist_built 11
+#define Anum_pg_mv_statistic_ndist_built 12
+#define Anum_pg_mv_statistic_stakeys 13
+#define Anum_pg_mv_statistic_stadeps 14
+#define Anum_pg_mv_statistic_stamcv 15
+#define Anum_pg_mv_statistic_stahist 16
+#define Anum_pg_mv_statistic_standist 17
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 3706525..6ecbc4e 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -594,11 +594,13 @@ typedef struct MVStatisticInfo
bool deps_enabled; /* functional dependencies enabled */
bool mcv_enabled; /* MCV list enabled */
bool hist_enabled; /* histogram enabled */
+ bool ndist_enabled; /* ndistinct coefficient enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
bool mcv_built; /* MCV list built */
bool hist_built; /* histogram built */
+ bool ndist_built; /* ndistinct coefficient built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 9fd1314..d3f9de3 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -224,6 +224,7 @@ typedef MVSerializedHistogramData *MVSerializedHistogram;
MVDependencies load_mv_dependencies(Oid mvoid);
MCVList load_mv_mcvlist(Oid mvoid);
MVSerializedHistogram load_mv_histogram(Oid mvoid);
+double load_mv_ndistinct(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
@@ -265,11 +266,16 @@ MVHistogram
build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int numrows_total);
+double
+build_mv_ndistinct(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats);
+
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
void update_mv_stats(Oid relid, MVDependencies dependencies,
MCVList mcvlist, MVHistogram histogram,
+ double ndistcoeff,
int2vector *attrs, VacAttrStats **stats);
#ifdef DEBUG_MVHIST
--
2.1.0
Hi,
attached is v9 of the patch series, including mostly these changes:
1) CREATE STATISTICS cleanup
Firstly, the STATISTICS keyword is unreserved again (I forgot to do
that in the previous version).
I've also removed additional stuff from the grammar that turned out
to be unnecessary / could be replaced with existing pieces.
2) making statistics schema-specific
Similarly to other objects (e.g. types), statistics names are now
unique within a schema. This also means a statistics may be created
using a qualified name, and may belong to a different schema than
the table it is defined on (see the sketch after this list).
It seems to me we probably also need to track the owner, and only
allow the owner (or a superuser / the schema owner) to manipulate
the statistics. The initial intention was to inherit all this from
the parent table, but as we're designing this for the multi-table
case, that no longer works.
3) adding IF [NOT] EXISTS to DROP STATISTICS / CREATE STATISTICS
4) basic documentation of the DDL commands
It's really simple at this point and some of the paragraphs are
still empty. I also think that we'll have to add stuff explaining
how to use statistics, not just docs for the DDL commands.
5) various fixes of the regression tests, related to the above
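To make (2) and (3) concrete, here's a minimal sketch of the DDL (the
schema, table and statistics names are made up for illustration):

CREATE SCHEMA stats;
CREATE TABLE public.t (a INT, b INT);

-- the statistics may live in a different schema than the table
CREATE STATISTICS stats.s1 ON public.t (a, b) WITH (dependencies = true);

-- IF NOT EXISTS merely issues a notice when the name is already taken
CREATE STATISTICS IF NOT EXISTS stats.s1 ON public.t (a, b)
WITH (dependencies = true);

DROP STATISTICS IF EXISTS stats.s1;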
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-teach-pull_-varno-varattno-_walker-about-RestrictInf.patch
From 55c1eee0e734e6e36da2e6f705b70228c2fce67c Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Tue, 28 Apr 2015 19:56:33 +0200
Subject: [PATCH 1/7] teach pull_(varno|varattno)_walker about RestrictInfo
otherwise pull_varnos fails when processing OR clauses
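For context, the failure mode involves clauses like (table name made
up for illustration):

SELECT * FROM t WHERE (a = 1) OR (b = 2);

The multivariate statistics patches later in the series pass clause
trees that still contain RestrictInfo nodes (including ones nested
below an OR) to pull_varnos / pull_varattnos, so the walkers need to
look through them.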
---
src/backend/optimizer/util/var.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index dff52c4..80d01bd 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -197,6 +197,13 @@ pull_varnos_walker(Node *node, pull_varnos_context *context)
context->sublevels_up--;
return result;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo*)node;
+ context->varnos = bms_add_members(context->varnos,
+ rinfo->clause_relids);
+ return false;
+ }
return expression_tree_walker(node, pull_varnos_walker,
(void *) context);
}
@@ -245,6 +252,15 @@ pull_varattnos_walker(Node *node, pull_varattnos_context *context)
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *)node;
+
+ return expression_tree_walker((Node*)rinfo->clause,
+ pull_varattnos_walker,
+ (void*) context);
+ }
+
/* Should not find an unplanned subquery */
Assert(!IsA(node, Query));
--
2.1.0
0002-shared-infrastructure-and-functional-dependencies.patch
From 5b0f45b77134f3b8db327f76f0351dc6119a0417 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 19:51:48 +0100
Subject: [PATCH 2/7] shared infrastructure and functional dependencies
Basic infrastructure shared by all kinds of multivariate
stats, most importantly:
- adds a new system catalog (pg_mv_statistic)
- CREATE STATISTICS name ON table (columns) WITH (options)
- DROP STATISTICS name
- implementation of functional dependencies (the simplest
type of multivariate statistics)
- building functional dependencies in ANALYZE
- updates regression tests (new catalog etc.)
This does not include any changes to the optimizer, i.e.
it does not influence the query planning (subject to
follow-up patches).
The current implementation requires a valid 'ltopr' for
the columns, so that we can sort the sample rows in various
ways, both in this patch and other kinds of statistics.
Maybe this restriction could be relaxed in the future,
requiring just 'eqopr' in case of stats not sorting the
data (e.g. functional dependencies and MCV lists).
Maybe some of the stats (functional dependencies and MCV
list with limited functionality) might be made to work
with hashes of the values, which is sufficient for equality
comparisons. But the queries would require the equality
operator anyway, so it's not really a weaker requirement.
The hashes might reduce space requirements, though.
The algorithm detecting the dependencies is rather simple and
probably needs improvement, both to detect more complicated
dependencies and to validate the math.
The name 'functional dependencies' is more correct (than
'association rules') as it's exactly the name used in
relational theory (esp. Normal Forms) for tracking
column-level dependencies.
The multivariate statistics are automatically removed in
two situations:
(a) after a DROP TABLE (obviously)
(b) after ALTER TABLE ... DROP COLUMN, if the statistics
would be left with fewer than 2 remaining columns
If there are at least 2 columns remaining, we keep
the statistics but perform cleanup on the next ANALYZE.
The dropped columns are removed from stakeys, and the new
statistics is built on the smaller set.
We can't do this at DROP COLUMN time, because that'd leave us
with invalid statistics, or we'd have to throw away statistics
we could still use. The lazy approach lets us keep using the
statistics even though some of the columns are dead (see the
example below).
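For illustration (names made up):

CREATE TABLE t (a INT, b INT, c INT);
CREATE STATISTICS s2 ON t (a, b, c) WITH (dependencies = true);

-- two columns remain, so the statistics survives, and the next
-- ANALYZE rebuilds it on just (a, b)
ALTER TABLE t DROP COLUMN c;
ANALYZE t;

-- only one column would remain, so the statistics gets dropped
ALTER TABLE t DROP COLUMN b;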
This also adds a simple list of statistics to \d in psql.
This means the statistics are created within a schema by
using a qualified name (or using the default schema)
CREATE STATISTICS schema.statistics ON ...
and then dropped by specifying qualified name
DROP STATISTICS schema.statistics
or searching through search_path (just like with other objects).
The commands also include IF [NOT] EXISTS clauses, similarly
to the other DDL commands.
I'm not entirely sure making statistics schema-specific is such
a great idea. Maybe they should be "global", but that does not seem
right (e.g. it makes multi-tenant systems based on schemas more
difficult to manage, because tenants would interact).
Includes basic SGML documentation for the DDL commands, although
some of the sections are empty at the moment. In the end, there
should probably be a separate section about statistics elsewhere
in the documentation, explaining how to use the stats.
---
doc/src/sgml/ref/allfiles.sgml | 2 +
doc/src/sgml/ref/create_statistics.sgml | 174 ++++++++
doc/src/sgml/ref/drop_statistics.sgml | 90 ++++
doc/src/sgml/reference.sgml | 2 +
src/backend/catalog/Makefile | 1 +
src/backend/catalog/dependency.c | 11 +-
src/backend/catalog/heap.c | 102 +++++
src/backend/catalog/namespace.c | 51 +++
src/backend/catalog/objectaddress.c | 22 +
src/backend/catalog/system_views.sql | 11 +
src/backend/commands/Makefile | 6 +-
src/backend/commands/analyze.c | 21 +
src/backend/commands/dropcmds.c | 4 +
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/statscmds.c | 331 +++++++++++++++
src/backend/commands/tablecmds.c | 8 +-
src/backend/nodes/copyfuncs.c | 16 +
src/backend/nodes/outfuncs.c | 18 +
src/backend/optimizer/util/plancat.c | 63 +++
src/backend/parser/gram.y | 34 +-
src/backend/tcop/utility.c | 11 +
src/backend/utils/Makefile | 2 +-
src/backend/utils/cache/relcache.c | 59 +++
src/backend/utils/cache/syscache.c | 23 ++
src/backend/utils/mvstats/Makefile | 17 +
src/backend/utils/mvstats/common.c | 356 ++++++++++++++++
src/backend/utils/mvstats/common.h | 75 ++++
src/backend/utils/mvstats/dependencies.c | 638 +++++++++++++++++++++++++++++
src/bin/psql/describe.c | 44 ++
src/include/catalog/dependency.h | 5 +-
src/include/catalog/heap.h | 1 +
src/include/catalog/indexing.h | 7 +
src/include/catalog/namespace.h | 2 +
src/include/catalog/pg_mv_statistic.h | 73 ++++
src/include/catalog/pg_proc.h | 5 +
src/include/catalog/toasting.h | 1 +
src/include/commands/defrem.h | 4 +
src/include/nodes/nodes.h | 2 +
src/include/nodes/parsenodes.h | 12 +
src/include/nodes/relation.h | 28 ++
src/include/utils/mvstats.h | 70 ++++
src/include/utils/rel.h | 4 +
src/include/utils/relcache.h | 1 +
src/include/utils/syscache.h | 2 +
src/test/regress/expected/rules.out | 8 +
src/test/regress/expected/sanity_check.out | 1 +
46 files changed, 2410 insertions(+), 11 deletions(-)
create mode 100644 doc/src/sgml/ref/create_statistics.sgml
create mode 100644 doc/src/sgml/ref/drop_statistics.sgml
create mode 100644 src/backend/commands/statscmds.c
create mode 100644 src/backend/utils/mvstats/Makefile
create mode 100644 src/backend/utils/mvstats/common.c
create mode 100644 src/backend/utils/mvstats/common.h
create mode 100644 src/backend/utils/mvstats/dependencies.c
create mode 100644 src/include/catalog/pg_mv_statistic.h
create mode 100644 src/include/utils/mvstats.h
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index bf95453..c0f7653 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -76,6 +76,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createSchema SYSTEM "create_schema.sgml">
<!ENTITY createSequence SYSTEM "create_sequence.sgml">
<!ENTITY createServer SYSTEM "create_server.sgml">
+<!ENTITY createStatistics SYSTEM "create_statistics.sgml">
<!ENTITY createTable SYSTEM "create_table.sgml">
<!ENTITY createTableAs SYSTEM "create_table_as.sgml">
<!ENTITY createTableSpace SYSTEM "create_tablespace.sgml">
@@ -119,6 +120,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropSchema SYSTEM "drop_schema.sgml">
<!ENTITY dropSequence SYSTEM "drop_sequence.sgml">
<!ENTITY dropServer SYSTEM "drop_server.sgml">
+<!ENTITY dropStatistics SYSTEM "drop_statistics.sgml">
<!ENTITY dropTable SYSTEM "drop_table.sgml">
<!ENTITY dropTableSpace SYSTEM "drop_tablespace.sgml">
<!ENTITY dropTransform SYSTEM "drop_transform.sgml">
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
new file mode 100644
index 0000000..a86eae3
--- /dev/null
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -0,0 +1,174 @@
+<!--
+doc/src/sgml/ref/create_statistics.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-CREATESTATISTICS">
+ <indexterm zone="sql-createstatistics">
+ <primary>CREATE STATISTICS</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>CREATE STATISTICS</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>CREATE STATISTICS</refname>
+ <refpurpose>define a new statistics</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_name</replaceable>
+    ON <replaceable class="PARAMETER">table_name</replaceable> ( <replaceable class="PARAMETER">column_name</replaceable> [, ...] )
+    [ WITH ( <replaceable class="PARAMETER">statistics_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] ) ]
+</synopsis>
+
+ </refsynopsisdiv>
+
+ <refsect1 id="SQL-CREATESTATISTICS-description">
+ <title>Description</title>
+
+ <para>
+ <command>CREATE STATISTICS</command> will create a new multivariate
+ statistics on the table. The statistics will be created in the
+ current database and will be owned by the user issuing the command.
+ </para>
+
+ <para>
+ If a schema name is given (for example, <literal>CREATE STATISTICS
+ myschema.mystat ...</>) then the statistics is created in the specified
+ schema. Otherwise it is created in the current schema. The name of
+ the statistics must be distinct from the name of any other statistics
+ in the same schema.
+ </para>
+
+ <para>
+ To be able to create statistics, you must own the table the
+ statistics are defined on.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>IF NOT EXISTS</></term>
+ <listitem>
+ <para>
+ Do not throw an error if a statistics with the same name already exists.
+ A notice is issued in this case. Note that there is no guarantee that
+ the existing statistics is anything like the one that would have been
+ created.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">statistics_name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the statistics to be created.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">table_name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the table the statistics should
+ be created on.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">column_name</replaceable></term>
+ <listitem>
+ <para>
+ The name of a column to be included in the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>WITH ( <replaceable class="PARAMETER">statistics_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] )</literal></term>
+ <listitem>
+ <para>
+ ...
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <refsect2 id="SQL-CREATESTATISTICS-parameters">
+ <title id="SQL-CREATESTATISTICS-parameters-title">Statistics Parameters</title>
+
+ <indexterm zone="sql-createstatistics-parameters">
+ <primary>statistics parameters</primary>
+ </indexterm>
+
+ <para>
+ The <literal>WITH</> clause can specify <firstterm>statistics
+ parameters</>. The currently available parameters are listed below.
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>dependencies</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables functional dependencies for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </refsect2>
+ </refsect1>
+
+ <refsect1 id="SQL-CREATESTATISTICS-notes">
+ <title>Notes</title>
+
+ <para>
+ ...
+ </para>
+
+ </refsect1>
+
+
+ <refsect1 id="SQL-CREATESTATISTICS-examples">
+ <title>Examples</title>
+
+ <para>
+ ...
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There's no <command>CREATE STATISTICS</command> command in the SQL standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-dropstatistics"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/ref/drop_statistics.sgml b/doc/src/sgml/ref/drop_statistics.sgml
new file mode 100644
index 0000000..4cc0b70
--- /dev/null
+++ b/doc/src/sgml/ref/drop_statistics.sgml
@@ -0,0 +1,90 @@
+<!--
+doc/src/sgml/ref/drop_statistics.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-DROPSTATISTICS">
+ <indexterm zone="sql-dropstatistics">
+ <primary>DROP STATISTICS</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>DROP STATISTICS</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>DROP STATISTICS</refname>
+ <refpurpose>remove a statistics</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+DROP STATISTICS [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>DROP STATISTICS</command> removes statistics from the database.
+ Only the statistics owner, the schema owner, or a superuser can drop
+ a statistics.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>IF EXISTS</literal></term>
+ <listitem>
+ <para>
+ Do not throw an error if the statistics does not exist. A notice is
+ issued in this case.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the statistics to drop.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </refsect1>
+
+ <refsect1>
+ <title>Examples</title>
+
+ <para>
+ ...
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There's no <command>DROP STATISTICS</command> command in the SQL standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createstatistics"></member>
+ </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 03020df..2b07b2d 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -104,6 +104,7 @@
&createSchema;
&createSequence;
&createServer;
+ &createStatistics;
&createTable;
&createTableAs;
&createTableSpace;
@@ -147,6 +148,7 @@
&dropSchema;
&dropSequence;
&dropServer;
+ &dropStatistics;
&dropTable;
&dropTableSpace;
&dropTSConfig;
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 25130ec..058b8a9 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -32,6 +32,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
+ pg_mv_statistic.h \
pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index d657c20..8b72d88 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -39,6 +39,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -159,7 +160,8 @@ static const Oid object_classes[] = {
ExtensionRelationId, /* OCLASS_EXTENSION */
EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
PolicyRelationId, /* OCLASS_POLICY */
- TransformRelationId /* OCLASS_TRANSFORM */
+ TransformRelationId, /* OCLASS_TRANSFORM */
+ MvStatisticRelationId /* OCLASS_STATISTICS */
};
@@ -1271,6 +1273,10 @@ doDeletion(const ObjectAddress *object, int flags)
DropTransformById(object->objectId);
break;
+ case OCLASS_STATISTICS:
+ RemoveStatisticsById(object->objectId);
+ break;
+
default:
elog(ERROR, "unrecognized object class: %u",
object->classId);
@@ -2414,6 +2420,9 @@ getObjectClass(const ObjectAddress *object)
case TransformRelationId:
return OCLASS_TRANSFORM;
+
+ case MvStatisticRelationId:
+ return OCLASS_STATISTICS;
}
/* shouldn't get here */
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index d14cbb7..82c3632 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -46,6 +46,7 @@
#include "catalog/pg_constraint.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_tablespace.h"
@@ -1612,7 +1613,10 @@ RemoveAttributeById(Oid relid, AttrNumber attnum)
heap_close(attr_rel, RowExclusiveLock);
if (attnum > 0)
+ {
RemoveStatistics(relid, attnum);
+ RemoveMVStatistics(relid, attnum);
+ }
relation_close(rel, NoLock);
}
@@ -1840,6 +1844,11 @@ heap_drop_with_catalog(Oid relid)
RemoveStatistics(relid, 0);
/*
+ * delete multi-variate statistics
+ */
+ RemoveMVStatistics(relid, 0);
+
+ /*
* delete attribute tuples
*/
DeleteAttributeTuples(relid);
@@ -2695,6 +2704,99 @@ RemoveStatistics(Oid relid, AttrNumber attnum)
/*
+ * RemoveMVStatistics --- remove entries in pg_mv_statistic for a rel
+ *
+ * If attnum is zero, remove all entries for rel; else remove only the one(s)
+ * for that column.
+ */
+void
+RemoveMVStatistics(Oid relid, AttrNumber attnum)
+{
+ Relation pgmvstatistic;
+ TupleDesc tupdesc = NULL;
+ SysScanDesc scan;
+ ScanKeyData key;
+ HeapTuple tuple;
+
+ /*
+ * When dropping a column, we only drop statistics that would be left
+ * with fewer than two (undropped) columns. To check that, we need
+ * the tuple descriptor.
+ *
+ * We already have the relation locked (as we're running ALTER
+ * TABLE ... DROP COLUMN), so we'll just get the descriptor here.
+ */
+ if (attnum != 0)
+ {
+ Relation rel = relation_open(relid, NoLock);
+
+ /* multivariate stats are supported on tables and matviews */
+ if (rel->rd_rel->relkind == RELKIND_RELATION ||
+ rel->rd_rel->relkind == RELKIND_MATVIEW)
+ tupdesc = RelationGetDescr(rel);
+
+ relation_close(rel, NoLock);
+
+ /* bail out if we don't have a usable tuple descriptor */
+ if (tupdesc == NULL)
+ return;
+ }
+
+ pgmvstatistic = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ ScanKeyInit(&key,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ scan = systable_beginscan(pgmvstatistic,
+ MvStatisticRelidIndexId,
+ true, NULL, 1, &key);
+
+ /* we must loop even when attnum != 0, in case of inherited stats */
+ while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+ {
+ bool delete = true;
+
+ if (attnum != 0)
+ {
+ Datum adatum;
+ bool isnull;
+ int i;
+ int ncolumns = 0;
+ ArrayType *arr;
+ int16 *attnums;
+
+ /* get the columns */
+ adatum = SysCacheGetAttr(MVSTATOID, tuple,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+ attnums = (int16*)ARR_DATA_PTR(arr);
+
+ for (i = 0; i < ARR_DIMS(arr)[0]; i++)
+ {
+ /* count the column unless it has been / is being dropped */
+ if ((! tupdesc->attrs[attnums[i]-1]->attisdropped) &&
+ (attnums[i] != attnum))
+ ncolumns += 1;
+ }
+
+ /* delete if fewer than two columns remain */
+ delete = (ncolumns < 2);
+ }
+
+ if (delete)
+ simple_heap_delete(pgmvstatistic, &tuple->t_self);
+ }
+
+ systable_endscan(scan);
+
+ heap_close(pgmvstatistic, RowExclusiveLock);
+}
+
+
+/*
* RelationTruncateIndexes - truncate all indexes associated
* with the heap relation to zero tuples.
*
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 446b2ac..dfd5bef 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -4201,3 +4201,54 @@ pg_is_other_temp_schema(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(isOtherTempNamespace(oid));
}
+
+Oid
+get_statistics_oid(List *names, bool missing_ok)
+{
+ char *schemaname;
+ char *stats_name;
+ Oid namespaceId;
+ Oid stats_oid = InvalidOid;
+ ListCell *l;
+
+ /* deconstruct the name list */
+ DeconstructQualifiedName(names, &schemaname, &stats_name);
+
+ if (schemaname)
+ {
+ /* use exact schema given */
+ namespaceId = LookupExplicitNamespace(schemaname, missing_ok);
+ if (missing_ok && !OidIsValid(namespaceId))
+ stats_oid = InvalidOid;
+ else
+ stats_oid = GetSysCacheOid2(MVSTATNAMENSP,
+ PointerGetDatum(stats_name),
+ ObjectIdGetDatum(namespaceId));
+ }
+ else
+ {
+ /* search for it in search path */
+ recomputeNamespacePath();
+
+ foreach(l, activeSearchPath)
+ {
+ namespaceId = lfirst_oid(l);
+
+ if (namespaceId == myTempNamespace)
+ continue; /* do not look in temp namespace */
+ stats_oid = GetSysCacheOid2(MVSTATNAMENSP,
+ PointerGetDatum(stats_name),
+ ObjectIdGetDatum(namespaceId));
+ if (OidIsValid(stats_oid))
+ break;
+ }
+ }
+
+ if (!OidIsValid(stats_oid) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("statistics \"%s\" does not exist",
+ NameListToString(names))));
+
+ return stats_oid;
+}
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 65cf3ed..080c33c 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -37,6 +37,7 @@
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
#include "catalog/pg_largeobject_metadata.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_opfamily.h"
@@ -436,9 +437,22 @@ static const ObjectPropertyType ObjectProperty[] =
Anum_pg_type_typacl,
ACL_KIND_TYPE,
true
+ },
+ {
+ MvStatisticRelationId,
+ MvStatisticOidIndexId,
+ MVSTATOID,
+ MVSTATNAMENSP,
+ Anum_pg_mv_statistic_staname,
+ Anum_pg_mv_statistic_stanamespace,
+ InvalidAttrNumber, /* XXX same owner as relation */
+ InvalidAttrNumber, /* no ACL (same as relation) */
+ -1, /* no ACL */
+ true
}
};
+
/*
* This struct maps the string object types as returned by
* getObjectTypeDescription into ObjType enum values. Note that some enum
@@ -911,6 +925,11 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
address = get_object_address_defacl(objname, objargs,
missing_ok);
break;
+ case OBJECT_STATISTICS:
+ address.classId = MvStatisticRelationId;
+ address.objectId = get_statistics_oid(objname, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, in case it thinks elog might return */
@@ -2183,6 +2202,9 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser")));
break;
+ case OBJECT_STATISTICS:
+ /* FIXME do the right owner checks here */
+ break;
default:
elog(ERROR, "unrecognized object type: %d",
(int) objtype);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 923fe58..2423985 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -158,6 +158,17 @@ CREATE VIEW pg_indexes AS
LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
+CREATE VIEW pg_mv_stats AS
+ SELECT
+ N.nspname AS schemaname,
+ C.relname AS tablename,
+ S.staname AS staname,
+ S.stakeys AS attnums,
+ length(S.stadeps) as depsbytes,
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
+ LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
+
CREATE VIEW pg_stats WITH (security_barrier) AS
SELECT
nspname AS schemaname,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index b1ac704..5151001 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -18,8 +18,8 @@ OBJS = aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
policy.o portalcmds.o prepare.o proclang.o \
- schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
- variable.o view.o
+ schemacmds.o seclabel.o sequence.o statscmds.o \
+ tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o \
+ user.o vacuum.o vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 070df29..cbaa4e1 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -27,6 +27,7 @@
#include "catalog/indexing.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "commands/dbcommands.h"
#include "commands/tablecmds.h"
@@ -55,7 +56,11 @@
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "utils/mvstats.h"
+#include "access/sysattr.h"
/* Per-index data for ANALYZE */
typedef struct AnlIndexData
@@ -460,6 +465,19 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
* all analyzable columns. We use a lower bound of 100 rows to avoid
* possible overflow in Vitter's algorithm. (Note: that will also be the
* target in the corner case where there are no analyzable columns.)
+ *
+ * FIXME This sample sizing is mostly OK when computing stats for
+ * individual columns, but it's rather insufficient for multivariate
+ * stats (histograms, MCV lists, ...). For stats on multiple columns
+ * we need larger samples, because we need to build more detailed
+ * stats (more MCV items / histogram buckets) to get good accuracy.
+ * Maybe using a sample proportional to the table size (say,
+ * 0.5% - 1%) instead of a fixed size would be more appropriate.
+ * Also, the sample size should be bound to the requested statistics
+ * size - e.g. the number of MCV items or histogram buckets should
+ * require several sample rows per item/bucket (so the sample
+ * should be k*size).
targrows = 100;
for (i = 0; i < attr_cnt; i++)
@@ -562,6 +580,9 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
update_attstats(RelationGetRelid(Irel[ind]), false,
thisdata->attr_cnt, thisdata->vacattrstats);
}
+
+ /* Build multivariate stats (if there are any). */
+ build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
}
/*
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index 522027a..cd65b58 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -292,6 +292,10 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
msg = gettext_noop("schema \"%s\" does not exist, skipping");
name = NameListToString(objname);
break;
+ case OBJECT_STATISTICS:
+ msg = gettext_noop("statistics \"%s\" does not exist, skipping");
+ name = NameListToString(objname);
+ break;
case OBJECT_TSPARSER:
if (!schema_does_not_exist_skipping(objname, &msg, &name))
{
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 9e32f8d..09061bb 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -110,6 +110,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"SCHEMA", true},
{"SEQUENCE", true},
{"SERVER", true},
+ {"STATISTICS", true},
{"TABLE", true},
{"TABLESPACE", false},
{"TRANSFORM", true},
@@ -1106,6 +1107,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_RULE:
case OBJECT_SCHEMA:
case OBJECT_SEQUENCE:
+ case OBJECT_STATISTICS:
case OBJECT_TABCONSTRAINT:
case OBJECT_TABLE:
case OBJECT_TRANSFORM:
@@ -1167,6 +1169,7 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
case OCLASS_POLICY:
+ case OCLASS_STATISTICS:
return true;
}
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
new file mode 100644
index 0000000..84a8b13
--- /dev/null
+++ b/src/backend/commands/statscmds.c
@@ -0,0 +1,331 @@
+/*-------------------------------------------------------------------------
+ *
+ * statscmds.c
+ * Commands for creating and altering multivariate statistics
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/statscmds.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/multixact.h"
+#include "access/reloptions.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "catalog/catalog.h"
+#include "catalog/dependency.h"
+#include "catalog/heap.h"
+#include "catalog/index.h"
+#include "catalog/indexing.h"
+#include "catalog/namespace.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_constraint.h"
+#include "catalog/pg_depend.h"
+#include "catalog/pg_foreign_table.h"
+#include "catalog/pg_inherits.h"
+#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
+#include "catalog/pg_namespace.h"
+#include "catalog/pg_opclass.h"
+#include "catalog/pg_tablespace.h"
+#include "catalog/pg_trigger.h"
+#include "catalog/pg_type.h"
+#include "catalog/pg_type_fn.h"
+#include "catalog/storage.h"
+#include "catalog/toasting.h"
+#include "commands/cluster.h"
+#include "commands/comment.h"
+#include "commands/defrem.h"
+#include "commands/event_trigger.h"
+#include "commands/policy.h"
+#include "commands/sequence.h"
+#include "commands/tablecmds.h"
+#include "commands/tablespace.h"
+#include "commands/trigger.h"
+#include "commands/typecmds.h"
+#include "commands/user.h"
+#include "executor/executor.h"
+#include "foreign/foreign.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/parsenodes.h"
+#include "optimizer/clauses.h"
+#include "optimizer/planner.h"
+#include "parser/parse_clause.h"
+#include "parser/parse_coerce.h"
+#include "parser/parse_collate.h"
+#include "parser/parse_expr.h"
+#include "parser/parse_oper.h"
+#include "parser/parse_relation.h"
+#include "parser/parse_type.h"
+#include "parser/parse_utilcmd.h"
+#include "parser/parser.h"
+#include "pgstat.h"
+#include "rewrite/rewriteDefine.h"
+#include "rewrite/rewriteHandler.h"
+#include "rewrite/rewriteManip.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "storage/lock.h"
+#include "storage/predicate.h"
+#include "storage/smgr.h"
+#include "utils/acl.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/inval.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/relcache.h"
+#include "utils/ruleutils.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+#include "utils/typcache.h"
+#include "utils/mvstats.h"
+
+
+/* used for sorting the attnums in CreateStatistics */
+static int compare_int16(const void *a, const void *b)
+{
+ return memcmp(a, b, sizeof(int16));
+}
+
+/*
+ * Implements the CREATE STATISTICS name ON table (columns) WITH (options)
+ *
+ * TODO Check that the types support sort, although maybe we can live
+ * without it (and only build MCV list / association rules).
+ *
+ * TODO This should probably check for duplicate stats (i.e. same
+ * keys, same options). Although maybe it's useful to have
+ * multiple stats on the same columns with different options
+ * (say, a detailed MCV-only stats for some queries, histogram
+ * for others, etc.)
+ */
+ObjectAddress
+CreateStatistics(CreateStatsStmt *stmt)
+{
+ int i, j;
+ ListCell *l;
+ int16 attnums[INDEX_MAX_KEYS];
+ int numcols = 0;
+ ObjectAddress address = InvalidObjectAddress;
+ char *namestr;
+ NameData staname;
+ Oid statoid;
+ Oid namespaceId;
+
+ HeapTuple htup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ int2vector *stakeys;
+ Relation mvstatrel;
+ Relation rel;
+ ObjectAddress parentobject, childobject;
+
+ /* by default build nothing */
+ bool build_dependencies = false;
+
+ Assert(IsA(stmt, CreateStatsStmt));
+
+ /* resolve the pieces of the name (namespace etc.) */
+ namespaceId = QualifiedNameGetCreationNamespace(stmt->defnames, &namestr);
+ namestrcpy(&staname, namestr);
+
+ /*
+ * If if_not_exists was given and the statistics already exists, bail out.
+ */
+ if (stmt->if_not_exists &&
+ SearchSysCacheExists2(MVSTATNAMENSP,
+ PointerGetDatum(&staname),
+ ObjectIdGetDatum(namespaceId)))
+ {
+ ereport(NOTICE,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("statistics \"%s\" already exists, skipping",
+ namestr)));
+ return InvalidObjectAddress;
+ }
+
+ rel = heap_openrv(stmt->relation, AccessExclusiveLock);
+
+ /* transform the column names to attnum values */
+
+ foreach(l, stmt->keys)
+ {
+ char *attname = strVal(lfirst(l));
+ HeapTuple atttuple;
+
+ atttuple = SearchSysCacheAttName(RelationGetRelid(rel), attname);
+
+ if (!HeapTupleIsValid(atttuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" referenced in statistics does not exist",
+ attname)));
+
+ /* more than MVHIST_MAX_DIMENSIONS columns not allowed */
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("cannot have more than %d keys in a statistics",
+ MVSTATS_MAX_DIMENSIONS)));
+
+ attnums[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->attnum;
+ ReleaseSysCache(atttuple);
+ numcols++;
+ }
+
+ /*
+ * Check the lower bound (at least 2 columns), the upper bound was
+ * already checked in the loop.
+ */
+ if (numcols < 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("multivariate stats require 2 or more columns")));
+
+ /* look for duplicate columns */
+ for (i = 0; i < numcols; i++)
+ for (j = 0; j < numcols; j++)
+ if ((i != j) && (attnums[i] == attnums[j]))
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_COLUMN),
+ errmsg("duplicate column name in statistics definition")));
+
+ /* parse the statistics options */
+ foreach (l, stmt->options)
+ {
+ DefElem *opt = (DefElem*)lfirst(l);
+
+ if (strcmp(opt->defname, "dependencies") == 0)
+ build_dependencies = defGetBoolean(opt);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized STATISTICS option \"%s\"",
+ opt->defname)));
+ }
+
+ /* check that at least some statistics were requested */
+ if (! build_dependencies)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies) was requested")));
+
+ /* sort the attnums and build int2vector */
+ qsort(attnums, numcols, sizeof(int16), compare_int16);
+ stakeys = buildint2vector(attnums, numcols);
+
+ /*
+ * Okay, let's create the pg_mv_statistic entry.
+ */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ /* no stats collected yet, so just the keys */
+ values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
+ values[Anum_pg_mv_statistic_staname -1] = NameGetDatum(&staname);
+ values[Anum_pg_mv_statistic_stanamespace -1] = ObjectIdGetDatum(namespaceId);
+
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+
+ values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+
+ nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* insert the tuple into pg_mv_statistic */
+ mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ htup = heap_form_tuple(mvstatrel->rd_att, values, nulls);
+
+ simple_heap_insert(mvstatrel, htup);
+
+ CatalogUpdateIndexes(mvstatrel, htup);
+
+ statoid = HeapTupleGetOid(htup);
+
+ heap_freetuple(htup);
+
+
+ /*
+ * Store a dependency too, so that statistics are dropped on DROP TABLE
+ */
+ parentobject.classId = RelationRelationId;
+ parentobject.objectId = RelationGetRelid(rel);
+ parentobject.objectSubId = 0;
+ childobject.classId = MvStatisticRelationId;
+ childobject.objectId = statoid;
+ childobject.objectSubId = 0;
+
+ recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
+
+ /*
+ * Also record dependency on the schema (to drop statistics on DROP SCHEMA)
+ */
+ parentobject.classId = NamespaceRelationId;
+ parentobject.objectId = namespaceId;
+ parentobject.objectSubId = 0;
+ childobject.classId = MvStatisticRelationId;
+ childobject.objectId = statoid;
+ childobject.objectSubId = 0;
+
+ recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
+
+
+ heap_close(mvstatrel, RowExclusiveLock);
+
+ relation_close(rel, NoLock);
+
+ /*
+ * Invalidate relcache so that others see the new statistics.
+ */
+ CacheInvalidateRelcache(rel);
+
+ ObjectAddressSet(address, MvStatisticRelationId, statoid);
+
+ return address;
+}
+
+
+/*
+ * RemoveStatisticsById
+ *		Remove a statistics entry from pg_mv_statistic by OID.
+ *
+ * This implements the guts of DROP STATISTICS, and is also invoked
+ * by the dependency machinery (e.g. on DROP TABLE or DROP SCHEMA).
+ */
+void
+RemoveStatisticsById(Oid statsOid)
+{
+ Relation relation;
+ HeapTuple tup;
+
+ /*
+ * Delete the pg_mv_statistic tuple.
+ */
+ relation = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ tup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(statsOid));
+ if (!HeapTupleIsValid(tup)) /* should not happen */
+ elog(ERROR, "cache lookup failed for statistics %u", statsOid);
+
+ simple_heap_delete(relation, &tup->t_self);
+
+ ReleaseSysCache(tup);
+
+ heap_close(relation, RowExclusiveLock);
+}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0b4a334..5f4220d 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -35,6 +35,7 @@
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_tablespace.h"
@@ -93,7 +94,7 @@
#include "utils/syscache.h"
#include "utils/tqual.h"
#include "utils/typcache.h"
-
+#include "utils/mvstats.h"
/*
* ON COMMIT action list
@@ -141,8 +142,9 @@ static List *on_commits = NIL;
#define AT_PASS_ADD_COL 5 /* ADD COLUMN */
#define AT_PASS_ADD_INDEX 6 /* ADD indexes */
#define AT_PASS_ADD_CONSTR 7 /* ADD constraints, defaults */
-#define AT_PASS_MISC 8 /* other stuff */
-#define AT_NUM_PASSES 9
+#define AT_PASS_ADD_STATS 8 /* ADD statistics */
+#define AT_PASS_MISC 9 /* other stuff */
+#define AT_NUM_PASSES 10
typedef struct AlteredTableInfo
{
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index f47e0da..407b4a0 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4119,6 +4119,19 @@ _copyAlterPolicyStmt(const AlterPolicyStmt *from)
return newnode;
}
+static CreateStatsStmt *
+_copyCreateStatsStmt(const CreateStatsStmt *from)
+{
+ CreateStatsStmt *newnode = makeNode(CreateStatsStmt);
+
+ COPY_NODE_FIELD(defnames);
+ COPY_NODE_FIELD(relation);
+ COPY_NODE_FIELD(keys);
+ COPY_NODE_FIELD(options);
+
+ return newnode;
+}
+
/* ****************************************************************
* pg_list.h copy functions
* ****************************************************************
@@ -4966,6 +4979,9 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_CreateStatsStmt:
+ retval = _copyCreateStatsStmt(from);
+ break;
case T_FuncWithArgs:
retval = _copyFuncWithArgs(from);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index d95e151..5ecc9ef 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1939,6 +1939,21 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
}
static void
+_outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
+{
+ WRITE_NODE_TYPE("MVSTATISTICINFO");
+
+ /* NB: this isn't a complete set of fields */
+ WRITE_OID_FIELD(mvoid);
+
+ /* enabled statistics */
+ WRITE_BOOL_FIELD(deps_enabled);
+
+ /* built/available statistics */
+ WRITE_BOOL_FIELD(deps_built);
+}
+
+static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
/*
@@ -3359,6 +3374,9 @@ _outNode(StringInfo str, const void *obj)
case T_PlannerParamItem:
_outPlannerParamItem(str, obj);
break;
+ case T_MVStatisticInfo:
+ _outMVStatisticInfo(str, obj);
+ break;
case T_CreateStmt:
_outCreateStmt(str, obj);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d5528e0..83bd85c 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -27,6 +27,7 @@
#include "catalog/catalog.h"
#include "catalog/dependency.h"
#include "catalog/heap.h"
+#include "catalog/pg_mv_statistic.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -39,7 +40,9 @@
#include "parser/parsetree.h"
#include "rewrite/rewriteManip.h"
#include "storage/bufmgr.h"
+#include "utils/builtins.h"
#include "utils/lsyscache.h"
+#include "utils/syscache.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
@@ -93,6 +96,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
Relation relation;
bool hasindex;
List *indexinfos = NIL;
+ List *stainfos = NIL;
/*
* We need not lock the relation since it was already locked, either by
@@ -381,6 +385,65 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
rel->indexlist = indexinfos;
+ /* Load the multivariate statistics defined on this relation. */
+ {
+ List *mvstatoidlist;
+ ListCell *l;
+
+ mvstatoidlist = RelationGetMVStatList(relation);
+
+ foreach(l, mvstatoidlist)
+ {
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ Oid mvoid = lfirst_oid(l);
+ Form_pg_mv_statistic mvstat;
+ MVStatisticInfo *info;
+
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (! HeapTupleIsValid(htup))
+ continue;
+
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ /* unavailable stats are not interesting for the planner */
+ if (mvstat->deps_built)
+ {
+ info = makeNode(MVStatisticInfo);
+
+ info->mvoid = mvoid;
+ info->rel = rel;
+
+ /* enabled statistics */
+ info->deps_enabled = mvstat->deps_enabled;
+
+ /* built/available statistics */
+ info->deps_built = mvstat->deps_built;
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ info->stakeys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+
+ stainfos = lcons(info, stainfos);
+ }
+
+ ReleaseSysCache(htup);
+ }
+
+ list_free(mvstatoidlist);
+ }
+
+ rel->mvstatlist = stainfos;
+
/* Grab foreign-table info using the relcache, while we have it */
if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
{
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b307b48..3be3f02 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -241,7 +241,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
- CreateSchemaStmt CreateSeqStmt CreateStmt CreateTableSpaceStmt
+ CreateSchemaStmt CreateSeqStmt CreateStmt CreateStatsStmt CreateTableSpaceStmt
CreateFdwStmt CreateForeignServerStmt CreateForeignTableStmt
CreateAssertStmt CreateTransformStmt CreateTrigStmt CreateEventTrigStmt
CreateUserStmt CreateUserMappingStmt CreateRoleStmt CreatePolicyStmt
@@ -809,6 +809,7 @@ stmt :
| CreateSchemaStmt
| CreateSeqStmt
| CreateStmt
+ | CreateStatsStmt
| CreateTableSpaceStmt
| CreateTransformStmt
| CreateTrigStmt
@@ -3436,6 +3437,36 @@ OptConsTableSpace: USING INDEX TABLESPACE name { $$ = $4; }
ExistingIndex: USING INDEX index_name { $$ = $3; }
;
+/*****************************************************************************
+ *
+ * QUERY :
+ * CREATE STATISTICS stats_name ON relname (columns) WITH (options)
+ *
+ *****************************************************************************/
+
+
+CreateStatsStmt: CREATE STATISTICS any_name ON qualified_name '(' columnList ')' opt_reloptions
+ {
+ CreateStatsStmt *n = makeNode(CreateStatsStmt);
+ n->defnames = $3;
+ n->relation = $5;
+ n->keys = $7;
+ n->options = $9;
+ n->if_not_exists = false;
+ $$ = (Node *)n;
+ }
+ | CREATE STATISTICS IF_P NOT EXISTS any_name ON qualified_name '(' columnList ')' opt_reloptions
+ {
+ CreateStatsStmt *n = makeNode(CreateStatsStmt);
+ n->defnames = $6;
+ n->relation = $8;
+ n->keys = $10;
+ n->options = $12;
+ n->if_not_exists = true;
+ $$ = (Node *)n;
+ }
+ ;
+
/*****************************************************************************
*
@@ -5621,6 +5652,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH DICTIONARY { $$ = OBJECT_TSDICTIONARY; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
+ | STATISTICS { $$ = OBJECT_STATISTICS; }
;
any_name_list:
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 045f7f0..2ba88e2 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1520,6 +1520,10 @@ ProcessUtilitySlow(Node *parsetree,
address = ExecSecLabelStmt((SecLabelStmt *) parsetree);
break;
+ case T_CreateStatsStmt: /* CREATE STATISTICS */
+ address = CreateStatistics((CreateStatsStmt *) parsetree);
+ break;
+
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(parsetree));
@@ -2160,6 +2164,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_TRANSFORM:
tag = "DROP TRANSFORM";
break;
+ case OBJECT_STATISTICS:
+ tag = "DROP STATISTICS";
+ break;
default:
tag = "???";
}
@@ -2527,6 +2534,10 @@ CreateCommandTag(Node *parsetree)
tag = "EXECUTE";
break;
+ case T_CreateStatsStmt:
+ tag = "CREATE STATISTICS";
+ break;
+
case T_DeallocateStmt:
{
DeallocateStmt *stmt = (DeallocateStmt *) parsetree;
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..eba0352 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,7 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr mvstats resowner sort time
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index fc5b9d9..1e41cac 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -47,6 +47,7 @@
#include "catalog/pg_auth_members.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_database.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_proc.h"
@@ -3930,6 +3931,62 @@ RelationGetIndexList(Relation relation)
return result;
}
+
+List *
+RelationGetMVStatList(Relation relation)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result;
+ List *oldlist;
+ MemoryContext oldcxt;
+
+ /* Quick exit if we already computed the list. */
+ if (relation->rd_mvstatvalid != 0)
+ return list_copy(relation->rd_mvstatlist);
+
+ /*
+ * We build the list we intend to return (in the caller's context) while
+ * doing the scan. After successfully completing the scan, we copy that
+ * list into the relcache entry. This avoids cache-context memory leakage
+ * if we get some sort of error partway through.
+ */
+ result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(RelationGetRelid(relation)));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ /* TODO maybe include only already built statistics? */
+ result = insert_ordered_oid(result, HeapTupleGetOid(htup));
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* Now save a copy of the completed list in the relcache entry. */
+ oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+ oldlist = relation->rd_mvstatlist;
+ relation->rd_mvstatlist = list_copy(result);
+
+ relation->rd_mvstatvalid = true;
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Don't leak the old list, if there is one */
+ list_free(oldlist);
+
+ return result;
+}
+
/*
* insert_ordered_oid
* Insert a new Oid into a sorted list of Oids, preserving ordering
@@ -4899,6 +4956,8 @@ load_relcache_init_file(bool shared)
rel->rd_indexattr = NULL;
rel->rd_keyattr = NULL;
rel->rd_idattr = NULL;
+ rel->rd_mvstatvalid = false;
+ rel->rd_mvstatlist = NIL;
rel->rd_createSubid = InvalidSubTransactionId;
rel->rd_newRelfilenodeSubid = InvalidSubTransactionId;
rel->rd_amcache = NULL;
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 6eb2ac6..0331da7 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -43,6 +43,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_language.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -501,6 +502,28 @@ static const struct cachedesc cacheinfo[] = {
},
4
},
+ {MvStatisticRelationId, /* MVSTATNAMENSP */
+ MvStatisticNameIndexId,
+ 2,
+ {
+ Anum_pg_mv_statistic_staname,
+ Anum_pg_mv_statistic_stanamespace,
+ 0,
+ 0
+ },
+ 4
+ },
+ {MvStatisticRelationId, /* MVSTATOID */
+ MvStatisticOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 4
+ },
{NamespaceRelationId, /* NAMESPACENAME */
NamespaceNameIndexId,
1,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
new file mode 100644
index 0000000..099f1ed
--- /dev/null
+++ b/src/backend/utils/mvstats/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/mvstats
+#
+# IDENTIFICATION
+# src/backend/utils/mvstats/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/mvstats
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = common.o dependencies.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
new file mode 100644
index 0000000..a755c49
--- /dev/null
+++ b/src/backend/utils/mvstats/common.c
@@ -0,0 +1,356 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.c
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+static List* list_mv_stats(Oid relid);
+
+
+/*
+ * Compute requested multivariate stats, using the rows sampled for the
+ * plain (single-column) stats.
+ *
+ * This fetches a list of stats from pg_mv_statistic, computes the stats
+ * and serializes them back into the catalog (as bytea values).
+ */
+void
+build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats)
+{
+ ListCell *lc;
+ List *mvstats;
+
+ TupleDesc tupdesc = RelationGetDescr(onerel);
+
+ /*
+ * Fetch defined MV groups from pg_mv_statistic, and then compute
+ * the MV statistics (functional dependencies for now).
+ */
+ mvstats = list_mv_stats(RelationGetRelid(onerel));
+
+ foreach (lc, mvstats)
+ {
+ int j;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
+ MVDependencies deps = NULL;
+
+ VacAttrStats **stats = NULL;
+ int numatts = 0;
+
+ /* int2 vector of attnums the stats should be computed on */
+ int2vector * attrs = stat->stakeys;
+
+ /* see how many of the columns are not dropped */
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ numatts += 1;
+
+ /* if there are dropped attributes, build a filtered int2vector */
+ if (numatts != attrs->dim1)
+ {
+ int16 *tmp = palloc0(numatts * sizeof(int16));
+ int attnum = 0;
+
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ tmp[attnum++] = attrs->values[j];
+
+ pfree(attrs);
+ attrs = buildint2vector(tmp, numatts);
+ }
+
+ /* filter only the interesting vacattrstats records */
+ stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /* check allowed number of dimensions */
+ Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Analyze functional dependencies of columns.
+ */
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
+
+ /* store the functional dependencies in the catalog */
+ update_mv_stats(stat->mvoid, deps, attrs);
+ }
+}
+
+/*
+ * Lookup the VacAttrStats info for the selected columns, with indexes
+ * matching the attrs vector (to make it easy to work with when
+ * computing multivariate stats).
+ */
+static VacAttrStats **
+lookup_var_attr_stats(int2vector *attrs, int natts, VacAttrStats **vacattrstats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ VacAttrStats **stats = (VacAttrStats**)palloc0(numattrs * sizeof(VacAttrStats*));
+
+ /* lookup VacAttrStats info for the requested columns (same attnum) */
+ for (i = 0; i < numattrs; i++)
+ {
+ stats[i] = NULL;
+ for (j = 0; j < natts; j++)
+ {
+ if (attrs->values[i] == vacattrstats[j]->tupattnum)
+ {
+ stats[i] = vacattrstats[j];
+ break;
+ }
+ }
+
+ /*
+ * Check that we found the info, that the attnum matches, and
+ * that the requested 'lt' operator is actually available.
+ */
+ Assert(stats[i] != NULL);
+ Assert(stats[i]->tupattnum == attrs->values[i]);
+
+ /* FIXME This is a rather ugly way to check for 'ltopr' (which
+ * is defined for 'scalar' attributes).
+ */
+ Assert(((StdAnalyzeData *)stats[i]->extra_data)->ltopr != InvalidOid);
+ }
+
+ return stats;
+}
+
+/*
+ * Fetch list of MV stats defined on a table, without the actual data
+ * for histograms, MCV lists etc.
+ */
+static List*
+list_mv_stats(Oid relid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ Form_pg_mv_statistic stats = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ info->mvoid = HeapTupleGetOid(htup);
+ info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_built = stats->deps_built;
+
+ result = lappend(result, info);
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which served as the inspiration for this one)? */
+
+ return result;
+}
+
+void
+update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+{
+ HeapTuple stup,
+ oldtup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ bool replaces[Natts_pg_mv_statistic];
+
+ Relation sd = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ memset(nulls, 1, Natts_pg_mv_statistic * sizeof(bool));
+ memset(replaces, 0, Natts_pg_mv_statistic * sizeof(bool));
+ memset(values, 0, Natts_pg_mv_statistic * sizeof(Datum));
+
+ /*
+ * Construct a new pg_mv_statistic tuple - replace only the functional
+ * dependencies, depending on whether they were actually computed.
+ */
+ if (dependencies != NULL)
+ {
+ nulls[Anum_pg_mv_statistic_stadeps - 1] = false;
+ values[Anum_pg_mv_statistic_stadeps - 1]
+ = PointerGetDatum(serialize_mv_dependencies(dependencies));
+ }
+
+ /* always replace the value (either by bytea or NULL) */
+ replaces[Anum_pg_mv_statistic_stadeps - 1] = true;
+
+ /* always change the availability flags */
+ nulls[Anum_pg_mv_statistic_deps_built - 1] = false;
+ nulls[Anum_pg_mv_statistic_stakeys - 1] = false;
+
+ /* use the new attnums, in case we removed some dropped ones */
+ replaces[Anum_pg_mv_statistic_deps_built - 1] = true;
+ replaces[Anum_pg_mv_statistic_stakeys - 1] = true;
+
+ values[Anum_pg_mv_statistic_deps_built - 1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_stakeys - 1] = PointerGetDatum(attrs);
+
+ /* Is there already a pg_mv_statistic tuple for these statistics? */
+ oldtup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ if (HeapTupleIsValid(oldtup))
+ {
+ /* Yes, replace it */
+ stup = heap_modify_tuple(oldtup,
+ RelationGetDescr(sd),
+ values,
+ nulls,
+ replaces);
+ ReleaseSysCache(oldtup);
+ simple_heap_update(sd, &stup->t_self, stup);
+ }
+ else
+ elog(ERROR, "invalid pg_mv_statistic record (oid=%d)", mvoid);
+
+ /* update indexes too */
+ CatalogUpdateIndexes(sd, stup);
+
+ heap_freetuple(stup);
+
+ heap_close(sd, RowExclusiveLock);
+}
+
+/* multi-variate stats comparator */
+
+/*
+ * qsort_arg comparator for sorting Datums (MV stats)
+ *
+ * This does not maintain the tupnoLink array.
+ */
+int
+compare_scalars_simple(const void *a, const void *b, void *arg)
+{
+ Datum da = *(Datum*)a;
+ Datum db = *(Datum*)b;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting data when partitioning a MV bucket
+ */
+int
+compare_scalars_partition(const void *a, const void *b, void *arg)
+{
+ Datum da = ((ScalarItem*)a)->value;
+ Datum db = ((ScalarItem*)b)->value;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/* initialize multi-dimensional sort */
+MultiSortSupport
+multi_sort_init(int ndims)
+{
+ MultiSortSupport mss;
+
+ Assert(ndims >= 2);
+
+ mss = (MultiSortSupport)palloc0(offsetof(MultiSortSupportData, ssup)
+ + sizeof(SortSupportData)*ndims);
+
+ mss->ndims = ndims;
+
+ return mss;
+}
+
+/*
+ * add sort info for dimension 'dim' (index into vacattrstats) to mss,
+ * at position 'sortdim'
+ */
+void
+multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats)
+{
+ /* first, lookup StdAnalyzeData for the dimension (attribute) */
+ SortSupportData ssup;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)vacattrstats[dim]->extra_data;
+
+ Assert(mss != NULL);
+ Assert(sortdim < mss->ndims);
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup);
+
+ mss->ssup[sortdim] = ssup;
+}
+
+/* compare all the dimensions in the selected order */
+int
+multi_sort_compare(const void *a, const void *b, void *arg)
+{
+ int i;
+ SortItem *ia = (SortItem*)a;
+ SortItem *ib = (SortItem*)b;
+
+ MultiSortSupport mss = (MultiSortSupport)arg;
+
+ for (i = 0; i < mss->ndims; i++)
+ {
+ int compare;
+
+ compare = ApplySortComparator(ia->values[i], ia->isnull[i],
+ ib->values[i], ib->isnull[i],
+ &mss->ssup[i]);
+
+ if (compare != 0)
+ return compare;
+
+ }
+
+ /* equal by default */
+ return 0;
+}
+
+/* compare selected dimension */
+int
+multi_sort_compare_dim(int dim, const SortItem *a, const SortItem *b,
+ MultiSortSupport mss)
+{
+ return ApplySortComparator(a->values[dim], a->isnull[dim],
+ b->values[dim], b->isnull[dim],
+ &mss->ssup[dim]);
+}
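
A quick way to exercise build_mv_stats() end to end is to define the
statistics and re-run ANALYZE, then check the catalog flags (a sketch -
the reloption name "dependencies" is an assumption here, the actual
option parsing lives in commands/statscmds.c):

CREATE STATISTICS s1 ON test (a, b) WITH (dependencies);
ANALYZE test;
SELECT staname, deps_enabled, deps_built
  FROM pg_mv_statistic WHERE starelid = 'test'::regclass;
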
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
new file mode 100644
index 0000000..6d5465b
--- /dev/null
+++ b/src/backend/utils/mvstats/common.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.h
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/tuptoaster.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_mv_statistic.h"
+#include "foreign/fdwapi.h"
+#include "postmaster/autovacuum.h"
+#include "storage/lmgr.h"
+#include "utils/datum.h"
+#include "utils/sortsupport.h"
+#include "utils/syscache.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "access/sysattr.h"
+
+#include "utils/mvstats.h"
+
+/* FIXME private structure copied from analyze.c */
+
+typedef struct
+{
+ Oid eqopr; /* '=' operator for datatype, if any */
+ Oid eqfunc; /* and associated function */
+ Oid ltopr; /* '<' operator for datatype, if any */
+} StdAnalyzeData;
+
+typedef struct
+{
+ Datum value; /* a data value */
+ int tupno; /* position index for tuple it came from */
+} ScalarItem;
+
+/* multi-sort */
+typedef struct MultiSortSupportData {
+ int ndims; /* number of dimensions supported by the sort */
+ SortSupportData ssup[1]; /* sort support data for each dimension */
+} MultiSortSupportData;
+
+typedef MultiSortSupportData* MultiSortSupport;
+
+typedef struct SortItem {
+ Datum *values;
+ bool *isnull;
+} SortItem;
+
+MultiSortSupport multi_sort_init(int ndims);
+
+void multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats);
+
+int multi_sort_compare(const void *a, const void *b, void *arg);
+
+int multi_sort_compare_dim(int dim, const SortItem *a,
+ const SortItem *b, MultiSortSupport mss);
+
+/* comparators, used when constructing multivariate stats */
+int compare_scalars_simple(const void *a, const void *b, void *arg);
+int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
new file mode 100644
index 0000000..84b6561
--- /dev/null
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -0,0 +1,638 @@
+/*-------------------------------------------------------------------------
+ *
+ * dependencies.c
+ * POSTGRES multivariate functional dependencies
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/dependencies.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Mine functional dependencies between columns, in the form (A => B),
+ * meaning that a value in column 'A' determines the value in 'B'. A simple
+ * artificial example may be a table created like this
+ *
+ * CREATE TABLE deptest (a, b)
+ * AS SELECT i, i/10 FROM generate_series(1,100000) s(i);
+ *
+ * Clearly, once we know the value for 'A' we can easily determine the
+ * value of 'B' by computing A/10. A more practical example may be
+ * addresses, where (ZIP code => city name), i.e. once we know the ZIP,
+ * we probably know which city it belongs to. Larger cities usually have
+ * multiple ZIP codes, so the dependency can't be reversed.
+ *
+ * Functional dependencies are a concept well described in relational
+ * theory, especially in definition of normalization and "normal forms".
+ * Wikipedia has a nice definition of a functional dependency [1]:
+ *
+ * In a given table, an attribute Y is said to have a functional
+ * dependency on a set of attributes X (written X -> Y) if and only
+ * if each X value is associated with precisely one Y value. For
+ * example, in an "Employee" table that includes the attributes
+ * "Employee ID" and "Employee Date of Birth", the functional
+ * dependency {Employee ID} -> {Employee Date of Birth} would hold.
+ * It follows from the previous two sentences that each {Employee ID}
+ * is associated with precisely one {Employee Date of Birth}.
+ *
+ * [1] http://en.wikipedia.org/wiki/Database_normalization
+ *
+ * Most datasets might be normalized not to contain any such functional
+ * dependencies, but sometimes it's not practical. In some cases it's
+ * actually a conscious choice to model the dataset in a denormalized
+ * way, either for performance or to make querying easier.
+ *
+ * The current implementation supports only dependencies between two
+ * columns, but this is merely a simplification of the initial patch.
+ * It's certainly useful to mine for dependencies involving multiple
+ * columns on the 'left' side, i.e. a condition for the dependency.
+ * That is, dependencies like [A,B] => C and so on.
+ *
+ * TODO The implementation may/should be smart enough not to mine both
+ * [A => B] and [A,C => B], because the second dependency is a
+ * consequence of the first one (if values of A determine values
+ * of B, adding another column won't change that). The ANALYZE
+ * should first analyze 1:1 dependencies, then 2:1 dependencies
+ * (and skip the already identified ones), etc.
+ *
+ * For example the dependency [city name => zip code] is much weaker
+ * than [city name, state name => zip code], because there may be
+ * multiple cities with the same name in various states. It's not
+ * perfect though - there are probably cities with the same name within
+ * the same state, but that is hopefully a relatively rare occurrence.
+ * More about this in the section about dependency mining.
+ *
+ * Handling multiple columns on the right side is not necessary, as such
+ * dependencies may be decomposed into a set of dependencies with
+ * the same meaning, one for each column on the right side. For example
+ *
+ * A => [B,C]
+ *
+ * is exactly the same as
+ *
+ * (A => B) & (A => C).
+ *
+ * Of course, storing (A => [B, C]) may be more efficient than storing
+ * the two dependencies (A => B) and (A => C) separately.
+ *
+ *
+ * Dependency mining (ANALYZE)
+ * ---------------------------
+ *
+ * The current build algorithm is rather simple - for each pair [A,B] of
+ * columns, the data are sorted lexicographically (first by A, then B),
+ * and then a number of metrics is computed by walking the sorted data.
+ *
+ * In general the algorithm counts distinct values of A (forming groups
+ * thanks to the sorting), supporting or contradicting the hypothesis
+ * that A => B (i.e. that values of B are predetermined by A). If there
+ * are multiple values of B for a single value of A, it's counted as
+ * contradicting.
+ *
+ * A group may be neither supporting nor contradicting. To be counted as
+ * supporting, the group has to have at least min_group_size(=3) rows.
+ * Smaller 'supporting' groups are counted as neutral.
+ *
+ * Finally, the number of rows in supporting and contradicting groups is
+ * compared, and if there is at least 10x more supporting rows, the
+ * dependency is considered valid.
+ *
+ *
+ * Real-world datasets are imperfect - there may be errors (e.g. due to
+ * data-entry mistakes), or factually correct records, yet contradicting
+ * the dependency (e.g. when a city splits into two, but both keep the
+ * same ZIP code). A strict ANALYZE implementation (where the functional
+ * dependencies are identified) would ignore dependencies on such noisy
+ * data, making the approach unusable in practice.
+ *
+ * The proposed implementation attempts to handle such noisy cases
+ * gracefully, by tolerating a small number of contradicting cases.
+ *
+ * In the future this might also perform some sort of test and decide
+ * whether it's worth building any other kind of multivariate stats,
+ * or whether the dependencies sufficiently describe the data. Or at
+ * least not build the MCV list / histogram on the implied columns.
+ * Such reduction would however make the 'verification' (see the next
+ * section) impossible.
+ *
+ *
+ * Clause reduction (planner/optimizer)
+ * ------------------------------------
+ *
+ * Applying the dependencies is quite simple - given a list of clauses,
+ * try to apply all the dependencies. For example given clause list
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d < 100)
+ *
+ * and dependencies [a=>b] and [a=>d], this may be reduced to
+ *
+ * (a = 1) AND (c = 1) AND (d < 100)
+ *
+ * The (d<100) can't be reduced as it's not an equality clause, so the
+ * dependency [a=>d] can't be applied.
+ *
+ * See clauselist_apply_dependencies() for more details.
+ *
+ * The problem with the reduction is that the query may use conditions
+ * that are not redundant, but in fact contradictory - e.g. the user
+ * may search for a ZIP code and a city name not matching the ZIP code.
+ *
+ * In such cases, the condition on the city name is not redundant
+ * but contradictory (making the result empty), and
+ * removing it while estimating the cardinality will make the estimate
+ * worse.
+ *
+ * The current estimation assuming independence (and multiplying the
+ * selectivities) works better in this case, but only by utter luck.
+ *
+ * In some cases this might be verified using the other multivariate
+ * statistics - MCV lists and histograms. For MCV lists the verification
+ * might be very simple - peek into the list if there are any items
+ * matching the clause on the 'A' column (e.g. ZIP code), and if such
+ * item is found, check that the 'B' column matches the other clause.
+ * If it does not, the clauses are contradictory. We can't really say
+ * anything if no such item is found, except maybe restricting the selectivity
+ * using the MCV data (e.g. using min/max selectivity, or something).
+ *
+ * With histograms, it might work similarly - we can't check the values
+ * directly (because histograms use buckets, unlike MCV lists, which store
+ * the actual values). So we can only observe the buckets matching the
+ * clauses - if those buckets have very low frequency, it probably means
+ * the two clauses are incompatible.
+ *
+ * It's unclear what 'low frequency' is, but if one of the clauses is
+ * implied (automatically true because of the other clause), then
+ *
+ * selectivity[clause(A)] = selectivity[clause(A) & clause(B)]
+ *
+ * So we might compute selectivity of the first clause (on the column
+ * A in dependency [A=>B]) - for example using regular statistics.
+ * And then check if the selectivity computed from the histogram is
+ * about the same (or significantly lower).
+ *
+ * The problem is that histograms work well only when the data ordering
+ * matches the natural meaning. For values that serve as labels - like
+ * city names or ZIP codes, or even generated IDs, histograms really
+ * don't work all that well. For example sorting cities by name won't
+ * match the sorting of ZIP codes, rendering the histogram unusable.
+ *
+ * The MCV lists are probably going to work much better, because they
+ * don't really assume any sort of ordering, and they're probably more
+ * appropriate for label-like data.
+ *
+ * TODO Support dependencies with multiple columns on left/right.
+ *
+ * TODO Investigate using histogram and MCV list to confirm the
+ * functional dependencies.
+ *
+ * TODO Investigate statistical testing of the distribution (to decide
+ * whether it makes sense to build the histogram/MCV list).
+ *
+ * TODO Using a min/max of selectivities would probably make more sense
+ * for the associated columns.
+ *
+ * TODO Consider eliminating the implied columns from the histogram and
+ * MCV lists (but maybe that's not a good idea, because that'd make
+ * it impossible to use these stats for non-equality clauses and
+ * also it wouldn't be possible to use the stats for verification
+ * of the dependencies as proposed in another TODO).
+ *
+ * TODO This builds a complete set of dependencies, i.e. including
+ * transitive dependencies - if we identify [A => B] and [B => C],
+ * we're likely to identify [A => C] too. It might be better to
+ * keep only the minimal set of dependencies, i.e. prune all the
+ * dependencies that we can recreate by transitivity.
+ *
+ * There are two conceptual ways to do that:
+ *
+ * (a) generate all the rules, and then prune the rules that may
+ * be recreated by combining other dependencies, or
+ *
+ * (b) perform the 'is a combination of other dependencies' check
+ * before actually doing the work
+ *
+ * The second option has the advantage that we don't really need
+ * to perform the sort/count. It's not sufficient alone, though,
+ * because we may discover the dependencies in the wrong order.
+ * For example [A => B], [A => C] and then [B => C]. None of those
+ * dependencies is a combination of the already known ones, yet
+ * [A => C] is a combination of [A => B] and [B => C].
+ *
+ * FIXME Not sure the current NULL handling makes much sense. We assume
+ * that NULL is 0, so it's handled like a regular value
+ * (NULL == NULL), so all NULLs in a single column form a single
+ * group. Maybe that's not the right thing to do, especially with
+ * equality conditions - in that case NULLs are irrelevant. So
+ * maybe the right solution would be to just ignore NULL values?
+ *
+ * However simply "ignoring" the NULL values does not seem like
+ * a good idea - imagine columns A and B, where for each value of
+ * A, values in B are constant (same for the whole group) or NULL.
+ * Let's say only 10% of the B values in each group are not NULL. Then
+ * ignoring the NULL values will result in 10x misestimate (and
+ * it's trivial to construct arbitrary errors). So maybe handling
+ * NULL values just like a regular value is the right thing here.
+ *
+ * Or maybe NULL values should be treated differently on each side
+ * of the dependency? E.g. as ignored on the left (condition) and
+ * as regular values on the right - this seems consistent with how
+ * equality clauses work, as equality clause means 'NOT NULL'.
+ * So if we say [A => B] then it may also imply "NOT NULL" on the
+ * right side.
+ */
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ /* result */
+ int ndeps = 0;
+ MVDependencies dependencies = NULL;
+ MultiSortSupport mss = multi_sort_init(2); /* 2 dimensions for now */
+
+ /* TODO Maybe this should be somehow related to the number of
+ * distinct values in the two columns we're currently analyzing.
+ * Assuming the distribution is uniform, we can estimate the
+ * average group size and use it as a threshold. Or something
+ * like that. Seems better than a static approach.
+ */
+ int min_group_size = 3;
+
+ /* dimension indexes we'll check for associations [a => b] */
+ int dima, dimb;
+
+ /*
+ * We'll reuse the same array for all the 2-column combinations.
+ *
+ * It's possible to sort the sample rows directly, but this seemed
+ * somewhat simpler / less error-prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * 2);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * 2);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * 2];
+ items[i].isnull = &isnull[i * 2];
+ }
+
+ Assert(numattrs >= 2);
+
+ /*
+ * Evaluate all possible combinations of [A => B], using a simple algorithm:
+ *
+ * (a) sort the data by [A,B]
+ * (b) split the data into groups by A (new group whenever a value changes)
+ * (c) count different values in the B column (again, value changes)
+ *
+ * TODO It should be rather simple to merge [A => B] and [A => C] into
+ * [A => B,C]. Just keep A constant, collect all the "implied" columns
+ * and you're done.
+ */
+ for (dima = 0; dima < numattrs; dima++)
+ {
+ /* prepare the sort function for the first dimension */
+ multi_sort_add_dimension(mss, 0, dima, stats);
+
+ for (dimb = 0; dimb < numattrs; dimb++)
+ {
+ SortItem current;
+
+ /* number of groups supporting / contradicting the dependency */
+ int n_supporting = 0;
+ int n_contradicting = 0;
+
+ /* counters valid within a group */
+ int group_size = 0;
+ int n_violations = 0;
+
+ int n_supporting_rows = 0;
+ int n_contradicting_rows = 0;
+
+ /* make sure the columns are different (skip the A => A case) */
+ if (dima == dimb)
+ continue;
+
+ /* prepare the sort function for the second dimension */
+ multi_sort_add_dimension(mss, 1, dimb, stats);
+
+ /* reset the values and isnull flags */
+ memset(values, 0, sizeof(Datum) * numrows * 2);
+ memset(isnull, 0, sizeof(bool) * numrows * 2);
+
+ /* accumulate all the data for both columns into an array and sort it */
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values[0]
+ = heap_getattr(rows[i], attrs->values[dima],
+ stats[dima]->tupDesc, &items[i].isnull[0]);
+
+ items[i].values[1]
+ = heap_getattr(rows[i], attrs->values[dimb],
+ stats[dimb]->tupDesc, &items[i].isnull[1]);
+ }
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Walk through the array, split it into groups according to
+ * the A value, and count distinct values in the other one.
+ * If there's a single B value for the whole group, we count
+ * it as supporting the association, otherwise we count it
+ * as contradicting.
+ *
+ * Furthermore we require a group to have at least a certain
+ * number of rows to be considered useful for supporting the
+ * dependency. But a contradicting group always counts.
+ */
+
+ /* start with values from the first row */
+ current = items[0];
+ group_size = 1;
+
+ for (i = 1; i < numrows; i++)
+ {
+ /* end of the group */
+ if (multi_sort_compare_dim(0, &items[i], ¤t, mss) != 0)
+ {
+ /*
+ * If there are no contradicting rows, count it as
+ * supporting (otherwise contradicting), but only if
+ * the group is large enough.
+ *
+ * The requirement of a minimum group size makes it
+ * impossible to identify [unique,unique] cases, but
+ * that's probably a different case. This is more
+ * about [zip => city] associations etc.
+ *
+ * If there are violations, count the group/rows as
+ * a violation.
+ *
+ * It may be neither, if the group is too small (does
+ * not contain at least min_group_size rows).
+ */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations > 0)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /* current values start a new group */
+ n_violations = 0;
+ group_size = 0;
+ }
+ /* mismatch of a B value is contradicting */
+ else if (multi_sort_compare_dim(1, &items[i], ¤t, mss) != 0)
+ {
+ n_violations += 1;
+ }
+
+ current = items[i];
+ group_size += 1;
+ }
+
+ /* handle the last group (just like above) */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /*
+ * See if the number of rows supporting the association is at least
+ * 10x the number of rows violating the hypothetical dependency.
+ *
+ * TODO This is a rather arbitrary limit - I guess it's possible to do
+ * some math to come up with a better rule (e.g. testing a hypothesis
+ * 'this is due to randomness'). We can create a contingency table
+ * from the values and use it for testing. Possibly only when
+ * there are no contradicting rows?
+ *
+ * TODO Also, if (a => b) and (b => a) at the same time, it pretty much
+ * means there's a 1:1 relation (or one is a 'label'), making the
+ * conditions rather redundant. Although it's possible that the
+ * query uses incompatible combination of values.
+ */
+ if (n_supporting_rows > (n_contradicting_rows * 10))
+ {
+ if (dependencies == NULL)
+ {
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+ dependencies->magic = MVSTAT_DEPS_MAGIC;
+ }
+ else
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData, deps)
+ + sizeof(MVDependency) * (dependencies->ndeps + 1));
+
+ /* add the new dependency [dima => dimb] */
+ dependencies->deps[ndeps] = (MVDependency)palloc0(sizeof(MVDependencyData));
+ dependencies->deps[ndeps]->a = attrs->values[dima];
+ dependencies->deps[ndeps]->b = attrs->values[dimb];
+
+ dependencies->ndeps = (++ndeps);
+ }
+ }
+ }
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+ pfree(stats);
+ pfree(mss);
+
+ return dependencies;
+}
+
+/*
+ * Store the dependencies into a bytea, so that it can be stored in the
+ * pg_mv_statistic catalog.
+ *
+ * Currently this only supports simple two-column rules, and stores them
+ * as a sequence of attnum pairs. In the future, this needs to be made
+ * more complex to support multiple columns on both sides of the
+ * implication (using AND on left, OR on right).
+ */
+bytea *
+serialize_mv_dependencies(MVDependencies dependencies)
+{
+ int i;
+
+ /* we need the varlena header, the struct header, and 2 * int16 per dependency */
+ Size len = VARHDRSZ + offsetof(MVDependenciesData, deps)
+ + dependencies->ndeps * (sizeof(int16) * 2);
+
+ bytea * output = (bytea*)palloc0(len);
+
+ char * tmp = VARDATA(output);
+
+ SET_VARSIZE(output, len);
+
+ /* first, store the header (magic number and number of dependencies) */
+ memcpy(tmp, dependencies, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ /* walk through the dependencies and copy both columns into the bytea */
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ memcpy(tmp, &(dependencies->deps[i]->a), sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(tmp, &(dependencies->deps[i]->b), sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return output;
+}
+
+/*
+ * Reads serialized dependencies into MVDependencies structure.
+ */
+MVDependencies
+deserialize_mv_dependencies(bytea * data)
+{
+ int i;
+ Size expected_size;
+ MVDependencies dependencies;
+ char *tmp;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVDependenciesData,deps))
+ elog(ERROR, "invalid MVDependencies size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVDependenciesData,deps));
+
+ /* read the MVDependencies header */
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(dependencies, tmp, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ if (dependencies->magic != MVSTAT_DEPS_MAGIC)
+ {
+ pfree(dependencies);
+ elog(WARNING, "not a MV Dependencies (magic number mismatch)");
+ return NULL;
+ }
+
+ Assert(dependencies->ndeps > 0);
+
+ /* what bytea size do we expect for those parameters */
+ expected_size = offsetof(MVDependenciesData,deps) +
+ dependencies->ndeps * sizeof(int16) * 2;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid dependencies size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* allocate space for the dependency items */
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData,deps)
+ + (dependencies->ndeps * sizeof(MVDependency)));
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ dependencies->deps[i] = (MVDependency)palloc0(sizeof(MVDependencyData));
+
+ memcpy(&(dependencies->deps[i]->a), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(&(dependencies->deps[i]->b), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return dependencies;
+}
+
+/* print some basic info about dependencies (number of dependencies) */
+Datum
+pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ result = palloc0(128);
+ snprintf(result, 128, "dependencies=%d", dependencies->ndeps);
+
+ /* FIXME free the deserialized data (pfree is not enough) */
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* print the dependencies
+ *
+ * TODO Would be nice if this knew the actual column names (instead of
+ * the attnums).
+ *
+ * FIXME This is really ugly and does not really check the lengths and
+ * strcpy/snprintf return values properly. Needs to be fixed.
+ */
+Datum
+pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
+{
+ int i = 0;
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result = NULL;
+ int len = 0;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ MVDependency dependency = dependencies->deps[i];
+ char buffer[128];
+
+ int tmp = snprintf(buffer, 128, "%s%d => %d",
+ ((i == 0) ? "" : ", "), dependency->a, dependency->b);
+
+ if (tmp < 127)
+ {
+ if (result == NULL)
+ result = palloc0(len + tmp + 1);
+ else
+ result = repalloc(result, len + tmp + 1);
+
+ strcpy(result + len, buffer);
+ len += tmp;
+ }
+ }
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
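
The grouping logic above can be approximated in plain SQL, which is
handy for eyeballing a dataset before defining the statistics (a sketch
only - the "addresses" table is hypothetical, and the actual code works
on the ANALYZE sample rows, not the whole table):

SELECT zip, COUNT(DISTINCT city) AS nvalues, COUNT(*) AS nrows
  FROM addresses GROUP BY zip;

-- groups with nvalues = 1 and nrows >= 3 (min_group_size) support
-- (zip => city), groups with nvalues > 1 contradict it; the dependency
-- is accepted when supporting rows outnumber contradicting rows 10:1
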
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 85e3aa5..590cd51 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2104,6 +2104,50 @@ describeOneTableDetails(const char *schemaname,
PQclear(result);
}
+ /* print any multivariate statistics */
+ if (pset.sversion >= 90500)
+ {
+ printfPQExpBuffer(&buf,
+ "SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
+ " deps_enabled,\n"
+ " deps_built,\n"
+ " (SELECT string_agg(attname::text,', ')\n"
+ " FROM ((SELECT unnest(stakeys) AS attnum) s\n"
+ " JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
+ "FROM pg_mv_statistic stat WHERE starelid = '%s' ORDER BY 1;",
+ oid);
+
+ result = PSQLexec(buf.data);
+ if (!result)
+ goto error_return;
+ else
+ tuples = PQntuples(result);
+
+ if (tuples > 0)
+ {
+ printTableAddFooter(&cont, _("Statistics:"));
+ for (i = 0; i < tuples; i++)
+ {
+ printfPQExpBuffer(&buf, " ");
+
+ /* statistics name (qualified with namespace) */
+ appendPQExpBuffer(&buf, "\"%s.%s\" ",
+ PQgetvalue(result, i, 1),
+ PQgetvalue(result, i, 2));
+
+ /* options */
+ if (!strcmp(PQgetvalue(result, i, 4), "t"))
+ appendPQExpBuffer(&buf, "(dependencies)");
+
+ appendPQExpBuffer(&buf, " ON (%s)",
+ PQgetvalue(result, i, 6));
+
+ printTableAddFooter(&cont, buf.data);
+ }
+ }
+ PQclear(result);
+ }
+
/* print rules */
if (tableinfo.hasrules && tableinfo.relkind != 'm')
{
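
With a statistics object named s1 on columns (a, b), the new footer
should come out roughly like this (sketched from the code above, not
actual psql output):

Statistics:
    "public.s1" (dependencies) ON (a, b)
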
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 049bf9f..12211fe 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -153,10 +153,11 @@ typedef enum ObjectClass
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
OCLASS_POLICY, /* pg_policy */
- OCLASS_TRANSFORM /* pg_transform */
+ OCLASS_TRANSFORM, /* pg_transform */
+ OCLASS_STATISTICS /* pg_mv_statistics */
} ObjectClass;
-#define LAST_OCLASS OCLASS_TRANSFORM
+#define LAST_OCLASS OCLASS_STATISTICS
/* in dependency.c */
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index b80d8d8..5ae42f7 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -119,6 +119,7 @@ extern void RemoveAttrDefault(Oid relid, AttrNumber attnum,
DropBehavior behavior, bool complain, bool internal);
extern void RemoveAttrDefaultById(Oid attrdefId);
extern void RemoveStatistics(Oid relid, AttrNumber attnum);
+extern void RemoveMVStatistics(Oid relid, AttrNumber attnum);
extern Form_pg_attribute SystemAttributeDefinition(AttrNumber attno,
bool relhasoids);
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index ab2c1a8..a768bb5 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -173,6 +173,13 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
#define LargeObjectMetadataOidIndexId 2996
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_oid_index, 3380, on pg_mv_statistic using btree(oid oid_ops));
+#define MvStatisticOidIndexId 3380
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_name_index, 3997, on pg_mv_statistic using btree(staname name_ops, stanamespace oid_ops));
+#define MvStatisticNameIndexId 3997
+DECLARE_INDEX(pg_mv_statistic_relid_index, 3379, on pg_mv_statistic using btree(starelid oid_ops));
+#define MvStatisticRelidIndexId 3379
+
DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
#define NamespaceNameIndexId 2684
DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index 2ccb3a7..44cf9c6 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -137,6 +137,8 @@ extern Oid get_collation_oid(List *collname, bool missing_ok);
extern Oid get_conversion_oid(List *conname, bool missing_ok);
extern Oid FindDefaultConversionProc(int32 for_encoding, int32 to_encoding);
+extern Oid get_statistics_oid(List *names, bool missing_ok);
+
/* initialization & transaction cleanup code */
extern void InitializeSearchPath(void);
extern void AtEOXact_Namespace(bool isCommit, bool parallel);
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
new file mode 100644
index 0000000..a568a07
--- /dev/null
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -0,0 +1,73 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_mv_statistic.h
+ * definition of the system "multivariate statistic" relation (pg_mv_statistic)
+ * along with the relation's initial contents.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_mv_statistic.h
+ *
+ * NOTES
+ * the genbki.pl script reads this file and generates .bki
+ * information from the DATA() statements.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_MV_STATISTIC_H
+#define PG_MV_STATISTIC_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_mv_statistic definition. cpp turns this into
+ * typedef struct FormData_pg_mv_statistic
+ * ----------------
+ */
+#define MvStatisticRelationId 3381
+
+CATALOG(pg_mv_statistic,3381)
+{
+ /* These fields form the unique key for the entry: */
+ Oid starelid; /* relation containing attributes */
+ NameData staname; /* statistics name */
+ Oid stanamespace; /* OID of namespace containing this statistics */
+
+ /* statistics requested to build */
+ bool deps_enabled; /* analyze dependencies? */
+
+ /* statistics that are available (if requested) */
+ bool deps_built; /* dependencies were built */
+
+ /* variable-length fields start here, but we allow direct access to stakeys */
+ int2vector stakeys; /* array of column keys */
+
+#ifdef CATALOG_VARLEN
+ bytea stadeps; /* dependencies (serialized) */
+#endif
+
+} FormData_pg_mv_statistic;
+
+/* ----------------
+ * Form_pg_mv_statistic corresponds to a pointer to a tuple with
+ * the format of pg_mv_statistic relation.
+ * ----------------
+ */
+typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
+
+/* ----------------
+ * compiler constants for pg_mv_statistic
+ * ----------------
+ */
+#define Natts_pg_mv_statistic 7
+#define Anum_pg_mv_statistic_starelid 1
+#define Anum_pg_mv_statistic_staname 2
+#define Anum_pg_mv_statistic_stanamespace 3
+#define Anum_pg_mv_statistic_deps_enabled 4
+#define Anum_pg_mv_statistic_deps_built 5
+#define Anum_pg_mv_statistic_stakeys 6
+#define Anum_pg_mv_statistic_stadeps 7
+
+#endif /* PG_MV_STATISTIC_H */
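
For a quick look at what the catalog rows contain (a sketch):

SELECT starelid::regclass, staname, stanamespace::regnamespace,
       deps_enabled, deps_built, stakeys
  FROM pg_mv_statistic;
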
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index f58672e..76e054d 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2741,6 +2741,11 @@ DESCR("current user privilege on any column by rel name");
DATA(insert OID = 3029 ( has_any_column_privilege PGNSP PGUID 12 10 0 0 0 f f f f t f s s 2 0 16 "26 25" _null_ _null_ _null_ _null_ _null_ has_any_column_privilege_id _null_ _null_ _null_ ));
DESCR("current user privilege on any column by rel oid");
+DATA(insert OID = 3998 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies info");
+DATA(insert OID = 3999 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies show");
+
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
DATA(insert OID = 1929 ( pg_stat_get_tuples_returned PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_tuples_returned _null_ _null_ _null_ ));
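
The two functions registered here make it possible to inspect the
serialized dependencies from SQL (a sketch):

SELECT staname,
       pg_mv_stats_dependencies_info(stadeps),
       pg_mv_stats_dependencies_show(stadeps)
  FROM pg_mv_statistic WHERE deps_built;
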
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index b7a38ce..a52096b 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -49,6 +49,7 @@ extern void BootstrapToastTable(char *relName,
DECLARE_TOAST(pg_attrdef, 2830, 2831);
DECLARE_TOAST(pg_constraint, 2832, 2833);
DECLARE_TOAST(pg_description, 2834, 2835);
+DECLARE_TOAST(pg_mv_statistic, 3577, 3578);
DECLARE_TOAST(pg_proc, 2836, 2837);
DECLARE_TOAST(pg_rewrite, 2838, 2839);
DECLARE_TOAST(pg_seclabel, 3598, 3599);
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 54f67e9..99a6a62 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -75,6 +75,10 @@ extern ObjectAddress DefineOperator(List *names, List *parameters);
extern void RemoveOperatorById(Oid operOid);
extern ObjectAddress AlterOperator(AlterOperatorStmt *stmt);
+/* commands/statscmds.c */
+extern ObjectAddress CreateStatistics(CreateStatsStmt *stmt);
+extern void RemoveStatisticsById(Oid statsOid);
+
/* commands/aggregatecmds.c */
extern ObjectAddress DefineAggregate(List *name, List *args, bool oldstyle,
List *parameters, const char *queryString);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 2b73483..0329472 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -251,6 +251,7 @@ typedef enum NodeTag
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
+ T_MVStatisticInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
@@ -381,6 +382,7 @@ typedef enum NodeTag
T_CreatePolicyStmt,
T_AlterPolicyStmt,
T_CreateTransformStmt,
+ T_CreateStatsStmt,
/*
* TAGS FOR PARSE TREE NODES (parsenodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2fd0629..e1807fb 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -601,6 +601,17 @@ typedef struct ColumnDef
int location; /* parse location, or -1 if none/unknown */
} ColumnDef;
+typedef struct CreateStatsStmt
+{
+ NodeTag type;
+ List *defnames; /* qualified name (list of Value strings) */
+ RangeVar *relation; /* relation to build statistics on */
+ List *keys; /* String nodes naming referenced column(s) */
+ List *options; /* list of DefElem nodes */
+ bool if_not_exists; /* just do nothing if statistics already exists? */
+} CreateStatsStmt;
+
+
/*
* TableLikeClause - CREATE TABLE ( ... LIKE ... ) clause
*/
@@ -1410,6 +1421,7 @@ typedef enum ObjectType
OBJECT_RULE,
OBJECT_SCHEMA,
OBJECT_SEQUENCE,
+ OBJECT_STATISTICS,
OBJECT_TABCONSTRAINT,
OBJECT_TABLE,
OBJECT_TABLESPACE,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 61519bb..7ae0f9e 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -479,6 +479,7 @@ typedef struct RelOptInfo
List *lateral_vars; /* LATERAL Vars and PHVs referenced by rel */
Relids lateral_referencers; /* rels that reference me laterally */
List *indexlist; /* list of IndexOptInfo */
+ List *mvstatlist; /* list of MVStatisticInfo */
BlockNumber pages; /* size estimates derived from pg_class */
double tuples;
double allvisfrac;
@@ -573,6 +574,33 @@ typedef struct IndexOptInfo
bool amhasgetbitmap; /* does AM have amgetbitmap interface? */
} IndexOptInfo;
+/*
+ * MVStatisticInfo
+ * Information about multivariate stats for planning/optimization
+ *
+ * This contains information about which columns are covered by the
+ * statistics (stakeys), which options were requested while adding the
+ * statistics (*_enabled), and which kinds of statistics were actually
+ * built and are available for the optimizer (*_built).
+ */
+typedef struct MVStatisticInfo
+{
+ NodeTag type;
+
+ Oid mvoid; /* OID of the statistics row */
+ RelOptInfo *rel; /* back-link to index's table */
+
+ /* enabled statistics */
+ bool deps_enabled; /* functional dependencies enabled */
+
+ /* built/available statistics */
+ bool deps_built; /* functional dependencies built */
+
+ /* columns in the statistics (attnums) */
+ int2vector *stakeys; /* attnums of the columns covered */
+
+} MVStatisticInfo;
+
/*
* EquivalenceClasses
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
new file mode 100644
index 0000000..7ebd961
--- /dev/null
+++ b/src/include/utils/mvstats.h
@@ -0,0 +1,70 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvstats.h
+ * Multivariate statistics and selectivity estimation functions.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/mvstats.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MVSTATS_H
+#define MVSTATS_H
+
+#include "fmgr.h"
+#include "commands/vacuum.h"
+
+
+#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
+
+/* An associative rule, tracking [a => b] dependency.
+ *
+ * TODO Make this work with multiple columns on both sides.
+ */
+typedef struct MVDependencyData {
+ int16 a;
+ int16 b;
+} MVDependencyData;
+
+typedef MVDependencyData* MVDependency;
+
+typedef struct MVDependenciesData {
+ uint32 magic; /* magic constant marker */
+ int32 ndeps; /* number of dependencies */
+ MVDependency deps[1]; /* XXX why not a pointer? */
+} MVDependenciesData;
+
+typedef MVDependenciesData* MVDependencies;
+
+#define MVSTAT_DEPS_MAGIC 0xB4549A2C /* marks serialized bytea */
+#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
+
+/*
+ * TODO Maybe fetching the histogram/MCV list separately is inefficient?
+ * Consider adding a single `fetch_stats` method, fetching all
+ * stats specified using flags (or something like that).
+ */
+
+bytea * serialize_mv_dependencies(MVDependencies dependencies);
+
+/* (de)serialization of the dependencies (stored as bytea in the catalog) */
+MVDependencies deserialize_mv_dependencies(bytea * data);
+
+/* FIXME this probably belongs somewhere else (not to operations stats) */
+extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats);
+
+void update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs);
+
+#endif
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index ff5672d..26c7f85 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -79,6 +79,7 @@ typedef struct RelationData
bool rd_isvalid; /* relcache entry is valid */
char rd_indexvalid; /* state of rd_indexlist: 0 = not valid, 1 =
* valid, 2 = temporarily forced */
+ bool rd_mvstatvalid; /* state of rd_mvstatlist: true/false */
/*
* rd_createSubid is the ID of the highest subtransaction the rel has
@@ -111,6 +112,9 @@ typedef struct RelationData
List *rd_indexlist; /* list of OIDs of indexes on relation */
Oid rd_oidindex; /* OID of unique index on OID, if any */
Oid rd_replidindex; /* OID of replica identity index, if any */
+
+ /* data managed by RelationGetMVStatList: */
+ List *rd_mvstatlist; /* list of OIDs of multivariate stats */
/* data managed by RelationGetIndexAttrBitmap: */
Bitmapset *rd_indexattr; /* identifies columns used in indexes */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 1b48304..9f03c8d 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -38,6 +38,7 @@ extern void RelationClose(Relation relation);
* Routines to compute/retrieve additional cached information
*/
extern List *RelationGetIndexList(Relation relation);
+extern List *RelationGetMVStatList(Relation relation);
extern Oid RelationGetOidIndex(Relation relation);
extern Oid RelationGetReplicaIndex(Relation relation);
extern List *RelationGetIndexExpressions(Relation relation);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 256615b..0e0658d 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -66,6 +66,8 @@ enum SysCacheIdentifier
INDEXRELID,
LANGNAME,
LANGOID,
+ MVSTATNAMENSP,
+ MVSTATOID,
NAMESPACENAME,
NAMESPACEOID,
OPERNAMENSP,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 28b061f..2e2df8e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1365,6 +1365,14 @@ pg_matviews| SELECT n.nspname AS schemaname,
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
WHERE (c.relkind = 'm'::"char");
+pg_mv_stats| SELECT n.nspname AS schemaname,
+ c.relname AS tablename,
+ s.stakeys AS attnums,
+ length(s.stadeps) AS depsbytes,
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ FROM ((pg_mv_statistic s
+ JOIN pg_class c ON ((c.oid = s.starelid)))
+ LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
pg_policies| SELECT n.nspname AS schemaname,
c.relname AS tablename,
pol.polname AS policyname,
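
The pg_mv_stats view added above then gives a more readable summary of
the same data (a sketch):

SELECT tablename, attnums, depsinfo FROM pg_mv_stats;
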
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index eb0bc88..92a0d8a 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -113,6 +113,7 @@ pg_inherits|t
pg_language|t
pg_largeobject|t
pg_largeobject_metadata|t
+pg_mv_statistic|t
pg_namespace|t
pg_opclass|t
pg_operator|t
--
2.1.0
Attachment: 0003-clause-reduction-using-functional-dependencies.patch (application/x-patch)
From ebae50c43eb5c6fdd24efd726489dc40672ac184 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 19:42:18 +0200
Subject: [PATCH 3/7] clause reduction using functional dependencies
During planning, use functional dependencies to decide which
clauses to skip during cardinality estimation. Initial and
rather simplistic implementation.
This only works with regular WHERE clauses, not with join
clauses.
Note: clause_is_mv_compatible() needs to identify the
relation (so that we can fetch the list of multivariate stats
by OID). planner_rt_fetch() seems like the appropriate way to
get the relation OID, but apparently it only works with simple
vars. Maybe examine_variable() would make this work with more
complex vars too?
Includes regression tests analyzing functional dependencies
(part of ANALYZE) on several datasets (no dependencies, no
transitive dependencies, ...).
Checks that a query with conditions on two columns, where one (B)
is functionally dependent on the other one (A), correctly ignores
the clause on (B) and chooses bitmap index scan instead of plain
index scan (which is what happens otherwise, thanks to the
assumption of independence).
Note: Functional dependencies only work with equality clauses,
no inequalities etc.
---
src/backend/optimizer/path/clausesel.c | 912 +++++++++++++++++++++++++-
src/backend/utils/mvstats/common.c | 5 +-
src/backend/utils/mvstats/dependencies.c | 24 +
src/include/utils/mvstats.h | 16 +-
src/test/regress/expected/mv_dependencies.out | 172 +++++
src/test/regress/parallel_schedule | 3 +
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_dependencies.sql | 150 +++++
8 files changed, 1278 insertions(+), 5 deletions(-)
create mode 100644 src/test/regress/expected/mv_dependencies.out
create mode 100644 src/test/regress/sql/mv_dependencies.sql
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 02660c2..e834722 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -14,14 +14,19 @@
*/
#include "postgres.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/plancat.h"
+#include "optimizer/var.h"
#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
+#include "utils/mvstats.h"
#include "utils/selfuncs.h"
+#include "utils/typcache.h"
/*
@@ -41,6 +46,44 @@ typedef struct RangeQueryClause
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
+#define MV_CLAUSE_TYPE_FDEP 0x01
+
+static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
+ Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo);
+
+static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
+ Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo);
+
+static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Oid varRelid, List *stats,
+ SpecialJoinInfo *sjinfo);
+
+static bool has_stats(List *stats, int type);
+
+static List * find_stats(PlannerInfo *root, List *clauses,
+ Oid varRelid, Index *relid);
+
+static Bitmapset* fdeps_collect_attnums(List *stats);
+
+static int *make_idx_to_attnum_mapping(Bitmapset *attnums);
+static int *make_attnum_to_idx_mapping(Bitmapset *attnums);
+
+static bool *build_adjacency_matrix(List *stats, Bitmapset *attnums,
+ int *idx_to_attnum, int *attnum_to_idx);
+
+static void multiply_adjacency_matrix(bool *matrix, int natts);
+
+static List* fdeps_reduce_clauses(List *clauses,
+ Bitmapset *attnums, bool *matrix,
+ int *idx_to_attnum, int *attnum_to_idx,
+ Index relid);
+
+static Bitmapset *fdeps_filter_clauses(PlannerInfo *root,
+ List *clauses, Bitmapset *deps_attnums,
+ List **reduced_clauses, List **deps_clauses,
+ Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo);
+
+static Bitmapset * get_varattnos(Node * node, Index relid);
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
@@ -60,7 +103,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
* subclauses. However, that's only right if the subclauses have independent
* probabilities, and in reality they are often NOT independent. So,
* we want to be smarter where we can.
-
+ *
* Currently, the only extra smarts we have is to recognize "range queries",
* such as "x > 34 AND x < 42". Clauses are recognized as possible range
* query components if they are restriction opclauses whose operators have
@@ -87,6 +130,88 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
*
* Of course this is all very dependent on the behavior of
* scalarltsel/scalargtsel; perhaps some day we can generalize the approach.
+ *
+ *
+ * Multivariate statistics
+ * -----------------------
+ * This also uses multivariate stats to estimate combinations of
+ * conditions, in a way (a) maximizing the estimate accuracy by using
+ * as many stats as possible, and (b) minimizing the overhead,
+ * especially when there are no suitable multivariate stats (so if you
+ * are not using multivariate stats, there's no additional overhead).
+ *
+ * The following checks are performed (in this order), and the optimizer
+ * falls back to regular stats on the first 'false'.
+ *
+ * NOTE: This explains how this works with all the patches applied, not
+ * just the functional dependencies.
+ *
+ * (0) check if there are multivariate stats on the relation
+ *
+ * If not, just skip all the following steps (directly to the
+ * original code).
+ *
+ * (1) check how many attributes there are in conditions compatible
+ * with functional dependencies
+ *
+ * Only simple equality clauses are considered compatible with
+ * functional dependencies (and that's unlikely to change, because
+ * that's the only case when functional dependencies are useful).
+ *
+ * If there are no conditions that might be handled by multivariate
+ * stats, or if the conditions reference just a single column, it
+ * makes no sense to use functional dependencies, so skip to (4).
+ *
+ * (2) reduce the clauses using functional dependencies
+ *
+ * This simply attempts to 'reduce' the clauses by applying functional
+ * dependencies. For example if there are two clauses:
+ *
+ * WHERE (a = 1) AND (b = 2)
+ *
+ * and we know that 'a' determines the value of 'b', we may remove
+ * the second condition (b = 2) when computing the selectivity.
+ * This is of course tricky - see mvstats/dependencies.c for details.
+ *
+ * After the reduction, step (1) is to be repeated.
+ *
+ * (3) check which conditions are compatible
+ * with MCV lists and histograms
+ *
+ * What conditions are compatible with multivariate stats is decided
+ * by clause_is_mv_compatible(). At this moment, only conditions
+ * of the form "column operator constant" (for simple comparison
+ * operators), IS [NOT] NULL and some AND/OR clauses are considered
+ * compatible with multivariate statistics.
+ *
+ * Again, see clause_is_mv_compatible() for details.
+ *
+ * (4) check how many attributes there are in conditions compatible
+ * with MCV lists and histograms
+ *
+ * If there are no conditions that might be handled by MCV lists
+ * or histograms, or if the conditions reference just a single
+ * column, it makes no sense to continue, so just skip to (7).
+ *
+ * (5) choose the stats matching the most columns
+ *
+ * If there are multiple instances of multivariate statistics (e.g.
+ * built on different sets of columns), we choose the stats covering
+ * the most columns from step (1). It may happen that all available
+ * stats match just a single column - for example with conditions
+ *
+ * WHERE a = 1 AND b = 2
+ *
+ * and statistics built on (a,c) and (b,c). In such case just fall
+ * back to the regular stats because it makes no sense to use the
+ * multivariate statistics.
+ *
+ * For more details about how exactly we choose the stats, see
+ * choose_mv_statistics().
+ *
+ * (6) use the multivariate stats to estimate matching clauses
+ *
+ * (7) estimate the remaining clauses using the regular statistics
*/
Selectivity
clauselist_selectivity(PlannerInfo *root,
@@ -99,6 +224,16 @@ clauselist_selectivity(PlannerInfo *root,
RangeQueryClause *rqlist = NULL;
ListCell *l;
+ /* processing mv stats */
+ Oid relid = InvalidOid;
+
+ /* attributes in mv-compatible clauses */
+ Bitmapset *mvattnums = NULL;
+ List *stats = NIL;
+
+ /* use clauses (not conditions), because those are always non-empty */
+ stats = find_stats(root, clauses, varRelid, &relid);
+
/*
* If there's exactly one clause, then no use in trying to match up pairs,
* so just go directly to clause_selectivity().
@@ -108,6 +243,31 @@ clauselist_selectivity(PlannerInfo *root,
varRelid, jointype, sjinfo);
/*
+ * Check that there are some stats with functional dependencies
+ * built (by walking the stats list). We're going to find that
+ * anyway when trying to apply the functional dependencies, but
+ * this is probably a tad faster.
+ */
+ if (has_stats(stats, MV_CLAUSE_TYPE_FDEP))
+ {
+ /* collect attributes referenced by mv-compatible clauses */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo);
+
+ /*
+ * If there are mv-compatible clauses, referencing at least two
+ * different columns (otherwise it makes no sense to use mv stats),
+ * try to reduce the clauses using functional dependencies, and
+ * recollect the attributes from the reduced list.
+ *
+ * We don't need to select a single statistics for this - we can
+ * apply all the functional dependencies we have.
+ */
+ if (bms_num_members(mvattnums) >= 2)
+ clauses = clauselist_apply_dependencies(root, clauses, varRelid,
+ stats, sjinfo);
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -763,3 +923,753 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ */
+static Bitmapset *
+collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
+ Index *relid, SpecialJoinInfo *sjinfo)
+{
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(l);
+
+ /* ignore the result for now - we only need the info */
+ if (clause_is_mv_compatible(root, clause, varRelid, relid, &attnum, sjinfo))
+ attnums = bms_add_member(attnums, attnum);
+ }
+
+ /*
+ * If there are not at least two attributes referenced by the clause(s),
+ * we can throw everything out (as we'll revert to simple stats).
+ */
+ if (bms_num_members(attnums) <= 1)
+ {
+ if (attnums != NULL)
+ pfree(attnums);
+ attnums = NULL;
+ *relid = InvalidOid;
+ }
+
+ return attnums;
+}
+
+/*
+ * Determines whether the clause is compatible with multivariate stats,
+ * and if it is, returns some additional information - varno (index
+ * into simple_rte_array) and a bitmap of attributes. This is then
+ * used to fetch related multivariate statistics.
+ *
+ * At this moment we only support basic conditions of the form
+ *
+ * variable OP constant
+ *
+ * where OP is (for now) the equality operator, which is determined
+ * by looking at the associated function for estimating selectivity,
+ * just like in the single-dimensional case.
+ *
+ * TODO Support 'OR clauses' - shouldn't be all that difficult to
+ * evaluate them using multivariate stats.
+ */
+static bool
+clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
+ Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo)
+{
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ /* Pseudoconstants are not really interesting here. */
+ if (rinfo->pseudoconstant)
+ return false;
+
+ /* no support for OR clauses at this point */
+ if (rinfo->orclause)
+ return false;
+
+ /* get the actual clause from the RestrictInfo (it's not an OR clause) */
+ clause = (Node*)rinfo->clause;
+
+ /* only simple opclauses are compatible with multivariate stats */
+ if (! is_opclause(clause))
+ return false;
+
+ /* we don't support join conditions at this moment */
+ if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo))
+ return false;
+
+ /* is it 'variable op constant' ? */
+ if (list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+
+ ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
+ (is_pseudo_constant_clause_relids(lsecond(expr->args),
+ rinfo->right_relids) ||
+ (varonleft = false,
+ is_pseudo_constant_clause_relids(linitial(expr->args),
+ rinfo->left_relids)));
+
+ if (ok)
+ {
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+ * (return NULL).
+ *
+ * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
+
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not be really necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
+
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
+
+ *relid = var->varno;
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ * This uses the function for estimating selectivity, not the
+ * operator directly (a bit awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_EQSEL:
+ *attnum = var->varattno;
+ return true;
+ }
+ }
+ }
+ }
+
+ return false;
+
+}
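+
+/*
+ * Illustration (not part of the patch) of what the check above
+ * accepts, given columns 'a' and 'b':
+ *
+ *   WHERE a = 1   -> compatible (Var = Const, estimated by F_EQSEL)
+ *   WHERE 1 = a   -> compatible (Const = Var is handled via varonleft)
+ *   WHERE a < 1   -> not compatible yet (only F_EQSEL is accepted)
+ *   WHERE a = b   -> not compatible (neither side is a pseudo-constant)
+ */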
+
+/*
+ * Performs reduction of clauses using functional dependencies, i.e.
+ * removes clauses that are considered redundant. It simply walks
+ * through dependencies, and checks whether the dependency 'matches'
+ * the clauses, i.e. if there's a clause matching the condition. If yes,
+ * all clauses matching the implied part of the dependency are removed
+ * from the list.
+ *
+ * This simply looks at attnums referenced by the clauses, not at the
+ * type of the operator (equality, inequality, ...). This may not be the
+ * right way to do it - it certainly works best for equalities, which is
+ * naturally consistent with functional dependencies (implications).
+ * It's not clear that other operators are handled sensibly - for
+ * example for inequalities, like
+ *
+ * WHERE (A >= 10) AND (B <= 20)
+ *
+ * and a trivial case where [A == B], resulting in a symmetric pair of
+ * rules [A => B], [B => A], it's rather clear we can't remove either of
+ * those clauses.
+ *
+ * That only highlights that functional dependencies are most suitable
+ * for label-like data, where using non-equality operators is very rare.
+ * Using the common city/zipcode example, clauses like
+ *
+ * (zipcode <= 12345)
+ *
+ * or
+ *
+ * (cityname >= 'Washington')
+ *
+ * are rare. So restricting the reduction to equality should not harm
+ * the usefulness / applicability.
+ *
+ * Another assumption is that the clauses are 'compatible'. For
+ * example with a mismatching zip code and city name, this is unable
+ * to identify the discrepancy and still eliminates one of the clauses.
+ * The usual approach (multiplying both selectivities) thus produces a
+ * more accurate estimate, although mostly by luck - the multiplication
+ * comes from the assumption of statistical independence of the two
+ * conditions (which is not valid in this case), but moves the
+ * estimate in the right direction (towards 0%).
+ *
+ * This might be somewhat improved by cross-checking the selectivities
+ * against MCV and/or histogram.
+ *
+ * The implementation needs to be careful about cyclic rules, i.e. rules
+ * like [A => B] and [B => A] at the same time. This must not reduce
+ * clauses on both attributes at the same time.
+ *
+ * Technically we might consider selectivities here too, somehow. E.g.
+ * when (A => B) and (B => A), we might use the clauses with minimum
+ * selectivity.
+ *
+ * TODO Consider restricting the reduction to equality clauses. Or maybe
+ * use equality classes somehow?
+ *
+ * TODO Merge these docs into dependencies.c, as they say mostly the
+ * same things as the comments there.
+ *
+ * TODO Currently this is applied only to the top-level clauses, but
+ * maybe we could apply it to lists at subtrees too, e.g. to the
+ * two AND-clauses in
+ *
+ * (x=1 AND y=2) OR (z=3 AND q=10)
+ *
+ */
+static List *
+clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Oid varRelid, List *stats,
+ SpecialJoinInfo *sjinfo)
+{
+ List *reduced_clauses = NIL;
+ Index relid;
+
+ /*
+ * matrix of (natts x natts), 1 means x=>y
+ *
+ * This serves two purposes - first, it merges dependencies from all
+ * the statistics, second it makes generating all the transitive
+ * dependencies easier.
+ *
+ * We need to build this only for attributes from the dependencies,
+ * not for all attributes in the table.
+ *
+ * We can't do that only for attributes from the clauses, because we
+ * want to build transitive dependencies (including those going
+ * through attributes not listed in the stats).
+ *
+ * This only works for A=>B dependencies, not sure how to do that
+ * for complex dependencies.
+ */
+ bool *deps_matrix;
+ int deps_natts; /* size of the matrix */
+
+ /* mapping attnum <=> matrix index */
+ int *deps_idx_to_attnum;
+ int *deps_attnum_to_idx;
+
+ /* attnums in dependencies and clauses (and intersection) */
+ List *deps_clauses = NIL;
+ Bitmapset *deps_attnums = NULL;
+ Bitmapset *clause_attnums = NULL;
+ Bitmapset *intersect_attnums = NULL;
+
+ /*
+ * Is there at least one statistics with functional dependencies?
+ * If not, return the original clauses right away.
+ *
+ * XXX Isn't this pointless, thanks to exactly the same check in
+ * clauselist_selectivity()? Can we trigger the condition here?
+ */
+ if (! has_stats(stats, MV_CLAUSE_TYPE_FDEP))
+ return clauses;
+
+ /*
+ * Build the dependency matrix, i.e. attribute adjacency matrix,
+ * where 1 means (a=>b). Once we have the adjacency matrix, we'll
+ * multiply it by itself, to get transitive dependencies.
+ *
+ * Note: This is pretty much transitive closure from graph theory.
+ *
+ * First, let's see what attributes are covered by functional
+ * dependencies (sides of the adjacency matrix), and also a maximum
+ * attribute (size of mapping to simple integer indexes);
+ */
+ deps_attnums = fdeps_collect_attnums(stats);
+
+ /*
+ * Walk through the clauses - clauses that are (one of)
+ *
+ * (a) not mv-compatible
+ * (b) are using more than a single attnum
+ * (c) using attnum not covered by functional dependencies
+ *
+ * may be copied directly to the result. The interesting clauses are
+ * kept in 'deps_clauses' and will be processed later.
+ */
+ clause_attnums = fdeps_filter_clauses(root, clauses, deps_attnums,
+ &reduced_clauses, &deps_clauses,
+ varRelid, &relid, sjinfo);
+
+ /*
+ * we need at least two clauses referencing two different attributes
+ * to do the reduction
+ */
+ if ((list_length(deps_clauses) < 2) || (bms_num_members(clause_attnums) < 2))
+ {
+ bms_free(clause_attnums);
+ list_free(reduced_clauses);
+ list_free(deps_clauses);
+
+ return clauses;
+ }
+
+
+ /*
+ * We need at least two matching attributes in the clauses and
+ * dependencies, otherwise we can't really reduce anything.
+ */
+ intersect_attnums = bms_intersect(clause_attnums, deps_attnums);
+ if (bms_num_members(intersect_attnums) < 2)
+ {
+ bms_free(clause_attnums);
+ bms_free(deps_attnums);
+ bms_free(intersect_attnums);
+
+ list_free(deps_clauses);
+ list_free(reduced_clauses);
+
+ return clauses;
+ }
+
+ /*
+ * Build mapping between matrix indexes and attnums, and then the
+ * adjacency matrix itself.
+ */
+ deps_idx_to_attnum = make_idx_to_attnum_mapping(deps_attnums);
+ deps_attnum_to_idx = make_attnum_to_idx_mapping(deps_attnums);
+
+ /* build the adjacency matrix */
+ deps_matrix = build_adjacency_matrix(stats, deps_attnums,
+ deps_idx_to_attnum,
+ deps_attnum_to_idx);
+
+ deps_natts = bms_num_members(deps_attnums);
+
+ /*
+ * Multiply the matrix N-times (N = size of the matrix), so that we
+ * get all the transitive dependencies. That makes the next step
+ * much easier and faster.
+ *
+ * This is essentially an adjacency matrix from graph theory, and
+ * by multiplying it we get transitive edges. We don't really care
+ * about the exact number (number of paths between vertices) though,
+ * so we can do the multiplication in-place (we don't care whether
+ * we found the dependency in this round or in the previous one).
+ *
+ * Track how many new dependencies were added, and stop when 0, but
+ * we can't multiply more than N-times (longest path in the graph).
+ */
+ multiply_adjacency_matrix(deps_matrix, deps_natts);
+
+ /*
+ * Walk through the clauses, and see which other clauses we may
+ * reduce. The matrix contains all transitive dependencies, which
+ * makes this very fast.
+ *
+ * We have to be careful not to reduce the clause using itself, or
+ * reducing all clauses forming a cycle (so we have to skip already
+ * eliminated clauses).
+ *
+ * I'm not sure whether this guarantees finding the best solution,
+ * i.e. reducing the most clauses, but it probably does (thanks to
+ * having all the transitive dependencies).
+ */
+ deps_clauses = fdeps_reduce_clauses(deps_clauses,
+ deps_attnums, deps_matrix,
+ deps_idx_to_attnum,
+ deps_attnum_to_idx, relid);
+
+ /* join the two lists of clauses */
+ reduced_clauses = list_union(reduced_clauses, deps_clauses);
+
+ pfree(deps_matrix);
+ pfree(deps_idx_to_attnum);
+ pfree(deps_attnum_to_idx);
+
+ bms_free(deps_attnums);
+ bms_free(clause_attnums);
+ bms_free(intersect_attnums);
+
+ return reduced_clauses;
+}
+
+static bool
+has_stats(List *stats, int type)
+{
+ ListCell *s;
+
+ foreach (s, stats)
+ {
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Determine the relid (either from varRelid or from the clauses) and
+ * then look up the stats using that relid.
+ */
+static List *
+find_stats(PlannerInfo *root, List *clauses, Oid varRelid, Index *relid)
+{
+ /* unknown relid by default */
+ *relid = InvalidOid;
+
+ /*
+ * First we need to find the relid (index into simple_rel_array).
+ * If varRelid is not 0, we already have it, otherwise we have to
+ * look it up from the clauses.
+ */
+ if (varRelid != 0)
+ *relid = varRelid;
+ else
+ {
+ Relids relids = pull_varnos((Node*)clauses);
+
+ /*
+ * We only expect 0 or 1 members in the bitmapset. If there are
+ * no vars, we'll get an empty bitmapset, otherwise we'll get the
+ * relid as the single member.
+ *
+ * FIXME For some reason we can get 2 relids here (e.g. \d in
+ * psql does that).
+ */
+ if (bms_num_members(relids) == 1)
+ *relid = bms_singleton_member(relids);
+
+ bms_free(relids);
+ }
+
+ /*
+ * if we found the relid, we can get the stats from simple_rel_array
+ *
+ * This only gets stats that are already built, because that's how
+ * we load it into RelOptInfo (see get_relation_info), but we don't
+ * detoast the whole stats yet. That'll be done later, after we
+ * decide which stats to use.
+ */
+ if (*relid != InvalidOid)
+ return root->simple_rel_array[*relid]->mvstatlist;
+
+ return NIL;
+}
+
+static Bitmapset*
+fdeps_collect_attnums(List *stats)
+{
+ ListCell *lc;
+ Bitmapset *attnums = NULL;
+
+ foreach (lc, stats)
+ {
+ int j;
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ int2vector *stakeys = info->stakeys;
+
+ /* skip stats without functional dependencies built */
+ if (! info->deps_built)
+ continue;
+
+ for (j = 0; j < stakeys->dim1; j++)
+ attnums = bms_add_member(attnums, stakeys->values[j]);
+ }
+
+ return attnums;
+}
+
+
+static int*
+make_idx_to_attnum_mapping(Bitmapset *attnums)
+{
+ int attidx = 0;
+ int attnum = -1;
+
+ int *mapping = (int*)palloc0(bms_num_members(attnums) * sizeof(int));
+
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ mapping[attidx++] = attnum;
+
+ Assert(attidx == bms_num_members(attnums));
+
+ return mapping;
+}
+
+static int*
+make_attnum_to_idx_mapping(Bitmapset *attnums)
+{
+ int attidx = 0;
+ int attnum = -1;
+ int maxattnum = -1;
+ int *mapping;
+
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ maxattnum = attnum;
+
+ mapping = (int*)palloc0((maxattnum+1) * sizeof(int));
+
+ attnum = -1;
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ mapping[attnum] = attidx++;
+
+ Assert(attidx == bms_num_members(attnums));
+
+ return mapping;
+}
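+
+/*
+ * Illustration (not part of the patch): for attnums {2, 5, 7} the two
+ * mappings produced above look like this:
+ *
+ *   idx_to_attnum = [2, 5, 7]                 (dense, natts entries)
+ *   attnum_to_idx = [-, -, 0, -, -, 1, -, 2]  (sparse, indexed by attnum)
+ *
+ * so attnum_to_idx[idx_to_attnum[i]] == i for every index i.
+ */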
+
+static bool*
+build_adjacency_matrix(List *stats, Bitmapset *attnums,
+ int *idx_to_attnum, int *attnum_to_idx)
+{
+ ListCell *lc;
+ int natts = bms_num_members(attnums);
+ bool *matrix = (bool*)palloc0(natts * natts * sizeof(bool));
+
+ foreach (lc, stats)
+ {
+ int j;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
+ MVDependencies dependencies = NULL;
+
+ /* skip stats without functional dependencies built */
+ if (! stat->deps_built)
+ continue;
+
+ /* fetch and deserialize dependencies */
+ dependencies = load_mv_dependencies(stat->mvoid);
+ if (dependencies == NULL)
+ {
+ elog(WARNING, "failed to deserialize func deps %d", stat->mvoid);
+ continue;
+ }
+
+ /* set matrix[a,b] to 'true' if 'a=>b' */
+ for (j = 0; j < dependencies->ndeps; j++)
+ {
+ int aidx = attnum_to_idx[dependencies->deps[j]->a];
+ int bidx = attnum_to_idx[dependencies->deps[j]->b];
+
+ /* a=> b */
+ matrix[aidx * natts + bidx] = true;
+ }
+ }
+
+ return matrix;
+}
+
+static void
+multiply_adjacency_matrix(bool *matrix, int natts)
+{
+ int i;
+
+ for (i = 0; i < natts; i++)
+ {
+ int k, l, m;
+ int nchanges = 0;
+
+ /* k => l */
+ for (k = 0; k < natts; k++)
+ {
+ for (l = 0; l < natts; l++)
+ {
+ /* we already have this dependency */
+ if (matrix[k * natts + l])
+ continue;
+
+ /* we don't really care about the exact value, just 0/1 */
+ for (m = 0; m < natts; m++)
+ {
+ if (matrix[k * natts + m] * matrix[m * natts + l])
+ {
+ matrix[k * natts + l] = true;
+ nchanges += 1;
+ break;
+ }
+ }
+ }
+ }
+
+ /* no transitive dependency added here, so terminate */
+ if (nchanges == 0)
+ break;
+ }
+}
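+
+/*
+ * Illustration (not part of the patch), with attributes a, b, c mapped
+ * to indexes 0, 1, 2 and initial dependencies [a => b] and [b => c]:
+ *
+ *        initial              after the first pass
+ *      a  b  c                     a  b  c
+ *   a  .  T  .                  a  .  T  T    (transitive a => c)
+ *   b  .  .  T                  b  .  .  T
+ *   c  .  .  .                  c  .  .  .
+ *
+ * The second pass adds nothing (nchanges == 0), so the loop stops
+ * early instead of running all natts iterations.
+ */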
+
+static List*
+fdeps_reduce_clauses(List *clauses, Bitmapset *attnums, bool *matrix,
+ int *idx_to_attnum, int *attnum_to_idx, Index relid)
+{
+ int i;
+ ListCell *lc;
+ List *reduced_clauses = NIL;
+
+ int nmvclauses; /* size of the arrays */
+ bool *reduced;
+ AttrNumber *mvattnums;
+ Node **mvclauses;
+
+ int natts = bms_num_members(attnums);
+
+ /*
+ * Preallocate space for all clauses (the list only contains
+ * compatible clauses at this point). This makes it somewhat easier
+ * to access the stats / attnums randomly.
+ *
+ * XXX This assumes each clause references exactly one Var, so the
+ * arrays are sized accordingly - for functional dependencies
+ * this is safe, because it only works with Var=Const.
+ */
+ mvclauses = (Node**)palloc0(list_length(clauses) * sizeof(Node*));
+ mvattnums = (AttrNumber*)palloc0(list_length(clauses) * sizeof(AttrNumber));
+ reduced = (bool*)palloc0(list_length(clauses) * sizeof(bool));
+
+ /* fill the arrays */
+ nmvclauses = 0;
+ foreach (lc, clauses)
+ {
+ Node * clause = (Node*)lfirst(lc);
+ Bitmapset * attnums = get_varattnos(clause, relid);
+
+ mvclauses[nmvclauses] = clause;
+ mvattnums[nmvclauses] = bms_singleton_member(attnums);
+ nmvclauses++;
+ }
+
+ Assert(nmvclauses == list_length(clauses));
+
+ /* now try to reduce the clauses (using the dependencies) */
+ for (i = 0; i < nmvclauses; i++)
+ {
+ int j;
+
+ /* not covered by dependencies */
+ if (! bms_is_member(mvattnums[i], attnums))
+ continue;
+
+ /* this clause was already reduced, so let's skip it */
+ if (reduced[i])
+ continue;
+
+ /* walk the potentially 'implied' clauses */
+ for (j = 0; j < nmvclauses; j++)
+ {
+ int aidx, bidx;
+
+ /* not covered by dependencies */
+ if (! bms_is_member(mvattnums[j], attnums))
+ continue;
+
+ aidx = attnum_to_idx[mvattnums[i]];
+ bidx = attnum_to_idx[mvattnums[j]];
+
+ /* can't reduce the clause by itself, or if already reduced */
+ if ((i == j) || reduced[j])
+ continue;
+
+ /* mark the clause as reduced (if aidx => bidx) */
+ reduced[j] = matrix[aidx * natts + bidx];
+ }
+ }
+
+ /* now walk through the clauses, and keep only those not reduced */
+ for (i = 0; i < nmvclauses; i++)
+ if (! reduced[i])
+ reduced_clauses = lappend(reduced_clauses, mvclauses[i]);
+
+ pfree(reduced);
+ pfree(mvclauses);
+ pfree(mvattnums);
+
+ return reduced_clauses;
+}
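+
+/*
+ * Illustration (not part of the patch): with clauses on attributes A
+ * and B and a symmetric (cyclic) pair of dependencies [A => B] and
+ * [B => A], the loops above behave like this:
+ *
+ *   i = 0 (clause on A): j = 1, [A => B] holds and reduced[1] is not
+ *                        set yet, so reduced[1] = true
+ *   i = 1 (clause on B): skipped entirely, because reduced[1] is set
+ *
+ * Only the clause on A survives - the cycle cannot eliminate both.
+ */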
+
+
+static Bitmapset *
+fdeps_filter_clauses(PlannerInfo *root,
+ List *clauses, Bitmapset *deps_attnums,
+ List **reduced_clauses, List **deps_clauses,
+ Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo)
+{
+ ListCell *lc;
+ Bitmapset *clause_attnums = NULL;
+
+ foreach (lc, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(lc);
+
+ if (! clause_is_mv_compatible(root, clause, varRelid, relid,
+ &attnum, sjinfo))
+
+ /* clause incompatible with functional dependencies */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else if (! bms_is_member(attnum, deps_attnums))
+
+ /* clause not covered by the dependencies */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else
+ {
+ *deps_clauses = lappend(*deps_clauses, clause);
+ clause_attnums = bms_add_member(clause_attnums, attnum);
+ }
+ }
+
+ return clause_attnums;
+}
+
+/*
+ * Pull varattnos from the clauses, similarly to pull_varattnos() but:
+ *
+ * (a) only get attributes for a particular relation (relid)
+ * (b) ignore system attributes (we can't build stats on them anyway)
+ *
+ * This makes it possible to directly compare the result with attnum
+ * values from pg_attribute etc.
+ */
+static Bitmapset *
+get_varattnos(Node * node, Index relid)
+{
+ int k;
+ Bitmapset *varattnos = NULL;
+ Bitmapset *result = NULL;
+
+ /* get the varattnos */
+ pull_varattnos(node, relid, &varattnos);
+
+ k = -1;
+ while ((k = bms_next_member(varattnos, k)) >= 0)
+ {
+ if (k + FirstLowInvalidHeapAttributeNumber > 0)
+ result = bms_add_member(result,
+ k + FirstLowInvalidHeapAttributeNumber);
+ }
+
+ bms_free(varattnos);
+
+ return result;
+}
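+
+/*
+ * Illustration (not part of the patch): pull_varattnos() stores each
+ * attnum shifted, i.e.
+ *
+ *   member k = varattno - FirstLowInvalidHeapAttributeNumber
+ *
+ * so that system attributes (negative varattnos, e.g. ctid) still map
+ * to positive bitmapset members. Adding the offset back recovers the
+ * original varattno, and the (k + FirstLowInvalidHeapAttributeNumber > 0)
+ * test above keeps only regular user columns.
+ */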
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index a755c49..bd200bc 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -84,7 +84,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
/*
* Analyze functional dependencies of columns.
*/
- deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (stat->deps_enabled)
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
/* store the histogram / MCV list in the catalog */
update_mv_stats(stat->mvoid, deps, attrs);
@@ -163,6 +164,7 @@ list_mv_stats(Oid relid)
info->mvoid = HeapTupleGetOid(htup);
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
result = lappend(result, info);
@@ -274,6 +276,7 @@ compare_scalars_partition(const void *a, const void *b, void *arg)
return ApplySortComparator(da, false, db, false, ssup);
}
+
/* initialize multi-dimensional sort */
MultiSortSupport
multi_sort_init(int ndims)
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
index 84b6561..0a08d12 100644
--- a/src/backend/utils/mvstats/dependencies.c
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -636,3 +636,27 @@ pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(result));
}
+
+MVDependencies
+load_mv_dependencies(Oid mvoid)
+{
+ bool isnull = false;
+ Datum deps;
+
+ /* Fetch the pg_mv_statistic tuple for the requested statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->deps_enabled && mvstat->deps_built);
+#endif
+
+ deps = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stadeps, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_dependencies(DatumGetByteaP(deps));
+}
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 7ebd961..cc43a79 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -17,12 +17,20 @@
#include "fmgr.h"
#include "commands/vacuum.h"
+/*
+ * Degree to which an MCV item / histogram bucket matches a clause.
+ * This is then considered when computing the selectivity.
+ */
+#define MVSTATS_MATCH_NONE 0 /* no match at all */
+#define MVSTATS_MATCH_PARTIAL 1 /* partial match */
+#define MVSTATS_MATCH_FULL 2 /* full match */
#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
-/* An associative rule, tracking [a => b] dependency.
- *
- * TODO Make this work with multiple columns on both sides.
+
+/*
+ * Functional dependencies, tracking column-level relationships (values
+ * in one column determine values in another one).
*/
typedef struct MVDependencyData {
int16 a;
@@ -48,6 +56,8 @@ typedef MVDependenciesData* MVDependencies;
* stats specified using flags (or something like that).
*/
+MVDependencies load_mv_dependencies(Oid mvoid);
+
bytea * serialize_mv_dependencies(MVDependencies dependencies);
/* deserialization of stats (serialization is private to analyze) */
diff --git a/src/test/regress/expected/mv_dependencies.out b/src/test/regress/expected/mv_dependencies.out
new file mode 100644
index 0000000..e759997
--- /dev/null
+++ b/src/test/regress/expected/mv_dependencies.out
@@ -0,0 +1,172 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s1 ON functional_dependencies (unknown_column) WITH (dependencies);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s1 ON functional_dependencies (a) WITH (dependencies);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a,a) WITH (dependencies);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a, a, b) WITH (dependencies);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- correct command
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (dependencies);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+ QUERY PLAN
+---------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE functional_dependencies;
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s2 ON functional_dependencies (a, b, c) WITH (dependencies);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+DROP TABLE functional_dependencies;
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s3 ON functional_dependencies (a, b, c, d) WITH (dependencies);
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+----------------------------------------
+ t | t | 2 => 1, 3 => 1, 3 => 2, 4 => 1, 4 => 2
+(1 row)
+
+DROP TABLE functional_dependencies;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index b1bc7c7..81484f1 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -110,3 +110,6 @@ test: event_trigger
# run stats by itself because its delay may be insufficient under heavy load
test: stats
+
+# run tests of multivariate stats
+test: mv_dependencies
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index ade9ef1..14ea574 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -161,3 +161,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: mv_dependencies
diff --git a/src/test/regress/sql/mv_dependencies.sql b/src/test/regress/sql/mv_dependencies.sql
new file mode 100644
index 0000000..48dea4d
--- /dev/null
+++ b/src/test/regress/sql/mv_dependencies.sql
@@ -0,0 +1,150 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s1 ON functional_dependencies (unknown_column) WITH (dependencies);
+
+-- single column
+CREATE STATISTICS s1 ON functional_dependencies (a) WITH (dependencies);
+
+-- single column, duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a,a) WITH (dependencies);
+
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a, a, b) WITH (dependencies);
+
+-- unknown option
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (unknown_option);
+
+-- correct command
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (dependencies);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+
+DROP TABLE functional_dependencies;
+
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s2 ON functional_dependencies (a, b, c) WITH (dependencies);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+
+DROP TABLE functional_dependencies;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s3 ON functional_dependencies (a, b, c, d) WITH (dependencies);
+
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+DROP TABLE functional_dependencies;
--
2.1.0
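To see the whole reduction outside the planner, here is a minimal
self-contained sketch (illustration only - plain C, no PostgreSQL
headers, and the NATTS / clauses / matrix names are made up for the
example). It builds the dependencies a => b and b => c, computes the
transitive closure the same way multiply_adjacency_matrix does, and
then reduces the clause list with the same skeleton as
fdeps_reduce_clauses:

#include <stdio.h>
#include <stdbool.h>

#define NATTS 3                 /* attributes a, b, c as indexes 0, 1, 2 */

int
main(void)
{
    bool matrix[NATTS][NATTS] = {{false}};
    bool reduced[NATTS] = {false};
    int clauses[] = {0, 1, 2};  /* one equality clause per attribute */
    int nclauses = 3;

    /* dependencies discovered by ANALYZE: a => b, b => c */
    matrix[0][1] = true;
    matrix[1][2] = true;

    /* in-place transitive closure, stopping when nothing changes */
    for (int i = 0; i < NATTS; i++)
    {
        int nchanges = 0;

        for (int k = 0; k < NATTS; k++)
            for (int l = 0; l < NATTS; l++)
            {
                if (matrix[k][l])
                    continue;
                for (int m = 0; m < NATTS; m++)
                    if (matrix[k][m] && matrix[m][l])
                    {
                        matrix[k][l] = true;
                        nchanges++;
                        break;
                    }
            }

        if (nchanges == 0)
            break;
    }

    /* drop clauses implied by a clause we keep */
    for (int i = 0; i < nclauses; i++)
    {
        if (reduced[i])
            continue;
        for (int j = 0; j < nclauses; j++)
        {
            if (i == j || reduced[j])
                continue;
            reduced[j] = matrix[clauses[i]][clauses[j]];
        }
    }

    for (int i = 0; i < nclauses; i++)
        if (!reduced[i])
            printf("kept clause on attribute %d\n", clauses[i]);
    return 0;
}

Compiled with any C99 compiler, this prints just "kept clause on
attribute 0" - the clauses on b and c are implied via a => b and the
transitive a => c.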
[Attachment: 0004-multivariate-MCV-lists.patch (application/x-patch)]
From 7ce09934eddfc08315b623fa498f9548f9150ec3 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 16:52:15 +0200
Subject: [PATCH 4/7] multivariate MCV lists
- extends the pg_mv_statistic catalog (add 'mcv' fields)
- building the MCV lists during ANALYZE
- simple estimation while planning the queries
Includes regression tests, mostly equal to regression tests for
functional dependencies.
---
doc/src/sgml/ref/create_statistics.sgml | 18 +
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/statscmds.c | 45 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 1079 +++++++++++++++++++++++++--
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/common.c | 104 ++-
src/backend/utils/mvstats/common.h | 11 +-
src/backend/utils/mvstats/mcv.c | 1237 +++++++++++++++++++++++++++++++
src/bin/psql/describe.c | 25 +-
src/include/catalog/pg_mv_statistic.h | 18 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 69 +-
src/test/regress/expected/mv_mcv.out | 207 ++++++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_mcv.sql | 178 +++++
20 files changed, 2915 insertions(+), 101 deletions(-)
create mode 100644 src/backend/utils/mvstats/mcv.c
create mode 100644 src/test/regress/expected/mv_mcv.out
create mode 100644 src/test/regress/sql/mv_mcv.sql
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index a86eae3..193e4b0 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -132,6 +132,24 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>max_mcv_items</> (<type>integer</>)</term>
+ <listitem>
+ <para>
+ Maximum number of MCV list items.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>mcv</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables MCV list for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect2>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2423985..5488061 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -165,7 +165,9 @@ CREATE VIEW pg_mv_stats AS
S.staname AS staname,
S.stakeys AS attnums,
length(S.stadeps) as depsbytes,
- pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
+ length(S.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 84a8b13..90bfaed 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -136,7 +136,13 @@ CreateStatistics(CreateStatsStmt *stmt)
ObjectAddress parentobject, childobject;
/* by default build nothing */
- bool build_dependencies = false;
+ bool build_dependencies = false,
+ build_mcv = false;
+
+ int32 max_mcv_items = -1;
+
+ /* options required because of other options */
+ bool require_mcv = false;
Assert(IsA(stmt, CreateStatsStmt));
@@ -212,6 +218,29 @@ CreateStatistics(CreateStatsStmt *stmt)
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "mcv") == 0)
+ build_mcv = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_mcv_items") == 0)
+ {
+ max_mcv_items = defGetInt32(opt);
+
+ /* this option requires 'mcv' to be enabled */
+ require_mcv = true;
+
+ /* sanity check */
+ if (max_mcv_items < MVSTAT_MCVLIST_MIN_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items must be at least %d",
+ MVSTAT_MCVLIST_MIN_ITEMS)));
+
+ else if (max_mcv_items > MVSTAT_MCVLIST_MAX_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items is %d",
+ MVSTAT_MCVLIST_MAX_ITEMS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -220,10 +249,16 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! build_dependencies)
+ if (! (build_dependencies || build_mcv))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies, mcv) was requested")));
+
+ /* now do some checking of the options */
+ if (require_mcv && (! build_mcv))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies) was requested")));
+ errmsg("option 'mcv' is required by other options(s)")));
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
@@ -243,8 +278,12 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+ values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+
+ values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+ nulls[Anum_pg_mv_statistic_stamcv -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 5ecc9ef..9e029ef 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1948,9 +1948,11 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
+ WRITE_BOOL_FIELD(mcv_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
+ WRITE_BOOL_FIELD(mcv_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index e834722..d194551 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -15,6 +15,7 @@
#include "postgres.h"
#include "access/sysattr.h"
+#include "catalog/pg_collation.h"
#include "catalog/pg_operator.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
@@ -47,17 +48,38 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
#define MV_CLAUSE_TYPE_FDEP 0x01
+#define MV_CLAUSE_TYPE_MCV 0x02
static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
- Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo);
+ Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
+ int type);
static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
- Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo);
+ Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo,
+ int type);
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Oid varRelid, List *stats,
SpecialJoinInfo *sjinfo);
+static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
+
+static List *clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
+ List *clauses, Oid varRelid,
+ List **mvclauses, MVStatisticInfo *mvstats, int types);
+
+static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
+static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats,
+ bool *fullmatch, Selectivity *lowsel);
+
+static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, List *clauses,
@@ -85,6 +107,13 @@ static Bitmapset *fdeps_filter_clauses(PlannerInfo *root,
static Bitmapset * get_varattnos(Node * node, Index relid);
+/* used for merging bitmaps - AND (min), OR (max) */
+#define MAX(x, y) (((x) > (y)) ? (x) : (y))
+#define MIN(x, y) (((x) < (y)) ? (x) : (y))
+
+#define UPDATE_RESULT(m,r,isor) \
+ (m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
+
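+/*
+ * Illustration (not part of the patch): merging two match degrees
+ * (see MVSTATS_MATCH_* in mvstats.h: NONE = 0 < PARTIAL = 1 < FULL = 2)
+ * with m = MVSTATS_MATCH_FULL and r = MVSTATS_MATCH_PARTIAL:
+ *
+ *   AND: UPDATE_RESULT(m, r, false) sets m = MVSTATS_MATCH_PARTIAL (MIN)
+ *   OR:  UPDATE_RESULT(m, r, true)  sets m = MVSTATS_MATCH_FULL    (MAX)
+ *
+ * i.e. a tuple matches an AND-list only as well as its worst clause,
+ * and an OR-list as well as its best clause.
+ */
+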
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -250,8 +279,12 @@ clauselist_selectivity(PlannerInfo *root,
*/
if (has_stats(stats, MV_CLAUSE_TYPE_FDEP))
{
- /* collect attributes referenced by mv-compatible clauses */
- mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo);
+ /*
+ * Collect attributes referenced by mv-compatible clauses (looking
+ * for clauses compatible with functional dependencies for now).
+ */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
+ MV_CLAUSE_TYPE_FDEP);
/*
* If there are mv-compatible clauses, referencing at least two
@@ -268,6 +301,48 @@ clauselist_selectivity(PlannerInfo *root,
}
/*
+ * Check that there are statistics with MCV list. If not, we don't
+ * need to waste time with the optimization.
+ */
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV))
+ {
+ /*
+ * Recollect attributes from mv-compatible clauses (maybe we've
+ * removed so many clauses that a single mv-compatible attnum remains).
+ * From now on we're only interested in MCV-compatible clauses.
+ */
+ mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
+ MV_CLAUSE_TYPE_MCV);
+
+ /*
+ * If there still are at least two columns, we'll try to select
+ * suitable multivariate statistics.
+ */
+ if (bms_num_members(mvattnums) >= 2)
+ {
+ /* see choose_mv_statistics() for details */
+ MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+
+ if (mvstat != NULL) /* we have a matching stats */
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ /* split the clauselist into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, sjinfo, clauses,
+ varRelid, &mvclauses, mvstat,
+ MV_CLAUSE_TYPE_MCV);
+
+ /* we've chosen the stats to match the clauses */
+ Assert(mvclauses != NIL);
+
+ /* compute the multivariate stats */
+ s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ }
+ }
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -924,12 +999,129 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Estimate selectivity for the list of MV-compatible clauses, using
+ * multivariate statistics (combining a histogram and MCV list).
+ *
+ * This simply passes the estimation to the MCV list and then to the
+ * histogram, if available.
+ *
+ * TODO Clamp the selectivity by the minimum of the per-clause
+ * selectivities (i.e. the selectivity of the most restrictive
+ * clause), because that's the maximum we can ever get from an
+ * ANDed list of clauses. This should help prevent issues with
+ * hitting too many buckets and low-precision histograms.
+ *
+ * TODO We may support some additional conditions, most importantly
+ * those matching multiple columns (e.g. "a = b" or "a < b").
+ * Ultimately we could track multi-table histograms for join
+ * cardinality estimation.
+ *
+ * TODO Further thoughts on processing equality clauses: Maybe it'd be
+ * better to look for stats (with MCV) covered by the equality
+ * clauses, because then we have a chance to find an exact match
+ * in the MCV list, which is pretty much the best we can do. We may
+ * also look at the least frequent MCV item, and use it as an upper
+ * boundary for the selectivity (had there been a more frequent
+ * item, it'd be in the MCV list).
+ *
+ * TODO There are several options for 'sanity clamping' the estimates.
+ *
+ * First, if we have selectivities for each condition, then
+ *
+ * P(A,B) <= MIN(P(A), P(B))
+ *
+ * Because additional conditions (connected by AND) can only lower
+ * the probability.
+ *
+ * So we can do some basic sanity checks using the single-variate
+ * stats (the ones we have right now).
+ *
+ * Second, when we have multivariate stats with a MCV list, then
+ *
+ * (a) if we have a full equality condition (one equality condition
+ * on each column) and we found a match in the MCV list, this is
+ * the selectivity (and it's supposed to be exact)
+ *
+ * (b) if we have a full equality condition and we haven't found a
+ * match in the MCV list, then the selectivity is below the
+ * lowest selectivity in the MCV list
+ *
+ * (c) if we have an equality condition (not full), we can still
+ * search the MCV for matches and use the sum of probabilities
+ * as a lower boundary for the histogram (if there are no
+ * matches in the MCV list, then we have no boundary)
+ *
+ * Third, if there are multiple (combinations of) multivariate
+ * stats for a set of clauses, we may compute all of them and then
+ * somehow aggregate them - e.g. by choosing the minimum, median or
+ * average. The stats are susceptible to overestimation (because
+ * we take 50% of the bucket for partial matches). Some stats may
+ * give better estimates than others, but it's very difficult to
+ * say in advance which one is the best (it depends on the
+ * number of buckets, number of additional columns not referenced
+ * in the clauses, type of condition etc.).
+ *
+ * So we may compute them all and then choose a sane aggregation
+ * (minimum seems like a good approach). Of course, this may result
+ * in longer / more expensive estimation (CPU-wise), but it may be
+ * worth it.
+ *
+ * It's possible to add a GUC choosing between a 'simple' estimation
+ * (using the single statistics expected to give the best estimate)
+ * and a 'full' one (combining the multiple estimates).
+ *
+ * multivariate_estimates = (simple|full)
+ *
+ * Also, this might be enabled at a table level, by something like
+ *
+ * ALTER TABLE ... SET STATISTICS (simple|full)
+ *
+ * Which would make it possible to use this only for the tables
+ * where the simple approach does not work.
+ *
+ * Also, there are ways to optimize this algorithmically. E.g. we
+ * may try to get an estimate from a matching MCV list first, and
+ * if we happen to get a "full equality match" we may stop computing
+ * the estimates from other stats (for this condition) because
+ * that's probably the best estimate we can really get.
+ *
+ * TODO When applying the clauses to the histogram/MCV list, we can do
+ * that from the most selective clauses first, because that'll
+ * eliminate the buckets/items sooner (so we'll be able to skip
+ * them without the more expensive inspection). But this requires
+ * knowing the per-clause selectivities in advance, and that's not
+ * what we do now.
+ */
+static Selectivity
+clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+{
+ bool fullmatch = false;
+
+ /*
+ * Lowest frequency in the MCV list (may be used as an upper bound
+ * for full equality conditions that did not match any MCV item).
+ */
+ Selectivity mcv_low = 0.0;
+
+ /* TODO Evaluate simple 1D selectivities, use the smallest one as
+ * an upper bound, product as lower bound, and sort the
+ * clauses in ascending order by selectivity (to optimize the
+ * MCV/histogram evaluation).
+ */
+
+ /* Evaluate the MCV selectivity */
+ return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ &fullmatch, &mcv_low);
+}
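+
+/*
+ * A minimal sketch of the clamping TODO above (not implemented here):
+ * assuming 'sel1d' held the per-clause single-column selectivities, the
+ * MV estimate could be clamped by the most restrictive clause like this:
+ *
+ *   Selectivity upper = 1.0;
+ *
+ *   for (i = 0; i < nclauses; i++)
+ *       upper = Min(upper, sel1d[i]);
+ *
+ *   return Min(mv_sel, upper);
+ */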
+
/*
* Collect attributes from mv-compatible clauses.
*/
static Bitmapset *
collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
- Index *relid, SpecialJoinInfo *sjinfo)
+ Index *relid, SpecialJoinInfo *sjinfo, int types)
{
Bitmapset *attnums = NULL;
ListCell *l;
@@ -945,12 +1137,11 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
*/
foreach (l, clauses)
{
- AttrNumber attnum;
Node *clause = (Node *) lfirst(l);
- /* ignore the result for now - we only need the info */
- if (clause_is_mv_compatible(root, clause, varRelid, relid, &attnum, sjinfo))
- attnums = bms_add_member(attnums, attnum);
+ /* ignore the result here - we only need the attnums */
+ clause_is_mv_compatible(root, clause, varRelid, relid, &attnums,
+ sjinfo, types);
}
/*
@@ -969,6 +1160,188 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
}
/*
+ * We're looking for statistics matching at least 2 attributes,
+ * referenced in the clauses compatible with multivariate statistics.
+ * The current selection criterion is very simple - we choose the
+ * statistics referencing the most attributes.
+ *
+ * If there are multiple statistics referencing the same number of
+ * columns (from the clauses), the one with fewer source columns
+ * (as listed in the ADD STATISTICS when creating the statistics) wins.
+ * Otherwise the first one wins.
+ *
+ * This is a very simple criterion, and it has several weaknesses:
+ *
+ * (a) does not consider the accuracy of the statistics
+ *
+ * If there are two histograms built on the same set of columns,
+ * but one has 100 buckets and the other one has 1000 buckets (thus
+ * likely providing better estimates), this is not currently
+ * considered.
+ *
+ * (b) does not consider the type of statistics
+ *
+ * If there are three statistics - one containing just a MCV list,
+ * another one with just a histogram and a third one with both,
+ * this is not considered.
+ *
+ * (c) does not consider the number of clauses
+ *
+ * As explained, only the number of referenced attributes counts,
+ * so if there are multiple clauses on a single attribute, this
+ * still counts as a single attribute.
+ *
+ * (d) does not consider type of condition
+ *
+ * Some clauses may work better with some statistics - for example
+ * equality clauses probably work better with MCV lists than with
+ * histograms. But IS [NOT] NULL conditions may often work better
+ * with histograms (thanks to NULL-buckets).
+ *
+ * So for example with five WHERE conditions
+ *
+ * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
+ *
+ * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be
+ * selected as it references the most columns.
+ *
+ * Once we have selected the multivariate statistics, we split the list
+ * of clauses into two parts - conditions that are compatible with the
+ * selected stats, and conditions that will be estimated using simple
+ * statistics.
+ *
+ * From the example above, conditions
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
+ *
+ * will be estimated using the multivariate statistics (a,b,c,d) while
+ * the last condition (e = 1) will get estimated using the regular ones.
+ *
+ * There are various alternative selection criteria (e.g. counting
+ * conditions instead of just referenced attributes), but eventually
+ * the best option should be to combine multiple statistics. But that's
+ * much harder to do correctly.
+ *
+ * TODO Select multiple statistics and combine them when computing
+ * the estimate.
+ *
+ * TODO This will probably have to consider compatibility of clauses,
+ * because 'dependencies' will probably work only with equality
+ * clauses.
+ */
+static MVStatisticInfo *
+choose_mv_statistics(List *stats, Bitmapset *attnums)
+{
+ int i;
+ ListCell *lc;
+
+ MVStatisticInfo *choice = NULL;
+
+ int current_matches = 1; /* goal #1: maximize */
+ int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+
+ /*
+ * Walk through the list of statistics, and for each one count the
+ * referenced attributes (encoded in the 'attnums' bitmap).
+ */
+ foreach (lc, stats)
+ {
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ /* columns matching this statistics */
+ int matches = 0;
+
+ int2vector * attrs = info->stakeys;
+ int numattrs = attrs->dim1;
+
+ /* skip dependencies-only stats */
+ if (! info->mcv_built)
+ continue;
+
+ /* count columns covered by this statistics */
+ for (i = 0; i < numattrs; i++)
+ if (bms_is_member(attrs->values[i], attnums))
+ matches++;
+
+ /*
+ * Use this statistics when it improves the number of matches or
+ * when it matches the same number of attributes but is smaller.
+ */
+ if ((matches > current_matches) ||
+ ((matches == current_matches) && (current_dims > numattrs)))
+ {
+ choice = info;
+ current_matches = matches;
+ current_dims = numattrs;
+ }
+ }
+
+ return choice;
+}
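+
+/*
+ * Example (taken from the comment above): with compatible clauses on
+ * columns {a, b, c, d, e} and statistics on (a,b), (a,b,e) and (a,b,c,d),
+ * the loop sees matches = 2, 3 and 4 respectively, so (a,b,c,d) wins.
+ */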
+
+
+/*
+ * This splits the clauses list into two parts - one containing clauses
+ * that will be evaluated using the chosen statistics, and the remaining
+ * clauses (either non-mvcompatible, or not covered by the chosen stats).
+ */
+static List *
+clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
+ List *clauses, Oid varRelid, List **mvclauses,
+ MVStatisticInfo *mvstats, int types)
+{
+ int i;
+ ListCell *l;
+ List *non_mvclauses = NIL;
+
+ /* FIXME is there a better way to get info on int2vector? */
+ int2vector * attrs = mvstats->stakeys;
+ int numattrs = mvstats->stakeys->dim1;
+
+ Bitmapset *mvattnums = NULL;
+
+ /* build bitmap of attributes covered by the stats, so we can
+ * do bms_is_subset later */
+ for (i = 0; i < numattrs; i++)
+ mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+
+ /* erase the list of mv-compatible clauses */
+ *mvclauses = NIL;
+
+ foreach (l, clauses)
+ {
+ bool match = false; /* by default not mv-compatible */
+ Bitmapset *attnums = NULL;
+ Node *clause = (Node *) lfirst(l);
+
+ if (clause_is_mv_compatible(root, clause, varRelid, NULL,
+ &attnums, sjinfo, types))
+ {
+ /* are all the attributes part of the selected stats? */
+ if (bms_is_subset(attnums, mvattnums))
+ match = true;
+ }
+
+ /*
+ * The clause matches the selected stats, so put it into the list
+ * of mv-compatible clauses. Otherwise, keep it in the list of
+ * 'regular' clauses (to be estimated the usual way later).
+ */
+ if (match)
+ *mvclauses = lappend(*mvclauses, clause);
+ else
+ non_mvclauses = lappend(non_mvclauses, clause);
+ }
+
+ /*
+ * Return the remaining clauses - those will be estimated the
+ * regular way (they're incompatible with the chosen stats, or
+ * not covered by them).
+ */
+ return non_mvclauses;
+
+}
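+
+/*
+ * Example (illustrative, reusing the example above): with statistics on
+ * (a,b,c,d) and WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1),
+ * the first four clauses end up in *mvclauses, while (e = 1) is returned
+ * to be estimated using per-column statistics.
+ */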
+
+/*
* Determines whether the clause is compatible with multivariate stats,
* and if it is, returns some additional information - varno (index
* into simple_rte_array) and a bitmap of attributes. This is then
@@ -987,8 +1360,12 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
*/
static bool
clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
- Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo)
+ Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
+ int types)
{
+ Relids clause_relids;
+ Relids left_relids;
+ Relids right_relids;
if (IsA(clause, RestrictInfo))
{
@@ -998,82 +1375,176 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
if (rinfo->pseudoconstant)
return false;
- /* no support for OR clauses at this point */
- if (rinfo->orclause)
- return false;
-
/* get the actual clause from the RestrictInfo (it's not an OR clause) */
clause = (Node*)rinfo->clause;
- /* only simple opclauses are compatible with multivariate stats */
- if (! is_opclause(clause))
- return false;
-
/* we don't support join conditions at this moment */
if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo))
return false;
+ clause_relids = rinfo->clause_relids;
+ left_relids = rinfo->left_relids;
+ right_relids = rinfo->right_relids;
+ }
+ else if (is_opclause(clause) && list_length(((OpExpr *) clause)->args) == 2)
+ {
+ left_relids = pull_varnos(get_leftop((Expr*)clause));
+ right_relids = pull_varnos(get_rightop((Expr*)clause));
+
+ clause_relids = bms_union(left_relids,
+ right_relids);
+ }
+ else
+ {
+ /* Not a binary opclause, so mark left/right relid sets as empty */
+ left_relids = NULL;
+ right_relids = NULL;
+ /* and get the total relid set the hard way */
+ clause_relids = pull_varnos((Node *) clause);
+ }
+
+ /*
+ * Only simple opclauses and IS NULL tests are compatible with
+ * multivariate stats at this point.
+ */
+ if ((is_opclause(clause))
+ && (list_length(((OpExpr *) clause)->args) == 2))
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+
/* is it 'variable op constant' ? */
- if (list_length(((OpExpr *) clause)->args) == 2)
+
+ ok = (bms_membership(clause_relids) == BMS_SINGLETON) &&
+ (is_pseudo_constant_clause_relids(lsecond(expr->args),
+ right_relids) ||
+ (varonleft = false,
+ is_pseudo_constant_clause_relids(linitial(expr->args),
+ left_relids)));
+
+ if (ok)
{
- OpExpr *expr = (OpExpr *) clause;
- bool varonleft = true;
- bool ok;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
- ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
- (is_pseudo_constant_clause_relids(lsecond(expr->args),
- rinfo->right_relids) ||
- (varonleft = false,
- is_pseudo_constant_clause_relids(linitial(expr->args),
- rinfo->left_relids)));
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+ * (return NULL).
+ *
+ * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
- if (ok)
- {
- Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not be really necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
- /*
- * Simple variables only - otherwise the planner_rt_fetch seems to fail
- * (return NULL).
- *
- * TODO Maybe use examine_variable() would fix that?
- */
- if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
- return false;
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
- /*
- * Only consider this variable if (varRelid == 0) or when the varno
- * matches varRelid (see explanation at clause_selectivity).
- *
- * FIXME I suspect this may not be really necessary. The (varRelid == 0)
- * part seems to be enforced by treat_as_join_clause().
- */
- if (! ((varRelid == 0) || (varRelid == var->varno)))
- return false;
+ /* Lookup info about the base relation (we need to pass the relid out) */
+ if (relid != NULL)
+ *relid = var->varno;
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ * This uses the function for estimating selectivity, not the
+ * operator directly (a bit awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_SCALARLTSEL:
+ case F_SCALARGTSEL:
+ /* not compatible with functional dependencies */
+ if (types & MV_CLAUSE_TYPE_MCV)
+ {
+ *attnums = bms_add_member(*attnums, var->varattno);
+ return true;
+ }
+ return false;
+
+ case F_EQSEL:
+ *attnums = bms_add_member(*attnums, var->varattno);
+ return true;
+ }
+ }
+ }
+ else if (IsA(clause, NullTest)
+ && IsA(((NullTest*)clause)->arg, Var))
+ {
+ Var * var = (Var*)((NullTest*)clause)->arg;
+
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+ * (return NULL).
+ *
+ * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
+
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not be really necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
- /* Also skip special varno values, and system attributes ... */
- if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
- return false;
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
+ /* Lookup info about the base relation (we need to pass the relid out) */
+ if (relid != NULL)
*relid = var->varno;
- /*
- * If it's not a "<" or ">" or "=" operator, just ignore the
- * clause. Otherwise note the relid and attnum for the variable.
- * This uses the function for estimating selectivity, ont the
- * operator directly (a bit awkward, but well ...).
- */
- switch (get_oprrest(expr->opno))
- {
- case F_EQSEL:
- *attnum = var->varattno;
- return true;
- }
- }
+ *attnums = bms_add_member(*attnums, var->varattno);
+
+ return true;
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /*
+ * AND/OR-clauses are supported if all sub-clauses are supported
+ *
+ * TODO We might support the mixed case, where some of the clauses
+ * are supported and some are not - treat all supported
+ * subclauses as a single clause, compute its selectivity
+ * using mv stats, and compute the total selectivity using
+ * the current algorithm.
+ *
+ * TODO For RestrictInfo above an OR-clause, we might use the
+ * orclause with nested RestrictInfo - we won't have to
+ * call pull_varnos() for each clause, saving time.
+ */
+ Bitmapset *tmp = NULL;
+ ListCell *l;
+ foreach (l, ((BoolExpr*)clause)->args)
+ {
+ if (! clause_is_mv_compatible(root, (Node*)lfirst(l),
+ varRelid, relid, &tmp, sjinfo, types))
+ return false;
}
+
+ /* add the attnums from the AND/OR-clause to the set of attnums */
+ *attnums = bms_join(*attnums, tmp);
+
+ return true;
}
return false;
-
}
/*
@@ -1322,6 +1793,9 @@ has_stats(List *stats, int type)
if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
return true;
+
+ if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
+ return true;
}
return false;
@@ -1617,25 +2091,39 @@ fdeps_filter_clauses(PlannerInfo *root,
foreach (lc, clauses)
{
- AttrNumber attnum;
+ Bitmapset *attnums = NULL;
Node *clause = (Node *) lfirst(lc);
- if (! clause_is_mv_compatible(root, clause, varRelid, relid,
- &attnum, sjinfo))
+ if (! clause_is_mv_compatible(root, clause, varRelid, relid, &attnums,
+ sjinfo, MV_CLAUSE_TYPE_FDEP))
/* clause incompatible with functional dependencies */
*reduced_clauses = lappend(*reduced_clauses, clause);
- else if (! bms_is_member(attnum, deps_attnums))
+ else if (bms_num_members(attnums) > 1)
+
+ /*
+ * Clause referencing multiple attributes (strange - shouldn't
+ * this be handled by clause_is_mv_compatible directly?).
+ */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else if (! bms_is_member(bms_singleton_member(attnums), deps_attnums))
/* clause not covered by the dependencies */
*reduced_clauses = lappend(*reduced_clauses, clause);
else
{
+ /* ok, clause compatible with existing dependencies */
+ Assert(bms_num_members(attnums) == 1);
+
*deps_clauses = lappend(*deps_clauses, clause);
- clause_attnums = bms_add_member(clause_attnums, attnum);
+ clause_attnums = bms_add_member(clause_attnums,
+ bms_singleton_member(attnums));
}
+
+ bms_free(attnums);
}
return clause_attnums;
@@ -1673,3 +2161,454 @@ get_varattnos(Node * node, Index relid)
return result;
}
+
+/*
+ * Estimate selectivity of clauses using a MCV list.
+ *
+ * If there's no MCV list for the stats, the function returns 0.0.
+ *
+ * While computing the estimate, the function checks whether all the
+ * columns were matched with an equality condition. If that's the case,
+ * we can skip processing the histogram, as there can be no rows in
+ * it with the same values - all the rows matching the condition are
+ * represented by the MCV item. This can only happen with equality
+ * on all the attributes.
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all items as 'match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the items
+ * 4) skip items that are already 'no match'
+ * 5) check clause for items that still match
+ * 6) sum frequencies for items to get selectivity
+ *
+ * The function also returns the frequency of the least frequent item
+ * on the MCV list, which may be useful for clamping the estimate from the
+ * histogram (all items not present in the MCV list are less frequent).
+ * This however seems useful only for cases with conditions on all
+ * attributes.
+ *
+ * TODO This only handles AND-ed clauses, but it might work for OR-ed
+ * lists too - it just needs to reverse the logic a bit. I.e. start
+ * with 'no match' for all items, and mark the items as a match
+ * as the clauses are processed (and skip items that are 'match').
+ */
+static Selectivity
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats, bool *fullmatch,
+ Selectivity *lowsel)
+{
+ int i;
+ Selectivity s = 0.0;
+ Selectivity u = 0.0;
+
+ MCVList mcvlist = NULL;
+ int nmatches = 0;
+
+ /* match/mismatch bitmap for each MCV item */
+ char * matches = NULL;
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 2);
+
+ /* there's no MCV list built yet */
+ if (! mvstats->mcv_built)
+ return 0.0;
+
+ mcvlist = load_mv_mcvlist(mvstats->mvoid);
+
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* by default all the MCV items match the clauses fully */
+ matches = palloc0(sizeof(char) * mcvlist->nitems);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
+
+ /* number of matching MCV items */
+ nmatches = mcvlist->nitems;
+
+ nmatches = update_match_bitmap_mcvlist(root, clauses,
+ mvstats->stakeys, mcvlist,
+ nmatches, matches,
+ lowsel, fullmatch, false);
+
+ /* sum frequencies for all the matching MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* used to 'scale' for MCV lists not covering all tuples */
+ u += mcvlist->items[i]->frequency;
+
+ if (matches[i] != MVSTATS_MATCH_NONE)
+ s += mcvlist->items[i]->frequency;
+ }
+
+ pfree(matches);
+ pfree(mcvlist);
+
+ return s*u;
+}
+
+/*
+ * Evaluate clauses using the MCV list, and update the match bitmap.
+ *
+ * The bitmap may already be partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * TODO This works with 'bitmap' where each bit is represented as a char,
+ * which is slightly wasteful. Instead, we could use a regular
+ * bitmap, reducing the size to ~1/8. Another thing is merging the
+ * bitmaps using & and |, which might be faster than min/max.
+ */
+static int
+update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ Bitmapset *eqmatches = NULL; /* attributes with equality matches */
+
+ /* The bitmap may be partially built. */
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mcvlist->nitems);
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* No more matches possible (AND), or everything already matched (OR) */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ return nmatches;
+
+ /* frequency of the lowest MCV item */
+ *lowsel = 1.0;
+
+ /*
+ * Loop through the list of clauses, and for each of them evaluate
+ * all the MCV items not yet eliminated by the preceding clauses.
+ *
+ * FIXME This would probably deserve a refactoring, I guess. Unify
+ * the two loops and put the checks inside, or something like
+ * that.
+ */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* if there are no remaining matches possible, we can stop */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /* it's an OpExpr, a NullTest, or an AND/OR clause */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ /* operator */
+ FmgrInfo opproc;
+
+ /* get procedure computing operator selectivity */
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+
+ FmgrInfo ltproc, gtproc;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ /*
+ * TODO Fetch only when really needed (probably for equality only)
+ * TODO Technically either lt/gt is sufficient.
+ *
+ * FIXME The code in analyze.c creates histograms only for types
+ * with enough ordering (by calling get_sort_group_operators).
+ * Is this the same assumption, i.e. are we certain that we
+ * get the ltproc/gtproc every time we ask? Or are there types
+ * where get_sort_group_operators returns ltopr and here we
+ * get nothing?
+ */
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype,
+ TYPECACHE_EQ_OPR | TYPECACHE_LT_OPR | TYPECACHE_GT_OPR);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), <proc);
+ fmgr_info(get_opcode(typecache->gt_opr), >proc);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ bool mismatch = false;
+ MCVItem item = mcvlist->items[i];
+
+ /*
+ * find the lowest selectivity in the MCV
+ * FIXME Maybe not the best place to do this (it runs for every clause).
+ */
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+
+ /*
+ * If there are no more matches (AND) or no remaining unmatched
+ * items (OR), we can stop processing this clause.
+ */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /*
+ * For AND-lists, we can also mark NULL items as 'no match' (and
+ * then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && item->isnull[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /* skip MCV items that were already ruled out */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* TODO consider bsearch here (list is sorted by values)
+ * TODO handle other operators too (LT, GT)
+ * TODO identify "full match" when the clauses fully
+ * match the whole MCV list (so that checking the
+ * histogram is not needed)
+ */
+ if (oprrest == F_EQSEL)
+ {
+ /*
+ * We don't care about isgt in equality, because it does not
+ * matter whether it's (var = const) or (const = var).
+ */
+ bool match = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ if (match)
+ eqmatches = bms_add_member(eqmatches, idx);
+
+ mismatch = (! match);
+ }
+ else if (oprrest == F_SCALARLTSEL) /* column < constant */
+ {
+
+ if (! isgt) /* (var < const) */
+ {
+ /*
+ * Check whether the constant is below the item's value - in that
+ * case the item can't match the (var < const) clause.
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ } /* (get_oprrest(expr->opno) == F_SCALARLTSEL) */
+ else /* (const < var) */
+ {
+ /*
+ * Check whether the item's value is below the constant - in that
+ * case the item can't match the (const < var) clause.
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ item->values[idx],
+ cst->constvalue));
+ }
+ }
+ else if (oprrest == F_SCALARGTSEL) /* column > constant */
+ {
+
+ if (! isgt) /* (var > const) */
+ {
+ /*
+ * Check whether the constant is above the item's value - in that
+ * case the item can't match the (var > const) clause.
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+ }
+ else /* (const > var) */
+ {
+ /*
+ * Check whether the item's value is above the constant - in that
+ * case the item can't match the (const > var) clause.
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ item->values[idx],
+ cst->constvalue));
+ }
+
+ } /* (get_oprrest(expr->opno) == F_SCALARGTSEL) */
+
+ /* XXX The conditions on matches[i] are not needed, as we
+ * skip MCV items that can't become true/false, depending
+ * on the current flag. See beginning of the loop over
+ * MCV items.
+ */
+
+ if ((is_or) && (matches[i] == MVSTATS_MATCH_NONE) && (! mismatch))
+ {
+ /* OR - was MATCH_NONE, but will be MATCH_FULL */
+ matches[i] = MVSTATS_MATCH_FULL;
+ ++nmatches;
+ continue;
+ }
+ else if ((! is_or) && (matches[i] == MVSTATS_MATCH_FULL) && mismatch)
+ {
+ /* AND - was MATCH_FULL, but will be MATCH_NONE */
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ continue;
+ }
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = mcvlist->items[i];
+
+ /*
+ * find the lowest selectivity in the MCV
+ * FIXME Maybe not the best place to do this (it runs for every clause).
+ */
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+
+ /* if there are no more matches, we can stop processing this clause */
+ if (nmatches == 0)
+ break;
+
+ /* skip MCV items that were already ruled out */
+ if (matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /* if the clause mismatches the MCV item, set it as MATCH_NONE */
+ if (((expr->nulltesttype == IS_NULL) && (! mcvlist->items[i]->isnull[idx])) ||
+ ((expr->nulltesttype == IS_NOT_NULL) && (mcvlist->items[i]->isnull[idx])))
+ {
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ }
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each MCV item */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching MCV items */
+ or_nmatches = mcvlist->nitems;
+
+ /* by default none of the MCV items matches the clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the sub-clauses */
+ or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ stakeys, mcvlist,
+ or_nmatches, or_matches,
+ lowsel, fullmatch, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
+ *
+ * FIXME this does not update the nmatches counter
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ {
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+ }
+
+ /*
+ * If all the columns were matched by equality, it's a full match.
+ * In this case at most a single MCV item can match the clauses
+ * (two matching items would have to contain exactly the same values).
+ */
+ *fullmatch = (bms_num_members(eqmatches) == mcvlist->ndimensions);
+
+ /* free the allocated pieces */
+ if (eqmatches)
+ pfree(eqmatches);
+
+ return nmatches;
+}
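+
+/*
+ * Example (illustrative): for WHERE (a = 1) AND ((b < 2) OR (b > 10)),
+ * the outer call processes (a = 1) directly and recurses for the
+ * OR-clause with a fresh all-MATCH_NONE bitmap; the OR result is then
+ * MIN-merged (AND semantics) into the caller's bitmap via UPDATE_RESULT.
+ */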
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 83bd85c..0cb4063 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -410,7 +410,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built)
+ if (mvstat->deps_built || mvstat->mcv_built)
{
info = makeNode(MVStatisticInfo);
@@ -419,9 +419,11 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
+ info->mcv_enabled = mvstat->mcv_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
+ info->mcv_built = mvstat->mcv_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 099f1ed..f9bf10c 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o
+OBJS = common.o dependencies.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index bd200bc..d1da714 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -16,12 +16,14 @@
#include "common.h"
+#include "utils/array.h"
+
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
- int natts, VacAttrStats **vacattrstats);
+ int natts,
+ VacAttrStats **vacattrstats);
static List* list_mv_stats(Oid relid);
-
/*
* Compute requested multivariate stats, using the rows sampled for the
* plain (single-column) stats.
@@ -49,6 +51,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int j;
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
+ MCVList mcvlist = NULL;
+ int numrows_filtered = 0;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -87,8 +91,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ /* build the MCV list */
+ if (stat->mcv_enabled)
+ mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, attrs);
+ update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
}
}
@@ -166,6 +174,8 @@ list_mv_stats(Oid relid)
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
+ info->mcv_enabled = stats->mcv_enabled;
+ info->mcv_built = stats->mcv_built;
result = lappend(result, info);
}
@@ -180,8 +190,56 @@ list_mv_stats(Oid relid)
return result;
}
+
+/*
+ * Find attnums of MV stats using the mvoid.
+ */
+int2vector*
+find_mv_attnums(Oid mvoid, Oid *relid)
+{
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ HeapTuple htup;
+ int2vector *keys;
+
+ /* Fetch the pg_mv_statistic tuple for the given mvoid. */
+ htup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+ /* starelid */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_starelid, &isnull);
+ Assert(!isnull);
+
+ *relid = DatumGetObjectId(adatum);
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ keys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+ ReleaseSysCache(htup);
+
+ /* TODO Maybe save the list into relcache, as in RelationGetIndexList
+ * (which was used as an inspiration for this one)? */
+
+ return keys;
+}
+
+
void
-update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+update_mv_stats(Oid mvoid,
+ MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -206,18 +264,29 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
= PointerGetDatum(serialize_mv_dependencies(dependencies));
}
+ if (mcvlist != NULL)
+ {
+ bytea * data = serialize_mv_mcvlist(mcvlist, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stamcv -1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+ replaces[Anum_pg_mv_statistic_stamcv -1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
@@ -246,6 +315,21 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
heap_close(sd, RowExclusiveLock);
}
+
+int
+mv_get_index(AttrNumber varattno, int2vector * stakeys)
+{
+ int i, idx = 0;
+ for (i = 0; i < stakeys->dim1; i++)
+ {
+ if (stakeys->values[i] < varattno)
+ idx += 1;
+ else
+ break;
+ }
+ return idx;
+}
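+
+/*
+ * Example (illustrative): with stakeys = {2, 5, 7}, mv_get_index(5, keys)
+ * returns 1, i.e. the second dimension of the statistics. Note this
+ * assumes the stakeys values are sorted in ascending order.
+ */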
+
/* multi-variate stats comparator */
/*
@@ -256,11 +340,15 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
int
compare_scalars_simple(const void *a, const void *b, void *arg)
{
- Datum da = *(Datum*)a;
- Datum db = *(Datum*)b;
- SortSupport ssup= (SortSupport) arg;
+ return compare_datums_simple(*(Datum*)a,
+ *(Datum*)b,
+ (SortSupport)arg);
+}
- return ApplySortComparator(da, false, db, false, ssup);
+int
+compare_datums_simple(Datum a, Datum b, SortSupport ssup)
+{
+ return ApplySortComparator(a, false, b, false, ssup);
}
/*
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
index 6d5465b..f4309f7 100644
--- a/src/backend/utils/mvstats/common.h
+++ b/src/backend/utils/mvstats/common.h
@@ -46,7 +46,15 @@ typedef struct
Datum value; /* a data value */
int tupno; /* position index for tuple it came from */
} ScalarItem;
-
+
+/* (de)serialization info */
+typedef struct DimensionInfo {
+ int nvalues; /* number of deduplicated values */
+ int nbytes; /* number of bytes (serialized) */
+ int typlen; /* pg_type.typlen */
+ bool typbyval; /* pg_type.typbyval */
+} DimensionInfo;
+
/* multi-sort */
typedef struct MultiSortSupportData {
int ndims; /* number of dimensions supported by the */
@@ -71,5 +79,6 @@ int multi_sort_compare_dim(int dim, const SortItem *a,
const SortItem *b, MultiSortSupport mss);
/* comparators, used when constructing multivariate stats */
+int compare_datums_simple(Datum a, Datum b, SortSupport ssup);
int compare_scalars_simple(const void *a, const void *b, void *arg);
int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/mcv.c b/src/backend/utils/mvstats/mcv.c
new file mode 100644
index 0000000..670dbda
--- /dev/null
+++ b/src/backend/utils/mvstats/mcv.c
@@ -0,0 +1,1237 @@
+/*-------------------------------------------------------------------------
+ *
+ * mcv.c
+ * POSTGRES multivariate MCV lists
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mcv.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+
+/*
+ * Multivariate MCVs (most-common values lists) are a straightforward
+ * extension of regular MCV list, tracking combinations of values for
+ * several attributes (columns), including NULL flags, and frequency
+ * of the combination.
+ *
+ * For columns with a small number of distinct values, this works quite
+ * well and may represent the distribution very accurately. For columns
+ * with a large number of distinct values (e.g. stored as FLOAT), this
+ * does not work that well. Especially if the distribution is mostly
+ * uniform, with no very common combinations.
+ *
+ * If we can represent the distribution as a MCV list, we can estimate
+ * some clauses (e.g. equality clauses) much more accurately than using
+ * histograms, for example.
+ *
+ * Another benefit of MCV lists (compared to histograms) is that they
+ * don't require sorting of the values, so that they work better for
+ * data types that either don't support sorting at all, or when the
+ * sorting does not really match the meaning. For example we know how to
+ * sort strings, but it's unlikely to make much sense for city names.
+ *
+ *
+ * Hashed MCV (not yet implemented)
+ * --------------------------------
+ * By restricting to MCV list and equality conditions, we may use hash
+ * values instead of the long varlena values. This significantly reduces
+ * the storage requirements, and we can still use it to estimate the
+ * equality conditions (assuming the collisions are rare enough).
+ *
+ * This however complicates matching the columns to available stats, as
+ * it requires matching clauses (not columns) to stats. And it may get
+ * quite complex - e.g. what if there are multiple clauses, each
+ * compatible with a different subset of the stats?
+ *
+ *
+ * Selectivity estimation
+ * ----------------------
+ * The estimation, implemented in clauselist_mv_selectivity_mcvlist(),
+ * is quite simple in principle - walk through the MCV items and sum
+ * frequencies of all the items that match all the clauses.
+ *
+ * The current implementation uses MCV lists to estimate these types
+ * of clauses (think of WHERE conditions):
+ *
+ * (a) equality clauses WHERE (a = 1) AND (b = 2)
+ * (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ * (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ * (d) OR clauses WHERE (a < 1) OR (b >= 2)
+ *
+ * It's possible to add more clauses, for example:
+ *
+ * (e) multi-var clauses WHERE (a > b)
+ *
+ * and so on. These are tasks for the future, not yet implemented.
+ *
+ *
+ * Estimating equality clauses
+ * ---------------------------
+ * When computing selectivity estimate for equality clauses
+ *
+ * (a = 1) AND (b = 2)
+ *
+ * we can do this estimate pretty exactly assuming that two conditions
+ * are met:
+ *
+ * (1) there's an equality condition on each attribute
+ *
+ * (2) we find a matching item in the MCV list
+ *
+ * In that case we know the MCV item represents all the tuples matching
+ * the clauses, and the selectivity estimate is complete. This is what
+ * we call 'full match'.
+ *
+ * When only (1) holds, but there's no matching MCV item, we don't know
+ * whether there are no such rows at all, or they're just not frequent
+ * enough. We can however use the frequency of the least frequent MCV
+ * item as an upper bound for the selectivity.
+ *
+ * If the equality conditions match only a subset of the attributes
+ * the MCV list is built on, we can't get a full match - we may get
+ * multiple MCV items matching the clauses, and even with a single
+ * match there may be rows that did not get into the MCV list. But in
+ * this case we can still use the frequency of the last MCV item to clamp
+ * the 'additional' selectivity not accounted for by the matching items.
+ *
+ * If there's no histogram, because the MCV list approximates the
+ * distribution accurately (not because the histogram was disabled),
+ * it does not really matter whether there are equality conditions on
+ * all the columns - we can do pretty accurate estimation using the MCV.
+ *
+ * TODO For a combination of equality conditions (not full-match case)
+ * we probably can clamp the selectivity by the minimum of
+ * selectivities for each condition. For example if we know the
+ * number of distinct values for each column, we can use 1/ndistinct
+ * as a per-column estimate. Or rather 1/ndistinct + selectivity
+ * derived from the MCV list.
+ *
+ * If we know the estimate of number of combinations of the columns
+ * (i.e. ndistinct(A,B)), we may estimate the average frequency of
+ * items in the remaining 10% as [10% / ndistinct(A,B)].
+ *
+ *
+ * Bounding estimates
+ * ------------------
+ * In general the MCV lists may not provide estimates as accurate as
+ * for the full-match equality case, but may provide some useful
+ * lower/upper boundaries for the estimation error.
+ *
+ * With equality clauses we can do a few more tricks to narrow this
+ * error range (see the previous section and TODO), but with inequality
+ * clauses (or generally non-equality clauses), it's rather difficult.
+ * There's nothing like a 'full match' - we have to consider both the
+ * MCV items and the remaining part every time. We can't use the minimum
+ * selectivity of MCV items, as the clauses may match multiple items.
+ *
+ * For example with a MCV list on columns (A, B), covering 90% of the
+ * table (computed while building the MCV list), roughly 10% of the table
+ * is not represented by the MCV list. So even if the conditions match
+ * all the remaining rows (not represented by the MCV items), we can't
+ * get selectivity higher than those 10%. We may use half of the
+ * remaining selectivity as the estimate (minimizing the average error).
+ *
+ * TODO Most of these ideas (error limiting) are not yet implemented.
+ *
+ *
+ * General TODO
+ * ------------
+ *
+ * FIXME Use max_mcv_items from ALTER TABLE ADD STATISTICS command.
+ *
+ * TODO Add support for clauses referencing multiple columns (a < b).
+ *
+ * TODO It's possible to build a special case of MCV list, storing not
+ * the actual values but only a 32/64-bit hash. This is only useful
+ * for estimating equality clauses and for large varlena types,
+ * which are impractical for a plain MCV list because of their size.
+ * But for those data types we really want just the equality
+ * clauses, so it's actually a good solution.
+ *
+ * TODO Currently there's no logic to consider building only a MCV list
+ * (and not building the histogram at all), except for making this
+ * decision manually in ADD STATISTICS.
+ */
+
+/*
+ * Each serialized item needs to store (in this order):
+ *
+ * - indexes (ndim * sizeof(uint16))
+ * - null flags (ndim * sizeof(bool))
+ * - frequency (sizeof(double))
+ *
+ * So in total:
+ *
+ * ndim * (sizeof(uint16) + sizeof(bool)) + sizeof(double)
+ */
+#define ITEM_SIZE(ndims) \
+ (ndims * (sizeof(uint16) + sizeof(bool)) + sizeof(double))
+
+/* pointers into a flat serialized item of ITEM_SIZE(n) bytes */
+#define ITEM_INDEXES(item) ((uint16*)item)
+#define ITEM_NULLS(item,ndims) ((bool*)(ITEM_INDEXES(item) + ndims))
+#define ITEM_FREQUENCY(item,ndims) ((double*)(ITEM_NULLS(item,ndims) + ndims))
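+
+/*
+ * Example (illustrative): for ndims = 3 this is 3 * (2 + 1) + 8 = 17 bytes
+ * per item - the indexes first, then the null flags, then the frequency.
+ */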
+
+/*
+ * Builds MCV list from sample rows, and removes rows represented by
+ * the MCV list from the sample (the number of remaining sample rows is
+ * returned by the numrows_filtered parameter).
+ *
+ * The method is quite simple - in short it does about these steps:
+ *
+ * (1) sort the data (default collation, '<' for the data type)
+ *
+ * (2) count distinct groups, decide how many to keep
+ *
+ * (3) build the MCV list using the threshold determined in (2)
+ *
+ * (4) remove rows represented by the MCV from the sample
+ *
+ * For more details, see the comments in the code.
+ *
+ * FIXME Single-dimensional MCV is sorted by frequency (descending). We
+ * should do that too, because when walking through the list we
+ * want to check the most frequent items first.
+ *
+ * TODO We're using Datum (8B), even for narrower data types (e.g. int4
+ * or float4). Maybe we could save some space here, but the bytea
+ * compression should handle it just fine.
+ *
+ * TODO This probably should not use the ndistinct directly (as computed
+ * from the sample), but rather an estimate of the number of
+ * distinct values in the whole table, no?
+ */
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ int ndistinct = 0;
+ int mcv_threshold = 0;
+ int count = 0;
+ int nitems = 0;
+
+ MCVList mcvlist = NULL;
+
+ /* Sort by multiple columns (using array of SortSupport) */
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * Preallocate space for all the items as a single chunk, and point
+ * the items to the appropriate parts of the array.
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * numattrs);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * numattrs);
+
+ /* keep all the rows by default (as if there was no MCV list) */
+ *numrows_filtered = numrows;
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* load the values/null flags from sample rows */
+ for (j = 0; j < numrows; j++)
+ for (i = 0; i < numattrs; i++)
+ items[j].values[i] = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+
+ /* prepare the sort functions for all the attributes */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* do the sort, using the multi-sort */
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Count the number of distinct groups - just walk through the
+ * sorted list and count the number of key changes. We use this to
+ * determine the threshold (125% of the average frequency).
+ */
+ ndistinct = 1;
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ ndistinct += 1;
+
+ /*
+ * Determine how many groups actually exceed the threshold, and then
+ * walk the array again and collect them into an array. We'll always
+ * require at least 4 rows per group.
+ *
+ * But if we can fit all the distinct values in the MCV list (i.e.
+ * if there are fewer distinct groups than MVSTAT_MCVLIST_MAX_ITEMS),
+ * we'll require only 2 rows per group.
+ *
+ * TODO For now the threshold is the same as in the single-column
+ * case (average + 25%), but maybe that's worth revisiting
+ * for the multivariate case.
+ *
+ * TODO We can do this only if we believe we got all the distinct
+ * values of the table.
+ *
+ * FIXME This should really reference mcv_max_items (from catalog)
+ * instead of the constant MVSTAT_MCVLIST_MAX_ITEMS.
+ */
+ mcv_threshold = 1.25 * numrows / ndistinct;
+ mcv_threshold = (mcv_threshold < 4) ? 4 : mcv_threshold;
+
+ if (ndistinct <= MVSTAT_MCVLIST_MAX_ITEMS)
+ mcv_threshold = 2;
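+
+ /*
+ * Example (illustrative): with numrows = 30000 and ndistinct = 100,
+ * the average group has 300 rows, so mcv_threshold = 375. With few
+ * enough distinct groups (the branch above), the threshold drops to 2.
+ */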
+
+ /*
+ * Walk through the sorted data again, and see how many groups
+ * reach the mcv_threshold (and become an item in the MCV list).
+ */
+ count = 1;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or new group, so check if we exceed mcv_threshold */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* group hits the threshold, count the group as MCV item */
+ if (count >= mcv_threshold)
+ nitems += 1;
+
+ count = 1;
+ }
+ else /* within group, so increase the number of items */
+ count += 1;
+ }
+
+ /* we know the number of MCV list items, so let's build the list */
+ if (nitems > 0)
+ {
+ /* allocate the MCV list structure, set parameters we know */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ mcvlist->magic = MVSTAT_MCV_MAGIC;
+ mcvlist->type = MVSTAT_MCV_TYPE_BASIC;
+ mcvlist->ndimensions = numattrs;
+ mcvlist->nitems = nitems;
+
+ /*
+ * Preallocate Datum/isnull arrays (not as a single chunk, as
+ * we'll pass this outside this method and thus it needs to be
+ * easy to pfree() the data - we wouldn't know where the
+ * arrays start).
+ *
+ * TODO Maybe the reasoning that we can't allocate a single
+ * piece because we're passing it out is bogus? Who'd
+ * free a single item of the MCV list, anyway?
+ *
+ * TODO Maybe with a proper encoding (stuffing all the values
+ * into a list-level array), this will be untrue?
+ */
+ mcvlist->items = (MCVItem*)palloc0(sizeof(MCVItem)*nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ mcvlist->items[i] = (MCVItem)palloc0(sizeof(MCVItemData));
+ mcvlist->items[i]->values = (Datum*)palloc0(sizeof(Datum)*numattrs);
+ mcvlist->items[i]->isnull = (bool*)palloc0(sizeof(bool)*numattrs);
+ }
+
+ /*
+ * Repeat the same loop as above, but this time copy the data
+ * into the MCV list (for items exceeding the threshold).
+ *
+ * TODO Maybe we could simply remember indexes of the last item
+ * in each group (from the previous loop)?
+ */
+ count = 1;
+ nitems = 0;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or a new group */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* count the MCV item if exceeding the threshold (and copy into the array) */
+ if (count >= mcv_threshold)
+ {
+ /* just pointer to the proper place in the list */
+ MCVItem item = mcvlist->items[nitems];
+
+ /* copy values from the _previous_ group (its last item) */
+ memcpy(item->values, items[(i-1)].values, sizeof(Datum) * numattrs);
+ memcpy(item->isnull, items[(i-1)].isnull, sizeof(bool) * numattrs);
+
+
+ /* and finally the group frequency */
+ item->frequency = (double)count / numrows;
+
+ /* next item */
+ nitems += 1;
+ }
+
+ count = 1;
+ }
+ else /* same group, just increase the number of items */
+ count += 1;
+ }
+
+ /* make sure the loops are consistent */
+ Assert(nitems == mcvlist->nitems);
+
+ /*
+ * Remove the rows matching the MCV list (i.e. keep only rows
+ * that are not represented by the MCV list).
+ *
+ * FIXME This implementation is rather naive, effectively O(N^2).
+ * As the MCV list grows, the check will take longer and
+ * longer. And as the number of sampled rows increases (by
+ * increasing statistics target), it will take longer and
+ * longer. One option is to sort the MCV items first and
+ * then perform a binary search.
+ *
+ * A better option would be keeping the ID of the row in
+ * the sort item, and then just walking through the items and
+ * marking rows to remove (in a bitmap of the same size).
+ * There's no space for that in SortItem at the moment, but
+ * it's trivial to add a 'private' pointer, or to use another
+ * structure with an extra field (starting with SortItem, so
+ * that the comparators etc. still work).
+ *
+ * Another option is to use the sorted array of items
+ * (because that's how we sorted the source data), and
+ * simply do a bsearch() into it. If we find a matching
+ * item, the row belongs to the MCV list.
+ */
+ if (nitems == ndistinct) /* all rows are covered by MCV items */
+ *numrows_filtered = 0;
+ else /* (nitems < ndistinct) && (nitems > 0) */
+ {
+ int nfiltered = 0;
+ HeapTuple *rows_filtered = (HeapTuple*)palloc0(sizeof(HeapTuple) * numrows);
+
+ /* used for the searches */
+ SortItem item, mcvitem;
+
+ item.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ item.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /*
+ * FIXME we don't need to allocate this, we can reference
+ * the MCV item directly ...
+ */
+ mcvitem.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ mcvitem.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* walk through the tuples, compare the values to MCV items */
+ for (i = 0; i < numrows; i++)
+ {
+ bool match = false;
+
+ /* collect the key values from the row */
+ for (j = 0; j < numattrs; j++)
+ item.values[j] = heap_getattr(rows[i], attrs->values[j],
+ stats[j]->tupDesc, &item.isnull[j]);
+
+ /* scan through the MCV list for matches */
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ /*
+ * TODO Create a SortItem/MCVItem comparator so that
+ * we don't need to do memcpy() like crazy.
+ */
+ memcpy(mcvitem.values, mcvlist->items[j]->values,
+ numattrs * sizeof(Datum));
+ memcpy(mcvitem.isnull, mcvlist->items[j]->isnull,
+ numattrs * sizeof(bool));
+
+ if (multi_sort_compare(&item, &mcvitem, mss) == 0)
+ {
+ match = true;
+ break;
+ }
+ }
+
+ /* if no match in the MCV list, copy the row into the filtered ones */
+ if (! match)
+ memcpy(&rows_filtered[nfiltered++], &rows[i], sizeof(HeapTuple));
+ }
+
+ /* replace the rows and remember how many rows we kept */
+ memcpy(rows, rows_filtered, sizeof(HeapTuple) * nfiltered);
+ *numrows_filtered = nfiltered;
+
+ /* free all the data used here */
+ pfree(rows_filtered);
+ pfree(item.values);
+ pfree(item.isnull);
+ pfree(mcvitem.values);
+ pfree(mcvitem.isnull);
+ }
+ }
+
+ pfree(values);
+ pfree(items);
+ pfree(isnull);
+
+ return mcvlist;
+}
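+
+/*
+ * XXX A sketch of the bitmap-based filtering suggested in the FIXME
+ * above (not part of the patch, just to illustrate the idea). It
+ * assumes SortItem grows a hypothetical 'rownum' field remembering
+ * the index of the source row. When a group of 'count' rows ending
+ * at sorted index (i-1) becomes an MCV item, we can mark the covered
+ * rows directly:
+ *
+ *   bool *covered = (bool *) palloc0(sizeof(bool) * numrows);
+ *
+ *   ...
+ *
+ *   for (j = i - count; j < i; j++)
+ *       covered[items[j].rownum] = true;
+ *
+ * The filtering pass then simply keeps the rows with !covered[r],
+ * making it O(N) instead of O(N^2).
+ */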
+
+/* fetch the MCV list (as a bytea) from the pg_mv_statistic catalog */
+MCVList
+load_mv_mcvlist(Oid mvoid)
+{
+ bool isnull = false;
+ Datum mcvlist;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* Fetch the pg_mv_statistic tuple for the given OID from the syscache. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->mcv_enabled && mvstat->mcv_built);
+#endif
+
+ mcvlist = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stamcv, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_mcvlist(DatumGetByteaP(mcvlist));
+}
+
+/* print some basic info about the MCV list
+ *
+ * TODO Add info about what part of the table this covers.
+ */
+Datum
+pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MCVList mcvlist = deserialize_mv_mcvlist(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nitems=%d", mcvlist->nitems);
+
+ pfree(mcvlist);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* used to pass context into bsearch() */
+static SortSupport ssup_private = NULL;
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Serialize MCV list into a bytea value. The basic algorithm is simple:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ * (a) collect all (non-NULL) attribute values from all MCV items
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all MCV list items
+ * (a) replace values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, because we're mixing
+ * different datatypes, and we don't know what equality means for them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ * We'll use uint16 values for the indexes in step (3), as we don't
+ * allow more than 8k MCV items (see the max_mcv_items limit). We
+ * might increase the limit up to 65535 items and still fit into uint16.
+ *
+ * We don't really expect as high a compression as with histograms,
+ * because we're not doing any bucket splits etc. (which is the
+ * source of the high redundancy there), but we need to do this
+ * anyway, as we need to serialize varlena values etc. We might
+ * invent another way to serialize MCV lists, but let's keep it
+ * consistent.
+ *
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider using 16-bit values for the indexes in step (3).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
+bytea *
+serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i, j;
+ int ndims = mcvlist->ndimensions;
+ int itemsize = ITEM_SIZE(ndims);
+
+ Size total_length = 0;
+
+ char *item = palloc0(itemsize);
+
+ /* serialized items (indexes into arrays, etc.) */
+ bytea *output;
+ char *data = NULL;
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /* allocate space for all values, including NULLs (won't use them) */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * mcvlist->nitems);
+
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ if (! mcvlist->items[j]->isnull[i]) /* skip NULL values */
+ {
+ values[i][counts[i]] = mcvlist->items[j]->values[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* do not exceed UINT16_MAX */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typbyval || (info[i].typlen > 0))
+ /* passed by value, or by reference but with a fixed length */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so simply strlen */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j]));
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * MCV list, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nitems (4B)
+ * - info (ndim * sizeof(DimensionInfo)
+ * - arrays of values for each dimension
+ * - serialized items (nitems * itemsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data.
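+ *
+ * For example, with ndims = 2 and nitems = 10, where both dimensions
+ * are int4 with 10 distinct values each, that is a 20B header, plus
+ * 2 * sizeof(DimensionInfo), plus 2 * 10 * 4B of value arrays, plus
+ * 10 * ITEM_SIZE(2) bytes of serialized items.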
+ */
+ total_length = (sizeof(int32) + offsetof(MCVListData, items)
+ + ndims * sizeof(DimensionInfo)
+ + mcvlist->nitems * itemsize);
+
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 1MB */
+ if (total_length > 1024 * 1024)
+ elog(ERROR, "serialized MCV exceeds 1MB (%ld)", total_length);
+
+ /* allocate space for the serialized MCV list, set header fields */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write to */
+ data = VARDATA(output);
+
+ memcpy(data, mcvlist, offsetof(MCVListData, items));
+ data += offsetof(MCVListData, items);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - itemsize);
+
+ /* reset the values for each item */
+ memset(item, 0, itemsize);
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! mcvlist->items[i]->isnull[j])
+ {
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ v = (Datum*)bsearch(&mcvlist->items[i]->values[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ ITEM_INDEXES(item)[j] = (v - values[j]);
+
+ /* check the index is within expected bounds */
+ Assert(ITEM_INDEXES(item)[j] >= 0);
+ Assert(ITEM_INDEXES(item)[j] < info[j].nvalues);
+ }
+ }
+
+ /* copy NULL and frequency flags into the item */
+ memcpy(ITEM_NULLS(item, ndims),
+ mcvlist->items[i]->isnull, sizeof(bool) * ndims);
+ memcpy(ITEM_FREQUENCY(item, ndims),
+ &mcvlist->items[i]->frequency, sizeof(double));
+
+ /* copy the item into the array */
+ memcpy(data, item, itemsize);
+
+ data += itemsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ return output;
+}
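+
+/*
+ * XXX A worked example of the deduplication in serialize_mv_mcvlist()
+ * above, for a single int4 dimension across 5 MCV items:
+ *
+ *   collected:    {30, 10, 30, 20, 10}
+ *   sorted:       {10, 10, 20, 30, 30}
+ *   deduplicated: {10, 20, 30}
+ *
+ * The per-item uint16 indexes stored in step (3) are then
+ * {2, 0, 2, 1, 0}, pointing into the deduplicated array.
+ */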
+
+/*
+ * Inverse to serialize_mv_mcvlist() - see the comment there.
+ *
+ * We'll do a full deserialization, because we don't really expect
+ * a high duplication of values, so caching would not be as
+ * efficient as with histograms.
+ */
+MCVList
+deserialize_mv_mcvlist(bytea * data)
+{
+ int i, j;
+ Size expected_size;
+ MCVList mcvlist;
+ char *tmp;
+
+ int ndims, nitems, itemsize;
+ DimensionInfo *info = NULL;
+
+ uint16 *indexes = NULL;
+ Datum **values = NULL;
+
+ /* local allocation buffer (used only for deserialization) */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ /* buffer used for the result */
+ int rbufflen;
+ char *rbuff;
+ char *rptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MCVListData,items))
+ elog(ERROR, "invalid MCV Size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MCVListData,items));
+
+ /* read the MCV list header */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(mcvlist, tmp, offsetof(MCVListData,items));
+ tmp += offsetof(MCVListData,items);
+
+ if (mcvlist->magic != MVSTAT_MCV_MAGIC)
+ elog(ERROR, "invalid MCV magic %d (expected %dd)",
+ mcvlist->magic, MVSTAT_MCV_MAGIC);
+
+ if (mcvlist->type != MVSTAT_MCV_TYPE_BASIC)
+ elog(ERROR, "invalid MCV type %d (expected %dd)",
+ mcvlist->type, MVSTAT_MCV_TYPE_BASIC);
+
+ nitems = mcvlist->nitems;
+ ndims = mcvlist->ndimensions;
+ itemsize = ITEM_SIZE(ndims);
+
+ Assert(nitems > 0);
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Check what size we expect with these parameters. It's still
+ * incomplete, as we have yet to add the sizes of the value
+ * arrays (from the DimensionInfo records).
+ */
+ expected_size = offsetof(MCVListData,items) +
+ ndims * sizeof(DimensionInfo) +
+ (nitems * itemsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /*
+ * We'll allocate one large chunk of memory for the intermediate
+ * data, needed only for deserializing the MCV list, and we'll
+ * use a local dense allocation to minimize the palloc() overhead.
+ *
+ * Let's see how much space we'll actually need, and also include
+ * space for the array with pointers.
+ */
+ bufflen = sizeof(Datum*) * ndims; /* space for pointers */
+
+ for (i = 0; i < ndims; i++)
+ /* for full-size byval types, we reuse the serialized value */
+ if (! (info[i].typbyval && info[i].typlen == sizeof(Datum)))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+
+ buff = palloc(bufflen);
+ ptr = buff;
+
+ values = (Datum**)buff;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mv_mcvlist(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. This needs to
+ * copy the pieces.
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum - simply reuse the array */
+ if (info[i].typlen == sizeof(Datum))
+ {
+ values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value from the serialized array */
+ memcpy(&values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ }
+ else
+ {
+ /* all the varlena data need a chunk from the buffer */
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ /* passed by reference, but fixed length (name, tid, ...) */
+ if (info[i].typlen > 0)
+ {
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ /* we should exhaust the buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ /* allocate space for the MCV items in a single piece */
+ rbufflen = (sizeof(MCVItem) + sizeof(MCVItemData) +
+ sizeof(Datum)*ndims + sizeof(bool)*ndims) * nitems;
+
+ rbuff = palloc(rbufflen);
+ rptr = rbuff;
+
+ mcvlist->items = (MCVItem*)rbuff;
+ rptr += (sizeof(MCVItem) * nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ MCVItem item = (MCVItem)rptr;
+ rptr += (sizeof(MCVItemData));
+
+ item->values = (Datum*)rptr;
+ rptr += (sizeof(Datum)*ndims);
+
+ item->isnull = (bool*)rptr;
+ rptr += (sizeof(bool) *ndims);
+
+ /* just point to the right place */
+ indexes = ITEM_INDEXES(tmp);
+
+ memcpy(item->isnull, ITEM_NULLS(tmp, ndims), sizeof(bool) * ndims);
+ memcpy(&item->frequency, ITEM_FREQUENCY(tmp, ndims), sizeof(double));
+
+#ifdef USE_ASSERT_CHECKING
+ for (j = 0; j < ndims; j++)
+ Assert(indexes[j] <= UINT16_MAX);
+#endif
+
+ /* translate the values */
+ for (j = 0; j < ndims; j++)
+ if (! item->isnull[j])
+ item->values[j] = values[j][indexes[j]];
+
+ mcvlist->items[i] = item;
+
+ tmp += ITEM_SIZE(ndims);
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+ }
+
+ /* check that we processed all the data */
+ Assert(tmp == (char*)data + VARSIZE_ANY(data));
+
+ /* release the temporary buffer */
+ pfree(buff);
+
+ return mcvlist;
+}
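+
+/*
+ * XXX The dense allocation pattern used above, in isolation (just a
+ * sketch with a single dimension and made-up sizes): compute the
+ * total size first, palloc() once, then carve the buffer up with a
+ * moving pointer:
+ *
+ *   Size   len  = sizeof(Datum *) * ndims + sizeof(Datum) * nvalues;
+ *   char  *buff = palloc(len);
+ *   char  *ptr  = buff;
+ *
+ *   Datum **values = (Datum **) ptr;
+ *   ptr += sizeof(Datum *) * ndims;
+ *
+ *   values[0] = (Datum *) ptr;
+ *   ptr += sizeof(Datum) * nvalues;
+ *
+ *   Assert(ptr == buff + len);
+ *
+ * A single pfree(buff) then releases all the pieces at once.
+ */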
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
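+
+/*
+ * XXX A reentrant alternative to the global variable (a sketch, not
+ * part of the patch): open-code the binary search and pass the
+ * SortSupport explicitly:
+ *
+ *   static Datum *
+ *   bsearch_ssup(Datum *key, Datum *arr, int nvalues, SortSupport ssup)
+ *   {
+ *       int lo = 0, hi = nvalues - 1;
+ *
+ *       while (lo <= hi)
+ *       {
+ *           int mid = lo + (hi - lo) / 2;
+ *           int cmp = compare_scalars_simple(key, &arr[mid], (void *) ssup);
+ *
+ *           if (cmp == 0)
+ *               return &arr[mid];
+ *           else if (cmp < 0)
+ *               hi = mid - 1;
+ *           else
+ *               lo = mid + 1;
+ *       }
+ *       return NULL;
+ *   }
+ */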
+
+/*
+ * SRF with details about items of an MCV list:
+ *
+ * - item ID (0...nitems)
+ * - values (string array)
+ * - null flags (boolean array)
+ * - frequency (double precision)
+ *
+ * The input is the OID of the statistics, and no rows are returned
+ * if the statistics contains no MCV list.
+ */
+PG_FUNCTION_INFO_V1(pg_mv_mcv_items);
+
+Datum
+pg_mv_mcv_items(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MCVList mcvlist;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ mcvlist = load_mv_mcvlist(PG_GETARG_OID(0));
+
+ funcctx->user_fctx = mcvlist;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = mcvlist->nitems;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples
+ * from raw C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MCVList mcvlist;
+ MCVItem item;
+
+ mcvlist = (MCVList)funcctx->user_fctx;
+
+ Assert(call_cntr < mcvlist->nitems);
+
+ item = mcvlist->items[call_cntr];
+
+ stakeys = find_mv_attnums(PG_GETARG_OID(0), &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple.
+ * This should be an array of C strings which will
+ * be processed later by the type input functions.
+ */
+ values = (char **) palloc(4 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+
+ /* frequency */
+ values[3] = (char *) palloc(64 * sizeof(char));
+
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * mcvlist->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * mcvlist->ndimensions);
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* item ID */
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ Datum val, valout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == mcvlist->ndimensions-1)
+ format = "%s, %s}";
+
+ val = item->values[i];
+ valout = FunctionCall1(&fmgrinfo[i], val);
+
+ snprintf(buff, 1024, format, values[1], DatumGetPointer(valout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], item->isnull[i] ? "t" : "f");
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[3], 64, "%f", item->frequency); /* frequency */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 590cd51..7d13a38 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2109,8 +2109,9 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
- " deps_enabled,\n"
- " deps_built,\n"
+ " deps_enabled, mcv_enabled,\n"
+ " deps_built, mcv_built,\n"
+ " mcv_max_items,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
@@ -2128,6 +2129,8 @@ describeOneTableDetails(const char *schemaname,
printTableAddFooter(&cont, _("Statistics:"));
for (i = 0; i < tuples; i++)
{
+ bool first = true;
+
printfPQExpBuffer(&buf, " ");
/* statistics name (qualified with namespace) */
@@ -2137,10 +2140,22 @@ describeOneTableDetails(const char *schemaname,
/* options */
if (!strcmp(PQgetvalue(result, i, 4), "t"))
- appendPQExpBuffer(&buf, "(dependencies)");
+ {
+ appendPQExpBuffer(&buf, "(dependencies");
+ first = false;
+ }
+
+ if (!strcmp(PQgetvalue(result, i, 5), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", mcv");
+ else
+ appendPQExpBuffer(&buf, "(mcv");
+ first = false;
+ }
- appendPQExpBuffer(&buf, " ON (%s)",
- PQgetvalue(result, i, 6));
+ appendPQExpBuffer(&buf, ") ON (%s)",
+ PQgetvalue(result, i, 9));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index a568a07..fd7107d 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -37,15 +37,21 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
+ bool mcv_enabled; /* build MCV list? */
+
+ /* MCV size */
+ int32 mcv_max_items; /* max MCV items */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
+ bool mcv_built; /* MCV list was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
+ bytea stamcv; /* MCV list (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -61,13 +67,17 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 7
+#define Natts_pg_mv_statistic 11
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
#define Anum_pg_mv_statistic_deps_enabled 4
-#define Anum_pg_mv_statistic_deps_built 5
-#define Anum_pg_mv_statistic_stakeys 6
-#define Anum_pg_mv_statistic_stadeps 7
+#define Anum_pg_mv_statistic_mcv_enabled 5
+#define Anum_pg_mv_statistic_mcv_max_items 6
+#define Anum_pg_mv_statistic_deps_built 7
+#define Anum_pg_mv_statistic_mcv_built 8
+#define Anum_pg_mv_statistic_stakeys 9
+#define Anum_pg_mv_statistic_stadeps 10
+#define Anum_pg_mv_statistic_stamcv 11
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 76e054d..1875e26 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2745,6 +2745,10 @@ DATA(insert OID = 3998 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0
DESCR("multivariate stats: functional dependencies info");
DATA(insert OID = 3999 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
DESCR("multivariate stats: functional dependencies show");
+DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_mcvlist_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: MCV list info");
+DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
+DESCR("details about MCV list items");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 7ae0f9e..d3c9898 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -592,9 +592,11 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
+ bool mcv_enabled; /* MCV list enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
+ bool mcv_built; /* MCV list built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index cc43a79..4535db7 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -51,30 +51,89 @@ typedef MVDependenciesData* MVDependencies;
#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
/*
+ * Multivariate MCV (most-common value) lists
+ *
+ * A straight-forward extension of MCV items - i.e. a list (array) of
+ * combinations of attribute values, together with a frequency and
+ * null flags.
+ */
+typedef struct MCVItemData {
+ double frequency; /* frequency of this combination */
+ bool *isnull; /* flags of NULL values (up to 32 columns) */
+ Datum *values; /* variable-length (ndimensions) */
+} MCVItemData;
+
+typedef MCVItemData *MCVItem;
+
+/* multivariate MCV list - essentially an array of MCV items */
+typedef struct MCVListData {
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of MCV list (BASIC) */
+ uint32 ndimensions; /* number of dimensions */
+ uint32 nitems; /* number of MCV items in the array */
+ MCVItem *items; /* array of MCV items */
+} MCVListData;
+
+typedef MCVListData *MCVList;
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_MCV_MAGIC 0xE1A651C2 /* marks serialized bytea */
+#define MVSTAT_MCV_TYPE_BASIC 1 /* basic MCV list type */
+
+/*
+ * Limits used for mcv_max_items option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_MCVLIST_MIN_ITEMS, and we cannot
+ * have more than MVSTAT_MCVLIST_MAX_ITEMS items.
+ *
+ * This is just a boundary for the 'max' threshold - the actual list
+ * may of course contain fewer items than MVSTAT_MCVLIST_MIN_ITEMS.
+ */
+#define MVSTAT_MCVLIST_MIN_ITEMS 128 /* min items in MCV list */
+#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
*/
MVDependencies load_mv_dependencies(Oid mvoid);
+MCVList load_mv_mcvlist(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
+bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
+MCVList deserialize_mv_mcvlist(bytea * data);
+
+/*
+ * Returns index of the attribute number within the vector (i.e. a
+ * dimension within the stats).
+ */
+int mv_get_index(AttrNumber varattno, int2vector * stakeys);
+
+int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
/* FIXME this probably belongs somewhere else (not to operations stats) */
extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_mcv_items(PG_FUNCTION_ARGS);
MVDependencies
-build_mv_dependencies(int numrows, HeapTuple *rows,
- int2vector *attrs,
- VacAttrStats **stats);
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats);
+
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered);
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
- int natts, VacAttrStats **vacattrstats);
+ int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, int2vector *attrs);
+void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats);
#endif
diff --git a/src/test/regress/expected/mv_mcv.out b/src/test/regress/expected/mv_mcv.out
new file mode 100644
index 0000000..56748e3
--- /dev/null
+++ b/src/test/regress/expected/mv_mcv.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s1 ON mcv_list (unknown_column) WITH (mcv);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s1 ON mcv_list (a) WITH (mcv);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s1 ON mcv_list (a, a) WITH (mcv);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON mcv_list (a, a, b) WITH (mcv);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing MCV statistics
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (dependencies, max_mcv_items=200);
+ERROR: option 'mcv' is required by other options(s)
+-- invalid mcv_max_items value / too low
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10);
+ERROR: max number of MCV items must be at least 128
+-- invalid mcv_max_items value / too high
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10000);
+ERROR: max number of MCV items is 8192
+-- correct command
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s2 ON mcv_list (a, b, c) WITH (mcv);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s3 ON mcv_list (a, b, c, d) WITH (mcv);
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1200
+(1 row)
+
+DROP TABLE mcv_list;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2e2df8e..ac5007e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1369,7 +1369,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
c.relname AS tablename,
s.stakeys AS attnums,
length(s.stadeps) AS depsbytes,
- pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
+ length(s.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 81484f1..838c12b 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,4 +112,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies
+test: mv_dependencies mv_mcv
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 14ea574..d97a0ec 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -162,3 +162,4 @@ test: xml
test: event_trigger
test: stats
test: mv_dependencies
+test: mv_mcv
diff --git a/src/test/regress/sql/mv_mcv.sql b/src/test/regress/sql/mv_mcv.sql
new file mode 100644
index 0000000..af4c9f4
--- /dev/null
+++ b/src/test/regress/sql/mv_mcv.sql
@@ -0,0 +1,178 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s1 ON mcv_list (unknown_column) WITH (mcv);
+
+-- single column
+CREATE STATISTICS s1 ON mcv_list (a) WITH (mcv);
+
+-- single column, duplicated
+CREATE STATISTICS s1 ON mcv_list (a, a) WITH (mcv);
+
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON mcv_list (a, a, b) WITH (mcv);
+
+-- unknown option
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (unknown_option);
+
+-- missing MCV statistics
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (dependencies, max_mcv_items=200);
+
+-- invalid mcv_max_items value / too low
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10);
+
+-- invalid mcv_max_items value / too high
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10000);
+
+-- correct command
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+
+DROP TABLE mcv_list;
+
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s2 ON mcv_list (a, b, c) WITH (mcv);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mcv_list;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s3 ON mcv_list (a, b, c, d) WITH (mcv);
+
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+DROP TABLE mcv_list;
--
2.1.0
Attachment: 0005-multivariate-histograms.patch (application/x-patch)
From ff5b8b94fc19654a7fe98b0701d89af668388313 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 20:18:24 +0100
Subject: [PATCH 5/7] multivariate histograms
- extends the pg_mv_statistic catalog (add 'hist' fields)
- building the histograms during ANALYZE
- simple estimation while planning the queries
Includes regression tests mostly equal to those for functional
dependencies / MCV lists.
---
doc/src/sgml/ref/create_statistics.sgml | 18 +
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/statscmds.c | 44 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 718 ++++++++-
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/common.c | 37 +-
src/backend/utils/mvstats/histogram.c | 2316 ++++++++++++++++++++++++++++
src/bin/psql/describe.c | 17 +-
src/include/catalog/pg_mv_statistic.h | 24 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 136 +-
src/test/regress/expected/mv_histogram.out | 207 +++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_histogram.sql | 176 +++
19 files changed, 3680 insertions(+), 38 deletions(-)
create mode 100644 src/backend/utils/mvstats/histogram.c
create mode 100644 src/test/regress/expected/mv_histogram.out
create mode 100644 src/test/regress/sql/mv_histogram.sql
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index 193e4b0..fd3382e 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -133,6 +133,24 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</varlistentry>
<varlistentry>
+ <term><literal>histogram</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables histogram for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>max_buckets</> (<type>integer</>)</term>
+ <listitem>
+ <para>
+ Maximum number of histogram buckets.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>max_mcv_items</> (<type>integer</>)</term>
<listitem>
<para>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5488061..0fbdfa5 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -167,7 +167,9 @@ CREATE VIEW pg_mv_stats AS
length(S.stadeps) as depsbytes,
pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
length(S.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo,
+ length(S.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 90bfaed..b974655 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -137,12 +137,15 @@ CreateStatistics(CreateStatsStmt *stmt)
/* by default build nothing */
bool build_dependencies = false,
- build_mcv = false;
+ build_mcv = false,
+ build_histogram = false;
- int32 max_mcv_items = -1;
+ int32 max_buckets = -1,
+ max_mcv_items = -1;
/* options required because of other options */
- bool require_mcv = false;
+ bool require_mcv = false,
+ require_histogram = false;
Assert(IsA(stmt, CreateStatsStmt));
@@ -241,6 +244,29 @@ CreateStatistics(CreateStatsStmt *stmt)
MVSTAT_MCVLIST_MAX_ITEMS)));
}
+ else if (strcmp(opt->defname, "histogram") == 0)
+ build_histogram = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_buckets") == 0)
+ {
+ max_buckets = defGetInt32(opt);
+
+ /* this option requires 'histogram' to be enabled */
+ require_histogram = true;
+
+ /* sanity check */
+ if (max_buckets < MVSTAT_HIST_MIN_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is %d",
+ MVSTAT_HIST_MIN_BUCKETS)));
+
+ else if (max_buckets > MVSTAT_HIST_MAX_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("maximum number of buckets is %d",
+ MVSTAT_HIST_MAX_BUCKETS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -249,10 +275,10 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv))
+ if (! (build_dependencies || build_mcv || build_histogram))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -260,6 +286,11 @@ CreateStatistics(CreateStatsStmt *stmt)
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("option 'mcv' is required by other options(s)")));
+ if (require_histogram && (! build_histogram))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option 'histogram' is required by other options(s)")));
+
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
stakeys = buildint2vector(attnums, numcols);
@@ -279,11 +310,14 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+ values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
+ values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
+ nulls[Anum_pg_mv_statistic_stahist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 9e029ef..0edc839 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1949,10 +1949,12 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
WRITE_BOOL_FIELD(mcv_enabled);
+ WRITE_BOOL_FIELD(hist_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
WRITE_BOOL_FIELD(mcv_built);
+ WRITE_BOOL_FIELD(hist_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index d194551..5b2d92a 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -49,6 +49,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
+#define MV_CLAUSE_TYPE_HIST 0x04
static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
@@ -73,6 +74,8 @@ static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
List *clauses, MVStatisticInfo *mvstats,
bool *fullmatch, Selectivity *lowsel);
+static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -80,6 +83,12 @@ static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
Selectivity *lowsel, bool *fullmatch,
bool is_or);
+static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, List *clauses,
@@ -114,6 +123,7 @@ static Bitmapset * get_varattnos(Node * node, Index relid);
#define UPDATE_RESULT(m,r,isor) \
(m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -304,7 +314,7 @@ clauselist_selectivity(PlannerInfo *root,
* Check that there are statistics with MCV list. If not, we don't
* need to waste time with the optimization.
*/
- if (has_stats(stats, MV_CLAUSE_TYPE_MCV))
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
{
/*
* Recollect attributes from mv-compatible clauses (maybe we've
@@ -312,7 +322,7 @@ clauselist_selectivity(PlannerInfo *root,
* From now on we're only interested in MCV-compatible clauses.
*/
mvattnums = collect_mv_attnums(root, clauses, varRelid, &relid, sjinfo,
- MV_CLAUSE_TYPE_MCV);
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
/*
* If there still are at least two columns, we'll try to select
@@ -331,7 +341,7 @@ clauselist_selectivity(PlannerInfo *root,
/* split the clauselist into regular and mv-clauses */
clauses = clauselist_mv_split(root, sjinfo, clauses,
varRelid, &mvclauses, mvstat,
- MV_CLAUSE_TYPE_MCV);
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
/* we've chosen the histogram to match the clauses */
Assert(mvclauses != NIL);
@@ -1098,6 +1108,7 @@ static Selectivity
clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
{
bool fullmatch = false;
+ Selectivity s1 = 0.0, s2 = 0.0;
/*
* Lowest frequency in the MCV list (may be used as an upper bound
@@ -1111,9 +1122,24 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
* MCV/histogram evaluation).
*/
- /* Evaluate the MCV selectivity */
- return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ /* Evaluate the MCV first. */
+ s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
&fullmatch, &mcv_low);
+
+ /*
+ * If we got a full equality match on the MCV list, we're done (and
+ * the estimate is pretty good).
+ */
+ if (fullmatch && (s1 > 0.0))
+ return s1;
+
+ /* FIXME if (fullmatch) without matching MCV item, use the mcv_low
+ * selectivity as upper bound */
+
+ s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+
+ /* TODO clamp to <= 1.0 (or more strictly, when possible) */
+ return s1 + s2;
}
/*
@@ -1255,7 +1281,7 @@ choose_mv_statistics(List *stats, Bitmapset *attnums)
int numattrs = attrs->dim1;
/* skip dependencies-only stats */
- if (! info->mcv_built)
+ if (! (info->mcv_built || info->hist_built))
continue;
/* count columns covered by the histogram */
@@ -1415,7 +1441,6 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
bool ok;
/* is it 'variable op constant' ? */
-
ok = (bms_membership(clause_relids) == BMS_SINGLETON) &&
(is_pseudo_constant_clause_relids(lsecond(expr->args),
right_relids) ||
@@ -1465,10 +1490,10 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
case F_SCALARLTSEL:
case F_SCALARGTSEL:
/* not compatible with functional dependencies */
- if (types & MV_CLAUSE_TYPE_MCV)
+ if (types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
{
*attnums = bms_add_member(*attnums, var->varattno);
- return (types & MV_CLAUSE_TYPE_MCV);
+ return (types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
}
return false;
@@ -1796,6 +1821,9 @@ has_stats(List *stats, int type)
if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
return true;
+
+ if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
+ return true;
}
return false;
@@ -2612,3 +2640,675 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
return nmatches;
}
+
+/*
+ * Estimate selectivity of clauses using a histogram.
+ *
+ * If there's no histogram for the stats, the function returns 0.0.
+ *
+ * The general idea of this method is similar to how MCV lists are
+ * processed, except that this introduces the concept of a partial
+ * match (MCV only works with full match / mismatch).
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all buckets as 'full match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the buckets
+ * 4) skip buckets that are already 'no match'
+ * 5) check clause for buckets that still match (at least partially)
+ * 6) sum frequencies for buckets to get selectivity
+ *
+ * Unlike MCV lists, histograms have a concept of a partial match. In
+ * that case we use 1/2 the bucket, to minimize the average error. The
+ * MV histograms are usually less detailed than the per-column ones,
+ * meaning the sum is often quite high (thanks to combining a lot of
+ * "partially hit" buckets).
+ *
+ * Maybe we could use per-bucket information with number of distinct
+ * values it contains (for each dimension), and then use that to correct
+ * the estimate (so with 10 distinct values, we'd use 1/10 of the bucket
+ * frequency). We might also scale the value depending on the actual
+ * ndistinct estimate (not just the values observed in the sample).
+ *
+ * Another option would be to multiply the selectivities, i.e. if we get
+ * 'partial match' for a bucket for multiple conditions, we might use
+ * 0.5^k (where k is the number of conditions), instead of 0.5. This
+ * probably does not minimize the average error, though.
+ *
+ * TODO This might use a similar shortcut to MCV lists - count buckets
+ * marked as partial/full match, and terminate once this drops to 0.
+ * Not sure if it's really worth it - for MCV lists a situation like
+ * this is not uncommon, but for histograms it's not that clear.
+ */
+static Selectivity
+clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats)
+{
+ int i;
+ Selectivity s = 0.0;
+
+ int nmatches = 0;
+ char *matches = NULL;
+
+ MVSerializedHistogram mvhist = NULL;
+
+ /* there's no histogram */
+ if (! mvstats->hist_built)
+ return 0.0;
+
+ /* the histogram was built (per the hist_built check above), so load it */
+ mvhist = load_mv_histogram(mvstats->mvoid);
+
+ Assert (mvhist != NULL);
+ Assert (clauses != NIL);
+ Assert (list_length(clauses) >= 2);
+
+ /*
+ * Bitmap of bucket matches (mismatch, partial, full). By default
+ * all buckets fully match (and we'll gradually eliminate them).
+ */
+ matches = palloc0(sizeof(char) * mvhist->nbuckets);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+
+ nmatches = mvhist->nbuckets;
+
+ /* build the match bitmap */
+ update_match_bitmap_histogram(root, clauses,
+ mvstats->stakeys, mvhist,
+ nmatches, matches, false);
+
+ /*
+ * Now walk through the buckets and sum the selectivities. The bucket
+ * frequencies are already stated as fractions of the whole sample
+ * (see build_mv_histogram), so this correctly accounts for the part
+ * of the data that went into the MCV list instead of the histogram,
+ * and no further scaling is necessary.
+ *
+ * TODO This might be handled by keeping a global "frequency" for
+ * the whole histogram, which might save us some time spent
+ * accessing the not-matching part of the histogram. Although
+ * it's likely in a cache, so it's very fast.
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ if (matches[i] == MVSTATS_MATCH_FULL)
+ s += mvhist->buckets[i]->ntuples;
+ else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+ s += 0.5 * mvhist->buckets[i]->ntuples;
+ }
+
+ /* release the allocated bitmap and deserialized histogram */
+ pfree(matches);
+ pfree(mvhist);
+
+ return s;
+}
+
+/*
+ * Evaluate clauses using the histogram, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * Note: This is not a simple bitmap in the sense that there are more
+ * than two possible values for each item - no match, partial
+ * match and full match. So we need 2 bits per item.
+ *
+ * TODO This works with 'bitmap' where each item is represented as a
+ * char, which is slightly wasteful. Instead, we could use a bitmap
+ * with 2 bits per item, reducing the size to ~1/4. By using values
+ * 0, 1 and 3 (instead of 0, 1 and 2), the operations (merging etc.)
+ * might be performed just like for simple bitmap by using & and |,
+ * which might be faster than min/max.
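+ *
+ * A quick sketch of that encoding (an idea only, not what the
+ * code below implements): with NONE = 0x00, PARTIAL = 0x01 and
+ * FULL = 0x03, AND-merge becomes (a & b) and OR-merge (a | b),
+ * e.g. FULL & PARTIAL = PARTIAL and FULL | PARTIAL = FULL.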
+ */
+static int
+update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ /*
+ * Cache of function call results, so that we only evaluate each
+ * deduplicated value once per clause.
+ *
+ * There may be up to (2 * nbuckets) distinct values per dimension.
+ * It's probably overkill, but let's allocate that once for all
+ * clauses, to minimize the overhead.
+ *
+ * Also, we only need two bits per value, but this allocates a full
+ * byte per value. Might be worth optimizing.
+ *
+ * 0x00 - not yet called
+ * 0x01 - called, result is 'false'
+ * 0x03 - called, result is 'true'
+ *
+ * This way (cached & 0x02) extracts the result, and (cached != 0x00)
+ * says whether the call happened at all.
+ */
+ char *callcache = palloc(2 * mvhist->nbuckets);
+
+ Assert(mvhist != NULL);
+ Assert(mvhist->nbuckets > 0);
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mvhist->nbuckets);
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+
+ /* loop through the clauses and do the estimation */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* it's either OpClause, or NullTest */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ FmgrInfo opproc; /* operator */
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ /* reset the call cache (per clause) */
+ memset(callcache, 0, 2 * mvhist->nbuckets);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+ FmgrInfo ltproc;
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ /*
+ * TODO Fetch only when really needed (probably for equality only)
+ *
+ * TODO Technically either lt/gt is sufficient.
+ *
+ * FIXME The code in analyze.c creates histograms only for types
+ * with enough ordering (by calling get_sort_group_operators).
+ * Is this the same assumption, i.e. are we certain that we
+ * get the ltproc/gtproc every time we ask? Or are there types
+ * where get_sort_group_operators returns ltopr and here we
+ * get nothing?
+ */
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_EQ_OPR | TYPECACHE_LT_OPR
+ | TYPECACHE_GT_OPR);
+
+ /* lookup dimension for the attribute */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), &ltproc);
+
+ /*
+ * Check this for all buckets that still have "true" in the bitmap
+ *
+ * We already know the clauses use suitable operators (because that's
+ * how we filtered them).
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ bool tmp;
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /* histogram boundaries */
+ Datum minval, maxval;
+
+ /* values from the call cache */
+ char mincached, maxcached;
+
+ /*
+ * For AND-lists, we can also mark NULL buckets as 'no match'
+ * (and then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && bucket->nullsonly[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match).
+ * We can't really do anything about the MATCH_PARTIAL buckets.
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* lookup the values and cache of function calls */
+ minval = mvhist->values[idx][bucket->min[idx]];
+ maxval = mvhist->values[idx][bucket->max[idx]];
+
+ mincached = callcache[bucket->min[idx]];
+ maxcached = callcache[bucket->max[idx]];
+
+ /*
+ * TODO Maybe it's possible to add here a similar optimization
+ * as for the MCV lists:
+ *
+ * (nmatches == 0) && AND-list => all eliminated (FALSE)
+ * (nmatches == N) && OR-list => all eliminated (TRUE)
+ *
+ * But it's more complex because of the partial matches.
+ */
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ *
+ * TODO I'm really unsure whether the handling of the 'isgt' flag (i.e.
+ * clauses with variable and constant in reverse order) is correct. I wouldn't
+ * be surprised if there was some mixup. Using the lt/gt operators
+ * instead of messing with the opproc could make it simpler.
+ * It would however be using a different operator than the query,
+ * although it's not any shadier than using the selectivity function
+ * as is done currently.
+ *
+ * FIXME Once the min/max values are deduplicated, we can easily minimize
+ * the number of calls to the comparator (assuming we keep the
+ * deduplicated structure). See the note on compression at MVBucket
+ * serialize/deserialize methods.
+ */
+ switch (oprrest)
+ {
+ case F_SCALARLTSEL: /* column < constant */
+
+ if (! isgt) /* (var < const) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ minval));
+
+ /*
+ * Update the cache, but with the inverse value, as we keep the
+ * cache for calls with (minval, constvalue).
+ */
+ callcache[bucket->min[idx]] = (tmp) ? 0x01 : 0x03;
+ }
+ else
+ tmp = !(mincached & 0x02); /* get call result from the cache (inverse) */
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the upper boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ maxval));
+
+ /*
+ * Update the cache, but with the inverse value, as we keep the
+ * cache for calls with (minval, constvalue).
+ */
+ callcache[bucket->max[idx]] = (tmp) ? 0x01 : 0x03;
+ }
+ else
+ tmp = !(maxcached & 0x02); /* extract the result (reverse) */
+
+ if (tmp) /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+
+ }
+ else /* (const < var) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ maxval,
+ cst->constvalue));
+
+ /* Update the cache. */
+ callcache[bucket->max[idx]] = (tmp) ? 0x03 : 0x01;
+ }
+ else
+ tmp = (maxcached & 0x02); /* extract the result */
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the lower boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ minval,
+ cst->constvalue));
+
+ /* Update the cache. */
+ callcache[bucket->min[idx]] = (tmp) ? 0x03 : 0x01;
+ }
+ else
+ tmp = (mincached & 0x02); /* extract the result */
+
+ if (tmp) /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ }
+ break;
+
+ case F_SCALARGTSEL: /* column > constant */
+
+ if (! isgt) /* (var > const) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ maxval));
+
+ /*
+ * Update the cache, but with the inverse value, as we keep the
+ * cache for calls with (val, constvalue).
+ */
+ callcache[bucket->max[idx]] = (tmp) ? 0x01 : 0x03;
+ }
+ else
+ tmp = !(maxcached & 0x02); /* extract the result */
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the lower boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ minval));
+
+ /*
+ * Update the cache, but with the inverse value, as we keep the
+ * cache for calls with (val, constvalue).
+ */
+ callcache[bucket->min[idx]] = (tmp) ? 0x01 : 0x03;
+ }
+ else
+ tmp = !(mincached & 0x02); /* extract the result */
+
+ if (tmp)
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ }
+ else /* (const > var) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in
+ * that case we can skip the bucket, because there's no overlap).
+ */
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ minval,
+ cst->constvalue));
+
+ /* Update the cache. */
+ callcache[bucket->min[idx]] = (tmp) ? 0x03 : 0x01;
+ }
+ else
+ tmp = (mincached & 0x02); /* extract the result */
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the upper boundary is below the constant (in that
+ * case it's a partial match).
+ */
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ maxval,
+ cst->constvalue));
+
+ /* Update the cache. */
+ callcache[bucket->max[idx]] = (tmp) ? 0x03 : 0x01;
+ }
+ else
+ tmp = (maxcached & 0x02); /* extract the result */
+
+ if (tmp)
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ }
+ break;
+
+ case F_EQSEL:
+
+ /*
+ * We only check whether the value is within the bucket, using the lt/gt
+ * operators fetched from type cache.
+ *
+ * TODO We'll use the default 50% estimate, but that's probably way off
+ * if there are multiple distinct values. Consider tweaking this
+ * somehow, e.g. using only a part inversely proportional to the
+ * estimated number of distinct values in the bucket.
+ *
+ * TODO This does not handle inclusion flags at the moment, thus counting
+ * some buckets twice (when hitting the boundary).
+ *
+ * TODO One possible optimization: if max[i] == min[i], it's effectively an MCV
+ * item and we can count the whole bucket as a complete match (thus
+ * using 100% bucket selectivity and not just 50%).
+ *
+ * TODO Technically some buckets may "degenerate" into single-value
+ * buckets (not necessarily for all the dimensions) - maybe this
+ * is better than keeping a separate MCV list (multi-dimensional).
+ * Update: Actually, that's unlikely to be better than a separate
+ * MCV list for two reasons - first, it requires ~2x the space
+ * (because of storing lower/upper boundaries) and second because
+ * the buckets are ranges - depending on the partitioning algorithm
+ * it may not even degenerate into a (min=max) bucket. For example,
+ * the current partitioning algorithm never does that.
+ */
+ if (! mincached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&ltproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ minval));
+
+ /* Update the cache. */
+ callcache[bucket->min[idx]] = (tmp) ? 0x03 : 0x01;
+ }
+ else
+ tmp = (mincached & 0x02); /* extract the result */
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ if (! maxcached)
+ {
+ tmp = DatumGetBool(FunctionCall2Coll(&ltproc,
+ DEFAULT_COLLATION_OID,
+ maxval,
+ cst->constvalue));
+
+ /* Update the cache. */
+ callcache[bucket->max[idx]] = (tmp) ? 0x03 : 0x01;
+ }
+ else
+ tmp = (maxcached & 0x02); /* extract the result */
+
+ if (tmp)
+ {
+ /* no match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ continue;
+ }
+
+ /* partial match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+
+ break;
+ }
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the buckets and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining buckets that might possibly match.
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match)
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* if the clause mismatches the bucket, mark it as MATCH_NONE */
+ if ((expr->nulltesttype == IS_NULL)
+ && (! bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+
+ else if ((expr->nulltesttype == IS_NOT_NULL) &&
+ (bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each bucket */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching buckets */
+ or_nmatches = mvhist->nbuckets;
+
+ /* by default none of the buckets matches the clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the OR-clauses */
+ or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ stakeys, mvhist,
+ or_nmatches, or_matches, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+
+ /* free the call cache */
+ pfree(callcache);
+
+#ifdef DEBUG_MVHIST
+ debug_histogram_matches(mvhist, matches);
+#endif
+
+ return nmatches;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 0cb4063..963d26e 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -410,7 +410,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built || mvstat->mcv_built)
+ if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built)
{
info = makeNode(MVStatisticInfo);
@@ -420,10 +420,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
info->mcv_enabled = mvstat->mcv_enabled;
+ info->hist_enabled = mvstat->hist_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
info->mcv_built = mvstat->mcv_built;
+ info->hist_built = mvstat->hist_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index f9bf10c..9dbb3b6 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o mcv.o
+OBJS = common.o dependencies.o histogram.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index d1da714..ffb76f4 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -13,11 +13,11 @@
*
*-------------------------------------------------------------------------
*/
+#include "postgres.h"
+#include "utils/array.h"
#include "common.h"
-#include "utils/array.h"
-
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
int natts,
VacAttrStats **vacattrstats);
@@ -52,7 +52,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
- int numrows_filtered = 0;
+ MVHistogram histogram = NULL;
+ int numrows_filtered = numrows;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -95,8 +96,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+ /* build a multivariate histogram on the columns */
+ if ((numrows_filtered > 0) && (stat->hist_enabled))
+ histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
+ update_mv_stats(stat->mvoid, deps, mcvlist, histogram, attrs, stats);
}
}
@@ -176,6 +181,8 @@ list_mv_stats(Oid relid)
info->deps_built = stats->deps_built;
info->mcv_enabled = stats->mcv_enabled;
info->mcv_built = stats->mcv_built;
+ info->hist_enabled = stats->hist_enabled;
+ info->hist_built = stats->hist_built;
result = lappend(result, info);
}
@@ -190,7 +197,6 @@ list_mv_stats(Oid relid)
return result;
}
-
/*
* Find attnims of MV stats using the mvoid.
*/
@@ -236,9 +242,16 @@ find_mv_attnums(Oid mvoid, Oid *relid)
}
+/*
+ * FIXME This adds statistics, but we need to drop statistics when the
+ * table is dropped. Not sure what to do when a column is dropped.
+ * Either we can (a) remove all stats on that column, (b) remove
+ * the column from defined stats and force rebuild, (c) remove the
+ * column on next ANALYZE. Or maybe something else?
+ */
void
update_mv_stats(Oid mvoid,
- MVDependencies dependencies, MCVList mcvlist,
+ MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
@@ -271,22 +284,34 @@ update_mv_stats(Oid mvoid,
values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
}
+ if (histogram != NULL)
+ {
+ bytea * data = serialize_mv_histogram(histogram, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stahist-1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stahist - 1]
+ = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
+ replaces[Anum_pg_mv_statistic_stahist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
+ nulls[Anum_pg_mv_statistic_hist_built-1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_hist_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
+ values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
diff --git a/src/backend/utils/mvstats/histogram.c b/src/backend/utils/mvstats/histogram.c
new file mode 100644
index 0000000..933700f
--- /dev/null
+++ b/src/backend/utils/mvstats/histogram.c
@@ -0,0 +1,2316 @@
+/*-------------------------------------------------------------------------
+ *
+ * histogram.c
+ * POSTGRES multivariate histograms
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/histogram.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+#include <math.h>
+
+/*
+ * Multivariate histograms
+ * -----------------------
+ *
+ * Histograms are a collection of buckets, represented by n-dimensional
+ * rectangles. Each rectangle is delimited by a min/max value in each
+ * dimension, stored in an array, so that the bucket includes values
+ * fulfilling condition
+ *
+ * min[i] <= value[i] <= max[i]
+ *
+ * where 'i' is the dimension. In 1D this corresponds to a simple
+ * interval, in 2D to a rectangle, and in 3D to a block. If you can
+ * imagine this in 4D, congrats!
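+ *
+ * For example (hypothetical values), a 2-D bucket with min = (1, 10)
+ * and max = (5, 20) matches the value (3, 15), but not (3, 25) or
+ * (7, 15) - the condition has to hold in every dimension.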
+ *
+ * In addition to the boundaries, each bucket tracks additional details:
+ *
+ * * frequency (fraction of tuples it matches)
+ * * whether the boundaries are inclusive or exclusive
+ * * whether the dimension contains only NULL values
+ * * number of distinct values in each dimension (for building)
+ *
+ * and possibly some additional information.
+ *
+ * We do expect to support multiple histogram types, with different
+ * features etc. The 'type' field is used to identify those types.
+ * Technically some histogram types might use completely different
+ * bucket representation, but that's not expected at the moment.
+ *
+ * Although the current implementation builds non-overlapping buckets,
+ * the code does not (and should not) rely on the non-overlapping
+ * nature - there are interesting types of histograms / histogram
+ * building algorithms producing overlapping buckets.
+ *
+ *
+ * NULL handling (create_null_buckets)
+ * -----------------------------------
+ * Another thing worth mentioning is handling of NULL values. It would
+ * be quite difficult to work with buckets containing NULL and non-NULL
+ * values for a single dimension. To work around this, the initial step
+ * in building a histogram is building a set of 'NULL-buckets', i.e.
+ * buckets with one or more NULL-only dimensions.
+ *
+ * After that, no buckets are mixing NULL and non-NULL values in one
+ * dimension, and the actual histogram building starts. As that only
+ * splits the buckets into smaller ones, the resulting buckets can't
+ * mix NULL and non-NULL values either.
+ *
+ * The maximum number of NULL-buckets is determined by the number of
+ * attributes the histogram is built on. For N-dimensional histogram,
+ * the maximum number of NULL-buckets is 2^N. So for 8 attributes
+ * (which is the current value of MVSTATS_MAX_DIMENSIONS), there may be
+ * up to 256 NULL-buckets.
+ *
+ * Those buckets are only built if needed - if there are no NULL values
+ * in the data, no such buckets are built.
+ *
+ *
+ * Estimating selectivity
+ * ----------------------
+ * With histograms, we always "match" a whole bucket, not individual
+ * rows (or values), irrespective of the type of clause. Therefore we
+ * can't use the optimizations for equality clauses, as in MCV lists.
+ *
+ * The current implementation uses histograms to estimate these types
+ * of clauses (think of WHERE conditions):
+ *
+ * (a) equality clauses WHERE (a = 1) AND (b = 2)
+ * (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ * (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ * (d) OR-clauses WHERE (a = 1) OR (b = 2)
+ *
+ * It's possible to add more clauses, for example:
+ *
+ * (e) multi-var clauses WHERE (a > b)
+ *
+ * and so on. These are tasks for the future, not yet implemented.
+ *
+ * When used on low-cardinality data, histograms usually perform
+ * considerably worse than MCV lists (which are a good fit for this
+ * kind of data). This is especially true on categorical data, where
+ * ordering of the values is mostly unrelated to meaning of the data,
+ * as proper ordering is crucial for histograms.
+ *
+ * On high-cardinality data the histograms are usually a better choice,
+ * because MCV lists can't represent the distribution accurately enough.
+ *
+ * By evaluating a clause on a bucket, we may get one of three results:
+ *
+ * (a) FULL_MATCH - The bucket definitely matches the clause.
+ *
+ * (b) PARTIAL_MATCH - The bucket matches the clause, but not
+ * necessarily all the tuples it represents.
+ *
+ * (c) NO_MATCH - The bucket definitely does not match the clause.
+ *
+ * This may be illustrated using a range [1, 5], which is essentially
+ * a 1-D bucket. With clause
+ *
+ * WHERE (a < 10) => FULL_MATCH (all range values are below
+ * 10, so the whole bucket matches)
+ *
+ * WHERE (a < 3) => PARTIAL_MATCH (there may be values matching
+ * the clause, but we don't know how many)
+ *
+ * WHERE (a < 0) => NO_MATCH (the whole range is above 0, so
+ * no values from the bucket can match)
+ *
+ * Some clauses may produce only some of those results - for example
+ * equality clauses may never produce FULL_MATCH as we always hit only
+ * part of the bucket (we can't match both boundaries at the same time).
+ * This results in less accurate estimates compared to MCV lists, where
+ * we can hit an MCV item exactly (there's no PARTIAL match in MCV).
+ *
+ * There are clauses that may not produce any PARTIAL_MATCH results.
+ * A nice example of that is 'IS [NOT] NULL' clause, which either
+ * matches the bucket completely (FULL_MATCH) or not at all (NO_MATCH),
+ * thanks to how the NULL-buckets are constructed.
+ *
+ * Computing the total selectivity estimate is trivial - simply sum
+ * selectivities from all the FULL_MATCH and PARTIAL_MATCH buckets (but
+ * multiply the PARTIAL_MATCH buckets by 0.5 to minimize average error).
+ *
+ *
+ * Serialization
+ * -------------
+ * After building, the histogram is serialized into a more efficient
+ * form (dedup boundary values etc.). See serialize_mv_histogram() for
+ * more details about how it's done.
+ *
+ * Serialized histograms are marked with 'magic' constant, to make it
+ * easier to check the bytea value really is a serialized histogram.
+ *
+ * In the serialized form, values for each dimension are deduplicated,
+ * and referenced using an uint16 index. This saves a lot of space,
+ * because every time we split a bucket, we introduce a single new
+ * boundary value (to split the bucket by the selected dimension), but
+ * we actually copy all the boundary values for all dimensions. So for
+ * a histogram with 4 dimensions and 1000 buckets, we do have
+ *
+ * 1000 * 4 * 2 = 8000
+ *
+ * boundary values, but many of them are actually duplicated because
+ * the histogram started with a single bucket (8 boundary values) and
+ * then there were 999 splits (each introducing 1 new value):
+ *
+ * 8 + 999 = 1007
+ *
+ * So that's quite a large difference. Let's assume the Datum values are
+ * 8 bytes each. Storing the raw histogram would take ~ 64 kB, while
+ * with deduplication it's only ~18 kB.
+ *
+ * The difference may be removed by the transparent bytea compression,
+ * but the deduplication is also used to optimize the estimation. It's
+ * possible to process the deduplicated values, and then use this as
+ * a cache to minimize the actual function calls while checking the
+ * buckets. This significantly reduces the number of calls to the
+ * (often quite expensive) operator functions etc.
+ *
+ *
+ * The current limit on number of buckets (16384) is mostly arbitrary,
+ * but set so that it makes sure we don't exceed the number of distinct
+ * values indexable by uint16. In practice we could handle more buckets,
+ * because we index each dimension independently, and we do the splits
+ * over multiple dimensions.
+ *
+ * Histograms with more than 16k buckets are quite expensive to build
+ * and process, so the current limit is somewhat reasonable.
+ *
+ * The actual number of buckets is also related to statistics target,
+ * because we require MIN_BUCKET_ROWS (10) tuples per bucket before
+ * a split, so we can't have more than (2 * 300 * target / 10) buckets.
+ *
+ *
+ * TODO Maybe the distinct stats (both for combination of all columns
+ * and for combinations of various subsets of columns) should be
+ * moved to a separate structure (next to histogram/MCV/...) to
+ * make it useful even without a histogram computed etc.
+ *
+ * This would actually make mvcoeff (proposed by Kyotaro Horiguchi
+ * in [1]) possible. Seems like a good way to estimate GROUP BY
+ * cardinality, and also some other cases, pointed out by Kyotaro:
+ *
+ * [1] http://www.postgresql.org/message-id/20150515.152936.83796179.horiguchi.kyotaro@lab.ntt.co.jp
+ *
+ * This is not implemented at the moment, though. Also, Kyotaro's
+ * patch only works with pairs of columns, but maybe tracking all
+ * the combinations would be useful to handle more complex
+ * conditions. It only seems to handle equalities, though (but for
+ * GROUP BY estimation that's not a big deal).
+ */
+
+static MVBucket create_initial_mv_bucket(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+static MVBucket select_bucket_to_partition(int nbuckets, MVBucket * buckets);
+
+static MVBucket partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues);
+
+static MVBucket copy_mv_bucket(MVBucket bucket, uint32 ndimensions);
+
+static void update_bucket_ndistinct(MVBucket bucket, int2vector *attrs,
+ VacAttrStats ** stats);
+
+static void update_dimension_ndistinct(MVBucket bucket, int dimension,
+ int2vector *attrs,
+ VacAttrStats ** stats,
+ bool update_boundaries);
+
+static void create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats);
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Each serialized bucket needs to store (in this order):
+ *
+ * - number of tuples (float)
+ * - min inclusive flags (ndim * sizeof(bool))
+ * - max inclusive flags (ndim * sizeof(bool))
+ * - null dimension flags (ndim * sizeof(bool))
+ * - min boundary indexes (ndim * sizeof(uint16))
+ * - max boundary indexes (ndim * sizeof(uint16))
+ *
+ * The macro currently reserves twice the space actually needed for
+ * the uint16 indexes (probably a leftover from 32-bit indexes), so
+ * in total:
+ *
+ * ndims * (4 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float)
+ */
+#define BUCKET_SIZE(ndims) \
+ (ndims * (4 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float))
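+
+/*
+ * For example, with ndims = 4 (and the usual 2-byte uint16, 1-byte
+ * bool and 4-byte float) this works out to
+ *
+ * 4 * (4 * 2 + 3 * 1) + 4 = 48 bytes per serialized bucket
+ */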
+
+/* pointers into a flat serialized bucket of BUCKET_SIZE(n) bytes */
+#define BUCKET_NTUPLES(b) ((float*)b)
+#define BUCKET_MIN_INCL(b,n) ((bool*)(b + sizeof(float)))
+#define BUCKET_MAX_INCL(b,n) (BUCKET_MIN_INCL(b,n) + n)
+#define BUCKET_NULLS_ONLY(b,n) (BUCKET_MAX_INCL(b,n) + n)
+#define BUCKET_MIN_INDEXES(b,n) ((uint16*)(BUCKET_NULLS_ONLY(b,n) + n))
+#define BUCKET_MAX_INDEXES(b,n) ((BUCKET_MIN_INDEXES(b,n) + n))
+
+/* can't split bucket with less than 10 rows */
+#define MIN_BUCKET_ROWS 10
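+
+/*
+ * For illustration, using the (2 * 300 * target / 10) bound mentioned
+ * above: with the default statistics target of 100 the sample has
+ * 300 * 100 = 30000 rows, giving at most 6000 buckets - safely below
+ * MVSTAT_HIST_MAX_BUCKETS.
+ */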
+
+/*
+ * Data used while building the histogram.
+ */
+typedef struct HistogramBuildData {
+
+ float ndistinct; /* frequency of distinct values */
+
+ HeapTuple *rows; /* array of sample rows */
+ uint32 numrows; /* number of sample rows (array size) */
+
+ /*
+ * Number of distinct values in each dimension. This is used when
+ * building the histogram (and is not serialized/deserialized).
+ */
+ uint32 *ndistincts;
+
+} HistogramBuildData;
+
+typedef HistogramBuildData *HistogramBuild;
+
+/*
+ * Building a multivariate histogram. In short, this first creates a
+ * single bucket containing all the rows, and then repeatedly splits it,
+ * each time searching for the bucket / dimension most in need of a split.
+ *
+ * The current criteria is rather simple, chosen so that the algorithm
+ * produces buckets with about equal frequency and regular size.
+ *
+ * See the discussion at select_bucket_to_partition and partition_bucket
+ * for more details about the algorithm.
+ *
+ * The current algorithm works like this:
+ *
+ * build NULL-buckets (create_null_buckets)
+ *
+ * while [not reaching maximum number of buckets]
+ *
+ * choose bucket to partition (largest bucket)
+ * if no bucket to partition
+ * terminate the algorithm
+ *
+ * choose bucket dimension to partition (largest dimension)
+ * split the bucket into two buckets
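+ *
+ * Since each iteration adds exactly one bucket, a histogram with M
+ * buckets is the result of (M - 1) splits (plus the NULL-buckets
+ * created up front), bounded by MVSTAT_HIST_MAX_BUCKETS.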
+ */
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ int *ndistvalues;
+ Datum **distvalues;
+
+ MVHistogram histogram = (MVHistogram)palloc0(sizeof(MVHistogramData));
+
+ HeapTuple * rows_copy = (HeapTuple*)palloc0(numrows * sizeof(HeapTuple));
+ memcpy(rows_copy, rows, sizeof(HeapTuple) * numrows);
+
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ histogram->ndimensions = numattrs;
+
+ histogram->magic = MVSTAT_HIST_MAGIC;
+ histogram->type = MVSTAT_HIST_TYPE_BASIC;
+ histogram->nbuckets = 1;
+
+ /* create max buckets (better than repalloc for short-lived objects) */
+ histogram->buckets
+ = (MVBucket*)palloc0(MVSTAT_HIST_MAX_BUCKETS * sizeof(MVBucket));
+
+ /* create the initial bucket, covering the whole sample set */
+ histogram->buckets[0]
+ = create_initial_mv_bucket(numrows, rows_copy, attrs, stats);
+
+ /*
+ * Collect info on distinct values in each dimension (used later
+ * to select dimension to partition).
+ */
+ ndistvalues = (int*)palloc0(sizeof(int) * numattrs);
+ distvalues = (Datum**)palloc0(sizeof(Datum*) * numattrs);
+
+ for (i = 0; i < numattrs; i++)
+ {
+ int j;
+ int nvals;
+ Datum *tmp;
+
+ SortSupportData ssup;
+ StdAnalyzeData *mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ nvals = 0;
+ tmp = (Datum*)palloc0(sizeof(Datum) * numrows);
+
+ for (j = 0; j < numrows; j++)
+ {
+ bool isnull;
+
+ /* fetch the value of the attribute from the sample row */
+ Datum value = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &isnull);
+
+ if (isnull)
+ continue;
+
+ tmp[nvals++] = value;
+ }
+
+ /* do the sort and stuff only if there are non-NULL values */
+ if (nvals > 0)
+ {
+ /* sort the array of values */
+ qsort_arg((void *) tmp, nvals, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /* count distinct values */
+ ndistvalues[i] = 1;
+ for (j = 1; j < nvals; j++)
+ if (compare_scalars_simple(&tmp[j], &tmp[j-1], &ssup) != 0)
+ ndistvalues[i] += 1;
+
+ /* allocate exactly the space needed (we counted ndistinct above) */
+ distvalues[i] = (Datum*)palloc0(sizeof(Datum) * ndistvalues[i]);
+
+ /* now collect distinct values into the array */
+ distvalues[i][0] = tmp[0];
+ ndistvalues[i] = 1;
+
+ for (j = 1; j < nvals; j++)
+ {
+ if (compare_scalars_simple(&tmp[j], &tmp[j-1], &ssup) != 0)
+ {
+ distvalues[i][ndistvalues[i]] = tmp[j];
+ ndistvalues[i] += 1;
+ }
+ }
+ }
+
+ pfree(tmp);
+ }
+
+ /*
+ * The initial bucket may contain NULL values, so we have to create
+ * buckets with NULL-only dimensions.
+ *
+ * FIXME We may need up to 2^ndims buckets - check that there are
+ * enough buckets (MVSTAT_HIST_MAX_BUCKETS >= 2^ndims).
+ */
+ create_null_buckets(histogram, 0, attrs, stats);
+
+ while (histogram->nbuckets < MVSTAT_HIST_MAX_BUCKETS)
+ {
+ MVBucket bucket = select_bucket_to_partition(histogram->nbuckets,
+ histogram->buckets);
+
+ /* no more buckets to partition */
+ if (bucket == NULL)
+ break;
+
+ histogram->buckets[histogram->nbuckets]
+ = partition_bucket(bucket, attrs, stats,
+ ndistvalues, distvalues);
+
+ histogram->nbuckets += 1;
+ }
+
+ /* finalize the frequencies etc. */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ HistogramBuild build_data
+ = ((HistogramBuild)histogram->buckets[i]->build_data);
+
+ /*
+ * The frequency has to be computed from the whole sample, in
+ * case some of the rows were used for MCV (and thus are missing
+ * from the histogram).
+ */
+ histogram->buckets[i]->ntuples
+ = (build_data->numrows * 1.0) / numrows_total;
+ }
+
+ return histogram;
+}
+
+/* fetch the histogram (as a bytea) from the pg_mv_statistic catalog */
+MVSerializedHistogram
+load_mv_histogram(Oid mvoid)
+{
+ bool isnull = false;
+ Datum histogram;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* fetch the pg_mv_statistic tuple for the given statistics OID */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->hist_enabled && mvstat->hist_built);
+#endif
+
+ histogram = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stahist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_histogram(DatumGetByteaP(histogram));
+}
+
+/* print some basic info about the histogram */
+Datum
+pg_mv_stats_histogram_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVSerializedHistogram hist = deserialize_mv_histogram(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nbuckets=%d", hist->nbuckets);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
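+
+/*
+ * A minimal usage sketch for the function above (assuming it is
+ * exposed as a SQL function, and that pg_mv_statistic.stahist holds
+ * the serialized histogram):
+ *
+ * SELECT pg_mv_stats_histogram_info(stahist) FROM pg_mv_statistic;
+ */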
+
+
+/* used to pass context into bsearch() */
+static SortSupport ssup_private = NULL;
+
+/*
+ * Serialize the MV histogram into a bytea value. The basic algorithm
+ * is simple, and mostly mimics the MCV serialization:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ * (a) collect all (non-NULL) attribute values from all buckets
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all buckets
+ * (a) replace min/max values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, because we're mixing
+ * different datatypes, and we don't know what equality means for them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ * We'll use 16-bit (uint16) values for the indexes in step (3), which
+ * is sufficient because we don't allow more than 16k buckets in the
+ * histogram (see MVSTAT_HIST_MAX_BUCKETS). We also rely on the varlena
+ * compression to kick in - most of the high bytes will be 0x00, so it
+ * should work nicely.
+ *
+ *
+ * Deduplication in serialization
+ * ------------------------------
+ * The deduplication is very effective and important here, because every
+ * time we split a bucket, we keep all the boundary values, except for
+ * the dimension that was used for the split. Another way to look at
+ * this is that each split introduces 1 new value (the value used to do
+ * the split). A histogram with M buckets was created by (M-1) splits
+ * of the initial bucket, and each bucket has 2*N boundary values. So
+ * assuming the initial bucket does not have any 'collapsed' dimensions,
+ * the number of distinct values is
+ *
+ * (2*N + (M-1))
+ *
+ * but the total number of boundary values is
+ *
+ * 2*N*M
+ *
+ * which is clearly much higher. For a histogram on two columns, with
+ * 1024 buckets, it's 1027 vs. 4096. Of course, we're not saving all
+ * the difference (because we'll use 32-bit indexes into the values).
+ * But with large values (e.g. stored as varlena), this saves a lot.
+ *
+ * An interesting feature is that the total number of distinct values
+ * does not really grow with the number of dimensions, except for the
+ * size of the initial bucket. After that it only depends on number of
+ * buckets (i.e. number of splits).
+ *
+ * XXX Of course this only holds for the current histogram building
+ * algorithm. Algorithms doing the splits differently (e.g.
+ * producing overlapping buckets) may behave differently.
+ *
+ * TODO This only confirms we can use the uint16 indexes. The worst
+ * that could happen is if all the splits happened by a single
+ * dimension. To exhaust the uint16 this would require ~64k
+ * splits (needs to be reflected in MVSTAT_HIST_MAX_BUCKETS).
+ *
+ * TODO We don't need to use a separate boolean for each flag, instead
+ * use a single char and set bits.
+ *
+ * TODO We might get a bit better compression by considering the actual
+ * data type length. The current implementation treats all data
+ * types passed by value as requiring 8B, but for INT it's actually
+ * just 4B etc.
+ *
+ * OTOH this is only related to the lookup table, and most of the
+ * space is occupied by the buckets (with int16 indexes).
+ *
+ *
+ * Varlena compression
+ * -------------------
+ * This encoding may prevent automatic varlena compression (similarly
+ * to JSONB), because first part of the serialized bytea will be an
+ * array of unique values (although sorted), and pglz decides whether
+ * to compress by trying to compress the first part (~1kB or so). Which
+ * is likely to be poor, due to the lack of repetition.
+ *
+ * One possible cure to that might be storing the buckets first, and
+ * then the deduplicated arrays. The buckets might be better suited
+ * for compression.
+ *
+ * On the other hand the encoding scheme is a context-aware compression,
+ * usually compressing to ~30% (or less, with large data types). So the
+ * lack of pglz compression may be OK.
+ *
+ * XXX But maybe we don't really want to compress this, to save on
+ * planning time?
+ *
+ * TODO Try storing the buckets / deduplicated arrays in reverse order,
+ * measure impact on compression.
+ *
+ *
+ * Deserialization
+ * ---------------
+ * The deserialization is currently implemented so that it reconstructs
+ * the histogram back into the same structures - this involves quite
+ * a few of memcpy() and palloc(), but maybe we could create a special
+ * structure for the serialized histogram, and access the data directly,
+ * without the unpacking.
+ *
+ * Not only it would save some memory and CPU time, but might actually
+ * work better with CPU caches (not polluting the caches).
+ *
+ * TODO Try to keep the compressed form, instead of deserializing it to
+ * MVHistogram/MVBucket.
+ *
+ *
+ * General TODOs
+ * -------------
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
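+ *
+ * For orientation, the serialized layout produced below looks roughly
+ * like this (a sketch, not an authoritative format spec):
+ *
+ * [varlena header]
+ * [magic][type][ndimensions][nbuckets]
+ * [DimensionInfo x ndims]
+ * [deduplicated value arrays, dim 0 .. ndims-1]
+ * [serialized buckets, nbuckets x BUCKET_SIZE(ndims)]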
+ */
+bytea *
+serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i = 0, j = 0;
+ Size total_length = 0;
+
+ bytea *output = NULL;
+ char *data = NULL;
+
+ int nbuckets = histogram->nbuckets;
+ int ndims = histogram->ndimensions;
+
+ /* allocated for serialized bucket data */
+ int bucketsize = BUCKET_SIZE(ndims);
+ char *bucket = palloc0(bucketsize);
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension separately */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /*
+ * Allocate space for all min/max values, including NULLs
+ * (we won't use them, but we don't know how many are there),
+ * and then collect all non-NULL values.
+ */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * nbuckets * 2);
+
+ for (j = 0; j < histogram->nbuckets; j++)
+ {
+ /* skip buckets where this dimension is NULL-only */
+ if (! histogram->buckets[j]->nullsonly[i])
+ {
+ values[i][counts[i]] = histogram->buckets[j]->min[i];
+ counts[i] += 1;
+
+ values[i][counts[i]] = histogram->buckets[j]->max[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* make sure we fit into uint16 */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typlen > 0)
+ /* byval or byref, but with fixed length (name, tid, ...) */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so strlen + 1 byte for the '\0' terminator */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j])) + 1;
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * histogram, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nbuckets (4B)
+ * - info (ndim * sizeof(DimensionInfo)
+ * - arrays of values for each dimension
+ * - serialized buckets (nbuckets * bucketsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data (and buckets).
+ */
+ total_length = (sizeof(int32) + offsetof(MVHistogramData, buckets)
+ + ndims * sizeof(DimensionInfo)
+ + nbuckets * bucketsize);
+
+ /* account for the deduplicated data */
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce an arbitrary limit of 10MB */
+ if (total_length > (10 * 1024 * 1024))
+ elog(ERROR, "serialized histogram exceeds 10MB (%ld > %d)",
+ total_length, (10 * 1024 * 1024));
+
+ /* allocate space for the serialized histogram list, set header */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write data */
+ data = VARDATA(output);
+
+ memcpy(data, histogram, offsetof(MVHistogramData, buckets));
+ data += offsetof(MVHistogramData, buckets);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typlen > 0)
+ {
+ /* passed by value or by reference, but with fixed length */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the histogram buckets */
+ for (i = 0; i < nbuckets; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - bucketsize);
+
+ /* reset the values for each item */
+ memset(bucket, 0, bucketsize);
+
+ *BUCKET_NTUPLES(bucket) = histogram->buckets[i]->ntuples;
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! histogram->buckets[i]->nullsonly[j])
+ {
+ uint16 idx;
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ /* min boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->min[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MIN_INDEXES(bucket, ndims)[j] = idx;
+
+ /* max boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->max[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MAX_INDEXES(bucket, ndims)[j] = idx;
+ }
+ }
+
+ /* copy flags (nulls, min/max inclusive) */
+ memcpy(BUCKET_NULLS_ONLY(bucket, ndims),
+ histogram->buckets[i]->nullsonly, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MIN_INCL(bucket, ndims),
+ histogram->buckets[i]->min_inclusive, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MAX_INCL(bucket, ndims),
+ histogram->buckets[i]->max_inclusive, sizeof(bool) * ndims);
+
+ /* copy the item into the array */
+ memcpy(data, bucket, bucketsize);
+
+ data += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ /* FIXME free the values/counts arrays here */
+
+ return output;
+}
+
+/*
+ * Returns histogram in a partially-serialized form (keeps the boundary
+ * values deduplicated, so that it's possible to optimize the estimation
+ * part by caching function call results between buckets etc.).
+ */
+MVSerializedHistogram
+deserialize_mv_histogram(bytea * data)
+{
+ int i = 0, j = 0;
+
+ Size expected_size;
+ char *tmp = NULL;
+
+ MVSerializedHistogram histogram;
+ DimensionInfo *info;
+
+ int nbuckets;
+ int ndims;
+ int bucketsize;
+
+ /* temporary deserialization buffer */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVSerializedHistogramData,buckets))
+ elog(ERROR, "invalid histogram size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVSerializedHistogramData,buckets));
+
+ /* read the histogram header */
+ histogram
+ = (MVSerializedHistogram)palloc(sizeof(MVSerializedHistogramData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(histogram, tmp, offsetof(MVSerializedHistogramData, buckets));
+ tmp += offsetof(MVSerializedHistogramData, buckets);
+
+ if (histogram->magic != MVSTAT_HIST_MAGIC)
+ elog(ERROR, "invalid histogram magic %d (expected %dd)",
+ histogram->magic, MVSTAT_HIST_MAGIC);
+
+ if (histogram->type != MVSTAT_HIST_TYPE_BASIC)
+ elog(ERROR, "invalid histogram type %d (expected %dd)",
+ histogram->type, MVSTAT_HIST_TYPE_BASIC);
+
+ nbuckets = histogram->nbuckets;
+ ndims = histogram->ndimensions;
+ bucketsize = BUCKET_SIZE(ndims);
+
+ Assert((nbuckets > 0) && (nbuckets <= MVSTAT_HIST_MAX_BUCKETS));
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Compute the size we expect for these parameters (incomplete at
+ * this point - we have yet to add the sizes of the value arrays,
+ * stored in the DimensionInfo records).
+ */
+ expected_size = offsetof(MVSerializedHistogramData,buckets) +
+ ndims * sizeof(DimensionInfo) +
+ (nbuckets * bucketsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /* now let's allocate a single buffer for all the values and counts */
+
+ bufflen = (sizeof(int) + sizeof(Datum*)) * ndims;
+ for (i = 0; i < ndims; i++)
+ {
+ /* no need to allocate space for by-value types with length matching Datum */
+ if (! (info[i].typbyval && (info[i].typlen == sizeof(Datum))))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+ }
+
+ /* also, include space for the result, tracking the buckets */
+ bufflen += nbuckets * (
+ sizeof(MVSerializedBucket) + /* bucket pointer */
+ sizeof(MVSerializedBucketData)); /* bucket data */
+
+ buff = palloc(bufflen);
+ ptr = buff;
+
+ histogram->nvalues = (int*)ptr;
+ ptr += (sizeof(int) * ndims);
+
+ histogram->values = (Datum**)ptr;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mcv_list(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. This needs to
+ * copy the pieces.
+ *
+ * TODO same as in MCV deserialization / consider moving to common.c
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ histogram->nvalues[i] = info[i].nvalues;
+
+ if (info[i].typbyval && info[i].typlen == sizeof(Datum))
+ {
+ /* passed by value / Datum - simply reuse the array */
+ histogram->values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ /* all the varlena data need a chunk from the buffer */
+ histogram->values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ if (info[i].typbyval)
+ {
+ /* passed by value, but smaller than Datum */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value into the Datum array */
+ memcpy(&histogram->values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ histogram->buckets = (MVSerializedBucket*)ptr;
+ ptr += (sizeof(MVSerializedBucket) * nbuckets);
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ MVSerializedBucket bucket = (MVSerializedBucket)ptr;
+ ptr += sizeof(MVSerializedBucketData);
+
+ bucket->ntuples = *BUCKET_NTUPLES(tmp);
+ bucket->nullsonly = BUCKET_NULLS_ONLY(tmp, ndims);
+ bucket->min_inclusive = BUCKET_MIN_INCL(tmp, ndims);
+ bucket->max_inclusive = BUCKET_MAX_INCL(tmp, ndims);
+
+ bucket->min = BUCKET_MIN_INDEXES(tmp, ndims);
+ bucket->max = BUCKET_MAX_INDEXES(tmp, ndims);
+
+ histogram->buckets[i] = bucket;
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+
+ tmp += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((tmp - VARDATA(data)) == expected_size);
+
+ /* we should exhaust the output buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ return histogram;
+}
+
+/*
+ * Build the initial bucket, which will be then split into smaller ones.
+ */
+static MVBucket
+create_initial_mv_bucket(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+ HistogramBuild data = NULL;
+
+ /* TODO allocate bucket as a single piece, including all the fields. */
+ MVBucket bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ Assert(numrows > 0);
+ Assert(rows != NULL);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /* allocate the per-dimension arrays */
+
+ /* flags for null-only dimensions */
+ bucket->nullsonly = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ bucket->min_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+ bucket->max_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* lower/upper boundaries */
+ bucket->min = (Datum*)palloc0(numattrs * sizeof(Datum));
+ bucket->max = (Datum*)palloc0(numattrs * sizeof(Datum));
+
+ /* build-data */
+ data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /* number of distinct values (per dimension) */
+ data->ndistincts = (uint32*)palloc0(numattrs * sizeof(uint32));
+
+ /* all the sample rows fall into the initial bucket */
+ data->numrows = numrows;
+ data->rows = rows;
+
+ bucket->build_data = data;
+
+ /*
+ * Update the number of ndistinct combinations in the bucket (which
+ * we use when selecting bucket to partition), and then number of
+ * distinct values for each partition (which we use when choosing
+ * which dimension to split).
+ */
+ update_bucket_ndistinct(bucket, attrs, stats);
+
+ /* Update ndistinct (and also set min/max) for all dimensions. */
+ for (i = 0; i < numattrs; i++)
+ update_dimension_ndistinct(bucket, i, attrs, stats, true);
+
+ return bucket;
+}
+
+/*
+ * Choose the bucket to partition next.
+ *
+ * The current criterion is rather simple, chosen so that the algorithm
+ * produces buckets with about equal frequency and regular size. We
+ * select the bucket containing the most sampled rows (among those
+ * that may still be split), and then split it by the longest dimension.
+ *
+ * The distinct values are uniformly mapped to the [0,1] interval, and
+ * this is used to compute the length of the value range.
+ *
+ * NOTE: This is not the same array used for deduplication, as this
+ * contains values for all the tuples from the sample, not just
+ * the boundary values.
+ *
+ * Returns either pointer to the bucket selected to be partitioned,
+ * or NULL if there are no buckets that may be split (i.e. all buckets
+ * contain a single distinct value).
+ *
+ * TODO Consider other partitioning criteria (v-optimal, maxdiff etc.).
+ * For example use the "bucket volume" (product of dimension
+ * lengths) to select the bucket.
+ *
+ * We need buckets containing about the same number of tuples (so
+ * about the same frequency), as that limits the error when we
+ * match the bucket partially (in that case use 1/2 the bucket).
+ *
+ * We also need buckets with "regular" size, i.e. not "narrow" in
+ * some dimensions and "wide" in the others, because that makes
+ * partial matches more likely and increases the estimation error,
+ * especially when the clauses match many buckets partially. This
+ * is especially serious for OR-clauses, because in that case any
+ * of them may add the bucket as a (partial) match. With AND-clauses
+ * all the clauses have to match the bucket, which makes this issue
+ * somewhat less pressing.
+ *
+ * For example this table:
+ *
+ * CREATE TABLE t AS SELECT i AS a, i AS b
+ * FROM generate_series(1,1000000) s(i);
+ * CREATE STATISTICS s ON t (a, b) WITH (histogram);
+ * ANALYZE t;
+ *
+ * It's a very specific (and perhaps artificial) example, because
+ * every bucket always has exactly the same number of distinct
+ * values in all dimensions, which makes the partitioning tricky.
+ *
+ * Then:
+ *
+ * SELECT * FROM t WHERE a < 10 AND b < 10;
+ *
+ * is estimated to return ~120 rows, while in reality it returns 9.
+ *
+ * QUERY PLAN
+ * ----------------------------------------------------------------
+ * Seq Scan on t (cost=0.00..19425.00 rows=117 width=8)
+ * (actual time=0.185..270.774 rows=9 loops=1)
+ * Filter: ((a < 10) AND (b < 10))
+ * Rows Removed by Filter: 999991
+ *
+ * while the query using OR clauses is estimated like this:
+ *
+ * QUERY PLAN
+ * ----------------------------------------------------------------
+ * Seq Scan on t (cost=0.00..19425.00 rows=8100 width=8)
+ * (actual time=0.118..189.919 rows=9 loops=1)
+ * Filter: ((a < 10) OR (b < 10))
+ * Rows Removed by Filter: 999991
+ *
+ * which is clearly much worse. This happens because the histogram
+ * contains buckets like this:
+ *
+ * bucket 592 [3 30310] [30134 30593] => [0.000233]
+ *
+ * i.e. the length of "a" dimension is (30310-3)=30307, while the
+ * length of "b" is (30593-30134)=459. So the "b" dimension is much
+ * narrower than "a". Of course, there are buckets where "b" is the
+ * wider dimension.
+ *
+ * This is partially mitigated by selecting the "longest" dimension
+ * in partition_bucket() but that only happens after we already
+ * selected the bucket. So if we never select the bucket, we can't
+ * really fix it there.
+ *
+ * The other reason why this particular example behaves so poorly
+ * is due to the way we split the partition in partition_bucket().
+ * Currently we attempt to divide the bucket into two parts with
+ * the same number of sampled tuples (frequency), but that does not
+ * work well when all the tuples are squashed on one end of the
+ * bucket (e.g. exactly at the diagonal, as a=b). In that case we
+ * split the bucket into a tiny bucket on the diagonal, and a huge
+ * remaining part of the bucket, which is still going to be narrow
+ * and we're unlikely to fix that.
+ *
+ * So perhaps we need two partitioning strategies - one aiming to
+ * split buckets with high frequency (number of sampled rows), the
+ * other aiming to split "large" buckets. And alternating between
+ * them, somehow.
+ *
+ * TODO Allowing the bucket to degenerate to a single combination of
+ * values makes it a rather strange MCV list. Maybe we should use
+ * a higher lower boundary, or maybe make the selection criteria
+ * more complex (e.g. consider number of rows in the bucket, etc.).
+ *
+ * That however is different from buckets 'degenerated' only for
+ * some dimensions (e.g. half of them), which is perfectly
+ * appropriate for statistics on a combination of low and high
+ * cardinality columns.
+ */
+static MVBucket
+select_bucket_to_partition(int nbuckets, MVBucket * buckets)
+{
+ int i;
+ int numrows = 0;
+ MVBucket bucket = NULL;
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ HistogramBuild data = (HistogramBuild)buckets[i]->build_data;
+ /* if the bucket is splittable and contains more rows, use it */
+ if ((data->ndistinct > 2) &&
+ (data->numrows > numrows) &&
+ (data->numrows >= MIN_BUCKET_ROWS))
+ {
+ bucket = buckets[i];
+ numrows = data->numrows;
+ }
+ }
+
+ /* may be NULL if there are no buckets eligible for partitioning */
+ return bucket;
+}
+
+/*
+ * A simple bucket partitioning implementation - we choose the longest
+ * bucket dimension, measured using the array of distinct values built
+ * at the very beginning of the build.
+ *
+ * We map all the distinct values to a [0,1] interval, uniformly
+ * distributed, and then use this to measure length. It's essentially
+ * the number of distinct values within the range, normalized to [0,1].
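+ *
+ * For example (a hypothetical illustration): if a dimension has 100
+ * deduplicated distinct values and a bucket spans the 20th through
+ * the 70th of them, the normalized length of that dimension is
+ * (70 - 20) / 100 = 0.5.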
+ *
+ * Then we choose a 'middle' value splitting the bucket into two parts
+ * with roughly the same frequency.
+ *
+ * This splits the bucket by tweaking the existing one, and returning
+ * the new bucket (essentially shrinking the existing one in-place and
+ * returning the other "half" as a new bucket). The caller is responsible
+ * for adding the new bucket into the list of buckets.
+ *
+ * There are multiple ways to build the histogram, mostly differing in
+ * the partitioning criteria - i.e. in how to choose the bucket to split
+ * and the dimension most in need of a split. For a nice summary and
+ * general overview, see
+ * "rK-Hist : an R-Tree based histogram for multi-dimensional selectivity
+ * estimation" thesis by J. A. Lopez, Concordia University, p.34-37 (and
+ * possibly p. 32-34 for explanation of the terms).
+ *
+ * TODO It requires care to prevent splitting only one dimension and not
+ * splitting another one at all (which might happen easily in case
+ * of strongly dependent columns - e.g. y=x). The current algorithm
+ * minimizes this, but it may still happen for perfectly dependent
+ * examples (when all the dimensions have equal length, the first
+ * one will be selected).
+ *
+ * TODO Should probably consider statistics target for the columns (e.g.
+ * to split dimensions with higher statistics target more frequently).
+ */
+static MVBucket
+partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues)
+{
+ int i;
+ int dimension;
+ int numattrs = attrs->dim1;
+
+ Datum split_value;
+ MVBucket new_bucket;
+ HistogramBuild new_data;
+
+ /* needed for sort, when looking for the split value */
+ bool isNull;
+ int nvalues = 0;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ StdAnalyzeData * mystats = NULL;
+ ScalarItem * values = (ScalarItem*)palloc0(data->numrows * sizeof(ScalarItem));
+ SortSupportData ssup;
+
+ /* looking for the split value */
+ int nrows = 1; /* number of rows below current value */
+ double delta;
+
+ /* needed when splitting the values */
+ HeapTuple * oldrows = data->rows;
+ int oldnrows = data->numrows;
+
+ /*
+ * We can't split buckets with a single distinct value (this also
+ * disqualifies NULL-only dimensions). Also, there have to be multiple
+ * sample rows (otherwise there could not be multiple distinct values).
+ */
+ Assert(data->ndistinct > 1);
+ Assert(data->numrows > 1);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Look for the next dimension to split.
+ */
+ delta = 0.0;
+ dimension = -1;
+
+ for (i = 0; i < numattrs; i++)
+ {
+ Datum *a, *b;
+
+ mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ /* can't split NULL-only dimension */
+ if (bucket->nullsonly[i])
+ continue;
+
+ /* can't split dimension with a single ndistinct value */
+ if (data->ndistincts[i] <= 1)
+ continue;
+
+ /* sort support for the bsearch_comparator */
+ ssup_private = &ssup;
+
+ /* search for min boundary in the distinct list */
+ a = (Datum*)bsearch(&bucket->min[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), bsearch_comparator);
+
+ b = (Datum*)bsearch(&bucket->max[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), bsearch_comparator);
+
+ /* if this dimension is 'longer', partition by it */
+ if (((b-a)*1.0 / ndistvalues[i]) > delta)
+ {
+ delta = ((b-a)*1.0 / ndistvalues[i]);
+ dimension = i;
+ }
+ }
+
+ /*
+ * If we haven't found a dimension here, we've done something
+ * wrong in select_bucket_to_partition.
+ */
+ Assert(dimension != -1);
+
+ /*
+ * Walk through the selected dimension, collect and sort the values
+ * and then choose the value to use as the new boundary.
+ */
+ mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (i = 0; i < data->numrows; i++)
+ {
+ /* remember the index of the sample row, to make the partitioning simpler */
+ values[nvalues].value = heap_getattr(data->rows[i], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+ values[nvalues].tupno = i;
+
+ /* no NULL values allowed here (we don't do splits by null-only dimensions) */
+ Assert(!isNull);
+
+ nvalues++;
+ }
+
+ /* sort the array of values */
+ qsort_arg((void *) values, nvalues, sizeof(ScalarItem),
+ compare_scalars_partition, (void *) &ssup);
+
+ /*
+ * We know there are bucket->ndistincts[dimension] distinct values
+ * in this dimension, and we want to split this into half, so walk
+ * through the array and stop once we see (ndistinct/2) values.
+ *
+ * We always choose the "next" value, i.e. (n/2+1)-th distinct value,
+ * and use it as an exclusive upper boundary (and inclusive lower
+ * boundary).
+ *
+ * TODO Maybe we should use "average" of the two middle distinct
+ * values (at least for even distinct counts), but that would
+ * require being able to compute an average (which does not work
+ * for non-arithmetic types).
+ *
+ * TODO Another option is to look for a split that'd give about
+ * 50% tuples (not distinct values) in each partition. That
+ * might work better when there are a few very frequent
+ * values, and many rare ones.
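+ *
+ * A small worked example (hypothetical data): with sorted values
+ * {1,1,2,2,2,3,3,8,8,9} (numrows = 10), the value changes at
+ * positions i = 2, 5, 7 and 9. The loop below picks the change
+ * closest to numrows/2 = 5, i.e. i = 5 with split_value = 3, so
+ * 5 rows stay in the current bucket and 5 move to the new one.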
+ */
+ delta = fabs(data->numrows);
+ split_value = values[0].value;
+
+ for (i = 1; i < data->numrows; i++)
+ {
+ if (values[i].value != values[i-1].value)
+ {
+ /* are we closer to splitting the bucket in half? */
+ if (fabs(i - data->numrows/2.0) < delta)
+ {
+ /* let's assume we'll use this value for the split */
+ split_value = values[i].value;
+ delta = fabs(i - data->numrows/2.0);
+ nrows = i;
+ }
+ }
+ }
+
+ Assert(nrows > 0);
+ Assert(nrows < data->numrows);
+
+ /* create the new bucket as an (incomplete) copy of the one being partitioned */
+ new_bucket = copy_mv_bucket(bucket, numattrs);
+ new_data = (HistogramBuild)new_bucket->build_data;
+
+ /*
+ * Do the actual split of the chosen dimension, using the split value as the
+ * upper bound for the existing bucket, and lower bound for the new one.
+ */
+ bucket->max[dimension] = split_value;
+ new_bucket->min[dimension] = split_value;
+
+ bucket->max_inclusive[dimension] = false;
+ new_bucket->min_inclusive[dimension] = true;
+
+ /*
+ * Redistribute the sample tuples using the 'ScalarItem->tupno'
+ * index. We know 'nrows' rows should remain in the original
+ * bucket and the rest goes to the new one.
+ */
+
+ data->rows = (HeapTuple*)palloc0(nrows * sizeof(HeapTuple));
+ new_data->rows = (HeapTuple*)palloc0((oldnrows - nrows) * sizeof(HeapTuple));
+
+ data->numrows = nrows;
+ new_data->numrows = (oldnrows - nrows);
+
+ /*
+ * The first nrows should go to the first bucket, the rest should
+ * go to the new one. Use the tupno field to get the actual HeapTuple
+ * row from the original array of sample rows.
+ */
+ for (i = 0; i < nrows; i++)
+ memcpy(&data->rows[i], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ for (i = nrows; i < oldnrows; i++)
+ memcpy(&new_data->rows[i-nrows], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(new_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each partition.
+ */
+ for (i = 0; i < numattrs; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(new_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+ pfree(values);
+
+ return new_bucket;
+}
+
+/*
+ * Copy a histogram bucket. The copy does not include the build-time
+ * data, i.e. sampled rows etc.
+ */
+static MVBucket
+copy_mv_bucket(MVBucket bucket, uint32 ndimensions)
+{
+ /* TODO allocate as a single piece (including all the fields) */
+ MVBucket new_bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+ HistogramBuild data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /* Copy only the attributes that will stay the same after the split -
+ * the rest will be recomputed afterwards. */
+
+ /* allocate the per-dimension arrays */
+ new_bucket->nullsonly = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ new_bucket->min_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+ new_bucket->max_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* lower/upper boundaries */
+ new_bucket->min = (Datum*)palloc0(ndimensions * sizeof(Datum));
+ new_bucket->max = (Datum*)palloc0(ndimensions * sizeof(Datum));
+
+ /* copy data */
+ memcpy(new_bucket->nullsonly, bucket->nullsonly, ndimensions * sizeof(bool));
+
+ memcpy(new_bucket->min_inclusive, bucket->min_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->min, bucket->min, ndimensions*sizeof(Datum));
+
+ memcpy(new_bucket->max_inclusive, bucket->max_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->max, bucket->max, ndimensions*sizeof(Datum));
+
+ /* allocate and copy the interesting part of the build data */
+ data->ndistincts = (uint32*)palloc0(ndimensions * sizeof(uint32));
+
+ new_bucket->build_data = data;
+
+ return new_bucket;
+}
+
+/*
+ * Counts the number of distinct value combinations in the bucket. The
+ * values are collected into an array of sort items, sorted using a
+ * multi-column comparator, and neighboring items are then compared to
+ * count the distinct combinations.
+ *
+ * TODO This might evaluate and store the distinct counts for all
+ * possible attribute combinations. The assumption is this might be
+ * useful for estimating things like GROUP BY cardinalities (e.g.
+ * in cases when some buckets contain a lot of low-frequency
+ * combinations, and other buckets contain few high-frequency ones).
+ *
+ * But it's unclear whether it's worth the price. Computing this
+ * is actually quite cheap, because it may be evaluated at the very
+ * end, when the buckets are rather small (so sorting it in 2^N ways
+ * is not a big deal). Assuming the partitioning algorithm does not
+ * use these values to make its decisions, of course (the current
+ * algorithm does not).
+ *
+ * The overhead with storing, fetching and parsing the data is more
+ * concerning - adding 2^N values per bucket (even if it's just
+ * a 1B or 2B value) would significantly bloat the histogram, and
+ * thus the impact on optimizer. Which is not really desirable.
+ *
+ * TODO This only updates the ndistinct for the sample (or bucket), but
+ * we eventually need an estimate of the total number of distinct
+ * values in the dataset. It's possible to either use the current
+ * 1D approach (i.e., if it's more than 10% of the sample, assume
+ * it's proportional to the number of rows), or to implement the
+ * estimator suggested in the article, supposedly
+ * giving 'optimal' estimates (w.r.t. probability of error).
+ */
+static void
+update_bucket_ndistinct(MVBucket bucket, int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ int numrows = data->numrows;
+
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * We could collect this while walking through the attributes
+ * elsewhere (as it is, we have to call heap_getattr twice).
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(numrows * sizeof(Datum) * numattrs);
+ bool *isnull = (bool*)palloc0(numrows * sizeof(bool) * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* prepare the sort functions for all the dimensions */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* collect the values */
+ for (i = 0; i < numrows; i++)
+ for (j = 0; j < numattrs; j++)
+ items[i].values[j]
+ = heap_getattr(data->rows[i], attrs->values[j],
+ stats[j]->tupDesc, &items[i].isnull[j]);
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ data->ndistinct = 1;
+
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ data->ndistinct += 1;
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+}
+
+/*
+ * Count distinct values per bucket dimension.
+ */
+static void
+update_dimension_ndistinct(MVBucket bucket, int dimension, int2vector *attrs,
+ VacAttrStats ** stats, bool update_boundaries)
+{
+ int j;
+ int nvalues = 0;
+ bool isNull;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ Datum * values = (Datum*)palloc0(data->numrows * sizeof(Datum));
+ SortSupportData ssup;
+
+ StdAnalyzeData * mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* we may already know this is a NULL-only dimension */
+ if (bucket->nullsonly[dimension])
+ data->ndistincts[dimension] = 1;
+
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (j = 0; j < data->numrows; j++)
+ {
+ values[nvalues] = heap_getattr(data->rows[j], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+
+ /* ignore NULL values */
+ if (! isNull)
+ nvalues++;
+ }
+
+ /* there's always at least 1 distinct value (may be NULL) */
+ data->ndistincts[dimension] = 1;
+
+ /* if there are only NULL values in the dimension, mark it as
+ * NULL-only and we're done */
+ if (nvalues == 0)
+ {
+ pfree(values);
+ bucket->nullsonly[dimension] = true;
+ return;
+ }
+
+ /* sort the array (pass-by-value datums) */
+ qsort_arg((void *) values, nvalues, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /*
+ * Update min/max boundaries to the smallest bounding box. Generally, this
+ * needs to be done only when constructing the initial bucket.
+ */
+ if (update_boundaries)
+ {
+ /* store the min/max values */
+ bucket->min[dimension] = values[0];
+ bucket->min_inclusive[dimension] = true;
+
+ bucket->max[dimension] = values[nvalues-1];
+ bucket->max_inclusive[dimension] = true;
+ }
+
+ /*
+ * Walk through the array and count distinct values by comparing
+ * succeeding values.
+ *
+ * FIXME This only works for pass-by-value types (i.e. not VARCHARs
+ * etc.). Although thanks to the deduplication it might work
+ * even for those types (equal values will get the same item
+ * in the deduplicated array).
+ */
+ for (j = 1; j < nvalues; j++)
+ {
+ if (values[j] != values[j-1])
+ data->ndistincts[dimension] += 1;
+ }
+
+ pfree(values);
+}
+
+/*
+ * A properly built histogram must not contain buckets mixing NULL and
+ * non-NULL values in a single dimension. Each dimension may either be
+ * marked as 'nulls only', and thus containing only NULL values, or
+ * it must not contain any NULL values.
+ *
+ * Therefore, if the sample contains NULL values in any of the columns,
+ * it's necessary to build those NULL-buckets. This is done in an
+ * iterative way using this algorithm, operating on a single bucket:
+ *
+ * (1) Check that all dimensions are well-formed (not mixing NULL
+ * and non-NULL values).
+ *
+ * (2) If all dimensions are well-formed, terminate.
+ *
+ * (3) If the dimension contains only NULL values, but is not
+ * marked as NULL-only, mark it as NULL-only and run the
+ * algorithm again (on this bucket).
+ *
+ * (4) If the dimension mixes NULL and non-NULL values, split the
+ * bucket into two parts - one with NULL values, one with
+ * non-NULL values (replacing the current one). Then run
+ * the algorithm on both buckets.
+ *
+ * This is executed in a recursive manner, but the number of executions
+ * should be quite low - limited by the number of NULL-buckets. Also,
+ * in each branch the number of nested calls is limited by the number
+ * of dimensions (attributes) of the histogram.
+ *
+ * At the end, there should be buckets with no mixed dimensions. The
+ * number of buckets produced by this algorithm is rather limited - with
+ * N dimensions, there may be only 2^N such buckets (each dimension may
+ * be either NULL or non-NULL). So with 8 dimensions (current value of
+ * MVSTATS_MAX_DIMENSIONS) there may be only 256 such buckets.
+ *
+ * After this, a 'regular' bucket-split algorithm shall run, further
+ * optimizing the histogram.
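+ *
+ * For example (a hypothetical two-dimensional case): if dimension
+ * "b" mixes NULL and non-NULL values, step (4) splits the bucket
+ * into one bucket with the "b IS NOT NULL" rows and another with
+ * the "b IS NULL" rows (marked as NULL-only), and the algorithm
+ * then recurses into both of them.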
+ */
+static void
+create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int null_dim = -1;
+ int null_count = 0;
+ bool null_found = false;
+ MVBucket bucket, null_bucket;
+ int null_idx, curr_idx;
+ HistogramBuild data, null_data;
+
+ /* remember original values from the bucket */
+ int numrows;
+ HeapTuple *oldrows = NULL;
+
+ Assert(bucket_idx < histogram->nbuckets);
+ Assert(histogram->ndimensions == attrs->dim1);
+
+ bucket = histogram->buckets[bucket_idx];
+ data = (HistogramBuild)bucket->build_data;
+
+ numrows = data->numrows;
+ oldrows = data->rows;
+
+ /*
+ * Walk through all rows / dimensions, and stop once we find NULL
+ * in a dimension not yet marked as NULL-only.
+ */
+ for (i = 0; i < data->numrows; i++)
+ {
+ /*
+ * FIXME We don't need to start from the first attribute
+ * here - we can start from the last known dimension.
+ */
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ /* Is this a NULL-only dimension? If yes, skip. */
+ if (bucket->nullsonly[j])
+ continue;
+
+ /* found a NULL in that dimension? */
+ if (heap_attisnull(data->rows[i], attrs->values[j]))
+ {
+ null_found = true;
+ null_dim = j;
+ break;
+ }
+ }
+
+ /* terminate if we found attribute with NULL values */
+ if (null_found)
+ break;
+ }
+
+ /* no regular dimension contains NULL values => we're done */
+ if (! null_found)
+ return;
+
+ /* walk through the rows again, count NULL values in 'null_dim' */
+ for (i = 0; i < data->numrows; i++)
+ {
+ if (heap_attisnull(data->rows[i], attrs->values[null_dim]))
+ null_count += 1;
+ }
+
+ Assert(null_count <= data->numrows);
+
+ /*
+ * If (null_count == numrows) the dimension already is NULL-only,
+ * but is not yet marked like that. It's enough to mark it and
+ * repeat the process recursively (until we run out of dimensions).
+ */
+ if (null_count == data->numrows)
+ {
+ bucket->nullsonly[null_dim] = true;
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+ return;
+ }
+
+ /*
+ * We have to split the bucket into two - one with NULL values in
+ * the dimension, one with non-NULL values. We don't need to sort
+ * the data or anything, but otherwise it's similar to what's done
+ * in partition_bucket().
+ */
+
+ /* create bucket with NULL-only dimension 'dim' */
+ null_bucket = copy_mv_bucket(bucket, histogram->ndimensions);
+ null_data = (HistogramBuild)null_bucket->build_data;
+
+ /* remember the current array info */
+ oldrows = data->rows;
+ numrows = data->numrows;
+
+ /* we'll keep non-NULL values in the current bucket */
+ data->numrows = (numrows - null_count);
+ data->rows
+ = (HeapTuple*)palloc0(data->numrows * sizeof(HeapTuple));
+
+ /* and the NULL values will go to the new one */
+ null_data->numrows = null_count;
+ null_data->rows
+ = (HeapTuple*)palloc0(null_data->numrows * sizeof(HeapTuple));
+
+ /* mark the dimension as NULL-only (in the new bucket) */
+ null_bucket->nullsonly[null_dim] = true;
+
+ /* walk through the sample rows and distribute them accordingly */
+ null_idx = 0;
+ curr_idx = 0;
+ for (i = 0; i < numrows; i++)
+ {
+ if (heap_attisnull(oldrows[i], attrs->values[null_dim]))
+ /* NULL => copy to the new bucket */
+ memcpy(&null_data->rows[null_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ else
+ memcpy(&data->rows[curr_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ }
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(null_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each
+ * bucket (NULL is not a value, so 0, and the other bucket got
+ * all the ndistinct values).
+ */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(null_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+
+ /* add the NULL bucket to the histogram */
+ histogram->buckets[histogram->nbuckets++] = null_bucket;
+
+ /*
+ * And now run the function recursively on both buckets (the new
+ * one first, because the call may change number of buckets, and
+ * it's used as an index).
+ */
+ create_null_buckets(histogram, (histogram->nbuckets-1), attrs, stats);
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+}
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
+
+/*
+ * SRF with details about buckets of a histogram:
+ *
+ * - bucket ID (0...nbuckets-1)
+ * - min values (string array)
+ * - max values (string array)
+ * - nulls only (boolean array)
+ * - min inclusive flags (boolean array)
+ * - max inclusive flags (boolean array)
+ * - frequency (double precision)
+ *
+ * The input is the OID of the statistics, and there are no rows
+ * returned if the statistics contains no histogram (or if there's no
+ * statistics for the OID).
+ *
+ * The second parameter (type) determines what values will be returned
+ * in the (minvals,maxvals). There are three possible values:
+ *
+ * 0 (actual values)
+ * -----------------
+ * - prints actual values
+ * - using the output function of the data type (as string)
+ * - handy for investigating the histogram
+ *
+ * 1 (distinct index)
+ * ------------------
+ * - prints index of the distinct value (into the serialized array)
+ * - makes it easier to spot neighbor buckets, etc.
+ * - handy for plotting the histogram
+ *
+ * 2 (normalized distinct index)
+ * -----------------------------
+ * - prints index of the distinct value, but normalized into [0,1]
+ * - similar to 1, but shows how 'long' the bucket range is
+ * - handy for plotting the histogram
+ *
+ * When plotting the histogram, be careful as the (1) and (2) options
+ * skew the lengths by distributing the distinct values uniformly. For
+ * data types without a clear meaning of 'distance' (e.g. strings) that
+ * is not a big deal, but for numbers it may be confusing.
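+ *
+ * A usage sketch (assuming statistics with a histogram built on some
+ * table "t" - the subselect is just one way to obtain the OID):
+ *
+ * SELECT * FROM pg_mv_histogram_buckets(
+ * (SELECT oid FROM pg_mv_statistic
+ * WHERE starelid = 't'::regclass), 0);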
+ */
+PG_FUNCTION_INFO_V1(pg_mv_histogram_buckets);
+
+Datum
+pg_mv_histogram_buckets(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ Oid mvoid = PG_GETARG_OID(0);
+ int otype = PG_GETARG_INT32(1);
+
+ if ((otype < 0) || (otype > 2))
+ elog(ERROR, "invalid output type specified");
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MVSerializedHistogram histogram;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ histogram = load_mv_histogram(mvoid);
+
+ funcctx->user_fctx = histogram;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = histogram->nbuckets;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples
+ * from raw C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+ double bucket_size = 1.0;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MVSerializedHistogram histogram;
+ MVSerializedBucket bucket;
+
+ histogram = (MVSerializedHistogram)funcctx->user_fctx;
+
+ Assert(call_cntr < histogram->nbuckets);
+
+ bucket = histogram->buckets[call_cntr];
+
+ stakeys = find_mv_attnums(mvoid, &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple.
+ * This should be an array of C strings which will
+ * be processed later by the type input functions.
+ */
+ values = (char **) palloc(9 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+ values[3] = (char *) palloc0(1024 * sizeof(char));
+ values[4] = (char *) palloc0(1024 * sizeof(char));
+ values[5] = (char *) palloc0(1024 * sizeof(char));
+
+ values[6] = (char *) palloc(64 * sizeof(char));
+ values[7] = (char *) palloc(64 * sizeof(char));
+ values[8] = (char *) palloc(64 * sizeof(char));
+
+ /* we need to do this only when printing the actual values */
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * histogram->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * histogram->ndimensions);
+
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* bucket ID */
+
+ /*
+ * Print the boundary values either as the actual values (using the
+ * output function of the attribute type), or as indexes into the
+ * deduplicated arrays (which are sorted, so even the indexes are
+ * quite useful).
+ */
+
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bucket_size *= (bucket->max[i] - bucket->min[i]) * 1.0
+ / (histogram->nvalues[i]-1);
+
+ /* print the actual values, i.e. use output function etc. */
+ if (otype == 0)
+ {
+ Datum minval, maxval;
+ Datum minout, maxout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %s}";
+
+ minval = histogram->values[i][bucket->min[i]];
+ minout = FunctionCall1(&fmgrinfo[i], minval);
+
+ maxval = histogram->values[i][bucket->max[i]];
+ maxout = FunctionCall1(&fmgrinfo[i], maxval);
+
+ snprintf(buff, 1024, format, values[1], DatumGetPointer(minout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], DatumGetPointer(maxout));
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+ else if (otype == 1)
+ {
+ format = "%s, %d";
+ if (i == 0)
+ format = "{%s%d";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %d}";
+
+ snprintf(buff, 1024, format, values[1], bucket->min[i]);
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], bucket->max[i]);
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+ else
+ {
+ format = "%s, %f";
+ if (i == 0)
+ format = "{%s%f";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %f}";
+
+ snprintf(buff, 1024, format, values[1],
+ bucket->min[i] * 1.0 / (histogram->nvalues[i]-1));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2],
+ bucket->max[i] * 1.0 / (histogram->nvalues[i]-1));
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %s}";
+
+ snprintf(buff, 1024, format, values[3], bucket->nullsonly[i] ? "t" : "f");
+ strncpy(values[3], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[4], bucket->min_inclusive[i] ? "t" : "f");
+ strncpy(values[4], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[5], bucket->max_inclusive[i] ? "t" : "f");
+ strncpy(values[5], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[6], 64, "%f", bucket->ntuples); /* frequency */
+ snprintf(values[7], 64, "%f", bucket->ntuples / bucket_size); /* density */
+ snprintf(values[8], 64, "%f", bucket_size); /* bucket_size */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+ pfree(values[4]);
+ pfree(values[5]);
+ pfree(values[6]);
+ pfree(values[7]);
+ pfree(values[8]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
+
+#ifdef DEBUG_MVHIST
+/*
+ * prints debugging info about matched histogram buckets (full/partial)
+ *
+ * XXX Currently works only for INT data type.
+ */
+void
+debug_histogram_matches(MVSerializedHistogram mvhist, char *matches)
+{
+ int i, j;
+
+ float ffull = 0, fpartial = 0;
+ int nfull = 0, npartial = 0;
+
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ char ranges[1024];
+
+ if (! matches[i])
+ continue;
+
+ /* increment the counters */
+ nfull += (matches[i] == MVSTATS_MATCH_FULL) ? 1 : 0;
+ npartial += (matches[i] == MVSTATS_MATCH_PARTIAL) ? 1 : 0;
+
+ /* and also update the frequencies */
+ ffull += (matches[i] == MVSTATS_MATCH_FULL) ? bucket->ntuples : 0;
+ fpartial += (matches[i] == MVSTATS_MATCH_PARTIAL) ? bucket->ntuples : 0;
+
+ memset(ranges, 0, sizeof(ranges));
+
+ /* build ranges for all the dimensions */
+ for (j = 0; j < mvhist->ndimensions; j++)
+ {
+ sprintf(ranges, "%s [%d %d]", ranges,
+ DatumGetInt32(mvhist->values[j][bucket->min[j]]),
+ DatumGetInt32(mvhist->values[j][bucket->max[j]]));
+ }
+
+ elog(WARNING, "bucket %d %s => %d [%f]", i, ranges, matches[i], bucket->ntuples);
+ }
+
+ elog(WARNING, "full=%f partial=%f (%f)", ffull, fpartial, (ffull + 0.5 * fpartial));
+}
+#endif
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 7d13a38..942b779 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2109,9 +2109,9 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
- " deps_enabled, mcv_enabled,\n"
- " deps_built, mcv_built,\n"
- " mcv_max_items,\n"
+ " deps_enabled, mcv_enabled, hist_enabled,\n"
+ " deps_built, mcv_built, hist_built,\n"
+ " mcv_max_items, hist_max_buckets,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
@@ -2154,8 +2154,17 @@ describeOneTableDetails(const char *schemaname,
first = false;
}
+ if (!strcmp(PQgetvalue(result, i, 6), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", histogram");
+ else
+ appendPQExpBuffer(&buf, "(histogram");
+ first = false;
+ }
+
appendPQExpBuffer(&buf, ") ON (%s)",
- PQgetvalue(result, i, 9));
+ PQgetvalue(result, i, 12));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index fd7107d..a5945af 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -38,13 +38,16 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
+ bool hist_enabled; /* build histogram? */
- /* MCV size */
+ /* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
+ int32 hist_max_buckets; /* max histogram buckets */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
+ bool hist_built; /* histogram was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -52,6 +55,7 @@ CATALOG(pg_mv_statistic,3381)
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
+ bytea stahist; /* MV histogram (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -67,17 +71,21 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 11
+#define Natts_pg_mv_statistic 15
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
#define Anum_pg_mv_statistic_deps_enabled 4
#define Anum_pg_mv_statistic_mcv_enabled 5
-#define Anum_pg_mv_statistic_mcv_max_items 6
-#define Anum_pg_mv_statistic_deps_built 7
-#define Anum_pg_mv_statistic_mcv_built 8
-#define Anum_pg_mv_statistic_stakeys 9
-#define Anum_pg_mv_statistic_stadeps 10
-#define Anum_pg_mv_statistic_stamcv 11
+#define Anum_pg_mv_statistic_hist_enabled 6
+#define Anum_pg_mv_statistic_mcv_max_items 7
+#define Anum_pg_mv_statistic_hist_max_buckets 8
+#define Anum_pg_mv_statistic_deps_built 9
+#define Anum_pg_mv_statistic_mcv_built 10
+#define Anum_pg_mv_statistic_hist_built 11
+#define Anum_pg_mv_statistic_stakeys 12
+#define Anum_pg_mv_statistic_stadeps 13
+#define Anum_pg_mv_statistic_stamcv 14
+#define Anum_pg_mv_statistic_stahist 15
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 1875e26..2eb16f4 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2749,6 +2749,10 @@ DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f
DESCR("multi-variate statistics: MCV list info");
DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
DESCR("details about MCV list items");
+DATA(insert OID = 3375 ( pg_mv_stats_histogram_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_histogram_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: histogram info");
+DATA(insert OID = 3374 ( pg_mv_histogram_buckets PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 2 0 2249 "26 23" "{26,23,23,1009,1009,1000,1000,1000,701,701,701}" "{i,i,o,o,o,o,o,o,o,o,o}" "{oid,otype,index,minvals,maxvals,nullsonly,mininclusive,maxinclusive,frequency,density,bucket_size}" _null_ _null_ pg_mv_histogram_buckets _null_ _null_ _null_ ));
+DESCR("details about histogram buckets");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index d3c9898..1298c42 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -593,10 +593,12 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
bool mcv_enabled; /* MCV list enabled */
+ bool hist_enabled; /* histogram enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
bool mcv_built; /* MCV list built */
+ bool hist_built; /* histogram built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 4535db7..f05a517 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -92,6 +92,123 @@ typedef MCVListData *MCVList;
#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
/*
+ * Multivariate histograms
+ */
+typedef struct MVBucketData {
+
+ /* Frequencies of this bucket. */
+ float ntuples; /* frequency (fraction) of tuples in this bucket */
+
+ /*
+ * Information about dimensions being NULL-only. Not yet used.
+ */
+ bool *nullsonly;
+
+ /* lower boundaries - values and information about the inequalities */
+ Datum *min;
+ bool *min_inclusive;
+
+ /* upper boundaries - values and information about the inequalities */
+ Datum *max;
+ bool *max_inclusive;
+
+ /* used when building the histogram (not serialized/deserialized) */
+ void *build_data;
+
+} MVBucketData;
+
+typedef MVBucketData *MVBucket;
+
+
+typedef struct MVHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ MVBucket *buckets; /* array of buckets */
+
+} MVHistogramData;
+
+typedef MVHistogramData *MVHistogram;
+
+/*
+ * Histogram in a partially serialized form, with deduplicated boundary
+ * values etc.
+ *
+ * TODO add more detailed description here
+ */
+
+typedef struct MVSerializedBucketData {
+
+ /* Frequencies of this bucket. */
+ float ntuples; /* frequency (fraction) of tuples in this bucket */
+
+ /*
+ * Information about dimensions being NULL-only. Not yet used.
+ */
+ bool *nullsonly;
+
+ /* indexes of lower boundaries - values and information about the
+ * inequalities (exclusive vs. inclusive) */
+ uint16 *min;
+ bool *min_inclusive;
+
+ /* indexes of upper boundaries - values and information about the
+ * inequalities (exclusive vs. inclusive) */
+ uint16 *max;
+ bool *max_inclusive;
+
+} MVSerializedBucketData;
+
+typedef MVSerializedBucketData *MVSerializedBucket;
+
+typedef struct MVSerializedHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ /*
+ * keep the layout in sync with MVHistogramData, because the
+ * deserialization relies on the buckets field being at the same offset
+ */
+ MVSerializedBucket *buckets; /* array of buckets */
+
+ /*
+ * serialized boundary values, one array per dimension, deduplicated
+ * (the min/max indexes point into these arrays)
+ */
+ int *nvalues;
+ Datum **values;
+
+} MVSerializedHistogramData;
+
+typedef MVSerializedHistogramData *MVSerializedHistogram;
+
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_HIST_MAGIC 0x7F8C5670 /* marks serialized bytea */
+#define MVSTAT_HIST_TYPE_BASIC 1 /* basic histogram type */
+
+/*
+ * Limits used for max_buckets option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_HIST_MIN_BUCKETS, and we cannot
+ * have more than MVSTAT_HIST_MAX_BUCKETS buckets.
+ *
+ * This is just a boundary for the 'max' threshold - the actual
+ * histogram may use fewer buckets than MVSTAT_HIST_MAX_BUCKETS.
+ *
+ * TODO The MVSTAT_HIST_MIN_BUCKETS should be related to the number of
+ * attributes (MVSTATS_MAX_DIMENSIONS) because of NULL-buckets.
+ * There should be at least 2^N buckets, otherwise we may be unable
+ * to build the NULL buckets.
+ */
+#define MVSTAT_HIST_MIN_BUCKETS 128 /* min number of buckets */
+#define MVSTAT_HIST_MAX_BUCKETS 16384 /* max number of buckets */
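+
+/*
+ * For example (mirroring the regression tests - the particular value
+ * of max_buckets is just an illustration), the limits apply to the
+ * max_buckets option of CREATE STATISTICS:
+ *
+ * CREATE STATISTICS s1 ON t (a, b, c)
+ * WITH (histogram, max_buckets = 1024);
+ *
+ * where values outside the [128, 16384] range are rejected.
+ */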
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
@@ -99,20 +216,25 @@ typedef MCVListData *MCVList;
MVDependencies load_mv_dependencies(Oid mvoid);
MCVList load_mv_mcvlist(Oid mvoid);
+MVSerializedHistogram load_mv_histogram(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
VacAttrStats **stats);
+bytea * serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
MCVList deserialize_mv_mcvlist(bytea * data);
+MVSerializedHistogram deserialize_mv_histogram(bytea * data);
/*
* Returns index of the attribute number within the vector (i.e. a
* dimension within the stats).
*/
int mv_get_index(AttrNumber varattno, int2vector * stakeys);
int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
@@ -121,6 +243,8 @@ extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_mcvlist_items(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_histogram_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_histogram_buckets(PG_FUNCTION_ARGS);
MVDependencies
build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
@@ -130,10 +254,20 @@ MCVList
build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int *numrows_filtered);
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total);
+
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+void update_mv_stats(Oid relid, MVDependencies dependencies,
+ MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats);
+#ifdef DEBUG_MVHIST
+extern void debug_histogram_matches(MVSerializedHistogram mvhist, char *matches);
+#endif
+
#endif
diff --git a/src/test/regress/expected/mv_histogram.out b/src/test/regress/expected/mv_histogram.out
new file mode 100644
index 0000000..a34edb8
--- /dev/null
+++ b/src/test/regress/expected/mv_histogram.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s1 ON mv_histogram (unknown_column) WITH (histogram);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s1 ON mv_histogram (a) WITH (histogram);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s1 ON mv_histogram (a, a) WITH (histogram);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON mv_histogram (a, a, b) WITH (histogram);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing histogram statistics
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (dependencies, max_buckets=200);
+ERROR: option 'histogram' is required by other options(s)
+-- invalid max_buckets value / too low
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=10);
+ERROR: minimum number of buckets is 128
+-- invalid max_buckets value / too high
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=100000);
+ERROR: maximum number of buckets is 16384
+-- correct command
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (histogram);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s2 ON mv_histogram (a, b, c) WITH (histogram);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s3 ON mv_histogram (a, b, c, d) WITH (histogram);
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+DROP TABLE mv_histogram;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index ac5007e..9db1913 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1371,7 +1371,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
length(s.stadeps) AS depsbytes,
pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
length(s.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo,
+ length(s.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(s.stahist) AS histinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 838c12b..fbed683 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,4 +112,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies mv_mcv
+test: mv_dependencies mv_mcv mv_histogram
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index d97a0ec..c60c0b2 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -163,3 +163,4 @@ test: event_trigger
test: stats
test: mv_dependencies
test: mv_mcv
+test: mv_histogram
diff --git a/src/test/regress/sql/mv_histogram.sql b/src/test/regress/sql/mv_histogram.sql
new file mode 100644
index 0000000..02f49b4
--- /dev/null
+++ b/src/test/regress/sql/mv_histogram.sql
@@ -0,0 +1,176 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s1 ON mv_histogram (unknown_column) WITH (histogram);
+
+-- single column
+CREATE STATISTICS s1 ON mv_histogram (a) WITH (histogram);
+
+-- single column, duplicated
+CREATE STATISTICS s1 ON mv_histogram (a, a) WITH (histogram);
+
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON mv_histogram (a, a, b) WITH (histogram);
+
+-- unknown option
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (unknown_option);
+
+-- missing histogram statistics
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (dependencies, max_buckets=200);
+
+-- invalid max_buckets value / too low
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=10);
+
+-- invalid max_buckets value / too high
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=100000);
+
+-- correct command
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (histogram);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+
+DROP TABLE mv_histogram;
+
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s2 ON mv_histogram (a, b, c) WITH (histogram);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mv_histogram;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s3 ON mv_histogram (a, b, c, d) WITH (histogram);
+
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+DROP TABLE mv_histogram;
--
2.1.0
0006-multi-statistics-estimation.patch (application/x-patch)
From c0983a6079d9b7f4617fb3d31bce53690a35e9d6 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Fri, 6 Feb 2015 01:42:38 +0100
Subject: [PATCH 6/7] multi-statistics estimation
The general idea is that a probability (which
is what selectivity is) can be split into a product of
conditional probabilities like this:
P(A & B & C) = P(A & B) * P(C|A & B)
If we assume that B and C are independent given A, the last
term may be simplified like this:
P(A & B & C) = P(A & B) * P(C|A)
so we only need probabilities on [A,B] and [C,A] to compute
the original probability.
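To illustrate with made-up numbers: if each of A, B and C
matches 10% of the rows and the three columns are perfectly
correlated, then P(A & B) = 0.1 and P(C|A) = 1.0, so
P(A & B & C) = 0.1 * 1.0 = 0.1
while the independence assumption would give
P(A) * P(B) * P(C) = 0.1 * 0.1 * 0.1 = 0.001
i.e. a 100x underestimate.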
The implementation works in the other direction, though.
We know what probability P(A & B & C) we need to compute,
and also what statistics are available.
So we search for a combination of statistics covering
the clauses in an optimal way (most clauses covered, most
dependencies exploited).
There are two possible approaches - exhaustive and greedy.
The exhaustive one walks through all permutations of the
stats, so it's guaranteed to find the optimal solution, but
it soon gets very slow, as it's roughly O(N!). Dynamic
programming may improve that a bit, but it's still far too
expensive for large numbers of statistics (on a single
table).
The greedy algorithm is very simple - at every step it picks
the locally best statistics. That may not guarantee the best
solution globally (but maybe it does?), but it only needs N steps
to find the solution, so it's very fast (processing the
selected stats is usually way more expensive).
There's a GUC for selecting the search algorithm:
mvstat_search = {'greedy', 'exhaustive'}
The default value is 'greedy' as that's much safer (with
respect to runtime). See choose_mv_statistics().
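So to try the exhaustive search on a query (the table and
query below are just made-up examples), something like this
should work:

SET mvstat_search = 'exhaustive';
EXPLAIN SELECT * FROM test WHERE (a = 10) AND (b = 10);

and SET mvstat_search = 'greedy' switches back to the default.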
Once we have found a sequence of statistics, we apply
them to the clauses using the conditional probabilities.
We process the selected stats one by one, and for each
we select the estimated clauses and conditions. See
clauselist_selectivity() for more details.
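For illustration, assume (hypothetically) statistics S1 on
(a,b) and S2 on (b,c), and clauses (a=1), (b=1) and (c=1).
Processing the sequence [S1, S2] then works like this:

1) S1 estimates P[(a=1) & (b=1)]

2) S2 estimates P[(c=1) | (b=1)], i.e. (c=1) is the newly
estimated clause and (b=1) - already covered by S1 - is
used as a condition

and the two results are multiplied to get the selectivity.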
Limitations
-----------
It's still true that each clause at a given level has to
be covered by a single MV statistics. So with this query
WHERE (clause1) AND (clause2) AND (clause3 OR clause4)
each parenthesized clause has to be covered by a single
multivariate statistics.
Clauses not covered by a single statistics at this level
will be passed to clause_selectivity() but this will treat
them as a collection of simpler clauses (connected by AND
or OR), and the clauses from the previous level will be
used as conditions.
So using the same example, the last clause will be passed
to clause_selectivity() with 'clause1' and 'clause2' as
conditions, and it will be processed using multivariate
stats if possible.
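A concrete (made-up) example, with statistics only on (a,b):

CREATE STATISTICS s1 ON t (a, b) WITH (histogram);

SELECT * FROM t WHERE (a = 1) AND (b = 1) AND (b = 2 OR c = 2);

The first two clauses are estimated together using s1, but the
OR-clause also references column "c" (not covered by s1), so it
gets passed to clause_selectivity() with (a = 1) and (b = 1) as
conditions.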
The other limitation is that all the expressions have to be
mv-compatible, i.e. a single clause can't mix mv-compatible
and incompatible expressions. If this is violated, the clause
may be passed to the next level (just like a list of clauses
not covered by a single statistics), which splits it into
clauses handled by multivariate stats and clauses handled by
regular statistics.
rework clauselist_selectivity_or to handle OR-clauses correctly
---------------------------------------------------------------
We might invent a completely new set of functions here, resembling
clauselist_selectivity but adapting the ideas to OR-clauses.
But luckily we know that each OR-clause
(a OR b OR c)
may be rewritten as an equivalent AND-clause using negation:
NOT ((NOT a) AND (NOT b) AND (NOT c))
And that's something we can pass to clauselist_selectivity.
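In terms of selectivities, that is

s(a OR b OR c) = 1 - s((NOT a) AND (NOT b) AND (NOT c))

which is exactly what the new clauselist_selectivity_or()
computes.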
histogram call cache
--------------------
The call cache was removed because it did not initially work
well with OR clauses, but that was just a stupid thinko in the
implementation. This patch re-adds it, hopefully correctly.
The code in update_match_bitmap_histogram() is overly complex;
the branches handling the various inequality cases are redundant.
This needs to be simplified somehow.
---
contrib/file_fdw/file_fdw.c | 3 +-
contrib/postgres_fdw/postgres_fdw.c | 6 +-
src/backend/optimizer/path/clausesel.c | 2224 +++++++++++++++++++++++++++-----
src/backend/optimizer/path/costsize.c | 23 +-
src/backend/optimizer/util/orclauses.c | 4 +-
src/backend/utils/adt/selfuncs.c | 17 +-
src/backend/utils/misc/guc.c | 20 +
src/include/optimizer/cost.h | 6 +-
src/include/utils/mvstats.h | 8 +
9 files changed, 2003 insertions(+), 308 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index f13316b..e25870f 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -954,7 +954,8 @@ estimate_size(PlannerInfo *root, RelOptInfo *baserel,
baserel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 374faf5..8f05a02 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -457,7 +457,8 @@ postgresGetForeignRelSize(PlannerInfo *root,
fpinfo->local_conds,
baserel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
@@ -2000,7 +2001,8 @@ estimate_path_cost_size(PlannerInfo *root,
local_join_conds,
baserel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
local_sel *= fpinfo->local_conds_sel;
rows = clamp_row_est(rows * local_sel);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 5b2d92a..3d4d136 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -29,6 +29,8 @@
#include "utils/selfuncs.h"
#include "utils/typcache.h"
+#include "miscadmin.h"
+
/*
* Data structure for accumulating info about possible range-query
@@ -44,6 +46,13 @@ typedef struct RangeQueryClause
Selectivity hibound; /* Selectivity of a var < something clause */
} RangeQueryClause;
+static Selectivity clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
+
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
@@ -59,23 +68,29 @@ static Bitmapset *collect_mv_attnums(PlannerInfo *root, List *clauses,
Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo,
int type);
+static Bitmapset *clause_mv_get_attnums(PlannerInfo *root, Node *clause);
+
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Oid varRelid, List *stats,
SpecialJoinInfo *sjinfo);
-static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
-
static List *clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
List *clauses, Oid varRelid,
List **mvclauses, MVStatisticInfo *mvstats, int types);
static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats, List *clauses,
+ List *conditions, bool is_or);
+
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats,
- bool *fullmatch, Selectivity *lowsel);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or, bool *fullmatch,
+ Selectivity *lowsel);
static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -89,11 +104,59 @@ static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
int nmatches, char * matches,
bool is_or);
+/*
+ * Describes a combination of multiple statistics to cover attributes
+ * referenced by the clauses. The array 'stats' (with nstats elements)
+ * lists the chosen statistics (in the order in which they are
+ * applied), along with the number of clauses and conditions covered
+ * by this solution.
+ *
+ * choose_mv_statistics_exhaustive() uses this to track both the current
+ * and the best solutions while walking through the space of possible
+ * combinations.
+ */
+typedef struct mv_solution_t {
+ int nclauses; /* number of clauses covered */
+ int nconditions; /* number of conditions covered */
+ int nstats; /* number of stats applied */
+ int *stats; /* stats (in the apply order) */
+} mv_solution_t;
+
+static List *choose_mv_statistics(PlannerInfo *root,
+ List *mvstats,
+ List *clauses, List *conditions,
+ Oid varRelid,
+ SpecialJoinInfo *sjinfo);
+
+static List *filter_clauses(PlannerInfo *root, Oid varRelid,
+ SpecialJoinInfo *sjinfo, int type,
+ List *stats, List *clauses,
+ Bitmapset **attnums);
+
+static List *filter_stats(List *stats, Bitmapset *new_attnums,
+ Bitmapset *all_attnums);
+
+static Bitmapset **make_stats_attnums(MVStatisticInfo *mvstats,
+ int nmvstats);
+
+static MVStatisticInfo *make_stats_array(List *stats, int *nmvstats);
+
+static List* filter_redundant_stats(List *stats,
+ List *clauses, List *conditions);
+
+static Node** make_clauses_array(List *clauses, int *nclauses);
+
+static Bitmapset ** make_clauses_attnums(PlannerInfo *root, Oid varRelid,
+ SpecialJoinInfo *sjinfo, int type,
+ Node **clauses, int nclauses);
+
+static bool* make_cover_map(Bitmapset **stats_attnums, int nmvstats,
+ Bitmapset **clauses_attnums, int nclauses);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, List *clauses,
Oid varRelid, Index *relid);
-
+
static Bitmapset* fdeps_collect_attnums(List *stats);
static int *make_idx_to_attnum_mapping(Bitmapset *attnums);
@@ -116,6 +179,8 @@ static Bitmapset *fdeps_filter_clauses(PlannerInfo *root,
static Bitmapset * get_varattnos(Node * node, Index relid);
+int mvstat_search_type = MVSTAT_SEARCH_GREEDY;
+
/* used for merging bitmaps - AND (min), OR (max) */
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
#define MIN(x, y) (((x) < (y)) ? (x) : (y))
@@ -257,14 +322,15 @@ clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 1.0;
RangeQueryClause *rqlist = NULL;
ListCell *l;
/* processing mv stats */
- Oid relid = InvalidOid;
+ Index relid = InvalidOid;
/* attributes in mv-compatible clauses */
Bitmapset *mvattnums = NULL;
@@ -274,12 +340,13 @@ clauselist_selectivity(PlannerInfo *root,
stats = find_stats(root, clauses, varRelid, &relid);
/*
- * If there's exactly one clause, then no use in trying to match up pairs,
- * so just go directly to clause_selectivity().
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, or matching multivariate statistics, so just go directly
+ * to clause_selectivity().
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
- varRelid, jointype, sjinfo);
+ varRelid, jointype, sjinfo, conditions);
/*
* Check that there are some stats with functional dependencies
@@ -311,8 +378,8 @@ clauselist_selectivity(PlannerInfo *root,
}
/*
- * Check that there are statistics with MCV list. If not, we don't
- * need to waste time with the optimization.
+ * Check that there are statistics with MCV list or histogram.
+ * If not, we don't need to waste time with the optimization.
*/
if (has_stats(stats, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
{
@@ -326,33 +393,194 @@ clauselist_selectivity(PlannerInfo *root,
/*
* If there still are at least two columns, we'll try to select
- * a suitable multivariate stats.
+ * a suitable combination of multivariate stats. If there are
+ * multiple combinations, we'll try to choose the best one.
+ * See choose_mv_statistics for more details.
*/
if (bms_num_members(mvattnums) >= 2)
{
- /* see choose_mv_statistics() for details */
- MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+ int k;
+ ListCell *s;
+
+ /*
+ * Copy the list of conditions, so that we can build a list
+ * of local conditions (and keep the original intact, for
+ * the other clauses at the same level).
+ */
+ List *conditions_local = list_copy(conditions);
+
+ /* find the best combination of statistics */
+ List *solution = choose_mv_statistics(root, stats,
+ clauses, conditions,
+ varRelid, sjinfo);
- if (mvstat != NULL) /* we have a matching stats */
+ /* we have a good solution (list of stats) */
+ foreach (s, solution)
{
+ MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
+
/* clauses compatible with multi-variate stats */
List *mvclauses = NIL;
+ List *mvclauses_new = NIL;
+ List *mvclauses_conditions = NIL;
+ Bitmapset *stat_attnums = NULL;
- /* split the clauselist into regular and mv-clauses */
- clauses = clauselist_mv_split(root, sjinfo, clauses,
+ /* build attnum bitmapset for this statistics */
+ for (k = 0; k < mvstat->stakeys->dim1; k++)
+ stat_attnums = bms_add_member(stat_attnums,
+ mvstat->stakeys->values[k]);
+
+ /*
+ * Append the compatible conditions (passed from above)
+ * to mvclauses_conditions.
+ */
+ foreach (l, conditions)
+ {
+ Node *c = (Node*)lfirst(l);
+ Bitmapset *tmp = clause_mv_get_attnums(root, c);
+
+ if (bms_is_subset(tmp, stat_attnums))
+ mvclauses_conditions
+ = lappend(mvclauses_conditions, c);
+
+ bms_free(tmp);
+ }
+
+ /* split the clauselist into regular and mv-clauses
+ *
+ * We don't remove the matched clauses from the list
+ * yet, because they may still be needed as conditions
+ * for other clauses.
+ *
+ * FIXME Do this only once, i.e. filter the clauses
+ * once (selecting clauses covered by at least
+ * one statistics) and then convert them into
+ * smaller per-statistics lists of conditions
+ * and estimated clauses.
+ */
+ clauselist_mv_split(root, sjinfo, clauses,
varRelid, &mvclauses, mvstat,
(MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
- /* we've chosen the histogram to match the clauses */
+ /*
+ * We've chosen the statistics to match the clauses, so
+ * each statistics from the solution should have at least
+ * one new clause (not covered by the previous stats).
+ */
Assert(mvclauses != NIL);
+ /*
+ * Mvclauses now contains only clauses compatible
+ * with the currently selected stats, but we have to
+ * split that into conditions (already matched by
+ * the previous stats), and the new clauses we need
+ * to estimate using this stats.
+ */
+ foreach (l, mvclauses)
+ {
+ ListCell *p;
+ bool covered = false;
+ Node *clause = (Node *) lfirst(l);
+ Bitmapset *clause_attnums = clause_mv_get_attnums(root, clause);
+
+ /*
+ * If already covered by previous stats, add it to
+ * conditions.
+ *
+ * TODO Maybe this could be relaxed a bit? Because
+ * with complex and/or clauses, this might
+ * mean no statistics actually covers such
+ * complex clause.
+ */
+ foreach (p, solution)
+ {
+ int k;
+ Bitmapset *stat_attnums = NULL;
+
+ MVStatisticInfo *prev_stat
+ = (MVStatisticInfo *)lfirst(p);
+
+ /* break if we've run into the current statistics */
+ if (prev_stat == mvstat)
+ break;
+
+ for (k = 0; k < prev_stat->stakeys->dim1; k++)
+ stat_attnums = bms_add_member(stat_attnums,
+ prev_stat->stakeys->values[k]);
+
+ covered = bms_is_subset(clause_attnums, stat_attnums);
+
+ bms_free(stat_attnums);
+
+ if (covered)
+ break;
+ }
+
+ if (covered)
+ mvclauses_conditions
+ = lappend(mvclauses_conditions, clause);
+ else
+ mvclauses_new
+ = lappend(mvclauses_new, clause);
+ }
+
+ /*
+ * We need at least one new clause (not just conditions).
+ */
+ Assert(mvclauses_new != NIL);
+
/* compute the multivariate stats */
- s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ s1 *= clauselist_mv_selectivity(root, mvstat,
+ mvclauses_new,
+ mvclauses_conditions,
+ false); /* AND */
+ }
+
+ /*
+ * And now finally remove all the mv-compatible clauses.
+ *
+ * This only repeats the same split as above, but this
+ * time we actually use the result list (and feed it to
+ * the next call).
+ */
+ foreach (s, solution)
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
+
+ /* split the list into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, sjinfo, clauses,
+ varRelid, &mvclauses, mvstat,
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+
+ /*
+ * Add the clauses to the conditions (to be passed
+ * to regular clauses), irrespective of whether they
+ * will be used as a condition or a clause here.
+ *
+ * We only keep the remaining conditions in the
+ * clauses (we keep what clauselist_mv_split returns)
+ * so we add each MV condition exactly once.
+ */
+ conditions_local = list_concat(conditions_local, mvclauses);
}
+
+ /* from now on, work with the 'local' list of conditions */
+ conditions = conditions_local;
}
}
/*
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, so just go directly to clause_selectivity().
+ */
+ if (list_length(clauses) == 1)
+ return clause_selectivity(root, (Node *) linitial(clauses),
+ varRelid, jointype, sjinfo, conditions);
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -364,7 +592,8 @@ clauselist_selectivity(PlannerInfo *root,
Selectivity s2;
/* Always compute the selectivity using clause_selectivity */
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo);
+ s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo,
+ conditions);
/*
* Check for being passed a RestrictInfo.
@@ -523,6 +752,55 @@ clauselist_selectivity(PlannerInfo *root,
}
/*
+ * Similar to clauselist_selectivity(), but for OR-clauses. We can't
+ * simply apply exactly the same logic as to AND-clauses, because there
+ * are a few key differences:
+ *
+ * - functional dependencies don't really apply to OR-clauses
+ *
+ * - clauselist_selectivity() works by decomposing the selectivity
+ * into conditional selectivities (probabilities), but that can be
+ * done only for AND-clauses. That means problems with applying
+ * multiple statistics (and reusing clauses as conditions, etc.).
+ *
+ * We might invent a completely new set of functions here, resembling
+ * clauselist_selectivity but adapting the ideas to OR-clauses.
+ *
+ * But luckily we know that each OR-clause
+ *
+ * (a OR b OR c)
+ *
+ * may be rewritten as an equivalent AND-clause using negation:
+ *
+ * NOT ((NOT a) AND (NOT b) AND (NOT c))
+ *
+ * And that's something we can pass to clauselist_selectivity.
+ */
+static Selectivity
+clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
+{
+ List *args = NIL;
+ ListCell *l;
+ Expr *expr;
+
+ /* (NOT ...) */
+ foreach (l, clauses)
+ args = lappend(args, makeBoolExpr(NOT_EXPR, list_make1(lfirst(l)), -1));
+
+ /* ((NOT ...) AND (NOT ...)) */
+ expr = makeBoolExpr(AND_EXPR, args, -1);
+
+ /* NOT (... AND ...) */
+ return 1.0 - clauselist_selectivity(root, list_make1(expr), varRelid,
+ jointype, sjinfo, conditions);
+}
+
+/*
* addRangeClause --- add a new range clause for clauselist_selectivity
*
* Here is where we try to match up pairs of range-query clauses
@@ -729,7 +1007,8 @@ clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 0.5; /* default for any unhandled clause type */
RestrictInfo *rinfo = NULL;
@@ -849,7 +1128,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) get_notclausearg((Expr *) clause),
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (and_clause(clause))
{
@@ -858,29 +1138,18 @@ clause_selectivity(PlannerInfo *root,
((BoolExpr *) clause)->args,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (or_clause(clause))
{
- /*
- * Selectivities for an OR clause are computed as s1+s2 - s1*s2 to
- * account for the probable overlap of selected tuple sets.
- *
- * XXX is this too conservative?
- */
- ListCell *arg;
-
- s1 = 0.0;
- foreach(arg, ((BoolExpr *) clause)->args)
- {
- Selectivity s2 = clause_selectivity(root,
- (Node *) lfirst(arg),
- varRelid,
- jointype,
- sjinfo);
-
- s1 = s1 + s2 - s1 * s2;
- }
+ /* just call to clauselist_selectivity_or() */
+ s1 = clauselist_selectivity_or(root,
+ ((BoolExpr *) clause)->args,
+ varRelid,
+ jointype,
+ sjinfo,
+ conditions);
}
else if (is_opclause(clause) || IsA(clause, DistinctExpr))
{
@@ -970,7 +1239,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((RelabelType *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (IsA(clause, CoerceToDomain))
{
@@ -979,7 +1249,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((CoerceToDomain *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else
{
@@ -1103,9 +1374,67 @@ clause_selectivity(PlannerInfo *root,
* them without inspection, which is more expensive). But this
* requires really knowing the per-clause selectivities in advance,
* and that's not what we do now.
+ *
+ * TODO All this is based on the assumption that the statistics represent
+ * the necessary dependencies, i.e. that if two columns are not in
+ * the same statistics, there's no dependency. If that's not the
+ * case, we may get misestimates, just like before. For example
+ * assume we have a table with three columns [a,b,c] with exactly
+ * the same values, and statistics on [a,b] and [b,c]. So something
+ * like this:
+ *
+ * CREATE TABLE test AS SELECT i AS a, i AS b, i AS c
+ * FROM generate_series(1,1000) s(i);
+ *
+ * CREATE STATISTICS s1 ON test (a, b) WITH (mcv);
+ * CREATE STATISTICS s2 ON test (b, c) WITH (mcv);
+ *
+ * ANALYZE test;
+ *
+ * EXPLAIN ANALYZE SELECT * FROM test
+ * WHERE (a < 10) AND (b < 20) AND (c < 10);
+ *
+ * The problem here is that the only shared column between the two
+ * statistics is 'b' so the probability will be computed like this
+ *
+ * P[(a < 10) & (b < 20) & (c < 10)]
+ * = P[(a < 10) & (b < 20)] * P[(c < 10) | (a < 10) & (b < 20)]
+ * = P[(a < 10) & (b < 20)] * P[(c < 10) | (b < 20)]
+ *
+ * or like this
+ *
+ * P[(a < 10) & (b < 20) & (c < 10)]
+ * = P[(b < 20) & (c < 10)] * P[(a < 10) | (b < 20) & (c < 10)]
+ * = P[(b < 20) & (c < 10)] * P[(a < 10) | (b < 20)]
+ *
+ * In both cases the conditional probabilities will be evaluated as
+ * 0.5, because they lack the other column (which would make it 1.0).
+ *
+ * Theoretically it might be possible to transfer the dependency,
+ * e.g. by building bitmap for [a,b] and then combine it with [b,c]
+ * by doing something like this:
+ *
+ * 1) build bitmap on [a,b] using [(a<10) & (b < 20)]
+ * 2) for each element in [b,c] check the bitmap
+ *
+ * But that's certainly nontrivial - for example the statistics may
+ * be different (MCV list vs. histogram) and/or the items may not
+ * match (e.g. MCV items or histogram buckets will be built
+ * differently). Also, for one value of 'b' there might be multiple
+ * MCV items (because of the other column values) with different
+ * bitmap values (some will match, some won't) - so it's not exactly
+ * bitmap but a partial match.
+ *
+ * Maybe a hash table with number of matches and mismatches (or
+ * maybe sums of frequencies) would work? The step (2) would then
+ * lookup the values and use that to weight the item somehow.
+ *
+ * Currently the only solution is to build statistics on all three
+ * columns.
*/
static Selectivity
-clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+clauselist_mv_selectivity(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
bool fullmatch = false;
Selectivity s1 = 0.0, s2 = 0.0;
@@ -1123,7 +1452,8 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
*/
/* Evaluate the MCV first. */
- s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ s1 = clauselist_mv_selectivity_mcvlist(root, mvstats,
+ clauses, conditions, is_or,
&fullmatch, &mcv_low);
/*
@@ -1136,7 +1466,8 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
/* FIXME if (fullmatch) without matching MCV item, use the mcv_low
* selectivity as upper bound */
- s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+ s2 = clauselist_mv_selectivity_histogram(root, mvstats,
+ clauses, conditions, is_or);
/* TODO clamp to <= 1.0 (or more strictly, when possible) */
return s1 + s2;
@@ -1176,8 +1507,7 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
*/
if (bms_num_members(attnums) <= 1)
{
- if (attnums != NULL)
- pfree(attnums);
+ bms_free(attnums);
attnums = NULL;
*relid = InvalidOid;
}
@@ -1186,202 +1516,931 @@ collect_mv_attnums(PlannerInfo *root, List *clauses, Oid varRelid,
}
/*
- * We're looking for statistics matching at least 2 attributes,
- * referenced in the clauses compatible with multivariate statistics.
- * The current selection criteria is very simple - we choose the
- * statistics referencing the most attributes.
+ * Selects the best combination of multivariate statistics, in an
+ * exhaustive way, where 'best' means:
*
- * If there are multiple statistics referencing the same number of
- * columns (from the clauses), the one with less source columns
- * (as listed in the ADD STATISTICS when creating the statistics) wins.
- * Other wise the first one wins.
+ * (a) covering the most attributes (referenced by clauses)
+ * (b) using the least number of multivariate stats
+ * (c) using the most conditions to exploit dependency
*
- * This is a very simple criteria, and has several weaknesses:
+ * There may be other optimality criteria, not considered in the initial
+ * implementation (more on that 'weaknesses' section).
*
- * (a) does not consider the accuracy of the statistics
+ * This pretty much splits the probability of clauses (aka selectivity)
+ * into a sequence of conditional probabilities, like this
*
- * If there are two histograms built on the same set of columns,
- * but one has 100 buckets and the other one has 1000 buckets (thus
- * likely providing better estimates), this is not currently
- * considered.
+ * P(A,B,C,D) = P(A,B) * P(C|A,B) * P(D|A,B,C)
*
- * (b) does not consider the type of statistics
+ * and removing the attributes not referenced by the existing stats,
+ * under the assumption that there's no dependency (otherwise the DBA
+ * would create the stats).
*
- * If there are three statistics - one containing just a MCV list,
- * another one with just a histogram and a third one with both,
- * this is not considered.
+ * The last criterion means that when we have the choice to compute like
+ * this
*
- * (c) does not consider the number of clauses
+ * P(A,B,C,D) = P(A,B,C) * P(D|B,C)
*
- * As explained, only the number of referenced attributes counts,
- * so if there are multiple clauses on a single attribute, this
- * still counts as a single attribute.
+ * or like this
*
- * (d) does not consider type of condition
+ * P(A,B,C,D) = P(A,B,C) * P(D|C)
*
- * Some clauses may work better with some statistics - for example
- * equality clauses probably work better with MCV lists than with
- * histograms. But IS [NOT] NULL conditions may often work better
- * with histograms (thanks to NULL-buckets).
+ * we should use the first option, as that exploits more dependencies.
*
- * So for example with five WHERE conditions
+ * The order of statistics in the solution implicitly determines the
+ * order of estimation of clauses, because as we apply a statistics,
+ * we always use it to estimate all the clauses covered by it (and
+ * then we use those clauses as conditions for the next statistics).
*
- * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
+ * Don't call this directly but through choose_mv_statistics().
*
- * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be
- * selected as it references the most columns.
*
- * Once we have selected the multivariate statistics, we split the list
- * of clauses into two parts - conditions that are compatible with the
- * selected stats, and conditions are estimated using simple statistics.
+ * Algorithm
+ * ---------
+ * The algorithm is a recursive implementation of backtracking, with
+ * maximum 'depth' equal to the number of multi-variate statistics
+ * available on the table.
*
- * From the example above, conditions
+ * It explores all the possible permutations of the stats.
+ *
+ * Whenever it considers adding the next statistics, the clauses it
+ * matches are divided into 'conditions' (clauses already matched by at
+ * least one previous statistics) and clauses that are estimated.
*
- * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
+ * Then several checks are performed:
*
- * will be estimated using the multivariate statistics (a,b,c,d) while
- * the last condition (e = 1) will get estimated using the regular ones.
+ * (a) The statistics covers at least 2 columns, referenced in the
+ * estimated clauses (otherwise multi-variate stats are useless).
*
- * There are various alternative selection criteria (e.g. counting
- * conditions instead of just referenced attributes), but eventually
- * the best option should be to combine multiple statistics. But that's
- * much harder to do correctly.
+ * (b) The statistics covers at least 1 new column, i.e. column not
+ * referenced by the already used stats (and the new column has
+ * to be referenced by the clauses, of course). Otherwise the
+ * statistics would not add any new information.
*
- * TODO Select multiple statistics and combine them when computing
- * the estimate.
+ * There are some other sanity checks (e.g. that the stats must not be
+ * used twice etc.).
*
- * TODO This will probably have to consider compatibility of clauses,
- * because 'dependencies' will probably work only with equality
- * clauses.
+ * Finally the new solution is compared to the currently best one, and
+ * if it's considered better, it's used instead.
+ *
+ *
+ * Weaknesses
+ * ----------
+ * The current implementation uses somewhat simplistic optimality
+ * criteria, suffering from the following weaknesses.
+ *
+ * (a) There may be multiple solutions with the same number of covered
+ * attributes and number of statistics (e.g. the same solution but
+ * with statistics in a different order). It's unclear which solution
+ * is the best one - in a sense all of them are equal.
+ *
+ * TODO It might be possible to compute estimate for each of those
+ * solutions, and then combine them to get the final estimate
+ * (e.g. by using average or median).
+ *
+ * (b) Does not consider that some types of stats are a better match for
+ * some types of clauses (e.g. an MCV list is a better match for
+ * equality clauses than a histogram).
+ *
+ * XXX Maybe MCV is almost always better / more accurate?
+ *
+ * But maybe this is pointless - generally, each column is either
+ * a label (whether because of the data type or how it's used),
+ * or a value with an ordering that makes sense. So either an
+ * MCV list is more appropriate (labels) or a histogram (ordered
+ * values).
+ *
+ * Not sure what to do with statistics mixing columns of
+ * both types - maybe it'd be better to invent a new type of stats
+ * combining MCV list and histogram (keeping a small histogram for
+ * each MCV item, and a separate histogram for values not on the
+ * MCV list). But that's not implemented at this moment.
+ *
+ * TODO The algorithm should probably count number of Vars (not just
+ * attnums) when computing the 'score' of each solution. Computing
+ * the ratio of (num of all vars) / (num of condition vars) as a
+ * measure of how well the solution uses conditions might be
+ * useful.
*/
-static MVStatisticInfo *
-choose_mv_statistics(List *stats, Bitmapset *attnums)
+static void
+choose_mv_statistics_exhaustive(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
{
- int i;
- ListCell *lc;
+ int i, j;
- MVStatisticInfo *choice = NULL;
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
- int current_matches = 1; /* goal #1: maximize */
- int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+ CHECK_FOR_INTERRUPTS();
+
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
/*
- * Walk through the statistics (simple array with nmvstats elements)
- * and for each one count the referenced attributes (encoded in
- * the 'attnums' bitmap).
+ * Now try to apply each statistics, matching at least two attributes,
+ * unless it's already used in one of the previous steps.
*/
- foreach (lc, stats)
+ for (i = 0; i < nmvstats; i++)
{
- MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+ int c;
- /* columns matching this statistics */
- int matches = 0;
+ int ncovered_clauses = 0; /* number of covered clauses */
+ int ncovered_conditions = 0; /* number of covered conditions */
+ int nattnums = 0; /* number of covered attributes */
- int2vector * attrs = info->stakeys;
- int numattrs = attrs->dim1;
+ Bitmapset *all_attnums = NULL;
+ Bitmapset *new_attnums = NULL;
- /* skip dependencies-only stats */
- if (! (info->mcv_built || info->hist_built))
+ /* skip statistics that were already used or eliminated */
+ if (ruled_out[i] != -1)
continue;
- /* count columns covered by the histogram */
- for (i = 0; i < numattrs; i++)
- if (bms_is_member(attrs->values[i], attnums))
- matches++;
-
/*
- * Use this statistics when it improves the number of matches or
- * when it matches the same number of attributes but is smaller.
+ * See if we have clauses covered by this statistics, but not
+ * yet covered by any of the preceding ones.
*/
- if ((matches > current_matches) ||
- ((matches == current_matches) && (current_dims > numattrs)))
+ for (c = 0; c < nclauses; c++)
{
- choice = info;
- current_matches = matches;
- current_dims = numattrs;
- }
- }
+ bool covered = false;
+ Bitmapset *clause_attnums = clauses_attnums[c];
+ Bitmapset *tmp = NULL;
- return choice;
-}
+ /*
+ * If this clause is not covered by this stats, we can't
+ * use the stats to estimate that at all.
+ */
+ if (! cover_map[i * nclauses + c])
+ continue;
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add the
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
-/*
- * This splits the clauses list into two parts - one containing clauses
- * that will be evaluated using the chosen statistics, and the remaining
- * clauses (either non-mvcompatible, or not related to the histogram).
- */
-static List *
-clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
- List *clauses, Oid varRelid, List **mvclauses,
- MVStatisticInfo *mvstats, int types)
-{
- int i;
- ListCell *l;
- List *non_mvclauses = NIL;
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
- /* FIXME is there a better way to get info on int2vector? */
- int2vector * attrs = mvstats->stakeys;
- int numattrs = mvstats->stakeys->dim1;
+ /* let's see if it's covered by any of the previous stats */
+ for (j = 0; j < step; j++)
+ {
+ /* already covered by the previous stats */
+ if (cover_map[current->stats[j] * nclauses + c])
+ covered = true;
- Bitmapset *mvattnums = NULL;
+ if (covered)
+ break;
+ }
- /* build bitmap of attributes covered by the stats, so we can
- * do bms_is_subset later */
- for (i = 0; i < numattrs; i++)
- mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+ /* if already covered, continue with the next clause */
+ if (covered)
+ {
+ ncovered_conditions += 1;
+ continue;
+ }
- /* erase the list of mv-compatible clauses */
- *mvclauses = NIL;
+ /*
+ * OK, this clause is covered by this statistics (and not by
+ * any of the previous ones)
+ */
+ ncovered_clauses += 1;
- foreach (l, clauses)
- {
- bool match = false; /* by default not mv-compatible */
- Bitmapset *attnums = NULL;
- Node *clause = (Node *) lfirst(l);
+ /* add the attnums to the 'new clauses' set (currently disabled) */
+ /* new_attnums = bms_union(new_attnums, clause_attnums); */
+ }
- if (clause_is_mv_compatible(root, clause, varRelid, NULL,
- &attnums, sjinfo, types))
+ /* can't have more new clauses than original clauses */
+ Assert(nclauses >= ncovered_clauses);
+ Assert(ncovered_clauses >= 0); /* mostly paranoia */
+
+ nattnums = bms_num_members(all_attnums);
+
+ /* free all the bitmapsets - we don't need them anymore */
+ bms_free(all_attnums);
+ bms_free(new_attnums);
+
+ all_attnums = NULL;
+ new_attnums = NULL;
+
+ /*
+ * Now walk through the conditions and see which of them
+ * are covered by this statistics.
+ */
+ for (c = 0; c < nconditions; c++)
{
- /* are all the attributes part of the selected stats? */
- if (bms_is_subset(attnums, mvattnums))
- match = true;
+ Bitmapset *clause_attnums = conditions_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this clause is not covered by this stats, we can't
+ * use the stats to estimate that at all.
+ */
+ if (! condition_map[i * nconditions + c])
+ continue;
+
+ /* count this as a condition */
+ ncovered_conditions += 1;
+
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add the
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
}
/*
- * The clause matches the selected stats, so put it to the list
- * of mv-compatible clauses. Otherwise, keep it in the list of
- * 'regular' clauses (that may be selected later).
+ * Let's mark the statistics as 'ruled out' - either we'll use
+ * it (and proceed to the next step), or it's incompatible.
*/
- if (match)
- *mvclauses = lappend(*mvclauses, clause);
- else
- non_mvclauses = lappend(non_mvclauses, clause);
- }
+ ruled_out[i] = step;
- /*
- * Perform regular estimation using the clauses incompatible
- * with the chosen histogram (or MV stats in general).
- */
- return non_mvclauses;
+ /*
+ * There are no clauses usable with this statistics (i.e. none
+ * that aren't already covered by some of the previous stats).
+ *
+ * Similarly, if the clauses only use a single attribute, we
+ * can't really use that.
+ */
+ if ((ncovered_clauses == 0) || (nattnums < 2))
+ continue;
-}
+ /*
+ * TODO Not sure if it's possible to add a clause referencing
+ * only attributes already covered by previous stats,
+ * introducing a new dependency but no new attribute.
+ * Couldn't come up with an example, though. Might be
+ * worth adding an assert.
+ */
-/*
- * Determines whether the clause is compatible with multivariate stats,
- * and if it is, returns some additional information - varno (index
- * into simple_rte_array) and a bitmap of attributes. This is then
- * used to fetch related multivariate statistics.
- *
- * At this moment we only support basic conditions of the form
- *
- * variable OP constant
- *
- * where OP is one of [=,<,<=,>=,>] (which is however determined by
- * looking at the associated function for estimating selectivity, just
- * like with the single-dimensional case).
- *
- * TODO Support 'OR clauses' - shouldn't be all that difficult to
+ /*
+ * got a suitable statistics - let's update the current solution,
+ * maybe use it as the best solution
+ */
+ current->nclauses += ncovered_clauses;
+ current->nconditions += ncovered_conditions;
+ current->nstats += 1;
+ current->stats[step] = i;
+
+ /*
+ * We can never cover more clauses, or use more stats, than we
+ * actually had at the beginning.
+ */
+ Assert(nclauses >= current->nclauses);
+ Assert(nmvstats >= current->nstats);
+ Assert(step < nmvstats);
+
+ /* we can't get more conditions than clauses and conditions combined
+ *
+ * FIXME This assert does not work because we count the conditions
+ * repeatedly (once for each statistics covering it).
+ */
+ /* Assert((nconditions + nclauses) >= current->nconditions); */
+
+ if (*best == NULL)
+ {
+ *best = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ (*best)->nstats = 0;
+ (*best)->nclauses = 0;
+ (*best)->nconditions = 0;
+ }
+
+ /* see if it's better than the current 'best' solution */
+ if ((current->nclauses > (*best)->nclauses) ||
+ ((current->nclauses == (*best)->nclauses) &&
+ ((current->nstats > (*best)->nstats))))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+
+ /*
+ * The recursion only makes sense if there are still unused
+ * statistics left (otherwise there's nothing to add).
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_exhaustive(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= ncovered_clauses;
+ current->nconditions -= ncovered_conditions;
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[i] = -1;
+
+ Assert(current->nclauses >= 0);
+ Assert(current->nstats >= 0);
+ }
+
+ /* reset all statistics as 'incompatible' in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+}
+
+/*
+ * Greedy search for a multivariate solution - a sequence of statistics
+ * covering the clauses. This chooses the "best" statistics at each step,
+ * so the resulting solution may not be the best solution globally, but
+ * this produces the solution in only N steps (where N is the number of
+ * statistics), while the exhaustive approach may have to walk through
+ * ~N! combinations (although some of those are terminated early).
+ *
+ * See the comments at choose_mv_statistics_exhaustive() as this does
+ * the same thing (but in a different way).
+ *
+ * Don't call this directly, but through choose_mv_statistics().
+ *
+ * TODO There are probably other metrics we might use - e.g. using
+ * number of columns (num_cond_columns / num_cov_columns), which
+ * might work better with a mix of simple and complex clauses.
+ *
+ * TODO Also the choice at the very first step should be handled
+ * in a special way, because there will be 0 conditions at that
+ * moment, so there needs to be some other criteria - e.g. using
+ * the simplest (or most complex?) clause might be a good idea.
+ *
+ * TODO We might also select multiple stats using different criteria,
+ * and branch the search. This is however tricky, because if we
+ * choose k statistics at each step, we get k^N branches to
+ * walk through (with N steps). That's not really good with
+ * large number of stats (yet better than exhaustive search).
+ */
+static void
+choose_mv_statistics_greedy(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
+ int best_stat = -1;
+ double gain, max_gain = -1.0;
+
+ /*
+ * Bitmap tracking which clauses are already covered (by the previous
+ * statistics) and may thus serve only as a condition in this step.
+ */
+ bool *covered_clauses = (bool *) palloc0(nclauses * sizeof(bool));
+
+ /*
+ * Number of clauses and columns covered by each statistics - this
+ * includes both conditions and clauses covered by the statistics for
+ * the first time. The number of columns may count some columns
+ * repeatedly - if a column is shared by multiple clauses, it will
+ * be counted once for each clause (covered by the statistics).
+ * So with two clauses [(a=1 OR b=2),(a<2 OR c>1)] the column "a"
+ * will be counted twice (if both clauses are covered).
+ *
+ * The values for ruled-out statistics (that can't be applied) are
+ * not computed, because that'd be pointless.
+ */
+ int *num_cov_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cov_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Same as above, but this only includes clauses that are already
+ * covered by the previous stats (and the current one).
+ */
+ int *num_cond_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cond_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Number of attributes for each clause.
+ *
+ * TODO Might be computed in choose_mv_statistics() and then passed
+ * here, but then the function would not have the same signature
+ * as _exhaustive().
+ */
+ int *attnum_counts = (int*)palloc0(sizeof(int) * nclauses);
+ int *attnum_cond_counts = (int*)palloc0(sizeof(int) * nconditions);
+
+ CHECK_FOR_INTERRUPTS();
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ /* compute attributes (columns) for each clause */
+ for (i = 0; i < nclauses; i++)
+ attnum_counts[i] = bms_num_members(clauses_attnums[i]);
+
+ /* compute attributes (columns) for each condition */
+ for (i = 0; i < nconditions; i++)
+ attnum_cond_counts[i] = bms_num_members(conditions_attnums[i]);
+
+ /* see which clauses are already covered at this point (by previous stats) */
+ for (i = 0; i < step; i++)
+ for (j = 0; j < nclauses; j++)
+ covered_clauses[j] |= (cover_map[current->stats[i] * nclauses + j]);
+
+ /* which remaining statistics covers most clauses / uses most conditions? */
+ for (i = 0; i < nmvstats; i++)
+ {
+ Bitmapset *attnums_covered = NULL;
+ Bitmapset *attnums_conditions = NULL;
+
+ /* skip stats that are already ruled out (either used or inapplicable) */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* count covered clauses and conditions (for the statistics) */
+ for (j = 0; j < nclauses; j++)
+ {
+ if (cover_map[i * nclauses + j])
+ {
+ Bitmapset *attnums_new
+ = bms_union(attnums_covered, clauses_attnums[j]);
+
+ /* get rid of the old bitmap and keep the unified result */
+ bms_free(attnums_covered);
+ attnums_covered = attnums_new;
+
+ num_cov_clauses[i] += 1;
+ num_cov_columns[i] += attnum_counts[j];
+
+ /* is the clause already covered (i.e. a condition)? */
+ if (covered_clauses[j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_counts[j];
+ attnums_new = bms_union(attnums_conditions,
+ clauses_attnums[j]);
+
+ bms_free(attnums_conditions);
+ attnums_conditions = attnums_new;
+ }
+ }
+ }
+
+ /* if all covered clauses are covered by prev stats (thus conditions) */
+ if (num_cov_clauses[i] == num_cond_clauses[i])
+ ruled_out[i] = step;
+
+ /* same if there are no new attributes */
+ else if (bms_num_members(attnums_conditions) == bms_num_members(attnums_covered))
+ ruled_out[i] = step;
+
+ bms_free(attnums_covered);
+ bms_free(attnums_conditions);
+
+ /* if the statistics is inapplicable, try the next one */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* now let's walk through conditions and count the covered */
+ for (j = 0; j < nconditions; j++)
+ {
+ if (condition_map[i * nconditions + j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_cond_counts[j];
+ }
+ }
+
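+ /*
+ * The gain is the ratio of condition columns (columns from clauses
+ * already estimated by previous statistics) to all columns covered
+ * by this statistics, so statistics that reuse more of the
+ * already-estimated columns rank higher.
+ */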
+ /* compute the gain and keep track of the best statistics so far */
+ gain = num_cond_columns[i] / (double)num_cov_columns[i];
+
+ if (gain > max_gain)
+ {
+ max_gain = gain;
+ best_stat = i;
+ }
+ }
+
+ /*
+ * Have we found a suitable statistics? Add it to the solution and
+ * try next step.
+ */
+ if (best_stat != -1)
+ {
+ /* mark the statistics, so that we skip it in next steps */
+ ruled_out[best_stat] = step;
+
+ /* allocate current solution if necessary */
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ current->nclauses += num_cov_clauses[best_stat];
+ current->nconditions += num_cond_clauses[best_stat];
+ current->stats[step] = best_stat;
+ current->nstats++;
+
+ if (*best == NULL)
+ {
+ (*best) = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ else
+ {
+ /* see if this is a better solution */
+ double current_gain = (double)current->nconditions / current->nclauses;
+ double best_gain = (double)(*best)->nconditions / (*best)->nclauses;
+
+ if ((current_gain > best_gain) ||
+ ((current_gain == best_gain) && (current->nstats < (*best)->nstats)))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ }
+
+ /*
+ * The recursion only makes sense if there are statistics left to
+ * add (each step consumes one of the nmvstats statistics).
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_greedy(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= num_cov_clauses[best_stat];
+ current->nconditions -= num_cond_clauses[best_stat];
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[best_stat] = -1;
+ }
+
+ /* reset all statistics eliminated in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+ /* free everything allocated in this step */
+ pfree(covered_clauses);
+ pfree(attnum_counts);
+ pfree(num_cov_clauses);
+ pfree(num_cov_columns);
+ pfree(num_cond_clauses);
+ pfree(num_cond_columns);
+}
+
+/*
+ * Chooses the combination of statistics, optimal for estimation of
+ * a particular clause list.
+ *
+ * This only handles a 'preparation' shared by the exhaustive and greedy
+ * implementations (see the previous methods), mostly trying to reduce
+ * the size of the problem (eliminate clauses/statistics that can't be
+ * really used in the solution).
+ *
+ * It also precomputes bitmaps for attributes covered by clauses and
+ * statistics, so that we don't need to do that over and over in the
+ * actual optimizations (as it's both CPU and memory intensive).
+ *
+ * TODO This will probably have to consider compatibility of clauses,
+ * because 'dependencies' will probably work only with equality
+ * clauses.
+ *
+ * TODO Another way to make the optimization problems smaller might
+ * be splitting the statistics into several disjoint subsets, i.e.
+ * if we can split the graph of statistics (after the elimination)
+ * into multiple components (so that stats in different components
+ * share no attributes), we can do the optimization for each
+ * component separately.
+ *
+ * TODO If we could compute what is a "perfect solution" maybe we could
+ * terminate the search after reaching ~90% of it? Say, if we knew
+ * that we can cover 10 clauses and reuse 8 dependencies, maybe
+ * covering 9 clauses and 7 dependencies would be OK?
+ */
+static List*
+choose_mv_statistics(PlannerInfo *root, List *stats,
+ List *clauses, List *conditions,
+ Oid varRelid, SpecialJoinInfo *sjinfo)
+{
+ int i;
+ mv_solution_t *best = NULL;
+ List *result = NIL;
+
+ int nmvstats;
+ MVStatisticInfo *mvstats;
+
+ /* we only work with MCV lists and histograms here */
+ int type = (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+
+ bool *clause_cover_map = NULL,
+ *condition_cover_map = NULL;
+ int *ruled_out = NULL;
+
+ /* build bitmapsets for all stats and clauses */
+ Bitmapset **stats_attnums;
+ Bitmapset **clauses_attnums;
+ Bitmapset **conditions_attnums;
+
+ int nclauses, nconditions;
+ Node ** clauses_array;
+ Node ** conditions_array;
+
+ /* copy lists, so that we can free them during elimination easily */
+ clauses = list_copy(clauses);
+ conditions = list_copy(conditions);
+ stats = list_copy(stats);
+
+ /*
+ * Reduce the optimization problem size as much as possible.
+ *
+ * Eliminate clauses and conditions not covered by any statistics,
+ * or statistics not matching at least two attributes (one of them
+ * has to be in a regular clause).
+ *
+ * It's possible that removing a statistics in one iteration
+ * eliminates a clause in the next one, so we repeat this until
+ * an iteration eliminates no clauses or statistics.
+ *
+ * This can only happen after eliminating a statistics - clauses are
+ * eliminated first, so statistics always reflect that.
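+ *
+ * For example, with clauses (a=1), (b=2) and (d=1) and statistics
+ * on [a,b] and [c,d], the statistics [c,d] matches just one
+ * attribute (d) among the clauses and is eliminated; in the next
+ * iteration the clause (d=1) is no longer covered by any
+ * statistics and is eliminated too.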
+ */
+ while (true)
+ {
+ List *tmp;
+
+ Bitmapset *compatible_attnums = NULL;
+ Bitmapset *condition_attnums = NULL;
+ Bitmapset *all_attnums = NULL;
+
+ /*
+ * Clauses
+ *
+ * Walk through clauses and keep only those covered by at least
+ * one of the statistics we still have. We'll also keep info
+ * about attnums in clauses (without conditions) so that we can
+ * ignore stats covering just conditions (which is pointless).
+ */
+ tmp = filter_clauses(root, varRelid, sjinfo, type,
+ stats, clauses, &compatible_attnums);
+
+ /* discard the original list */
+ list_free(clauses);
+ clauses = tmp;
+
+ /*
+ * Conditions
+ *
+ * Walk through clauses and keep only those covered by at least
+ * one of the statistics we still have. Also, collect bitmap of
+ * attributes so that we can make sure we add at least one new
+ * attribute (by comparing with clauses).
+ */
+ if (conditions != NIL)
+ {
+ tmp = filter_clauses(root, varRelid, sjinfo, type,
+ stats, conditions, &condition_attnums);
+
+ /* discard the original list */
+ list_free(conditions);
+ conditions = tmp;
+ }
+
+ /* get a union of attnums (from conditions and new clauses) */
+ all_attnums = bms_union(compatible_attnums, condition_attnums);
+
+ /*
+ * Statistics
+ *
+ * Walk through statistics and only keep those covering at least
+ * one new attribute (excluding conditions) and at least two
+ * attributes from clauses and conditions combined.
+ */
+ tmp = filter_stats(stats, compatible_attnums, all_attnums);
+
+ /* if we've not eliminated anything, terminate */
+ if (list_length(stats) == list_length(tmp))
+ break;
+
+ /* work only with filtered statistics from now */
+ list_free(stats);
+ stats = tmp;
+ }
+
+ /* only do the optimization if we have clauses/statistics */
+ if ((list_length(stats) == 0) || (list_length(clauses) == 0))
+ return NIL;
+
+ /* remove redundant stats (stats covered by another stats) */
+ stats = filter_redundant_stats(stats, clauses, conditions);
+
+ /*
+ * TODO We should sort the stats to make the order deterministic,
+ * otherwise we may get different estimates on different
+ * executions - if there are multiple "equally good" solutions,
+ * we'll keep the first solution we see.
+ *
+ * Sorting by OID probably is not the right solution though,
+ * because we'd like it to be somehow reproducible,
+ * irrespective of the order of ADD STATISTICS commands.
+ * So maybe statkeys?
+ */
+ mvstats = make_stats_array(stats, &nmvstats);
+ stats_attnums = make_stats_attnums(mvstats, nmvstats);
+
+ /* collect clauses and bitmaps of attnums */
+ clauses_array = make_clauses_array(clauses, &nclauses);
+ clauses_attnums = make_clauses_attnums(root, varRelid, sjinfo, type,
+ clauses_array, nclauses);
+
+ /* collect conditions and bitmap of attnums */
+ conditions_array = make_clauses_array(conditions, &nconditions);
+ conditions_attnums = make_clauses_attnums(root, varRelid, sjinfo, type,
+ conditions_array, nconditions);
+
+ /*
+ * Build bitmaps with info about which clauses/conditions are
+ * covered by each statistics (so that we don't need to call the
+ * bms_is_subset over and over again).
+ */
+ clause_cover_map = make_cover_map(stats_attnums, nmvstats,
+ clauses_attnums, nclauses);
+
+ condition_cover_map = make_cover_map(stats_attnums, nmvstats,
+ conditions_attnums, nconditions);
+
+ ruled_out = (int*)palloc0(nmvstats * sizeof(int));
+
+ /* no stats are ruled out by default */
+ for (i = 0; i < nmvstats; i++)
+ ruled_out[i] = -1;
+
+ /* do the optimization itself */
+ if (mvstat_search_type == MVSTAT_SEARCH_EXHAUSTIVE)
+ choose_mv_statistics_exhaustive(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+ else
+ choose_mv_statistics_greedy(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+
+ /* create a list of statistics from the array */
+ if (best != NULL)
+ {
+ for (i = 0; i < best->nstats; i++)
+ {
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[best->stats[i]], sizeof(MVStatisticInfo));
+ result = lappend(result, info);
+ }
+ pfree(best);
+ }
+
+ /* cleanup (maybe leave it up to the memory context?) */
+ for (i = 0; i < nmvstats; i++)
+ bms_free(stats_attnums[i]);
+
+ for (i = 0; i < nclauses; i++)
+ bms_free(clauses_attnums[i]);
+
+ for (i = 0; i < nconditions; i++)
+ bms_free(conditions_attnums[i]);
+
+ pfree(stats_attnums);
+ pfree(clauses_attnums);
+ pfree(conditions_attnums);
+
+ pfree(clauses_array);
+ pfree(conditions_array);
+ pfree(clause_cover_map);
+ pfree(condition_cover_map);
+ pfree(ruled_out);
+ pfree(mvstats);
+
+ list_free(clauses);
+ list_free(conditions);
+ list_free(stats);
+
+ return result;
+}
+
+
+/*
+ * This splits the clauses list into two parts - one containing clauses
+ * that will be evaluated using the chosen statistics, and the remaining
+ * clauses (either non-mvcompatible, or not related to the histogram).
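+ *
+ * For example, with statistics on [a,b] and clauses
+ * [(a=1),(b<2),(c=3)], the first two clauses are put into
+ * *mvclauses and (c=3) is returned for regular estimation.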
+ */
+static List *
+clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
+ List *clauses, Oid varRelid, List **mvclauses,
+ MVStatisticInfo *mvstats, int types)
+{
+ int i;
+ ListCell *l;
+ List *non_mvclauses = NIL;
+
+ /* FIXME is there a better way to get info on int2vector? */
+ int2vector * attrs = mvstats->stakeys;
+ int numattrs = mvstats->stakeys->dim1;
+
+ Bitmapset *mvattnums = NULL;
+
+ /* build bitmap of attributes covered by the stats, so we can
+ * do bms_is_subset later */
+ for (i = 0; i < numattrs; i++)
+ mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+
+ /* erase the list of mv-compatible clauses */
+ *mvclauses = NIL;
+
+ foreach (l, clauses)
+ {
+ bool match = false; /* by default not mv-compatible */
+ Bitmapset *attnums = NULL;
+ Node *clause = (Node *) lfirst(l);
+
+ if (clause_is_mv_compatible(root, clause, varRelid, NULL,
+ &attnums, sjinfo, types))
+ {
+ /* are all the attributes part of the selected stats? */
+ if (bms_is_subset(attnums, mvattnums))
+ match = true;
+ }
+
+ /*
+ * The clause matches the selected stats, so put it to the list
+ * of mv-compatible clauses. Otherwise, keep it in the list of
+ * 'regular' clauses (that may be selected later).
+ */
+ if (match)
+ *mvclauses = lappend(*mvclauses, clause);
+ else
+ non_mvclauses = lappend(non_mvclauses, clause);
+ }
+
+ /*
+ * Perform regular estimation using the clauses incompatible
+ * with the chosen histogram (or MV stats in general).
+ */
+ return non_mvclauses;
+
+}
+
+/*
+ * Determines whether the clause is compatible with multivariate stats,
+ * and if it is, returns some additional information - varno (index
+ * into simple_rte_array) and a bitmap of attributes. This is then
+ * used to fetch related multivariate statistics.
+ *
+ * At this moment we only support basic conditions of the form
+ *
+ * variable OP constant
+ *
+ * where OP is one of [=,<,<=,>=,>] (which is however determined by
+ * looking at the associated function for estimating selectivity, just
+ * like with the single-dimensional case).
+ *
+ * TODO Support 'OR clauses' - shouldn't be all that difficult to
* evaluate them using multivariate stats.
*/
static bool
@@ -1539,10 +2598,10 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
return true;
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/*
- * AND/OR-clauses are supported if all sub-clauses are supported
+ * AND/OR/NOT-clauses are supported if all sub-clauses are supported
*
* TODO We might support mixed case, where some of the clauses
* are supported and some are not, and treat all supported
@@ -1552,7 +2611,10 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
*
* TODO For RestrictInfo above an OR-clause, we might use the
* orclause with nested RestrictInfo - we won't have to
- * call pull_varnos() for each clause, saving time.
+ * call pull_varnos() for each clause, saving time.
+ *
+ * TODO Perhaps this needs a bit more thought for functional
+ * dependencies? Those don't quite work for NOT cases.
*/
Bitmapset *tmp = NULL;
ListCell *l;
@@ -1572,6 +2634,51 @@ clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
return false;
}
+
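+/*
+ * Collects attribute numbers referenced by a clause - simple
+ * opclauses, IS NULL tests, and AND/OR/NOT clauses (which are
+ * walked recursively).
+ */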
+static Bitmapset *
+clause_mv_get_attnums(PlannerInfo *root, Node *clause)
+{
+ Bitmapset * attnums = NULL;
+
+ /* Extract clause from restrict info, if needed. */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /*
+ * Only simple opclauses and IS NULL tests are compatible with
+ * multivariate stats at this point.
+ */
+ if ((is_opclause(clause))
+ && (list_length(((OpExpr *) clause)->args) == 2))
+ {
+ OpExpr *expr = (OpExpr *) clause;
+
+ if (IsA(linitial(expr->args), Var))
+ attnums = bms_add_member(attnums,
+ ((Var*)linitial(expr->args))->varattno);
+ else
+ attnums = bms_add_member(attnums,
+ ((Var*)lsecond(expr->args))->varattno);
+ }
+ else if (IsA(clause, NullTest)
+ && IsA(((NullTest*)clause)->arg, Var))
+ {
+ attnums = bms_add_member(attnums,
+ ((Var*)((NullTest*)clause)->arg)->varattno);
+ }
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
+ {
+ ListCell *l;
+ foreach (l, ((BoolExpr*)clause)->args)
+ {
+ attnums = bms_join(attnums,
+ clause_mv_get_attnums(root, (Node*)lfirst(l)));
+ }
+ }
+
+ return attnums;
+}
+
/*
* Performs reduction of clauses using functional dependencies, i.e.
* removes clauses that are considered redundant. It simply walks
@@ -2223,22 +3330,26 @@ get_varattnos(Node * node, Index relid)
* as the clauses are processed (and skip items that are 'match').
*/
static Selectivity
-clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats, bool *fullmatch,
- Selectivity *lowsel)
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or,
+ bool *fullmatch, Selectivity *lowsel)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
MCVList mcvlist = NULL;
+
int nmatches = 0;
+ int nconditions = 0;
/* match/mismatch bitmap for each MCV item */
char * matches = NULL;
+ char * condition_matches = NULL;
Assert(clauses != NIL);
- Assert(list_length(clauses) >= 2);
+ Assert(list_length(clauses) >= 1);
/* there's no MCV list built yet */
if (! mvstats->mcv_built)
@@ -2249,32 +3360,85 @@ clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
Assert(mcvlist != NULL);
Assert(mcvlist->nitems > 0);
- /* by default all the MCV items match the clauses fully */
- matches = palloc0(sizeof(char) * mcvlist->nitems);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
-
/* number of matching MCV items */
nmatches = mcvlist->nitems;
+ nconditions = mcvlist->nitems;
+ /*
+ * Bitmap of MCV item matches (mismatch, partial, full).
+ *
+ * For AND clauses all items match initially (and we'll eliminate them).
+ * For OR clauses no items match initially (and we'll add them).
+ *
+ * We only need to do the memset for AND clauses (for OR clauses
+ * it's already set correctly by the palloc0).
+ */
+ matches = palloc0(sizeof(char) * nmatches);
+
+ if (! is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+
+ /* Conditions are treated as AND clause, so match by default. */
+ condition_matches = palloc0(sizeof(char) * nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+
+ /*
+ * build the match bitmap for the conditions (conditions are always
+ * connected by AND)
+ */
+ if (conditions != NIL)
+ nconditions = update_match_bitmap_mcvlist(root, conditions,
+ mvstats->stakeys, mcvlist,
+ nconditions, condition_matches,
+ lowsel, fullmatch, false);
+
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all MCV items, even those
+ * ruled out by the conditions. The final result should be the
+ * same, but it might be faster.
+ */
nmatches = update_match_bitmap_mcvlist(root, clauses,
mvstats->stakeys, mcvlist,
- nmatches, matches,
- lowsel, fullmatch, false);
+ ((is_or) ? 0 : nmatches), matches,
+ lowsel, fullmatch, is_or);
/* sum frequencies for all the matching MCV items */
for (i = 0; i < mcvlist->nitems; i++)
{
- /* used to 'scale' for MCV lists not covering all tuples */
+ /*
+ * Find out what part of the data is covered by the MCV list,
+ * so that we can 'scale' the selectivity properly (e.g. when
+ * only 50% of the sample items got into the MCV, and the rest
+ * is either in a histogram, or not covered by stats).
+ *
+ * TODO This might be handled by keeping a global "frequency"
+ * for the whole list, which might save us a bit of time
+ * spent on accessing the not-matching part of the MCV list.
+ * Although it's likely in a cache, so it's very fast.
+ */
u += mcvlist->items[i]->frequency;
+ /* skip MCV items not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
if (matches[i] != MVSTATS_MATCH_NONE)
s += mcvlist->items[i]->frequency;
+
+ t += mcvlist->items[i]->frequency;
}
pfree(matches);
+ pfree(condition_matches);
pfree(mcvlist);
- return s*u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
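+ /*
+ * Example: if the MCV list covers 60% of the data (u = 0.6),
+ * items matching the conditions amount to t = 0.3 and items
+ * matching both the conditions and the clauses to s = 0.15,
+ * the result is (0.15 / 0.3) * 0.6 = 0.3.
+ */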
+ return (s / t) * u;
}
/*
@@ -2567,64 +3731,57 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
}
}
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/* AND/OR clause, with all clauses compatible with the selected MV stat */
int i;
- BoolExpr *orclause = ((BoolExpr*)clause);
- List *orclauses = orclause->args;
+ List *tmp_clauses = ((BoolExpr*)clause)->args;
/* match/mismatch bitmap for each MCV item */
- int or_nmatches = 0;
- char * or_matches = NULL;
+ int tmp_nmatches = 0;
+ char * tmp_matches = NULL;
- Assert(orclauses != NIL);
- Assert(list_length(orclauses) >= 2);
+ Assert(tmp_clauses != NIL);
+ Assert((list_length(tmp_clauses) >= 2) || (not_clause(clause) && (list_length(tmp_clauses)==1)));
/* number of matching MCV items */
- or_nmatches = mcvlist->nitems;
+ tmp_nmatches = (or_clause(clause)) ? 0 : mcvlist->nitems;
/* by default none of the MCV items matches the clauses */
- or_matches = palloc0(sizeof(char) * or_nmatches);
+ tmp_matches = palloc0(sizeof(char) * mcvlist->nitems);
- if (or_clause(clause))
- {
- /* OR clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
- or_nmatches = 0;
- }
- else
- {
- /* AND clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
- }
+ /* AND (and NOT) clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
/* build the match bitmap for the OR-clauses */
- or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ tmp_nmatches = update_match_bitmap_mcvlist(root, tmp_clauses,
stakeys, mcvlist,
- or_nmatches, or_matches,
+ tmp_nmatches, tmp_matches,
lowsel, fullmatch, or_clause(clause));
/* merge the bitmap into the existing one*/
for (i = 0; i < mcvlist->nitems; i++)
{
+ /* if this is a NOT clause, we need to invert the results first */
+ if (not_clause(clause))
+ tmp_matches[i] = (MVSTATS_MATCH_FULL - tmp_matches[i]);
+
/*
* To AND-merge the bitmaps, a MIN() semantics is used.
* For OR-merge, use MAX().
*
* FIXME this does not decrease the number of matches
*/
- UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
}
- pfree(or_matches);
+ pfree(tmp_matches);
}
else
- {
elog(ERROR, "unknown clause type: %d", clause->type);
- }
}
/*
@@ -2682,15 +3839,18 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
* this is not uncommon, but for histograms it's not that clear.
*/
static Selectivity
-clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats)
+clauselist_mv_selectivity_histogram(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
int nmatches = 0;
+ int nconditions = 0;
char *matches = NULL;
+ char *condition_matches = NULL;
MVSerializedHistogram mvhist = NULL;
@@ -2701,27 +3861,57 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
/* There may be no histogram in the stats (check hist_built flag) */
mvhist = load_mv_histogram(mvstats->mvoid);
- Assert (mvhist != NULL);
- Assert (clauses != NIL);
- Assert (list_length(clauses) >= 2);
+ Assert (mvhist != NULL);
+ Assert (clauses != NIL);
+ Assert (list_length(clauses) >= 1);
+
+ nmatches = mvhist->nbuckets;
+ nconditions = mvhist->nbuckets;
+
+ /*
+ * Bitmap of bucket matches (mismatch, partial, full).
+ *
+ * For AND clauses all buckets match (and we'll eliminate them).
+ * For OR clauses no buckets match (and we'll add them).
+ *
+ * We only need to do the memset for AND clauses (for OR clauses
+ * it's already set correctly by the palloc0).
+ */
+ matches = palloc0(sizeof(char) * nmatches);
+
+ if (! is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+
+ /* Conditions are treated as AND clause, so match by default. */
+ condition_matches = palloc0(sizeof(char)*nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
/*
- * Bitmap of bucket matches (mismatch, partial, full). by default
- * all buckets fully match (and we'll eliminate them).
+ * build the match bitmap for the conditions (conditions are always
+ * connected by AND)
*/
- matches = palloc0(sizeof(char) * mvhist->nbuckets);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
-
- nmatches = mvhist->nbuckets;
+ if (conditions != NIL)
+ update_match_bitmap_histogram(root, conditions,
+ mvstats->stakeys, mvhist,
+ nconditions, condition_matches, false);
- /* build the match bitmap */
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all buckets, even those
+ * ruled out by the conditions. The final result should be
+ * the same, but it might be faster.
+ */
update_match_bitmap_histogram(root, clauses,
mvstats->stakeys, mvhist,
- nmatches, matches, false);
+ ((is_or) ? 0 : nmatches), matches,
+ is_or);
/* now, walk through the buckets and sum the selectivities */
for (i = 0; i < mvhist->nbuckets; i++)
{
+ float coeff = 1.0;
+
/*
* Find out what part of the data is covered by the histogram,
* so that we can 'scale' the selectivity properly (e.g. when
@@ -2735,17 +3925,35 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
*/
u += mvhist->buckets[i]->ntuples;
+ /* skip buckets not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+ else if (condition_matches[i] == MVSTATS_MATCH_PARTIAL)
+ coeff = 0.5;
+
+ t += coeff * mvhist->buckets[i]->ntuples;
+
if (matches[i] == MVSTATS_MATCH_FULL)
- s += mvhist->buckets[i]->ntuples;
+ s += coeff * mvhist->buckets[i]->ntuples;
else if (matches[i] == MVSTATS_MATCH_PARTIAL)
- s += 0.5 * mvhist->buckets[i]->ntuples;
+ /*
+ * TODO If both conditions and clauses match partially, this
+ * will use a 0.25 match - not sure if that's the right
+ * solution, but it seems about right.
+ */
+ s += coeff * 0.5 * mvhist->buckets[i]->ntuples;
}
/* release the allocated bitmap and deserialized histogram */
pfree(matches);
+ pfree(condition_matches);
pfree(mvhist);
- return s * u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
+ return (s / t) * u;
}
/*
@@ -2775,7 +3983,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
{
int i;
ListCell * l;
-
+
/*
* Used for caching function calls, only once per deduplicated value.
*
@@ -2818,7 +4026,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
FmgrInfo opproc; /* operator */
fmgr_info(get_opcode(expr->opno), &opproc);
-
+
/* reset the cache (per clause) */
memset(callcache, 0, mvhist->nbuckets);
@@ -2870,7 +4078,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
/* histogram boundaries */
Datum minval, maxval;
-
+
/* values from the call cache */
char mincached, maxcached;
@@ -2959,7 +4167,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
}
/*
- * Now check whether the upper boundary is below the constant (in that
+ * Now check whether the constant is below the upper boundary (in that
* case it's a partial match).
*/
if (! maxcached)
@@ -2978,8 +4186,32 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
else
tmp = !(maxcached & 0x02); /* extract the result (reverse) */
- if (tmp) /* partial match */
+ if (tmp)
+ {
+ /* partial match */
UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ continue;
+ }
+
+
+ /*
+ * And finally check whether the constant is above the upper
+ * boundary (in that case it's a full match).
+ *
+ * XXX We need to do this because of the OR clauses (which start with no
+ * matches and we incrementally add more and more matches), but maybe
+ * we don't need to do the check and can just do UPDATE_RESULT?
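+ *
+ * For example, with (a < 42) and a bucket spanning [10, 30], the
+ * upper boundary satisfies the operator, so the whole bucket is
+ * a full match.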
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ maxval,
+ cst->constvalue));
+
+ if (tmp)
+ {
+ /* full match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_FULL, is_or);
+ }
}
else /* (const < var) */
@@ -3018,15 +4250,36 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
DEFAULT_COLLATION_OID,
minval,
cst->constvalue));
-
/* Update the cache. */
callcache[bucket->min[idx]] = (tmp) ? 0x03 : 0x01;
- }
+ }
else
tmp = (mincached & 0x02); /* extract the result */
- if (tmp) /* partial match */
+ if (tmp)
+ {
+ /* partial match */
UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the constant is below the lower boundary (in
+ * that case it's a full match).
+ *
+ * XXX We need to do this because of the OR clauses (which start with no
+ * matches and we incrementally add more and more matches), but maybe
+ * we don't need to do the check and can just do UPDATE_RESULT?
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ minval));
+
+ if (tmp)
+ /* full match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_FULL, is_or);
+
}
break;
@@ -3082,8 +4335,29 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
tmp = !(mincached & 0x02); /* extract the result */
if (tmp)
+ {
/* partial match */
UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the lower boundary is above the constant (in
+ * that case it's a full match).
+ *
+ * XXX We need to do this because of the OR clauses (which start with no
+ * matches and we incrementally add more and more matches), but maybe
+ * we don't need to do the check and can just do UPDATE_RESULT?
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ minval,
+ cst->constvalue));
+
+ if (tmp)
+ /* full match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_FULL, is_or);
+
}
else /* (const > var) */
{
@@ -3129,8 +4403,30 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
tmp = (maxcached & 0x02); /* extract the result */
if (tmp)
+ {
/* partial match */
UPDATE_RESULT(matches[i], MVSTATS_MATCH_PARTIAL, is_or);
+ continue;
+ }
+
+ /*
+ * Now check whether the constant is above the upper boundary (in
+ * that case it's a full match).
+ *
+ * XXX We need to do this because of the OR clauses (which start with no
+ * matches and we incrementally add more and more matches), but maybe
+ * we don't need to do the check and can just do UPDATE_RESULT?
+ */
+ tmp = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ maxval));
+
+ if (tmp)
+ {
+ /* full match */
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_FULL, is_or);
+ }
+
}
break;
@@ -3195,6 +4491,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
else
tmp = (maxcached & 0x02); /* extract the result */
+
if (tmp)
{
/* no match */
@@ -3246,64 +4543,57 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
}
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/* AND/OR clause, with all clauses compatible with the selected MV stat */
int i;
- BoolExpr *orclause = ((BoolExpr*)clause);
- List *orclauses = orclause->args;
+ List *tmp_clauses = ((BoolExpr*)clause)->args;
/* match/mismatch bitmap for each bucket */
- int or_nmatches = 0;
- char * or_matches = NULL;
+ int tmp_nmatches = 0;
+ char * tmp_matches = NULL;
- Assert(orclauses != NIL);
- Assert(list_length(orclauses) >= 2);
+ Assert(tmp_clauses != NIL);
+ Assert((list_length(tmp_clauses) >= 2) || (not_clause(clause) && (list_length(tmp_clauses)==1)));
/* number of matching buckets */
- or_nmatches = mvhist->nbuckets;
+ tmp_nmatches = (or_clause(clause)) ? 0 : mvhist->nbuckets;
- /* by default none of the buckets matches the clauses */
- or_matches = palloc0(sizeof(char) * or_nmatches);
+ /* by default none of the buckets matches the clauses (OR clause) */
+ tmp_matches = palloc0(sizeof(char) * mvhist->nbuckets);
- if (or_clause(clause))
- {
- /* OR clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
- or_nmatches = 0;
- }
- else
- {
- /* AND clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
- }
+ /* but AND (and NOT) clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
/* build the match bitmap for the OR-clauses */
- or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ tmp_nmatches = update_match_bitmap_histogram(root, tmp_clauses,
stakeys, mvhist,
- or_nmatches, or_matches, or_clause(clause));
+ tmp_nmatches, tmp_matches, or_clause(clause));
/* merge the bitmap into the existing one*/
for (i = 0; i < mvhist->nbuckets; i++)
{
+ /* if this is a NOT clause, we need to invert the results first */
+ if (not_clause(clause))
+ tmp_matches[i] = (MVSTATS_MATCH_FULL - tmp_matches[i]);
+
/*
* To AND-merge the bitmaps, a MIN() semantics is used.
* For OR-merge, use MAX().
*
* FIXME this does not decrease the number of matches
*/
- UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
}
- pfree(or_matches);
-
+ pfree(tmp_matches);
}
else
elog(ERROR, "unknown clause type: %d", clause->type);
}
- /* free the call cache */
pfree(callcache);
#ifdef DEBUG_MVHIST
@@ -3312,3 +4602,363 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
return nmatches;
}
+
+/*
+ * Walk through clauses and keep only those covered by at least
+ * one of the statistics.
+ */
+static List *
+filter_clauses(PlannerInfo *root, Oid varRelid, SpecialJoinInfo *sjinfo,
+ int type, List *stats, List *clauses, Bitmapset **attnums)
+{
+ ListCell *c;
+ ListCell *s;
+
+ /* results (list of compatible clauses, attnums) */
+ List *rclauses = NIL;
+
+ foreach (c, clauses)
+ {
+ Node *clause = (Node*)lfirst(c);
+ Bitmapset *clause_attnums = NULL;
+ Index relid;
+
+ /*
+ * The clause has to be mv-compatible (suitable operators etc.).
+ */
+ if (! clause_is_mv_compatible(root, clause, varRelid,
+ &relid, &clause_attnums, sjinfo, type))
+ elog(ERROR, "should not get non-mv-compatible cluase");
+
+ /* is there a statistics covering this clause? */
+ foreach (s, stats)
+ {
+ int k, matches = 0;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ for (k = 0; k < stat->stakeys->dim1; k++)
+ {
+ if (bms_is_member(stat->stakeys->values[k],
+ clause_attnums))
+ matches += 1;
+ }
+
+ /*
+ * The clause is compatible if all attributes it references
+ * are covered by the statistics.
+ */
+ if (bms_num_members(clause_attnums) == matches)
+ {
+ *attnums = bms_union(*attnums, clause_attnums);
+ rclauses = lappend(rclauses, clause);
+ break;
+ }
+ }
+
+ bms_free(clause_attnums);
+ }
+
+ /* we can't have more compatible conditions than source conditions */
+ Assert(list_length(clauses) >= list_length(rclauses));
+
+ return rclauses;
+}
+
+
+/*
+ * Walk through statistics and only keep those covering at least
+ * one new attribute (excluding conditions) and at least two
+ * attributes from clauses and conditions combined.
+ *
+ * This check might be made more strict by checking against individual
+ * clauses, because by using the bitmapsets of all attnums we may
+ * actually use attnums from clauses that are not covered by the
+ * statistics. For example, we may have a condition
+ *
+ * (a=1 AND b=2)
+ *
+ * and a new clause
+ *
+ * (c=1 AND d=1)
+ *
+ * With only bitmapsets, statistics on [b,c] will pass through this
+ * (assuming there are some statistics covering both clauses).
+ *
+ * TODO Do the more strict check.
+ */
+static List *
+filter_stats(List *stats, Bitmapset *new_attnums, Bitmapset *all_attnums)
+{
+ ListCell *s;
+ List *stats_filtered = NIL;
+
+ foreach (s, stats)
+ {
+ int k;
+ int matches_new = 0,
+ matches_all = 0;
+
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ /* see how many attributes the statistics covers */
+ for (k = 0; k < stat->stakeys->dim1; k++)
+ {
+ /* attributes from new clauses */
+ if (bms_is_member(stat->stakeys->values[k], new_attnums))
+ matches_new += 1;
+
+ /* attributes from conditions and clauses */
+ if (bms_is_member(stat->stakeys->values[k], all_attnums))
+ matches_all += 1;
+ }
+
+ /* check we have enough attributes for this statistics */
+ if ((matches_new >= 1) && (matches_all >= 2))
+ stats_filtered = lappend(stats_filtered, stat);
+ }
+
+ /* we can't have more useful stats than we had originally */
+ Assert(list_length(stats) >= list_length(stats_filtered));
+
+ return stats_filtered;
+}
+
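+/*
+ * Flattens a list of MVStatisticInfo into an array, so that the
+ * optimization can address the statistics by index.
+ */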
+static MVStatisticInfo *
+make_stats_array(List *stats, int *nmvstats)
+{
+ int i;
+ ListCell *l;
+
+ MVStatisticInfo *mvstats = NULL;
+ *nmvstats = list_length(stats);
+
+ mvstats
+ = (MVStatisticInfo*)palloc0((*nmvstats) * sizeof(MVStatisticInfo));
+
+ i = 0;
+ foreach (l, stats)
+ {
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(l);
+ memcpy(&mvstats[i++], stat, sizeof(MVStatisticInfo));
+ }
+
+ return mvstats;
+}
+
+static Bitmapset **
+make_stats_attnums(MVStatisticInfo *mvstats, int nmvstats)
+{
+ int i, j;
+ Bitmapset **stats_attnums = NULL;
+
+ Assert(nmvstats > 0);
+
+ /* build bitmaps of attnums for the stats (easier to compare) */
+ stats_attnums = (Bitmapset **)palloc0(nmvstats * sizeof(Bitmapset*));
+
+ for (i = 0; i < nmvstats; i++)
+ for (j = 0; j < mvstats[i].stakeys->dim1; j++)
+ stats_attnums[i]
+ = bms_add_member(stats_attnums[i],
+ mvstats[i].stakeys->values[j]);
+
+ return stats_attnums;
+}
+
+
+/*
+ * Now let's remove redundant statistics, covering the same columns
+ * as some other stats, when restricted to the attributes from
+ * remaining clauses.
+ *
+ * If statistics S1 covers S2 (covers S2 attributes and possibly
+ * some more), we can probably remove S2. What actually matters are
+ * attributes from covered clauses (not all the attributes). This
+ * might however prefer larger, and thus less accurate, statistics.
+ *
+ * When a redundancy is detected, we simply keep the smaller
+ * statistics (fewer columns), on the assumption that it's
+ * more accurate and faster to process. That might be incorrect for
+ * two reasons - first, the accuracy really depends on number of
+ * buckets/MCV items, not the number of columns. Second, we might
+ * prefer MCV lists over histograms or something like that.
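+ *
+ * For example, with clauses referencing only columns (a,b),
+ * statistics on [a,b] and on [a,b,c] restrict to the same set
+ * {a,b}, so the wider [a,b,c] is dropped as redundant.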
+ */
+static List*
+filter_redundant_stats(List *stats, List *clauses, List *conditions)
+{
+ int i, j, nmvstats;
+
+ MVStatisticInfo *mvstats;
+ bool *redundant;
+ Bitmapset **stats_attnums;
+ Bitmapset *varattnos;
+ Index relid;
+
+ Assert(list_length(stats) > 0);
+ Assert(list_length(clauses) > 0);
+
+ /*
+ * We'll convert the list of statistics into an array now, because
+ * the reduction of redundant statistics is easier to do that way
+ * (we can mark previous stats as redundant, etc.).
+ */
+ mvstats = make_stats_array(stats, &nmvstats);
+ stats_attnums = make_stats_attnums(mvstats, nmvstats);
+
+ /* by default, none of the stats is redundant (so palloc0) */
+ redundant = palloc0(nmvstats * sizeof(bool));
+
+ /*
+ * We only expect a single relid here, and also we should get the
+ * same relid from clauses and conditions (but we get it from
+ * clauses, because those are certainly non-empty).
+ */
+ relid = bms_singleton_member(pull_varnos((Node*)clauses));
+
+ /*
+ * Get the varattnos from both conditions and clauses.
+ *
+ * This skips system attributes, although that should be impossible
+ * thanks to previous filtering out of incompatible clauses.
+ *
+ * XXX Is that really true?
+ */
+ varattnos = bms_union(get_varattnos((Node*)clauses, relid),
+ get_varattnos((Node*)conditions, relid));
+
+ for (i = 1; i < nmvstats; i++)
+ {
+ /* intersect with current statistics */
+ Bitmapset *curr = bms_intersect(stats_attnums[i], varattnos);
+
+ /* walk through 'previous' stats and check redundancy */
+ for (j = 0; j < i; j++)
+ {
+ /* intersect with current statistics */
+ Bitmapset *prev;
+
+ /* skip stats already identified as redundant */
+ if (redundant[j])
+ continue;
+
+ prev = bms_intersect(stats_attnums[j], varattnos);
+
+ switch (bms_subset_compare(curr, prev))
+ {
+ case BMS_EQUAL:
+ /*
+ * Use the smaller one (hopefully more accurate).
+ * If both have the same size, use the first one.
+ */
+ if (mvstats[i].stakeys->dim1 >= mvstats[j].stakeys->dim1)
+ redundant[i] = TRUE;
+ else
+ redundant[j] = TRUE;
+
+ break;
+
+ case BMS_SUBSET1: /* curr is subset of prev */
+ redundant[i] = TRUE;
+ break;
+
+ case BMS_SUBSET2: /* prev is subset of curr */
+ redundant[j] = TRUE;
+ break;
+
+ case BMS_DIFFERENT:
+ /* do nothing - keep both stats */
+ break;
+ }
+
+ bms_free(prev);
+ }
+
+ bms_free(curr);
+ }
+
+ /* can't reduce all statistics (at least one has to remain) */
+ Assert(nmvstats > 0);
+
+ /* now, let's remove the reduced statistics from the arrays */
+ list_free(stats);
+ stats = NIL;
+
+ for (i = 0; i < nmvstats; i++)
+ {
+ MVStatisticInfo *info;
+
+ pfree(stats_attnums[i]);
+
+ if (redundant[i])
+ continue;
+
+ info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[i], sizeof(MVStatisticInfo));
+
+ stats = lappend(stats, info);
+ }
+
+ pfree(mvstats);
+ pfree(stats_attnums);
+ pfree(redundant);
+
+ return stats;
+}
+
+static Node**
+make_clauses_array(List *clauses, int *nclauses)
+{
+ int i;
+ ListCell *l;
+
+ Node** clauses_array;
+
+ *nclauses = list_length(clauses);
+ clauses_array = (Node **)palloc0((*nclauses) * sizeof(Node *));
+
+ i = 0;
+ foreach (l, clauses)
+ clauses_array[i++] = (Node *)lfirst(l);
+
+ *nclauses = i;
+
+ return clauses_array;
+}
+
+static Bitmapset **
+make_clauses_attnums(PlannerInfo *root, Oid varRelid, SpecialJoinInfo *sjinfo,
+ int type, Node **clauses, int nclauses)
+{
+ int i;
+ Index relid;
+ Bitmapset **clauses_attnums
+ = (Bitmapset **)palloc0(nclauses * sizeof(Bitmapset *));
+
+ for (i = 0; i < nclauses; i++)
+ {
+ Bitmapset * attnums = NULL;
+
+ if (! clause_is_mv_compatible(root, clauses[i], varRelid,
+ &relid, &attnums, sjinfo, type))
+ elog(ERROR, "should not get non-mv-compatible cluase");
+
+ clauses_attnums[i] = attnums;
+ }
+
+ return clauses_attnums;
+}
+
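+/*
+ * Builds a flattened [nmvstats x nclauses] boolean matrix, where
+ * cover_map[i * nclauses + j] says whether clause j references
+ * only attributes covered by statistics i.
+ */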
+static bool*
+make_cover_map(Bitmapset **stats_attnums, int nmvstats,
+ Bitmapset **clauses_attnums, int nclauses)
+{
+ int i, j;
+ bool *cover_map = (bool*)palloc0(nclauses * nmvstats * sizeof(bool));
+
+ for (i = 0; i < nmvstats; i++)
+ for (j = 0; j < nclauses; j++)
+ cover_map[i * nclauses + j]
+ = bms_is_subset(clauses_attnums[j], stats_attnums[i]);
+
+ return cover_map;
+}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 459368e..d5bb819 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3431,7 +3431,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/*
* Also get the normal inner-join selectivity of the join clauses.
@@ -3454,7 +3455,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
JOIN_INNER,
- &norm_sjinfo);
+ &norm_sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
if (jointype == JOIN_ANTI)
@@ -3621,7 +3623,7 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
Node *qual = (Node *) lfirst(l);
/* Note that clause_selectivity will be able to cache its result */
- selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo);
+ selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo, NIL);
}
/* Apply it to the input relation sizes */
@@ -3657,7 +3659,8 @@ set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
rel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
rel->rows = clamp_row_est(nrows);
@@ -3694,7 +3697,8 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
allclauses,
rel->relid, /* do not use 0! */
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
/* For safety, make sure result is not more than the base estimate */
if (nrows > rel->rows)
@@ -3832,12 +3836,14 @@ calc_joinrel_size_estimate(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = clauselist_selectivity(root,
pushedquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
list_free(joinquals);
@@ -3849,7 +3855,8 @@ calc_joinrel_size_estimate(PlannerInfo *root,
restrictlist,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = 0.0; /* not used, keep compiler quiet */
}
diff --git a/src/backend/optimizer/util/orclauses.c b/src/backend/optimizer/util/orclauses.c
index ea831f5..6299e75 100644
--- a/src/backend/optimizer/util/orclauses.c
+++ b/src/backend/optimizer/util/orclauses.c
@@ -280,7 +280,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
* saving work later.)
*/
or_selec = clause_selectivity(root, (Node *) or_rinfo,
- 0, JOIN_INNER, NULL);
+ 0, JOIN_INNER, NULL, NIL);
/*
* The clause is only worth adding to the query if it rejects a useful
@@ -342,7 +342,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
/* Compute inner-join size */
orig_selec = clause_selectivity(root, (Node *) join_or_rinfo,
- 0, JOIN_INNER, &sjinfo);
+ 0, JOIN_INNER, &sjinfo, NIL);
/* And hack cached selectivity so join size remains the same */
join_or_rinfo->norm_selec = orig_selec / or_selec;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index ebb03aa..3c58d42 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1625,13 +1625,15 @@ booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg,
case IS_NOT_FALSE:
selec = (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
case IS_FALSE:
case IS_NOT_TRUE:
selec = 1.0 - (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
default:
elog(ERROR, "unrecognized booltesttype: %d",
@@ -6257,7 +6259,8 @@ genericcostestimate(PlannerInfo *root,
indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/*
* If caller didn't give us an estimate, estimate the number of index
@@ -6582,7 +6585,8 @@ btcostestimate(PG_FUNCTION_ARGS)
btreeSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
numIndexTuples = btreeSelectivity * index->rel->tuples;
/*
@@ -7361,7 +7365,8 @@ gincostestimate(PG_FUNCTION_ARGS)
*indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/* fetch estimated page cost for tablespace containing index */
get_tablespace_page_costs(index->reltablespace,
@@ -7598,7 +7603,7 @@ brincostestimate(PG_FUNCTION_ARGS)
*indexSelectivity =
clauselist_selectivity(root, indexQuals,
path->indexinfo->rel->relid,
- JOIN_INNER, NULL);
+ JOIN_INNER, NULL, NIL);
*indexCorrelation = 1;
/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 38ba82f..861601f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -75,6 +75,7 @@
#include "utils/bytea.h"
#include "utils/guc_tables.h"
#include "utils/memutils.h"
+#include "utils/mvstats.h"
#include "utils/pg_locale.h"
#include "utils/plancache.h"
#include "utils/portal.h"
@@ -380,6 +381,15 @@ static const struct config_enum_entry huge_pages_options[] = {
};
/*
+ * Search algorithm for multivariate stats.
+ */
+static const struct config_enum_entry mvstat_search_options[] = {
+ {"greedy", MVSTAT_SEARCH_GREEDY, false},
+ {"exhaustive", MVSTAT_SEARCH_EXHAUSTIVE, false},
+ {NULL, 0, false}
+};
+
+/*
* Options for enum values stored in other modules
*/
extern const struct config_enum_entry wal_level_options[];
@@ -3672,6 +3682,16 @@ static struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"mvstat_search", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Sets the algorithm used for combining multivariate stats."),
+ NULL
+ },
+ &mvstat_search_type,
+ MVSTAT_SEARCH_GREEDY, mvstat_search_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 9999ca3..c1f7787 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -191,11 +191,13 @@ extern Selectivity clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
extern Selectivity clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
#endif /* COST_H */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index f05a517..35b2f8e 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -17,6 +17,14 @@
#include "fmgr.h"
#include "commands/vacuum.h"
+typedef enum MVStatSearchType
+{
+ MVSTAT_SEARCH_EXHAUSTIVE, /* exhaustive search */
+ MVSTAT_SEARCH_GREEDY /* greedy search */
+} MVStatSearchType;
+
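+/* selected via the mvstat_search GUC - "greedy" (the default) or "exhaustive" */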
+extern int mvstat_search_type;
+
/*
* Degree of how much MCV item / histogram bucket matches a clause.
* This is then considered when computing the selectivity.
--
2.1.0
Attachment: 0007-initial-version-of-ndistinct-conefficient-statistics.patch (application/x-patch)
From f750038f659d74ff88be575b1c5c92ad0f745f1d Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Wed, 23 Dec 2015 02:07:58 +0100
Subject: [PATCH 7/7] initial version of ndistinct coefficient statistics
---
doc/src/sgml/ref/create_statistics.sgml | 9 ++
src/backend/commands/statscmds.c | 11 ++-
src/backend/optimizer/path/clausesel.c | 7 ++
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/common.c | 20 ++++-
src/backend/utils/mvstats/mvdist.c | 147 ++++++++++++++++++++++++++++++++
src/include/catalog/pg_mv_statistic.h | 26 +++---
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 6 ++
10 files changed, 217 insertions(+), 17 deletions(-)
create mode 100644 src/backend/utils/mvstats/mvdist.c
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index fd3382e..80360a6 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -168,6 +168,15 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>ndistinct</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables ndistinct coefficients for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect2>
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index b974655..6ea0e13 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -138,7 +138,8 @@ CreateStatistics(CreateStatsStmt *stmt)
/* by default build nothing */
bool build_dependencies = false,
build_mcv = false,
- build_histogram = false;
+ build_histogram = false,
+ build_ndistinct = false;
int32 max_buckets = -1,
max_mcv_items = -1;
@@ -221,6 +222,8 @@ CreateStatistics(CreateStatsStmt *stmt)
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "ndistinct") == 0)
+ build_ndistinct = defGetBoolean(opt);
else if (strcmp(opt->defname, "mcv") == 0)
build_mcv = defGetBoolean(opt);
else if (strcmp(opt->defname, "max_mcv_items") == 0)
@@ -275,10 +278,10 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv || build_histogram))
+ if (! (build_dependencies || build_mcv || build_histogram || build_ndistinct))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram, ndistinct) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -311,6 +314,7 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
+ values[Anum_pg_mv_statistic_ndist_enabled-1] = BoolGetDatum(build_ndistinct);
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
@@ -318,6 +322,7 @@ CreateStatistics(CreateStatsStmt *stmt)
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
nulls[Anum_pg_mv_statistic_stahist -1] = true;
+ nulls[Anum_pg_mv_statistic_standist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 3d4d136..720ff87 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -59,6 +59,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
#define MV_CLAUSE_TYPE_HIST 0x04
+#define MV_CLAUSE_TYPE_NDIST 0x08
static bool clause_is_mv_compatible(PlannerInfo *root, Node *clause, Oid varRelid,
Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
@@ -377,6 +378,9 @@ clauselist_selectivity(PlannerInfo *root,
stats, sjinfo);
}
+ if (has_stats(stats, MV_CLAUSE_TYPE_NDIST))
+ elog(WARNING, "has ndistinct coefficient stats");
+
/*
* Check that there are statistics with MCV list or histogram.
* If not, we don't need to waste time with the optimization.
@@ -2931,6 +2935,9 @@ has_stats(List *stats, int type)
if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
return true;
+
+ if ((type & MV_CLAUSE_TYPE_NDIST) && stat->ndist_built)
+ return true;
}
return false;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 963d26e..a319246 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -410,7 +410,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built)
+ if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built || mvstat->ndist_built)
{
info = makeNode(MVStatisticInfo);
@@ -421,11 +421,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
info->deps_enabled = mvstat->deps_enabled;
info->mcv_enabled = mvstat->mcv_enabled;
info->hist_enabled = mvstat->hist_enabled;
+ info->ndist_enabled = mvstat->ndist_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
info->mcv_built = mvstat->mcv_built;
info->hist_built = mvstat->hist_built;
+ info->ndist_built = mvstat->ndist_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 9dbb3b6..d4b88e9 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o histogram.o mcv.o
+OBJS = common.o dependencies.o histogram.o mcv.o mvdist.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index ffb76f4..c42ca8f 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -53,6 +53,7 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
MVHistogram histogram = NULL;
+ double ndist = -1;
int numrows_filtered = numrows;
VacAttrStats **stats = NULL;
@@ -92,6 +93,9 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (stat->ndist_enabled)
+ ndist = build_mv_ndistinct(numrows, rows, attrs, stats);
+
/* build the MCV list */
if (stat->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
@@ -101,7 +105,7 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, mcvlist, histogram, attrs, stats);
+ update_mv_stats(stat->mvoid, deps, mcvlist, histogram, ndist, attrs, stats);
}
}
@@ -183,6 +187,8 @@ list_mv_stats(Oid relid)
info->mcv_built = stats->mcv_built;
info->hist_enabled = stats->hist_enabled;
info->hist_built = stats->hist_built;
+ info->ndist_enabled = stats->ndist_enabled;
+ info->ndist_built = stats->ndist_built;
result = lappend(result, info);
}
@@ -252,7 +258,7 @@ find_mv_attnums(Oid mvoid, Oid *relid)
void
update_mv_stats(Oid mvoid,
MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
- int2vector *attrs, VacAttrStats **stats)
+ double ndistcoeff, int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -292,26 +298,36 @@ update_mv_stats(Oid mvoid,
= PointerGetDatum(data);
}
+ if (ndistcoeff > 1.0)
+ {
+ nulls[Anum_pg_mv_statistic_standist -1] = false;
+ values[Anum_pg_mv_statistic_standist-1] = Float8GetDatum(ndistcoeff);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
replaces[Anum_pg_mv_statistic_stahist-1] = true;
+ replaces[Anum_pg_mv_statistic_standist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
nulls[Anum_pg_mv_statistic_hist_built-1] = false;
+ nulls[Anum_pg_mv_statistic_ndist_built-1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_ndist_built-1] = true;
replaces[Anum_pg_mv_statistic_hist_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
+ values[Anum_pg_mv_statistic_ndist_built-1] = BoolGetDatum(ndistcoeff > 1.0);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
diff --git a/src/backend/utils/mvstats/mvdist.c b/src/backend/utils/mvstats/mvdist.c
new file mode 100644
index 0000000..6df7411
--- /dev/null
+++ b/src/backend/utils/mvstats/mvdist.c
@@ -0,0 +1,147 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvdist.c
+ * POSTGRES multivariate distinct coefficients
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mvdist.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+/*
+ * build_mv_ndistinct
+ *	Estimate the ndistinct coefficient from the sample rows, i.e. the
+ *	ratio (ndistinct(a) * ndistinct(b) * ...) / ndistinct(a,b,...).
+ */
+double
+build_mv_ndistinct(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ MultiSortSupport mss = multi_sort_init(numattrs);
+ int ndistinct;
+ double result;
+
+ /*
+ * It's possible to sort the sample rows directly, but this seemed
+ * somehow simpler / less error prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * numattrs);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ Assert(numattrs >= 2);
+
+ for (i = 0; i < numattrs; i++)
+ {
+ /* prepare the sort function for this dimension */
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* accumulate the data for this dimension into the array */
+ for (j = 0; j < numrows; j++)
+ {
+ items[j].values[i]
+ = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+ }
+ }
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /* count number of distinct combinations */
+
+ ndistinct = 1;
+ for (i = 1; i < numrows; i++)
+ {
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ ndistinct++;
+ }
+
+ result = 1 / (double)ndistinct;
+
+ /*
+ * now count distinct values for each attribute and incrementally
+ * compute (ndistinct(a) * ndistinct(b)) / ndistinct(a,b)
+ */
+ for (i = 0; i < numattrs; i++)
+ {
+ SortSupportData ssup;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup);
+
+ memset(values, 0, sizeof(Datum) * numrows);
+
+ /* accumulate all the data into the array and sort it */
+ for (j = 0; j < numrows; j++)
+ {
+ bool isnull;
+ values[j] = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &isnull);
+ }
+
+ qsort_arg((void *)values, numrows, sizeof(Datum),
+ compare_scalars_simple, &ssup);
+
+ ndistinct = 1;
+ for (j = 1; j < numrows; j++)
+ {
+ if (compare_scalars_simple(&values[j], &values[j-1], &ssup) != 0)
+ ndistinct++;
+ }
+
+ result *= ndistinct;
+ }
+
+ return result;
+}
+
+double
+load_mv_ndistinct(Oid mvoid)
+{
+ bool isnull = false;
+ Datum ndist;
+
+ /* fetch the pg_mv_statistic tuple for this statistics object */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ Assert(HeapTupleIsValid(htup));
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->ndist_enabled && mvstat->ndist_built);
+#endif
+
+ ndist = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_standist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return DatumGetFloat8(ndist);
+}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index a5945af..ee353da 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -39,6 +39,7 @@ CATALOG(pg_mv_statistic,3381)
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
bool hist_enabled; /* build histogram? */
+ bool ndist_enabled; /* build ndist coefficient? */
/* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
@@ -48,6 +49,7 @@ CATALOG(pg_mv_statistic,3381)
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
bool hist_built; /* histogram was built */
+ bool ndist_built; /* ndistinct coeff built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -56,6 +58,7 @@ CATALOG(pg_mv_statistic,3381)
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
bytea stahist; /* MV histogram (serialized) */
+ float8 standist; /* ndistinct coefficient */
#endif
} FormData_pg_mv_statistic;
@@ -71,21 +74,24 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 15
+#define Natts_pg_mv_statistic 18
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
#define Anum_pg_mv_statistic_deps_enabled 4
#define Anum_pg_mv_statistic_mcv_enabled 5
#define Anum_pg_mv_statistic_hist_enabled 6
-#define Anum_pg_mv_statistic_mcv_max_items 7
-#define Anum_pg_mv_statistic_hist_max_buckets 8
-#define Anum_pg_mv_statistic_deps_built 9
-#define Anum_pg_mv_statistic_mcv_built 10
-#define Anum_pg_mv_statistic_hist_built 11
-#define Anum_pg_mv_statistic_stakeys 12
-#define Anum_pg_mv_statistic_stadeps 13
-#define Anum_pg_mv_statistic_stamcv 14
-#define Anum_pg_mv_statistic_stahist 15
+#define Anum_pg_mv_statistic_ndist_enabled 7
+#define Anum_pg_mv_statistic_mcv_max_items 8
+#define Anum_pg_mv_statistic_hist_max_buckets 9
+#define Anum_pg_mv_statistic_deps_built 10
+#define Anum_pg_mv_statistic_mcv_built 11
+#define Anum_pg_mv_statistic_hist_built 12
+#define Anum_pg_mv_statistic_ndist_built 13
+#define Anum_pg_mv_statistic_stakeys 14
+#define Anum_pg_mv_statistic_stadeps 15
+#define Anum_pg_mv_statistic_stamcv 16
+#define Anum_pg_mv_statistic_stahist 17
+#define Anum_pg_mv_statistic_standist 18
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 1298c42..97d74e9 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -594,11 +594,13 @@ typedef struct MVStatisticInfo
bool deps_enabled; /* functional dependencies enabled */
bool mcv_enabled; /* MCV list enabled */
bool hist_enabled; /* histogram enabled */
+ bool ndist_enabled; /* ndistinct coefficient enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
bool mcv_built; /* MCV list built */
bool hist_built; /* histogram built */
+ bool ndist_built; /* ndistinct coefficient built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 35b2f8e..a154cd9 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -225,6 +225,7 @@ typedef MVSerializedHistogramData *MVSerializedHistogram;
MVDependencies load_mv_dependencies(Oid mvoid);
MCVList load_mv_mcvlist(Oid mvoid);
MVSerializedHistogram load_mv_histogram(Oid mvoid);
+double load_mv_ndistinct(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
@@ -266,11 +267,16 @@ MVHistogram
build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int numrows_total);
+double
+build_mv_ndistinct(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats);
+
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
void update_mv_stats(Oid relid, MVDependencies dependencies,
MCVList mcvlist, MVHistogram histogram,
+ double ndistcoeff,
int2vector *attrs, VacAttrStats **stats);
#ifdef DEBUG_MVHIST
--
2.1.0
On 12/12/14 05:53, Heikki Linnakangas wrote:
On 10/13/2014 01:00 AM, Tomas Vondra wrote:
Hi,
attached is a WIP patch implementing multivariate statistics.
Great! Really glad to see you working on this.
+ * FIXME This sample sizing is mostly OK when computing stats for
+ * individual columns, but when computing multi-variate stats
+ * for multivariate stats (histograms, mcv, ...) it's rather
+ * insufficient. For small number of dimensions it works, but
+ * for complex stats it'd be nice use sample proportional to
+ * the table (say, 0.5% - 1%) instead of a fixed size.

I don't think a fraction of the table is appropriate. As long as the
sample is random, the accuracy of a sample doesn't depend much on the
size of the population. For example, if you sample 1,000 rows from a
table with 100,000 rows, or 1000 rows from a table with 100,000,000
rows, the accuracy is pretty much the same. That doesn't change when
you go from a single variable to multiple variables.

You do need a bigger sample with multiple variables, however. My gut
feeling is that if you sample N rows for a single variable, with two
variables you need to sample N^2 rows to get the same accuracy. But
it's not proportional to the table size. (I have no proof for that,
but I'm sure there is literature on this.)
[...]
I did stage III statistics at University many moons ago...
The accuracy of the sample only depends on the value of N, not the total
size of the population, with the obvious constraint that N <= population
size.
The standard error in a random sample is inversely proportional to the
square root of N. So using N = 100 would give a standard error of about
10%, and to reduce it to 5% you would need N = 400.
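To make that concrete, here is a quick self-contained check anyone can
run (the table and column names are made up for illustration, they are
not from the patch) - the error of a proportion estimated from a
fixed-size random sample depends only on N, not on the table size:

CREATE TABLE sample_demo AS
    SELECT (random() < 0.5) AS flag
    FROM generate_series(1, 1000000);

-- the exact proportion, about 0.5
SELECT avg(flag::int) FROM sample_demo;

-- the proportion estimated from N = 1000 random rows; the standard
-- error is roughly sqrt(0.25 / 1000) ~ 0.016, and stays about the
-- same even if the table is 100x larger
SELECT avg(flag::int)
FROM (SELECT flag FROM sample_demo ORDER BY random() LIMIT 1000) s;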
For multiple variables, it will also be a function of N - I don't recall
precisely how, but I suspect it might be M * N, where M is the number of
parameters (though I'm not certain). I think M^N might be needed if you
want all the possible correlations between sets of variables to be
reasonably significant - but I'm mostly just guessing here.
So using a % of table size is somewhat silly, looking at the above.
However, if you want to detect frequencies that occur at the 1% level,
then you will need to sample 1% of the table or greater. So which
approach is 'best', depends on what you are trying to determine. The
sample size is more useful when you need to decide between 2 different
hypotheses.
The sampling methodology, is far more important than the ratio of N to
population size - consider the bias imposed by using random telephone
numbers, even before the advent of mobile phones!
Cheers,
Gavin
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Dec 23, 2015 at 2:07 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
The remaining question is how unique the statistics name should be.
My initial plan was to make it unique within a table, but that of
course does not work well with the DROP STATISTICS (it'd have to
specify the table name also), and it'd also not work with statistics
on multiple tables (which is one of the reasons for abandoning ALTER
TABLE stuff).

So I think it should be unique across tables. Statistics are hardly
a global object, so it should be unique within a schema. I thought
that simply using the schema of the table would work, but that of
course breaks with multiple tables in different schemas. So the only
solution seems to be explicit schema for statistics.
That solution seems good to me.
(with apologies for not having looked at the rest of this much at all)
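To illustrate what is being proposed (a sketch only - the statistics
and table names are made up, the syntax follows the CREATE STATISTICS
command from the patch):

CREATE STATISTICS myschema.mystats ON mytable (a, b) WITH (dependencies);

DROP STATISTICS myschema.mystats;

Without an explicit schema, the name would be created in (and resolved
through) the search_path, just like other schema-qualified objects.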
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Jan 20, 2016 at 02:20:38PM -0500, Robert Haas wrote:
On Wed, Dec 23, 2015 at 2:07 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

The remaining question is how unique the statistics name should be.
My initial plan was to make it unique within a table, but that of
course does not work well with the DROP STATISTICS (it'd have to
specify the table name also), and it'd also not work with statistics
on multiple tables (which is one of the reasons for abandoning ALTER
TABLE stuff).

So I think it should be unique across tables. Statistics are hardly
a global object, so it should be unique within a schema. I thought
that simply using the schema of the table would work, but that of
course breaks with multiple tables in different schemas. So the only
solution seems to be explicit schema for statistics.

That solution seems good to me.
(with apologies for not having looked at the rest of this much at all)
Woh, this will be an optimizer game-changer, from the user perspective!
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Bruce Momjian wrote:
On Wed, Jan 20, 2016 at 02:20:38PM -0500, Robert Haas wrote:
On Wed, Dec 23, 2015 at 2:07 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

The remaining question is how unique the statistics name should be.
My initial plan was to make it unique within a table, but that of
course does not work well with the DROP STATISTICS (it'd have to
specify the table name also), and it'd also not work with statistics
on multiple tables (which is one of the reasons for abandoning ALTER
TABLE stuff).

So I think it should be unique across tables. Statistics are hardly
a global object, so it should be unique within a schema. I thought
that simply using the schema of the table would work, but that of
course breaks with multiple tables in different schemas. So the only
solution seems to be explicit schema for statistics.

That solution seems good to me.
(with apologies for not having looked at the rest of this much at all)
Woh, this will be an optimizer game-changer, from the user perspective!
That is the intent. The patch is huge, though -- any reviewing help is
welcome.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 01/20/2016 10:54 PM, Alvaro Herrera wrote:
Bruce Momjian wrote:
On Wed, Jan 20, 2016 at 02:20:38PM -0500, Robert Haas wrote:
On Wed, Dec 23, 2015 at 2:07 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

The remaining question is how unique the statistics name should be.
My initial plan was to make it unique within a table, but that of
course does not work well with the DROP STATISTICS (it'd have to
specify the table name also), and it'd also not work with statistics
on multiple tables (which is one of the reasons for abandoning ALTER
TABLE stuff).

So I think it should be unique across tables. Statistics are hardly
a global object, so it should be unique within a schema. I thought
that simply using the schema of the table would work, but that of
course breaks with multiple tables in different schemas. So the only
solution seems to be explicit schema for statistics.

That solution seems good to me.
(with apologies for not having looked at the rest of this much at all)
Woh, this will be an optimizer game-changer, from the user perspective!
That is the intent. The patch is huge, though -- any reviewing help
is welcome.
It's also true that a significant fraction of the size is documentation
(in the form of comments). However even after stripping them the patch
is not exactly small ...
I'm afraid it may be rather difficult to understand the general idea of
the patch. So if anyone is interested in discussing the patch in
Brussels next week, I'm available.
Also, in December I've posted a link to a "paper" I started writing
about the stats:
https://bitbucket.org/tvondra/mvstats-paper/src
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
Attached is v10 of the patch series. There are 9 parts at the moment:
0001-teach-pull_-varno-varattno-_walker-about-RestrictInf.patch
0002-shared-infrastructure-and-functional-dependencies.patch
0003-clause-reduction-using-functional-dependencies.patch
0004-multivariate-MCV-lists.patch
0005-multivariate-histograms.patch
0006-multi-statistics-estimation.patch
0007-multivariate-ndistinct-coefficients.patch
0008-change-how-we-apply-selectivity-to-number-of-groups-.patch
0009-fixup-of-regression-tests-plans-changes-by-group-by-.patch
However, the first one is still just a temporary workaround that I plan
to address next, and the last 3 are all dealing with the ndistinct
coefficients (and shall be squashed into a single chunk).
README docs
-----------
Aside from fixing a few bugs, there are several major improvements, the
main one being that I've moved most of the comments explaining how it
all works into a set of regular README files, located in
src/backend/utils/mvstats:
1) README.stats - Overview of available types of statistics, what
clauses can be estimated, how multiple statistics are combined etc.
This is probably the right place to start.
2) docs for each type of statistics currently available
README.dependencies - soft functional dependencies
README.mcv - MCV lists
README.histogram - histograms
README.ndistinct - ndistinct coefficients
The READMEs are added and modified through the patch series, so the best
thing to do is apply all the patches and start reading.
I have not improved the user-oriented SGML documentation in this patch,
that's one of the tasks I'd like to work on next. But the READMEs should
give you a good idea how it's supposed to work, and there are some
examples of use in the regression tests.
Significantly simplified places
-------------------------------
The patch version also significantly simplifies several places that were
needlessly complex in the previous ones - firstly the function
evaluating clauses on multivariate histograms was rather needlessly
bloated, so I've simplified it a lot. Similarly for the code in
clauselist_selectivity() that combines multiple statistics to estimate a list
of clauses - that's much simpler now too. And various other pieces.
That being said, I still think the code in clausesel.c can be
simplified. I feel there's a lot of cruft, mostly due to unknowingly
implementing something that could be solved by an existing function.
A prime example of that is inspecting the expression tree to check if we
know how to estimate the clauses using the multivariate statistics. That
sounds like a nice match for expression walker, but currently is done by
custom code. I plan to look at that next.
Also, I'm not quite sure I understand what the varRelid parameter of
clauselist_selectivity is for, so the code may be handling that wrong
(seems to be working though).
ndistinct coefficients
----------------------
The one new piece in this patch is the GROUP BY estimation, based on the
ndistinct coefficients. So for example you can do this:
CREATE TABLE t AS SELECT mod(i,1000) AS a, mod(i,1000) AS b
FROM generate_series(1,1000000) s(i);
ANALYZE t;
EXPLAIN SELECT * FROM t GROUP BY a, b;
which currently does this:
QUERY PLAN
-----------------------------------------------------------------------
Group (cost=127757.34..135257.34 rows=99996 width=8)
Group Key: a, b
-> Sort (cost=127757.34..130257.34 rows=1000000 width=8)
Sort Key: a, b
-> Seq Scan on t (cost=0.00..14425.00 rows=1000000 width=8)
(5 rows)
but we know that there are only 1000 groups because the columns are
correlated. So let's create ndistinct statistics on the two columns:
CREATE STATISTICS s1 ON t (a,b) WITH (ndistinct);
ANALYZE t;
which results in estimates like this:
QUERY PLAN
-----------------------------------------------------------------
HashAggregate (cost=19425.00..19435.00 rows=1000 width=8)
Group Key: a, b
-> Seq Scan on t (cost=0.00..14425.00 rows=1000000 width=8)
(3 rows)
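Just to illustrate where the 1000 comes from, both the actual number of
groups and the ndistinct coefficient the estimate is derived from can be
checked by hand (computed from the full table here, while ANALYZE of
course works from a sample):

-- the true number of groups: 1000
SELECT count(*) FROM (SELECT DISTINCT a, b FROM t) s;

-- the coefficient, ndistinct(a) * ndistinct(b) / ndistinct(a,b),
-- here 1000 * 1000 / 1000 = 1000
SELECT count(DISTINCT a) * count(DISTINCT b)
       / (SELECT count(*) FROM (SELECT DISTINCT a, b FROM t) x) AS coeff
FROM t;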
I'm not quite sure how to combine this type of statistics with MCV lists
and histograms, so for now it's used only for GROUP BY.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-teach-pull_-varno-varattno-_walker-about-RestrictInf.patch
From 19defa4e8c1e578f3cf4099b0729357ecc333c5a Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Tue, 28 Apr 2015 19:56:33 +0200
Subject: [PATCH 1/9] teach pull_(varno|varattno)_walker about RestrictInfo
otherwise pull_varnos fails when processing OR clauses
---
src/backend/optimizer/util/var.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index dff52c4..80d01bd 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -197,6 +197,13 @@ pull_varnos_walker(Node *node, pull_varnos_context *context)
context->sublevels_up--;
return result;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo*)node;
+ context->varnos = bms_add_members(context->varnos,
+ rinfo->clause_relids);
+ return false;
+ }
return expression_tree_walker(node, pull_varnos_walker,
(void *) context);
}
@@ -245,6 +252,15 @@ pull_varattnos_walker(Node *node, pull_varattnos_context *context)
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *)node;
+
+ return expression_tree_walker((Node*)rinfo->clause,
+ pull_varattnos_walker,
+ (void*) context);
+ }
+
/* Should not find an unplanned subquery */
Assert(!IsA(node, Query));
--
2.1.0
0002-shared-infrastructure-and-functional-dependencies.patch
From 8aa6a738260ece48b31e9abc955d0c326fbf8a9a Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 19:51:48 +0100
Subject: [PATCH 2/9] shared infrastructure and functional dependencies
Basic infrastructure shared by all kinds of multivariate
stats, most importantly:
- adds a new system catalog (pg_mv_statistic)
- CREATE STATISTICS name ON table (columns) WITH (options)
- DROP STATISTICS name
- implementation of functional dependencies (the simplest
type of multivariate statistics)
- building functional dependencies in ANALYZE
- updates regression tests (new catalog etc.)
This does not include any changes to the optimizer, i.e.
it does not influence the query planning (subject to
follow-up patches).
The current implementation requires a valid 'ltopr' for
the columns, so that we can sort the sample rows in various
ways, both in this patch and other kinds of statistics.
Maybe this restriction could be relaxed in the future,
requiring just 'eqopr' in case of stats not sorting the
data (e.g. functional dependencies and MCV lists).
Maybe some of the stats (functional dependencies and MCV
list with limited functionality) might be made to work
with hashes of the values, which is sufficient for equality
comparisons. But the queries would require the equality
operator anyway, so it's not really a weaker requirement.
The hashes might reduce space requirements, though.
The algorithm detecting the dependencies is rather simple
and probably needs improvements, so that it detects more
complicated dependencies, and also validation of the math.
The name 'functional dependencies' is more correct (than
'association rules') as it's exactly the name used in
relational theory (esp. Normal Forms) for tracking
column-level dependencies.
The multivariate statistics are automatically removed in
two situations
(a) after a DROP TABLE (obviously)
(b) after ALTER TABLE ... DROP COLUMN, if the statistics
would be defined on less than 2 columns (remaining)
If there are at least 2 columns remaining, we keep
the statistics but perform cleanup on the next ANALYZE.
The dropped columns are removed from stakeys, and the new
statistics is built on the smaller set.
We can't do this at DROP COLUMN, because that'd leave us
with invalid statistics, or we'd have to throw it away
although we can still use it. This lazy approach lets us
use the statistics although some of the columns are dead.
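For illustration (a sketch with made-up names):

    CREATE STATISTICS s ON t1 (a, b, c) WITH (dependencies);

    -- two columns remain, so the statistics survives; stakeys
    -- are pruned at the next ANALYZE
    ALTER TABLE t1 DROP COLUMN c;

    -- only one column would remain, so the statistics is removed
    ALTER TABLE t1 DROP COLUMN b;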
This also adds a simple list of statistics to \d in psql.
This means the statistics are created within a schema by
using a qualified name (or using the default schema)
CREATE STATISTICS schema.statistics ON ...
and then dropped by specifying qualified name
DROP STATISTICS schema.statistics
or searching through search_path (just like with other objects).
This also gets rid of the "(opt_)stats_name" definitions in gram.y
and instead replaces them with just "opt_any_name", although the
optional case is not really handled currently - there's no generated
name yet (so either we should drop it or implement it).
I'm not entirely sure making statistics schema-specific is that
a great idea. Maybe it should be "global", but that does not seem
right (e.g. it makes multi-tenant systems based on schemas more
difficult to manage, because tenants would interact).
---
doc/src/sgml/ref/allfiles.sgml | 2 +
doc/src/sgml/ref/create_statistics.sgml | 174 ++++++++++
doc/src/sgml/ref/drop_statistics.sgml | 90 ++++++
doc/src/sgml/reference.sgml | 2 +
src/backend/catalog/Makefile | 1 +
src/backend/catalog/dependency.c | 11 +-
src/backend/catalog/heap.c | 102 ++++++
src/backend/catalog/namespace.c | 51 +++
src/backend/catalog/objectaddress.c | 22 ++
src/backend/catalog/system_views.sql | 11 +
src/backend/commands/Makefile | 6 +-
src/backend/commands/analyze.c | 21 ++
src/backend/commands/dropcmds.c | 4 +
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/statscmds.c | 331 +++++++++++++++++++
src/backend/commands/tablecmds.c | 8 +-
src/backend/nodes/copyfuncs.c | 16 +
src/backend/nodes/outfuncs.c | 18 ++
src/backend/optimizer/util/plancat.c | 63 ++++
src/backend/parser/gram.y | 34 +-
src/backend/tcop/utility.c | 11 +
src/backend/utils/Makefile | 2 +-
src/backend/utils/cache/relcache.c | 59 ++++
src/backend/utils/cache/syscache.c | 23 ++
src/backend/utils/mvstats/Makefile | 17 +
src/backend/utils/mvstats/README.dependencies | 222 +++++++++++++
src/backend/utils/mvstats/common.c | 356 +++++++++++++++++++++
src/backend/utils/mvstats/common.h | 75 +++++
src/backend/utils/mvstats/dependencies.c | 437 ++++++++++++++++++++++++++
src/bin/psql/describe.c | 44 +++
src/include/catalog/dependency.h | 5 +-
src/include/catalog/heap.h | 1 +
src/include/catalog/indexing.h | 7 +
src/include/catalog/namespace.h | 2 +
src/include/catalog/pg_mv_statistic.h | 73 +++++
src/include/catalog/pg_proc.h | 5 +
src/include/catalog/toasting.h | 1 +
src/include/commands/defrem.h | 4 +
src/include/nodes/nodes.h | 2 +
src/include/nodes/parsenodes.h | 12 +
src/include/nodes/relation.h | 28 ++
src/include/utils/mvstats.h | 70 +++++
src/include/utils/rel.h | 4 +
src/include/utils/relcache.h | 1 +
src/include/utils/syscache.h | 2 +
src/test/regress/expected/rules.out | 9 +
src/test/regress/expected/sanity_check.out | 1 +
47 files changed, 2432 insertions(+), 11 deletions(-)
create mode 100644 doc/src/sgml/ref/create_statistics.sgml
create mode 100644 doc/src/sgml/ref/drop_statistics.sgml
create mode 100644 src/backend/commands/statscmds.c
create mode 100644 src/backend/utils/mvstats/Makefile
create mode 100644 src/backend/utils/mvstats/README.dependencies
create mode 100644 src/backend/utils/mvstats/common.c
create mode 100644 src/backend/utils/mvstats/common.h
create mode 100644 src/backend/utils/mvstats/dependencies.c
create mode 100644 src/include/catalog/pg_mv_statistic.h
create mode 100644 src/include/utils/mvstats.h
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index bf95453..c0f7653 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -76,6 +76,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createSchema SYSTEM "create_schema.sgml">
<!ENTITY createSequence SYSTEM "create_sequence.sgml">
<!ENTITY createServer SYSTEM "create_server.sgml">
+<!ENTITY createStatistics SYSTEM "create_statistics.sgml">
<!ENTITY createTable SYSTEM "create_table.sgml">
<!ENTITY createTableAs SYSTEM "create_table_as.sgml">
<!ENTITY createTableSpace SYSTEM "create_tablespace.sgml">
@@ -119,6 +120,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropSchema SYSTEM "drop_schema.sgml">
<!ENTITY dropSequence SYSTEM "drop_sequence.sgml">
<!ENTITY dropServer SYSTEM "drop_server.sgml">
+<!ENTITY dropStatistics SYSTEM "drop_statistics.sgml">
<!ENTITY dropTable SYSTEM "drop_table.sgml">
<!ENTITY dropTableSpace SYSTEM "drop_tablespace.sgml">
<!ENTITY dropTransform SYSTEM "drop_transform.sgml">
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
new file mode 100644
index 0000000..a86eae3
--- /dev/null
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -0,0 +1,174 @@
+<!--
+doc/src/sgml/ref/create_statistics.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-CREATESTATISTICS">
+ <indexterm zone="sql-createstatistics">
+ <primary>CREATE STATISTICS</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>CREATE STATISTICS</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>CREATE STATISTICS</refname>
+ <refpurpose>define a new statistics</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_name</replaceable> ON <replaceable class="PARAMETER">table_name</replaceable>
+ ( <replaceable class="PARAMETER">column_name</replaceable> [, ...] )
+[ WITH ( <replaceable class="PARAMETER">statistics_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] ) ]
+</synopsis>
+
+ </refsynopsisdiv>
+
+ <refsect1 id="SQL-CREATESTATISTICS-description">
+ <title>Description</title>
+
+ <para>
+ <command>CREATE STATISTICS</command> will create a new multivariate
+ statistics on the table. The statistics will be created in the
+ current database. The statistics will be owned by the user issuing
+ the command.
+ </para>
+
+ <para>
+ If a schema name is given (for example, <literal>CREATE STATISTICS
+ myschema.mystat ...</>) then the statistics is created in the specified
+ schema. Otherwise it is created in the current schema. The name of
+ the statistics must be distinct from the name of any other statistics in the
+ same schema.
+ </para>
+
+ <para>
+ To be able to create statistics, you must have <literal>USAGE</literal>
+ privilege on the types of all the columns included in the
+ statistics.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>IF NOT EXISTS</></term>
+ <listitem>
+ <para>
+ Do not throw an error if a statistics with the same name already exists.
+ A notice is issued in this case. Note that there is no guarantee that
+ the existing statistics is anything like the one that would have been
+ created.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">statistics_name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the statistics to be created.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">table_name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the table the statistics should
+ be created on.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">column_name</replaceable></term>
+ <listitem>
+ <para>
+ The name of a column to be included in the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>WITH ( <replaceable class="PARAMETER">statistics_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] )</literal></term>
+ <listitem>
+ <para>
+ ...
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <refsect2 id="SQL-CREATESTATISTICS-parameters">
+ <title id="SQL-CREATESTATISTICS-parameters-title">Statistics Parameters</title>
+
+ <indexterm zone="sql-createstatistics-parameters">
+ <primary>statistics parameters</primary>
+ </indexterm>
+
+ <para>
+ The <literal>WITH</> clause can specify <firstterm>statistics parameters</>
+ for statistics. The currently available parameters are listed below.
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>dependencies</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables functional dependencies for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </refsect2>
+ </refsect1>
+
+ <refsect1 id="SQL-CREATESTATISTICS-notes">
+ <title>Notes</title>
+
+ <para>
+ ...
+ </para>
+
+ </refsect1>
+
+
+ <refsect1 id="SQL-CREATESTATISTICS-examples">
+ <title>Examples</title>
+
+ <para>
+ ...
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There's no <command>CREATE STATISTICS</command> command in the SQL standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-dropstatistics"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/ref/drop_statistics.sgml b/doc/src/sgml/ref/drop_statistics.sgml
new file mode 100644
index 0000000..4cc0b70
--- /dev/null
+++ b/doc/src/sgml/ref/drop_statistics.sgml
@@ -0,0 +1,90 @@
+<!--
+doc/src/sgml/ref/drop_statistics.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-DROPSTATISTICS">
+ <indexterm zone="sql-dropstatistics">
+ <primary>DROP STATISTICS</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>DROP STATISTICS</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>DROP STATISTICS</refname>
+ <refpurpose>remove a statistics</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+DROP STATISTICS [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>DROP STATISTICS</command> removes statistics from the database.
+ Only the statistics owner, the schema owner, and a superuser can drop a
+ statistics.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>IF EXISTS</literal></term>
+ <listitem>
+ <para>
+ Do not throw an error if the statistics does not exist. A notice is
+ issued in this case.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the statistics to drop.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </refsect1>
+
+ <refsect1>
+ <title>Examples</title>
+
+ <para>
+ ...
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There's no <command>DROP STATISTICS</command> command in the SQL standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createstatistics"></member>
+ </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 03020df..2b07b2d 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -104,6 +104,7 @@
&createSchema;
&createSequence;
&createServer;
+ &createStatistics;
&createTable;
&createTableAs;
&createTableSpace;
@@ -147,6 +148,7 @@
&dropSchema;
&dropSequence;
&dropServer;
+ &dropStatistics;
&dropTable;
&dropTableSpace;
&dropTSConfig;
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 25130ec..058b8a9 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -32,6 +32,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
+ pg_mv_statistic.h \
pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index c48e37b..8200454 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -40,6 +40,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -160,7 +161,8 @@ static const Oid object_classes[] = {
ExtensionRelationId, /* OCLASS_EXTENSION */
EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
PolicyRelationId, /* OCLASS_POLICY */
- TransformRelationId /* OCLASS_TRANSFORM */
+ TransformRelationId, /* OCLASS_TRANSFORM */
+ MvStatisticRelationId /* OCLASS_STATISTICS */
};
@@ -1272,6 +1274,10 @@ doDeletion(const ObjectAddress *object, int flags)
DropTransformById(object->objectId);
break;
+ case OCLASS_STATISTICS:
+ RemoveStatisticsById(object->objectId);
+ break;
+
default:
elog(ERROR, "unrecognized object class: %u",
object->classId);
@@ -2415,6 +2421,9 @@ getObjectClass(const ObjectAddress *object)
case TransformRelationId:
return OCLASS_TRANSFORM;
+
+ case MvStatisticRelationId:
+ return OCLASS_STATISTICS;
}
/* shouldn't get here */
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 6a4a9d9..e7d9aaa 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -47,6 +47,7 @@
#include "catalog/pg_constraint_fn.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_tablespace.h"
@@ -1613,7 +1614,10 @@ RemoveAttributeById(Oid relid, AttrNumber attnum)
heap_close(attr_rel, RowExclusiveLock);
if (attnum > 0)
+ {
RemoveStatistics(relid, attnum);
+ RemoveMVStatistics(relid, attnum);
+ }
relation_close(rel, NoLock);
}
@@ -1841,6 +1845,11 @@ heap_drop_with_catalog(Oid relid)
RemoveStatistics(relid, 0);
/*
+ * delete multi-variate statistics
+ */
+ RemoveMVStatistics(relid, 0);
+
+ /*
* delete attribute tuples
*/
DeleteAttributeTuples(relid);
@@ -2696,6 +2705,99 @@ RemoveStatistics(Oid relid, AttrNumber attnum)
/*
+ * RemoveMVStatistics --- remove entries in pg_mv_statistic for a rel
+ *
+ * If attnum is zero, remove all entries for rel; else remove only the one(s)
+ * for that column.
+ */
+void
+RemoveMVStatistics(Oid relid, AttrNumber attnum)
+{
+ Relation pgmvstatistic;
+ TupleDesc tupdesc = NULL;
+ SysScanDesc scan;
+ ScanKeyData key;
+ HeapTuple tuple;
+
+ /*
+ * When dropping a column, we'll drop statistics with a single
+ * remaining (undropped column). To do that, we need the tuple
+ * descriptor.
+ *
+ * We already have the relation locked (as we're running ALTER
+ * TABLE ... DROP COLUMN), so we'll just get the descriptor here.
+ */
+ if (attnum != 0)
+ {
+ Relation rel = relation_open(relid, NoLock);
+
+ /* multivariate stats are supported on tables and matviews */
+ if (rel->rd_rel->relkind == RELKIND_RELATION ||
+ rel->rd_rel->relkind == RELKIND_MATVIEW)
+ tupdesc = RelationGetDescr(rel);
+
+ relation_close(rel, NoLock);
+ }
+
+ if (tupdesc == NULL)
+ return;
+
+ pgmvstatistic = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ ScanKeyInit(&key,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ scan = systable_beginscan(pgmvstatistic,
+ MvStatisticRelidIndexId,
+ true, NULL, 1, &key);
+
+ /* we must loop even when attnum != 0, in case of inherited stats */
+ while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+ {
+ bool delete = true;
+
+ if (attnum != 0)
+ {
+ Datum adatum;
+ bool isnull;
+ int i;
+ int ncolumns = 0;
+ ArrayType *arr;
+ int16 *attnums;
+
+ /* get the columns */
+ adatum = SysCacheGetAttr(MVSTATOID, tuple,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+ attnums = (int16*)ARR_DATA_PTR(arr);
+
+ for (i = 0; i < ARR_DIMS(arr)[0]; i++)
+ {
+ /* count the column unless it has been / is being dropped */
+ if ((! tupdesc->attrs[attnums[i]-1]->attisdropped) &&
+ (attnums[i] != attnum))
+ ncolumns += 1;
+ }
+
+ /* delete if there are less than two attributes */
+ delete = (ncolumns < 2);
+ }
+
+ if (delete)
+ simple_heap_delete(pgmvstatistic, &tuple->t_self);
+ }
+
+ systable_endscan(scan);
+
+ heap_close(pgmvstatistic, RowExclusiveLock);
+}
+
+
+/*
* RelationTruncateIndexes - truncate all indexes associated
* with the heap relation to zero tuples.
*
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 446b2ac..dfd5bef 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -4201,3 +4201,54 @@ pg_is_other_temp_schema(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(isOtherTempNamespace(oid));
}
+
+Oid
+get_statistics_oid(List *names, bool missing_ok)
+{
+ char *schemaname;
+ char *stats_name;
+ Oid namespaceId;
+ Oid stats_oid = InvalidOid;
+ ListCell *l;
+
+ /* deconstruct the name list */
+ DeconstructQualifiedName(names, &schemaname, &stats_name);
+
+ if (schemaname)
+ {
+ /* use exact schema given */
+ namespaceId = LookupExplicitNamespace(schemaname, missing_ok);
+ if (missing_ok && !OidIsValid(namespaceId))
+ stats_oid = InvalidOid;
+ else
+ stats_oid = GetSysCacheOid2(MVSTATNAMENSP,
+ PointerGetDatum(stats_name),
+ ObjectIdGetDatum(namespaceId));
+ }
+ else
+ {
+ /* search for it in search path */
+ recomputeNamespacePath();
+
+ foreach(l, activeSearchPath)
+ {
+ namespaceId = lfirst_oid(l);
+
+ if (namespaceId == myTempNamespace)
+ continue; /* do not look in temp namespace */
+ stats_oid = GetSysCacheOid2(MVSTATNAMENSP,
+ PointerGetDatum(stats_name),
+ ObjectIdGetDatum(namespaceId));
+ if (OidIsValid(stats_oid))
+ break;
+ }
+ }
+
+ if (!OidIsValid(stats_oid) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("statistics \"%s\" does not exist",
+ NameListToString(names))));
+
+ return stats_oid;
+}
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index d2aaa6d..3a6a0b0 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -39,6 +39,7 @@
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
#include "catalog/pg_largeobject_metadata.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_opfamily.h"
@@ -438,9 +439,22 @@ static const ObjectPropertyType ObjectProperty[] =
Anum_pg_type_typacl,
ACL_KIND_TYPE,
true
+ },
+ {
+ MvStatisticRelationId,
+ MvStatisticOidIndexId,
+ MVSTATOID,
+ MVSTATNAMENSP,
+ Anum_pg_mv_statistic_staname,
+ Anum_pg_mv_statistic_stanamespace,
+ InvalidAttrNumber, /* XXX same owner as relation */
+ InvalidAttrNumber, /* no ACL (same as relation) */
+ -1, /* no ACL */
+ true
}
};
+
/*
* This struct maps the string object types as returned by
* getObjectTypeDescription into ObjType enum values. Note that some enum
@@ -913,6 +927,11 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
address = get_object_address_defacl(objname, objargs,
missing_ok);
break;
+ case OBJECT_STATISTICS:
+ address.classId = MvStatisticRelationId;
+ address.objectId = get_statistics_oid(objname, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, in case it thinks elog might return */
@@ -2185,6 +2204,9 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser")));
break;
+ case OBJECT_STATISTICS:
+ /* FIXME do the right owner checks here */
+ break;
default:
elog(ERROR, "unrecognized object type: %d",
(int) objtype);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index abf9a70..b8a264e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -158,6 +158,17 @@ CREATE VIEW pg_indexes AS
LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
+CREATE VIEW pg_mv_stats AS
+ SELECT
+ N.nspname AS schemaname,
+ C.relname AS tablename,
+ S.staname AS staname,
+ S.stakeys AS attnums,
+ length(S.stadeps) as depsbytes,
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
+ LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
+
CREATE VIEW pg_stats WITH (security_barrier) AS
SELECT
nspname AS schemaname,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index b1ac704..5151001 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -18,8 +18,8 @@ OBJS = aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
policy.o portalcmds.o prepare.o proclang.o \
- schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
- variable.o view.o
+ schemacmds.o seclabel.o sequence.o statscmds.o \
+ tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o \
+ user.o vacuum.o vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 070df29..cbaa4e1 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -27,6 +27,7 @@
#include "catalog/indexing.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "commands/dbcommands.h"
#include "commands/tablecmds.h"
@@ -55,7 +56,11 @@
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "utils/mvstats.h"
+#include "access/sysattr.h"
/* Per-index data for ANALYZE */
typedef struct AnlIndexData
@@ -460,6 +465,19 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
* all analyzable columns. We use a lower bound of 100 rows to avoid
* possible overflow in Vitter's algorithm. (Note: that will also be the
* target in the corner case where there are no analyzable columns.)
+ *
+ * FIXME This sample sizing is mostly OK when computing stats for
+ * individual columns, but when computing multi-variate stats
+ * for multivariate stats (histograms, mcv, ...) it's rather
+ * insufficient. For stats on multiple columns / complex stats
+ * we need larger sample sizes, because we need to build more
+ * detailed stats (more MCV items / histogram buckets) to get
+ * good accuracy. Maybe a sample proportional to the table
+ * (say, 0.5% - 1%) instead of a fixed size might be more
+ * appropriate. Also, this should be
+ * bound to the requested statistics size - e.g. number of MCV
+ * items or histogram buckets should require several sample
+ * rows per item/bucket (so the sample should be k*size).
*/
targrows = 100;
for (i = 0; i < attr_cnt; i++)
@@ -562,6 +580,9 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
update_attstats(RelationGetRelid(Irel[ind]), false,
thisdata->attr_cnt, thisdata->vacattrstats);
}
+
+ /* Build multivariate stats (if there are any). */
+ build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
}
/*
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index 522027a..cd65b58 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -292,6 +292,10 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
msg = gettext_noop("schema \"%s\" does not exist, skipping");
name = NameListToString(objname);
break;
+ case OBJECT_STATISTICS:
+ msg = gettext_noop("statistics \"%s\" does not exist, skipping");
+ name = NameListToString(objname);
+ break;
case OBJECT_TSPARSER:
if (!schema_does_not_exist_skipping(objname, &msg, &name))
{
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 9e32f8d..09061bb 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -110,6 +110,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"SCHEMA", true},
{"SEQUENCE", true},
{"SERVER", true},
+ {"STATISTICS", true},
{"TABLE", true},
{"TABLESPACE", false},
{"TRANSFORM", true},
@@ -1106,6 +1107,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_RULE:
case OBJECT_SCHEMA:
case OBJECT_SEQUENCE:
+ case OBJECT_STATISTICS:
case OBJECT_TABCONSTRAINT:
case OBJECT_TABLE:
case OBJECT_TRANSFORM:
@@ -1167,6 +1169,7 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
case OCLASS_POLICY:
+ case OCLASS_STATISTICS:
return true;
}
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
new file mode 100644
index 0000000..84a8b13
--- /dev/null
+++ b/src/backend/commands/statscmds.c
@@ -0,0 +1,331 @@
+/*-------------------------------------------------------------------------
+ *
+ * statscmds.c
+ * Commands for creating and altering multivariate statistics
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/statscmds.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/multixact.h"
+#include "access/reloptions.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "catalog/catalog.h"
+#include "catalog/dependency.h"
+#include "catalog/heap.h"
+#include "catalog/index.h"
+#include "catalog/indexing.h"
+#include "catalog/namespace.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_constraint.h"
+#include "catalog/pg_depend.h"
+#include "catalog/pg_foreign_table.h"
+#include "catalog/pg_inherits.h"
+#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
+#include "catalog/pg_namespace.h"
+#include "catalog/pg_opclass.h"
+#include "catalog/pg_tablespace.h"
+#include "catalog/pg_trigger.h"
+#include "catalog/pg_type.h"
+#include "catalog/pg_type_fn.h"
+#include "catalog/storage.h"
+#include "catalog/toasting.h"
+#include "commands/cluster.h"
+#include "commands/comment.h"
+#include "commands/defrem.h"
+#include "commands/event_trigger.h"
+#include "commands/policy.h"
+#include "commands/sequence.h"
+#include "commands/tablecmds.h"
+#include "commands/tablespace.h"
+#include "commands/trigger.h"
+#include "commands/typecmds.h"
+#include "commands/user.h"
+#include "executor/executor.h"
+#include "foreign/foreign.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/parsenodes.h"
+#include "optimizer/clauses.h"
+#include "optimizer/planner.h"
+#include "parser/parse_clause.h"
+#include "parser/parse_coerce.h"
+#include "parser/parse_collate.h"
+#include "parser/parse_expr.h"
+#include "parser/parse_oper.h"
+#include "parser/parse_relation.h"
+#include "parser/parse_type.h"
+#include "parser/parse_utilcmd.h"
+#include "parser/parser.h"
+#include "pgstat.h"
+#include "rewrite/rewriteDefine.h"
+#include "rewrite/rewriteHandler.h"
+#include "rewrite/rewriteManip.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "storage/lock.h"
+#include "storage/predicate.h"
+#include "storage/smgr.h"
+#include "utils/acl.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/inval.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/relcache.h"
+#include "utils/ruleutils.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+#include "utils/typcache.h"
+#include "utils/mvstats.h"
+
+
+/* qsort comparator for sorting the attnums in CreateStatistics */
+static int compare_int16(const void *a, const void *b)
+{
+ int av = *(const int16 *) a;
+ int bv = *(const int16 *) b;
+
+ /* memcmp would depend on byte order; compare numerically instead */
+ return (av == bv) ? 0 : ((av < bv) ? -1 : 1);
+}
+
+/*
+ * Implements the CREATE STATISTICS name ON table (columns) WITH (options)
+ *
+ * TODO Check that the types support sort, although maybe we can live
+ * without it (and only build MCV list / association rules).
+ *
+ * TODO This should probably check for duplicate stats (i.e. same
+ * keys, same options). Although maybe it's useful to have
+ * multiple stats on the same columns with different options
+ * (say, a detailed MCV-only stats for some queries, histogram
+ * for others, etc.)
+ */
+ObjectAddress
+CreateStatistics(CreateStatsStmt *stmt)
+{
+ int i, j;
+ ListCell *l;
+ int16 attnums[INDEX_MAX_KEYS];
+ int numcols = 0;
+ ObjectAddress address = InvalidObjectAddress;
+ char *namestr;
+ NameData staname;
+ Oid statoid;
+ Oid namespaceId;
+
+ HeapTuple htup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ int2vector *stakeys;
+ Relation mvstatrel;
+ Relation rel;
+ ObjectAddress parentobject, childobject;
+
+ /* by default build nothing */
+ bool build_dependencies = false;
+
+ Assert(IsA(stmt, CreateStatsStmt));
+
+ /* resolve the pieces of the name (namespace etc.) */
+ namespaceId = QualifiedNameGetCreationNamespace(stmt->defnames, &namestr);
+ namestrcpy(&staname, namestr);
+
+ /*
+ * If if_not_exists was given and the statistics already exists, bail out.
+ */
+ if (stmt->if_not_exists &&
+ SearchSysCacheExists2(MVSTATNAMENSP,
+ PointerGetDatum(&staname),
+ ObjectIdGetDatum(namespaceId)))
+ {
+ ereport(NOTICE,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("statistics \"%s\" already exists, skipping",
+ namestr)));
+ return InvalidObjectAddress;
+ }
+
+ rel = heap_openrv(stmt->relation, AccessExclusiveLock);
+
+ /* transform the column names to attnum values */
+
+ foreach(l, stmt->keys)
+ {
+ char *attname = strVal(lfirst(l));
+ HeapTuple atttuple;
+
+ atttuple = SearchSysCacheAttName(RelationGetRelid(rel), attname);
+
+ if (!HeapTupleIsValid(atttuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" referenced in statistics does not exist",
+ attname)));
+
+ /* more than MVSTATS_MAX_DIMENSIONS columns not allowed */
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("cannot have more than %d keys in a statistics",
+ MVSTATS_MAX_DIMENSIONS)));
+
+ attnums[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->attnum;
+ ReleaseSysCache(atttuple);
+ numcols++;
+ }
+
+ /*
+ * Check the lower bound (at least 2 columns), the upper bound was
+ * already checked in the loop.
+ */
+ if (numcols < 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("multivariate statistics require at least 2 columns")));
+
+ /* look for duplicate columns */
+ for (i = 0; i < numcols; i++)
+ for (j = 0; j < numcols; j++)
+ if ((i != j) && (attnums[i] == attnums[j]))
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_COLUMN),
+ errmsg("duplicate column name in statistics definition")));
+
+ /* parse the statistics options */
+ foreach (l, stmt->options)
+ {
+ DefElem *opt = (DefElem*)lfirst(l);
+
+ if (strcmp(opt->defname, "dependencies") == 0)
+ build_dependencies = defGetBoolean(opt);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized STATISTICS option \"%s\"",
+ opt->defname)));
+ }
+
+ /* check that at least some statistics were requested */
+ if (! build_dependencies)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies) was requested")));
+
+ /* sort the attnums and build int2vector */
+ qsort(attnums, numcols, sizeof(int16), compare_int16);
+ stakeys = buildint2vector(attnums, numcols);
+
+ /*
+ * Okay, let's create the pg_mv_statistic entry.
+ */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ /* no stats collected yet, so just the keys */
+ values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
+ values[Anum_pg_mv_statistic_staname -1] = NameGetDatum(&staname);
+ values[Anum_pg_mv_statistic_stanamespace -1] = ObjectIdGetDatum(namespaceId);
+
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+
+ values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+
+ nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* insert the tuple into pg_mv_statistic */
+ mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ htup = heap_form_tuple(mvstatrel->rd_att, values, nulls);
+
+ simple_heap_insert(mvstatrel, htup);
+
+ CatalogUpdateIndexes(mvstatrel, htup);
+
+ statoid = HeapTupleGetOid(htup);
+
+ heap_freetuple(htup);
+
+
+ /*
+ * Store a dependency too, so that statistics are dropped on DROP TABLE
+ */
+ parentobject.classId = RelationRelationId;
+ parentobject.objectId = RelationGetRelid(rel);
+ parentobject.objectSubId = 0;
+ childobject.classId = MvStatisticRelationId;
+ childobject.objectId = statoid;
+ childobject.objectSubId = 0;
+
+ recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
+
+ /*
+ * Also record dependency on the schema (to drop statistics on DROP SCHEMA)
+ */
+ parentobject.classId = NamespaceRelationId;
+ parentobject.objectId = namespaceId;
+ parentobject.objectSubId = 0;
+ childobject.classId = MvStatisticRelationId;
+ childobject.objectId = statoid;
+ childobject.objectSubId = 0;
+
+ recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
+
+
+ heap_close(mvstatrel, RowExclusiveLock);
+
+ /*
+ * Invalidate the relcache while we still have the relation open, so
+ * that others see the new statistics.
+ */
+ CacheInvalidateRelcache(rel);
+
+ relation_close(rel, NoLock);
+
+ ObjectAddressSet(address, MvStatisticRelationId, statoid);
+
+ return address;
+}
+
+
+/*
+ * Implements DROP STATISTICS stats_name
+ *
+ * This removes the pg_mv_statistic row with the given OID; it is called
+ * when the statistics are dropped, either directly or through a
+ * dependency (e.g. DROP TABLE).
+ */
+void
+RemoveStatisticsById(Oid statsOid)
+{
+ Relation relation;
+ HeapTuple tup;
+
+ /*
+ * Delete the pg_mv_statistic tuple.
+ */
+ relation = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ tup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(statsOid));
+ if (!HeapTupleIsValid(tup)) /* should not happen */
+ elog(ERROR, "cache lookup failed for statistics %u", statsOid);
+
+ simple_heap_delete(relation, &tup->t_self);
+
+ ReleaseSysCache(tup);
+
+ heap_close(relation, RowExclusiveLock);
+}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 96dc923..96ab02f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -37,6 +37,7 @@
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_tablespace.h"
@@ -95,7 +96,7 @@
#include "utils/syscache.h"
#include "utils/tqual.h"
#include "utils/typcache.h"
-
+#include "utils/mvstats.h"
/*
* ON COMMIT action list
@@ -143,8 +144,9 @@ static List *on_commits = NIL;
#define AT_PASS_ADD_COL 5 /* ADD COLUMN */
#define AT_PASS_ADD_INDEX 6 /* ADD indexes */
#define AT_PASS_ADD_CONSTR 7 /* ADD constraints, defaults */
-#define AT_PASS_MISC 8 /* other stuff */
-#define AT_NUM_PASSES 9
+#define AT_PASS_ADD_STATS 8 /* ADD statistics */
+#define AT_PASS_MISC 9 /* other stuff */
+#define AT_NUM_PASSES 10
typedef struct AlteredTableInfo
{
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index a9e9cc3..1a04024 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4124,6 +4124,19 @@ _copyAlterPolicyStmt(const AlterPolicyStmt *from)
return newnode;
}
+static CreateStatsStmt *
+_copyCreateStatsStmt(const CreateStatsStmt *from)
+{
+ CreateStatsStmt *newnode = makeNode(CreateStatsStmt);
+
+ COPY_NODE_FIELD(defnames);
+ COPY_NODE_FIELD(relation);
+ COPY_NODE_FIELD(keys);
+ COPY_NODE_FIELD(options);
+
+ return newnode;
+}
+
/* ****************************************************************
* pg_list.h copy functions
* ****************************************************************
@@ -4999,6 +5012,9 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_CreateStatsStmt:
+ retval = _copyCreateStatsStmt(from);
+ break;
case T_FuncWithArgs:
retval = _copyFuncWithArgs(from);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 85acce8..474d2c7 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1968,6 +1968,21 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
}
static void
+_outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
+{
+ WRITE_NODE_TYPE("MVSTATISTICINFO");
+
+ /* NB: this isn't a complete set of fields */
+ WRITE_OID_FIELD(mvoid);
+
+ /* enabled statistics */
+ WRITE_BOOL_FIELD(deps_enabled);
+
+ /* built/available statistics */
+ WRITE_BOOL_FIELD(deps_built);
+}
+
+static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
/*
@@ -3409,6 +3424,9 @@ _outNode(StringInfo str, const void *obj)
case T_PlannerParamItem:
_outPlannerParamItem(str, obj);
break;
+ case T_MVStatisticInfo:
+ _outMVStatisticInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 0ea9fcf..b9de71d 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -28,6 +28,7 @@
#include "catalog/dependency.h"
#include "catalog/heap.h"
#include "catalog/pg_am.h"
+#include "catalog/pg_mv_statistic.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -40,7 +41,9 @@
#include "parser/parsetree.h"
#include "rewrite/rewriteManip.h"
#include "storage/bufmgr.h"
+#include "utils/builtins.h"
#include "utils/lsyscache.h"
+#include "utils/syscache.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
@@ -94,6 +97,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
Relation relation;
bool hasindex;
List *indexinfos = NIL;
+ List *stainfos = NIL;
/*
* We need not lock the relation since it was already locked, either by
@@ -387,6 +391,65 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
rel->indexlist = indexinfos;
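+
+ /* XXX "if (true)" is just a placeholder condition - for now we always
+ * fetch the list of multivariate statistics defined on the relation */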
+ if (true)
+ {
+ List *mvstatoidlist;
+ ListCell *l;
+
+ mvstatoidlist = RelationGetMVStatList(relation);
+
+ foreach(l, mvstatoidlist)
+ {
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ Oid mvoid = lfirst_oid(l);
+ Form_pg_mv_statistic mvstat;
+ MVStatisticInfo *info;
+
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (! HeapTupleIsValid(htup))
+ continue;
+
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ /* unavailable stats are not interesting for the planner */
+ if (mvstat->deps_built)
+ {
+ info = makeNode(MVStatisticInfo);
+
+ info->mvoid = mvoid;
+ info->rel = rel;
+
+ /* enabled statistics */
+ info->deps_enabled = mvstat->deps_enabled;
+
+ /* built/available statistics */
+ info->deps_built = mvstat->deps_built;
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ info->stakeys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+
+ stainfos = lcons(info, stainfos);
+ }
+
+ ReleaseSysCache(htup);
+ }
+
+ list_free(mvstatoidlist);
+ }
+
+ rel->mvstatlist = stainfos;
+
/* Grab foreign-table info using the relcache, while we have it */
if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
{
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b307b48..3be3f02 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -241,7 +241,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
- CreateSchemaStmt CreateSeqStmt CreateStmt CreateTableSpaceStmt
+ CreateSchemaStmt CreateSeqStmt CreateStmt CreateStatsStmt CreateTableSpaceStmt
CreateFdwStmt CreateForeignServerStmt CreateForeignTableStmt
CreateAssertStmt CreateTransformStmt CreateTrigStmt CreateEventTrigStmt
CreateUserStmt CreateUserMappingStmt CreateRoleStmt CreatePolicyStmt
@@ -809,6 +809,7 @@ stmt :
| CreateSchemaStmt
| CreateSeqStmt
| CreateStmt
+ | CreateStatsStmt
| CreateTableSpaceStmt
| CreateTransformStmt
| CreateTrigStmt
@@ -3436,6 +3437,36 @@ OptConsTableSpace: USING INDEX TABLESPACE name { $$ = $4; }
ExistingIndex: USING INDEX index_name { $$ = $3; }
;
+/*****************************************************************************
+ *
+ * QUERY :
+ * CREATE STATISTICS stats_name ON relname (columns) WITH (options)
+ *
+ *****************************************************************************/
+
+
+CreateStatsStmt: CREATE STATISTICS any_name ON qualified_name '(' columnList ')' opt_reloptions
+ {
+ CreateStatsStmt *n = makeNode(CreateStatsStmt);
+ n->defnames = $3;
+ n->relation = $5;
+ n->keys = $7;
+ n->options = $9;
+ n->if_not_exists = false;
+ $$ = (Node *)n;
+ }
+ | CREATE STATISTICS IF_P NOT EXISTS any_name ON qualified_name '(' columnList ')' opt_reloptions
+ {
+ CreateStatsStmt *n = makeNode(CreateStatsStmt);
+ n->defnames = $6;
+ n->relation = $8;
+ n->keys = $10;
+ n->options = $12;
+ n->if_not_exists = true;
+ $$ = (Node *)n;
+ }
+ ;
+
/*****************************************************************************
*
@@ -5621,6 +5652,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH DICTIONARY { $$ = OBJECT_TSDICTIONARY; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
+ | STATISTICS { $$ = OBJECT_STATISTICS; }
;
any_name_list:
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 045f7f0..2ba88e2 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1520,6 +1520,10 @@ ProcessUtilitySlow(Node *parsetree,
address = ExecSecLabelStmt((SecLabelStmt *) parsetree);
break;
+ case T_CreateStatsStmt: /* CREATE STATISTICS */
+ address = CreateStatistics((CreateStatsStmt *) parsetree);
+ break;
+
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(parsetree));
@@ -2160,6 +2164,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_TRANSFORM:
tag = "DROP TRANSFORM";
break;
+ case OBJECT_STATISTICS:
+ tag = "DROP STATISTICS";
+ break;
default:
tag = "???";
}
@@ -2527,6 +2534,10 @@ CreateCommandTag(Node *parsetree)
tag = "EXECUTE";
break;
+ case T_CreateStatsStmt:
+ tag = "CREATE STATISTICS";
+ break;
+
case T_DeallocateStmt:
{
DeallocateStmt *stmt = (DeallocateStmt *) parsetree;
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..eba0352 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,7 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr mvstats resowner sort time
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 130c06d..3bc4c8a 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -47,6 +47,7 @@
#include "catalog/pg_auth_members.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_database.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_proc.h"
@@ -3956,6 +3957,62 @@ RelationGetIndexList(Relation relation)
return result;
}
+
+List *
+RelationGetMVStatList(Relation relation)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result;
+ List *oldlist;
+ MemoryContext oldcxt;
+
+ /* Quick exit if we already computed the list. */
+ if (relation->rd_mvstatvalid)
+ return list_copy(relation->rd_mvstatlist);
+
+ /*
+ * We build the list we intend to return (in the caller's context) while
+ * doing the scan. After successfully completing the scan, we copy that
+ * list into the relcache entry. This avoids cache-context memory leakage
+ * if we get some sort of error partway through.
+ */
+ result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(RelationGetRelid(relation)));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ /* TODO maybe include only already built statistics? */
+ result = insert_ordered_oid(result, HeapTupleGetOid(htup));
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* Now save a copy of the completed list in the relcache entry. */
+ oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+ oldlist = relation->rd_mvstatlist;
+ relation->rd_mvstatlist = list_copy(result);
+
+ relation->rd_mvstatvalid = true;
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Don't leak the old list, if there is one */
+ list_free(oldlist);
+
+ return result;
+}
+
/*
* insert_ordered_oid
* Insert a new Oid into a sorted list of Oids, preserving ordering
@@ -4920,6 +4977,8 @@ load_relcache_init_file(bool shared)
rel->rd_indexattr = NULL;
rel->rd_keyattr = NULL;
rel->rd_idattr = NULL;
+ rel->rd_mvstatvalid = false;
+ rel->rd_mvstatlist = NIL;
rel->rd_createSubid = InvalidSubTransactionId;
rel->rd_newRelfilenodeSubid = InvalidSubTransactionId;
rel->rd_amcache = NULL;
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 65ffe84..3c1bc4b 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -44,6 +44,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_language.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -502,6 +503,28 @@ static const struct cachedesc cacheinfo[] = {
},
4
},
+ {MvStatisticRelationId, /* MVSTATNAMENSP */
+ MvStatisticNameIndexId,
+ 2,
+ {
+ Anum_pg_mv_statistic_staname,
+ Anum_pg_mv_statistic_stanamespace,
+ 0,
+ 0
+ },
+ 4
+ },
+ {MvStatisticRelationId, /* MVSTATOID */
+ MvStatisticOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 4
+ },
{NamespaceRelationId, /* NAMESPACENAME */
NamespaceNameIndexId,
1,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
new file mode 100644
index 0000000..099f1ed
--- /dev/null
+++ b/src/backend/utils/mvstats/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/mvstats
+#
+# IDENTIFICATION
+# src/backend/utils/mvstats/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/mvstats
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = common.o dependencies.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.dependencies b/src/backend/utils/mvstats/README.dependencies
new file mode 100644
index 0000000..1f96fbc
--- /dev/null
+++ b/src/backend/utils/mvstats/README.dependencies
@@ -0,0 +1,222 @@
+Soft functional dependencies
+============================
+
+A type of multivariate statistics used to capture cases when one column (or
+possibly a combination of columns) determines values in another column. We may
+also say that one column implies the other one.
+
+A simple artificial example may be a table with two columns, created like this
+
+ CREATE TABLE t (a INT, b INT)
+ AS SELECT i, i/10 FROM generate_series(1,100000) s(i);
+
+Clearly, once we know the value for column 'a' the value for 'b' is trivially
+determined, as it's simply (a/10). A more practical example may be addresses,
+where (ZIP code -> city name), i.e. once we know the ZIP, we probably know the
+city it belongs to, as ZIP codes are usually assigned to one city. Larger cities
+may have multiple ZIP codes, so the dependency can't be reversed.
+
+Functional dependencies are a concept well described in relational theory,
+particularly in definition of normalization and "normal forms". Wikipedia has a
+nice definition of a functional dependency [1]:
+
+ In a given table, an attribute Y is said to have a functional dependency on
+ a set of attributes X (written X -> Y) if and only if each X value is
+ associated with precisely one Y value. For example, in an "Employee" table
+ that includes the attributes "Employee ID" and "Employee Date of Birth", the
+ functional dependency {Employee ID} -> {Employee Date of Birth} would hold.
+ It follows from the previous two sentences that each {Employee ID} is
+ associated with precisely one {Employee Date of Birth}.
+
+ [1] http://en.wikipedia.org/wiki/Database_normalization
+
+Many datasets might be normalized not to contain such dependencies, but often
+it's not practical for various reasons. In some cases it's actually a conscious
+design choice to model the dataset in denormalized way, either because of
+performance or to make querying easier.
+
+The functional dependencies are called 'soft' because the implementation is
+meant to allow a small number of rows contradicting the dependency. Many actual
+data sets contain some sort of errors, either because of data entry mistakes
+(user mistyping the ZIP code) or issues in generating the data (e.g. a ZIP code
+mistakenly assigned to two cities in different states). A strict implementation
+would ignore dependencies on such noisy data, rendering the approach unusable on
+such data sets.
+
+
+Mining dependencies (ANALYZE)
+-----------------------------
+
+The current build algorithm is rather simple - for each pair (a,b) of columns,
+the data are sorted lexicographically (first by 'a', then by 'b'). Then for each
+group (rows with the same 'a' value) we decide whether the group is neutral,
+supporting or contradicting the dependency (a->b).
+
+A group is considered neutral when it's too small - e.g. when there's a single
+row in the group, there can't possibly be multiple values in 'b'. For this
+reason we ignore groups smaller than a threshold (currently 3 rows).
+
+For sufficiently large groups (3 rows or more), we count the number of distinct
+values in 'b'. When there's a single 'b' value, the group is considered to
+support the dependency (a->b), otherwise it's considered to contradict it.
+
+At the end, we compare the number of rows in supporting and contradicting groups,
+and if there are at least 10x as many supporting rows, we consider the
+functional dependency to be valid.
+
+
+A negative property of this approach is that the algorithm is a bit fragile
+with respect to the sample - there may be data sets producing quite different
+results for each ANALYZE execution (as even a single row may change the
+outcome of the final 10x test).
+
+It was proposed to make the dependencies "fuzzy" - e.g. track some coefficient
+between [0,1] determining how much the dependency holds. That would however mean
+we have to keep all the dependencies, as eliminating them based on the value of
+the coefficient (e.g. throw away dependencies <= 0.5) would result in exactly
+the same fragility issues. This would also make it more complicated to combine
+dependencies. So this does not seem like a practical approach.
+
+A better approach might be to replace the constants (min_group_size=3 and 10x)
+with values somehow related to the particular data set.
+
+
+Clause reduction (planner/optimizer)
+------------------------------------
+
+Applying the functional dependencies is quite simple - given a list of equality
+clauses, check which clauses are redundant (i.e. implied by some other clause).
+For example, given the clause list
+
+ (a = 1) AND (b = 2) AND (c = 3)
+
+and the dependency (a->b), the list of clauses may be simplified to
+
+ (a = 1) AND (c = 3)
+
+Functional dependencies may only be applied to equality clauses, all other types
+of clauses are ignored. See clauselist_apply_dependencies() for more details.
+
+
+Compatibility of clauses
+------------------------
+
+The reduction assumes the clauses really are redundant, i.e. that the value in
+the reduced clause (b=2) is the value determined by (a=1). If that's not the
+case and the values are "incompatible", the result will be an over-estimation.
+
+This may happen for example when using conditions on ZIP and city name with
+mismatching values (ZIP for a different city), etc. In such a case the result
+set will be empty, but we'll estimate the selectivity using the ZIP condition.
+
+In this case the default estimation, based on the attribute value independence
+assumption, happens to work better, but mostly by chance.
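+
+For example (with made-up values), a query like
+
+ SELECT * FROM addresses WHERE zip = '12345' AND city = 'Wrong City';
+
+returns no rows when the ZIP actually belongs to a different city, yet after
+the reduction it gets estimated as if only the ZIP condition was present.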
+
+
+Dependencies vs. MCV/histogram
+------------------------------
+
+In some cases the "compatibility" of the conditions might be verified using the
+other types of multivariate stats - MCV lists and histograms.
+
+For MCV lists the verification might be very simple - peek into the list to see
+whether there are any items matching the clause on the 'a' column (e.g. ZIP
+code), and if such an item is found, check that the 'b' column matches the
+other clause. If it does not, the clauses are contradictory. If no such item is
+found, we can't really conclude anything, except maybe restricting the
+selectivity using the MCV data (e.g. using min/max selectivity, or something).
+
+With histograms, it might work similarly - we can't check the values directly
+(because histograms use buckets, unlike MCV lists, which store the actual values).
+So we can only observe the buckets matching the clauses - if those buckets have
+very low frequency, it probably means the two clauses are incompatible.
+
+It's unclear what 'low frequency' is, but if one of the clauses is implied
+(automatically true because of the other clause), then
+
+ selectivity[clause(A)] = selectivity[clause(A) & clause(B)]
+
+So we might compute selectivity of the first clause - for example using regular
+statistics. And then check if the selectivity computed from the histogram is
+about the same (or significantly lower).
+
+The problem is that histograms work well only when the data ordering matches the
+natural meaning. For values that serve as labels - like city names or ZIP codes,
+or even generated IDs, histograms really don't work all that well. For example
+sorting cities by name won't match the sorting of ZIP codes, rendering the
+histogram unusable.
+
+So MCVs are probably going to work much better, because they don't really assume
+any sort of ordering. And it's probably more appropriate for the label-like data.
+
+A good question however is why even use functional dependencies in such cases
+and not simply use the MCV/histogram instead. One reason is that the functional
+dependencies allow fallback to regular stats, and often produce more accurate
+estimates - especially compared to histograms, which are quite bad at
+estimating equality clauses.
+
+
+Limitations
+-----------
+
+Let's look at the main limitations of functional dependencies, especially those
+related to the current implementation.
+
+The current implementation supports only dependencies between two columns, but
+this is merely a simplification of the initial implementation. It's certainly
+useful to mine for dependencies involving multiple columns on the 'left' side,
+i.e. the condition side of the dependency - that is, dependencies like (a,b -> c).
+
+The implementation may/should be smart enough not to mine redundant conditions,
+e.g. (a->b) and (a,c -> b), because the latter is a trivial consequence of the
+former one (if values of 'a' determine 'b', adding another column won't change
+that relationship). The ANALYZE should first analyze 1:1 dependencies, then 2:1
+dependencies (and skip the already identified ones), etc.
+
+For example the dependency
+
+ (city name -> zip code)
+
+is much stronger, i.e. whenever it holds, then
+
+ (city name, state name -> zip code)
+
+holds too. But in case there are cities with the same name in different states,
+then only the latter dependency will be valid.
+
+Of course, there probably are cities with the same name within a single state,
+but hopefully this is a relatively rare occurrence (and thus we'll still detect
+the 'soft' dependency).
+
+Handling multiple columns on the right side of the dependency is not necessary,
+as those dependencies may be simply decomposed into a set of dependencies with
+the same meaning, one for each column on the right side. For example
+
+ (a -> b,c)
+
+is exactly the same as
+
+ (a -> b) & (a -> c)
+
+Of course, storing the first form may be more efficient than storing multiple
+'simple' dependencies separately.
+
+
+TODO Support dependencies with multiple columns on left/right.
+
+TODO Investigate using histogram and MCV list to verify the dependencies.
+
+TODO Investigate statistical testing of the distribution (to decide whether it
+ makes sense to build the histogram/MCV list).
+
+TODO Using a min/max of selectivities would probably make more sense for the
+ associated columns.
+
+TODO Consider eliminating the implied columns from the histogram and MCV lists
+ (but maybe that's not a good idea, because that'd make it impossible to use
+ these stats for non-equality clauses and also it wouldn't be possible to
+ use the stats for verification of the dependencies).
+
+TODO The reduction could probably be extended to also handle IS NULL clauses,
+ assuming we fix the ANALYZE to properly handle NULL values. We however
+ won't be able to reduce IS NOT NULL (unless I'm missing something).
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
new file mode 100644
index 0000000..a755c49
--- /dev/null
+++ b/src/backend/utils/mvstats/common.c
@@ -0,0 +1,356 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.c
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+static List* list_mv_stats(Oid relid);
+
+
+/*
+ * Compute requested multivariate stats, using the rows sampled for the
+ * plain (single-column) stats.
+ *
+ * This fetches a list of stats from pg_mv_statistic, computes the stats
+ * and serializes them back into the catalog (as bytea values).
+ */
+void
+build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats)
+{
+ ListCell *lc;
+ List *mvstats;
+
+ TupleDesc tupdesc = RelationGetDescr(onerel);
+
+ /*
+ * Fetch defined MV groups from pg_mv_statistic, and then compute
+ * the MV statistics (functional dependencies for now).
+ */
+ mvstats = list_mv_stats(RelationGetRelid(onerel));
+
+ foreach (lc, mvstats)
+ {
+ int j;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
+ MVDependencies deps = NULL;
+
+ VacAttrStats **stats = NULL;
+ int numatts = 0;
+
+ /* int2 vector of attnums the stats should be computed on */
+ int2vector * attrs = stat->stakeys;
+
+ /* see how many of the columns are not dropped */
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ numatts += 1;
+
+ /* if there are dropped attributes, build a filtered int2vector */
+ if (numatts != attrs->dim1)
+ {
+ int16 *tmp = palloc0(numatts * sizeof(int16));
+ int attnum = 0;
+
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ tmp[attnum++] = attrs->values[j];
+
+ pfree(attrs);
+ attrs = buildint2vector(tmp, numatts);
+ }
+
+ /* filter only the interesting vacattrstats records */
+ stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /* check allowed number of dimensions */
+ Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Analyze functional dependencies of columns.
+ */
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
+
+ /* store the computed dependencies in the catalog */
+ update_mv_stats(stat->mvoid, deps, attrs);
+ }
+}
+
+/*
+ * Lookup the VacAttrStats info for the selected columns, with indexes
+ * matching the attrs vector (to make it easy to work with when
+ * computing multivariate stats).
+ */
+static VacAttrStats **
+lookup_var_attr_stats(int2vector *attrs, int natts, VacAttrStats **vacattrstats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ VacAttrStats **stats = (VacAttrStats**)palloc0(numattrs * sizeof(VacAttrStats*));
+
+ /* lookup VacAttrStats info for the requested columns (same attnum) */
+ for (i = 0; i < numattrs; i++)
+ {
+ stats[i] = NULL;
+ for (j = 0; j < natts; j++)
+ {
+ if (attrs->values[i] == vacattrstats[j]->tupattnum)
+ {
+ stats[i] = vacattrstats[j];
+ break;
+ }
+ }
+
+ /*
+ * Check that we found the info, that the attnum matches, and
+ * that the requested 'lt' operator exists.
+ */
+ Assert(stats[i] != NULL);
+ Assert(stats[i]->tupattnum == attrs->values[i]);
+
+ /* FIXME This is rather ugly way to check for 'ltopr' (which
+ * is defined for 'scalar' attributes).
+ */
+ Assert(((StdAnalyzeData *)stats[i]->extra_data)->ltopr != InvalidOid);
+ }
+
+ return stats;
+}
+
+/*
+ * Fetch list of MV stats defined on a table, without the actual data
+ * for histograms, MCV lists etc.
+ */
+static List*
+list_mv_stats(Oid relid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having indrelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ Form_pg_mv_statistic stats = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ info->mvoid = HeapTupleGetOid(htup);
+ info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_built = stats->deps_built;
+
+ result = lappend(result, info);
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO maybe save the list into the relcache, as in RelationGetIndexList
+ * (which served as an inspiration for this one)? */
+
+ return result;
+}
+
+void
+update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+{
+ HeapTuple stup,
+ oldtup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ bool replaces[Natts_pg_mv_statistic];
+
+ Relation sd = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ memset(nulls, 1, Natts_pg_mv_statistic * sizeof(bool));
+ memset(replaces, 0, Natts_pg_mv_statistic * sizeof(bool));
+ memset(values, 0, Natts_pg_mv_statistic * sizeof(Datum));
+
+ /*
+ * Construct a new pg_mv_statistic tuple - replace only the dependencies
+ * value, depending on whether it actually was computed.
+ */
+ if (dependencies != NULL)
+ {
+ nulls[Anum_pg_mv_statistic_stadeps -1] = false;
+ values[Anum_pg_mv_statistic_stadeps - 1]
+ = PointerGetDatum(serialize_mv_dependencies(dependencies));
+ }
+
+ /* always replace the value (either by bytea or NULL) */
+ replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* always change the availability flags */
+ nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_stakeys-1] = false;
+
+ /* use the new attnums, in case we removed some dropped ones */
+ replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_stakeys -1] = true;
+
+ values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
+
+ /* Is there already a pg_mv_statistic tuple for these statistics? */
+ oldtup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ if (HeapTupleIsValid(oldtup))
+ {
+ /* Yes, replace it */
+ stup = heap_modify_tuple(oldtup,
+ RelationGetDescr(sd),
+ values,
+ nulls,
+ replaces);
+ ReleaseSysCache(oldtup);
+ simple_heap_update(sd, &stup->t_self, stup);
+ }
+ else
+ elog(ERROR, "invalid pg_mv_statistic record (oid=%d)", mvoid);
+
+ /* update indexes too */
+ CatalogUpdateIndexes(sd, stup);
+
+ heap_freetuple(stup);
+
+ heap_close(sd, RowExclusiveLock);
+}
+
+/* multi-variate stats comparator */
+
+/*
+ * qsort_arg comparator for sorting Datums (MV stats)
+ *
+ * This does not maintain the tupnoLink array.
+ */
+int
+compare_scalars_simple(const void *a, const void *b, void *arg)
+{
+ Datum da = *(Datum*)a;
+ Datum db = *(Datum*)b;
+ SortSupport ssup = (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting data when partitioning a MV bucket
+ */
+int
+compare_scalars_partition(const void *a, const void *b, void *arg)
+{
+ Datum da = ((ScalarItem*)a)->value;
+ Datum db = ((ScalarItem*)b)->value;
+ SortSupport ssup = (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/* initialize multi-dimensional sort */
+MultiSortSupport
+multi_sort_init(int ndims)
+{
+ MultiSortSupport mss;
+
+ Assert(ndims >= 2);
+
+ mss = (MultiSortSupport)palloc0(offsetof(MultiSortSupportData, ssup)
+ + sizeof(SortSupportData)*ndims);
+
+ mss->ndims = ndims;
+
+ return mss;
+}
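+
+/*
+ * Typical usage (see build_mv_dependencies):
+ *
+ * mss = multi_sort_init(2);
+ * multi_sort_add_dimension(mss, 0, dima, stats);
+ * multi_sort_add_dimension(mss, 1, dimb, stats);
+ * qsort_arg(items, numrows, sizeof(SortItem), multi_sort_compare, mss);
+ */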
+
+/*
+ * add sort info for dimension 'dim' (index into vacattrstats) to mss,
+ * at position 'sortdim'
+ */
+void
+multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats)
+{
+ /* first, lookup StdAnalyzeData for the dimension (attribute) */
+ SortSupportData ssup;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)vacattrstats[dim]->extra_data;
+
+ Assert(mss != NULL);
+ Assert(sortdim < mss->ndims);
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup);
+
+ mss->ssup[sortdim] = ssup;
+}
+
+/* compare all the dimensions in the selected order */
+int
+multi_sort_compare(const void *a, const void *b, void *arg)
+{
+ int i;
+ SortItem *ia = (SortItem*)a;
+ SortItem *ib = (SortItem*)b;
+
+ MultiSortSupport mss = (MultiSortSupport)arg;
+
+ for (i = 0; i < mss->ndims; i++)
+ {
+ int compare;
+
+ compare = ApplySortComparator(ia->values[i], ia->isnull[i],
+ ib->values[i], ib->isnull[i],
+ &mss->ssup[i]);
+
+ if (compare != 0)
+ return compare;
+
+ }
+
+ /* equal by default */
+ return 0;
+}
+
+/* compare selected dimension */
+int
+multi_sort_compare_dim(int dim, const SortItem *a, const SortItem *b,
+ MultiSortSupport mss)
+{
+ return ApplySortComparator(a->values[dim], a->isnull[dim],
+ b->values[dim], b->isnull[dim],
+ &mss->ssup[dim]);
+}
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
new file mode 100644
index 0000000..6d5465b
--- /dev/null
+++ b/src/backend/utils/mvstats/common.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.h
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/tuptoaster.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_mv_statistic.h"
+#include "foreign/fdwapi.h"
+#include "postmaster/autovacuum.h"
+#include "storage/lmgr.h"
+#include "utils/datum.h"
+#include "utils/sortsupport.h"
+#include "utils/syscache.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "access/sysattr.h"
+
+#include "utils/mvstats.h"
+
+/* FIXME private structure copied from analyze.c */
+
+typedef struct
+{
+ Oid eqopr; /* '=' operator for datatype, if any */
+ Oid eqfunc; /* and associated function */
+ Oid ltopr; /* '<' operator for datatype, if any */
+} StdAnalyzeData;
+
+typedef struct
+{
+ Datum value; /* a data value */
+ int tupno; /* position index for tuple it came from */
+} ScalarItem;
+
+/* multi-sort */
+typedef struct MultiSortSupportData {
+ int ndims; /* number of dimensions supported by the sort */
+ SortSupportData ssup[1]; /* sort support data for each dimension */
+} MultiSortSupportData;
+
+typedef MultiSortSupportData* MultiSortSupport;
+
+typedef struct SortItem {
+ Datum *values;
+ bool *isnull;
+} SortItem;
+
+MultiSortSupport multi_sort_init(int ndims);
+
+void multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats);
+
+int multi_sort_compare(const void *a, const void *b, void *arg);
+
+int multi_sort_compare_dim(int dim, const SortItem *a,
+ const SortItem *b, MultiSortSupport mss);
+
+/* comparators, used when constructing multivariate stats */
+int compare_scalars_simple(const void *a, const void *b, void *arg);
+int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
new file mode 100644
index 0000000..2a064a0
--- /dev/null
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -0,0 +1,437 @@
+/*-------------------------------------------------------------------------
+ *
+ * dependencies.c
+ * POSTGRES multivariate functional dependencies
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/dependencies.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Detect functional dependencies between columns.
+ *
+ * TODO This builds a complete set of dependencies, i.e. including transitive
+ * dependencies - if we identify [A => B] and [B => C], we're likely to
+ * identify [A => C] too. It might be better to keep only the minimal set
+ * of dependencies, i.e. prune all the dependencies that we can recreate
+ * by transitivity.
+ *
+ * There are two conceptual ways to do that:
+ *
+ * (a) generate all the rules, and then prune the rules that may be
+ * recreated by combining other dependencies, or
+ *
+ * (b) perform the 'is a combination of other dependencies' check before
+ * actually doing the work
+ *
+ * The second option has the advantage that we don't really need to perform
+ * the sort/count. It's not sufficient alone, though, because we may
+ * discover the dependencies in the wrong order. For example we may find
+ *
+ * (a -> b), (a -> c) and then (b -> c)
+ *
+ * None of those dependencies is a combination of the already known ones,
+ * yet (a -> c) is a combination of (a -> b) and (b -> c).
+ *
+ *
+ * FIXME Currently we simply replace NULL values with 0 and then handle them
+ * as regular values, but that groups NULLs with actual 0 values. That's
+ * clearly incorrect - we need to handle NULL values as a separate value.
+ */
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ /* result */
+ int ndeps = 0;
+ MVDependencies dependencies = NULL;
+ MultiSortSupport mss = multi_sort_init(2); /* 2 dimensions for now */
+
+ /* TODO Maybe this should be somehow related to the number of
+ * distinct values in the two columns we're currently analyzing.
+ * Assuming the distribution is uniform, we can estimate the
+ * average group size and use it as a threshold. Or something
+ * like that. Seems better than a static approach.
+ */
+ int min_group_size = 3;
+
+ /* dimension indexes we'll check for associations [a => b] */
+ int dima, dimb;
+
+ /*
+ * We'll reuse the same array for all the 2-column combinations.
+ *
+ * It's possible to sort the sample rows directly, but this seemed
+ * somewhat simpler / less error-prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * 2);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * 2);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * 2];
+ items[i].isnull = &isnull[i * 2];
+ }
+
+ Assert(numattrs >= 2);
+
+ /*
+ * Evaluate all possible combinations of [A => B], using a simple algorithm:
+ *
+ * (a) sort the data by [A,B]
+ * (b) split the data into groups by A (new group whenever a value changes)
+ * (c) count different values in the B column (again, value changes)
+ *
+ * TODO It should be rather simple to merge [A => B] and [A => C] into
+ * [A => B,C]. Just keep A constant, collect all the "implied" columns
+ * and you're done.
+ */
+ for (dima = 0; dima < numattrs; dima++)
+ {
+ /* prepare the sort function for the first dimension */
+ multi_sort_add_dimension(mss, 0, dima, stats);
+
+ for (dimb = 0; dimb < numattrs; dimb++)
+ {
+ SortItem current;
+
+ /* number of groups supporting / contradicting the dependency */
+ int n_supporting = 0;
+ int n_contradicting = 0;
+
+ /* counters valid within a group */
+ int group_size = 0;
+ int n_violations = 0;
+
+ int n_supporting_rows = 0;
+ int n_contradicting_rows = 0;
+
+ /* make sure the columns are different (skip the trivial A => A case) */
+ if (dima == dimb)
+ continue;
+
+ /* prepare the sort function for the second dimension */
+ multi_sort_add_dimension(mss, 1, dimb, stats);
+
+ /* reset the values and isnull flags */
+ memset(values, 0, sizeof(Datum) * numrows * 2);
+ memset(isnull, 0, sizeof(bool) * numrows * 2);
+
+ /* accumulate all the data for both columns into an array and sort it */
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values[0]
+ = heap_getattr(rows[i], attrs->values[dima],
+ stats[dima]->tupDesc, &items[i].isnull[0]);
+
+ items[i].values[1]
+ = heap_getattr(rows[i], attrs->values[dimb],
+ stats[dimb]->tupDesc, &items[i].isnull[1]);
+ }
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Walk through the array, split it into groups according to
+ * the A value, and count distinct values in the B column.
+ * If there's a single B value for the whole group, we count
+ * it as supporting the association, otherwise we count it
+ * as contradicting.
+ *
+ * Furthermore we require a group to have at least a certain
+ * number of rows to be considered useful for supporting the
+ * dependency. A contradicting group however always counts.
+ */
+
+ /* start with values from the first row */
+ current = items[0];
+ group_size = 1;
+
+ for (i = 1; i < numrows; i++)
+ {
+ /* end of the group */
+ if (multi_sort_compare_dim(0, &items[i], ¤t, mss) != 0)
+ {
+ /*
+ * If there are no contradicting rows, count it as
+ * supporting (otherwise contradicting), but only if
+ * the group is large enough.
+ *
+ * The requirement of a minimum group size makes it
+ * impossible to identify [unique,unique] cases, but
+ * that's probably a different case. This is more
+ * about [zip => city] associations etc.
+ *
+ * If there are violations, count the group/rows as
+ * a violation.
+ *
+ * It may be neither, if the group is too small (does
+ * not contain at least min_group_size rows).
+ */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations > 0)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /* current values start a new group */
+ n_violations = 0;
+ group_size = 0;
+ }
+ /* mismatch of a B value is contradicting */
+ else if (multi_sort_compare_dim(1, &items[i], ¤t, mss) != 0)
+ {
+ n_violations += 1;
+ }
+
+ current = items[i];
+ group_size += 1;
+ }
+
+ /* handle the last group (just like above) */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /*
+ * See if the number of rows supporting the association is at least
+ * 10x the number of rows violating the hypothetical dependency.
+ *
+ * TODO This is a rather arbitrary limit - I guess it's possible to do
+ * some math to come up with a better rule (e.g. testing a hypothesis
+ * 'this is due to randomness'). We can create a contingency table
+ * from the values and use it for testing. Possibly only when
+ * there are no contradicting rows?
+ *
+ * TODO Also, if (a => b) and (b => a) at the same time, it pretty much
+ * means there's a 1:1 relation (or one is a 'label'), making the
+ * conditions rather redundant. Although it's possible that the
+ * query uses incompatible combination of values.
+ */
+ if (n_supporting_rows > (n_contradicting_rows * 10))
+ {
+ if (dependencies == NULL)
+ {
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+ dependencies->magic = MVSTAT_DEPS_MAGIC;
+ }
+ else
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData, deps)
+ + sizeof(MVDependency) * (dependencies->ndeps + 1));
+
+ /* add the new dependency to the list */
+ dependencies->deps[ndeps] = (MVDependency)palloc0(sizeof(MVDependencyData));
+ dependencies->deps[ndeps]->a = attrs->values[dima];
+ dependencies->deps[ndeps]->b = attrs->values[dimb];
+
+ dependencies->ndeps = (++ndeps);
+ }
+ }
+ }
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+ pfree(stats);
+ pfree(mss);
+
+ return dependencies;
+}
+
+/*
+ * Store the dependencies into a bytea, so that it can be stored in the
+ * pg_mv_statistic catalog.
+ *
+ * Currently this only supports simple two-column rules, and stores them
+ * as a sequence of attnum pairs. In the future, this needs to be made
+ * more complex to support multiple columns on both sides of the
+ * implication (using AND on left, OR on right).
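+ *
+ * The serialized layout is simply the struct header (magic number and
+ * ndeps), followed by ndeps pairs of int16 attnums: [a1,b1][a2,b2]...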
+ */
+bytea *
+serialize_mv_dependencies(MVDependencies dependencies)
+{
+ int i;
+
+ /* we need to store the header, plus 2 * int16 per dependency */
+ Size len = VARHDRSZ + offsetof(MVDependenciesData, deps)
+ + dependencies->ndeps * (sizeof(int16) * 2);
+
+ bytea * output = (bytea*)palloc0(len);
+
+ char * tmp = VARDATA(output);
+
+ SET_VARSIZE(output, len);
+
+ /* first, store the header (magic number and number of dependencies) */
+ memcpy(tmp, dependencies, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ /* walk through the dependencies and copy both columns into the bytea */
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ memcpy(tmp, &(dependencies->deps[i]->a), sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(tmp, &(dependencies->deps[i]->b), sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return output;
+}
+
+/*
+ * Reads serialized dependencies into MVDependencies structure.
+ */
+MVDependencies
+deserialize_mv_dependencies(bytea * data)
+{
+ int i;
+ Size expected_size;
+ MVDependencies dependencies;
+ char *tmp;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVDependenciesData,deps))
+ elog(ERROR, "invalid MVDependencies size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVDependenciesData,deps));
+
+ /* read the MVDependencies header */
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(dependencies, tmp, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ if (dependencies->magic != MVSTAT_DEPS_MAGIC)
+ {
+ pfree(dependencies);
+ elog(WARNING, "not a MV Dependencies (magic number mismatch)");
+ return NULL;
+ }
+
+ Assert(dependencies->ndeps > 0);
+
+ /* what bytea size do we expect for those parameters */
+ expected_size = offsetof(MVDependenciesData,deps) +
+ dependencies->ndeps * sizeof(int16) * 2;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid dependencies size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* allocate space for the dependencies */
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData,deps)
+ + (dependencies->ndeps * sizeof(MVDependency)));
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ dependencies->deps[i] = (MVDependency)palloc0(sizeof(MVDependencyData));
+
+ memcpy(&(dependencies->deps[i]->a), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(&(dependencies->deps[i]->b), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return dependencies;
+}
+
+/* print some basic info about dependencies (number of dependencies) */
+Datum
+pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ result = palloc0(128);
+ snprintf(result, 128, "dependencies=%d", dependencies->ndeps);
+
+ /* FIXME free the deserialized data (pfree is not enough) */
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* print the dependencies
+ *
+ * TODO Would be nice if this knew the actual column names (instead of
+ * the attnums).
+ *
+ * FIXME This is really ugly and does not really check the lengths and
+ * strcpy/snprintf return values properly. Needs to be fixed.
+ */
+Datum
+pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
+{
+ int i = 0;
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result = NULL;
+ int len = 0;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ MVDependency dependency = dependencies->deps[i];
+ char buffer[128];
+
+ int tmp = snprintf(buffer, 128, "%s%d => %d",
+ ((i == 0) ? "" : ", "), dependency->a, dependency->b);
+
+ if (tmp < 127)
+ {
+ if (result == NULL)
+ result = palloc0(len + tmp + 1);
+ else
+ result = repalloc(result, len + tmp + 1);
+
+ strcpy(result + len, buffer);
+ len += tmp;
+ }
+ }
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index fd8dc91..4f106c3 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2104,6 +2104,50 @@ describeOneTableDetails(const char *schemaname,
PQclear(result);
}
+ /* print any multivariate statistics */
+ if (pset.sversion >= 90500)
+ {
+ printfPQExpBuffer(&buf,
+ "SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
+ " deps_enabled,\n"
+ " deps_built,\n"
+ " (SELECT string_agg(attname::text,', ')\n"
+ " FROM ((SELECT unnest(stakeys) AS attnum) s\n"
+ " JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
+ "FROM pg_mv_statistic stat WHERE starelid = '%s' ORDER BY 1;",
+ oid);
+
+ result = PSQLexec(buf.data);
+ if (!result)
+ goto error_return;
+ else
+ tuples = PQntuples(result);
+
+ if (tuples > 0)
+ {
+ printTableAddFooter(&cont, _("Statistics:"));
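+
+ /*
+ * Each statistics footer line ends up looking like (illustrative):
+ * "public.s1" (dependencies) ON (a, b)
+ */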
+ for (i = 0; i < tuples; i++)
+ {
+ printfPQExpBuffer(&buf, " ");
+
+ /* statistics name (qualified with namespace) */
+ appendPQExpBuffer(&buf, "\"%s.%s\" ",
+ PQgetvalue(result, i, 1),
+ PQgetvalue(result, i, 2));
+
+ /* options */
+ if (!strcmp(PQgetvalue(result, i, 4), "t"))
+ appendPQExpBuffer(&buf, "(dependencies)");
+
+ appendPQExpBuffer(&buf, " ON (%s)",
+ PQgetvalue(result, i, 6));
+
+ printTableAddFooter(&cont, buf.data);
+ }
+ }
+ PQclear(result);
+ }
+
/* print rules */
if (tableinfo.hasrules && tableinfo.relkind != 'm')
{
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 049bf9f..12211fe 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -153,10 +153,11 @@ typedef enum ObjectClass
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
OCLASS_POLICY, /* pg_policy */
- OCLASS_TRANSFORM /* pg_transform */
+ OCLASS_TRANSFORM, /* pg_transform */
+ OCLASS_STATISTICS /* pg_mv_statistics */
} ObjectClass;
-#define LAST_OCLASS OCLASS_TRANSFORM
+#define LAST_OCLASS OCLASS_STATISTICS
/* in dependency.c */
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index b80d8d8..5ae42f7 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -119,6 +119,7 @@ extern void RemoveAttrDefault(Oid relid, AttrNumber attnum,
DropBehavior behavior, bool complain, bool internal);
extern void RemoveAttrDefaultById(Oid attrdefId);
extern void RemoveStatistics(Oid relid, AttrNumber attnum);
+extern void RemoveMVStatistics(Oid relid, AttrNumber attnum);
extern Form_pg_attribute SystemAttributeDefinition(AttrNumber attno,
bool relhasoids);
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index ab2c1a8..a768bb5 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -173,6 +173,13 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
#define LargeObjectMetadataOidIndexId 2996
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_oid_index, 3380, on pg_mv_statistic using btree(oid oid_ops));
+#define MvStatisticOidIndexId 3380
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_name_index, 3997, on pg_mv_statistic using btree(staname name_ops, stanamespace oid_ops));
+#define MvStatisticNameIndexId 3997
+DECLARE_INDEX(pg_mv_statistic_relid_index, 3379, on pg_mv_statistic using btree(starelid oid_ops));
+#define MvStatisticRelidIndexId 3379
+
DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
#define NamespaceNameIndexId 2684
DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index 2ccb3a7..44cf9c6 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -137,6 +137,8 @@ extern Oid get_collation_oid(List *collname, bool missing_ok);
extern Oid get_conversion_oid(List *conname, bool missing_ok);
extern Oid FindDefaultConversionProc(int32 for_encoding, int32 to_encoding);
+extern Oid get_statistics_oid(List *names, bool missing_ok);
+
/* initialization & transaction cleanup code */
extern void InitializeSearchPath(void);
extern void AtEOXact_Namespace(bool isCommit, bool parallel);
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
new file mode 100644
index 0000000..a568a07
--- /dev/null
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -0,0 +1,73 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_mv_statistic.h
+ * definition of the system "multivariate statistic" relation (pg_mv_statistic)
+ * along with the relation's initial contents.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_mv_statistic.h
+ *
+ * NOTES
+ * the genbki.pl script reads this file and generates .bki
+ * information from the DATA() statements.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_MV_STATISTIC_H
+#define PG_MV_STATISTIC_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_mv_statistic definition. cpp turns this into
+ * typedef struct FormData_pg_mv_statistic
+ * ----------------
+ */
+#define MvStatisticRelationId 3381
+
+CATALOG(pg_mv_statistic,3381)
+{
+ /* These fields form the unique key for the entry: */
+ Oid starelid; /* relation containing attributes */
+ NameData staname; /* statistics name */
+ Oid stanamespace; /* OID of namespace containing this statistics */
+
+ /* statistics requested to build */
+ bool deps_enabled; /* analyze dependencies? */
+
+ /* statistics that are available (if requested) */
+ bool deps_built; /* dependencies were built */
+
+ /* variable-length fields start here, but we allow direct access to stakeys */
+ int2vector stakeys; /* array of column keys */
+
+#ifdef CATALOG_VARLEN
+ bytea stadeps; /* dependencies (serialized) */
+#endif
+
+} FormData_pg_mv_statistic;
+
+/* ----------------
+ * Form_pg_mv_statistic corresponds to a pointer to a tuple with
+ * the format of pg_mv_statistic relation.
+ * ----------------
+ */
+typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
+
+/* ----------------
+ * compiler constants for pg_mv_statistic
+ * ----------------
+ */
+#define Natts_pg_mv_statistic 7
+#define Anum_pg_mv_statistic_starelid 1
+#define Anum_pg_mv_statistic_staname 2
+#define Anum_pg_mv_statistic_stanamespace 3
+#define Anum_pg_mv_statistic_deps_enabled 4
+#define Anum_pg_mv_statistic_deps_built 5
+#define Anum_pg_mv_statistic_stakeys 6
+#define Anum_pg_mv_statistic_stadeps 7
+
+#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 62b9125..20d565c 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2666,6 +2666,11 @@ DESCR("current user privilege on any column by rel name");
DATA(insert OID = 3029 ( has_any_column_privilege PGNSP PGUID 12 10 0 0 0 f f f f t f s s 2 0 16 "26 25" _null_ _null_ _null_ _null_ _null_ has_any_column_privilege_id _null_ _null_ _null_ ));
DESCR("current user privilege on any column by rel oid");
+DATA(insert OID = 3998 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies info");
+DATA(insert OID = 3999 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies show");
+
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
DATA(insert OID = 1929 ( pg_stat_get_tuples_returned PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_tuples_returned _null_ _null_ _null_ ));
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index b7a38ce..a52096b 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -49,6 +49,7 @@ extern void BootstrapToastTable(char *relName,
DECLARE_TOAST(pg_attrdef, 2830, 2831);
DECLARE_TOAST(pg_constraint, 2832, 2833);
DECLARE_TOAST(pg_description, 2834, 2835);
+DECLARE_TOAST(pg_mv_statistic, 3577, 3578);
DECLARE_TOAST(pg_proc, 2836, 2837);
DECLARE_TOAST(pg_rewrite, 2838, 2839);
DECLARE_TOAST(pg_seclabel, 3598, 3599);
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 54f67e9..99a6a62 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -75,6 +75,10 @@ extern ObjectAddress DefineOperator(List *names, List *parameters);
extern void RemoveOperatorById(Oid operOid);
extern ObjectAddress AlterOperator(AlterOperatorStmt *stmt);
+/* commands/statscmds.c */
+extern ObjectAddress CreateStatistics(CreateStatsStmt *stmt);
+extern void RemoveStatisticsById(Oid statsOid);
+
/* commands/aggregatecmds.c */
extern ObjectAddress DefineAggregate(List *name, List *args, bool oldstyle,
List *parameters, const char *queryString);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index c407fa2..2226aad 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -251,6 +251,7 @@ typedef enum NodeTag
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
+ T_MVStatisticInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
@@ -386,6 +387,7 @@ typedef enum NodeTag
T_CreatePolicyStmt,
T_AlterPolicyStmt,
T_CreateTransformStmt,
+ T_CreateStatsStmt,
/*
* TAGS FOR PARSE TREE NODES (parsenodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2fd0629..e1807fb 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -601,6 +601,17 @@ typedef struct ColumnDef
int location; /* parse location, or -1 if none/unknown */
} ColumnDef;
+typedef struct CreateStatsStmt
+{
+ NodeTag type;
+ List *defnames; /* qualified name (list of Value strings) */
+ RangeVar *relation; /* relation to build statistics on */
+ List *keys; /* String nodes naming referenced column(s) */
+ List *options; /* list of DefElem nodes */
+ bool if_not_exists; /* do nothing if statistics already exist? */
+} CreateStatsStmt;
+
+
/*
* TableLikeClause - CREATE TABLE ( ... LIKE ... ) clause
*/
@@ -1410,6 +1421,7 @@ typedef enum ObjectType
OBJECT_RULE,
OBJECT_SCHEMA,
OBJECT_SEQUENCE,
+ OBJECT_STATISTICS,
OBJECT_TABCONSTRAINT,
OBJECT_TABLE,
OBJECT_TABLESPACE,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index af8cb6b..de86d01 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -503,6 +503,7 @@ typedef struct RelOptInfo
List *lateral_vars; /* LATERAL Vars and PHVs referenced by rel */
Relids lateral_referencers; /* rels that reference me laterally */
List *indexlist; /* list of IndexOptInfo */
+ List *mvstatlist; /* list of MVStatisticInfo */
BlockNumber pages; /* size estimates derived from pg_class */
double tuples;
double allvisfrac;
@@ -600,6 +601,33 @@ typedef struct IndexOptInfo
void (*amcostestimate) (); /* AM's cost estimator */
} IndexOptInfo;
+/*
+ * MVStatisticInfo
+ * Information about multivariate stats for planning/optimization
+ *
+ * This contains information about which columns are covered by the
+ * statistics (stakeys), which options were requested while adding the
+ * statistics (*_enabled), and which kinds of statistics were actually
+ * built and are available for the optimizer (*_built).
+ */
+typedef struct MVStatisticInfo
+{
+ NodeTag type;
+
+ Oid mvoid; /* OID of the statistics row */
+ RelOptInfo *rel; /* back-link to the statistics' relation */
+
+ /* enabled statistics */
+ bool deps_enabled; /* functional dependencies enabled */
+
+ /* built/available statistics */
+ bool deps_built; /* functional dependencies built */
+
+ /* columns in the statistics (attnums) */
+ int2vector *stakeys; /* attnums of the columns covered */
+
+} MVStatisticInfo;
+
/*
* EquivalenceClasses
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
new file mode 100644
index 0000000..7ebd961
--- /dev/null
+++ b/src/include/utils/mvstats.h
@@ -0,0 +1,70 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvstats.h
+ * Multivariate statistics and selectivity estimation functions.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/mvstats.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MVSTATS_H
+#define MVSTATS_H
+
+#include "fmgr.h"
+#include "commands/vacuum.h"
+
+
+#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
+
+/* An associative rule, tracking [a => b] dependency.
+ *
+ * TODO Make this work with multiple columns on both sides.
+ */
+typedef struct MVDependencyData {
+ int16 a;
+ int16 b;
+} MVDependencyData;
+
+typedef MVDependencyData* MVDependency;
+
+typedef struct MVDependenciesData {
+ uint32 magic; /* magic constant marker */
+ int32 ndeps; /* number of dependencies */
+ MVDependency deps[1]; /* XXX why not a pointer? */
+} MVDependenciesData;
+
+typedef MVDependenciesData* MVDependencies;
+
+#define MVSTAT_DEPS_MAGIC 0xB4549A2C /* marks serialized bytea */
+#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
+
+/*
+ * TODO Maybe fetching the histogram/MCV list separately is inefficient?
+ * Consider adding a single `fetch_stats` method, fetching all
+ * stats specified using flags (or something like that).
+ */
+
+bytea * serialize_mv_dependencies(MVDependencies dependencies);
+
+/* deserialization of stats (serialization is private to analyze) */
+MVDependencies deserialize_mv_dependencies(bytea * data);
+
+/* FIXME this probably belongs somewhere else (not to operations stats) */
+extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats);
+
+void update_mv_stats(Oid relid, MVDependencies dependencies, int2vector *attrs);
+
+#endif
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index f2bebf2..8771f9c 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -61,6 +61,7 @@ typedef struct RelationData
bool rd_isvalid; /* relcache entry is valid */
char rd_indexvalid; /* state of rd_indexlist: 0 = not valid, 1 =
* valid, 2 = temporarily forced */
+ bool rd_mvstatvalid; /* state of rd_mvstatlist: true/false */
/*
* rd_createSubid is the ID of the highest subtransaction the rel has
@@ -93,6 +94,9 @@ typedef struct RelationData
List *rd_indexlist; /* list of OIDs of indexes on relation */
Oid rd_oidindex; /* OID of unique index on OID, if any */
Oid rd_replidindex; /* OID of replica identity index, if any */
+
+ /* data managed by RelationGetMVStatList: */
+ List *rd_mvstatlist; /* list of OIDs of multivariate stats */
/* data managed by RelationGetIndexAttrBitmap: */
Bitmapset *rd_indexattr; /* identifies columns used in indexes */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 1b48304..9f03c8d 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -38,6 +38,7 @@ extern void RelationClose(Relation relation);
* Routines to compute/retrieve additional cached information
*/
extern List *RelationGetIndexList(Relation relation);
+extern List *RelationGetMVStatList(Relation relation);
extern Oid RelationGetOidIndex(Relation relation);
extern Oid RelationGetReplicaIndex(Relation relation);
extern List *RelationGetIndexExpressions(Relation relation);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 256615b..0e0658d 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -66,6 +66,8 @@ enum SysCacheIdentifier
INDEXRELID,
LANGNAME,
LANGOID,
+ MVSTATNAMENSP,
+ MVSTATOID,
NAMESPACENAME,
NAMESPACEOID,
OPERNAMENSP,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 81bc5c9..84b4425 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1368,6 +1368,15 @@ pg_matviews| SELECT n.nspname AS schemaname,
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
WHERE (c.relkind = 'm'::"char");
+pg_mv_stats| SELECT n.nspname AS schemaname,
+ c.relname AS tablename,
+ s.staname,
+ s.stakeys AS attnums,
+ length(s.stadeps) AS depsbytes,
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ FROM ((pg_mv_statistic s
+ JOIN pg_class c ON ((c.oid = s.starelid)))
+ LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
pg_policies| SELECT n.nspname AS schemaname,
c.relname AS tablename,
pol.polname AS policyname,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index eb0bc88..92a0d8a 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -113,6 +113,7 @@ pg_inherits|t
pg_language|t
pg_largeobject|t
pg_largeobject_metadata|t
+pg_mv_statistic|t
pg_namespace|t
pg_opclass|t
pg_operator|t
--
2.1.0
Attachment: 0003-clause-reduction-using-functional-dependencies.patch (binary/octet-stream)
From 730f652aa01850d09586e354fe37c1478cc22e46 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 19:42:18 +0200
Subject: [PATCH 3/9] clause reduction using functional dependencies
During planning, use functional dependencies to decide which
clauses to skip during cardinality estimation. Initial and
rather simplistic implementation.
This only works with regular WHERE clauses, not with join clauses.
Note: clause_is_mv_compatible() needs to identify the relation
(so that we can fetch the list of multivariate stats by OID).
planner_rt_fetch() seems like the appropriate way to get the
relation OID, but apparently it only works with simple Vars.
Maybe examine_variable() would make this work with more complex
Vars too?
Includes regression tests analyzing functional dependencies
(part of ANALYZE) on several datasets (no dependencies, no
transitive dependencies, ...).
Checks that a query with conditions on two columns, where one (B)
is functionally dependent on the other (A), correctly ignores the
clause on (B) and chooses a bitmap index scan instead of a plain
index scan (which is what happens otherwise, thanks to the
assumption of independence).
Note: Functional dependencies only work with equality clauses,
not with inequalities etc.
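
For example, with a dependency (a => b), a condition

    (a = 10) AND (b = 5)

is estimated using only (a = 10), because the clause on (b) is
considered implied by the clause on (a).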
---
src/backend/optimizer/path/clausesel.c | 830 +++++++++++++++++++++++++-
src/backend/utils/mvstats/README.stats | 36 ++
src/backend/utils/mvstats/common.c | 5 +-
src/backend/utils/mvstats/dependencies.c | 24 +
src/include/utils/mvstats.h | 16 +-
src/test/regress/expected/mv_dependencies.out | 172 ++++++
src/test/regress/parallel_schedule | 3 +
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_dependencies.sql | 150 +++++
9 files changed, 1232 insertions(+), 5 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.stats
create mode 100644 src/test/regress/expected/mv_dependencies.out
create mode 100644 src/test/regress/sql/mv_dependencies.sql
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 02660c2..c11aa3b 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -14,14 +14,19 @@
*/
#include "postgres.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/plancat.h"
+#include "optimizer/var.h"
#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
+#include "utils/mvstats.h"
#include "utils/selfuncs.h"
+#include "utils/typcache.h"
/*
@@ -41,6 +46,47 @@ typedef struct RangeQueryClause
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
+#define MV_CLAUSE_TYPE_FDEP 0x01
+
+static bool clause_is_mv_compatible(Node *clause, Oid varRelid,
+ Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo);
+
+static Bitmapset *collect_mv_attnums(List *clauses,
+ Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo);
+
+static int count_mv_attnums(List *clauses, Oid varRelid,
+ SpecialJoinInfo *sjinfo);
+
+static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Oid varRelid, List *stats,
+ SpecialJoinInfo *sjinfo);
+
+static bool has_stats(List *stats, int type);
+
+static List * find_stats(PlannerInfo *root, List *clauses,
+ Oid varRelid, Index *relid);
+
+static Bitmapset* fdeps_collect_attnums(List *stats);
+
+static int *make_idx_to_attnum_mapping(Bitmapset *attnums);
+static int *make_attnum_to_idx_mapping(Bitmapset *attnums);
+
+static bool *build_adjacency_matrix(List *stats, Bitmapset *attnums,
+ int *idx_to_attnum, int *attnum_to_idx);
+
+static void multiply_adjacency_matrix(bool *matrix, int natts);
+
+static List* fdeps_reduce_clauses(List *clauses,
+ Bitmapset *attnums, bool *matrix,
+ int *idx_to_attnum, int *attnum_to_idx,
+ Index relid);
+
+static Bitmapset *fdeps_filter_clauses(PlannerInfo *root,
+ List *clauses, Bitmapset *deps_attnums,
+ List **reduced_clauses, List **deps_clauses,
+ Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo);
+
+static Bitmapset * get_varattnos(Node * node, Index relid);
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
@@ -60,7 +106,19 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
* subclauses. However, that's only right if the subclauses have independent
* probabilities, and in reality they are often NOT independent. So,
* we want to be smarter where we can.
-
+ *
+ * The first thing we try is applying multivariate statistics, in a way
+ * that aims to minimize the overhead when there are no multivariate
+ * stats on the relation. Thus we do several simple (and inexpensive)
+ * checks first, to verify that suitable multivariate statistics exist.
+ *
+ * If we identify suitable multivariate statistics, we try to apply them.
+ * Currently we only have (soft) functional dependencies, so we try to
+ * reduce the list of clauses.
+ *
+ * Then we remove the clauses estimated using multivariate stats, and
+ * process the rest of the clauses using the regular per-column stats.
+ *
* Currently, the only extra smarts we have is to recognize "range queries",
* such as "x > 34 AND x < 42". Clauses are recognized as possible range
* query components if they are restriction opclauses whose operators have
@@ -99,6 +157,15 @@ clauselist_selectivity(PlannerInfo *root,
RangeQueryClause *rqlist = NULL;
ListCell *l;
+ /* processing mv stats */
+ Oid relid = InvalidOid;
+
+ /* list of multivariate stats on the relation */
+ List *stats = NIL;
+
+ /* use clauses (not conditions), because those are always non-empty */
+ stats = find_stats(root, clauses, varRelid, &relid);
+
/*
* If there's exactly one clause, then no use in trying to match up pairs,
* so just go directly to clause_selectivity().
@@ -108,6 +175,25 @@ clauselist_selectivity(PlannerInfo *root,
varRelid, jointype, sjinfo);
/*
+ * Apply functional dependencies, but first check that there are some stats
+ * with functional dependencies built (by simply walking the stats list),
+ * and that there are two or more attributes referenced by clauses that
+ * may be reduced using functional dependencies.
+ *
+ * We would find that anyway when trying to actually apply the functional
+ * dependencies, but let's do the cheap checks first.
+ *
+ * After applying the functional dependencies we get the remaining clauses
+ * that need to be estimated by other types of stats (MCV, histograms etc).
+ */
+ if (has_stats(stats, MV_CLAUSE_TYPE_FDEP) &&
+ (count_mv_attnums(clauses, varRelid, sjinfo) >= 2))
+ {
+ clauses = clauselist_apply_dependencies(root, clauses, varRelid,
+ stats, sjinfo);
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -763,3 +849,745 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ */
+static Bitmapset *
+collect_mv_attnums(List *clauses, Oid varRelid,
+ Index *relid, SpecialJoinInfo *sjinfo)
+{
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(l);
+
+ /* ignore the result for now - we only need the info */
+ if (clause_is_mv_compatible(clause, varRelid, relid, &attnum, sjinfo))
+ attnums = bms_add_member(attnums, attnum);
+ }
+
+ /*
+ * If there are not at least two attributes referenced by the clause(s),
+ * we can throw everything out (as we'll revert to simple stats).
+ */
+ if (bms_num_members(attnums) <= 1)
+ {
+ if (attnums != NULL)
+ pfree(attnums);
+ attnums = NULL;
+ *relid = InvalidOid;
+ }
+
+ return attnums;
+}
+
+/*
+ * Count the number of attributes in clauses compatible with multivariate stats.
+ */
+static int
+count_mv_attnums(List *clauses, Oid varRelid, SpecialJoinInfo *sjinfo)
+{
+ int c;
+ Bitmapset *attnums = collect_mv_attnums(clauses, varRelid,
+ NULL, sjinfo);
+
+ c = bms_num_members(attnums);
+
+ bms_free(attnums);
+
+ return c;
+}
+
+/*
+ * Determines whether the clause is compatible with multivariate stats,
+ * and if it is, returns some additional information - varno (index
+ * into simple_rte_array) and a bitmap of attributes. This is then
+ * used to fetch related multivariate statistics.
+ *
+ * At this moment we only support basic conditions of the form
+ *
+ * variable OP constant
+ *
+ * where OP is one of [=,<,<=,>=,>] (determined by looking at the
+ * associated selectivity estimator function, just like in the
+ * single-column case).
+ *
+ * TODO Support 'OR clauses' - shouldn't be all that difficult to
+ * evaluate them using multivariate stats.
+ */
+static bool
+clause_is_mv_compatible(Node *clause, Oid varRelid,
+ Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo)
+{
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ /* Pseudoconstants are not really interesting here. */
+ if (rinfo->pseudoconstant)
+ return false;
+
+ /* no support for OR clauses at this point */
+ if (rinfo->orclause)
+ return false;
+
+ /* get the actual clause from the RestrictInfo (it's not an OR clause) */
+ clause = (Node*)rinfo->clause;
+
+ /* only simple opclauses are compatible with multivariate stats */
+ if (! is_opclause(clause))
+ return false;
+
+ /* we don't support join conditions at this moment */
+ if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo))
+ return false;
+
+ /* is it 'variable op constant' ? */
+ if (list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+
+ ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
+ (is_pseudo_constant_clause_relids(lsecond(expr->args),
+ rinfo->right_relids) ||
+ (varonleft = false,
+ is_pseudo_constant_clause_relids(linitial(expr->args),
+ rinfo->left_relids)));
+
+ if (ok)
+ {
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+ * (return NULL).
+ *
+ * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
+
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not be really necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
+
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
+
+ if (relid)
+ *relid = var->varno;
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ * This uses the function for estimating selectivity, not the
+ * operator directly (a bit awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_EQSEL:
+ *attnum = var->varattno;
+ return true;
+ }
+ }
+ }
+ }
+
+ return false;
+
+}
+
+/*
+ * reduce list of equality clauses using soft functional dependencies
+ *
+ * We simply walk through list of functional dependencies, and for each one we
+ * check whether the dependency 'matches' the clauses, i.e. if there's a clause
+ * matching the condition. If yes, we attempt to remove all clauses matching
+ * the implied part of the dependency from the list.
+ *
+ * This only reduces equality clauses, and ignores all the other types. We might
+ * extend it to handle IS NULL clauses in the future.
+ *
+ * We also assume the equality clauses are 'compatible'. For example we can't
+ * identify when the clauses use a mismatching zip code and city name. In such
+ * case the usual approach (product of selectivities) would produce a better
+ * estimate, although mostly by chance.
+ *
+ * The implementation needs to be careful about cyclic dependencies, e.g. when
+ *
+ * (a -> b) and (b -> a)
+ *
+ * at the same time, which means there's a 1:1 relationship between the columns.
+ * In this case we must not reduce clauses on both attributes at the same time.
+ *
+ * TODO Currently we only apply functional dependencies at the same level, but
+ * maybe we could transfer the clauses from upper levels to the subtrees?
+ * For example let's say we have (a->b) dependency, and condition
+ *
+ * (a=1) AND (b=2 OR c=3)
+ *
+ * Currently, we won't be able to perform any reduction, because we'll
+ * consider (a=1) and (b=2 OR c=3) independently. But maybe we could pass
+ * (a=1) into the other expression, and only check it against conditions
+ * of the functional dependencies?
+ *
+ * In this case we'd end up with
+ *
+ * (a=1)
+ *
+ * as we'd consider (b=2) implied thanks to the rule, rendering the whole
+ * OR clause valid.
+ */
+static List *
+clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Oid varRelid, List *stats,
+ SpecialJoinInfo *sjinfo)
+{
+ List *reduced_clauses = NIL;
+ Index relid;
+
+ /*
+ * matrix of (natts x natts), 1 means x=>y
+ *
+ * This serves two purposes - first, it merges dependencies from all
+ * the statistics, second it makes generating all the transitive
+ * dependencies easier.
+ *
+ * We need to build this only for attributes from the dependencies,
+ * not for all attributes in the table.
+ *
+ * We can't do that only for attributes from the clauses, because we
+ * want to build transitive dependencies (including those going
+ * through attributes not listed in the stats).
+ *
+ * This only works for A=>B dependencies, not sure how to do that
+ * for complex dependencies.
+ */
+ bool *deps_matrix;
+ int deps_natts; /* size of the matrix */
+
+ /* mapping attnum <=> matrix index */
+ int *deps_idx_to_attnum;
+ int *deps_attnum_to_idx;
+
+ /* attnums in dependencies and clauses (and intersection) */
+ List *deps_clauses = NIL;
+ Bitmapset *deps_attnums = NULL;
+ Bitmapset *clause_attnums = NULL;
+ Bitmapset *intersect_attnums = NULL;
+
+ /*
+ * Is there at least one statistics with functional dependencies?
+ * If not, return the original clauses right away.
+ *
+ * XXX Isn't this pointless, thanks to exactly the same check in
+ * clauselist_selectivity()? Can we trigger the condition here?
+ */
+ if (! has_stats(stats, MV_CLAUSE_TYPE_FDEP))
+ return clauses;
+
+ /*
+ * Build the dependency matrix, i.e. attribute adjacency matrix,
+ * where 1 means (a=>b). Once we have the adjacency matrix, we'll
+ * multiply it by itself, to get transitive dependencies.
+ *
+ * Note: This is pretty much transitive closure from graph theory.
+ *
+ * First, let's see what attributes are covered by functional
+ * dependencies (sides of the adjacency matrix), and also the maximum
+ * attribute number (sizing the mapping to simple integer indexes).
+ */
+ deps_attnums = fdeps_collect_attnums(stats);
+
+ /*
+ * Walk through the clauses - clauses that are (one of)
+ *
+ * (a) not mv-compatible
+ * (b) are using more than a single attnum
+ * (c) using attnum not covered by functional dependencies
+ *
+ * may be copied directly to the result. The interesting clauses are
+ * kept in 'deps_clauses' and will be processed later.
+ */
+ clause_attnums = fdeps_filter_clauses(root, clauses, deps_attnums,
+ &reduced_clauses, &deps_clauses,
+ varRelid, &relid, sjinfo);
+
+ /*
+ * we need at least two clauses referencing two different attributes
+ * to do the reduction
+ */
+ if ((list_length(deps_clauses) < 2) || (bms_num_members(clause_attnums) < 2))
+ {
+ bms_free(clause_attnums);
+ list_free(reduced_clauses);
+ list_free(deps_clauses);
+
+ return clauses;
+ }
+
+
+ /*
+ * We need at least two matching attributes in the clauses and
+ * dependencies, otherwise we can't really reduce anything.
+ */
+ intersect_attnums = bms_intersect(clause_attnums, deps_attnums);
+ if (bms_num_members(intersect_attnums) < 2)
+ {
+ bms_free(clause_attnums);
+ bms_free(deps_attnums);
+ bms_free(intersect_attnums);
+
+ list_free(deps_clauses);
+ list_free(reduced_clauses);
+
+ return clauses;
+ }
+
+ /*
+ * Build mapping between matrix indexes and attnums, and then the
+ * adjacency matrix itself.
+ */
+ deps_idx_to_attnum = make_idx_to_attnum_mapping(deps_attnums);
+ deps_attnum_to_idx = make_attnum_to_idx_mapping(deps_attnums);
+
+ /* build the adjacency matrix */
+ deps_matrix = build_adjacency_matrix(stats, deps_attnums,
+ deps_idx_to_attnum,
+ deps_attnum_to_idx);
+
+ deps_natts = bms_num_members(deps_attnums);
+
+ /*
+ * Multiply the matrix N times (N = size of the matrix), so that we
+ * get all the transitive dependencies. That makes the next step
+ * much easier and faster.
+ *
+ * This is essentially an adjacency matrix from graph theory, and
+ * by multiplying it we get transitive edges. We don't really care
+ * about the exact number (number of paths between vertices) though,
+ * so we can do the multiplication in-place (we don't care whether
+ * we found the dependency in this round or in the previous one).
+ *
+ * Track how many new dependencies were added, and stop when 0, but
+ * we can't multiply more than N times (the longest path in the graph).
+ */
+ multiply_adjacency_matrix(deps_matrix, deps_natts);
+
+ /*
+ * Walk through the clauses, and see which other clauses we may
+ * reduce. The matrix contains all transitive dependencies, which
+ * makes this very fast.
+ *
+ * We have to be careful not to reduce a clause using itself, or to
+ * reduce all clauses forming a cycle (so we have to skip already
+ * eliminated clauses).
+ *
+ * I'm not sure whether this guarantees finding the best solution,
+ * i.e. reducing the most clauses, but it probably does (thanks to
+ * having all the transitive dependencies).
+ */
+ deps_clauses = fdeps_reduce_clauses(deps_clauses,
+ deps_attnums, deps_matrix,
+ deps_idx_to_attnum,
+ deps_attnum_to_idx, relid);
+
+ /* join the two lists of clauses */
+ reduced_clauses = list_union(reduced_clauses, deps_clauses);
+
+ pfree(deps_matrix);
+ pfree(deps_idx_to_attnum);
+ pfree(deps_attnum_to_idx);
+
+ bms_free(deps_attnums);
+ bms_free(clause_attnums);
+ bms_free(intersect_attnums);
+
+ return reduced_clauses;
+}
+
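+/*
+ * Check whether there's at least one statistics of the given type
+ * (currently only functional dependencies) that was actually built.
+ */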
+static bool
+has_stats(List *stats, int type)
+{
+ ListCell *s;
+
+ foreach (s, stats)
+ {
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Determine the relid (either from varRelid or from the clauses) and
+ * then look up stats using the relid.
+ */
+static List *
+find_stats(PlannerInfo *root, List *clauses, Oid varRelid, Index *relid)
+{
+ /* unknown relid by default */
+ *relid = InvalidOid;
+
+ /*
+ * First we need to find the relid (index into simple_rel_array).
+ * If varRelid is not 0, we already have it, otherwise we have to
+ * look it up from the clauses.
+ */
+ if (varRelid != 0)
+ *relid = varRelid;
+ else
+ {
+ Relids relids = pull_varnos((Node*)clauses);
+
+ /*
+ * We only expect 0 or 1 members in the bitmapset. If there are
+ * no vars, we'll get an empty bitmapset, otherwise we'll get the
+ * relid as the single member.
+ *
+ * FIXME For some reason we can get 2 relids here (e.g. \d in
+ * psql does that).
+ */
+ if (bms_num_members(relids) == 1)
+ *relid = bms_singleton_member(relids);
+
+ bms_free(relids);
+ }
+
+ /*
+ * if we found the relid, we can get the stats from simple_rel_array
+ *
+ * This only gets stats that are already built, because that's how
+ * we load them into RelOptInfo (see get_relation_info), but we don't
+ * detoast the whole stats yet. That'll be done later, after we
+ * decide which stats to use.
+ */
+ if (*relid != InvalidOid)
+ return root->simple_rel_array[*relid]->mvstatlist;
+
+ return NIL;
+}
+
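+/*
+ * Collect attnums covered by statistics that have functional
+ * dependencies built (other statistics are ignored).
+ */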
+static Bitmapset*
+fdeps_collect_attnums(List *stats)
+{
+ ListCell *lc;
+ Bitmapset *attnums = NULL;
+
+ foreach (lc, stats)
+ {
+ int j;
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ int2vector *stakeys = info->stakeys;
+
+ /* skip stats without functional dependencies built */
+ if (! info->deps_built)
+ continue;
+
+ for (j = 0; j < stakeys->dim1; j++)
+ attnums = bms_add_member(attnums, stakeys->values[j]);
+ }
+
+ return attnums;
+}
+
+
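+/*
+ * Build an array mapping matrix indexes (0 .. natts-1) to attnums,
+ * in the order the attnums appear in the bitmapset.
+ */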
+static int*
+make_idx_to_attnum_mapping(Bitmapset *attnums)
+{
+ int attidx = 0;
+ int attnum = -1;
+
+ int *mapping = (int*)palloc0(bms_num_members(attnums) * sizeof(int));
+
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ mapping[attidx++] = attnum;
+
+ Assert(attidx == bms_num_members(attnums));
+
+ return mapping;
+}
+
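+/*
+ * Build the inverse mapping, from attnums to matrix indexes. The
+ * array is sized by the maximum attnum, so it may be sparse.
+ */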
+static int*
+make_attnum_to_idx_mapping(Bitmapset *attnums)
+{
+ int attidx = 0;
+ int attnum = -1;
+ int maxattnum = -1;
+ int *mapping;
+
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ maxattnum = attnum;
+
+ mapping = (int*)palloc0((maxattnum+1) * sizeof(int));
+
+ attnum = -1;
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ mapping[attnum] = attidx++;
+
+ Assert(attidx == bms_num_members(attnums));
+
+ return mapping;
+}
+
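+/*
+ * Build the (natts x natts) adjacency matrix, merging functional
+ * dependencies from all the statistics; matrix[a,b] is set to true
+ * whenever (a => b).
+ */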
+static bool*
+build_adjacency_matrix(List *stats, Bitmapset *attnums,
+ int *idx_to_attnum, int *attnum_to_idx)
+{
+ ListCell *lc;
+ int natts = bms_num_members(attnums);
+ bool *matrix = (bool*)palloc0(natts * natts * sizeof(bool));
+
+ foreach (lc, stats)
+ {
+ int j;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
+ MVDependencies dependencies = NULL;
+
+ /* skip stats without functional dependencies built */
+ if (! stat->deps_built)
+ continue;
+
+ /* fetch and deserialize dependencies */
+ dependencies = load_mv_dependencies(stat->mvoid);
+ if (dependencies == NULL)
+ {
+ elog(WARNING, "failed to deserialize func deps %d", stat->mvoid);
+ continue;
+ }
+
+ /* set matrix[a,b] to 'true' if 'a=>b' */
+ for (j = 0; j < dependencies->ndeps; j++)
+ {
+ int aidx = attnum_to_idx[dependencies->deps[j]->a];
+ int bidx = attnum_to_idx[dependencies->deps[j]->b];
+
+ /* a=> b */
+ matrix[aidx * natts + bidx] = true;
+ }
+ }
+
+ return matrix;
+}
+
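+/*
+ * Compute the transitive closure of the adjacency matrix by repeated
+ * in-place multiplication, e.g. deriving (a => c) from (a => b) and
+ * (b => c). We stop as soon as a pass adds no new edge, and never
+ * need more than natts passes (the longest possible path).
+ */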
+static void
+multiply_adjacency_matrix(bool *matrix, int natts)
+{
+ int i;
+
+ for (i = 0; i < natts; i++)
+ {
+ int k, l, m;
+ int nchanges = 0;
+
+ /* k => l */
+ for (k = 0; k < natts; k++)
+ {
+ for (l = 0; l < natts; l++)
+ {
+ /* we already have this dependency */
+ if (matrix[k * natts + l])
+ continue;
+
+ /* we don't really care about the exact value, just 0/1 */
+ for (m = 0; m < natts; m++)
+ {
+ if (matrix[k * natts + m] * matrix[m * natts + l])
+ {
+ matrix[k * natts + l] = true;
+ nchanges += 1;
+ break;
+ }
+ }
+ }
+ }
+
+ /* no transitive dependency added here, so terminate */
+ if (nchanges == 0)
+ break;
+ }
+}
+
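+/*
+ * Walk the clauses and drop those implied by another clause through
+ * a (possibly transitive) functional dependency, returning the list
+ * of clauses that remain.
+ */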
+static List*
+fdeps_reduce_clauses(List *clauses, Bitmapset *attnums, bool *matrix,
+ int *idx_to_attnum, int *attnum_to_idx, Index relid)
+{
+ int i;
+ ListCell *lc;
+ List *reduced_clauses = NIL;
+
+ int nmvclauses; /* size of the arrays */
+ bool *reduced;
+ AttrNumber *mvattnums;
+ Node **mvclauses;
+
+ int natts = bms_num_members(attnums);
+
+ /*
+ * Preallocate space for all clauses (the list only contains
+ * compatible clauses at this point). This makes it somewhat easier
+ * to access the stats / attnums randomly.
+ *
+ * XXX This assumes each clause references exactly one Var, so the
+ * arrays are sized accordingly - for functional dependencies
+ * this is safe, because it only works with Var=Const.
+ */
+ mvclauses = (Node**)palloc0(list_length(clauses) * sizeof(Node*));
+ mvattnums = (AttrNumber*)palloc0(list_length(clauses) * sizeof(AttrNumber));
+ reduced = (bool*)palloc0(list_length(clauses) * sizeof(bool));
+
+ /* fill the arrays */
+ nmvclauses = 0;
+ foreach (lc, clauses)
+ {
+ Node * clause = (Node*)lfirst(lc);
+ Bitmapset * attnums = get_varattnos(clause, relid);
+
+ mvclauses[nmvclauses] = clause;
+ mvattnums[nmvclauses] = bms_singleton_member(attnums);
+ nmvclauses++;
+ }
+
+ Assert(nmvclauses == list_length(clauses));
+
+ /* now try to reduce the clauses (using the dependencies) */
+ for (i = 0; i < nmvclauses; i++)
+ {
+ int j;
+
+ /* not covered by dependencies */
+ if (! bms_is_member(mvattnums[i], attnums))
+ continue;
+
+ /* this clause was already reduced, so let's skip it */
+ if (reduced[i])
+ continue;
+
+ /* walk the potentially 'implied' clauses */
+ for (j = 0; j < nmvclauses; j++)
+ {
+ int aidx, bidx;
+
+ /* not covered by dependencies */
+ if (! bms_is_member(mvattnums[j], attnums))
+ continue;
+
+ aidx = attnum_to_idx[mvattnums[i]];
+ bidx = attnum_to_idx[mvattnums[j]];
+
+ /* can't reduce the clause by itself, or if already reduced */
+ if ((i == j) || reduced[j])
+ continue;
+
+ /* mark the clause as reduced (if aidx => bidx) */
+ reduced[j] = matrix[aidx * natts + bidx];
+ }
+ }
+
+ /* now walk through the clauses, and keep only those not reduced */
+ for (i = 0; i < nmvclauses; i++)
+ if (! reduced[i])
+ reduced_clauses = lappend(reduced_clauses, mvclauses[i]);
+
+ pfree(reduced);
+ pfree(mvclauses);
+ pfree(mvattnums);
+
+ return reduced_clauses;
+}
+
+
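+/*
+ * Split the clauses into those that may be reduced using functional
+ * dependencies (returned in deps_clauses, their attnums form the
+ * result bitmapset) and the rest (returned in reduced_clauses).
+ */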
+static Bitmapset *
+fdeps_filter_clauses(PlannerInfo *root,
+ List *clauses, Bitmapset *deps_attnums,
+ List **reduced_clauses, List **deps_clauses,
+ Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo)
+{
+ ListCell *lc;
+ Bitmapset *clause_attnums = NULL;
+
+ foreach (lc, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(lc);
+
+ if (! clause_is_mv_compatible(clause, varRelid, relid,
+ &attnum, sjinfo))
+
+ /* clause incompatible with functional dependencies */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else if (! bms_is_member(attnum, deps_attnums))
+
+ /* clause not covered by the dependencies */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else
+ {
+ *deps_clauses = lappend(*deps_clauses, clause);
+ clause_attnums = bms_add_member(clause_attnums, attnum);
+ }
+ }
+
+ return clause_attnums;
+}
+
+/*
+ * Pull varattnos from the clauses, similarly to pull_varattnos() but:
+ *
+ * (a) only get attributes for a particular relation (relid)
+ * (b) ignore system attributes (we can't build stats on them anyway)
+ *
+ * This makes it possible to directly compare the result with attnum
+ * values from pg_attribute etc.
+ */
+static Bitmapset *
+get_varattnos(Node * node, Index relid)
+{
+ int k;
+ Bitmapset *varattnos = NULL;
+ Bitmapset *result = NULL;
+
+ /* get the varattnos */
+ pull_varattnos(node, relid, &varattnos);
+
+ k = -1;
+ while ((k = bms_next_member(varattnos, k)) >= 0)
+ {
+ if (k + FirstLowInvalidHeapAttributeNumber > 0)
+ result = bms_add_member(result,
+ k + FirstLowInvalidHeapAttributeNumber);
+ }
+
+ bms_free(varattnos);
+
+ return result;
+}
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
new file mode 100644
index 0000000..a38ea7b
--- /dev/null
+++ b/src/backend/utils/mvstats/README.stats
@@ -0,0 +1,36 @@
+Multivariate statistics
+=======================
+
+When estimating various quantities (e.g. condition selectivities) the default
+approach relies on the assumption of independence. In practice that's often
+not true, resulting in estimation errors.
+
+Multivariate stats track different types of dependencies between the columns,
+hopefully improving the estimates.
+
+Currently we only have one kind of multivariate statistics, soft functional
+dependencies, which we use to improve estimates of equality clauses. See
+README.dependencies for details.
+
+
+Selectivity estimation
+----------------------
+
+When estimating selectivity, we aim to achieve several things:
+
+ (a) maximize the estimate accuracy
+
+ (b) minimize the overhead, especially when no suitable multivariate stats
+ exist (so if you are not using multivariate stats, there's no overhead)
+
+Thus clauselist_selectivity() performs several inexpensive checks first,
+before even attempting the more expensive estimation.
+
+ (1) check if there are multivariate stats on the relation
+
+ (2) check there are at least two attributes referenced by clauses compatible
+ with multivariate statistics (equality clauses for func. dependencies)
+
+ (3) perform reduction of equality clauses using func. dependencies
+
+ (4) estimate the reduced list of clauses using regular statistics
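+
+For example, with a dependency (a => b) and clauses (a = 1) AND (b = 2),
+steps (1) and (2) are cheap sanity checks, step (3) drops (b = 2) as
+implied by (a = 1), and step (4) estimates only the remaining clause
+(a = 1) using the per-column statistics.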
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index a755c49..bd200bc 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -84,7 +84,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
/*
* Analyze functional dependencies of columns.
*/
- deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (stat->deps_enabled)
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
/* store the histogram / MCV list in the catalog */
update_mv_stats(stat->mvoid, deps, attrs);
@@ -163,6 +164,7 @@ list_mv_stats(Oid relid)
info->mvoid = HeapTupleGetOid(htup);
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
result = lappend(result, info);
@@ -274,6 +276,7 @@ compare_scalars_partition(const void *a, const void *b, void *arg)
return ApplySortComparator(da, false, db, false, ssup);
}
+
/* initialize multi-dimensional sort */
MultiSortSupport
multi_sort_init(int ndims)
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
index 2a064a0..c80ba33 100644
--- a/src/backend/utils/mvstats/dependencies.c
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -435,3 +435,27 @@ pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(result));
}
+
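+/*
+ * Fetch the serialized functional dependencies for the given
+ * statistics OID from the syscache, and deserialize them.
+ */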
+MVDependencies
+load_mv_dependencies(Oid mvoid)
+{
+ bool isnull = false;
+ Datum deps;
+
+ /* Fetch the pg_mv_statistic tuple for the given OID from the syscache. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->deps_enabled && mvstat->deps_built);
+#endif
+
+ deps = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stadeps, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_dependencies(DatumGetByteaP(deps));
+}
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 7ebd961..cc43a79 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -17,12 +17,20 @@
#include "fmgr.h"
#include "commands/vacuum.h"
+/*
+ * Degree of how much MCV item / histogram bucket matches a clause.
+ * This is then considered when computing the selectivity.
+ */
+#define MVSTATS_MATCH_NONE 0 /* no match at all */
+#define MVSTATS_MATCH_PARTIAL 1 /* partial match */
+#define MVSTATS_MATCH_FULL 2 /* full match */
#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
-/* An associative rule, tracking [a => b] dependency.
- *
- * TODO Make this work with multiple columns on both sides.
+
+/*
+ * Functional dependencies, tracking column-level relationships (values
+ * in one column determine values in another one).
*/
typedef struct MVDependencyData {
int16 a;
@@ -48,6 +56,8 @@ typedef MVDependenciesData* MVDependencies;
* stats specified using flags (or something like that).
*/
+MVDependencies load_mv_dependencies(Oid mvoid);
+
bytea * serialize_mv_dependencies(MVDependencies dependencies);
/* deserialization of stats (serialization is private to analyze) */
diff --git a/src/test/regress/expected/mv_dependencies.out b/src/test/regress/expected/mv_dependencies.out
new file mode 100644
index 0000000..e759997
--- /dev/null
+++ b/src/test/regress/expected/mv_dependencies.out
@@ -0,0 +1,172 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s1 ON functional_dependencies (unknown_column) WITH (dependencies);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s1 ON functional_dependencies (a) WITH (dependencies);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a,a) WITH (dependencies);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a, a, b) WITH (dependencies);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- correct command
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (dependencies);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+ QUERY PLAN
+---------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE functional_dependencies;
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s2 ON functional_dependencies (a, b, c) WITH (dependencies);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+DROP TABLE functional_dependencies;
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s3 ON functional_dependencies (a, b, c, d) WITH (dependencies);
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+----------------------------------------
+ t | t | 2 => 1, 3 => 1, 3 => 2, 4 => 1, 4 => 2
+(1 row)
+
+DROP TABLE functional_dependencies;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index bec0316..4f2ffb8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -110,3 +110,6 @@ test: event_trigger
# run stats by itself because its delay may be insufficient under heavy load
test: stats
+
+# run tests of multivariate stats
+test: mv_dependencies
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 7e9b319..097a04f 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -162,3 +162,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: mv_dependencies
diff --git a/src/test/regress/sql/mv_dependencies.sql b/src/test/regress/sql/mv_dependencies.sql
new file mode 100644
index 0000000..48dea4d
--- /dev/null
+++ b/src/test/regress/sql/mv_dependencies.sql
@@ -0,0 +1,150 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s1 ON functional_dependencies (unknown_column) WITH (dependencies);
+
+-- single column
+CREATE STATISTICS s1 ON functional_dependencies (a) WITH (dependencies);
+
+-- single column, duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a,a) WITH (dependencies);
+
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a, a, b) WITH (dependencies);
+
+-- unknown option
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (unknown_option);
+
+-- correct command
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (dependencies);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+
+DROP TABLE functional_dependencies;
+
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s2 ON functional_dependencies (a, b, c) WITH (dependencies);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+
+DROP TABLE functional_dependencies;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s3 ON functional_dependencies (a, b, c, d) WITH (dependencies);
+
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+DROP TABLE functional_dependencies;
--
2.1.0
Attachment: 0004-multivariate-MCV-lists.patch
From c63618ac6f0696f6c863cfb1b048b6ecdc611b97 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 16:52:15 +0200
Subject: [PATCH 4/9] multivariate MCV lists
- extends the pg_mv_statistic catalog (add 'mcv' fields)
- building the MCV lists during ANALYZE
- simple estimation while planning the queries
Includes regression tests, mostly mirroring the regression tests for
functional dependencies.
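
Usage sketch (the table, the statistics name and the max_mcv_items value
are illustrative only; the value must lie within the allowed min/max range):

    CREATE STATISTICS s1 ON t (a, b) WITH (mcv, max_mcv_items = 100);
    ANALYZE t;

    -- clauses on (a, b) may now be estimated using the MCV list
    SELECT * FROM t WHERE a = 1 AND b = 2;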
---
doc/src/sgml/ref/create_statistics.sgml | 18 +
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/statscmds.c | 45 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 1032 ++++++++++++++++++++++++++---
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/README.mcv | 137 ++++
src/backend/utils/mvstats/README.stats | 89 ++-
src/backend/utils/mvstats/common.c | 104 ++-
src/backend/utils/mvstats/common.h | 11 +-
src/backend/utils/mvstats/mcv.c | 1094 +++++++++++++++++++++++++++++++
src/bin/psql/describe.c | 25 +-
src/include/catalog/pg_mv_statistic.h | 18 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 69 +-
src/test/regress/expected/mv_mcv.out | 207 ++++++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_mcv.sql | 178 +++++
22 files changed, 2936 insertions(+), 116 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.mcv
create mode 100644 src/backend/utils/mvstats/mcv.c
create mode 100644 src/test/regress/expected/mv_mcv.out
create mode 100644 src/test/regress/sql/mv_mcv.sql
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index a86eae3..193e4b0 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -132,6 +132,24 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>max_mcv_items</> (<type>integer</>)</term>
+ <listitem>
+ <para>
+ Maximum number of MCV list items.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>mcv</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables MCV list for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect2>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b8a264e..2d570ee 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -165,7 +165,9 @@ CREATE VIEW pg_mv_stats AS
S.staname AS staname,
S.stakeys AS attnums,
length(S.stadeps) as depsbytes,
- pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
+ length(S.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 84a8b13..90bfaed 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -136,7 +136,13 @@ CreateStatistics(CreateStatsStmt *stmt)
ObjectAddress parentobject, childobject;
/* by default build nothing */
- bool build_dependencies = false;
+ bool build_dependencies = false,
+ build_mcv = false;
+
+ int32 max_mcv_items = -1;
+
+ /* options required because of other options */
+ bool require_mcv = false;
Assert(IsA(stmt, CreateStatsStmt));
@@ -212,6 +218,29 @@ CreateStatistics(CreateStatsStmt *stmt)
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "mcv") == 0)
+ build_mcv = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_mcv_items") == 0)
+ {
+ max_mcv_items = defGetInt32(opt);
+
+ /* this option requires 'mcv' to be enabled */
+ require_mcv = true;
+
+ /* sanity check */
+ if (max_mcv_items < MVSTAT_MCVLIST_MIN_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items must be at least %d",
+ MVSTAT_MCVLIST_MIN_ITEMS)));
+
+ else if (max_mcv_items > MVSTAT_MCVLIST_MAX_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items is %d",
+ MVSTAT_MCVLIST_MAX_ITEMS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -220,10 +249,16 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! build_dependencies)
+ if (! (build_dependencies || build_mcv))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies, mcv) was requested")));
+
+ /* now do some checking of the options */
+ if (require_mcv && (! build_mcv))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies) was requested")));
+ errmsg("option 'mcv' is required by other options(s)")));
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
@@ -243,8 +278,12 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+ values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+
+ values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+ nulls[Anum_pg_mv_statistic_stamcv -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 474d2c7..e3983fd 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1977,9 +1977,11 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
+ WRITE_BOOL_FIELD(mcv_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
+ WRITE_BOOL_FIELD(mcv_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index c11aa3b..ce7d231 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -15,6 +15,7 @@
#include "postgres.h"
#include "access/sysattr.h"
+#include "catalog/pg_collation.h"
#include "catalog/pg_operator.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
@@ -47,20 +48,41 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
#define MV_CLAUSE_TYPE_FDEP 0x01
+#define MV_CLAUSE_TYPE_MCV 0x02
static bool clause_is_mv_compatible(Node *clause, Oid varRelid,
- Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo);
+ Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
+ int type);
static Bitmapset *collect_mv_attnums(List *clauses,
- Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo);
+ Oid varRelid, Index *relid, SpecialJoinInfo *sjinfo,
+ int type);
static int count_mv_attnums(List *clauses, Oid varRelid,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo, int type);
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Oid varRelid, List *stats,
SpecialJoinInfo *sjinfo);
+static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
+
+static List *clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
+ List *clauses, Oid varRelid,
+ List **mvclauses, MVStatisticInfo *mvstats, int types);
+
+static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
+static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats,
+ bool *fullmatch, Selectivity *lowsel);
+
+static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, List *clauses,
@@ -88,6 +110,13 @@ static Bitmapset *fdeps_filter_clauses(PlannerInfo *root,
static Bitmapset * get_varattnos(Node * node, Index relid);
+/* used for merging bitmaps - AND (min), OR (max) */
+#define MAX(x, y) (((x) > (y)) ? (x) : (y))
+#define MIN(x, y) (((x) < (y)) ? (x) : (y))
+
+#define UPDATE_RESULT(m,r,isor) \
+ (m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
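+
+/*
+ * For example, merging a sub-result 'r' into 'm' for an OR-clause keeps
+ * the maximum (a match wins), while for an AND-clause it keeps the
+ * minimum (a mismatch wins).
+ */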
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -113,11 +142,13 @@ static Bitmapset * get_varattnos(Node * node, Index relid);
* to verify that suitable multivariate statistics exist.
*
* If we identify such multivariate statistics apply, we try to apply them.
- * Currently we only have (soft) functional dependencies, so we try to reduce
- * the list of clauses.
*
- * Then we remove the clauses estimated using multivariate stats, and process
- * the rest of the clauses using the regular per-column stats.
+ * First we try to reduce the list of clauses by applying (soft) functional
+ * dependencies, and then we try to estimate the selectivity of the reduced
+ * list of clauses using the multivariate MCV list.
+ *
+ * Finally we remove the portion of clauses estimated using multivariate stats,
+ * and process the rest of the clauses using the regular per-column stats.
*
* Currently, the only extra smarts we have is to recognize "range queries",
* such as "x > 34 AND x < 42". Clauses are recognized as possible range
@@ -187,13 +218,49 @@ clauselist_selectivity(PlannerInfo *root,
* that need to be estimated by other types of stats (MCV, histograms etc).
*/
if (has_stats(stats, MV_CLAUSE_TYPE_FDEP) &&
- (count_mv_attnums(clauses, varRelid, sjinfo) >= 2))
+ (count_mv_attnums(clauses, varRelid, sjinfo, MV_CLAUSE_TYPE_FDEP) >= 2))
{
clauses = clauselist_apply_dependencies(root, clauses, varRelid,
stats, sjinfo);
}
/*
+ * Check that there are statistics with MCV list or histogram, and also the
+ * number of attributes covered by these types of statistics.
+ *
+ * If there are no such stats or not enough attributes, don't waste time
+ * with the multivariate code and simply skip to estimation using the
+ * regular per-column stats.
+ */
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV) &&
+ (count_mv_attnums(clauses, varRelid, sjinfo, MV_CLAUSE_TYPE_MCV) >= 2))
+ {
+ /* collect attributes from the compatible conditions */
+ Bitmapset *mvattnums = collect_mv_attnums(clauses, varRelid, NULL, sjinfo,
+ MV_CLAUSE_TYPE_MCV);
+
+ /* and search for the statistic covering the most attributes */
+ MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+
+ if (mvstat != NULL) /* we have a matching stats */
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ /* split the clauselist into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, sjinfo, clauses,
+ varRelid, &mvclauses, mvstat,
+ MV_CLAUSE_TYPE_MCV);
+
+ /* we've chosen the statistics to match the clauses */
+ Assert(mvclauses != NIL);
+
+ /* compute the selectivity using the multivariate stats */
+ s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ }
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -850,12 +917,75 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * estimate selectivity of clauses using multivariate statistic
+ *
+ * Perform estimation of the clauses using a MCV list.
+ *
+ * This assumes all the clauses are compatible with the selected statistics
+ * (e.g. only reference columns covered by the statistics, use supported
+ * operator, etc.).
+ *
+ * TODO We may support some additional conditions, most importantly those
+ * matching multiple columns (e.g. "a = b" or "a < b").
+ *
+ * TODO Clamp the selectivity by min of the per-clause selectivities (i.e. the
+ * selectivity of the most restrictive clause), because that's the maximum
+ * we can ever get from ANDed list of clauses. This may probably prevent
+ * issues with hitting too many buckets and low precision histograms.
+ *
+ * TODO We may remember the lowest frequency in the MCV list, and then later use
+ * it as an upper boundary for the selectivity (had there been a more
+ * frequent item, it'd be in the MCV list). This might improve cases with
+ * low-detail histograms.
+ *
+ * TODO We may also derive some additional boundaries for the selectivity from
+ * the MCV list, because
+ *
+ * (a) if we have a "full equality condition" (one equality condition on
+ * each column of the statistic) and we found a match in the MCV list,
+ * then this is the final selectivity (and pretty accurate),
+ *
+ * (b) if we have a "full equality condition" and we haven't found a match
+ * in the MCV list, then the selectivity is below the lowest frequency
+ * found in the MCV list,
+ *
+ * TODO When applying the clauses to the histogram/MCV list, we can do
+ * that from the most selective clauses first, because that'll
+ * eliminate the buckets/items sooner (so we'll be able to skip
+ * them without inspection, which is more expensive). But this
+ * requires really knowing the per-clause selectivities in advance,
+ * and that's not what we do now.
+ */
+static Selectivity
+clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+{
+ bool fullmatch = false;
+
+ /*
+ * Lowest frequency in the MCV list (may be used as an upper bound
+ * for full equality conditions that did not match any MCV item).
+ */
+ Selectivity mcv_low = 0.0;
+
+ /* TODO Evaluate simple 1D selectivities, use the smallest one as
+ * an upper bound, product as lower bound, and sort the
+ * clauses in ascending order by selectivity (to optimize the
+ * MCV/histogram evaluation).
+ */
+
+ /* Evaluate the MCV selectivity */
+ return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ &fullmatch, &mcv_low);
+}
+
/*
* Collect attributes from mv-compatible clauses.
*/
static Bitmapset *
collect_mv_attnums(List *clauses, Oid varRelid,
- Index *relid, SpecialJoinInfo *sjinfo)
+ Index *relid, SpecialJoinInfo *sjinfo, int types)
{
Bitmapset *attnums = NULL;
ListCell *l;
@@ -871,12 +1001,11 @@ collect_mv_attnums(List *clauses, Oid varRelid,
*/
foreach (l, clauses)
{
- AttrNumber attnum;
Node *clause = (Node *) lfirst(l);
- /* ignore the result for now - we only need the info */
- if (clause_is_mv_compatible(clause, varRelid, relid, &attnum, sjinfo))
- attnums = bms_add_member(attnums, attnum);
+ /* ignore the result here - we only need the attnums */
+ clause_is_mv_compatible(clause, varRelid, relid, &attnums,
+ sjinfo, types);
}
/*
@@ -898,11 +1027,11 @@ collect_mv_attnums(List *clauses, Oid varRelid,
* Count the number of attributes in clauses compatible with multivariate stats.
*/
static int
-count_mv_attnums(List *clauses, Oid varRelid, SpecialJoinInfo *sjinfo)
+count_mv_attnums(List *clauses, Oid varRelid, SpecialJoinInfo *sjinfo, int type)
{
int c;
Bitmapset *attnums = collect_mv_attnums(clauses, varRelid,
- NULL, sjinfo);
+ NULL, sjinfo, type);
c = bms_num_members(attnums);
@@ -912,6 +1041,188 @@ count_mv_attnums(List *clauses, Oid varRelid, SpecialJoinInfo *sjinfo)
}
/*
+ * We're looking for statistics matching at least 2 attributes,
+ * referenced in the clauses compatible with multivariate statistics.
+ * The current selection criterion is very simple - we choose the
+ * statistics referencing the most attributes.
+ *
+ * If there are multiple statistics referencing the same number of
+ * columns (from the clauses), the one with fewer source columns
+ * (as listed in the ADD STATISTICS when creating the statistics) wins.
+ * Otherwise the first one wins.
+ *
+ * This is a very simple criterion, and it has several weaknesses:
+ *
+ * (a) does not consider the accuracy of the statistics
+ *
+ * If there are two histograms built on the same set of columns,
+ * but one has 100 buckets and the other one has 1000 buckets (thus
+ * likely providing better estimates), this is not currently
+ * considered.
+ *
+ * (b) does not consider the type of statistics
+ *
+ * If there are three statistics - one containing just a MCV list,
+ * another one with just a histogram and a third one with both,
+ * this is not considered.
+ *
+ * (c) does not consider the number of clauses
+ *
+ * As explained, only the number of referenced attributes counts,
+ * so if there are multiple clauses on a single attribute, this
+ * still counts as a single attribute.
+ *
+ * (d) does not consider type of condition
+ *
+ * Some clauses may work better with some statistics - for example
+ * equality clauses probably work better with MCV lists than with
+ * histograms. But IS [NOT] NULL conditions may often work better
+ * with histograms (thanks to NULL-buckets).
+ *
+ * So for example with five WHERE conditions
+ *
+ * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
+ *
+ * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be
+ * selected as it references the most columns.
+ *
+ * Once we have selected the multivariate statistics, we split the list
+ * of clauses into two parts - conditions that are compatible with the
+ * selected stats, and conditions that will be estimated using simple statistics.
+ *
+ * From the example above, conditions
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
+ *
+ * will be estimated using the multivariate statistics (a,b,c,d) while
+ * the last condition (e = 1) will get estimated using the regular ones.
+ *
+ * There are various alternative selection criteria (e.g. counting
+ * conditions instead of just referenced attributes), but eventually
+ * the best option should be to combine multiple statistics. But that's
+ * much harder to do correctly.
+ *
+ * TODO Select multiple statistics and combine them when computing
+ * the estimate.
+ *
+ * TODO This will probably have to consider compatibility of clauses,
+ * because 'dependencies' will probably work only with equality
+ * clauses.
+ */
+static MVStatisticInfo *
+choose_mv_statistics(List *stats, Bitmapset *attnums)
+{
+ int i;
+ ListCell *lc;
+
+ MVStatisticInfo *choice = NULL;
+
+ int current_matches = 1; /* goal #1: maximize */
+ int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+
+ /*
+ * Walk through the statistics (simple array with nmvstats elements)
+ * and for each one count the referenced attributes (encoded in
+ * the 'attnums' bitmap).
+ */
+ foreach (lc, stats)
+ {
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ /* columns matching this statistics */
+ int matches = 0;
+
+ int2vector * attrs = info->stakeys;
+ int numattrs = attrs->dim1;
+
+ /* skip dependencies-only stats */
+ if (! info->mcv_built)
+ continue;
+
+ /* count columns covered by the statistics */
+ for (i = 0; i < numattrs; i++)
+ if (bms_is_member(attrs->values[i], attnums))
+ matches++;
+
+ /*
+ * Use this statistics when it improves the number of matches or
+ * when it matches the same number of attributes but is smaller.
+ */
+ if ((matches > current_matches) ||
+ ((matches == current_matches) && (current_dims > numattrs)))
+ {
+ choice = info;
+ current_matches = matches;
+ current_dims = numattrs;
+ }
+ }
+
+ return choice;
+}
+
+
+/*
+ * This splits the clause list into two parts - one containing clauses
+ * that will be evaluated using the chosen statistics, and the remaining
+ * clauses (either not mv-compatible, or not related to the chosen stats).
+ */
+static List *
+clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
+ List *clauses, Oid varRelid, List **mvclauses,
+ MVStatisticInfo *mvstats, int types)
+{
+ int i;
+ ListCell *l;
+ List *non_mvclauses = NIL;
+
+ /* FIXME is there a better way to get info on int2vector? */
+ int2vector * attrs = mvstats->stakeys;
+ int numattrs = mvstats->stakeys->dim1;
+
+ Bitmapset *mvattnums = NULL;
+
+ /* build bitmap of attributes covered by the stats, so we can
+ * do bms_is_subset later */
+ for (i = 0; i < numattrs; i++)
+ mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+
+ /* erase the list of mv-compatible clauses */
+ *mvclauses = NIL;
+
+ foreach (l, clauses)
+ {
+ bool match = false; /* by default not mv-compatible */
+ Bitmapset *attnums = NULL;
+ Node *clause = (Node *) lfirst(l);
+
+ if (clause_is_mv_compatible(clause, varRelid, NULL,
+ &attnums, sjinfo, types))
+ {
+ /* are all the attributes part of the selected stats? */
+ if (bms_is_subset(attnums, mvattnums))
+ match = true;
+ }
+
+ /*
+ * The clause matches the selected stats, so put it on the list
+ * of mv-compatible clauses. Otherwise, keep it in the list of
+ * 'regular' clauses (that may be selected later).
+ */
+ if (match)
+ *mvclauses = lappend(*mvclauses, clause);
+ else
+ non_mvclauses = lappend(non_mvclauses, clause);
+ }
+
+ /*
+ * Perform regular estimation using the clauses incompatible
+ * with the chosen histogram (or MV stats in general).
+ */
+ return non_mvclauses;
+
+}
+
+/*
* Determines whether the clause is compatible with multivariate stats,
* and if it is, returns some additional information - varno (index
* into simple_rte_array) and a bitmap of attributes. This is then
@@ -930,8 +1241,12 @@ count_mv_attnums(List *clauses, Oid varRelid, SpecialJoinInfo *sjinfo)
*/
static bool
clause_is_mv_compatible(Node *clause, Oid varRelid,
- Index *relid, AttrNumber *attnum, SpecialJoinInfo *sjinfo)
+ Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
+ int types)
{
+ Relids clause_relids;
+ Relids left_relids;
+ Relids right_relids;
if (IsA(clause, RestrictInfo))
{
@@ -941,83 +1256,176 @@ clause_is_mv_compatible(Node *clause, Oid varRelid,
if (rinfo->pseudoconstant)
return false;
- /* no support for OR clauses at this point */
- if (rinfo->orclause)
- return false;
-
/* get the actual clause from the RestrictInfo (it's not an OR clause) */
clause = (Node*)rinfo->clause;
- /* only simple opclauses are compatible with multivariate stats */
- if (! is_opclause(clause))
- return false;
-
/* we don't support join conditions at this moment */
if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo))
return false;
+ clause_relids = rinfo->clause_relids;
+ left_relids = rinfo->left_relids;
+ right_relids = rinfo->right_relids;
+ }
+ else if (is_opclause(clause) && list_length(((OpExpr *) clause)->args) == 2)
+ {
+ left_relids = pull_varnos(get_leftop((Expr*)clause));
+ right_relids = pull_varnos(get_rightop((Expr*)clause));
+
+ clause_relids = bms_union(left_relids,
+ right_relids);
+ }
+ else
+ {
+ /* Not a binary opclause, so mark left/right relid sets as empty */
+ left_relids = NULL;
+ right_relids = NULL;
+ /* and get the total relid set the hard way */
+ clause_relids = pull_varnos((Node *) clause);
+ }
+
+ /*
+ * Only simple opclauses and IS NULL tests are compatible with
+ * multivariate stats at this point.
+ */
+ if ((is_opclause(clause))
+ && (list_length(((OpExpr *) clause)->args) == 2))
+ {
+ OpExpr *expr = (OpExpr *) clause;
+ bool varonleft = true;
+ bool ok;
+
/* is it 'variable op constant' ? */
- if (list_length(((OpExpr *) clause)->args) == 2)
+
+ ok = (bms_membership(clause_relids) == BMS_SINGLETON) &&
+ (is_pseudo_constant_clause_relids(lsecond(expr->args),
+ right_relids) ||
+ (varonleft = false,
+ is_pseudo_constant_clause_relids(linitial(expr->args),
+ left_relids)));
+
+ if (ok)
{
- OpExpr *expr = (OpExpr *) clause;
- bool varonleft = true;
- bool ok;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
- ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
- (is_pseudo_constant_clause_relids(lsecond(expr->args),
- rinfo->right_relids) ||
- (varonleft = false,
- is_pseudo_constant_clause_relids(linitial(expr->args),
- rinfo->left_relids)));
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+ * (return NULL).
+ *
+ * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
- if (ok)
- {
- Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not be really necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
- /*
- * Simple variables only - otherwise the planner_rt_fetch seems to fail
- * (return NULL).
- *
- * TODO Maybe use examine_variable() would fix that?
- */
- if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
- return false;
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
- /*
- * Only consider this variable if (varRelid == 0) or when the varno
- * matches varRelid (see explanation at clause_selectivity).
- *
- * FIXME I suspect this may not be really necessary. The (varRelid == 0)
- * part seems to be enforced by treat_as_join_clause().
- */
- if (! ((varRelid == 0) || (varRelid == var->varno)))
- return false;
+ /* Lookup info about the base relation (we need to pass the OID out) */
+ if (relid != NULL)
+ *relid = var->varno;
- /* Also skip special varno values, and system attributes ... */
- if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
- return false;
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ * This uses the function for estimating selectivity, not the
+ * operator directly (a bit awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_SCALARLTSEL:
+ case F_SCALARGTSEL:
+ /* not compatible with functional dependencies */
+ if (types & MV_CLAUSE_TYPE_MCV)
+ {
+ *attnums = bms_add_member(*attnums, var->varattno);
+ return (types & MV_CLAUSE_TYPE_MCV);
+ }
+ return false;
+
+ case F_EQSEL:
+ *attnums = bms_add_member(*attnums, var->varattno);
+ return true;
+ }
+ }
+ }
+ else if (IsA(clause, NullTest)
+ && IsA(((NullTest*)clause)->arg, Var))
+ {
+ Var * var = (Var*)((NullTest*)clause)->arg;
- if (relid)
- *relid = var->varno;
+ /*
+ * Simple variables only - otherwise the planner_rt_fetch seems to fail
+ * (return NULL).
+ *
+ * TODO Maybe using examine_variable() would fix that?
+ */
+ if (! (IsA(var, Var) && (varRelid == 0 || varRelid == var->varno)))
+ return false;
- /*
- * If it's not a "<" or ">" or "=" operator, just ignore the
- * clause. Otherwise note the relid and attnum for the variable.
- * This uses the function for estimating selectivity, ont the
- * operator directly (a bit awkward, but well ...).
- */
- switch (get_oprrest(expr->opno))
- {
- case F_EQSEL:
- *attnum = var->varattno;
- return true;
- }
- }
+ /*
+ * Only consider this variable if (varRelid == 0) or when the varno
+ * matches varRelid (see explanation at clause_selectivity).
+ *
+ * FIXME I suspect this may not be really necessary. The (varRelid == 0)
+ * part seems to be enforced by treat_as_join_clause().
+ */
+ if (! ((varRelid == 0) || (varRelid == var->varno)))
+ return false;
+
+ /* Also skip special varno values, and system attributes ... */
+ if ((IS_SPECIAL_VARNO(var->varno)) || (! AttrNumberIsForUserDefinedAttr(var->varattno)))
+ return false;
+
+ /* Lookup info about the base relation (we need to pass the OID out) */
+ if (relid != NULL)
+ *relid = var->varno;
+
+ *attnums = bms_add_member(*attnums, var->varattno);
+
+ return true;
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /*
+ * AND/OR-clauses are supported if all sub-clauses are supported
+ *
+ * TODO We might support mixed case, where some of the clauses
+ * are supported and some are not, and treat all supported
+ * subclauses as a single clause, compute its selectivity
+ * using mv stats, and compute the total selectivity using
+ * the current algorithm.
+ *
+ * TODO For RestrictInfo above an OR-clause, we might use the
+ * orclause with nested RestrictInfo - we won't have to
+ * call pull_varnos() for each clause, saving time.
+ */
+ Bitmapset *tmp = NULL;
+ ListCell *l;
+ foreach (l, ((BoolExpr*)clause)->args)
+ {
+ if (! clause_is_mv_compatible((Node*)lfirst(l),
+ varRelid, relid, &tmp, sjinfo, types))
+ return false;
}
+
+ /* add the attnums from the AND/OR-clause to the set of attnums */
+ *attnums = bms_join(*attnums, tmp);
+
+ return true;
}
return false;
-
}
/*
@@ -1240,6 +1648,9 @@ has_stats(List *stats, int type)
if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
return true;
+
+ if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
+ return true;
}
return false;
@@ -1535,25 +1946,39 @@ fdeps_filter_clauses(PlannerInfo *root,
foreach (lc, clauses)
{
- AttrNumber attnum;
+ Bitmapset *attnums = NULL;
Node *clause = (Node *) lfirst(lc);
- if (! clause_is_mv_compatible(clause, varRelid, relid,
- &attnum, sjinfo))
+ if (! clause_is_mv_compatible(clause, varRelid, relid, &attnums,
+ sjinfo, MV_CLAUSE_TYPE_FDEP))
/* clause incompatible with functional dependencies */
*reduced_clauses = lappend(*reduced_clauses, clause);
- else if (! bms_is_member(attnum, deps_attnums))
+ else if (bms_num_members(attnums) > 1)
+
+ /*
+ * clause referencing multiple attributes (strange - shouldn't
+ * this be handled by clause_is_mv_compatible directly?)
+ */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else if (! bms_is_member(bms_singleton_member(attnums), deps_attnums))
/* clause not covered by the dependencies */
*reduced_clauses = lappend(*reduced_clauses, clause);
else
{
+ /* ok, clause compatible with existing dependencies */
+ Assert(bms_num_members(attnums) == 1);
+
*deps_clauses = lappend(*deps_clauses, clause);
- clause_attnums = bms_add_member(clause_attnums, attnum);
+ clause_attnums = bms_add_member(clause_attnums,
+ bms_singleton_member(attnums));
}
+
+ bms_free(attnums);
}
return clause_attnums;
@@ -1591,3 +2016,454 @@ get_varattnos(Node * node, Index relid)
return result;
}
+
+/*
+ * Estimate selectivity of clauses using a MCV list.
+ *
+ * If there's no MCV list for the stats, the function returns 0.0.
+ *
+ * While computing the estimate, the function checks whether all the
+ * columns were matched with an equality condition. If that's the case,
+ * we can skip processing the histogram, as there can be no rows in
+ * it with the same values - all the rows matching the condition are
+ * represented by the MCV item. This can only happen with equality
+ * on all the attributes.
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all items as 'match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the items
+ * 4) skip items that are already 'no match'
+ * 5) check clause for items that still match
+ * 6) sum frequencies for items to get selectivity
+ *
+ * The function also returns the frequency of the least frequent item
+ * on the MCV list, which may be useful for clamping estimate from the
+ * histogram (all items not present in the MCV list are less frequent).
+ * This however seems useful only for cases with conditions on all
+ * attributes.
+ *
+ * TODO This only handles AND-ed clauses, but it might work for OR-ed
+ * lists too - it just needs to reverse the logic a bit. I.e. start
+ * with 'no match' for all items, and mark the items as a match
+ * as the clauses are processed (and skip items that are 'match').
+ */
+static Selectivity
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats, bool *fullmatch,
+ Selectivity *lowsel)
+{
+ int i;
+ Selectivity s = 0.0;
+ Selectivity u = 0.0;
+
+ MCVList mcvlist = NULL;
+ int nmatches = 0;
+
+ /* match/mismatch bitmap for each MCV item */
+ char * matches = NULL;
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 2);
+
+ /* there's no MCV list built yet */
+ if (! mvstats->mcv_built)
+ return 0.0;
+
+ mcvlist = load_mv_mcvlist(mvstats->mvoid);
+
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* by default all the MCV items match the clauses fully */
+ matches = palloc0(sizeof(char) * mcvlist->nitems);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
+
+ /* number of matching MCV items */
+ nmatches = mcvlist->nitems;
+
+ nmatches = update_match_bitmap_mcvlist(root, clauses,
+ mvstats->stakeys, mcvlist,
+ nmatches, matches,
+ lowsel, fullmatch, false);
+
+ /* sum frequencies for all the matching MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* used to 'scale' for MCV lists not covering all tuples */
+ u += mcvlist->items[i]->frequency;
+
+ if (matches[i] != MVSTATS_MATCH_NONE)
+ s += mcvlist->items[i]->frequency;
+ }
+
+ pfree(matches);
+ pfree(mcvlist);
+
+ return s*u;
+}
+
+/*
+ * Evaluate clauses using the MCV list, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * TODO This works with 'bitmap' where each bit is represented as a char,
+ * which is slightly wasteful. Instead, we could use a regular
+ * bitmap, reducing the size to ~1/8. Another thing is merging the
+ * bitmaps using & and |, which might be faster than min/max.
+ */
+static int
+update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ Bitmapset *eqmatches = NULL; /* attributes with equality matches */
+
+ /* The bitmap may be partially built. */
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mcvlist->nitems);
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* nothing to do - no matches left (AND) or everything matches (OR) */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ return nmatches;
+
+ /* frequency of the lowest MCV item */
+ *lowsel = 1.0;
+
+ /*
+ * Loop through the list of clauses, and for each of them evaluate
+ * all the MCV items not yet eliminated by the preceding clauses.
+ *
+ * FIXME This would probably deserve a refactoring, I guess. Unify
+ * the two loops and put the checks inside, or something like
+ * that.
+ */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* if there are no remaining matches possible, we can stop */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /* it's either an opclause, a NullTest, or an AND/OR clause */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ /* operator */
+ FmgrInfo opproc;
+
+ /* get procedure computing operator selectivity */
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+
+ FmgrInfo ltproc, gtproc;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ /*
+ * TODO Fetch only when really needed (probably for equality only)
+ * TODO Technically either lt/gt is sufficient.
+ *
+ * FIXME The code in analyze.c creates histograms only for types
+ * with enough ordering (by calling get_sort_group_operators).
+ * Is this the same assumption, i.e. are we certain that we
+ * get the ltproc/gtproc every time we ask? Or are there types
+ * where get_sort_group_operators returns ltopr and here we
+ * get nothing?
+ */
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype,
+ TYPECACHE_EQ_OPR | TYPECACHE_LT_OPR | TYPECACHE_GT_OPR);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), <proc);
+ fmgr_info(get_opcode(typecache->gt_opr), >proc);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ bool mismatch = false;
+ MCVItem item = mcvlist->items[i];
+
+ /*
+ * find the lowest selectivity in the MCV
+ * FIXME Maybe not the best place to do this (it runs for every clause).
+ */
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+
+ /*
+ * If there are no more matches (AND) or no remaining unmatched
+ * items (OR), we can stop processing this clause.
+ */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /*
+ * For AND-lists, we can also mark NULL items as 'no match' (and
+ * then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && item->isnull[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /* skip MCV items that were already ruled out */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* TODO consider bsearch here (list is sorted by values)
+ * TODO handle other operators too (LT, GT)
+ * TODO identify "full match" when the clauses fully
+ * match the whole MCV list (so that checking the
+ * histogram is not needed)
+ */
+ if (oprrest == F_EQSEL)
+ {
+ /*
+ * We don't care about isgt in equality, because it does not
+ * matter whether it's (var = const) or (const = var).
+ */
+ bool match = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ if (match)
+ eqmatches = bms_add_member(eqmatches, idx);
+
+ mismatch = (! match);
+ }
+ else if (oprrest == F_SCALARLTSEL) /* column < constant */
+ {
+
+ if (! isgt) /* (var < const) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ } /* (get_oprrest(expr->opno) == F_SCALARLTSEL) */
+ else /* (const < var) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ item->values[idx],
+ cst->constvalue));
+ }
+ }
+ else if (oprrest == F_SCALARGTSEL) /* column > constant */
+ {
+
+ if (! isgt) /* (var > const) */
+ {
+ /*
+ * First check whether the constant is above the upper boundary (in that
+ * case we can skip the bucket, because there's no overlap).
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+ }
+ else /* (const > var) */
+ {
+ /*
+ * First check whether the constant is below the lower boundary (in
+ * that case we can skip the bucket, because there's no overlap).
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ item->values[idx],
+ cst->constvalue));
+ }
+
+ } /* (get_oprrest(expr->opno) == F_SCALARGTSEL) */
+
+ /* XXX The conditions on matches[i] are not needed, as we
+ * skip MCV items that can't become true/false, depending
+ * on the current flag. See beginning of the loop over
+ * MCV items.
+ */
+
+ if ((is_or) && (matches[i] == MVSTATS_MATCH_NONE) && (! mismatch))
+ {
+ /* OR - was MATCH_NONE, but will be MATCH_FULL */
+ matches[i] = MVSTATS_MATCH_FULL;
+ ++nmatches;
+ continue;
+ }
+ else if ((! is_or) && (matches[i] == MVSTATS_MATCH_FULL) && mismatch)
+ {
+ /* AND - was MATCH_FULL, but will be MATCH_NONE */
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ continue;
+ }
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = mcvlist->items[i];
+
+ /*
+ * find the lowest selectivity in the MCV
+ * FIXME Maybe not the best place to do this (it runs for every clause).
+ */
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+
+ /* if there are no more matches, we can stop processing this clause */
+ if (nmatches == 0)
+ break;
+
+ /* skip MCV items that were already ruled out */
+ if (matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /* if the clause mismatches the MCV item, set it as MATCH_NONE */
+ if (((expr->nulltesttype == IS_NULL) && (! mcvlist->items[i]->isnull[idx])) ||
+ ((expr->nulltesttype == IS_NOT_NULL) && (mcvlist->items[i]->isnull[idx])))
+ {
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ }
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each MCV item */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching MCV items */
+ or_nmatches = mcvlist->nitems;
+
+ /* by default none of the MCV items matches the clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the OR-clauses */
+ or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ stakeys, mcvlist,
+ or_nmatches, or_matches,
+ lowsel, fullmatch, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ {
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+ }
+
+ /*
+ * If all the columns were matched by equality, it's a full match.
+ * In this case at most a single MCV item can match the clauses
+ * (two matching items would have to be duplicates of each other).
+ */
+ *fullmatch = (bms_num_members(eqmatches) == mcvlist->ndimensions);
+
+ /* free the allocated pieces */
+ if (eqmatches)
+ pfree(eqmatches);
+
+ return nmatches;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index b9de71d..a92f889 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -416,7 +416,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built)
+ if (mvstat->deps_built || mvstat->mcv_built)
{
info = makeNode(MVStatisticInfo);
@@ -425,9 +425,11 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
+ info->mcv_enabled = mvstat->mcv_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
+ info->mcv_built = mvstat->mcv_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 099f1ed..f9bf10c 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o
+OBJS = common.o dependencies.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.mcv b/src/backend/utils/mvstats/README.mcv
new file mode 100644
index 0000000..e93cfe4
--- /dev/null
+++ b/src/backend/utils/mvstats/README.mcv
@@ -0,0 +1,137 @@
+MCV lists
+=========
+
+Multivariate MCV (most-common values) lists are a straightforward extension of
+regular MCV lists, tracking the most frequent combinations of values for a group of
+attributes.
+
+This works particularly well for columns with a small number of distinct values,
+as the list may include all the combinations and approximate the distribution
+very accurately.
+
+For columns with a large number of distinct values (e.g. those with continuous
+domains), the list will only track the most frequent combinations. If the
+distribution is mostly uniform (all combinations about equally frequent), the
+MCV list will be empty.
+
+Estimates of some clauses (e.g. equality) based on MCV lists are more accurate
+than when using histograms.
+
+Also, MCV lists don't necessarily require sorting of the values (the fact that
+we use sorting when building them is an implementation detail), but even more
+importantly the ordering is not built into the approximation (while histograms
+are built on ordering). So MCV lists work well even for attributes where the
+ordering of the data type is disconnected from the meaning of the data. For
+example we know how to sort strings, but it's unlikely to make much sense for
+city names (or other label-like attributes).
+
+
+Selectivity estimation
+----------------------
+
+The estimation, implemented in clauselist_mv_selectivity_mcvlist(), is quite
+simple in principle - we need to identify MCV items matching all the clauses
+and sum frequencies of all those items.
+
+Currently MCV lists support estimation of the following clause types:
+
+ (a) equality clauses WHERE (a = 1) AND (b = 2)
+ (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ (d) OR clauses WHERE (a < 1) OR (b >= 2)
+
+It's possible to add support for additional clauses, for example:
+
+ (e) multi-var clauses WHERE (a > b)
+
+and possibly others. These are tasks for the future, not yet implemented.
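+
+As a minimal sketch (table and statistics names are illustrative only), the
+four supported clause types might be exercised like this:
+
+    CREATE STATISTICS s ON t (a, b) WITH (mcv);
+    ANALYZE t;
+
+    SELECT * FROM t WHERE (a = 1) AND (b >= 2);            -- (a) and (b)
+    SELECT * FROM t WHERE (a IS NULL) OR (b IS NOT NULL);  -- (c) and (d)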
+
+
+Estimating equality clauses
+---------------------------
+
+When computing selectivity estimate for equality clauses
+
+ (a = 1) AND (b = 2)
+
+we can compute this estimate quite accurately, assuming that two conditions are met:
+
+ (1) there's an equality condition on all attributes of the statistic
+
+ (2) we find a matching item in the MCV list
+
+In this case we know the MCV item represents all tuples matching the clauses,
+and the selectivity estimate is complete (i.e. we don't need to perform
+estimation using the histogram). This is what we call 'full match'.
+
+When only (1) holds and there's no matching MCV item, we don't know whether
+there are no such rows, or whether they are just not frequent enough. We can
+however use the frequency of the least frequent MCV item as an upper bound
+for the selectivity.
+
+For a combination of equality conditions (not full-match case) we can clamp the
+selectivity by the minimum of selectivities for each condition. For example if
+we know the number of distinct values for each column, we can use 1/ndistinct
+as a per-column estimate. Or rather 1/ndistinct + selectivity derived from the
+MCV list.
+
+We should also probably use only the 'residual ndistinct', excluding the items
+included in the MCV list (and also residual frequency):
+
+ f = (1.0 - sum(MCV frequencies)) / (ndistinct - ndistinct(MCV list))
+
+but it's worth pointing out the ndistinct values are multi-variate for the
+columns referenced by the equality conditions.
+
+Note: Only the "full match" limit is currently implemented.
+
+
+Hashed MCV (not yet implemented)
+--------------------------------
+
+Regular MCV lists have to include actual values for each item, so if those items
+are large the list may be quite large. This is especially true for multi-variate
+MCV lists, although the current implementation partially mitigates this by
+de-duplicating the values before storing them on disk.
+
+It's possible to only store hashes (32-bit values) instead of the actual values,
+significantly reducing the space requirements. Obviously, this would only make
+the MCV lists useful for estimating equality conditions (assuming the 32-bit
+hashes make the collisions rare enough).
+
+This might also complicate matching the columns to available stats.
+
+
+TODO Consider implementing hashed MCV list, storing just 32-bit hashes instead
+ of the actual values. This type of MCV list will be useful only for
+ estimating equality clauses, and will reduce space requirements for large
+ varlena types (in such cases we usually only want equality anyway).
+
+TODO Currently there's no logic to consider building only a MCV list (and not
+ building the histogram at all), except for making this decision manually in
+ ADD STATISTICS.
+
+
+Inspecting the MCV list
+-----------------------
+
+Inspecting the regular (per-attribute) MCV lists is trivial, as it's enough
+to select the columns from pg_stats - the data is encoded as anyarrays, so we
+simply get the text representation of the arrays.
+
+With multivariate MCV lists it's not that simple due to the possible mix of
+data types. It might be possible to produce similar array-like representation,
+but that'd unnecessarily complicate further processing and analysis of the MCV
+list. Instead, there's a SRF function providing values, frequencies etc.
+
+ SELECT * FROM pg_mv_mcv_items(oid);
+
+It has a single input parameter:
+
+ oid - OID of the MCV list (pg_mv_statistic.staoid)
+
+and produces a table with these columns:
+
+ - item ID (0...nitems-1)
+ - values (string array)
+ - nulls only (boolean array)
+ - frequency (double precision)
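+
+A usage sketch (assuming a statistics defined on table 't'; looking up the
+statistics OID through pg_mv_statistic is illustrative):
+
+    SELECT * FROM pg_mv_mcv_items(
+        (SELECT staoid FROM pg_mv_statistic
+          WHERE starelid = 't'::regclass));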
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index a38ea7b..5c5c59a 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -8,9 +8,50 @@ not true, resulting in estimation errors.
Multivariate stats track different types of dependencies between the columns,
hopefully improving the estimates.
-Currently we only have one kind of multivariate statistics - soft functional
-dependencies, and we use it to improve estimates of equality clauses. See
-README.dependencies for details.
+
+Types of statistics
+-------------------
+
+Currently we have only two kinds of multivariate statistics:
+
+ (a) soft functional dependencies (README.dependencies)
+
+ (b) MCV lists (README.mcv)
+
+
+Compatible clause types
+-----------------------
+
+Each type of statistics may be used to estimate some subset of clause types.
+
+ (a) functional dependencies - equality clauses (AND), possibly IS NULL
+
+ (b) MCV list - equality and inequality clauses, IS [NOT] NULL, AND/OR
+
+Currently only simple operator clauses (Var op Const) are supported, but it's
+possible to support more complex clause types, e.g. (Var op Var).
+
+
+Complex clauses
+---------------
+
+We also support estimating more complex clauses - essentially AND/OR clauses
+with (Var op Const) as leaves, as long as all the referenced attributes are
+covered by a single statistics.
+
+For example this condition
+
+ (a=1) AND ((b=2) OR ((c=3) AND (d=4)))
+
+may be estimated using statistics on (a,b,c,d). If we only have statistics on
+(b,c,d) we may estimate the second part, and estimate (a=1) using simple stats.
+
+If we only have statistics on (a,b,c) we can't apply it at all at this point,
+but it's worth pointing out that clauselist_selectivity() works recursively, and
+when handling the second part (the OR-clause) we'll be able to apply the
+statistics.
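+
+A sketch of the first case (assuming a table 't' with these four columns):
+
+    CREATE STATISTICS s ON t (a, b, c, d) WITH (mcv);
+    ANALYZE t;
+
+    SELECT * FROM t WHERE (a=1) AND ((b=2) OR ((c=3) AND (d=4)));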
+
+Note: The multi-statistics estimation patch also makes it possible to pass some
+clauses as 'conditions' into the deeper parts of the expression tree.
Selectivity estimation
@@ -23,14 +64,48 @@ When estimating selectivity, we aim to achieve several things:
(b) minimize the overhead, especially when no suitable multivariate stats
exist (so if you are not using multivariate stats, there's no overhead)
-This clauselist_selectivity() performs several inexpensive checks first, before
+Thus clauselist_selectivity() performs several inexpensive checks first, before
even attempting to do the more expensive estimation.
(1) check if there are multivariate stats on the relation
- (2) check there are at least two attributes referenced by clauses compatible
- with multivariate statistics (equality clauses for func. dependencies)
+ (2) check that there are functional dependencies on the table, and that
+ there are at least two attributes referenced by compatible clauses
+ (equality clauses for func. dependencies)
(3) perform reduction of equality clauses using func. dependencies
- (4) estimate the reduced list of clauses using regular statistics
+ (4) check that there are multivariate MCV lists on the table, and that
+ there are at least two attributes referenced by compatible clauses
+ (equalities, inequalities, etc.)
+
+ (5) find the best multivariate statistics (matching the most conditions)
+ and use it to compute the estimate
+
+ (6) estimate the remaining clauses (not estimated using multivariate stats)
+ using the regular per-column statistics
+
+Whenever we find there are no suitable stats, we skip the expensive steps.
+
+
+Further (possibly crazy) ideas
+------------------------------
+
+Currently the clauses are only estimated using a single statistics, even if
+there are multiple candidate statistics - for example assume we have statistics
+on (a,b,c) and (b,c,d), and estimate conditions
+
+ (b = 1) AND (c = 2)
+
+Then both statistics may be used, but we only use one of them. Maybe we could
+compute estimates using all the candidate stats, and somehow aggregate them
+into the final estimate, e.g. using the average or median.
+
+Some stats may give better estimates than others, but it's very difficult to say
+in advance which stats are the best (it depends on the number of buckets, number
+of additional columns not referenced in the clauses, type of condition etc.).
+
+But of course, this may result in expensive estimation (CPU-wise).
+
+So we might add a GUC to choose between the simple (single-statistics) and the
+multi-statistic estimation, possibly as a table-level parameter (ALTER TABLE ...).
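+
+Purely hypothetical syntax, just to illustrate the idea (neither the GUC nor
+the table-level parameter exists in this patch):
+
+    SET multivariate_estimation_mode = 'multi';  -- or 'single'
+    ALTER TABLE t SET (multivariate_estimation_mode = 'multi');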
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index bd200bc..d1da714 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -16,12 +16,14 @@
#include "common.h"
+#include "utils/array.h"
+
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
- int natts, VacAttrStats **vacattrstats);
+ int natts,
+ VacAttrStats **vacattrstats);
static List* list_mv_stats(Oid relid);
-
/*
* Compute requested multivariate stats, using the rows sampled for the
* plain (single-column) stats.
@@ -49,6 +51,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int j;
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
+ MCVList mcvlist = NULL;
+ int numrows_filtered = 0;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -87,8 +91,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ /* build the MCV list */
+ if (stat->mcv_enabled)
+ mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, attrs);
+ update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
}
}
@@ -166,6 +174,8 @@ list_mv_stats(Oid relid)
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
+ info->mcv_enabled = stats->mcv_enabled;
+ info->mcv_built = stats->mcv_built;
result = lappend(result, info);
}
@@ -180,8 +190,56 @@ list_mv_stats(Oid relid)
return result;
}
+
+/*
+ * Find attnums of MV stats using the mvoid.
+ */
+int2vector*
+find_mv_attnums(Oid mvoid, Oid *relid)
+{
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ HeapTuple htup;
+ int2vector *keys;
+
+ /* Fetch the pg_mv_statistic tuple for the given OID. */
+ htup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+ /* starelid */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_starelid, &isnull);
+ Assert(!isnull);
+
+ *relid = DatumGetObjectId(adatum);
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ keys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+ ReleaseSysCache(htup);
+
+ /* TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which served as an inspiration for this function)? */
+
+ return keys;
+}
+
+
void
-update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+update_mv_stats(Oid mvoid,
+ MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -206,18 +264,29 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
= PointerGetDatum(serialize_mv_dependencies(dependencies));
}
+ if (mcvlist != NULL)
+ {
+ bytea * data = serialize_mv_mcvlist(mcvlist, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stamcv -1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+ replaces[Anum_pg_mv_statistic_stamcv -1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
@@ -246,6 +315,21 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
heap_close(sd, RowExclusiveLock);
}
+
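+/*
+ * mv_get_index
+ *		Returns the index of the attnum within the (sorted) stakeys vector,
+ *		i.e. the dimension of the statistics the attribute maps to (computed
+ *		by counting the keys lower than varattno).
+ */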
+int
+mv_get_index(AttrNumber varattno, int2vector * stakeys)
+{
+ int i, idx = 0;
+ for (i = 0; i < stakeys->dim1; i++)
+ {
+ if (stakeys->values[i] < varattno)
+ idx += 1;
+ else
+ break;
+ }
+ return idx;
+}
+
/* multi-variate stats comparator */
/*
@@ -256,11 +340,15 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
int
compare_scalars_simple(const void *a, const void *b, void *arg)
{
- Datum da = *(Datum*)a;
- Datum db = *(Datum*)b;
- SortSupport ssup= (SortSupport) arg;
+ return compare_datums_simple(*(Datum*)a,
+ *(Datum*)b,
+ (SortSupport)arg);
+}
- return ApplySortComparator(da, false, db, false, ssup);
+int
+compare_datums_simple(Datum a, Datum b, SortSupport ssup)
+{
+ return ApplySortComparator(a, false, b, false, ssup);
}
/*
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
index 6d5465b..f4309f7 100644
--- a/src/backend/utils/mvstats/common.h
+++ b/src/backend/utils/mvstats/common.h
@@ -46,7 +46,15 @@ typedef struct
Datum value; /* a data value */
int tupno; /* position index for tuple it came from */
} ScalarItem;
-
+
+/* (de)serialization info */
+typedef struct DimensionInfo {
+ int nvalues; /* number of deduplicated values */
+ int nbytes; /* number of bytes (serialized) */
+ int typlen; /* pg_type.typlen */
+ bool typbyval; /* pg_type.typbyval */
+} DimensionInfo;
+
/* multi-sort */
typedef struct MultiSortSupportData {
int ndims; /* number of dimensions supported by the */
@@ -71,5 +79,6 @@ int multi_sort_compare_dim(int dim, const SortItem *a,
const SortItem *b, MultiSortSupport mss);
/* comparators, used when constructing multivariate stats */
+int compare_datums_simple(Datum a, Datum b, SortSupport ssup);
int compare_scalars_simple(const void *a, const void *b, void *arg);
int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/mcv.c b/src/backend/utils/mvstats/mcv.c
new file mode 100644
index 0000000..551c934
--- /dev/null
+++ b/src/backend/utils/mvstats/mcv.c
@@ -0,0 +1,1094 @@
+/*-------------------------------------------------------------------------
+ *
+ * mcv.c
+ * POSTGRES multivariate MCV lists
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mcv.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+
+/*
+ * Each serialized item needs to store (in this order):
+ *
+ * - indexes (ndim * sizeof(uint16))
+ * - null flags (ndim * sizeof(bool))
+ * - frequency (sizeof(double))
+ *
+ * So in total:
+ *
+ * ndim * (sizeof(uint16) + sizeof(bool)) + sizeof(double)
+ */
+#define ITEM_SIZE(ndims) \
+ (ndims * (sizeof(uint16) + sizeof(bool)) + sizeof(double))
+
+/* pointers into a flat serialized item of ITEM_SIZE(n) bytes */
+#define ITEM_INDEXES(item) ((uint16*)item)
+#define ITEM_NULLS(item,ndims) ((bool*)(ITEM_INDEXES(item) + ndims))
+#define ITEM_FREQUENCY(item,ndims) ((double*)(ITEM_NULLS(item,ndims) + ndims))
+
+/*
+ * Builds MCV list from sample rows, and removes rows represented by
+ * the MCV list from the sample (the number of remaining sample rows is
+ * returned by the numrows_filtered parameter).
+ *
+ * The method is quite simple - in short, it performs these steps:
+ *
+ * (1) sort the data (default collation, '<' for the data type)
+ *
+ * (2) count distinct groups, decide how many to keep
+ *
+ * (3) build the MCV list using the threshold determined in (2)
+ *
+ * (4) remove rows represented by the MCV from the sample
+ *
+ * For more details, see the comments in the code.
+ *
+ * FIXME Use max_mcv_items from ALTER TABLE ADD STATISTICS command.
+ *
+ * FIXME Single-dimensional MCV is sorted by frequency (descending). We
+ * should do that too, because when walking through the list we
+ * want to check the most frequent items first.
+ *
+ * TODO We're using Datum (8B), even for smaller data types (e.g. int4
+ * or float4). Maybe we could save some space here, but the bytea
+ * compression should handle it just fine.
+ *
+ * TODO This probably should not use ndistinct as computed from the
+ * sample directly, but rather an estimate of the number of
+ * distinct values in the whole table, no?
+ */
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ int ndistinct = 0;
+ int mcv_threshold = 0;
+ int count = 0;
+ int nitems = 0;
+
+ MCVList mcvlist = NULL;
+
+ /* Sort by multiple columns (using array of SortSupport) */
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * Preallocate space for all the items as a single chunk, and point
+ * the items to the appropriate parts of the array.
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * numattrs);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * numattrs);
+
+ /* keep all the rows by default (as if there was no MCV list) */
+ *numrows_filtered = numrows;
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* load the values/null flags from sample rows */
+ for (j = 0; j < numrows; j++)
+ for (i = 0; i < numattrs; i++)
+ items[j].values[i] = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+
+ /* prepare the sort functions for all the attributes */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* do the sort, using the multi-sort */
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Count the number of distinct groups - just walk through the
+ * sorted list and count the number of key changes. We use this to
+ * determine the threshold (125% of the average frequency).
+ */
+ ndistinct = 1;
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ ndistinct += 1;
+
+ /*
+ * Determine how many groups actually exceed the threshold, and then
+ * walk the array again and collect them into an array. We'll always
+ * require at least 4 rows per group.
+ *
+ * But if we can fit all the distinct values in the MCV list (i.e.
+ * if there are fewer distinct groups than MVSTAT_MCVLIST_MAX_ITEMS),
+ * we'll require only 2 rows per group.
+ *
+ * TODO For now the threshold is the same as in the single-column
+ * case (average + 25%), but maybe that's worth revisiting
+ * for the multivariate case.
+ *
+ * TODO We can do this only if we believe we got all the distinct
+ * values of the table.
+ *
+ * FIXME This should really reference mcv_max_items (from catalog)
+ * instead of the constant MVSTAT_MCVLIST_MAX_ITEMS.
+ */
+ mcv_threshold = 1.25 * numrows / ndistinct;
+ mcv_threshold = (mcv_threshold < 4) ? 4 : mcv_threshold;
+
+ if (ndistinct <= MVSTAT_MCVLIST_MAX_ITEMS)
+ mcv_threshold = 2;
+
+ /*
+ * Walk through the sorted data again, and see how many groups
+ * reach the mcv_threshold (and become an item in the MCV list).
+ */
+ count = 1;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or new group, so check if we exceed mcv_threshold */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* group hits the threshold, count the group as MCV item */
+ if (count >= mcv_threshold)
+ nitems += 1;
+
+ count = 1;
+ }
+ else /* within group, so increase the number of items */
+ count += 1;
+ }
+
+ /* we know the number of MCV list items, so let's build the list */
+ if (nitems > 0)
+ {
+ /* allocate the MCV list structure, set parameters we know */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ mcvlist->magic = MVSTAT_MCV_MAGIC;
+ mcvlist->type = MVSTAT_MCV_TYPE_BASIC;
+ mcvlist->ndimensions = numattrs;
+ mcvlist->nitems = nitems;
+
+ /*
+ * Preallocate the Datum/isnull arrays (not as a single chunk, as
+ * we'll pass this outside this method, so it needs to be easy to
+ * pfree() the data - and we wouldn't know where the arrays start).
+ *
+ * TODO Maybe the reasoning that we can't allocate a single
+ * piece because we're passing it out is bogus? Who'd
+ * free a single item of the MCV list, anyway?
+ *
+ * TODO Maybe with a proper encoding (stuffing all the values
+ * into a list-level array), this will no longer be true?
+ */
+ mcvlist->items = (MCVItem*)palloc0(sizeof(MCVItem)*nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ mcvlist->items[i] = (MCVItem)palloc0(sizeof(MCVItemData));
+ mcvlist->items[i]->values = (Datum*)palloc0(sizeof(Datum)*numattrs);
+ mcvlist->items[i]->isnull = (bool*)palloc0(sizeof(bool)*numattrs);
+ }
+
+ /*
+ * Repeat the same loop as above, but this time copy the data
+ * into the MCV list (for items exceeding the threshold).
+ *
+ * TODO Maybe we could simply remember indexes of the last item
+ * in each group (from the previous loop)?
+ */
+ count = 1;
+ nitems = 0;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or a new group */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* count the MCV item if exceeding the threshold (and copy into the array) */
+ if (count >= mcv_threshold)
+ {
+ /* just pointer to the proper place in the list */
+ MCVItem item = mcvlist->items[nitems];
+
+ /* copy values from the _previous_ group (last item of) */
+ memcpy(item->values, items[(i-1)].values, sizeof(Datum) * numattrs);
+ memcpy(item->isnull, items[(i-1)].isnull, sizeof(bool) * numattrs);
+
+
+ /* and finally the group frequency */
+ item->frequency = (double)count / numrows;
+
+ /* next item */
+ nitems += 1;
+ }
+
+ count = 1;
+ }
+ else /* same group, just increase the number of items */
+ count += 1;
+ }
+
+ /* make sure the loops are consistent */
+ Assert(nitems == mcvlist->nitems);
+
+ /*
+ * Remove the rows matching the MCV list (i.e. keep only rows
+ * that are not represented by the MCV list).
+ *
+ * FIXME This implementation is rather naive, effectively O(N^2).
+ * As the MCV list grows, the check will take longer and
+ * longer. And as the number of sampled rows increases (by
+ * increasing statistics target), it will take longer and
+ * longer. One option is to sort the MCV items first and
+ * then perform a binary search.
+ *
+ * A better option would be keeping the ID of the row in
+ * the sort item, and then just walk through the items and
+ * mark rows to remove (in a bitmap of the same size).
+ * There's no space for that in SortItem at this moment,
+ * but it's trivial to add a 'private' pointer, or to use
+ * another structure with an extra field (starting with
+ * SortItem, so that the comparators etc. still work).
+ *
+ * Another option is to use the sorted array of items
+ * (because that's how we sorted the source data), and
+ * simply do a bsearch() into it. If we find a matching
+ * item, the row belongs to the MCV list.
+ */
+ if (nitems == ndistinct) /* all rows are covered by MCV items */
+ *numrows_filtered = 0;
+ else /* (nitems < ndistinct) && (nitems > 0) */
+ {
+ int nfiltered = 0;
+ HeapTuple *rows_filtered = (HeapTuple*)palloc0(sizeof(HeapTuple) * numrows);
+
+ /* used for the searches */
+ SortItem item, mcvitem;
+
+ item.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ item.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /*
+ * FIXME we don't need to allocate this, we can reference
+ * the MCV item directly ...
+ */
+ mcvitem.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ mcvitem.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* walk through the tuples, compare the values to MCV items */
+ for (i = 0; i < numrows; i++)
+ {
+ bool match = false;
+
+ /* collect the key values from the row */
+ for (j = 0; j < numattrs; j++)
+ item.values[j] = heap_getattr(rows[i], attrs->values[j],
+ stats[j]->tupDesc, &item.isnull[j]);
+
+ /* scan through the MCV list for matches */
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ /*
+ * TODO Create a SortItem/MCVItem comparator so that
+ * we don't need to do memcpy() like crazy.
+ */
+ memcpy(mcvitem.values, mcvlist->items[j]->values,
+ numattrs * sizeof(Datum));
+ memcpy(mcvitem.isnull, mcvlist->items[j]->isnull,
+ numattrs * sizeof(bool));
+
+ if (multi_sort_compare(&item, &mcvitem, mss) == 0)
+ {
+ match = true;
+ break;
+ }
+ }
+
+ /* if no match in the MCV list, copy the row into the filtered ones */
+ if (! match)
+ memcpy(&rows_filtered[nfiltered++], &rows[i], sizeof(HeapTuple));
+ }
+
+ /* replace the rows and remember how many rows we kept */
+ memcpy(rows, rows_filtered, sizeof(HeapTuple) * nfiltered);
+ *numrows_filtered = nfiltered;
+
+ /* free all the data used here */
+ pfree(rows_filtered);
+ pfree(item.values);
+ pfree(item.isnull);
+ pfree(mcvitem.values);
+ pfree(mcvitem.isnull);
+ }
+ }
+
+ pfree(values);
+ pfree(items);
+ pfree(isnull);
+
+ return mcvlist;
+}
+
+
+/* fetch the MCV list (as a bytea) from the pg_mv_statistic catalog */
+MCVList
+load_mv_mcvlist(Oid mvoid)
+{
+ bool isnull = false;
+ Datum mcvlist;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* Prepare to scan pg_mv_statistic for entries having indrelid = this rel. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->mcv_enabled && mvstat->mcv_built);
+#endif
+
+ mcvlist = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stamcv, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_mcvlist(DatumGetByteaP(mcvlist));
+}
+
+/* print some basic info about the MCV list
+ *
+ * TODO Add info about what part of the table this covers.
+ */
+Datum
+pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MCVList mcvlist = deserialize_mv_mcvlist(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nitems=%d", mcvlist->nitems);
+
+ pfree(mcvlist);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* used to pass context into bsearch() */
+static SortSupport ssup_private = NULL;
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Serialize MCV list into a bytea value. The basic algorithm is simple:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ * (a) collect all (non-NULL) attribute values from all MCV items
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all MCV list items
+ * (a) replace values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, because we're mixing
+ * different datatypes, and we don't know what equality means for them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ * We'll use uint16 values for the indexes in step (3), as we don't
+ * allow more than 8k MCV items (see the max_mcv_items limit). We might
+ * increase this to 65k and still fit into uint16.
+ *
+ * We don't really expect compression as high as with histograms,
+ * because we're not doing any bucket splits etc. (which is the source
+ * of high redundancy there), but we need to do it anyway as we need
+ * to serialize varlena values etc. We might invent another way to
+ * serialize MCV lists, but let's keep it consistent.
+ *
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
+bytea *
+serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i, j;
+ int ndims = mcvlist->ndimensions;
+ int itemsize = ITEM_SIZE(ndims);
+
+ Size total_length = 0;
+
+ char *item = palloc0(itemsize);
+
+ /* serialized items (indexes into arrays, etc.) */
+ bytea *output;
+ char *data = NULL;
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /* allocate space for all values, including NULLs (won't use them) */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * mcvlist->nitems);
+
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ if (! mcvlist->items[j]->isnull[i]) /* skip NULL values */
+ {
+ values[i][counts[i]] = mcvlist->items[j]->values[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* do not exceed UINT16_MAX */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typbyval || (info[i].typlen > 0))
+ /* passed by value, or by reference with a fixed length */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so simply strlen */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j]));
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * MCV list, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nitems (4B)
+ * - info (ndim * sizeof(DimensionInfo)
+ * - arrays of values for each dimension
+ * - serialized items (nitems * itemsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data.
+ */
+ total_length = (sizeof(int32) + offsetof(MCVListData, items)
+ + ndims * sizeof(DimensionInfo)
+ + mcvlist->nitems * itemsize);
+
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 1MB */
+ if (total_length > 1024 * 1024)
+ elog(ERROR, "serialized MCV exceeds 1MB (%ld)", total_length);
+
+ /* allocate space for the serialized MCV list, set header fields */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write to */
+ data = VARDATA(output);
+
+ memcpy(data, mcvlist, offsetof(MCVListData, items));
+ data += offsetof(MCVListData, items);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - itemsize);
+
+ /* reset the values for each item */
+ memset(item, 0, itemsize);
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! mcvlist->items[i]->isnull[j])
+ {
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ v = (Datum*)bsearch(&mcvlist->items[i]->values[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ ITEM_INDEXES(item)[j] = (v - values[j]);
+
+ /* check the index is within expected bounds */
+ Assert(ITEM_INDEXES(item)[j] >= 0);
+ Assert(ITEM_INDEXES(item)[j] < info[j].nvalues);
+ }
+ }
+
+ /* copy NULL and frequency flags into the item */
+ memcpy(ITEM_NULLS(item, ndims),
+ mcvlist->items[i]->isnull, sizeof(bool) * ndims);
+ memcpy(ITEM_FREQUENCY(item, ndims),
+ &mcvlist->items[i]->frequency, sizeof(double));
+
+ /* copy the item into the array */
+ memcpy(data, item, itemsize);
+
+ data += itemsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ return output;
+}
+
+/*
+ * Inverse to serialize_mv_mcvlist() - see the comment there.
+ *
+ * We'll do a full deserialization, because we don't really expect high
+ * duplication of values, so caching may not be as efficient as with
+ * histograms.
+ */
+MCVList
+deserialize_mv_mcvlist(bytea * data)
+{
+ int i, j;
+ Size expected_size;
+ MCVList mcvlist;
+ char *tmp;
+
+ int ndims, nitems, itemsize;
+ DimensionInfo *info = NULL;
+
+ uint16 *indexes = NULL;
+ Datum **values = NULL;
+
+ /* local allocation buffer (used only for deserialization) */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ /* buffer used for the result */
+ int rbufflen;
+ char *rbuff;
+ char *rptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MCVListData,items))
+ elog(ERROR, "invalid MCV Size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MCVListData,items));
+
+ /* read the MCV list header */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(mcvlist, tmp, offsetof(MCVListData,items));
+ tmp += offsetof(MCVListData,items);
+
+ if (mcvlist->magic != MVSTAT_MCV_MAGIC)
+ elog(ERROR, "invalid MCV magic %d (expected %dd)",
+ mcvlist->magic, MVSTAT_MCV_MAGIC);
+
+ if (mcvlist->type != MVSTAT_MCV_TYPE_BASIC)
+ elog(ERROR, "invalid MCV type %d (expected %dd)",
+ mcvlist->type, MVSTAT_MCV_TYPE_BASIC);
+
+ nitems = mcvlist->nitems;
+ ndims = mcvlist->ndimensions;
+ itemsize = ITEM_SIZE(ndims);
+
+ Assert(nitems > 0);
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * What size do we expect with these parameters? (It's incomplete, as
+ * we have yet to add the array sizes from the DimensionInfo
+ * records.)
+ */
+ expected_size = offsetof(MCVListData,items) +
+ ndims * sizeof(DimensionInfo) +
+ (nitems * itemsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /*
+ * We'll allocate one large chunk of memory for the intermediate
+ * data, needed only for deserializing the MCV list, and we'll use
+ * a local dense allocation to minimize the palloc overhead.
+ *
+ * Let's see how much space we'll actually need, and also include
+ * space for the array with pointers.
+ */
+ bufflen = sizeof(Datum*) * ndims; /* space for pointers */
+
+ for (i = 0; i < ndims; i++)
+ /* for full-size byval types, we reuse the serialized value */
+ if (! (info[i].typbyval && info[i].typlen == sizeof(Datum)))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+
+ buff = palloc0(bufflen);
+ ptr = buff;
+
+ values = (Datum**)buff;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mcv_list(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. This needs to
+ * copy the pieces.
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum - simply reuse the array */
+ if (info[i].typlen == sizeof(Datum))
+ {
+ values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value from the serialized array */
+ memcpy(&values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ }
+ else
+ {
+ /* all the varlena data need a chunk from the buffer */
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ /* pased by reference, but fixed length (name, tid, ...) */
+ if (info[i].typlen > 0)
+ {
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ /* we should exhaust the buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ /* allocate space for the MCV items in a single piece */
+ rbufflen = (sizeof(MCVItem) + sizeof(MCVItemData) +
+ sizeof(Datum)*ndims + sizeof(bool)*ndims) * nitems;
+
+ rbuff = palloc(rbufflen);
+ rptr = rbuff;
+
+ mcvlist->items = (MCVItem*)rbuff;
+ rptr += (sizeof(MCVItem) * nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ MCVItem item = (MCVItem)rptr;
+ rptr += (sizeof(MCVItemData));
+
+ item->values = (Datum*)rptr;
+ rptr += (sizeof(Datum)*ndims);
+
+ item->isnull = (bool*)rptr;
+ rptr += (sizeof(bool) *ndims);
+
+ /* just point to the right place */
+ indexes = ITEM_INDEXES(tmp);
+
+ memcpy(item->isnull, ITEM_NULLS(tmp, ndims), sizeof(bool) * ndims);
+ memcpy(&item->frequency, ITEM_FREQUENCY(tmp, ndims), sizeof(double));
+
+#ifdef USE_ASSERT_CHECKING
+ for (j = 0; j < ndims; j++)
+ Assert(indexes[j] <= UINT16_MAX);
+#endif
+
+ /* translate the values */
+ for (j = 0; j < ndims; j++)
+ if (! item->isnull[j])
+ item->values[j] = values[j][indexes[j]];
+
+ mcvlist->items[i] = item;
+
+ tmp += ITEM_SIZE(ndims);
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+ }
+
+ /* check that we processed all the data */
+ Assert(tmp == (char*)data + VARSIZE_ANY(data));
+
+ /* release the temporary buffer */
+ pfree(buff);
+
+ return mcvlist;
+}
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
+
+/*
+ * SRF with details about the items of an MCV list:
+ *
+ * - item ID (0...nitems-1)
+ * - values (string array)
+ * - nulls only (boolean array)
+ * - frequency (double precision)
+ *
+ * The input is the OID of the statistics, and no rows are returned if
+ * the statistics contains no MCV list.
+ */
+PG_FUNCTION_INFO_V1(pg_mv_mcv_items);
+
+Datum
+pg_mv_mcv_items(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MCVList mcvlist;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ mcvlist = load_mv_mcvlist(PG_GETARG_OID(0));
+
+ funcctx->user_fctx = mcvlist;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = mcvlist->nitems;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples
+ * from raw C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MCVList mcvlist;
+ MCVItem item;
+
+ mcvlist = (MCVList)funcctx->user_fctx;
+
+ Assert(call_cntr < mcvlist->nitems);
+
+ item = mcvlist->items[call_cntr];
+
+ stakeys = find_mv_attnums(PG_GETARG_OID(0), &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple.
+ * This should be an array of C strings which will
+ * be processed later by the type input functions.
+ */
+ values = (char **) palloc(4 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+
+ /* frequency */
+ values[3] = (char *) palloc(64 * sizeof(char));
+
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * mcvlist->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * mcvlist->ndimensions);
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* item ID */
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ Datum val, valout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == mcvlist->ndimensions-1)
+ format = "%s, %s}";
+
+ val = item->values[i];
+ valout = FunctionCall1(&fmgrinfo[i], val);
+
+ snprintf(buff, 1024, format, values[1], DatumGetPointer(valout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], item->isnull[i] ? "t" : "f");
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[3], 64, "%f", item->frequency); /* frequency */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 4f106c3..6339631 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2109,8 +2109,9 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
- " deps_enabled,\n"
- " deps_built,\n"
+ " deps_enabled, mcv_enabled,\n"
+ " deps_built, mcv_built,\n"
+ " mcv_max_items,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
@@ -2128,6 +2129,8 @@ describeOneTableDetails(const char *schemaname,
printTableAddFooter(&cont, _("Statistics:"));
for (i = 0; i < tuples; i++)
{
+ bool first = true;
+
printfPQExpBuffer(&buf, " ");
/* statistics name (qualified with namespace) */
@@ -2137,10 +2140,22 @@ describeOneTableDetails(const char *schemaname,
/* options */
if (!strcmp(PQgetvalue(result, i, 4), "t"))
- appendPQExpBuffer(&buf, "(dependencies)");
+ {
+ appendPQExpBuffer(&buf, "(dependencies");
+ first = false;
+ }
+
+ if (!strcmp(PQgetvalue(result, i, 5), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", mcv");
+ else
+ appendPQExpBuffer(&buf, "(mcv");
+ first = false;
+ }
- appendPQExpBuffer(&buf, " ON (%s)",
- PQgetvalue(result, i, 6));
+ appendPQExpBuffer(&buf, ") ON (%s)",
+ PQgetvalue(result, i, 9));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index a568a07..fd7107d 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -37,15 +37,21 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
+ bool mcv_enabled; /* build MCV list? */
+
+ /* MCV size */
+ int32 mcv_max_items; /* max MCV items */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
+ bool mcv_built; /* MCV list was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
+ bytea stamcv; /* MCV list (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -61,13 +67,17 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 7
+#define Natts_pg_mv_statistic 11
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
#define Anum_pg_mv_statistic_deps_enabled 4
-#define Anum_pg_mv_statistic_deps_built 5
-#define Anum_pg_mv_statistic_stakeys 6
-#define Anum_pg_mv_statistic_stadeps 7
+#define Anum_pg_mv_statistic_mcv_enabled 5
+#define Anum_pg_mv_statistic_mcv_max_items 6
+#define Anum_pg_mv_statistic_deps_built 7
+#define Anum_pg_mv_statistic_mcv_built 8
+#define Anum_pg_mv_statistic_stakeys 9
+#define Anum_pg_mv_statistic_stadeps 10
+#define Anum_pg_mv_statistic_stamcv 11
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 20d565c..66b4bcd 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2670,6 +2670,10 @@ DATA(insert OID = 3998 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0
DESCR("multivariate stats: functional dependencies info");
DATA(insert OID = 3999 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
DESCR("multivariate stats: functional dependencies show");
+DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_mcvlist_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: MCV list info");
+DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
+DESCR("details about MCV list items");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index de86d01..5ae6b3c 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -619,9 +619,11 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
+ bool mcv_enabled; /* MCV list enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
+ bool mcv_built; /* MCV list built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index cc43a79..4535db7 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -51,30 +51,89 @@ typedef MVDependenciesData* MVDependencies;
#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
/*
+ * Multivariate MCV (most-common value) lists
+ *
+ * A straightforward extension of MCV items - i.e. a list (array) of
+ * combinations of attribute values, together with a frequency and
+ * null flags.
+ */
+typedef struct MCVItemData {
+ double frequency; /* frequency of this combination */
+ bool *isnull; /* flags of NULL values (up to 32 columns) */
+ Datum *values; /* variable-length (ndimensions) */
+} MCVItemData;
+
+typedef MCVItemData *MCVItem;
+
+/* multivariate MCV list - essentially an array of MCV items */
+typedef struct MCVListData {
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of MCV list (BASIC) */
+ uint32 ndimensions; /* number of dimensions */
+ uint32 nitems; /* number of MCV items in the array */
+ MCVItem *items; /* array of MCV items */
+} MCVListData;
+
+typedef MCVListData *MCVList;
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_MCV_MAGIC 0xE1A651C2 /* marks serialized bytea */
+#define MVSTAT_MCV_TYPE_BASIC 1 /* basic MCV list type */
+
+/*
+ * Limits used for mcv_max_items option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_MCVLIST_MIN_ITEMS, and we cannot
+ * have more than MVSTAT_MCVLIST_MAX_ITEMS items.
+ *
+ * This is just a boundary for the 'max' threshold - the actual list
+ * may of course contain fewer items than MVSTAT_MCVLIST_MIN_ITEMS.
+ */
+#define MVSTAT_MCVLIST_MIN_ITEMS 128 /* min items in MCV list */
+#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
*/
MVDependencies load_mv_dependencies(Oid mvoid);
+MCVList load_mv_mcvlist(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
+bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
+MCVList deserialize_mv_mcvlist(bytea * data);
+
+/*
+ * Returns the index of the attribute number within the vector (i.e. the
+ * dimension within the stats).
+ */
+int mv_get_index(AttrNumber varattno, int2vector * stakeys);
+
+int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
/* FIXME this probably belongs somewhere else (not to operations stats) */
extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_mcv_items(PG_FUNCTION_ARGS);
MVDependencies
-build_mv_dependencies(int numrows, HeapTuple *rows,
- int2vector *attrs,
- VacAttrStats **stats);
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats);
+
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered);
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
- int natts, VacAttrStats **vacattrstats);
+ int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, int2vector *attrs);
+void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats);
#endif
diff --git a/src/test/regress/expected/mv_mcv.out b/src/test/regress/expected/mv_mcv.out
new file mode 100644
index 0000000..56748e3
--- /dev/null
+++ b/src/test/regress/expected/mv_mcv.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s1 ON mcv_list (unknown_column) WITH (mcv);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s1 ON mcv_list (a) WITH (mcv);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s1 ON mcv_list (a, a) WITH (mcv);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON mcv_list (a, a, b) WITH (mcv);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing MCV statistics
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (dependencies, max_mcv_items=200);
+ERROR: option 'mcv' is required by other options(s)
+-- invalid max_mcv_items value / too low
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10);
+ERROR: max number of MCV items must be at least 128
+-- invalid max_mcv_items value / too high
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10000);
+ERROR: max number of MCV items is 8192
+-- correct command
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s2 ON mcv_list (a, b, c) WITH (mcv);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s3 ON mcv_list (a, b, c, d) WITH (mcv);
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1200
+(1 row)
+
+DROP TABLE mcv_list;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 84b4425..66071d8 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1373,7 +1373,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
s.staname,
s.stakeys AS attnums,
length(s.stadeps) AS depsbytes,
- pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
+ length(s.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 4f2ffb8..85d94f1 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,4 +112,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies
+test: mv_dependencies mv_mcv
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 097a04f..6584d73 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -163,3 +163,4 @@ test: xml
test: event_trigger
test: stats
test: mv_dependencies
+test: mv_mcv
diff --git a/src/test/regress/sql/mv_mcv.sql b/src/test/regress/sql/mv_mcv.sql
new file mode 100644
index 0000000..af4c9f4
--- /dev/null
+++ b/src/test/regress/sql/mv_mcv.sql
@@ -0,0 +1,178 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s1 ON mcv_list (unknown_column) WITH (mcv);
+
+-- single column
+CREATE STATISTICS s1 ON mcv_list (a) WITH (mcv);
+
+-- single column, duplicated
+CREATE STATISTICS s1 ON mcv_list (a, a) WITH (mcv);
+
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON mcv_list (a, a, b) WITH (mcv);
+
+-- unknown option
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (unknown_option);
+
+-- missing MCV statistics
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (dependencies, max_mcv_items=200);
+
+-- invalid max_mcv_items value / too low
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10);
+
+-- invalid max_mcv_items value / too high
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10000);
+
+-- correct command
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+
+DROP TABLE mcv_list;
+
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s2 ON mcv_list (a, b, c) WITH (mcv);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mcv_list;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s3 ON mcv_list (a, b, c, d) WITH (mcv);
+
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+DROP TABLE mcv_list;
--
2.1.0
0005-multivariate-histograms.patch
From 820bbb5d00ee143d32dac16f48776346a2ddd81c Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 20:18:24 +0100
Subject: [PATCH 5/9] multivariate histograms
- extends the pg_mv_statistic catalog (add 'hist' fields)
- building the histograms during ANALYZE
- simple estimation while planning the queries
Includes regression tests mostly equal to those for functional
dependencies / MCV lists.
---
doc/src/sgml/ref/create_statistics.sgml | 18 +
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/statscmds.c | 44 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 606 ++++++++-
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/README.histogram | 287 ++++
src/backend/utils/mvstats/README.stats | 2 +
src/backend/utils/mvstats/common.c | 37 +-
src/backend/utils/mvstats/histogram.c | 2032 ++++++++++++++++++++++++++++
src/bin/psql/describe.c | 17 +-
src/include/catalog/pg_mv_statistic.h | 24 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 136 +-
src/test/regress/expected/mv_histogram.out | 207 +++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_histogram.sql | 176 +++
21 files changed, 3570 insertions(+), 41 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.histogram
create mode 100644 src/backend/utils/mvstats/histogram.c
create mode 100644 src/test/regress/expected/mv_histogram.out
create mode 100644 src/test/regress/sql/mv_histogram.sql
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index 193e4b0..fd3382e 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -133,6 +133,24 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</varlistentry>
<varlistentry>
+ <term><literal>histogram</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables histogram for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>max_buckets</> (<type>integer</>)</term>
+ <listitem>
+ <para>
+ Maximum number of histogram buckets.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>max_mcv_items</> (<type>integer</>)</term>
<listitem>
<para>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2d570ee..6afdee0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -167,7 +167,9 @@ CREATE VIEW pg_mv_stats AS
length(S.stadeps) as depsbytes,
pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
length(S.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo,
+ length(S.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 90bfaed..b974655 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -137,12 +137,15 @@ CreateStatistics(CreateStatsStmt *stmt)
/* by default build nothing */
bool build_dependencies = false,
- build_mcv = false;
+ build_mcv = false,
+ build_histogram = false;
- int32 max_mcv_items = -1;
+ int32 max_buckets = -1,
+ max_mcv_items = -1;
/* options required because of other options */
- bool require_mcv = false;
+ bool require_mcv = false,
+ require_histogram = false;
Assert(IsA(stmt, CreateStatsStmt));
@@ -241,6 +244,29 @@ CreateStatistics(CreateStatsStmt *stmt)
MVSTAT_MCVLIST_MAX_ITEMS)));
}
+ else if (strcmp(opt->defname, "histogram") == 0)
+ build_histogram = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_buckets") == 0)
+ {
+ max_buckets = defGetInt32(opt);
+
+ /* this option requires 'histogram' to be enabled */
+ require_histogram = true;
+
+ /* sanity check */
+ if (max_buckets < MVSTAT_HIST_MIN_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is %d",
+ MVSTAT_HIST_MIN_BUCKETS)));
+
+ else if (max_buckets > MVSTAT_HIST_MAX_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("maximum number of buckets is %d",
+ MVSTAT_HIST_MAX_BUCKETS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -249,10 +275,10 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv))
+ if (! (build_dependencies || build_mcv || build_histogram))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -260,6 +286,11 @@ CreateStatistics(CreateStatsStmt *stmt)
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("option 'mcv' is required by other options(s)")));
+ if (require_histogram && (! build_histogram))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option 'histogram' is required by other options(s)")));
+
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
stakeys = buildint2vector(attnums, numcols);
@@ -279,11 +310,14 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+ values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
+ values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
+ nulls[Anum_pg_mv_statistic_stahist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e3983fd..d3a96f0 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1978,10 +1978,12 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
WRITE_BOOL_FIELD(mcv_enabled);
+ WRITE_BOOL_FIELD(hist_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
WRITE_BOOL_FIELD(mcv_built);
+ WRITE_BOOL_FIELD(hist_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index ce7d231..647212a 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -49,6 +49,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
+#define MV_CLAUSE_TYPE_HIST 0x04
static bool clause_is_mv_compatible(Node *clause, Oid varRelid,
Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
@@ -76,6 +77,8 @@ static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
List *clauses, MVStatisticInfo *mvstats,
bool *fullmatch, Selectivity *lowsel);
+static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -83,6 +86,12 @@ static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
Selectivity *lowsel, bool *fullmatch,
bool is_or);
+static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, List *clauses,
@@ -117,6 +126,7 @@ static Bitmapset * get_varattnos(Node * node, Index relid);
#define UPDATE_RESULT(m,r,isor) \
(m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -145,7 +155,7 @@ static Bitmapset * get_varattnos(Node * node, Index relid);
*
* First we try to reduce the list of clauses by applying (soft) functional
* dependencies, and then we try to estimate the selectivity of the reduced
- * list of clauses using the multivariate MCV list.
+ * list of clauses using the multivariate MCV list and histograms.
*
* Finally we remove the portion of clauses estimated using multivariate stats,
* and process the rest of the clauses using the regular per-column stats.
@@ -232,12 +242,13 @@ clauselist_selectivity(PlannerInfo *root,
* with the multivariate code and simply skip to estimation using the
* regular per-column stats.
*/
- if (has_stats(stats, MV_CLAUSE_TYPE_MCV) &&
- (count_mv_attnums(clauses, varRelid, sjinfo, MV_CLAUSE_TYPE_MCV) >= 2))
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) &&
+ (count_mv_attnums(clauses, varRelid, sjinfo,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2))
{
/* collect attributes from the compatible conditions */
Bitmapset *mvattnums = collect_mv_attnums(clauses, varRelid, NULL, sjinfo,
- MV_CLAUSE_TYPE_MCV);
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
/* and search for the statistic covering the most attributes */
MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
@@ -249,8 +260,8 @@ clauselist_selectivity(PlannerInfo *root,
/* split the clauselist into regular and mv-clauses */
clauses = clauselist_mv_split(root, sjinfo, clauses,
- varRelid, &mvclauses, mvstat,
- MV_CLAUSE_TYPE_MCV);
+ varRelid, &mvclauses, mvstat,
+ (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
/* we've chosen the histogram to match the clauses */
Assert(mvclauses != NIL);
@@ -962,6 +973,7 @@ static Selectivity
clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
{
bool fullmatch = false;
+ Selectivity s1 = 0.0, s2 = 0.0;
/*
* Lowest frequency in the MCV list (may be used as an upper bound
@@ -975,9 +987,24 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
* MCV/histogram evaluation).
*/
- /* Evaluate the MCV selectivity */
- return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ /* Evaluate the MCV first. */
+ s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
&fullmatch, &mcv_low);
+
+ /*
+ * If we got a full equality match on the MCV list, we're done (and
+ * the estimate is pretty good).
+ */
+ if (fullmatch && (s1 > 0.0))
+ return s1;
+
+ /* FIXME if (fullmatch) without matching MCV item, use the mcv_low
+ * selectivity as upper bound */
+
+ s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+
+ /* TODO clamp to <= 1.0 (or more strictly, when possible) */
+ return s1 + s2;
}
/*
@@ -1136,7 +1163,7 @@ choose_mv_statistics(List *stats, Bitmapset *attnums)
int numattrs = attrs->dim1;
/* skip dependencies-only stats */
- if (! info->mcv_built)
+ if (! (info->mcv_built || info->hist_built))
continue;
/* count columns covered by the histogram */
@@ -1296,7 +1323,6 @@ clause_is_mv_compatible(Node *clause, Oid varRelid,
bool ok;
/* is it 'variable op constant' ? */
-
ok = (bms_membership(clause_relids) == BMS_SINGLETON) &&
(is_pseudo_constant_clause_relids(lsecond(expr->args),
right_relids) ||
@@ -1346,10 +1372,10 @@ clause_is_mv_compatible(Node *clause, Oid varRelid,
case F_SCALARLTSEL:
case F_SCALARGTSEL:
/* not compatible with functional dependencies */
- if (types & MV_CLAUSE_TYPE_MCV)
+ if (types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST))
{
*attnums = bms_add_member(*attnums, var->varattno);
- return (types & MV_CLAUSE_TYPE_MCV);
+ return (types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
}
return false;
@@ -1651,6 +1677,9 @@ has_stats(List *stats, int type)
if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
return true;
+
+ if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
+ return true;
}
return false;
@@ -2467,3 +2496,556 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
return nmatches;
}
+
+/*
+ * Estimate selectivity of clauses using a histogram.
+ *
+ * If there's no histogram for the stats, the function returns 0.0.
+ *
+ * The general idea of this method is similar to how MCV lists are
+ * processed, except that this introduces the concept of a partial
+ * match (MCV only works with full match / mismatch).
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all buckets as 'full match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the buckets
+ * 4) skip buckets that are already 'no match'
+ * 5) check clause for buckets that still match (at least partially)
+ * 6) sum frequencies for buckets to get selectivity
+ *
+ * Unlike MCV lists, histograms have a concept of a partial match. In
+ * that case we use 1/2 the bucket, to minimize the average error. The
+ * MV histograms are usually less detailed than the per-column ones,
+ * meaning the sum is often quite high (thanks to combining a lot of
+ * "partially hit" buckets).
+ *
+ * Maybe we could use per-bucket information with number of distinct
+ * values it contains (for each dimension), and then use that to correct
+ * the estimate (so with 10 distinct values, we'd use 1/10 of the bucket
+ * frequency). We might also scale the value depending on the actual
+ * ndistinct estimate (not just the values observed in the sample).
+ *
+ * Another option would be to multiply the selectivities, i.e. if we get
+ * 'partial match' for a bucket for multiple conditions, we might use
+ * 0.5^k (where k is the number of conditions), instead of 0.5. This
+ * probably does not minimize the average error, though.
+ *
+ * TODO This might use a similar shortcut to MCV lists - count buckets
+ * marked as partial/full match, and terminate once this drops to 0.
+ * Not sure if it's really worth it - for MCV lists a situation like
+ * this is not uncommon, but for histograms it's not that clear.
+ */
+static Selectivity
+clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats)
+{
+ int i;
+ Selectivity s = 0.0;
+ Selectivity u = 0.0;
+
+ int nmatches = 0;
+ char *matches = NULL;
+
+ MVSerializedHistogram mvhist = NULL;
+
+ /* there's no histogram */
+ if (! mvstats->hist_built)
+ return 0.0;
+
+ /* There may be no histogram in the stats (check hist_built flag) */
+ mvhist = load_mv_histogram(mvstats->mvoid);
+
+ Assert (mvhist != NULL);
+ Assert (clauses != NIL);
+ Assert (list_length(clauses) >= 2);
+
+ /*
+ * Bitmap of bucket matches (mismatch, partial, full). By default
+ * all buckets fully match, and the clauses gradually eliminate them.
+ */
+ matches = palloc0(sizeof(char) * mvhist->nbuckets);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+
+ nmatches = mvhist->nbuckets;
+
+ /* build the match bitmap */
+ update_match_bitmap_histogram(root, clauses,
+ mvstats->stakeys, mvhist,
+ nmatches, matches, false);
+
+ /* now, walk through the buckets and sum the selectivities */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * Find out what part of the data is covered by the histogram,
+ * so that we can 'scale' the selectivity properly (e.g. when
+ * only 50% of the sample got into the histogram, and the rest
+ * is in an MCV list).
+ *
+ * TODO This might be handled by keeping a global "frequency"
+ * for the whole histogram, which might save us some time
+ * spent accessing the not-matching part of the histogram.
+ * Although it's likely in a cache, so it's very fast.
+ */
+ u += mvhist->buckets[i]->ntuples;
+
+ if (matches[i] == MVSTATS_MATCH_FULL)
+ s += mvhist->buckets[i]->ntuples;
+ else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+ s += 0.5 * mvhist->buckets[i]->ntuples;
+ }
+
+#ifdef DEBUG_MVHIST
+ debug_histogram_matches(mvhist, matches);
+#endif
+
+ /* release the allocated bitmap and deserialized histogram */
+ pfree(matches);
+ pfree(mvhist);
+
+ return s * u;
+}
+
+#define HIST_CACHE_NOT_FOUND 0x00
+#define HIST_CACHE_FALSE 0x01
+#define HIST_CACHE_TRUE 0x03
+#define HIST_CACHE_MASK 0x02
+
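+/*
+ * Illustrative sketch (matching the defines above) of how one call-cache
+ * byte decodes: the HIST_CACHE_FALSE bit says "already called", the
+ * HIST_CACHE_MASK bit carries the cached result.
+ *
+ *     if (cached == HIST_CACHE_NOT_FOUND)
+ *         ... call the operator, store HIST_CACHE_TRUE or HIST_CACHE_FALSE ...
+ *     else
+ *         result = ((cached & HIST_CACHE_MASK) != 0);
+ */
+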
+static char bucket_contains_value(FmgrInfo ltproc, Datum constvalue,
+ Datum min_value, Datum max_value,
+ int min_index, int max_index,
+ bool min_include, bool max_include,
+ char * callcache);
+
+static char bucket_is_smaller_than_value(FmgrInfo opproc, Datum constvalue,
+ Datum min_value, Datum max_value,
+ int min_index, int max_index,
+ bool min_include, bool max_include,
+ char * callcache, bool isgt);
+
+/*
+ * Evaluate clauses using the histogram, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * Note: This is not a simple bitmap in the sense that there are more
+ * than two possible values for each item - no match, partial
+ * match and full match. So we need 2 bits per item.
+ *
+ * TODO This works with 'bitmap' where each item is represented as a
+ * char, which is slightly wasteful. Instead, we could use a bitmap
+ * with 2 bits per item, reducing the size to ~1/4. By using values
+ * 0, 1 and 3 (instead of 0, 1 and 2), the operations (merging etc.)
+ * might be performed just like for simple bitmap by using & and |,
+ * which might be faster than min/max.
+ */
+static int
+update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ /*
+ * Used for caching function calls, only once per deduplicated value.
+ *
+ * We may have up to (2 * nbuckets) values per dimension. It's
+ * probably overkill, but let's allocate that once for all clauses,
+ * to minimize overhead.
+ *
+ * Also, we only need two bits per value, but this allocates a byte
+ * per value. Might be worth optimizing.
+ *
+ * 0x00 - not yet called
+ * 0x01 - called, result is 'false'
+ * 0x03 - called, result is 'true'
+ */
+ char *callcache = palloc(2 * mvhist->nbuckets);
+
+ Assert(mvhist != NULL);
+ Assert(mvhist->nbuckets > 0);
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mvhist->nbuckets);
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+
+ /* loop through the clauses and do the estimation */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* it's either OpClause, or NullTest */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ FmgrInfo opproc; /* operator */
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ /* reset the cache (per clause) */
+ memset(callcache, 0, 2 * mvhist->nbuckets);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+ FmgrInfo ltproc;
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ /*
+ * TODO Fetch only when really needed (probably for equality only)
+ *
+ * TODO Technically either lt/gt is sufficient.
+ *
+ * FIXME The code in analyze.c creates histograms only for types
+ * with enough ordering (by calling get_sort_group_operators).
+ * Is this the same assumption, i.e. are we certain that we
+ * get the ltproc/gtproc every time we ask? Or are there types
+ * where get_sort_group_operators returns ltopr and here we
+ * get nothing?
+ */
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_EQ_OPR | TYPECACHE_LT_OPR
+ | TYPECACHE_GT_OPR);
+
+ /* lookup dimension for the attribute */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), <proc);
+
+ /*
+ * Check this for all buckets that still have "true" in the bitmap
+ *
+ * We already know the clauses use suitable operators (because that's
+ * how we filtered them).
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ char res = MVSTATS_MATCH_NONE;
+
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /* histogram boundaries */
+ Datum minval, maxval;
+ bool mininclude, maxinclude;
+ int minidx, maxidx;
+
+ /*
+ * For AND-lists, we can also mark NULL buckets as 'no match'
+ * (and then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && bucket->nullsonly[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match).
+ * We can't really do anything about the MATCH_PARTIAL buckets.
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* lookup the values and cache of function calls */
+ minidx = bucket->min[idx];
+ maxidx = bucket->max[idx];
+
+ minval = mvhist->values[idx][bucket->min[idx]];
+ maxval = mvhist->values[idx][bucket->max[idx]];
+
+ mininclude = bucket->min_inclusive[idx];
+ maxinclude = bucket->max_inclusive[idx];
+
+ /*
+ * TODO Maybe it's possible to add here a similar optimization
+ * as for the MCV lists:
+ *
+ * (nmatches == 0) && AND-list => all eliminated (FALSE)
+ * (nmatches == N) && OR-list => all eliminated (TRUE)
+ *
+ * But it's more complex because of the partial matches.
+ */
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ *
+ * TODO I'm really unsure whether the handling of the 'isgt' flag (that is, clauses
+ * with reverse order of variable/constant) is correct. I wouldn't
+ * be surprised if there was some mixup. Using the lt/gt operators
+ * instead of messing with the opproc could make it simpler.
+ * It would however be using a different operator than the query,
+ * although it's not any shadier than using the selectivity function
+ * as is done currently.
+ */
+ switch (oprrest)
+ {
+ case F_SCALARLTSEL: /* Var < Const */
+
+ res = bucket_is_smaller_than_value(opproc, cst->constvalue,
+ minval, maxval,
+ minidx, maxidx,
+ mininclude, maxinclude,
+ callcache, isgt);
+ break;
+
+ case F_SCALARGTSEL: /* Const < Var */
+
+ res = bucket_is_smaller_than_value(opproc, cst->constvalue,
+ minval, maxval,
+ minidx, maxidx,
+ mininclude, maxinclude,
+ callcache, isgt);
+ break;
+
+ case F_EQSEL:
+
+ /*
+ * We only check whether the value is within the bucket, using the
+ * lt operator, and we also check for equality with the boundaries.
+ */
+
+ res = bucket_contains_value(ltproc, cst->constvalue,
+ minval, maxval,
+ minidx, maxidx,
+ mininclude, maxinclude,
+ callcache);
+ break;
+ }
+
+ UPDATE_RESULT(matches[i], res, is_or);
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the buckets and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining buckets that might possibly match.
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match)
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* if the clause mismatches the MCV item, set it as MATCH_NONE */
+ if ((expr->nulltesttype == IS_NULL)
+ && (! bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+
+ else if ((expr->nulltesttype == IS_NOT_NULL) &&
+ (bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each bucket */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching buckets */
+ or_nmatches = mvhist->nbuckets;
+
+ /* by default none of the buckets matches the clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the OR-clauses */
+ or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ stakeys, mvhist,
+ or_nmatches, or_matches, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+
+ /* free the call cache */
+ pfree(callcache);
+
+ return nmatches;
+}
+
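+/*
+ * A minimal sketch of the 2-bit encoding suggested in the TODO above
+ * update_match_bitmap_histogram(): with MATCH_NONE = 0 (binary 00),
+ * MATCH_PARTIAL = 1 (binary 01) and MATCH_FULL = 3 (binary 11), the
+ * AND-merge (MIN) becomes bitwise AND and the OR-merge (MAX) becomes
+ * bitwise OR:
+ *
+ *     merged = is_or ? (m | r) : (m & r);
+ *
+ * e.g. FULL & PARTIAL = 11 & 01 = 01 = PARTIAL, and NONE | PARTIAL
+ * = 00 | 01 = 01 = PARTIAL, which matches the MIN/MAX semantics of
+ * UPDATE_RESULT.
+ */
+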
+static char
+bucket_contains_value(FmgrInfo ltproc, Datum constvalue,
+ Datum min_value, Datum max_value,
+ int min_index, int max_index,
+ bool min_include, bool max_include,
+ char * callcache)
+{
+ bool a, b;
+
+ char min_cached = callcache[min_index];
+ char max_cached = callcache[max_index];
+
+ /*
+ * First some quick checks on equality - if any of the boundaries equals,
+ * we have a partial match (so no need to call the comparator).
+ */
+ if (((min_value == constvalue) && (min_include)) ||
+ ((max_value == constvalue) && (max_include)))
+ return MVSTATS_MATCH_PARTIAL;
+
+ /* Keep the values 0/1 because of the XOR at the end. */
+ a = ((min_cached & HIST_CACHE_MASK) >> 1);
+ b = ((max_cached & HIST_CACHE_MASK) >> 1);
+
+ /*
+ * If the result for the bucket lower bound is not in the cache, evaluate
+ * the function and store the result in the cache.
+ */
+ if (! min_cached)
+ {
+ a = DatumGetBool(FunctionCall2Coll(<proc,
+ DEFAULT_COLLATION_OID,
+ constvalue, min_value));
+ /* remember the result */
+ callcache[min_index] = (a) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ /* And do the same for the upper bound. */
+ if (! max_cached)
+ {
+ b = DatumGetBool(FunctionCall2Coll(<proc,
+ DEFAULT_COLLATION_OID,
+ constvalue, max_value));
+ /* remember the result */
+ callcache[max_index] = (b) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ return (a ^ b) ? MVSTATS_MATCH_PARTIAL : MVSTATS_MATCH_NONE;
+}
+
+static char
+bucket_is_smaller_than_value(FmgrInfo opproc, Datum constvalue,
+ Datum min_value, Datum max_value,
+ int min_index, int max_index,
+ bool min_include, bool max_include,
+ char * callcache, bool isgt)
+{
+ char min_cached = callcache[min_index];
+ char max_cached = callcache[max_index];
+
+ /* Keep the values 0/1 because of the XOR at the end. */
+ bool a = ((min_cached & HIST_CACHE_MASK) >> 1);
+ bool b = ((max_cached & HIST_CACHE_MASK) >> 1);
+
+ if (! min_cached)
+ {
+ a = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ min_value,
+ constvalue));
+ /* remember the result */
+ callcache[min_index] = (a) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ if (! max_cached)
+ {
+ b = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ max_value,
+ constvalue));
+ /* remember the result */
+ callcache[max_index] = (b) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ /*
+ * Now, we need to combine both results into the final answer, and we need
+ * to be careful about the 'isgt' variable which kinda inverts the meaning.
+ *
+ * First, we handle the case when each boundary returns different results.
+ * In that case the outcome can only be 'partial' match.
+ */
+ if (a != b)
+ return MVSTATS_MATCH_PARTIAL;
+
+ /*
+ * When the results are the same, then it depends on the 'isgt' value. There
+ * are four options:
+ *
+ * isgt=false a=b=true => full match
+ * isgt=false a=b=false => empty
+ * isgt=true a=b=true => empty
+ * isgt=true a=b=false => full match
+ *
+ * We'll cheat a bit, because we know that (a=b) so we'll use just one of them.
+ */
+ if (isgt)
+ return (!a) ? MVSTATS_MATCH_FULL : MVSTATS_MATCH_NONE;
+ else
+ return ( a) ? MVSTATS_MATCH_FULL : MVSTATS_MATCH_NONE;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index a92f889..d46aed2 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -416,7 +416,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built || mvstat->mcv_built)
+ if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built)
{
info = makeNode(MVStatisticInfo);
@@ -426,10 +426,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
info->mcv_enabled = mvstat->mcv_enabled;
+ info->hist_enabled = mvstat->hist_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
info->mcv_built = mvstat->mcv_built;
+ info->hist_built = mvstat->hist_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index f9bf10c..9dbb3b6 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o mcv.o
+OBJS = common.o dependencies.o histogram.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.histogram b/src/backend/utils/mvstats/README.histogram
new file mode 100644
index 0000000..8234d2c
--- /dev/null
+++ b/src/backend/utils/mvstats/README.histogram
@@ -0,0 +1,287 @@
+Multivariate histograms
+=======================
+
+Histograms on individual attributes consist of buckets represented by ranges,
+covering the domain of the attribute. That is, each bucket is a [min,max]
+interval, and contains all values in this range. The histogram is built in such
+a way that all buckets have about the same frequency.
+
+Multivariate histograms are an extension into n-dimensional space - the buckets
+are n-dimensional intervals (i.e. n-dimensional rectangles), covering the domain
+of the combination of attributes. That is, each bucket has a vector of lower
+and upper boundaries, denoted min[i] and max[i] (where i = 1..n).
+
+In addition to the boundaries, each bucket tracks additional info:
+
+ * frequency (fraction of tuples in the bucket)
+ * whether the boundaries are inclusive or exclusive
+ * whether the dimension contains only NULL values
+ * number of distinct values in each dimension (for building only)
+
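+As a rough illustration (the names here are made up for this README - the
+actual structs live in src/include/utils/mvstats.h), a bucket might look
+like this:
+
+    typedef struct ExampleBucket
+    {
+        float   ntuples;        /* frequency (fraction of sample rows) */
+        bool   *nullsonly;      /* per dimension: NULL-only dimension? */
+        bool   *min_inclusive;  /* per dimension: lower bound inclusive? */
+        bool   *max_inclusive;  /* per dimension: upper bound inclusive? */
+        Datum  *min;            /* per dimension: lower boundary value */
+        Datum  *max;            /* per dimension: upper boundary value */
+    } ExampleBucket;
+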
+It's possible that in the future we'll have multiple histogram types, with
+different features. We do however expect all the types to share the same
+representation (buckets as ranges) and only differ in how we build them.
+
+The current implementation builds non-overlapping buckets, but that may not be
+true for all histogram types, so the code should not rely on this assumption.
+There are interesting types of histograms (or algorithms) with overlapping
+buckets.
+
+When used on low-cardinality data, histograms usually perform considerably worse
+than MCV lists (which are a good fit for this kind of data). This is especially
+true for label-like values, where the ordering of the values is mostly
+unrelated to the meaning of the data, as proper ordering is crucial for
+histograms.
+
+On high-cardinality data the histograms are usually a better choice, because MCV
+lists can't represent the distribution accurately enough.
+
+
+Selectivity estimation
+----------------------
+
+The estimation is implemented in clauselist_mv_selectivity_histogram(), and
+works very similarly to clauselist_mv_selectivity_mcvlist.
+
+The main difference is that while MCV lists support exact matches, histograms
+often result in approximate matches - e.g. with equality we can only say if
+the constant would be part of the bucket, but not whether it really is there
+or what fraction of the bucket it corresponds to. In this case we rely on
+some defaults just like in the per-column histograms.
+
+The current implementation uses histograms to estimate these types of clauses
+(think of WHERE conditions):
+
+ (a) equality clauses WHERE (a = 1) AND (b = 2)
+ (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ (d) OR-clauses WHERE (a = 1) OR (b = 2)
+
+Similarly to MCV lists, it's possible to add support for additional types of
+clauses, for example:
+
+ (e) multi-var clauses WHERE (a > b)
+
+and so on. These are tasks for the future, not yet implemented.
+
+
+When evaluating a clause on a bucket, we may get one of three results:
+
+ (a) FULL_MATCH - The bucket definitely matches the clause.
+
+ (b) PARTIAL_MATCH - The bucket matches the clause, but not necessarily all
+ the tuples it represents.
+
+ (c) NO_MATCH - The bucket definitely does not match the clause.
+
+This may be illustrated using a range [1, 5], which is essentially a 1-D
+bucket. With these clauses:
+
+    WHERE (a < 10)  =>  FULL_MATCH     (all values in the range are below
+                                        10, so the whole bucket matches)
+
+    WHERE (a < 3)   =>  PARTIAL_MATCH  (there may be values matching the
+                                        clause, but we don't know how many)
+
+    WHERE (a < 0)   =>  NO_MATCH       (all values in the range are >= 1,
+                                        so none of them can match)
+
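+In code, classifying an inequality clause against one dimension boils down
+to comparing the constant to both boundaries. A simplified sketch of what
+bucket_is_smaller_than_value() in clausesel.c does (ignoring the inclusive
+flags and the function-call cache):
+
+    /* classify (var < cst) against the bucket's [min, max] range */
+    bool  min_lt = DatumGetBool(FunctionCall2Coll(&lt, collation, min, cst));
+    bool  max_lt = DatumGetBool(FunctionCall2Coll(&lt, collation, max, cst));
+
+    if (min_lt && max_lt)
+        return FULL_MATCH;      /* the whole bucket is below the constant */
+    else if (min_lt != max_lt)
+        return PARTIAL_MATCH;   /* the constant falls inside the bucket */
+    else
+        return NO_MATCH;        /* the whole bucket is above the constant */
+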
+Some clauses may produce only some of those results - for example equality
+clauses may never produce FULL_MATCH as we always hit only part of the bucket
+(we can't match both boundaries at the same time). This results in less accurate
+estimates compared to MCV lists, where we can hit an MCV item exactly (there's
+no PARTIAL match for MCV lists).
+
+There are also clauses that may not produce any PARTIAL_MATCH results. A nice
+example of that is the 'IS [NOT] NULL' clause, which either matches the bucket
+completely (FULL_MATCH) or not at all (NO_MATCH), thanks to how the NULL-buckets
+are constructed.
+
+Computing the total selectivity estimate is trivial - simply sum selectivities
+from all the FULL_MATCH and PARTIAL_MATCH buckets (but for buckets marked with
+PARTIAL_MATCH, multiply the frequency by 0.5 to minimize the average error).
+
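+In code, this is a single pass over the match bitmap (this is what
+clauselist_mv_selectivity_histogram() in clausesel.c does):
+
+    Selectivity s = 0.0;
+
+    for (i = 0; i < nbuckets; i++)
+    {
+        if (matches[i] == MVSTATS_MATCH_FULL)
+            s += buckets[i]->ntuples;
+        else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+            s += 0.5 * buckets[i]->ntuples;
+    }
+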
+
+Building a histogram
+---------------------
+
+The algorithm of building a histogram in general is quite simple:
+
+ (a) create an initial bucket (containing all sample rows)
+
+ (b) create NULL buckets (by splitting the initial bucket)
+
+ (c) repeat
+
+ (1) choose bucket to split next
+
+ (2) terminate if no bucket that might be split is found, or if we've
+ reached the maximum number of buckets (16384)
+
+ (3) choose dimension to partition the bucket by
+
+ (4) partition the bucket by the selected dimension
+
+The main complexity is hidden in steps (c.1) and (c.3), i.e. how we choose the
+bucket and dimension for the split.
+
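+Condensed into code (a simplified sketch of the loop in build_mv_histogram()
+in histogram.c), the partitioning phase looks like this:
+
+    while (histogram->nbuckets < MVSTAT_HIST_MAX_BUCKETS)
+    {
+        MVBucket bucket = select_bucket_to_partition(histogram->nbuckets,
+                                                     histogram->buckets);
+
+        /* no bucket eligible for partitioning, so terminate */
+        if (bucket == NULL)
+            break;
+
+        /* split the bucket, adding one new bucket to the histogram */
+        histogram->buckets[histogram->nbuckets++]
+            = partition_bucket(bucket, attrs, stats, ndistvalues, distvalues);
+    }
+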
+Similarly to one-dimensional histograms, we want to produce buckets with roughly
+the same frequency. We also need to produce "regular" buckets, because buckets
+with one "side" much longer than the others are very likely to match a lot of
+conditions (which increases error, even if the bucket frequency is very low).
+
+To achieve this, we choose the largest bucket (containing the most sample rows),
+but we only choose buckets that can actually be split (have at least 3 different
+combinations of values).
+
+Then we choose the "longest" dimension of the bucket, which is computed by using
+the distinct values in the sample as a measure.
+
+For details see functions select_bucket_to_partition() and partition_bucket().
+
+The current limit on number of buckets (16384) is mostly arbitrary, but chosen
+so that it guarantees we don't exceed the number of distinct values indexable by
+uint16 in any of the dimensions. In practice we could handle more buckets as we
+index each dimension separately and the splits should use the dimensions evenly.
+
+Also, histograms this large (with 16k values in multiple dimensions) would be
+quite expensive to build and process, so the 16k limit is rather reasonable.
+
+The actual number of buckets is also related to statistics target, because we
+require MIN_BUCKET_ROWS (10) tuples per bucket before a split, so we can't have
+more than (2 * 300 * target / 10) buckets. For the default target (100) this
+evaluates to ~6k.
+
+
+NULL handling (create_null_buckets)
+-----------------------------------
+
+When building histograms on a single attribute, we first filter out NULL values.
+In the multivariate case, we can't really do that because the rows may contain
+a mix of NULL and non-NULL values in different columns (so we can't simply
+filter all of them out).
+
+For this reason, the histograms are built so that in each bucket, each
+dimension contains either only NULL or only non-NULL values. Building the
+NULL-buckets happens as the first step of the build, in the
+create_null_buckets() function. The number of buckets produced by this step
+has a clear upper bound of 2^N, where N is the number of dimensions
+(attributes the histogram is built on) - e.g. for two columns there may be
+up to four buckets: (NULL, NULL), (NULL, not NULL), (not NULL, NULL) and
+(not NULL, not NULL). Or rather 2^K, where K is the number of attributes
+that are not marked as NOT NULL.
+
+The buckets with NULL dimensions are then subject to the same build algorithm
+(i.e. may be split into smaller buckets) just like any other bucket, but may
+only be split by a non-NULL dimension.
+
+
+Serialization
+-------------
+
+To store the histogram in the pg_mv_statistic catalog, it is serialized into
+a more efficient form. We also use this representation for estimation, i.e.
+we don't fully deserialize the histogram.
+
+For example the boundary values are deduplicated to minimize the required space.
+How much redundancy is there, actually? Let's assume there are no NULL values,
+so we start with a single bucket - in that case we have 2*N boundaries. Each
+time we split a bucket we introduce one new value (in the "middle" of one of
+the dimensions), and keep boundaries for all the other dimensions. So after K
+splits, we have up to
+
+ 2*N + K
+
+unique boundary values (we may have fewer values, if the same value is used for
+several splits). But after K splits we do have (K+1) buckets, so
+
+ (K+1) * 2 * N
+
+boundary values. Using e.g. N=4 and K=999, we arrive at these numbers:
+
+ 2*N + K = 1007
+ (K+1) * 2 * N = 8000
+
+which means a lot of redundancy. It's somewhat counter-intuitive that the number
+of distinct values does not really depend on the number of dimensions (except
+for the initial bucket, but that's negligible compared to the total).
+
+By deduplicating the values and replacing them with 16-bit indexes (uint16), we
+reduce the required space to
+
+ 1007 * 8 + 8000 * 2 ~= 24kB
+
+which is significantly less than 64kB required for the 'raw' histogram (assuming
+the values are 8B).
+
+While the bytea compression (pglz) might achieve the same reduction of space,
+the deduplicated representation is used to optimize the estimation by caching
+results of function calls for already visited values. This significantly
+reduces the number of calls to (often quite expensive) operators.
+
+Note: Of course, this reasoning only holds for histograms built by the algorithm
+that simply splits the buckets in half. Other histogram types (e.g. containing
+overlapping buckets) may behave differently and require different serialization.
+
+Serialized histograms are marked with a 'magic' constant, to make it easier to
+check the bytea value really is a serialized histogram.
+
+
+varlena compression
+-------------------
+
+This serialization may however disable automatic varlena compression, because
+the array of unique values is placed at the beginning of the serialized form,
+which is exactly the chunk used by pglz to check whether the data is
+compressible - and it will probably decide it's not very compressible. This
+is similar to the issue we had with JSONB initially.
+
+Maybe storing buckets first would make it work, as the buckets may be better
+compressible.
+
+On the other hand the serialization is actually a context-aware compression,
+usually compressing to ~30% (or even less, with large data types). So the lack
+of additional pglz compression may be acceptable.
+
+
+Deserialization
+---------------
+
+The deserialization is not a perfect inverse of the serialization, as we keep
+the deduplicated arrays. This reduces the amount of memory and also allows
+optimizations during estimation (e.g. we can cache results for the distinct
+values, saving expensive function calls).
+
+
+Inspecting the histogram
+------------------------
+
+Inspecting the regular (per-attribute) histograms is trivial, as it's enough
+to select the columns from pg_stats - the data is encoded as anyarray, so we
+simply get the text representation of the array.
+
+With multivariate histograms it's not that simple due to the possible mix of
+data types in the histogram. It might be possible to produce similar array-like
+text representation, but that'd unnecessarily complicate further processing
+and analysis of the histogram. Instead, there's an SRF function that allows
+access to lower/upper boundaries, frequencies etc.
+
+ SELECT * FROM pg_mv_histogram_buckets(oid, otype);
+
+It has two input parameters:
+
+ oid - OID of the histogram (pg_mv_statistic.staoid)
+ otype - type of output
+
+and produces a table with these columns:
+
+ - bucket ID (0...nbuckets-1)
+ - lower bucket boundaries (string array)
+ - upper bucket boundaries (string array)
+ - nulls only dimensions (boolean array)
+ - lower boundary inclusive (boolean array)
+ - upper boundary inclusive (boolean array)
+ - frequency (double precision)
+
+The 'otype' accepts three values, determining what will be returned in the
+lower/upper boundary arrays:
+
+ - 0 - values stored in the histogram, encoded as text
+ - 1 - indexes into the deduplicated arrays
+ - 2 - indexes into the deduplicated arrays, scaled to [0,1]
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index 5c5c59a..3e4f4d1 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -18,6 +18,8 @@ Currently we only have two kinds of multivariate statistics
(b) MCV lists (README.mcv)
+ (c) multivariate histograms (README.histogram)
+
Compatible clause types
-----------------------
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index d1da714..ffb76f4 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -13,11 +13,11 @@
*
*-------------------------------------------------------------------------
*/
+#include "postgres.h"
+#include "utils/array.h"
#include "common.h"
-#include "utils/array.h"
-
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
int natts,
VacAttrStats **vacattrstats);
@@ -52,7 +52,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
- int numrows_filtered = 0;
+ MVHistogram histogram = NULL;
+ int numrows_filtered = numrows;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -95,8 +96,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+ /* build a multivariate histogram on the columns */
+ if ((numrows_filtered > 0) && (stat->hist_enabled))
+ histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
+ update_mv_stats(stat->mvoid, deps, mcvlist, histogram, attrs, stats);
}
}
@@ -176,6 +181,8 @@ list_mv_stats(Oid relid)
info->deps_built = stats->deps_built;
info->mcv_enabled = stats->mcv_enabled;
info->mcv_built = stats->mcv_built;
+ info->hist_enabled = stats->hist_enabled;
+ info->hist_built = stats->hist_built;
result = lappend(result, info);
}
@@ -190,7 +197,6 @@ list_mv_stats(Oid relid)
return result;
}
-
/*
* Find attnums of MV stats using the mvoid.
*/
@@ -236,9 +242,16 @@ find_mv_attnums(Oid mvoid, Oid *relid)
}
+/*
+ * FIXME This adds statistics, but we need to drop statistics when the
+ * table is dropped. Not sure what to do when a column is dropped.
+ * Either we can (a) remove all stats on that column, (b) remove
+ * the column from defined stats and force rebuild, (c) remove the
+ * column on next ANALYZE. Or maybe something else?
+ */
void
update_mv_stats(Oid mvoid,
- MVDependencies dependencies, MCVList mcvlist,
+ MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
@@ -271,22 +284,34 @@ update_mv_stats(Oid mvoid,
values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
}
+ if (histogram != NULL)
+ {
+ bytea * data = serialize_mv_histogram(histogram, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stahist-1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stahist - 1]
+ = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
+ replaces[Anum_pg_mv_statistic_stahist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
+ nulls[Anum_pg_mv_statistic_hist_built-1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_hist_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
+ values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
diff --git a/src/backend/utils/mvstats/histogram.c b/src/backend/utils/mvstats/histogram.c
new file mode 100644
index 0000000..9e5620a
--- /dev/null
+++ b/src/backend/utils/mvstats/histogram.c
@@ -0,0 +1,2032 @@
+/*-------------------------------------------------------------------------
+ *
+ * histogram.c
+ * POSTGRES multivariate histograms
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/histogram.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+#include <math.h>
+
+
+static MVBucket create_initial_mv_bucket(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+static MVBucket select_bucket_to_partition(int nbuckets, MVBucket * buckets);
+
+static MVBucket partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues);
+
+static MVBucket copy_mv_bucket(MVBucket bucket, uint32 ndimensions);
+
+static void update_bucket_ndistinct(MVBucket bucket, int2vector *attrs,
+ VacAttrStats ** stats);
+
+static void update_dimension_ndistinct(MVBucket bucket, int dimension,
+ int2vector *attrs,
+ VacAttrStats ** stats,
+ bool update_boundaries);
+
+static void create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats);
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Each serialized bucket needs to store (in this order):
+ *
+ * - number of tuples (float)
+ * - min inclusive flags (ndim * sizeof(bool))
+ * - max inclusive flags (ndim * sizeof(bool))
+ * - null dimension flags (ndim * sizeof(bool))
+ * - min boundary indexes (2 * ndim * sizeof(uint16))
+ * - max boundary indexes (2 * ndim * sizeof(uint16))
+ *
+ * So in total:
+ *
+ * ndim * (4 * sizeof(uint16) + 3 * sizeof(bool)) +
+ * sizeof(float)
+ */
+#define BUCKET_SIZE(ndims) \
+ (ndims * (4 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float))
+
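+/*
+ * For example, with ndims = 4 (and assuming 1-byte bool and 4-byte float)
+ * this is 4 * (4 * 2 + 3 * 1) + 4 = 48 bytes per serialized bucket.
+ */
+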
+/* pointers into a flat serialized bucket of BUCKET_SIZE(n) bytes */
+#define BUCKET_NTUPLES(b) ((float*)b)
+#define BUCKET_MIN_INCL(b,n) ((bool*)(b + sizeof(float)))
+#define BUCKET_MAX_INCL(b,n) (BUCKET_MIN_INCL(b,n) + n)
+#define BUCKET_NULLS_ONLY(b,n) (BUCKET_MAX_INCL(b,n) + n)
+#define BUCKET_MIN_INDEXES(b,n) ((uint16*)(BUCKET_NULLS_ONLY(b,n) + n))
+#define BUCKET_MAX_INDEXES(b,n) ((BUCKET_MIN_INDEXES(b,n) + n))
+
+/* can't split bucket with less than 10 rows */
+#define MIN_BUCKET_ROWS 10
+
+/*
+ * Data used while building the histogram.
+ */
+typedef struct HistogramBuildData {
+
+ float ndistinct; /* frequency of distinct values */
+
+ HeapTuple *rows; /* array of sample rows */
+ uint32 numrows; /* number of sample rows (array size) */
+
+ /*
+ * Number of distinct values in each dimension. This is used when
+ * building the histogram (and is not serialized/deserialized).
+ */
+ uint32 *ndistincts;
+
+} HistogramBuildData;
+
+typedef HistogramBuildData *HistogramBuild;
+
+/*
+ * Build a multivariate histogram. In short, it first creates a single
+ * bucket containing all the rows, and then repeatedly splits it, by
+ * first searching for the bucket / dimension most in need of a split.
+ *
+ * The current criterion is rather simple, chosen so that the algorithm
+ * produces buckets with about equal frequency and regular size.
+ *
+ * See the discussion at select_bucket_to_partition and partition_bucket
+ * for more details about the algorithm.
+ *
+ * The current algorithm works like this:
+ *
+ * build NULL-buckets (create_null_buckets)
+ *
+ * while [not reaching maximum number of buckets]
+ *
+ * choose bucket to partition (largest bucket)
+ * if no bucket to partition
+ * terminate the algorithm
+ *
+ * choose bucket dimension to partition (largest dimension)
+ * split the bucket into two buckets
+ */
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ int *ndistvalues;
+ Datum **distvalues;
+
+ MVHistogram histogram = (MVHistogram)palloc0(sizeof(MVHistogramData));
+
+ HeapTuple * rows_copy = (HeapTuple*)palloc0(numrows * sizeof(HeapTuple));
+ memcpy(rows_copy, rows, sizeof(HeapTuple) * numrows);
+
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ histogram->ndimensions = numattrs;
+
+ histogram->magic = MVSTAT_HIST_MAGIC;
+ histogram->type = MVSTAT_HIST_TYPE_BASIC;
+ histogram->nbuckets = 1;
+
+ /* pre-allocate the maximum number of buckets (easier than repalloc for short-lived objects) */
+ histogram->buckets
+ = (MVBucket*)palloc0(MVSTAT_HIST_MAX_BUCKETS * sizeof(MVBucket));
+
+ /* create the initial bucket, covering the whole sample set */
+ histogram->buckets[0]
+ = create_initial_mv_bucket(numrows, rows_copy, attrs, stats);
+
+ /*
+ * Collect info on distinct values in each dimension (used later
+ * to select dimension to partition).
+ */
+ ndistvalues = (int*)palloc0(sizeof(int) * numattrs);
+ distvalues = (Datum**)palloc0(sizeof(Datum*) * numattrs);
+
+ for (i = 0; i < numattrs; i++)
+ {
+ int j;
+ int nvals;
+ Datum *tmp;
+
+ SortSupportData ssup;
+ StdAnalyzeData *mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ nvals = 0;
+ tmp = (Datum*)palloc0(sizeof(Datum) * numrows);
+
+ for (j = 0; j < numrows; j++)
+ {
+ bool isnull;
+
+ /* fetch the value of the attribute (NULLs are skipped below) */
+ Datum value = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &isnull);
+
+ if (isnull)
+ continue;
+
+ tmp[nvals++] = value;
+ }
+
+ /* do the sort and stuff only if there are non-NULL values */
+ if (nvals > 0)
+ {
+ /* sort the array of values */
+ qsort_arg((void *) tmp, nvals, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /* count distinct values */
+ ndistvalues[i] = 1;
+ for (j = 1; j < nvals; j++)
+ if (compare_scalars_simple(&tmp[j], &tmp[j-1], &ssup) != 0)
+ ndistvalues[i] += 1;
+
+ /* allocate space for the distinct values (counted above) */
+ distvalues[i] = (Datum*)palloc0(sizeof(Datum) * ndistvalues[i]);
+
+ /* now collect distinct values into the array */
+ distvalues[i][0] = tmp[0];
+ ndistvalues[i] = 1;
+
+ for (j = 1; j < nvals; j++)
+ {
+ if (compare_scalars_simple(&tmp[j], &tmp[j-1], &ssup) != 0)
+ {
+ distvalues[i][ndistvalues[i]] = tmp[j];
+ ndistvalues[i] += 1;
+ }
+ }
+ }
+
+ pfree(tmp);
+ }
+
+ /*
+ * The initial bucket may contain NULL values, so we have to create
+ * buckets with NULL-only dimensions.
+ *
+ * FIXME We may need up to 2^ndims buckets - check that there are
+ * enough buckets (MVSTAT_HIST_MAX_BUCKETS >= 2^ndims).
+ */
+ create_null_buckets(histogram, 0, attrs, stats);
+
+ while (histogram->nbuckets < MVSTAT_HIST_MAX_BUCKETS)
+ {
+ MVBucket bucket = select_bucket_to_partition(histogram->nbuckets,
+ histogram->buckets);
+
+ /* no more buckets to partition */
+ if (bucket == NULL)
+ break;
+
+ histogram->buckets[histogram->nbuckets]
+ = partition_bucket(bucket, attrs, stats,
+ ndistvalues, distvalues);
+
+ histogram->nbuckets += 1;
+ }
+
+ /* finalize the frequencies etc. */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ HistogramBuild build_data
+ = ((HistogramBuild)histogram->buckets[i]->build_data);
+
+ /*
+ * The frequency has to be computed from the whole sample, in
+ * case some of the rows were used for MCV (and thus are missing
+ * from the histogram).
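+ *
+ * For example (just illustrating the formula below): if the whole
+ * sample has numrows_total = 30000 rows and this bucket holds 300
+ * of them, the bucket frequency is 300 / 30000 = 0.01.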
+ */
+ histogram->buckets[i]->ntuples
+ = (build_data->numrows * 1.0) / numrows_total;
+ }
+
+ return histogram;
+}
+
+/* fetch the histogram (as a bytea) from the pg_mv_statistic catalog */
+MVSerializedHistogram
+load_mv_histogram(Oid mvoid)
+{
+ bool isnull = false;
+ Datum histogram;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* Fetch the pg_mv_statistic tuple for the given statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->hist_enabled && mvstat->hist_built);
+#endif
+
+ histogram = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stahist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_histogram(DatumGetByteaP(histogram));
+}
+
+/* print some basic info about the histogram */
+Datum
+pg_mv_stats_histogram_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVSerializedHistogram hist = deserialize_mv_histogram(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nbuckets=%d", hist->nbuckets);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+
+/* used to pass context into bsearch() */
+static SortSupport ssup_private = NULL;
+
+/*
+ * Serialize the MV histogram into a bytea value. The basic algorithm is quite
+ * simple, and mostly mimics the MCV serialization:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ *
+ * (a) collect all (non-NULL) attribute values from all buckets
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all buckets
+ *
+ * (a) replace min/max values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, as we're mixing different
+ * datatypes, and we need to use the right operators to compare/sort them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ *
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
+bytea *
+serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i = 0, j = 0;
+ Size total_length = 0;
+
+ bytea *output = NULL;
+ char *data = NULL;
+
+ int nbuckets = histogram->nbuckets;
+ int ndims = histogram->ndimensions;
+
+ /* allocated for serialized bucket data */
+ int bucketsize = BUCKET_SIZE(ndims);
+ char *bucket = palloc0(bucketsize);
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension separately */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /*
+ * Allocate space for all min/max values, including NULLs
+ * (we won't use them, but we don't know how many there are),
+ * and then collect all non-NULL values.
+ */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * nbuckets * 2);
+
+ for (j = 0; j < histogram->nbuckets; j++)
+ {
+ /* skip buckets where this dimension is NULL-only */
+ if (! histogram->buckets[j]->nullsonly[i])
+ {
+ values[i][counts[i]] = histogram->buckets[j]->min[i];
+ counts[i] += 1;
+
+ values[i][counts[i]] = histogram->buckets[j]->max[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* make sure we fit into uint16 */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typlen > 0)
+ /* byval or byref, but with fixed length (name, tid, ...) */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so simply strlen */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j]));
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * histogram, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nbuckets (4B)
+ * - info (ndim * sizeof(DimensionInfo))
+ * - arrays of values for each dimension
+ * - serialized buckets (nbuckets * bucketsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data (and buckets).
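+ *
+ * For example (a rough sketch): with ndims = 2 and nbuckets = 100 this
+ * is 20B + 2 * sizeof(DimensionInfo) + 100 * BUCKET_SIZE(2) bytes,
+ * plus whatever the deduplicated boundary values take.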
+ */
+ total_length = (sizeof(int32) + offsetof(MVHistogramData, buckets)
+ + ndims * sizeof(DimensionInfo)
+ + nbuckets * bucketsize);
+
+ /* account for the deduplicated data */
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce an arbitrary limit of 10MB */
+ if (total_length > (10 * 1024 * 1024))
+ elog(ERROR, "serialized histogram exceeds 10MB (%ld > %d)",
+ total_length, (10 * 1024 * 1024));
+
+ /* allocate space for the serialized histogram, set the header */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write data */
+ data = VARDATA(output);
+
+ memcpy(data, histogram, offsetof(MVHistogramData, buckets));
+ data += offsetof(MVHistogramData, buckets);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typlen > 0)
+ {
+ /* passed by value or by reference, but fixed length */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the histogram buckets */
+ for (i = 0; i < nbuckets; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - bucketsize);
+
+ /* reset the values for each item */
+ memset(bucket, 0, bucketsize);
+
+ *BUCKET_NTUPLES(bucket) = histogram->buckets[i]->ntuples;
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! histogram->buckets[i]->nullsonly[j])
+ {
+ uint16 idx;
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ /* min boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->min[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MIN_INDEXES(bucket, ndims)[j] = idx;
+
+ /* max boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->max[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MAX_INDEXES(bucket, ndims)[j] = idx;
+ }
+ }
+
+ /* copy flags (nulls, min/max inclusive) */
+ memcpy(BUCKET_NULLS_ONLY(bucket, ndims),
+ histogram->buckets[i]->nullsonly, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MIN_INCL(bucket, ndims),
+ histogram->buckets[i]->min_inclusive, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MAX_INCL(bucket, ndims),
+ histogram->buckets[i]->max_inclusive, sizeof(bool) * ndims);
+
+ /* copy the item into the array */
+ memcpy(data, bucket, bucketsize);
+
+ data += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ /* FIXME free the values/counts arrays here */
+
+ return output;
+}
+
+/*
+ * Returns histogram in a partially-serialized form (keeps the boundary
+ * values deduplicated, so that it's possible to optimize the estimation
+ * part by caching function call results between buckets etc.).
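+ *
+ * A sketch of how callers may access the deduplicated boundary values
+ * (the min boundary of bucket 'b' in dimension 'd'):
+ *
+ * MVSerializedHistogram h = load_mv_histogram(mvoid);
+ * Datum lo = h->values[d][h->buckets[b]->min[d]];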
+ */
+MVSerializedHistogram
+deserialize_mv_histogram(bytea * data)
+{
+ int i = 0, j = 0;
+
+ Size expected_size;
+ char *tmp = NULL;
+
+ MVSerializedHistogram histogram;
+ DimensionInfo *info;
+
+ int nbuckets;
+ int ndims;
+ int bucketsize;
+
+ /* temporary deserialization buffer */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVSerializedHistogramData,buckets))
+ elog(ERROR, "invalid histogram size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVSerializedHistogramData,buckets));
+
+ /* read the histogram header */
+ histogram
+ = (MVSerializedHistogram)palloc(sizeof(MVSerializedHistogramData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(histogram, tmp, offsetof(MVSerializedHistogramData, buckets));
+ tmp += offsetof(MVSerializedHistogramData, buckets);
+
+ if (histogram->magic != MVSTAT_HIST_MAGIC)
+ elog(ERROR, "invalid histogram magic %d (expected %dd)",
+ histogram->magic, MVSTAT_HIST_MAGIC);
+
+ if (histogram->type != MVSTAT_HIST_TYPE_BASIC)
+ elog(ERROR, "invalid histogram type %d (expected %dd)",
+ histogram->type, MVSTAT_HIST_TYPE_BASIC);
+
+ nbuckets = histogram->nbuckets;
+ ndims = histogram->ndimensions;
+ bucketsize = BUCKET_SIZE(ndims);
+
+ Assert((nbuckets > 0) && (nbuckets <= MVSTAT_HIST_MAX_BUCKETS));
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Compute the expected size for these parameters. It's incomplete at
+ * this point, as we have yet to add the sizes of the value arrays
+ * (from the DimensionInfo records).
+ */
+ expected_size = offsetof(MVSerializedHistogramData,buckets) +
+ ndims * sizeof(DimensionInfo) +
+ (nbuckets * bucketsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /* now let's allocate a single buffer for all the values and counts */
+
+ bufflen = (sizeof(int) + sizeof(Datum*)) * ndims;
+ for (i = 0; i < ndims; i++)
+ {
+ /* no extra space needed for byval types with length matching Datum */
+ if (! (info[i].typbyval && (info[i].typlen == sizeof(Datum))))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+ }
+
+ /* also, include space for the result, tracking the buckets */
+ bufflen += nbuckets * (
+ sizeof(MVSerializedBucket) + /* bucket pointer */
+ sizeof(MVSerializedBucketData)); /* bucket data */
+
+ buff = palloc0(bufflen);
+ ptr = buff;
+
+ histogram->nvalues = (int*)ptr;
+ ptr += (sizeof(int) * ndims);
+
+ histogram->values = (Datum**)ptr;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (for types
+ * not passed by value), so if someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea *data = ... fetch the data from catalog ...
+ * MVSerializedHistogram hist = deserialize_mv_histogram(data);
+ * pfree(data);
+ *
+ * then 'hist' references the freed memory. This needs to
+ * copy the pieces.
+ *
+ * TODO same as in MCV deserialization / consider moving to common.c
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ histogram->nvalues[i] = info[i].nvalues;
+
+ if (info[i].typbyval && info[i].typlen == sizeof(Datum))
+ {
+ /* passed by value / Datum - simply reuse the array */
+ histogram->values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ /* all other types need a chunk of the buffer for the Datum array */
+ histogram->values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ if (info[i].typbyval)
+ {
+ /* passed by value, but smaller than Datum */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value into the Datum array */
+ memcpy(&histogram->values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ histogram->buckets = (MVSerializedBucket*)ptr;
+ ptr += (sizeof(MVSerializedBucket) * nbuckets);
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ MVSerializedBucket bucket = (MVSerializedBucket)ptr;
+ ptr += sizeof(MVSerializedBucketData);
+
+ bucket->ntuples = *BUCKET_NTUPLES(tmp);
+ bucket->nullsonly = BUCKET_NULLS_ONLY(tmp, ndims);
+ bucket->min_inclusive = BUCKET_MIN_INCL(tmp, ndims);
+ bucket->max_inclusive = BUCKET_MAX_INCL(tmp, ndims);
+
+ bucket->min = BUCKET_MIN_INDEXES(tmp, ndims);
+ bucket->max = BUCKET_MAX_INDEXES(tmp, ndims);
+
+ histogram->buckets[i] = bucket;
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+
+ tmp += bucketsize;
+ }
+
+ /* at this point we expect to match the expected_size exactly */
+ Assert((tmp - VARDATA(data)) == expected_size);
+
+ /* we should exhaust the output buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ return histogram;
+}
+
+/*
+ * Build the initial bucket, which will be then split into smaller ones.
+ */
+static MVBucket
+create_initial_mv_bucket(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+ HistogramBuild data = NULL;
+
+ /* TODO allocate bucket as a single piece, including all the fields. */
+ MVBucket bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ Assert(numrows > 0);
+ Assert(rows != NULL);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /* allocate the per-dimension arrays */
+
+ /* flags for null-only dimensions */
+ bucket->nullsonly = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ bucket->min_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+ bucket->max_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* lower/upper boundaries */
+ bucket->min = (Datum*)palloc0(numattrs * sizeof(Datum));
+ bucket->max = (Datum*)palloc0(numattrs * sizeof(Datum));
+
+ /* build-data */
+ data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /* number of distinct values (per dimension) */
+ data->ndistincts = (uint32*)palloc0(numattrs * sizeof(uint32));
+
+ /* all the sample rows fall into the initial bucket */
+ data->numrows = numrows;
+ data->rows = rows;
+
+ bucket->build_data = data;
+
+ /*
+ * Update the number of distinct value combinations in the bucket
+ * (which we use when selecting the bucket to partition), and then the
+ * number of distinct values for each dimension (which we use when choosing
+ * which dimension to split).
+ */
+ update_bucket_ndistinct(bucket, attrs, stats);
+
+ /* Update ndistinct (and also set min/max) for all dimensions. */
+ for (i = 0; i < numattrs; i++)
+ update_dimension_ndistinct(bucket, i, attrs, stats, true);
+
+ return bucket;
+}
+
+/*
+ * Choose the bucket to partition next.
+ *
+ * The current criterion is rather simple, chosen so that the algorithm
+ * produces buckets with about equal frequency and regular size. We
+ * select the bucket with the highest number of distinct values, and
+ * then split it by the longest dimension.
+ *
+ * The distinct values are uniformly mapped to [0,1] interval, and this
+ * is used to compute length of the value range.
+ *
+ * NOTE: This is not the same array used for deduplication, as this
+ * contains values for all the tuples from the sample, not just
+ * the boundary values.
+ *
+ * Returns either pointer to the bucket selected to be partitioned,
+ * or NULL if there are no buckets that may be split (i.e. all buckets
+ * contain a single distinct value).
+ *
+ * TODO Consider other partitioning criteria (v-optimal, maxdiff etc.).
+ * For example use the "bucket volume" (product of dimension
+ * lengths) to select the bucket.
+ *
+ * We need buckets containing about the same number of tuples (so
+ * about the same frequency), as that limits the error when we
+ * match the bucket partially (in that case use 1/2 the bucket).
+ *
+ * We also need buckets with "regular" size, i.e. not "narrow" in
+ * some dimensions and "wide" in the others, because that makes
+ * partial matches more likely and increases the estimation error,
+ * especially when the clauses match many buckets partially. This
+ * is especially serious for OR-clauses, because in that case any
+ * of them may add the bucket as a (partial) match. With AND-clauses
+ * all the clauses have to match the bucket, which makes this issue
+ * somewhat less pressing.
+ *
+ * For example this table:
+ *
+ * CREATE TABLE t AS SELECT i AS a, i AS b
+ * FROM generate_series(1,1000000) s(i);
+ * ALTER TABLE t ADD STATISTICS (histogram) ON (a,b);
+ * ANALYZE t;
+ *
+ * It's a very specific (and perhaps artificial) example, because
+ * every bucket always has exactly the same number of distinct
+ * values in all dimensions, which makes the partitioning tricky.
+ *
+ * Then:
+ *
+ * SELECT * FROM t WHERE a < 10 AND b < 10;
+ *
+ * is estimated to return ~120 rows, while in reality it returns 9.
+ *
+ * QUERY PLAN
+ * ----------------------------------------------------------------
+ * Seq Scan on t (cost=0.00..19425.00 rows=117 width=8)
+ * (actual time=0.185..270.774 rows=9 loops=1)
+ * Filter: ((a < 10) AND (b < 10))
+ * Rows Removed by Filter: 999991
+ *
+ * while the query using OR clauses is estimated like this:
+ *
+ * QUERY PLAN
+ * ----------------------------------------------------------------
+ * Seq Scan on t (cost=0.00..19425.00 rows=8100 width=8)
+ * (actual time=0.118..189.919 rows=9 loops=1)
+ * Filter: ((a < 10) OR (b < 10))
+ * Rows Removed by Filter: 999991
+ *
+ * which is clearly much worse. This happens because the histogram
+ * contains buckets like this:
+ *
+ * bucket 592 [3 30310] [30134 30593] => [0.000233]
+ *
+ * i.e. the length of "a" dimension is (30310-3)=30307, while the
+ * length of "b" is (30593-30134)=459. So the "b" dimension is much
+ * narrower than "a". Of course, there are buckets where "b" is the
+ * wider dimension.
+ *
+ * This is partially mitigated by selecting the "longest" dimension
+ * in partition_bucket() but that only happens after we already
+ * selected the bucket. So if we never select the bucket, we can't
+ * really fix it there.
+ *
+ * The other reason why this particular example behaves so poorly
+ * is due to the way we split the partition in partition_bucket().
+ * Currently we attempt to divide the bucket into two parts with
+ * the same number of sampled tuples (frequency), but that does not
+ * work well when all the tuples are squashed on one end of the
+ * bucket (e.g. exactly at the diagonal, as a=b). In that case we
+ * split the bucket into a tiny bucket on the diagonal, and a huge
+ * remaining part of the bucket, which is still going to be narrow
+ * and we're unlikely to fix that.
+ *
+ * So perhaps we need two partitioning strategies - one aiming to
+ * split buckets with high frequency (number of sampled rows), the
+ * other aiming to split "large" buckets. And alternating between
+ * them, somehow.
+ *
+ * TODO Allowing the bucket to degenerate to a single combination of
+ * values makes it a rather strange MCV list. Maybe we should use
+ * a higher lower boundary, or make the selection criteria more
+ * complex (e.g. consider the number of rows in the bucket, etc.).
+ *
+ * That however is different from buckets 'degenerated' only for
+ * some dimensions (e.g. half of them), which is perfectly
+ * appropriate for statistics on a combination of low and high
+ * cardinality columns.
+ *
+ * TODO Consider using similar lower boundary for row count as for simple
+ * histograms, i.e. 300 tuples per bucket.
+ */
+static MVBucket
+select_bucket_to_partition(int nbuckets, MVBucket * buckets)
+{
+ int i;
+ int numrows = 0;
+ MVBucket bucket = NULL;
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ HistogramBuild data = (HistogramBuild)buckets[i]->build_data;
+ /* if the bucket can be split and has more rows than the best so far, use it */
+ if ((data->ndistinct > 2) &&
+ (data->numrows > numrows) &&
+ (data->numrows >= MIN_BUCKET_ROWS))
+ {
+ bucket = buckets[i];
+ numrows = data->numrows;
+ }
+ }
+
+ /* may be NULL if there are no buckets eligible for splitting */
+ return bucket;
+}
+
+/*
+ * A simple bucket partitioning implementation - we choose the longest
+ * bucket dimension, measured using the array of distinct values built
+ * at the very beginning of the build.
+ *
+ * We map all the distinct values to a [0,1] interval, uniformly
+ * distributed, and then use this to measure length. It's essentially
+ * a number of distinct values within the range, normalized to [0,1].
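+ *
+ * For example, if a dimension has 1000 distinct values in the sample
+ * and the bucket boundaries sit at positions 200 and 700 of the
+ * sorted distinct array, the normalized length of the bucket in this
+ * dimension is (700 - 200) / 1000 = 0.5.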
+ *
+ * Then we choose a 'middle' value splitting the bucket into two parts
+ * with roughly the same frequency.
+ *
+ * This splits the bucket by tweaking the existing one, and returning
+ * the new bucket (essentially shrinking the existing one in-place and
+ * returning the other "half" as a new bucket). The caller is responsible
+ * for adding the new bucket into the list of buckets.
+ *
+ * There are multiple histogram options, centered around the partitioning
+ * criteria, specifying both how to choose a bucket and the dimension
+ * most in need of a split. For a nice summary and general overview, see
+ * "rK-Hist : an R-Tree based histogram for multi-dimensional selectivity
+ * estimation" thesis by J. A. Lopez, Concordia University, p.34-37 (and
+ * possibly p. 32-34 for explanation of the terms).
+ *
+ * TODO It requires care to prevent splitting only one dimension and not
+ * splitting another one at all (which might happen easily in case
+ * of strongly dependent columns - e.g. y=x). The current algorithm
+ * minimizes this, but may still happen for perfectly dependent
+ * examples (when all the dimensions have equal length, the first
+ * one will be selected).
+ *
+ * TODO Should probably consider statistics target for the columns (e.g.
+ * to split dimensions with higher statistics target more frequently).
+ */
+static MVBucket
+partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues)
+{
+ int i;
+ int dimension;
+ int numattrs = attrs->dim1;
+
+ Datum split_value;
+ MVBucket new_bucket;
+ HistogramBuild new_data;
+
+ /* needed for sort, when looking for the split value */
+ bool isNull;
+ int nvalues = 0;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ StdAnalyzeData * mystats = NULL;
+ ScalarItem * values = (ScalarItem*)palloc0(data->numrows * sizeof(ScalarItem));
+ SortSupportData ssup;
+
+ /* looking for the split value */
+ int nrows = 1; /* number of rows below current value */
+ double delta;
+
+ /* needed when splitting the values */
+ HeapTuple * oldrows = data->rows;
+ int oldnrows = data->numrows;
+
+ /*
+ * We can't split buckets with a single distinct value (this also
+ * disqualifies NULL-only dimensions). Also, there has to be multiple
+ * sample rows (otherwise, how could there be more distinct values).
+ */
+ Assert(data->ndistinct > 1);
+ Assert(data->numrows > 1);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Look for the next dimension to split.
+ */
+ delta = 0.0;
+ dimension = -1;
+
+ for (i = 0; i < numattrs; i++)
+ {
+ Datum *a, *b;
+
+ mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ /* can't split NULL-only dimension */
+ if (bucket->nullsonly[i])
+ continue;
+
+ /* can't split dimension with a single ndistinct value */
+ if (data->ndistincts[i] <= 1)
+ continue;
+
+ /* sort support for the bsearch_comparator */
+ ssup_private = &ssup;
+
+ /* search for min boundary in the distinct list */
+ a = (Datum*)bsearch(&bucket->min[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), bsearch_comparator);
+
+ b = (Datum*)bsearch(&bucket->max[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), bsearch_comparator);
+
+ /* if this dimension is 'longer', use it for the split */
+ if (((b-a)*1.0 / ndistvalues[i]) > delta)
+ {
+ delta = ((b-a)*1.0 / ndistvalues[i]);
+ dimension = i;
+ }
+ }
+
+ /*
+ * If we haven't found a dimension here, we've done something
+ * wrong in select_bucket_to_partition.
+ */
+ Assert(dimension != -1);
+
+ /*
+ * Walk through the selected dimension, collect and sort the values
+ * and then choose the value to use as the new boundary.
+ */
+ mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (i = 0; i < data->numrows; i++)
+ {
+ /* remember the index of the sample row, to make the partitioning simpler */
+ values[nvalues].value = heap_getattr(data->rows[i], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+ values[nvalues].tupno = i;
+
+ /* no NULL values allowed here (we don't do splits by null-only dimensions) */
+ Assert(!isNull);
+
+ nvalues++;
+ }
+
+ /* sort the array of values */
+ qsort_arg((void *) values, nvalues, sizeof(ScalarItem),
+ compare_scalars_partition, (void *) &ssup);
+
+ /*
+ * We know there are data->ndistincts[dimension] distinct values
+ * in this dimension, and we want to split this into half, so walk
+ * through the array and stop once we see (ndistinct/2) values.
+ *
+ * We always choose the "next" value, i.e. (n/2+1)-th distinct value,
+ * and use it as an exclusive upper boundary (and inclusive lower
+ * boundary).
+ *
+ * TODO Maybe we should use "average" of the two middle distinct
+ * values (at least for even distinct counts), but that would
+ * require being able to do an average (which does not work
+ * for non-arithmetic types).
+ *
+ * TODO Another option is to look for a split that'd give about
+ * 50% tuples (not distinct values) in each partition. That
+ * might work better when there are a few very frequent
+ * values, and many rare ones.
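+ *
+ * A small example of the search below: for sorted values
+ * {1, 1, 1, 2, 2, 3} (numrows = 6), the candidate split points are
+ * i = 3 (value 2) and i = 5 (value 3); i = 3 is closest to
+ * numrows/2 = 3, so we pick split_value = 2 and keep nrows = 3
+ * rows in the current bucket.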
+ */
+ delta = fabs(data->numrows);
+ split_value = values[0].value;
+
+ for (i = 1; i < data->numrows; i++)
+ {
+ if (values[i].value != values[i-1].value)
+ {
+ /* are we closer to splitting the bucket in half? */
+ if (fabs(i - data->numrows/2.0) < delta)
+ {
+ /* let's assume we'll use this value for the split */
+ split_value = values[i].value;
+ delta = fabs(i - data->numrows/2.0);
+ nrows = i;
+ }
+ }
+ }
+
+ Assert(nrows > 0);
+ Assert(nrows < data->numrows);
+
+ /* create the new bucket as an (incomplete) copy of the one being partitioned */
+ new_bucket = copy_mv_bucket(bucket, numattrs);
+ new_data = (HistogramBuild)new_bucket->build_data;
+
+ /*
+ * Do the actual split of the chosen dimension, using the split value
+ * as the (exclusive) upper bound for the existing bucket and as the
+ * (inclusive) lower bound for the new one. The upper bound of the new
+ * bucket and its inclusive flag are inherited from the bucket being
+ * split.
+ */
+ bucket->max[dimension] = split_value;
+ new_bucket->min[dimension] = split_value;
+
+ bucket->max_inclusive[dimension] = false;
+ new_bucket->min_inclusive[dimension] = true;
+
+ /*
+ * Redistribute the sample tuples using the 'ScalarItem->tupno'
+ * index. We know 'nrows' rows should remain in the original
+ * bucket and the rest goes to the new one.
+ */
+
+ data->rows = (HeapTuple*)palloc0(nrows * sizeof(HeapTuple));
+ new_data->rows = (HeapTuple*)palloc0((oldnrows - nrows) * sizeof(HeapTuple));
+
+ data->numrows = nrows;
+ new_data->numrows = (oldnrows - nrows);
+
+ /*
+ * The first nrows should go to the first bucket, the rest should
+ * go to the new one. Use the tupno field to get the actual HeapTuple
+ * row from the original array of sample rows.
+ */
+ for (i = 0; i < nrows; i++)
+ memcpy(&data->rows[i], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ for (i = nrows; i < oldnrows; i++)
+ memcpy(&new_data->rows[i-nrows], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(new_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each partition.
+ */
+ for (i = 0; i < numattrs; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(new_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+ pfree(values);
+
+ return new_bucket;
+}
+
+/*
+ * Copy a histogram bucket. The copy does not include the build-time
+ * data, i.e. sampled rows etc.
+ */
+static MVBucket
+copy_mv_bucket(MVBucket bucket, uint32 ndimensions)
+{
+ /* TODO allocate as a single piece (including all the fields) */
+ MVBucket new_bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+ HistogramBuild data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /*
+ * Copy only the attributes that will stay the same after the split;
+ * the rest will be recomputed afterwards.
+ */
+
+ /* allocate the per-dimension arrays */
+ new_bucket->nullsonly = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ new_bucket->min_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+ new_bucket->max_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* lower/upper boundaries */
+ new_bucket->min = (Datum*)palloc0(ndimensions * sizeof(Datum));
+ new_bucket->max = (Datum*)palloc0(ndimensions * sizeof(Datum));
+
+ /* copy data */
+ memcpy(new_bucket->nullsonly, bucket->nullsonly, ndimensions * sizeof(bool));
+
+ memcpy(new_bucket->min_inclusive, bucket->min_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->min, bucket->min, ndimensions*sizeof(Datum));
+
+ memcpy(new_bucket->max_inclusive, bucket->max_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->max, bucket->max, ndimensions*sizeof(Datum));
+
+ /* allocate and copy the interesting part of the build data */
+ data->ndistincts = (uint32*)palloc0(ndimensions * sizeof(uint32));
+
+ new_bucket->build_data = data;
+
+ return new_bucket;
+}
+
+/*
+ * Counts the number of distinct value combinations in the bucket. The
+ * values are copied into an array of SortItems, sorted by all the
+ * dimensions (using multi_sort_compare), and then neighboring items
+ * are compared to count the distinct combinations.
+ *
+ * TODO This might evaluate and store the distinct counts for all
+ * possible attribute combinations. The assumption is this might be
+ * useful for estimating things like GROUP BY cardinalities (e.g.
+ * in cases when some buckets contain a lot of low-frequency
+ * combinations, and other buckets contain few high-frequency ones).
+ *
+ * But it's unclear whether it's worth the price. Computing this
+ * is actually quite cheap, because it may be evaluated at the very
+ * end, when the buckets are rather small (so sorting it in 2^N ways
+ * is not a big deal). Assuming the partitioning algorithm does not
+ * use these values to do the decisions, of course (the current
+ * algorithm does not).
+ *
+ * The overhead with storing, fetching and parsing the data is more
+ * concerning - adding 2^N values per bucket (even if it's just
+ * a 1B or 2B value) would significantly bloat the histogram, and
+ * thus the impact on the optimizer, which is not desirable.
+ *
+ * TODO This only updates the ndistinct for the sample (or bucket), but
+ * we eventually need an estimate of the total number of distinct
+ * values in the dataset. It's possible to either use the current
+ * 1D approach (i.e., if it's more than 10% of the sample, assume
+ * it's proportional to the number of rows). Or it's possible to
+ * implement the estimator suggested in the article, supposedly
+ * giving 'optimal' estimates (w.r.t. probability of error).
+ */
+static void
+update_bucket_ndistinct(MVBucket bucket, int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ int numrows = data->numrows;
+
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * XXX We could collect this while walking through the attributes
+ * elsewhere (as it is, we end up calling heap_getattr twice).
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(numrows * sizeof(Datum) * numattrs);
+ bool *isnull = (bool*)palloc0(numrows * sizeof(bool) * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* prepare the sort functions for all the dimensions */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* collect the values */
+ for (i = 0; i < numrows; i++)
+ for (j = 0; j < numattrs; j++)
+ items[i].values[j]
+ = heap_getattr(data->rows[i], attrs->values[j],
+ stats[j]->tupDesc, &items[i].isnull[j]);
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ data->ndistinct = 1;
+
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ data->ndistinct += 1;
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+}
+
+/*
+ * Count distinct values per bucket dimension.
+ */
+static void
+update_dimension_ndistinct(MVBucket bucket, int dimension, int2vector *attrs,
+ VacAttrStats ** stats, bool update_boundaries)
+{
+ int j;
+ int nvalues = 0;
+ bool isNull;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ Datum * values = (Datum*)palloc0(data->numrows * sizeof(Datum));
+ SortSupportData ssup;
+
+ StdAnalyzeData * mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* we may already know this is a NULL-only dimension */
+ if (bucket->nullsonly[dimension])
+ data->ndistincts[dimension] = 1;
+
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (j = 0; j < data->numrows; j++)
+ {
+ values[nvalues] = heap_getattr(data->rows[j], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+
+ /* ignore NULL values */
+ if (! isNull)
+ nvalues++;
+ }
+
+ /* there's always at least 1 distinct value (may be NULL) */
+ data->ndistincts[dimension] = 1;
+
+ /*
+ * If there are only NULL values in the column, mark the dimension
+ * as NULL-only and bail out.
+ */
+ if (nvalues == 0)
+ {
+ pfree(values);
+ bucket->nullsonly[dimension] = true;
+ return;
+ }
+
+ /* sort the array (of pass-by-value datums) */
+ qsort_arg((void *) values, nvalues, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /*
+ * Update min/max boundaries to the smallest bounding box. Generally, this
+ * needs to be done only when constructing the initial bucket.
+ */
+ if (update_boundaries)
+ {
+ /* store the min/max values */
+ bucket->min[dimension] = values[0];
+ bucket->min_inclusive[dimension] = true;
+
+ bucket->max[dimension] = values[nvalues-1];
+ bucket->max_inclusive[dimension] = true;
+ }
+
+ /*
+ * Walk through the array and count distinct values by comparing
+ * succeeding values.
+ *
+ * FIXME This only works for pass-by-value types (i.e. not VARCHARs
+ * etc.). Although thanks to the deduplication it might work
+ * even for those types (equal values will get the same item
+ * in the deduplicated array).
+ */
+ for (j = 1; j < nvalues; j++)
+ if (values[j] != values[j-1])
+ data->ndistincts[dimension] += 1;
+
+ pfree(values);
+}
+
+/*
+ * A properly built histogram must not contain buckets mixing NULL and
+ * non-NULL values in a single dimension. Each dimension may either be
+ * marked as 'nulls only', and thus containing only NULL values, or
+ * it must not contain any NULL values.
+ *
+ * Therefore, if the sample contains NULL values in any of the columns,
+ * it's necessary to build those NULL-buckets. This is done in an
+ * iterative way using this algorithm, operating on a single bucket:
+ *
+ * (1) Check that all dimensions are well-formed (not mixing NULL
+ * and non-NULL values).
+ *
+ * (2) If all dimensions are well-formed, terminate.
+ *
+ * (3) If the dimension contains only NULL values, but is not
+ * marked as NULL-only, mark it as NULL-only and run the
+ * algorithm again (on this bucket).
+ *
+ * (4) If the dimension mixes NULL and non-NULL values, split the
+ * bucket into two parts - one with NULL values, one with
+ * non-NULL values (replacing the current one). Then run
+ * the algorithm on both buckets.
+ *
+ * This is executed in a recursive manner, but the number of executions
+ * should be quite low - limited by the number of NULL-buckets. Also,
+ * in each branch the number of nested calls is limited by the number
+ * of dimensions (attributes) of the histogram.
+ *
+ * At the end, there should be buckets with no mixed dimensions. The
+ * number of buckets produced by this algorithm is rather limited - with
+ * N dimensions, there may be only 2^N such buckets (each dimension may
+ * be either NULL or non-NULL). So with 8 dimensions (current value of
+ * MVSTATS_MAX_DIMENSIONS) there may be only 256 such buckets.
+ *
+ * After this, a 'regular' bucket-split algorithm shall run, further
+ * optimizing the histogram.
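+ *
+ * For example, with a 2-D histogram where both columns contain NULL
+ * values, the initial bucket may be split into up to four buckets:
+ * (NULL, NULL), (NULL, non-NULL), (non-NULL, NULL) and a bucket with
+ * no NULLs in either dimension.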
+ */
+static void
+create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int null_dim = -1;
+ int null_count = 0;
+ bool null_found = false;
+ MVBucket bucket, null_bucket;
+ int null_idx, curr_idx;
+ HistogramBuild data, null_data;
+
+ /* remember original values from the bucket */
+ int numrows;
+ HeapTuple *oldrows = NULL;
+
+ Assert(bucket_idx < histogram->nbuckets);
+ Assert(histogram->ndimensions == attrs->dim1);
+
+ bucket = histogram->buckets[bucket_idx];
+ data = (HistogramBuild)bucket->build_data;
+
+ numrows = data->numrows;
+ oldrows = data->rows;
+
+ /*
+ * Walk through all rows / dimensions, and stop once we find NULL
+ * in a dimension not yet marked as NULL-only.
+ */
+ for (i = 0; i < data->numrows; i++)
+ {
+ /*
+ * FIXME We don't need to start from the first attribute
+ * here - we can start from the last known dimension.
+ */
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ /* Is this a NULL-only dimension? If yes, skip. */
+ if (bucket->nullsonly[j])
+ continue;
+
+ /* found a NULL in that dimension? */
+ if (heap_attisnull(data->rows[i], attrs->values[j]))
+ {
+ null_found = true;
+ null_dim = j;
+ break;
+ }
+ }
+
+ /* terminate if we found an attribute with NULL values */
+ if (null_found)
+ break;
+ }
+
+ /* no regular dimension contains NULL values => we're done */
+ if (! null_found)
+ return;
+
+ /* walk through the rows again, count NULL values in 'null_dim' */
+ for (i = 0; i < data->numrows; i++)
+ {
+ if (heap_attisnull(data->rows[i], attrs->values[null_dim]))
+ null_count += 1;
+ }
+
+ Assert(null_count <= data->numrows);
+
+ /*
+ * If (null_count == numrows) the dimension already is NULL-only,
+ * but is not yet marked like that. It's enough to mark it and
+ * repeat the process recursively (until we run out of dimensions).
+ */
+ if (null_count == data->numrows)
+ {
+ bucket->nullsonly[null_dim] = true;
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+ return;
+ }
+
+ /*
+ * We have to split the bucket into two - one with NULL values in
+ * the dimension, one with non-NULL values. We don't need to sort
+ * the data or anything, but otherwise it's similar to what's done
+ * in partition_bucket().
+ */
+
+ /* create bucket with NULL-only dimension 'dim' */
+ null_bucket = copy_mv_bucket(bucket, histogram->ndimensions);
+ null_data = (HistogramBuild)null_bucket->build_data;
+
+ /* remember the current array info */
+ oldrows = data->rows;
+ numrows = data->numrows;
+
+ /* we'll keep non-NULL values in the current bucket */
+ data->numrows = (numrows - null_count);
+ data->rows
+ = (HeapTuple*)palloc0(data->numrows * sizeof(HeapTuple));
+
+ /* and the NULL values will go to the new one */
+ null_data->numrows = null_count;
+ null_data->rows
+ = (HeapTuple*)palloc0(null_data->numrows * sizeof(HeapTuple));
+
+ /* mark the dimension as NULL-only (in the new bucket) */
+ null_bucket->nullsonly[null_dim] = true;
+
+ /* walk through the sample rows and distribute them accordingly */
+ null_idx = 0;
+ curr_idx = 0;
+ for (i = 0; i < numrows; i++)
+ {
+ if (heap_attisnull(oldrows[i], attrs->values[null_dim]))
+ /* NULL => copy to the new bucket */
+ memcpy(&null_data->rows[null_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ else
+ memcpy(&data->rows[curr_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ }
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(null_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each
+ * bucket (NULL is not a value, so 0, and the other bucket got
+ * all the ndistinct values).
+ */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(null_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+
+ /* add the NULL bucket to the histogram */
+ histogram->buckets[histogram->nbuckets++] = null_bucket;
+
+ /*
+ * And now run the function recursively on both buckets (the new
+ * one first, because the call may change the number of buckets, and
+ * it's used as an index).
+ */
+ create_null_buckets(histogram, (histogram->nbuckets-1), attrs, stats);
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+}
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
+
+/*
+ * SRF with details about buckets of a histogram:
+ *
+ * - bucket ID (0...nbuckets)
+ * - min values (string array)
+ * - max values (string array)
+ * - nulls only (boolean array)
+ * - min inclusive flags (boolean array)
+ * - max inclusive flags (boolean array)
+ * - frequency (double precision)
+ * - density (double precision)
+ * - bucket size (double precision)
+ *
+ * The input is the OID of the statistics, and there are no rows
+ * returned if the statistics contains no histogram (or if there's no
+ * statistics for the OID).
+ *
+ * The second parameter (type) determines what values will be returned
+ * in the (minvals,maxvals). There are three possible values:
+ *
+ * 0 (actual values)
+ * -----------------
+ * - prints actual values
+ * - using the output function of the data type (as string)
+ * - handy for investigating the histogram
+ *
+ * 1 (distinct index)
+ * ------------------
+ * - prints index of the distinct value (into the serialized array)
+ * - makes it easier to spot neighbor buckets, etc.
+ * - handy for plotting the histogram
+ *
+ * 2 (normalized distinct index)
+ * -----------------------------
+ * - prints index of the distinct value, but normalized into [0,1]
+ * - similar to 1, but shows how 'long' the bucket range is
+ * - handy for plotting the histogram
+ *
+ * When plotting the histogram, be careful as the (1) and (2) options
+ * skew the lengths by distributing the distinct values uniformly. For
+ * data types without a clear meaning of 'distance' (e.g. strings) that
+ * is not a big deal, but for numbers it may be confusing.
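+ *
+ * A hypothetical usage example (16400 stands in for an actual
+ * pg_mv_statistic OID), printing the actual boundary values:
+ *
+ * SELECT * FROM pg_mv_histogram_buckets(16400, 0);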
+ */
+PG_FUNCTION_INFO_V1(pg_mv_histogram_buckets);
+
+Datum
+pg_mv_histogram_buckets(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ Oid mvoid = PG_GETARG_OID(0);
+ int otype = PG_GETARG_INT32(1);
+
+ if ((otype < 0) || (otype > 2))
+ elog(ERROR, "invalid output type specified");
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MVSerializedHistogram histogram;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ histogram = load_mv_histogram(mvoid);
+
+ funcctx->user_fctx = histogram;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = histogram->nbuckets;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples
+ * from raw C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+ double bucket_size = 1.0;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MVSerializedHistogram histogram;
+ MVSerializedBucket bucket;
+
+ histogram = (MVSerializedHistogram)funcctx->user_fctx;
+
+ Assert(call_cntr < histogram->nbuckets);
+
+ bucket = histogram->buckets[call_cntr];
+
+ stakeys = find_mv_attnums(mvoid, &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple.
+ * This should be an array of C strings which will
+ * be processed later by the type input functions.
+ */
+ values = (char **) palloc(9 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+ values[3] = (char *) palloc0(1024 * sizeof(char));
+ values[4] = (char *) palloc0(1024 * sizeof(char));
+ values[5] = (char *) palloc0(1024 * sizeof(char));
+
+ values[6] = (char *) palloc(64 * sizeof(char));
+ values[7] = (char *) palloc(64 * sizeof(char));
+ values[8] = (char *) palloc(64 * sizeof(char));
+
+ /* we need to do this only when printing the actual values */
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * histogram->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * histogram->ndimensions);
+
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* bucket ID */
+
+ /*
+ * For otype 0 we print the actual min/max values (using the output
+ * function of the attribute type), otherwise we print indexes into
+ * the deduplicated arrays - those arrays are sorted, so the indexes
+ * are useful e.g. for plotting the histogram
+ */
+
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bucket_size *= (bucket->max[i] - bucket->min[i]) * 1.0
+ / (histogram->nvalues[i]-1);
+
+ /* print the actual values, i.e. use output function etc. */
+ if (otype == 0)
+ {
+ Datum minval, maxval;
+ Datum minout, maxout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %s}";
+
+ minval = histogram->values[i][bucket->min[i]];
+ minout = FunctionCall1(&fmgrinfo[i], minval);
+
+ maxval = histogram->values[i][bucket->max[i]];
+ maxout = FunctionCall1(&fmgrinfo[i], maxval);
+
+ snprintf(buff, 1024, format, values[1], DatumGetPointer(minout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], DatumGetPointer(maxout));
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+ else if (otype == 1)
+ {
+ format = "%s, %d";
+ if (i == 0)
+ format = "{%s%d";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %d}";
+
+ snprintf(buff, 1024, format, values[1], bucket->min[i]);
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], bucket->max[i]);
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+ else
+ {
+ format = "%s, %f";
+ if (i == 0)
+ format = "{%s%f";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %f}";
+
+ snprintf(buff, 1024, format, values[1],
+ bucket->min[i] * 1.0 / (histogram->nvalues[i]-1));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2],
+ bucket->max[i] * 1.0 / (histogram->nvalues[i]-1));
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %s}";
+
+ snprintf(buff, 1024, format, values[3], bucket->nullsonly[i] ? "t" : "f");
+ strncpy(values[3], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[4], bucket->min_inclusive[i] ? "t" : "f");
+ strncpy(values[4], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[5], bucket->max_inclusive[i] ? "t" : "f");
+ strncpy(values[5], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[6], 64, "%f", bucket->ntuples); /* frequency */
+ snprintf(values[7], 64, "%f", bucket->ntuples / bucket_size); /* density */
+ snprintf(values[8], 64, "%f", bucket_size); /* bucket_size */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+ pfree(values[4]);
+ pfree(values[5]);
+ pfree(values[6]);
+ pfree(values[7]);
+ pfree(values[8]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
+
+#ifdef DEBUG_MVHIST
+/*
+ * prints debugging info about matched histogram buckets (full/partial)
+ *
+ * XXX Currently works only for INT data type.
+ */
+void
+debug_histogram_matches(MVSerializedHistogram mvhist, char *matches)
+{
+ int i, j;
+
+ float ffull = 0, fpartial = 0;
+ int nfull = 0, npartial = 0;
+
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ char ranges[1024];
+
+ if (! matches[i])
+ continue;
+
+ /* increment the counters */
+ nfull += (matches[i] == MVSTATS_MATCH_FULL) ? 1 : 0;
+ npartial += (matches[i] == MVSTATS_MATCH_PARTIAL) ? 1 : 0;
+
+ /* and also update the frequencies */
+ ffull += (matches[i] == MVSTATS_MATCH_FULL) ? bucket->ntuples : 0;
+ fpartial += (matches[i] == MVSTATS_MATCH_PARTIAL) ? bucket->ntuples : 0;
+
+ memset(ranges, 0, sizeof(ranges));
+
+ /* build ranges for all the dimensions */
+ for (j = 0; j < mvhist->ndimensions; j++)
+ {
+ /* append to the buffer (sprintf with the buffer as its own source is undefined) */
+ snprintf(ranges + strlen(ranges), sizeof(ranges) - strlen(ranges), " [%d %d]",
+ DatumGetInt32(mvhist->values[j][bucket->min[j]]),
+ DatumGetInt32(mvhist->values[j][bucket->max[j]]));
+ }
+
+ elog(WARNING, "bucket %d %s => %d [%f]", i, ranges, matches[i], bucket->ntuples);
+ }
+
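+ /* count partially matched buckets with a 0.5 weight (half of each such bucket is assumed to match) */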
+ elog(WARNING, "full=%f partial=%f (%f)", ffull, fpartial, (ffull + 0.5 * fpartial));
+}
+#endif
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 6339631..3543239 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2109,9 +2109,9 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
- " deps_enabled, mcv_enabled,\n"
- " deps_built, mcv_built,\n"
- " mcv_max_items,\n"
+ " deps_enabled, mcv_enabled, hist_enabled,\n"
+ " deps_built, mcv_built, hist_built,\n"
+ " mcv_max_items, hist_max_buckets,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
@@ -2154,8 +2154,17 @@ describeOneTableDetails(const char *schemaname,
first = false;
}
+ if (!strcmp(PQgetvalue(result, i, 6), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", histogram");
+ else
+ appendPQExpBuffer(&buf, "(histogram");
+ first = false;
+ }
+
appendPQExpBuffer(&buf, ") ON (%s)",
- PQgetvalue(result, i, 9));
+ PQgetvalue(result, i, 12));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index fd7107d..a5945af 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -38,13 +38,16 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
+ bool hist_enabled; /* build histogram? */
- /* MCV size */
+ /* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
+ int32 hist_max_buckets; /* max histogram buckets */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
+ bool hist_built; /* histogram was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -52,6 +55,7 @@ CATALOG(pg_mv_statistic,3381)
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
+ bytea stahist; /* MV histogram (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -67,17 +71,21 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 11
+#define Natts_pg_mv_statistic 15
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
#define Anum_pg_mv_statistic_deps_enabled 4
#define Anum_pg_mv_statistic_mcv_enabled 5
-#define Anum_pg_mv_statistic_mcv_max_items 6
-#define Anum_pg_mv_statistic_deps_built 7
-#define Anum_pg_mv_statistic_mcv_built 8
-#define Anum_pg_mv_statistic_stakeys 9
-#define Anum_pg_mv_statistic_stadeps 10
-#define Anum_pg_mv_statistic_stamcv 11
+#define Anum_pg_mv_statistic_hist_enabled 6
+#define Anum_pg_mv_statistic_mcv_max_items 7
+#define Anum_pg_mv_statistic_hist_max_buckets 8
+#define Anum_pg_mv_statistic_deps_built 9
+#define Anum_pg_mv_statistic_mcv_built 10
+#define Anum_pg_mv_statistic_hist_built 11
+#define Anum_pg_mv_statistic_stakeys 12
+#define Anum_pg_mv_statistic_stadeps 13
+#define Anum_pg_mv_statistic_stamcv 14
+#define Anum_pg_mv_statistic_stahist 15
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 66b4bcd..7e915bd 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2674,6 +2674,10 @@ DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f
DESCR("multi-variate statistics: MCV list info");
DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
DESCR("details about MCV list items");
+DATA(insert OID = 3375 ( pg_mv_stats_histogram_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_histogram_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: histogram info");
+DATA(insert OID = 3374 ( pg_mv_histogram_buckets PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 2 0 2249 "26 23" "{26,23,23,1009,1009,1000,1000,1000,701,701,701}" "{i,i,o,o,o,o,o,o,o,o,o}" "{oid,otype,index,minvals,maxvals,nullsonly,mininclusive,maxinclusive,frequency,density,bucket_size}" _null_ _null_ pg_mv_histogram_buckets _null_ _null_ _null_ ));
+DESCR("details about histogram buckets");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5ae6b3c..46bece6 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -620,10 +620,12 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
bool mcv_enabled; /* MCV list enabled */
+ bool hist_enabled; /* histogram enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
bool mcv_built; /* MCV list built */
+ bool hist_built; /* histogram built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 4535db7..f05a517 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -92,6 +92,123 @@ typedef MCVListData *MCVList;
#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
/*
+ * Multivariate histograms
+ */
+typedef struct MVBucketData {
+
+ /* Frequency of this bucket. */
+ float ntuples; /* frequency of tuples in this bucket */
+
+ /*
+ * Information about dimensions being NULL-only. Not yet used.
+ */
+ bool *nullsonly;
+
+ /* lower boundaries - values and information about the inequalities */
+ Datum *min;
+ bool *min_inclusive;
+
+ /* upper boundaries - values and information about the inequalities */
+ Datum *max;
+ bool *max_inclusive;
+
+ /* used when building the histogram (not serialized/deserialized) */
+ void *build_data;
+
+} MVBucketData;
+
+typedef MVBucketData *MVBucket;
+
+
+typedef struct MVHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ MVBucket *buckets; /* array of buckets */
+
+} MVHistogramData;
+
+typedef MVHistogramData *MVHistogram;
+
+/*
+ * Histogram in a partially serialized form, with deduplicated boundary
+ * values etc.
+ *
+ * TODO add more detailed description here
+ */
+
+typedef struct MVSerializedBucketData {
+
+ /* Frequency of this bucket. */
+ float ntuples; /* frequency of tuples in this bucket */
+
+ /*
+ * Information about dimensions being NULL-only. Not yet used.
+ */
+ bool *nullsonly;
+
+ /* indexes of lower boundaries, and information about the
+ * inequalities (exclusive vs. inclusive) */
+ uint16 *min;
+ bool *min_inclusive;
+
+ /* indexes of upper boundaries, and information about the
+ * inequalities (exclusive vs. inclusive) */
+ uint16 *max;
+ bool *max_inclusive;
+
+} MVSerializedBucketData;
+
+typedef MVSerializedBucketData *MVSerializedBucket;
+
+typedef struct MVSerializedHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ /*
+ * Keep this in sync with MVHistogramData - the deserialization
+ * code relies on this field being at the same offset.
+ */
+ MVSerializedBucket *buckets; /* array of buckets */
+
+ /*
+ * serialized boundary values, one array per dimension, deduplicated
+ * (the min/max indexes point into these arrays)
+ */
+ int *nvalues;
+ Datum **values;
+
+} MVSerializedHistogramData;
+
+typedef MVSerializedHistogramData *MVSerializedHistogram;
+
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_HIST_MAGIC 0x7F8C5670 /* marks serialized bytea */
+#define MVSTAT_HIST_TYPE_BASIC 1 /* basic histogram type */
+
+/*
+ * Limits used for max_buckets option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_HIST_MIN_BUCKETS, and we cannot
+ * have more than MVSTAT_HIST_MAX_BUCKETS buckets.
+ *
+ * This is just a boundary for the 'max' threshold - the actual
+ * histogram may use fewer buckets than MVSTAT_HIST_MAX_BUCKETS.
+ *
+ * TODO The MVSTAT_HIST_MIN_BUCKETS should be related to the number of
+ * attributes (MVSTATS_MAX_DIMENSIONS) because of NULL-buckets.
+ * There should be at least 2^N buckets, otherwise we may be unable
+ * to build the NULL buckets.
+ */
+#define MVSTAT_HIST_MIN_BUCKETS 128 /* min number of buckets */
+#define MVSTAT_HIST_MAX_BUCKETS 16384 /* max number of buckets */
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
@@ -99,20 +216,25 @@ typedef MCVListData *MCVList;
MVDependencies load_mv_dependencies(Oid mvoid);
MCVList load_mv_mcvlist(Oid mvoid);
+MVSerializedHistogram load_mv_histogram(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
VacAttrStats **stats);
+bytea * serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
MCVList deserialize_mv_mcvlist(bytea * data);
+MVSerializedHistogram deserialize_mv_histogram(bytea * data);
/*
* Returns index of the attribute number within the vector (i.e. a
* dimension within the stats).
*/
int mv_get_index(AttrNumber varattno, int2vector * stakeys);
int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
@@ -121,6 +243,8 @@ extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_mcvlist_items(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_histogram_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_histogram_buckets(PG_FUNCTION_ARGS);
MVDependencies
build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
@@ -130,10 +254,20 @@ MCVList
build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int *numrows_filtered);
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total);
+
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+void update_mv_stats(Oid relid, MVDependencies dependencies,
+ MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats);
+#ifdef DEBUG_MVHIST
+extern void debug_histogram_matches(MVSerializedHistogram mvhist, char *matches);
+#endif
+
+
#endif
diff --git a/src/test/regress/expected/mv_histogram.out b/src/test/regress/expected/mv_histogram.out
new file mode 100644
index 0000000..a34edb8
--- /dev/null
+++ b/src/test/regress/expected/mv_histogram.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s1 ON mv_histogram (unknown_column) WITH (histogram);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s1 ON mv_histogram (a) WITH (histogram);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s1 ON mv_histogram (a, a) WITH (histogram);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON mv_histogram (a, a, b) WITH (histogram);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing histogram statistics
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (dependencies, max_buckets=200);
+ERROR: option 'histogram' is required by other option(s)
+-- invalid max_buckets value / too low
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=10);
+ERROR: minimum number of buckets is 128
+-- invalid max_buckets value / too high
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=100000);
+ERROR: maximum number of buckets is 16384
+-- correct command
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (histogram);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s2 ON mv_histogram (a, b, c) WITH (histogram);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s3 ON mv_histogram (a, b, c, d) WITH (histogram);
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+DROP TABLE mv_histogram;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 66071d8..1a1a4ca 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1375,7 +1375,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
length(s.stadeps) AS depsbytes,
pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
length(s.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo,
+ length(s.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(s.stahist) AS histinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 85d94f1..a885235 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,4 +112,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies mv_mcv
+test: mv_dependencies mv_mcv mv_histogram
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 6584d73..2efdcd7 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -164,3 +164,4 @@ test: event_trigger
test: stats
test: mv_dependencies
test: mv_mcv
+test: mv_histogram
diff --git a/src/test/regress/sql/mv_histogram.sql b/src/test/regress/sql/mv_histogram.sql
new file mode 100644
index 0000000..02f49b4
--- /dev/null
+++ b/src/test/regress/sql/mv_histogram.sql
@@ -0,0 +1,176 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s1 ON mv_histogram (unknown_column) WITH (histogram);
+
+-- single column
+CREATE STATISTICS s1 ON mv_histogram (a) WITH (histogram);
+
+-- single column, duplicated
+CREATE STATISTICS s1 ON mv_histogram (a, a) WITH (histogram);
+
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON mv_histogram (a, a, b) WITH (histogram);
+
+-- unknown option
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (unknown_option);
+
+-- missing histogram statistics
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (dependencies, max_buckets=200);
+
+-- invalid max_buckets value / too low
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=10);
+
+-- invalid max_buckets value / too high
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=100000);
+
+-- correct command
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (histogram);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+
+DROP TABLE mv_histogram;
+
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s2 ON mv_histogram (a, b, c) WITH (histogram);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mv_histogram;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s3 ON mv_histogram (a, b, c, d) WITH (histogram);
+
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+DROP TABLE mv_histogram;
--
2.1.0
[Attachment: 0006-multi-statistics-estimation.patch]
From 0e10f5f26e546d835b493f84a3ebe2c904390228 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Fri, 6 Feb 2015 01:42:38 +0100
Subject: [PATCH 6/9] multi-statistics estimation
The general idea is that a probability (which
is what selectivity is) can be split into a product of
conditional probabilities like this:
P(A & B & C) = P(A & B) * P(C|A & B)
If we assume that B and C are conditionally independent (given A),
the last factor may be simplified like this
P(A & B & C) = P(A & B) * P(C|A)
so we only need probabilities on [A,B] and [A,C] to compute
the original probability.
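For illustration, take three perfectly correlated conditions with
P(A) = P(B) = P(C) = 0.01. The independence assumption gives
P(A & B & C) = 0.01 * 0.01 * 0.01 = 0.000001
while the decomposition, using P(C|A) = 1.0, gives
P(A & B & C) = P(A & B) * P(C|A) = 0.01 * 1.0 = 0.01
which is the actual probability.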
The implementation works in the other direction, though.
We know what probability P(A & B & C) we need to compute,
and also what statistics are available.
So we search for a combination of statistics, covering
the clauses in an optimal way (most clauses covered, most
dependencies exploited).
There are two possible approaches - exhaustive and greedy.
The exhaustive one walks through all permutations of
stats using dynamic programming, so it's guaranteed to
find the optimal solution, but it soon gets very slow as
it's roughly O(N!). The dynamic programming may improve
that a bit, but it's still far too expensive for large
numbers of statistics (on a single table).
The greedy algorithm is very simple - in every step it chooses
the locally best statistics. That may not guarantee the globally
optimal solution (but maybe it does?), but it only needs N steps
to find a solution, so it's very fast (processing the selected
stats is usually way more expensive).
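To make the greedy step concrete, here is a simplified,
self-contained sketch (not the patch code - attribute sets are
plain bit masks and the score is just the number of newly covered
clause attributes; the real algorithm also weighs conditions and
requires at least two covered attributes per statistics):

#include <stdio.h>
#include <stdint.h>

/* count bits set in a mask (i.e. attributes covered) */
static int popcount(uint32_t x) { int n = 0; while (x) { n += x & 1; x >>= 1; } return n; }

int main(void)
{
	/* attributes a..e referenced by the clauses, as bits 0..4 */
	uint32_t clause_attnums = 0x1F;
	/* available stats: (a,b), (a,b,e), (a,b,c,d) */
	uint32_t stats[3] = {0x03, 0x13, 0x0F};
	int used[3] = {0, 0, 0};
	uint32_t covered = 0;

	while (1)
	{
		int i, best = -1, best_gain = 0;

		/* greedy step: pick the stats covering most uncovered attributes */
		for (i = 0; i < 3; i++)
		{
			int gain = popcount(stats[i] & clause_attnums & ~covered);

			if (!used[i] && gain > best_gain)
			{
				best = i;
				best_gain = gain;
			}
		}

		if (best < 0)
			break;	/* nothing useful left */

		used[best] = 1;
		covered |= (stats[best] & clause_attnums);
		printf("applying stats %d (new attributes: %d)\n", best, best_gain);
	}

	return 0;
}

With the stats (a,b), (a,b,e) and (a,b,c,d) this picks (a,b,c,d)
first and then (a,b,e), covering all five attributes in two steps.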
There's a GUC for selecting the search algorithm
mvstat_search = {'greedy', 'exhaustive'}
The default value is 'greedy' as that's much safer (with
respect to runtime). See choose_mv_statistics().
Once we have found a sequence of statistics, we apply
them to the clauses using the conditional probabilities.
We process the selected stats one by one, and for each
we select the estimated clauses and conditions. See
clauselist_selectivity() for more details.
Limitations
-----------
It's still true that each clause at a given level has to
be covered by a single MV statistics. So with this query
WHERE (clause1) AND (clause2) AND (clause3 OR clause4)
each parenthesized clause has to be covered by a single
multivariate statistics.
Clauses not covered by a single statistics at this level
will be passed to clause_selectivity() but this will treat
them as a collection of simpler clauses (connected by AND
or OR), and the clauses from the previous level will be
used as conditions.
So using the same example, the last clause will be passed
to clause_selectivity() with 'clause1' and 'clause2' as
conditions, and it will be processed using multivariate
stats if possible.
The other limitation is that all the expressions in a clause have
to be mv-compatible, i.e. a single clause can't mix compatible
and incompatible expressions. If this is violated, the clause may
be passed to the next level (just like with lists of clauses not
covered by a single statistics), which splits it into clauses
handled by multivariate stats and clauses handled by regular
statistics.
rework clauselist_selectivity_or to handle OR-clauses correctly
We might invent a completely new set of functions here, resembling
clauselist_selectivity but adapting the ideas to OR-clauses.
But luckily we know that each OR-clause
(a OR b OR c)
may be rewritten as an equivalent AND-clause using negation:
NOT ((NOT a) AND (NOT b) AND (NOT c))
And that's something we can pass to clauselist_selectivity.
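In terms of selectivities, the identity the code relies on is
P(a OR b OR c) = 1 - P((NOT a) AND (NOT b) AND (NOT c))
so instead of building the top-level NOT we can simply estimate
the AND-clause of the negated args and subtract the result from 1.0.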
histogram call cache
--------------------
The call cache was removed because it did not initially work
well with OR clauses, but that was just a stupid thinko in the
implementation. This patch re-adds it, hopefully correctly.
The code in update_match_bitmap_histogram() is overly complex,
the branches handling various inequality cases are redundant.
This needs to be simplified somehow.
---
contrib/file_fdw/file_fdw.c | 3 +-
contrib/postgres_fdw/postgres_fdw.c | 6 +-
src/backend/optimizer/path/clausesel.c | 1875 +++++++++++++++++++++++++++-----
src/backend/optimizer/path/costsize.c | 23 +-
src/backend/optimizer/util/orclauses.c | 4 +-
src/backend/utils/adt/selfuncs.c | 17 +-
src/backend/utils/misc/guc.c | 20 +
src/backend/utils/mvstats/README.stats | 166 +++
src/include/optimizer/cost.h | 6 +-
src/include/utils/mvstats.h | 8 +
10 files changed, 1833 insertions(+), 295 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index dc035d7..8f11b7a 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -969,7 +969,8 @@ estimate_size(PlannerInfo *root, RelOptInfo *baserel,
baserel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index d79e4cc..2f4af21 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -498,7 +498,8 @@ postgresGetForeignRelSize(PlannerInfo *root,
fpinfo->local_conds,
baserel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
@@ -2149,7 +2150,8 @@ estimate_path_cost_size(PlannerInfo *root,
local_param_join_conds,
foreignrel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
local_sel *= fpinfo->local_conds_sel;
rows = clamp_row_est(rows * local_sel);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 647212a..d239488 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -29,6 +29,8 @@
#include "utils/selfuncs.h"
#include "utils/typcache.h"
+#include "miscadmin.h"
+
/*
* Data structure for accumulating info about possible range-query
@@ -44,6 +46,13 @@ typedef struct RangeQueryClause
Selectivity hibound; /* Selectivity of a var < something clause */
} RangeQueryClause;
+static Selectivity clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
+
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
@@ -62,23 +71,27 @@ static Bitmapset *collect_mv_attnums(List *clauses,
static int count_mv_attnums(List *clauses, Oid varRelid,
SpecialJoinInfo *sjinfo, int type);
+static List *clauses_matching_statistic(List **clauses, MVStatisticInfo *statistic,
+ int varRelid, SpecialJoinInfo *sjinfo, int types,
+ bool remove);
+
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Oid varRelid, List *stats,
SpecialJoinInfo *sjinfo);
-static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
-
-static List *clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
- List *clauses, Oid varRelid,
- List **mvclauses, MVStatisticInfo *mvstats, int types);
-
static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats, List *clauses,
+ List *conditions, bool is_or);
+
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats,
- bool *fullmatch, Selectivity *lowsel);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or, bool *fullmatch,
+ Selectivity *lowsel);
static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -92,11 +105,59 @@ static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
int nmatches, char * matches,
bool is_or);
+/*
+ * Describes a combination of multiple statistics covering the attributes
+ * referenced by the clauses. The array 'stats' (with nstats elements)
+ * lists the statistics in the order they are applied, along with the
+ * number of clauses and conditions covered by this solution.
+ *
+ * choose_mv_statistics_exhaustive() uses this to track both the current
+ * and the best solution while walking through the space of possible
+ * combinations.
+ */
+typedef struct mv_solution_t {
+ int nclauses; /* number of clauses covered */
+ int nconditions; /* number of conditions covered */
+ int nstats; /* number of stats applied */
+ int *stats; /* stats (in the apply order) */
+} mv_solution_t;
+
+static List *choose_mv_statistics(PlannerInfo *root,
+ List *mvstats,
+ List *clauses, List *conditions,
+ Oid varRelid,
+ SpecialJoinInfo *sjinfo);
+
+static List *filter_clauses(PlannerInfo *root, Oid varRelid,
+ SpecialJoinInfo *sjinfo, int type,
+ List *stats, List *clauses,
+ Bitmapset **attnums);
+
+static List *filter_stats(List *stats, Bitmapset *new_attnums,
+ Bitmapset *all_attnums);
+
+static Bitmapset **make_stats_attnums(MVStatisticInfo *mvstats,
+ int nmvstats);
+
+static MVStatisticInfo *make_stats_array(List *stats, int *nmvstats);
+
+static List* filter_redundant_stats(List *stats,
+ List *clauses, List *conditions);
+
+static Node** make_clauses_array(List *clauses, int *nclauses);
+
+static Bitmapset ** make_clauses_attnums(PlannerInfo *root, Oid varRelid,
+ SpecialJoinInfo *sjinfo, int type,
+ Node **clauses, int nclauses);
+
+static bool* make_cover_map(Bitmapset **stats_attnums, int nmvstats,
+ Bitmapset **clauses_attnums, int nclauses);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, List *clauses,
Oid varRelid, Index *relid);
-
+
static Bitmapset* fdeps_collect_attnums(List *stats);
static int *make_idx_to_attnum_mapping(Bitmapset *attnums);
@@ -119,6 +180,8 @@ static Bitmapset *fdeps_filter_clauses(PlannerInfo *root,
static Bitmapset * get_varattnos(Node * node, Index relid);
+int mvstat_search_type = MVSTAT_SEARCH_GREEDY;
+
/* used for merging bitmaps - AND (min), OR (max) */
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
#define MIN(x, y) (((x) < (y)) ? (x) : (y))
@@ -192,14 +255,15 @@ clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 1.0;
RangeQueryClause *rqlist = NULL;
ListCell *l;
/* processing mv stats */
- Oid relid = InvalidOid;
+ Index relid = InvalidOid;
/* list of multivariate stats on the relation */
List *stats = NIL;
@@ -208,12 +272,13 @@ clauselist_selectivity(PlannerInfo *root,
stats = find_stats(root, clauses, varRelid, &relid);
/*
- * If there's exactly one clause, then no use in trying to match up pairs,
- * so just go directly to clause_selectivity().
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, or matching multivariate statistics, so just go directly
+ * to clause_selectivity().
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
- varRelid, jointype, sjinfo);
+ varRelid, jointype, sjinfo, conditions);
/*
* Apply functional dependencies, but first check that there are some stats
@@ -246,32 +311,101 @@ clauselist_selectivity(PlannerInfo *root,
(count_mv_attnums(clauses, varRelid, sjinfo,
MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2))
{
- /* collect attributes from the compatible conditions */
- Bitmapset *mvattnums = collect_mv_attnums(clauses, varRelid, NULL, sjinfo,
- MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+ ListCell *s;
+
+ /*
+ * Copy the conditions we got from the upper part of the expression tree
+ * so that we can add local conditions to it (we need to keep the
+ * original list intact, for sibling expressions - other expressions
+ * at the same level).
+ */
+ List *conditions_local = list_copy(conditions);
- /* and search for the statistic covering the most attributes */
- MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+ /* find the best combination of statistics */
+ List *solution = choose_mv_statistics(root, stats,
+ clauses, conditions,
+ varRelid, sjinfo);
- if (mvstat != NULL) /* we have a matching stats */
+ /*
+ * We have a good solution, which is merely a list of statistics that
+ * we need to apply. We'll apply the statistics one by one (in the order
+ * as they appear in the list), and for each statistic we'll
+ *
+ * (1) find clauses compatible with the statistic (and remove them
+ * from the list)
+ *
+ * (2) find local conditions compatible with the statistic
+ *
+ * (3) do the estimation P(clauses | conditions)
+ *
+ * (4) append the estimated clauses to the local conditions, so that
+ * the subsequent statistics can use them
+ */
+ foreach (s, solution)
{
- /* clauses compatible with multi-variate stats */
- List *mvclauses = NIL;
+ MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
+
+ /* clauses compatible with the statistic we're applying right now */
+ List *stat_clauses = NIL;
+ List *stat_conditions = NIL;
+
+ /*
+ * Find clauses and conditions matching the statistic - the clauses
+ * need to be removed from the list, while conditions should remain
+ * there (so that we can apply them repeatedly).
+ *
+ * FIXME Perhaps this should also check compatibility with the type
+ * of stats (i.e. MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST).
+ */
+ stat_clauses
+ = clauses_matching_statistic(&clauses, mvstat, varRelid, sjinfo,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST,
+ true);
- /* split the clauselist into regular and mv-clauses */
- clauses = clauselist_mv_split(root, sjinfo, clauses,
- varRelid, &mvclauses, mvstat,
- (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST));
+ stat_conditions
+ = clauses_matching_statistic(&conditions_local, mvstat, varRelid, sjinfo,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST,
+ false);
- /* we've chosen the histogram to match the clauses */
- Assert(mvclauses != NIL);
+ /*
+ * If we got no clauses to estimate, we've done something wrong -
+ * either during the optimization, when detecting compatible
+ * clauses, or somewhere else.
+ *
+ * Also, we need at least two attributes in clauses and conditions.
+ */
+ Assert(stat_clauses != NIL);
+ Assert(count_mv_attnums(list_union(stat_clauses, stat_conditions),
+ varRelid, sjinfo,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2);
/* compute the multivariate stats */
- s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ s1 *= clauselist_mv_selectivity(root, mvstat,
+ stat_clauses, stat_conditions,
+ false); /* AND */
+
+ /*
+ * Add the new clauses to the local conditions, so that we can use
+ * them for the subsequent statistics. We only add the clauses,
+ * because the conditions are already there (or should be).
+ */
+ conditions_local = list_concat(conditions_local, stat_clauses);
}
+
+ /* from now on, work only with the 'local' list of conditions */
+ conditions = conditions_local;
}
/*
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, so just go directly to clause_selectivity().
+ */
+ if (list_length(clauses) == 1)
+ return s1 * clause_selectivity(root, (Node *) linitial(clauses),
+ varRelid, jointype, sjinfo, conditions);
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -283,7 +417,8 @@ clauselist_selectivity(PlannerInfo *root,
Selectivity s2;
/* Always compute the selectivity using clause_selectivity */
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo);
+ s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo,
+ conditions);
/*
* Check for being passed a RestrictInfo.
@@ -442,6 +577,55 @@ clauselist_selectivity(PlannerInfo *root,
}
/*
+ * Similar to clauselist_selectivity(), but for OR-clauses. We can't simply use
+ * the same multi-statistic estimation logic for AND-clauses, at least not
+ * directly, because there are a few key differences:
+ *
+ * - functional dependencies don't really apply to OR-clauses
+ *
+ * - clauselist_selectivity() is based on decomposing the selectivity into
+ * a sequence of conditional probabilities (selectivities), but that can
+ * be done only for AND-clauses
+ *
+ * We might invent a similar infrastructure for optimizing OR-clauses, doing
+ * something similar to what clauselist_selectivity does for AND-clauses, but
+ * luckily we know that each disjunction (aka OR-clause)
+ *
+ * (a OR b OR c)
+ *
+ * may be rewritten as an equivalent conjunction (aka AND-clause)
+ * by using negation:
+ *
+ * NOT ((NOT a) AND (NOT b) AND (NOT c))
+ *
+ * And that's something we can pass to clauselist_selectivity and let it do
+ * all the heavy lifting.
+ */
+static Selectivity
+clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
+{
+ List *args = NIL;
+ ListCell *l;
+ Expr *expr;
+
+ /* build arguments for the AND-clause by negating args of the OR-clause */
+ foreach (l, clauses)
+ args = lappend(args, makeBoolExpr(NOT_EXPR, list_make1(lfirst(l)), -1));
+
+ /* and then build the AND-clause from the negated args */
+ expr = makeBoolExpr(AND_EXPR, args, -1);
+
+ /* instead of constructing NOT expression, just do (1.0 - s) */
+ return 1.0 - clauselist_selectivity(root, list_make1(expr), varRelid,
+ jointype, sjinfo, conditions);
+}
+
+/*
* addRangeClause --- add a new range clause for clauselist_selectivity
*
* Here is where we try to match up pairs of range-query clauses
@@ -648,7 +832,8 @@ clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 0.5; /* default for any unhandled clause type */
RestrictInfo *rinfo = NULL;
@@ -768,7 +953,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) get_notclausearg((Expr *) clause),
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (and_clause(clause))
{
@@ -777,29 +963,18 @@ clause_selectivity(PlannerInfo *root,
((BoolExpr *) clause)->args,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (or_clause(clause))
{
- /*
- * Selectivities for an OR clause are computed as s1+s2 - s1*s2 to
- * account for the probable overlap of selected tuple sets.
- *
- * XXX is this too conservative?
- */
- ListCell *arg;
-
- s1 = 0.0;
- foreach(arg, ((BoolExpr *) clause)->args)
- {
- Selectivity s2 = clause_selectivity(root,
- (Node *) lfirst(arg),
- varRelid,
- jointype,
- sjinfo);
-
- s1 = s1 + s2 - s1 * s2;
- }
+ /* just call to clauselist_selectivity_or() */
+ s1 = clauselist_selectivity_or(root,
+ ((BoolExpr *) clause)->args,
+ varRelid,
+ jointype,
+ sjinfo,
+ conditions);
}
else if (is_opclause(clause) || IsA(clause, DistinctExpr))
{
@@ -889,7 +1064,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((RelabelType *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (IsA(clause, CoerceToDomain))
{
@@ -898,7 +1074,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((CoerceToDomain *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else
{
@@ -962,15 +1139,16 @@ clause_selectivity(PlannerInfo *root,
* in the MCV list, then the selectivity is below the lowest frequency
* found in the MCV list,
*
- * TODO When applying the clauses to the histogram/MCV list, we can do
- * that from the most selective clauses first, because that'll
- * eliminate the buckets/items sooner (so we'll be able to skip
- * them without inspection, which is more expensive). But this
- * requires really knowing the per-clause selectivities in advance,
- * and that's not what we do now.
+ * TODO When applying the clauses to the histogram/MCV list, we can do that from
+ * the most selective clauses first, because that'll eliminate the
+ * buckets/items sooner (so we'll be able to skip them without inspection,
+ * which is more expensive). But this requires really knowing the
+ * per-clause selectivities in advance, and that's not what we do now.
+ *
*/
static Selectivity
-clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+clauselist_mv_selectivity(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
bool fullmatch = false;
Selectivity s1 = 0.0, s2 = 0.0;
@@ -988,7 +1166,8 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
*/
/* Evaluate the MCV first. */
- s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ s1 = clauselist_mv_selectivity_mcvlist(root, mvstats,
+ clauses, conditions, is_or,
&fullmatch, &mcv_low);
/*
@@ -1001,7 +1180,8 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
/* FIXME if (fullmatch) without matching MCV item, use the mcv_low
* selectivity as upper bound */
- s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+ s2 = clauselist_mv_selectivity_histogram(root, mvstats,
+ clauses, conditions, is_or);
/* TODO clamp to <= 1.0 (or more strictly, when possible) */
return s1 + s2;
@@ -1041,8 +1221,7 @@ collect_mv_attnums(List *clauses, Oid varRelid,
*/
if (bms_num_members(attnums) <= 1)
{
- if (attnums != NULL)
- pfree(attnums);
+ bms_free(attnums);
attnums = NULL;
*relid = InvalidOid;
}
@@ -1067,186 +1246,876 @@ count_mv_attnums(List *clauses, Oid varRelid, SpecialJoinInfo *sjinfo, int type)
return c;
}
+static List *
+clauses_matching_statistic(List **clauses, MVStatisticInfo *statistic,
+ int varRelid, SpecialJoinInfo *sjinfo, int types,
+ bool remove)
+{
+ int i;
+ Bitmapset *stat_attnums = NULL;
+ List *matching_clauses = NIL;
+ ListCell *lc;
+
+ /* build attnum bitmapset for this statistics */
+ for (i = 0; i < statistic->stakeys->dim1; i++)
+ stat_attnums = bms_add_member(stat_attnums,
+ statistic->stakeys->values[i]);
+
+ /*
+ * We can't use foreach here, because we may need to remove some of the
+ * clauses if (remove=true).
+ */
+ lc = list_head(*clauses);
+ while (lc)
+ {
+ Node *clause = (Node*)lfirst(lc);
+ Bitmapset *attnums = NULL;
+
+ /* must advance lc before list_delete possibly pfree's it */
+ lc = lnext(lc);
+
+ /*
+ * skip clauses that are not compatible with stats (just leave them
+ * in the original list)
+ *
+ * FIXME Perhaps this should check what stats are actually available in
+ * the statistics (not a big deal now, because MCV and histograms
+ * handle the same types of conditions).
+ */
+ if (! clause_is_mv_compatible(clause, varRelid, NULL, &attnums, sjinfo,
+ types))
+ {
+ bms_free(attnums);
+ continue;
+ }
+
+ /* if the clause is covered by the statistic, add it to the list */
+ if (bms_is_subset(attnums, stat_attnums))
+ {
+ matching_clauses = lappend(matching_clauses, clause);
+
+ /* if remove=true, remove the matching item from the main list */
+ if (remove)
+ *clauses = list_delete_ptr(*clauses, clause);
+ }
+
+ bms_free(attnums);
+ }
+
+ bms_free(stat_attnums);
+
+ return matching_clauses;
+}
+
/*
- * We're looking for statistics matching at least 2 attributes,
- * referenced in the clauses compatible with multivariate statistics.
- * The current selection criteria is very simple - we choose the
- * statistics referencing the most attributes.
+ * Selects the best combination of multivariate statistics, in an exhaustive
+ * way, where 'best' means:
*
- * If there are multiple statistics referencing the same number of
- * columns (from the clauses), the one with less source columns
- * (as listed in the ADD STATISTICS when creating the statistics) wins.
- * Other wise the first one wins.
+ * (a) covering the most attributes (referenced by clauses)
+ * (b) using the least number of multivariate stats
+ * (c) using the most conditions to exploit dependency
*
- * This is a very simple criteria, and has several weaknesses:
+ * Don't call this directly but through choose_mv_statistics(), which does some
+ * additional tricks to minimize the runtime.
*
- * (a) does not consider the accuracy of the statistics
*
- * If there are two histograms built on the same set of columns,
- * but one has 100 buckets and the other one has 1000 buckets (thus
- * likely providing better estimates), this is not currently
- * considered.
+ * Algorithm
+ * ---------
+ * The algorithm is a recursive implementation of backtracking, with maximum
+ * depth equal to the number of multi-variate statistics available on the table.
+ * It actually explores all valid combinations of stats.
+ *
+ * Whenever it considers adding the next statistics, the clauses it matches
+ * are divided into 'conditions' (clauses already matched by at least one
+ * previously applied statistics) and clauses that are yet to be estimated.
*
- * (b) does not consider the type of statistics
+ * Then several checks are performed:
*
- * If there are three statistics - one containing just a MCV list,
- * another one with just a histogram and a third one with both,
- * this is not considered.
+ * (a) The statistics covers at least 2 columns, referenced in the estimated
+ * clauses (otherwise multi-variate stats are useless).
*
- * (c) does not consider the number of clauses
+ * (b) The statistics covers at least 1 new column, i.e. a column not referenced
+ * by the already used stats (and the new column has to be referenced by
+ * the clauses, of course). Otherwise the statistics would not add any new
+ * information.
*
- * As explained, only the number of referenced attributes counts,
- * so if there are multiple clauses on a single attribute, this
- * still counts as a single attribute.
+ * There are some other sanity checks (e.g. stats must not be used twice etc.).
*
- * (d) does not consider type of condition
*
- * Some clauses may work better with some statistics - for example
- * equality clauses probably work better with MCV lists than with
- * histograms. But IS [NOT] NULL conditions may often work better
- * with histograms (thanks to NULL-buckets).
+ * Weaknesses
+ * ----------
+ * The current implementation uses a rather simple optimality criterion, so it
+ * may not make the best choice when
*
- * So for example with five WHERE conditions
+ * (a) There may be multiple solutions with the same number of covered
+ * attributes and number of statistics (e.g. the same solution but with
+ * statistics in a different order). It's unclear which solution is the best
+ * one - in a sense all of them are equal.
*
- * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
+ * TODO It might be possible to compute estimate for each of those solutions,
+ * and then combine them to get the final estimate (e.g. by using average
+ * or median).
*
- * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be
- * selected as it references the most columns.
+ * (b) Does not consider that some types of stats are a better match for some
+ * types of clauses (e.g. MCV list is generally a better match for equality
+ * conditions than a histogram).
*
- * Once we have selected the multivariate statistics, we split the list
- * of clauses into two parts - conditions that are compatible with the
- * selected stats, and conditions are estimated using simple statistics.
+ * But maybe this is pointless - generally, each column is either a label
+ * (it's not important whether because of the data type or how it's used),
+ * or a value with ordering that makes sense. So either a MCV list is more
+ * appropriate (labels) or a histogram (values with orderings).
*
- * From the example above, conditions
+ * Not sure what to do with statistics on columns mixing both types of data
+ * (some columns would work best with MCVs, some with histograms). Maybe we
+ * could invent a new type of statistics combining MCV list and histogram
+ * (keeping a small histogram for each MCV item, and a separate histogram
+ * for values not on the MCV list).
*
- * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
+ * TODO The algorithm should probably count number of Vars (not just attnums)
+ * when computing the 'score' of each solution. Computing the ratio of
+ * (num of all vars) / (num of condition vars) as a measure of how well
+ * the solution uses conditions might be useful.
+ */
+static void
+choose_mv_statistics_exhaustive(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
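+
+ /*
+ * cover_map and condition_map are boolean matrices flattened into
+ * arrays - cover_map[i * nclauses + c] says whether statistics i
+ * covers clause c (condition_map is indexed the same way, with
+ * nconditions columns).
+ */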
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ /* this may run for a long time, so let's make it interruptible */
+ CHECK_FOR_INTERRUPTS();
+
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ /*
+ * Now try to apply each statistics, matching at least two attributes,
+ * unless it's already used in one of the previous steps.
+ */
+ for (i = 0; i < nmvstats; i++)
+ {
+ int c;
+
+ int ncovered_clauses = 0; /* number of covered clauses */
+ int ncovered_conditions = 0; /* number of covered conditions */
+ int nattnums = 0; /* number of covered attributes */
+
+ Bitmapset *all_attnums = NULL;
+ Bitmapset *new_attnums = NULL;
+
+ /* skip statistics that were already used or eliminated */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /*
+ * See if we have clauses covered by this statistics, but not
+ * yet covered by any of the preceding ones.
+ */
+ for (c = 0; c < nclauses; c++)
+ {
+ bool covered = false;
+ Bitmapset *clause_attnums = clauses_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this clause is not covered by this stats, we can't
+ * use the stats to estimate that at all.
+ */
+ if (! cover_map[i * nclauses + c])
+ continue;
+
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add the
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
+
+ /* let's see if it's covered by any of the previous stats */
+ for (j = 0; j < step; j++)
+ {
+ /* already covered by the previous stats */
+ if (cover_map[current->stats[j] * nclauses + c])
+ covered = true;
+
+ if (covered)
+ break;
+ }
+
+ /* if already covered, continue with the next clause */
+ if (covered)
+ {
+ ncovered_conditions += 1;
+ continue;
+ }
+
+ /*
+ * OK, this clause is covered by this statistics (and not by
+ * any of the previous ones)
+ */
+ ncovered_clauses += 1;
+
+ /* add the attnums into attnums from 'new clauses' */
+ // new_attnums = bms_union(new_attnums, clause_attnums);
+ }
+
+ /* can't have more new clauses than original clauses */
+ Assert(nclauses >= ncovered_clauses);
+ Assert(ncovered_clauses >= 0); /* mostly paranoia */
+
+ nattnums = bms_num_members(all_attnums);
+
+ /* free all the bitmapsets - we don't need them anymore */
+ bms_free(all_attnums);
+ bms_free(new_attnums);
+
+ all_attnums = NULL;
+ new_attnums = NULL;
+
+ /*
+ * See which of the conditions are covered by this statistics
+ * (each covered one counts towards the conditions reused here).
+ */
+ for (c = 0; c < nconditions; c++)
+ {
+ Bitmapset *clause_attnums = conditions_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this condition is not covered by this statistics, we can't
+ * use the statistics to estimate it at all.
+ */
+ if (! condition_map[i * nconditions + c])
+ continue;
+
+ /* count this as a condition */
+ ncovered_conditions += 1;
+
+ /*
+ * Now we know we'll use this condition, so let's add its
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
+ }
+
+ /*
+ * Let's mark the statistics as 'ruled out' - either we'll use
+ * it (and proceed to the next step), or it's incompatible.
+ */
+ ruled_out[i] = step;
+
+ /*
+ * There are no clauses usable with this statistics (not already
+ * covered by some of the previous stats).
+ *
+ * Similarly, if the clauses only use a single attribute, we
+ * can't really use that.
+ */
+ if ((ncovered_clauses == 0) || (nattnums < 2))
+ continue;
+
+ /*
+ * TODO Not sure if it's possible to add a clause referencing
+ * only attributes already covered by previous stats?
+ * Introducing only some new dependency, not a new
+ * attribute. Couldn't come up with an example, though.
+ * Might be worth adding some assert.
+ */
+
+ /*
+ * got a suitable statistics - let's update the current solution,
+ * maybe use it as the best solution
+ */
+ current->nclauses += ncovered_clauses;
+ current->nconditions += ncovered_conditions;
+ current->nstats += 1;
+ current->stats[step] = i;
+
+ /*
+ * We can never cover more clauses, or use more stats, than we
+ * actually have at the beginning.
+ */
+ Assert(nclauses >= current->nclauses);
+ Assert(nmvstats >= current->nstats);
+ Assert(step < nmvstats);
+
+ /* we can't get more conditions than clauses and conditions combined
+ *
+ * FIXME This assert does not work because we count the conditions
+ * repeatedly (once for each statistics covering it).
+ */
+ /* Assert((nconditions + nclauses) >= current->nconditions); */
+
+ if (*best == NULL)
+ {
+ *best = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ (*best)->nstats = 0;
+ (*best)->nclauses = 0;
+ (*best)->nconditions = 0;
+ }
+
+ /* see if it's better than the current 'best' solution */
+ if ((current->nclauses > (*best)->nclauses) ||
+ ((current->nclauses == (*best)->nclauses) &&
+ ((current->nstats > (*best)->nstats))))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+
+ /*
+ * The recursion only makes sense if we haven't covered all the
+ * attributes (then adding stats is not really possible).
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_exhaustive(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= ncovered_clauses;
+ current->nconditions -= ncovered_conditions;
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[i] = -1;
+
+ Assert(current->nclauses >= 0);
+ Assert(current->nstats >= 0);
+ }
+
+ /* reset all statistics as 'incompatible' in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+}
+
+/*
+ * Greedy search for a multivariate solution - a sequence of statistics covering
+ * the clauses. This chooses the "best" statistics at each step, so the
+ * resulting solution may not be the best solution globally, but this produces
+ * the solution in only N steps (where N is the number of statistics), while
+ * the exhaustive approach may have to walk through ~N! combinations (although
+ * some of those are terminated early).
+ *
+ * See the comments at choose_mv_statistics_exhaustive() as this does the same
+ * thing (but in a different way).
*
- * will be estimated using the multivariate statistics (a,b,c,d) while
- * the last condition (e = 1) will get estimated using the regular ones.
+ * Don't call this directly, but through choose_mv_statistics().
*
- * There are various alternative selection criteria (e.g. counting
- * conditions instead of just referenced attributes), but eventually
- * the best option should be to combine multiple statistics. But that's
- * much harder to do correctly.
+ * TODO There are probably other metrics we might use - e.g. using number of
+ * columns (num_cond_columns / num_cov_columns), which might work better
+ * with a mix of simple and complex clauses.
*
- * TODO Select multiple statistics and combine them when computing
- * the estimate.
+ * TODO Also the choice at the very first step should be handled in a special
+ * way, because there will be 0 conditions at that moment, so there needs
+ * to be some other criterion - e.g. using the simplest (or most complex?)
+ * clause might be a good idea.
*
- * TODO This will probably have to consider compatibility of clauses,
- * because 'dependencies' will probably work only with equality
- * clauses.
+ * TODO We might also select multiple stats using different criteria, and branch
+ * the search. This is however tricky, because if we choose k statistics at
+ * each step, we get k^N branches to walk through (with N steps). That's
+ * not really good with a large number of stats (yet better than exhaustive
+ * search).
*/
-static MVStatisticInfo *
-choose_mv_statistics(List *stats, Bitmapset *attnums)
+static void
+choose_mv_statistics_greedy(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
{
- int i;
- ListCell *lc;
+ int i, j;
+ int best_stat = -1;
+ double gain, max_gain = -1.0;
- MVStatisticInfo *choice = NULL;
+ /*
+ * Bitmap tracking which clauses are already covered (by the previous
+ * statistics) and may thus serve only as a condition in this step.
+ */
+ bool *covered_clauses = (bool*)palloc0(nclauses * sizeof(bool));
- int current_matches = 1; /* goal #1: maximize */
- int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+ /*
+ * Number of clauses and columns covered by each statistics - this
+ * includes both conditions and clauses covered by the statistics for
+ * the first time. The number of columns may count some columns
+ * repeatedly - if a column is shared by multiple clauses, it will
+ * be counted once for each clause (covered by the statistics).
+ * So with two clauses [(a=1 OR b=2),(a<2 OR c>1)] the column "a"
+ * will be counted twice (if both clauses are covered).
+ *
+ * The values for ruled-out statistics (that can't be applied) are
+ * not computed, because that'd be pointless.
+ */
+ int *num_cov_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cov_columns = (int*)palloc0(sizeof(int) * nmvstats);
/*
- * Walk through the statistics (simple array with nmvstats elements)
- * and for each one count the referenced attributes (encoded in
- * the 'attnums' bitmap).
+ * Same as above, but this only includes clauses that are already
+ * covered by the previous stats (and the current one).
*/
- foreach (lc, stats)
+ int *num_cond_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cond_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Number of attributes for each clause.
+ *
+ * TODO Might be computed in choose_mv_statistics() and then passed
+ * here, but then the function would not have the same signature
+ * as _exhaustive().
+ */
+ int *attnum_counts = (int*)palloc0(sizeof(int) * nclauses);
+ int *attnum_cond_counts = (int*)palloc0(sizeof(int) * nconditions);
+
+ CHECK_FOR_INTERRUPTS();
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ /* compute attributes (columns) for each clause */
+ for (i = 0; i < nclauses; i++)
+ attnum_counts[i] = bms_num_members(clauses_attnums[i]);
+
+ /* compute attributes (columns) for each condition */
+ for (i = 0; i < nconditions; i++)
+ attnum_cond_counts[i] = bms_num_members(conditions_attnums[i]);
+
+ /* see which clauses are already covered at this point (by previous stats) */
+ for (i = 0; i < step; i++)
+ for (j = 0; j < nclauses; j++)
+ covered_clauses[j] |= (cover_map[current->stats[i] * nclauses + j]);
+
+ /* which remaining statistics covers most clauses / uses most conditions? */
+ for (i = 0; i < nmvstats; i++)
{
- MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+ Bitmapset *attnums_covered = NULL;
+ Bitmapset *attnums_conditions = NULL;
+
+ /* skip stats that are already ruled out (either used or inapplicable) */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* count covered clauses and conditions (for the statistics) */
+ for (j = 0; j < nclauses; j++)
+ {
+ if (cover_map[i * nclauses + j])
+ {
+ Bitmapset *attnums_new
+ = bms_union(attnums_covered, clauses_attnums[j]);
- /* columns matching this statistics */
- int matches = 0;
+ /* get rid of the old bitmap and keep the unified result */
+ bms_free(attnums_covered);
+ attnums_covered = attnums_new;
- int2vector * attrs = info->stakeys;
- int numattrs = attrs->dim1;
+ num_cov_clauses[i] += 1;
+ num_cov_columns[i] += attnum_counts[j];
- /* skip dependencies-only stats */
- if (! (info->mcv_built || info->hist_built))
+ /* is the clause already covered (i.e. a condition)? */
+ if (covered_clauses[j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_counts[j];
+ attnums_new = bms_union(attnums_conditions,
+ clauses_attnums[j]);
+
+ bms_free(attnums_conditions);
+ attnums_conditions = attnums_new;
+ }
+ }
+ }
+
+ /* if all covered clauses are covered by prev stats (thus conditions) */
+ if (num_cov_clauses[i] == num_cond_clauses[i])
+ ruled_out[i] = step;
+
+ /* same if there are no new attributes */
+ else if (bms_num_members(attnums_conditions) == bms_num_members(attnums_covered))
+ ruled_out[i] = step;
+
+ bms_free(attnums_covered);
+ bms_free(attnums_conditions);
+
+ /* if the statistics is inapplicable, try the next one */
+ if (ruled_out[i] != -1)
continue;
- /* count columns covered by the histogram */
- for (i = 0; i < numattrs; i++)
- if (bms_is_member(attrs->values[i], attnums))
- matches++;
+ /* now let's walk through conditions and count the covered */
+ for (j = 0; j < nconditions; j++)
+ {
+ if (condition_map[i * nconditions + j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_cond_counts[j];
+ }
+ }
+
+ /*
+ * See if this statistics improves the metric we optimize: the ratio
+ * of condition columns (from clauses covered earlier and from the
+ * explicit conditions) to the columns of all clauses covered by
+ * this statistics - i.e. prefer stats reusing many conditions.
+ */
+ gain = num_cond_columns[i] / (double)num_cov_columns[i];
- /*
- * Use this statistics when it improves the number of matches or
- * when it matches the same number of attributes but is smaller.
- */
- if ((matches > current_matches) ||
- ((matches == current_matches) && (current_dims > numattrs)))
+ if (gain > max_gain)
{
- choice = info;
- current_matches = matches;
- current_dims = numattrs;
+ max_gain = gain;
+ best_stat = i;
}
}
- return choice;
-}
+ /*
+ * Have we found a suitable statistics? Add it to the solution and
+ * try next step.
+ */
+ if (best_stat != -1)
+ {
+ /* mark the statistics, so that we skip it in next steps */
+ ruled_out[best_stat] = step;
+ /* allocate current solution if necessary */
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ current->nclauses += num_cov_clauses[best_stat];
+ current->nconditions += num_cond_clauses[best_stat];
+ current->stats[step] = best_stat;
+ current->nstats++;
+
+ if (*best == NULL)
+ {
+ (*best) = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ else
+ {
+ /* see if this is a better solution */
+ double current_gain = (double)current->nconditions / current->nclauses;
+ double best_gain = (double)(*best)->nconditions / (*best)->nclauses;
+
+ if ((current_gain > best_gain) ||
+ ((current_gain == best_gain) && (current->nstats < (*best)->nstats)))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ }
+
+ /*
+ * The recursion only makes sense if we haven't covered all the
+ * attributes (then adding stats is not really possible).
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_greedy(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= num_cov_clauses[best_stat];
+ current->nconditions -= num_cond_clauses[best_stat];
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[best_stat] = -1;
+ }
+
+ /* reset all statistics eliminated in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+ /* free everything allocated in this step */
+ pfree(covered_clauses);
+ pfree(attnum_counts);
+ pfree(attnum_cond_counts);
+ pfree(num_cov_clauses);
+ pfree(num_cov_columns);
+ pfree(num_cond_clauses);
+ pfree(num_cond_columns);
+}
/*
- * This splits the clauses list into two parts - one containing clauses
- * that will be evaluated using the chosen statistics, and the remaining
- * clauses (either non-mvcompatible, or not related to the histogram).
+ * Chooses the combination of statistics optimal for estimating a particular
+ * clause list.
+ *
+ * This only handles a 'preparation' shared by the exhaustive and greedy
+ * implementations (see the previous methods), mostly trying to reduce the size
+ * of the problem (eliminate clauses/statistics that can't be really used in
+ * the solution).
+ *
+ * It also precomputes bitmaps for attributes covered by clauses and statistics,
+ * so that we don't need to do that over and over in the actual optimizations
+ * (as it's both CPU and memory intensive).
+ *
+ * TODO This will probably have to consider compatibility of clauses, because
+ * 'dependencies' will probably work only with equality clauses.
+ *
+ * TODO Another way to make the optimization problems smaller might be splitting
+ * the statistics into several disjoint subsets, i.e. if we can split the
+ * graph of statistics (after the elimination) into multiple components
+ * (so that stats in different components share no attributes), we can do
+ * the optimization for each component separately.
+ *
+ * TODO If we could compute what is a "perfect solution" maybe we could
+ * terminate the search after reaching ~90% of it? Say, if we knew that we
+ * can cover 10 clauses and reuse 8 dependencies, maybe covering 9 clauses
+ * and 7 dependencies would be OK?
*/
-static List *
-clauselist_mv_split(PlannerInfo *root, SpecialJoinInfo *sjinfo,
- List *clauses, Oid varRelid, List **mvclauses,
- MVStatisticInfo *mvstats, int types)
+static List*
+choose_mv_statistics(PlannerInfo *root, List *stats,
+ List *clauses, List *conditions,
+ Oid varRelid, SpecialJoinInfo *sjinfo)
{
int i;
- ListCell *l;
- List *non_mvclauses = NIL;
+ mv_solution_t *best = NULL;
+ List *result = NIL;
+
+ int nmvstats;
+ MVStatisticInfo *mvstats;
+
+ /* we only work with MCV lists and histograms here */
+ int type = (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+
+ bool *clause_cover_map = NULL,
+ *condition_cover_map = NULL;
+ int *ruled_out = NULL;
+
+ /* build bitmapsets for all stats and clauses */
+ Bitmapset **stats_attnums;
+ Bitmapset **clauses_attnums;
+ Bitmapset **conditions_attnums;
+
+ int nclauses, nconditions;
+ Node ** clauses_array;
+ Node ** conditions_array;
+
+ /* copy lists, so that we can free them during elimination easily */
+ clauses = list_copy(clauses);
+ conditions = list_copy(conditions);
+ stats = list_copy(stats);
+
+ /*
+ * Reduce the optimization problem size as much as possible.
+ *
+ * Eliminate clauses and conditions not covered by any statistics,
+ * or statistics not matching at least two attributes (one of them
+ * has to be in a regular clause).
+ *
+ * It's possible that removing a statistics in one iteration
+ * eliminates a clause in the next one, so we'll repeat this until an
+ * iteration eliminates no clauses/stats.
+ *
+ * This can only happen after eliminating a statistics - clauses are
+ * eliminated first, so statistics always reflect that.
+ */
+ while (true)
+ {
+ List *tmp;
+
+ Bitmapset *compatible_attnums = NULL;
+ Bitmapset *condition_attnums = NULL;
+ Bitmapset *all_attnums = NULL;
+
+ /*
+ * Clauses
+ *
+ * Walk through clauses and keep only those covered by at least
+ * one of the statistics we still have. We'll also keep info
+ * about attnums in clauses (without conditions) so that we can
+ * ignore stats covering just conditions (which is pointless).
+ */
+ tmp = filter_clauses(root, varRelid, sjinfo, type,
+ stats, clauses, &compatible_attnums);
+
+ /* discard the original list */
+ list_free(clauses);
+ clauses = tmp;
+
+ /*
+ * Conditions
+ *
+ * Walk through conditions and keep only those covered by at least
+ * one of the statistics we still have. Also, collect bitmap of
+ * attributes so that we can make sure we add at least one new
+ * attribute (by comparing with clauses).
+ */
+ if (conditions != NIL)
+ {
+ tmp = filter_clauses(root, varRelid, sjinfo, type,
+ stats, conditions, &condition_attnums);
+
+ /* discard the original list */
+ list_free(conditions);
+ conditions = tmp;
+ }
+
+ /* get a union of attnums (from conditions and new clauses) */
+ all_attnums = bms_union(compatible_attnums, condition_attnums);
+
+ /*
+ * Statistics
+ *
+ * Walk through statistics and only keep those covering at least
+ * one new attribute (excluding conditions) and at least two attributes
+ * in both clauses and conditions.
+ */
+ tmp = filter_stats(stats, compatible_attnums, all_attnums);
- /* FIXME is there a better way to get info on int2vector? */
- int2vector * attrs = mvstats->stakeys;
- int numattrs = mvstats->stakeys->dim1;
+ /* if we've not eliminated anything, terminate */
+ if (list_length(stats) == list_length(tmp))
+ break;
- Bitmapset *mvattnums = NULL;
+ /* work only with filtered statistics from now */
+ list_free(stats);
+ stats = tmp;
+ }
- /* build bitmap of attributes covered by the stats, so we can
- * do bms_is_subset later */
- for (i = 0; i < numattrs; i++)
- mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+ /* only do the optimization if we have clauses/statistics */
+ if ((list_length(stats) == 0) || (list_length(clauses) == 0))
+ return NULL;
- /* erase the list of mv-compatible clauses */
- *mvclauses = NIL;
+ /* remove redundant stats (stats covered by another stats) */
+ stats = filter_redundant_stats(stats, clauses, conditions);
- foreach (l, clauses)
- {
- bool match = false; /* by default not mv-compatible */
- Bitmapset *attnums = NULL;
- Node *clause = (Node *) lfirst(l);
+ /*
+ * TODO We should sort the stats to make the order deterministic,
+ * otherwise we may get different estimates on different
+ * executions - if there are multiple "equally good" solutions,
+ * we'll keep the first solution we see.
+ *
+ * Sorting by OID probably is not the right solution though,
+ * because we'd like it to be somehow reproducible,
+ * irrespective of the order of ADD STATISTICS commands.
+ * So maybe statkeys?
+ */
+ mvstats = make_stats_array(stats, &nmvstats);
+ stats_attnums = make_stats_attnums(mvstats, nmvstats);
- if (clause_is_mv_compatible(clause, varRelid, NULL,
- &attnums, sjinfo, types))
- {
- /* are all the attributes part of the selected stats? */
- if (bms_is_subset(attnums, mvattnums))
- match = true;
- }
+ /* collect clauses and bitmaps of attnums */
+ clauses_array = make_clauses_array(clauses, &nclauses);
+ clauses_attnums = make_clauses_attnums(root, varRelid, sjinfo, type,
+ clauses_array, nclauses);
- /*
- * The clause matches the selected stats, so put it to the list
- * of mv-compatible clauses. Otherwise, keep it in the list of
- * 'regular' clauses (that may be selected later).
- */
- if (match)
- *mvclauses = lappend(*mvclauses, clause);
- else
- non_mvclauses = lappend(non_mvclauses, clause);
- }
+ /* collect conditions and bitmaps of attnums */
+ conditions_array = make_clauses_array(conditions, &nconditions);
+ conditions_attnums = make_clauses_attnums(root, varRelid, sjinfo, type,
+ conditions_array, nconditions);
/*
- * Perform regular estimation using the clauses incompatible
- * with the chosen histogram (or MV stats in general).
+ * Build bitmaps with info about which clauses/conditions are
+ * covered by each statistics (so that we don't need to call the
+ * bms_is_subset over and over again).
*/
- return non_mvclauses;
+ clause_cover_map = make_cover_map(stats_attnums, nmvstats,
+ clauses_attnums, nclauses);
+
+ condition_cover_map = make_cover_map(stats_attnums, nmvstats,
+ conditions_attnums, nconditions);
+
+ ruled_out = (int*)palloc0(nmvstats * sizeof(int));
+
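+ /*
+ * ruled_out[i] stays -1 while statistics i is still available; the search
+ * functions set it to the step at which the statistics got used or
+ * eliminated, and reset it back to -1 when backtracking.
+ */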
+ /* no stats are ruled out by default */
+ for (i = 0; i < nmvstats; i++)
+ ruled_out[i] = -1;
+
+ /* do the optimization itself */
+ if (mvstat_search_type == MVSTAT_SEARCH_EXHAUSTIVE)
+ choose_mv_statistics_exhaustive(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+ else
+ choose_mv_statistics_greedy(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+
+ /* create a list of statistics from the array */
+ if (best != NULL)
+ {
+ for (i = 0; i < best->nstats; i++)
+ {
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[best->stats[i]], sizeof(MVStatisticInfo));
+ result = lappend(result, info);
+ }
+ pfree(best);
+ }
+
+ /* cleanup (maybe leave it up to the memory context?) */
+ for (i = 0; i < nmvstats; i++)
+ bms_free(stats_attnums[i]);
+
+ for (i = 0; i < nclauses; i++)
+ bms_free(clauses_attnums[i]);
+
+ for (i = 0; i < nconditions; i++)
+ bms_free(conditions_attnums[i]);
+
+ pfree(stats_attnums);
+ pfree(clauses_attnums);
+ pfree(conditions_attnums);
+
+ pfree(clauses_array);
+ pfree(conditions_array);
+ pfree(clause_cover_map);
+ pfree(condition_cover_map);
+ pfree(ruled_out);
+ pfree(mvstats);
+ list_free(clauses);
+ list_free(conditions);
+ list_free(stats);
+
+ return result;
}
/*
@@ -1421,10 +2290,10 @@ clause_is_mv_compatible(Node *clause, Oid varRelid,
return true;
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/*
- * AND/OR-clauses are supported if all sub-clauses are supported
+ * AND/OR/NOT-clauses are supported if all sub-clauses are supported
*
* TODO We might support mixed case, where some of the clauses
* are supported and some are not, and treat all supported
@@ -1434,7 +2303,10 @@ clause_is_mv_compatible(Node *clause, Oid varRelid,
*
* TODO For RestrictInfo above an OR-clause, we might use the
* orclause with nested RestrictInfo - we won't have to
- * call pull_varnos() for each clause, saving time.
+ * call pull_varnos() for each clause, saving time.
+ *
+ * TODO Perhaps this needs a bit more thought for functional
+ * dependencies? Those don't quite work for NOT cases.
*/
Bitmapset *tmp = NULL;
ListCell *l;
@@ -1454,6 +2326,7 @@ clause_is_mv_compatible(Node *clause, Oid varRelid,
return false;
}
+
/*
* reduce list of equality clauses using soft functional dependencies
*
@@ -2079,22 +2952,26 @@ get_varattnos(Node * node, Index relid)
* as the clauses are processed (and skip items that are 'match').
*/
static Selectivity
-clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats, bool *fullmatch,
- Selectivity *lowsel)
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or,
+ bool *fullmatch, Selectivity *lowsel)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
MCVList mcvlist = NULL;
+
int nmatches = 0;
+ int nconditions = 0;
/* match/mismatch bitmap for each MCV item */
char * matches = NULL;
+ char * condition_matches = NULL;
Assert(clauses != NIL);
- Assert(list_length(clauses) >= 2);
+ Assert(list_length(clauses) >= 1);
/* there's no MCV list built yet */
if (! mvstats->mcv_built)
@@ -2105,32 +2982,85 @@ clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
Assert(mcvlist != NULL);
Assert(mcvlist->nitems > 0);
- /* by default all the MCV items match the clauses fully */
- matches = palloc0(sizeof(char) * mcvlist->nitems);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
-
/* number of matching MCV items */
nmatches = mcvlist->nitems;
+ nconditions = mcvlist->nitems;
+
+ /*
+ * Bitmap of MCV item matches (mismatch, partial, full).
+ *
+ * For AND clauses all items match initially (and we'll eliminate them).
+ * For OR clauses no items match initially (and we'll add them).
+ *
+ * We only need to do the memset for AND clauses (for OR clauses
+ * it's already set correctly by the palloc0).
+ */
+ matches = palloc0(sizeof(char) * nmatches);
+
+ if (! is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+
+ /* Conditions are treated as an AND clause, so they match by default. */
+ condition_matches = palloc0(sizeof(char) * nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+
+ /*
+ * build the match bitmap for the conditions (conditions are always
+ * connected by AND)
+ */
+ if (conditions != NIL)
+ nconditions = update_match_bitmap_mcvlist(root, conditions,
+ mvstats->stakeys, mcvlist,
+ nconditions, condition_matches,
+ lowsel, fullmatch, false);
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all MCV items, even those
+ * ruled out by the conditions. The final result should be the
+ * same, but it might be faster.
+ */
nmatches = update_match_bitmap_mcvlist(root, clauses,
mvstats->stakeys, mcvlist,
- nmatches, matches,
- lowsel, fullmatch, false);
+ ((is_or) ? 0 : nmatches), matches,
+ lowsel, fullmatch, is_or);
/* sum frequencies for all the matching MCV items */
for (i = 0; i < mcvlist->nitems; i++)
{
- /* used to 'scale' for MCV lists not covering all tuples */
+ /*
+ * Find out what part of the data is covered by the MCV list,
+ * so that we can 'scale' the selectivity properly (e.g. when
+ * only 50% of the sample items got into the MCV, and the rest
+ * is either in a histogram, or not covered by stats).
+ *
+ * TODO This might be handled by keeping a global "frequency"
+ * for the whole list, which might save us a bit of time
+ * spent on accessing the not-matching part of the MCV list.
+ * Although it's likely in a cache, so it's very fast.
+ */
u += mcvlist->items[i]->frequency;
+ /* skip MCV items not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
if (matches[i] != MVSTATS_MATCH_NONE)
s += mcvlist->items[i]->frequency;
+
+ t += mcvlist->items[i]->frequency;
}
pfree(matches);
+ pfree(condition_matches);
pfree(mcvlist);
- return s*u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
+ return (s / t) * u;
}
/*
@@ -2423,64 +3353,57 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
}
}
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/* AND/OR clause, with all clauses compatible with the selected MV stat */
int i;
- BoolExpr *orclause = ((BoolExpr*)clause);
- List *orclauses = orclause->args;
+ List *tmp_clauses = ((BoolExpr*)clause)->args;
/* match/mismatch bitmap for each MCV item */
- int or_nmatches = 0;
- char * or_matches = NULL;
+ int tmp_nmatches = 0;
+ char * tmp_matches = NULL;
- Assert(orclauses != NIL);
- Assert(list_length(orclauses) >= 2);
+ Assert(tmp_clauses != NIL);
+ Assert((list_length(tmp_clauses) >= 2) || (not_clause(clause) && (list_length(tmp_clauses)==1)));
/* number of matching MCV items */
- or_nmatches = mcvlist->nitems;
+ tmp_nmatches = (or_clause(clause)) ? 0 : mcvlist->nitems;
/* by default none of the MCV items matches the clauses */
- or_matches = palloc0(sizeof(char) * or_nmatches);
+ tmp_matches = palloc0(sizeof(char) * mcvlist->nitems);
- if (or_clause(clause))
- {
- /* OR clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
- or_nmatches = 0;
- }
- else
- {
- /* AND clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
- }
+ /* AND (and NOT) clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
/* build the match bitmap for the OR-clauses */
- or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ tmp_nmatches = update_match_bitmap_mcvlist(root, tmp_clauses,
stakeys, mcvlist,
- or_nmatches, or_matches,
+ tmp_nmatches, tmp_matches,
lowsel, fullmatch, or_clause(clause));
/* merge the bitmap into the existing one*/
for (i = 0; i < mcvlist->nitems; i++)
{
+ /* if this is a NOT clause, we need to invert the results first */
+ if (not_clause(clause))
+ tmp_matches[i] = (MVSTATS_MATCH_FULL - tmp_matches[i]);
+
/*
* To AND-merge the bitmaps, a MIN() semantics is used.
* For OR-merge, use MAX().
*
* FIXME this does not decrease the number of matches
*/
- UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
}
- pfree(or_matches);
+ pfree(tmp_matches);
}
else
- {
elog(ERROR, "unknown clause type: %d", clause->type);
- }
}
/*
@@ -2538,15 +3461,18 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
* this is not uncommon, but for histograms it's not that clear.
*/
static Selectivity
-clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats)
+clauselist_mv_selectivity_histogram(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
int nmatches = 0;
+ int nconditions = 0;
char *matches = NULL;
+ char *condition_matches = NULL;
MVSerializedHistogram mvhist = NULL;
@@ -2559,25 +3485,55 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
Assert (mvhist != NULL);
Assert (clauses != NIL);
- Assert (list_length(clauses) >= 2);
+ Assert (list_length(clauses) >= 1);
+
+ nmatches = mvhist->nbuckets;
+ nconditions = mvhist->nbuckets;
/*
- * Bitmap of bucket matches (mismatch, partial, full). by default
- * all buckets fully match (and we'll eliminate them).
+ * Bitmap of bucket matches (mismatch, partial, full).
+ *
+ * For AND clauses all buckets match (and we'll eliminate them).
+ * For OR clauses no buckets match (and we'll add them).
+ *
+ * We only need to do the memset for AND clauses (for OR clauses
+ * it's already set correctly by the palloc0).
*/
- matches = palloc0(sizeof(char) * mvhist->nbuckets);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+ matches = palloc0(sizeof(char) * nmatches);
- nmatches = mvhist->nbuckets;
+ if (! is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+
+ /* Conditions are treated as an AND clause, so they match by default. */
+ condition_matches = palloc0(sizeof(char)*nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+
+ /*
+ * build the match bitmap for the conditions (conditions are always
+ * connected by AND)
+ */
+ if (conditions != NIL)
+ update_match_bitmap_histogram(root, conditions,
+ mvstats->stakeys, mvhist,
+ nconditions, condition_matches, false);
- /* build the match bitmap */
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all buckets, even those
+ * ruled out by the conditions. The final result should be
+ * the same, but it might be faster.
+ */
update_match_bitmap_histogram(root, clauses,
mvstats->stakeys, mvhist,
- nmatches, matches, false);
+ ((is_or) ? 0 : nmatches), matches,
+ is_or);
/* now, walk through the buckets and sum the selectivities */
for (i = 0; i < mvhist->nbuckets; i++)
{
+ float coeff = 1.0;
+
/*
* Find out what part of the data is covered by the histogram,
* so that we can 'scale' the selectivity properly (e.g. when
@@ -2591,10 +3547,23 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
*/
u += mvhist->buckets[i]->ntuples;
+ /* skip buckets not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+ else if (condition_matches[i] == MVSTATS_MATCH_PARTIAL)
+ coeff = 0.5;
+
+ t += coeff * mvhist->buckets[i]->ntuples;
+
if (matches[i] == MVSTATS_MATCH_FULL)
- s += mvhist->buckets[i]->ntuples;
+ s += coeff * mvhist->buckets[i]->ntuples;
else if (matches[i] == MVSTATS_MATCH_PARTIAL)
- s += 0.5 * mvhist->buckets[i]->ntuples;
+ /*
+ * TODO If both conditions and clauses match partially, this
+ * will use a 0.25 match - not sure if that's the right
+ * solution, but it seems about right.
+ */
+ s += coeff * 0.5 * mvhist->buckets[i]->ntuples;
}
#ifdef DEBUG_MVHIST
@@ -2603,9 +3572,14 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
/* release the allocated bitmap and deserialized histogram */
pfree(matches);
+ pfree(condition_matches);
pfree(mvhist);
- return s * u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
+ return (s / t) * u;
}
#define HIST_CACHE_NOT_FOUND 0x00
@@ -2652,7 +3626,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
{
int i;
ListCell * l;
-
+
/*
* Used for caching function calls, only once per deduplicated value.
*
@@ -2695,7 +3669,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
FmgrInfo opproc; /* operator */
fmgr_info(get_opcode(expr->opno), &opproc);
-
+
/* reset the cache (per clause) */
memset(callcache, 0, mvhist->nbuckets);
@@ -2876,64 +3850,57 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
}
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/* AND/OR clause, with all clauses compatible with the selected MV stat */
int i;
- BoolExpr *orclause = ((BoolExpr*)clause);
- List *orclauses = orclause->args;
+ List *tmp_clauses = ((BoolExpr*)clause)->args;
/* match/mismatch bitmap for each bucket */
- int or_nmatches = 0;
- char * or_matches = NULL;
+ int tmp_nmatches = 0;
+ char * tmp_matches = NULL;
- Assert(orclauses != NIL);
- Assert(list_length(orclauses) >= 2);
+ Assert(tmp_clauses != NIL);
+ Assert((list_length(tmp_clauses) >= 2) || (not_clause(clause) && (list_length(tmp_clauses)==1)));
/* number of matching buckets */
- or_nmatches = mvhist->nbuckets;
+ tmp_nmatches = (or_clause(clause)) ? 0 : mvhist->nbuckets;
- /* by default none of the buckets matches the clauses */
- or_matches = palloc0(sizeof(char) * or_nmatches);
+ /* by default none of the buckets matches the clauses (OR clause) */
+ tmp_matches = palloc0(sizeof(char) * mvhist->nbuckets);
- if (or_clause(clause))
- {
- /* OR clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
- or_nmatches = 0;
- }
- else
- {
- /* AND clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
- }
+ /* but AND (and NOT) clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
/* build the match bitmap for the OR-clauses */
- or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ tmp_nmatches = update_match_bitmap_histogram(root, tmp_clauses,
stakeys, mvhist,
- or_nmatches, or_matches, or_clause(clause));
+ tmp_nmatches, tmp_matches, or_clause(clause));
/* merge the bitmap into the existing one*/
for (i = 0; i < mvhist->nbuckets; i++)
{
+ /* if this is a NOT clause, we need to invert the results first */
+ if (not_clause(clause))
+ tmp_matches[i] = (MVSTATS_MATCH_FULL - tmp_matches[i]);
+
/*
* To AND-merge the bitmaps, a MIN() semantics is used.
* For OR-merge, use MAX().
*
* FIXME this does not decrease the number of matches
*/
- UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
}
- pfree(or_matches);
-
+ pfree(tmp_matches);
}
else
elog(ERROR, "unknown clause type: %d", clause->type);
}
- /* free the call cache */
pfree(callcache);
return nmatches;
@@ -3049,3 +4016,363 @@ bucket_is_smaller_than_value(FmgrInfo opproc, Datum constvalue,
else
return ( a) ? MVSTATS_MATCH_FULL : MVSTATS_MATCH_NONE;
}
+
+/*
+ * Walk through clauses and keep only those covered by at least
+ * one of the statistics.
+ */
+static List *
+filter_clauses(PlannerInfo *root, Oid varRelid, SpecialJoinInfo *sjinfo,
+ int type, List *stats, List *clauses, Bitmapset **attnums)
+{
+ ListCell *c;
+ ListCell *s;
+
+ /* results (list of compatible clauses, attnums) */
+ List *rclauses = NIL;
+
+ foreach (c, clauses)
+ {
+ Node *clause = (Node*)lfirst(c);
+ Bitmapset *clause_attnums = NULL;
+ Index relid;
+
+ /*
+ * The clause has to be mv-compatible (suitable operators etc.).
+ */
+ if (! clause_is_mv_compatible(clause, varRelid,
+ &relid, &clause_attnums, sjinfo, type))
+ elog(ERROR, "should not get non-mv-compatible clause");
+
+ /* is there a statistics covering this clause? */
+ foreach (s, stats)
+ {
+ int k, matches = 0;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ for (k = 0; k < stat->stakeys->dim1; k++)
+ {
+ if (bms_is_member(stat->stakeys->values[k],
+ clause_attnums))
+ matches += 1;
+ }
+
+ /*
+ * The clause is compatible if all attributes it references
+ * are covered by the statistics.
+ */
+ if (bms_num_members(clause_attnums) == matches)
+ {
+ *attnums = bms_union(*attnums, clause_attnums);
+ rclauses = lappend(rclauses, clause);
+ break;
+ }
+ }
+
+ bms_free(clause_attnums);
+ }
+
+ /* we can't end up with more compatible clauses than we started with */
+ Assert(list_length(clauses) >= list_length(rclauses));
+
+ return rclauses;
+}
+
+
+/*
+ * Walk through statistics and only keep those covering at least
+ * one new attribute (excluding conditions) and at least two attributes
+ * in both clauses and conditions.
+ *
+ * This check might be made more strict by checking against individual
+ * clauses, because by using the bitmapsets of all attnums we may
+ * actually use attnums from clauses that are not covered by the
+ * statistics. For example, we may have a condition
+ *
+ * (a=1 AND b=2)
+ *
+ * and a new clause
+ *
+ * (c=1 AND d=1)
+ *
+ * With only bitmapsets, statistics on [b,c] will pass through this
+ * (assuming there are some statistics covering both clauses).
+ *
+ * TODO Do the more strict check.
+ */
+static List *
+filter_stats(List *stats, Bitmapset *new_attnums, Bitmapset *all_attnums)
+{
+ ListCell *s;
+ List *stats_filtered = NIL;
+
+ foreach (s, stats)
+ {
+ int k;
+ int matches_new = 0,
+ matches_all = 0;
+
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ /* see how many attributes the statistics covers */
+ for (k = 0; k < stat->stakeys->dim1; k++)
+ {
+ /* attributes from new clauses */
+ if (bms_is_member(stat->stakeys->values[k], new_attnums))
+ matches_new += 1;
+
+ /* attributes from conditions */
+ if (bms_is_member(stat->stakeys->values[k], all_attnums))
+ matches_all += 1;
+ }
+
+ /* check we have enough attributes for this statistics */
+ if ((matches_new >= 1) && (matches_all >= 2))
+ stats_filtered = lappend(stats_filtered, stat);
+ }
+
+ /* we can't have more useful stats than we had originally */
+ Assert(list_length(stats) >= list_length(stats_filtered));
+
+ return stats_filtered;
+}
+
+static MVStatisticInfo *
+make_stats_array(List *stats, int *nmvstats)
+{
+ int i;
+ ListCell *l;
+
+ MVStatisticInfo *mvstats = NULL;
+ *nmvstats = list_length(stats);
+
+ mvstats
+ = (MVStatisticInfo*)palloc0((*nmvstats) * sizeof(MVStatisticInfo));
+
+ i = 0;
+ foreach (l, stats)
+ {
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(l);
+ memcpy(&mvstats[i++], stat, sizeof(MVStatisticInfo));
+ }
+
+ return mvstats;
+}
+
+static Bitmapset **
+make_stats_attnums(MVStatisticInfo *mvstats, int nmvstats)
+{
+ int i, j;
+ Bitmapset **stats_attnums = NULL;
+
+ Assert(nmvstats > 0);
+
+ /* build bitmaps of attnums for the stats (easier to compare) */
+ stats_attnums = (Bitmapset **)palloc0(nmvstats * sizeof(Bitmapset*));
+
+ for (i = 0; i < nmvstats; i++)
+ for (j = 0; j < mvstats[i].stakeys->dim1; j++)
+ stats_attnums[i]
+ = bms_add_member(stats_attnums[i],
+ mvstats[i].stakeys->values[j]);
+
+ return stats_attnums;
+}
+
+
+/*
+ * Now let's remove redundant statistics, covering the same columns
+ * as some other stats, when restricted to the attributes from
+ * remaining clauses.
+ *
+ * If statistics S1 covers S2 (covers S2 attributes and possibly
+ * some more), we can probably remove S2. What actually matters are
+ * attributes from covered clauses (not all the attributes). This
+ * might however prefer larger, and thus less accurate, statistics.
+ *
+ * When a redundancy is detected, we simply keep the smaller
+ * statistics (fewer columns), on the assumption that it's
+ * more accurate and faster to process. That might be incorrect for
+ * two reasons - first, the accuracy really depends on number of
+ * buckets/MCV items, not the number of columns. Second, we might
+ * prefer MCV lists over histograms or something like that.
+ */
+static List*
+filter_redundant_stats(List *stats, List *clauses, List *conditions)
+{
+ int i, j, nmvstats;
+
+ MVStatisticInfo *mvstats;
+ bool *redundant;
+ Bitmapset **stats_attnums;
+ Bitmapset *varattnos;
+ Index relid;
+
+ Assert(list_length(stats) > 0);
+ Assert(list_length(clauses) > 0);
+
+ /*
+ * We'll convert the list of statistics into an array now, because
+ * the reduction of redundant statistics is easier to do that way
+ * (we can mark previous stats as redundant, etc.).
+ */
+ mvstats = make_stats_array(stats, &nmvstats);
+ stats_attnums = make_stats_attnums(mvstats, nmvstats);
+
+ /* by default, none of the stats is redundant (so palloc0) */
+ redundant = palloc0(nmvstats * sizeof(bool));
+
+ /*
+ * We only expect a single relid here, and also we should get the
+ * same relid from clauses and conditions (but we get it from
+ * clauses, because those are certainly non-empty).
+ */
+ relid = bms_singleton_member(pull_varnos((Node*)clauses));
+
+ /*
+ * Get the varattnos from both conditions and clauses.
+ *
+ * This skips system attributes, although that should be impossible
+ * thanks to previous filtering out of incompatible clauses.
+ *
+ * XXX Is that really true?
+ */
+ varattnos = bms_union(get_varattnos((Node*)clauses, relid),
+ get_varattnos((Node*)conditions, relid));
+
+ for (i = 1; i < nmvstats; i++)
+ {
+ /* intersect with current statistics */
+ Bitmapset *curr = bms_intersect(stats_attnums[i], varattnos);
+
+ /* walk through 'previous' stats and check redundancy */
+ for (j = 0; j < i; j++)
+ {
+ /* intersect with current statistics */
+ Bitmapset *prev;
+
+ /* skip stats already identified as redundant */
+ if (redundant[j])
+ continue;
+
+ prev = bms_intersect(stats_attnums[j], varattnos);
+
+ switch (bms_subset_compare(curr, prev))
+ {
+ case BMS_EQUAL:
+ /*
+ * Use the smaller one (hopefully more accurate).
+ * If both have the same size, use the first one.
+ */
+ if (mvstats[i].stakeys->dim1 >= mvstats[j].stakeys->dim1)
+ redundant[i] = TRUE;
+ else
+ redundant[j] = TRUE;
+
+ break;
+
+ case BMS_SUBSET1: /* curr is subset of prev */
+ redundant[i] = TRUE;
+ break;
+
+ case BMS_SUBSET2: /* prev is subset of curr */
+ redundant[j] = TRUE;
+ break;
+
+ case BMS_DIFFERENT:
+ /* do nothing - keep both stats */
+ break;
+ }
+
+ bms_free(prev);
+ }
+
+ bms_free(curr);
+ }
+
+ /* can't reduce all statistics (at least one has to remain) */
+ Assert(nmvstats > 0);
+
+ /* now, let's remove the reduced statistics from the arrays */
+ list_free(stats);
+ stats = NIL;
+
+ for (i = 0; i < nmvstats; i++)
+ {
+ MVStatisticInfo *info;
+
+ pfree(stats_attnums[i]);
+
+ if (redundant[i])
+ continue;
+
+ info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[i], sizeof(MVStatisticInfo));
+
+ stats = lappend(stats, info);
+ }
+
+ pfree(mvstats);
+ pfree(stats_attnums);
+ pfree(redundant);
+
+ return stats;
+}
+
+static Node**
+make_clauses_array(List *clauses, int *nclauses)
+{
+ int i;
+ ListCell *l;
+
+ Node** clauses_array;
+
+ *nclauses = list_length(clauses);
+ clauses_array = (Node **)palloc0((*nclauses) * sizeof(Node *));
+
+ i = 0;
+ foreach (l, clauses)
+ clauses_array[i++] = (Node *)lfirst(l);
+
+ *nclauses = i;
+
+ return clauses_array;
+}
+
+static Bitmapset **
+make_clauses_attnums(PlannerInfo *root, Oid varRelid, SpecialJoinInfo *sjinfo,
+ int type, Node **clauses, int nclauses)
+{
+ int i;
+ Index relid;
+ Bitmapset **clauses_attnums
+ = (Bitmapset **)palloc0(nclauses * sizeof(Bitmapset *));
+
+ for (i = 0; i < nclauses; i++)
+ {
+ Bitmapset * attnums = NULL;
+
+ if (! clause_is_mv_compatible(clauses[i], varRelid,
+ &relid, &attnums, sjinfo, type))
+ elog(ERROR, "should not get non-mv-compatible clause");
+
+ clauses_attnums[i] = attnums;
+ }
+
+ return clauses_attnums;
+}
+
+static bool*
+make_cover_map(Bitmapset **stats_attnums, int nmvstats,
+ Bitmapset **clauses_attnums, int nclauses)
+{
+ int i, j;
+ bool *cover_map = (bool*)palloc0(nclauses * nmvstats * sizeof(bool));
+
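+ /*
+ * Row-major layout - entry [i * nclauses + j] says whether statistics i
+ * covers all attributes referenced by clause j.
+ */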
+ for (i = 0; i < nmvstats; i++)
+ for (j = 0; j < nclauses; j++)
+ cover_map[i * nclauses + j]
+ = bms_is_subset(clauses_attnums[j], stats_attnums[i]);
+
+ return cover_map;
+}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5fc2f9c..7384cb8 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3520,7 +3520,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/*
* Also get the normal inner-join selectivity of the join clauses.
@@ -3543,7 +3544,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
JOIN_INNER,
- &norm_sjinfo);
+ &norm_sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
if (jointype == JOIN_ANTI)
@@ -3710,7 +3712,7 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
Node *qual = (Node *) lfirst(l);
/* Note that clause_selectivity will be able to cache its result */
- selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo);
+ selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo, NIL);
}
/* Apply it to the input relation sizes */
@@ -3746,7 +3748,8 @@ set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
rel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
rel->rows = clamp_row_est(nrows);
@@ -3783,7 +3786,8 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
allclauses,
rel->relid, /* do not use 0! */
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
/* For safety, make sure result is not more than the base estimate */
if (nrows > rel->rows)
@@ -3921,12 +3925,14 @@ calc_joinrel_size_estimate(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = clauselist_selectivity(root,
pushedquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
list_free(joinquals);
@@ -3938,7 +3944,8 @@ calc_joinrel_size_estimate(PlannerInfo *root,
restrictlist,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = 0.0; /* not used, keep compiler quiet */
}
diff --git a/src/backend/optimizer/util/orclauses.c b/src/backend/optimizer/util/orclauses.c
index ea831f5..6299e75 100644
--- a/src/backend/optimizer/util/orclauses.c
+++ b/src/backend/optimizer/util/orclauses.c
@@ -280,7 +280,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
* saving work later.)
*/
or_selec = clause_selectivity(root, (Node *) or_rinfo,
- 0, JOIN_INNER, NULL);
+ 0, JOIN_INNER, NULL, NIL);
/*
* The clause is only worth adding to the query if it rejects a useful
@@ -342,7 +342,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
/* Compute inner-join size */
orig_selec = clause_selectivity(root, (Node *) join_or_rinfo,
- 0, JOIN_INNER, &sjinfo);
+ 0, JOIN_INNER, &sjinfo, NIL);
/* And hack cached selectivity so join size remains the same */
join_or_rinfo->norm_selec = orig_selec / or_selec;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 46c95b0..7d0a3a1 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1627,13 +1627,15 @@ booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg,
case IS_NOT_FALSE:
selec = (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
case IS_FALSE:
case IS_NOT_TRUE:
selec = 1.0 - (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
default:
elog(ERROR, "unrecognized booltesttype: %d",
@@ -6259,7 +6261,8 @@ genericcostestimate(PlannerInfo *root,
indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/*
* If caller didn't give us an estimate, estimate the number of index
@@ -6579,7 +6582,8 @@ btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
btreeSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
numIndexTuples = btreeSelectivity * index->rel->tuples;
/*
@@ -7330,7 +7334,8 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/* fetch estimated page cost for tablespace containing index */
get_tablespace_page_costs(index->reltablespace,
@@ -7560,7 +7565,7 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexSelectivity =
clauselist_selectivity(root, indexQuals,
path->indexinfo->rel->relid,
- JOIN_INNER, NULL);
+ JOIN_INNER, NULL, NIL);
*indexCorrelation = 1;
/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index ea5a09a..27a8de5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -75,6 +75,7 @@
#include "utils/bytea.h"
#include "utils/guc_tables.h"
#include "utils/memutils.h"
+#include "utils/mvstats.h"
#include "utils/pg_locale.h"
#include "utils/plancache.h"
#include "utils/portal.h"
@@ -393,6 +394,15 @@ static const struct config_enum_entry force_parallel_mode_options[] = {
};
/*
+ * Search algorithm for multivariate stats.
+ */
+static const struct config_enum_entry mvstat_search_options[] = {
+ {"greedy", MVSTAT_SEARCH_GREEDY, false},
+ {"exhaustive", MVSTAT_SEARCH_EXHAUSTIVE, false},
+ {NULL, 0, false}
+};
+
+/*
* Options for enum values stored in other modules
*/
extern const struct config_enum_entry wal_level_options[];
@@ -3707,6 +3717,16 @@ static struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"mvstat_search", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Sets the algorithm used for combining multivariate stats."),
+ NULL
+ },
+ &mvstat_search_type,
+ MVSTAT_SEARCH_GREEDY, mvstat_search_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index 3e4f4d1..d404914 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -90,6 +90,137 @@ even attempting to do the more expensive estimation.
Whenever we find there are no suitable stats, we skip the expensive steps.
+Combining multiple statistics
+-----------------------------
+
+When estimating selectivity of a list of clauses, there may exist no statistics
+covering all of them. If there are multiple statistics, each covering some
+subset of the attributes, the optimizer needs to figure out which of those
+statistics to apply.
+
+When the statistics do not overlap, the solution is trivial - we can simply
+split the conditions into groups by the matching statistics, and then multiply the
+selectivities. For example assume multivariate statistics on (b,c) and (d,e),
+and a condition like this:
+
+ (a=1) AND (b=2) AND (c=3) AND (d=4) AND (e=5)
+
+Then (a=1) is not covered by any of the statistics, so it will be estimated
+using the regular per-column statistics. The clauses ((b=2) AND (c=3)) will be
+estimated using the (b,c) statistics, and ((d=4) AND (e=5)) will be estimated
+using the (d,e) statistics. The resulting selectivities are then multiplied:
+
+    P(a=1) * P(b=2 & c=3) * P(d=4 & e=5)
+
+Now, what if the statistics overlap? For example assume the same condition as
+above, but let's say we have statistics on (a,b,c) and (a,c,d,e). What then?
+
+As selectivity is just a probability that the condition holds for a random row,
+we can write the selectivity like this:
+
+ P(a=1 & b=2 & c=3 & d=4 & e=5)
+
+and we can rewrite it using conditional probability like this
+
+ P(a=1 & b=2 & c=3) * P(d=4 & e=5 | a=1 & b=2 & c=3)
+
+Notice that the first part already matches to (a,b,c) statistics. If we assume
+that columns that are not referenced by the same statistics are independent, we
+may rewrite the second half like this
+
+ P(d=4 & e=5 | a=1 & b=2 & c=3) = P(d=4 & e=5 | a=1 & c=3)
+
+which corresponds to the statistics on (a,c,d,e).
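+
+This conditioning is essentially what the (s / t) * u formula in
+clauselist_mv_selectivity_mcvlist() computes from a MCV list. A minimal sketch
+of the idea (the item_matches_* helpers are illustrative only, not functions
+from the patch):
+
+    Selectivity s = 0.0;    /* conditions AND estimated clauses match */
+    Selectivity t = 0.0;    /* conditions match */
+    Selectivity u = 0.0;    /* total frequency covered by the MCV list */
+
+    for (i = 0; i < mcvlist->nitems; i++)
+    {
+        u += mcvlist->items[i]->frequency;
+
+        if (! item_matches_conditions(mcvlist->items[i]))
+            continue;
+
+        t += mcvlist->items[i]->frequency;
+
+        if (item_matches_clauses(mcvlist->items[i]))
+            s += mcvlist->items[i]->frequency;
+    }
+
+    /* scale the conditional estimate (s / t) back to the covered part */
+    return (t > 0.0) ? (s / t) * u : (Selectivity) 0.0;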
+
+If there are multiple statistics defined on a table, it's not difficult to come
+up with examples where there are multiple ways to combine them to cover a list of
+clauses. We need a way to find the best combination of statistics.
+
+This is the purpose of choose_mv_statistics(). It searches through the possible
+combinations of statistics, looking for the combination that
+
+ (a) covers the most clauses of the list
+
+ (b) reuses the maximum number of clauses as conditions
+ (in conditional probabilities)
+
+While criterion (a) seems natural, (b) may seem a bit awkward at first. The
+idea is that conditions are a way of transferring information about
+dependencies between the statistics. In the example above, the clauses on
+(a) and (c) are shared by both statistics, and conditioning on them is what
+carries the dependency information from (a,b,c) over to (a,c,d,e).
+
+There are two alternative implementations of choose_mv_statistics() - greedy
+and exhaustive. Exhaustive actually searches through all possible combinations
+of statistics, and for larger numbers of statistics may get quite expensive
+(as it, unsurprisingly, has exponential cost). Greedy terminates in at most
+N steps (where N is the number of statistics), and in each step chooses the
+best next statistics. I've been unable to come up with an example where those two
+approaches would produce different combinations.
+
+It's possible to choose the algorithm using the mvstat_search GUC, with either
+'greedy' or 'exhaustive' values (default is 'greedy'):
+
+    SET mvstat_search = 'exhaustive';
+
+Note: This is meant mostly for experimentation. I do expect we'll choose one of
+the algorithms and remove the GUC before commit.
+
+
+Limitations of combining statistics
+-----------------------------------
+
+As described in the section 'Combining multiple statistics', the current approach
+is based on transferring information between statistics by means of conditional
+probabilities. This is a relatively cheap and efficient approach, but it is
+based on two assumptions:
+
+ (1) The overlap between the statistics needs to be sufficiently large, i.e.
+ there needs to be enough columns shared by the statistics to transfer
+ information about dependencies between the remaining columns.
+
+ (2) The query needs to include sufficient clauses on the shared columns.
+
+How a violation of those assumptions may be a problem can be illustrated by
+a simple example. Assume a table with three columns (a,b,c) containing exactly
+the same values, and statistics on (a,b) and (b,c):
+
+ CREATE TABLE test AS SELECT i AS a, i AS b, i AS c
+ FROM generate_series(1,1000) s(i);
+
+ CREATE STATISTICS s1 ON test (a,b) WITH (mcv);
+ CREATE STATISTICS s2 ON test (b,c) WITH (mcv);
+
+ ANALYZE test;
+
+First, let's estimate this query:
+
+ SELECT * FROM test WHERE (a < 10) AND (c < 10);
+
+Clearly, there are no conditions on 'b' (which is the only column shared by the
+two statistics), so we'll end up with an estimate based on the assumption of
+independence:
+
+ P(a < 10) * P(c < 10) = 0.01 * 0.01 = 0.0001
+
+This is a significant under-estimate, as the actual selectivity is 0.01.
+
+But let's estimate another query:
+
+ SELECT * FROM test WHERE (a < 10) AND (b < 500) AND (c < 10);
+
+In this case, the estimate may be computed for example like this:
+
+ P[(a < 10) & (b < 500) & (c < 10)]
+ = P[(a < 10) & (b < 500)] * P[(c < 10) | (a < 10) & (b < 500)]
+ = P[(a < 10) & (b < 500)] * P[(c < 10) | (b < 500)]
+
+The trouble is that P(c < 10 | b < 500) evaluates to 0.02 - we have assumed
+(a) and (c) are independent because there is no statistic containing both
+these columns, and the condition on (b) does not transfer sufficient
+information between the two statistics.
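+
+To spell out the arithmetic (remember the three columns contain exactly the
+same values, so (c < 10) implies (b < 500)):
+
+  P(c < 10 & b < 500) = P(c < 10) = 0.01
+  P(b < 500) = 0.5
+
+  P(c < 10 | b < 500) = 0.01 / 0.5 = 0.02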
+
+Currently, the only solution is to build statistics on all three columns, but
+see the 'Combining stats using convolution' section below for ideas on how to
+improve this.
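+
+A sketch of that workaround, continuing the example above:
+
+  DROP STATISTICS s1;
+  DROP STATISTICS s2;
+
+  CREATE STATISTICS s3 ON test (a,b,c) WITH (mcv);
+  ANALYZE test;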
+
+
Further (possibly crazy) ideas
------------------------------
@@ -111,3 +242,38 @@ But of course, this may result in expensive estimation (CPU-wise).
So we might add a GUC to choose between the simple (single statistics) and this
multi-statistic estimation, possibly as a table-level parameter (ALTER TABLE ...).
+
+
+Combining stats using convolution
+---------------------------------
+
+The current approach for combining statistics is based on conditional
+probabilities, and thus only works when the query includes conditions on the
+overlapping parts of the statistics. There may however be other ways to combine
+statistics, relaxing this requirement.
+
+Let's assume two histograms H1 and H2 - then combining them might work about
+like this:
+
+
+ for (buckets of H1, satisfying local conditions)
+ {
+ for (buckets of H2, overlapping with H1 bucket)
+ {
+ mark H2 bucket as 'valid'
+ }
+ }
+
+ s1 = s2 = 0.0
+ for (buckets of H2 marked as valid)
+ {
+ s1 += frequency
+
+ if (bucket satisfies local conditions)
+ s2 += frequency
+ }
+
+ s = (s2 / s1) /* final selectivity estimate */
+
+However this may quickly get non-trivial, e.g. when combining two statistics
+of different types (histogram vs. MCV).
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 78c7cae..a5ac088 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -191,11 +191,13 @@ extern Selectivity clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
extern Selectivity clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
#endif /* COST_H */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index f05a517..35b2f8e 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -17,6 +17,14 @@
#include "fmgr.h"
#include "commands/vacuum.h"
+typedef enum MVStatSearchType
+{
+ MVSTAT_SEARCH_EXHAUSTIVE, /* exhaustive search */
+ MVSTAT_SEARCH_GREEDY /* greedy search */
+} MVStatSearchType;
+
+extern int mvstat_search_type;
+
/*
* Degree of how much MCV item / histogram bucket matches a clause.
* This is then considered when computing the selectivity.
--
2.1.0
0007-multivariate-ndistinct-coefficients.patch
From ca8e799b8392541ef46c9427bef431175ae8f84e Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Wed, 23 Dec 2015 02:07:58 +0100
Subject: [PATCH 7/9] multivariate ndistinct coefficients
---
doc/src/sgml/ref/create_statistics.sgml | 9 ++
src/backend/catalog/system_views.sql | 3 +-
src/backend/commands/analyze.c | 2 +-
src/backend/commands/statscmds.c | 11 +-
src/backend/optimizer/path/clausesel.c | 4 +
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/adt/selfuncs.c | 93 +++++++++++++++-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/README.ndistinct | 83 ++++++++++++++
src/backend/utils/mvstats/README.stats | 2 +
src/backend/utils/mvstats/common.c | 23 +++-
src/backend/utils/mvstats/mvdist.c | 171 +++++++++++++++++++++++++++++
src/include/catalog/pg_mv_statistic.h | 26 +++--
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 9 +-
src/test/regress/expected/rules.out | 3 +-
16 files changed, 424 insertions(+), 23 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.ndistinct
create mode 100644 src/backend/utils/mvstats/mvdist.c
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index fd3382e..80360a6 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -168,6 +168,15 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>ndistinct</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables ndistinct coefficients for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect2>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6afdee0..a550141 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -169,7 +169,8 @@ CREATE VIEW pg_mv_stats AS
length(S.stamcv) AS mcvbytes,
pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo,
length(S.stahist) AS histbytes,
- pg_mv_stats_histogram_info(S.stahist) AS histinfo
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo,
+ standcoeff AS ndcoeff
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index cbaa4e1..0f6db77 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -582,7 +582,7 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
}
/* Build multivariate stats (if there are any). */
- build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
+ build_mv_stats(onerel, totalrows, numrows, rows, attr_cnt, vacattrstats);
}
/*
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index b974655..6ea0e13 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -138,7 +138,8 @@ CreateStatistics(CreateStatsStmt *stmt)
/* by default build nothing */
bool build_dependencies = false,
build_mcv = false,
- build_histogram = false;
+ build_histogram = false,
+ build_ndistinct = false;
int32 max_buckets = -1,
max_mcv_items = -1;
@@ -221,6 +222,8 @@ CreateStatistics(CreateStatsStmt *stmt)
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "ndistinct") == 0)
+ build_ndistinct = defGetBoolean(opt);
else if (strcmp(opt->defname, "mcv") == 0)
build_mcv = defGetBoolean(opt);
else if (strcmp(opt->defname, "max_mcv_items") == 0)
@@ -275,10 +278,10 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv || build_histogram))
+ if (! (build_dependencies || build_mcv || build_histogram || build_ndistinct))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram, ndistinct) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -311,6 +314,7 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
+ values[Anum_pg_mv_statistic_ndist_enabled-1] = BoolGetDatum(build_ndistinct);
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
@@ -318,6 +322,7 @@ CreateStatistics(CreateStatsStmt *stmt)
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
nulls[Anum_pg_mv_statistic_stahist -1] = true;
+ nulls[Anum_pg_mv_statistic_standist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index d239488..3c2aefd 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -59,6 +59,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
#define MV_CLAUSE_TYPE_HIST 0x04
+#define MV_CLAUSE_TYPE_NDIST 0x08
static bool clause_is_mv_compatible(Node *clause, Oid varRelid,
Index *relid, Bitmapset **attnums, SpecialJoinInfo *sjinfo,
@@ -2553,6 +2554,9 @@ has_stats(List *stats, int type)
if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
return true;
+
+ if ((type & MV_CLAUSE_TYPE_NDIST) && stat->ndist_built)
+ return true;
}
return false;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d46aed2..bd2c306 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -416,7 +416,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built)
+ if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built || mvstat->ndist_built)
{
info = makeNode(MVStatisticInfo);
@@ -427,11 +427,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
info->deps_enabled = mvstat->deps_enabled;
info->mcv_enabled = mvstat->mcv_enabled;
info->hist_enabled = mvstat->hist_enabled;
+ info->ndist_enabled = mvstat->ndist_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
info->mcv_built = mvstat->mcv_built;
info->hist_built = mvstat->hist_built;
+ info->ndist_built = mvstat->ndist_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 7d0a3a1..a84dd2b 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -132,6 +132,7 @@
#include "utils/fmgroids.h"
#include "utils/index_selfuncs.h"
#include "utils/lsyscache.h"
+#include "utils/mvstats.h"
#include "utils/nabstime.h"
#include "utils/pg_locale.h"
#include "utils/rel.h"
@@ -206,6 +207,7 @@ static Const *string_to_const(const char *str, Oid datatype);
static Const *string_to_bytea_const(const char *str, size_t str_len);
static List *add_predicate_to_quals(IndexOptInfo *index, List *indexQuals);
+static Oid find_ndistinct_coeff(PlannerInfo *root, RelOptInfo *rel, List *varinfos);
/*
* eqsel - Selectivity of "=" for any data types.
@@ -3422,12 +3424,26 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
* don't know by how much. We should never clamp to less than the
* largest ndistinct value for any of the Vars, though, since
* there will surely be at least that many groups.
+ *
+ * However we don't need to do this if we have ndistinct stats on
+ * the columns - in that case we can simply use the coefficient
+ * to get the (probably way more accurate) estimate.
+ *
+ * XXX Probably needs refactoring (I don't like mixing the clamp
+ * and the coefficient at the same time).
*/
double clamp = rel->tuples;
+ double coeff = 1.0;
if (relvarcount > 1)
{
- clamp *= 0.1;
+ Oid oid = find_ndistinct_coeff(root, rel, varinfos);
+
+ if (oid != InvalidOid)
+ coeff = load_mv_ndistinct(oid);
+ else
+ clamp *= 0.1;
+
if (clamp < relmaxndistinct)
{
clamp = relmaxndistinct;
@@ -3436,6 +3452,13 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
clamp = rel->tuples;
}
}
+
+ /*
+ * Apply the ndistinct coefficient from multivariate stats (we must
+ * do this before clamping the estimate in any way).
+ */
+ reldistinct /= coeff;
+
if (reldistinct > clamp)
reldistinct = clamp;
@@ -7582,3 +7605,71 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
/* XXX what about pages_per_range? */
}
+
+/*
+ * Find ndistinct statistics applicable to the grouping columns, used to
+ * correct the estimate (which is otherwise a simple product of per-column
+ * ndistinct values).
+ *
+ * Currently we only look for a perfect match, i.e. a single ndistinct
+ * statistics exactly matching all the grouped columns.
+ */
+static Oid
+find_ndistinct_coeff(PlannerInfo *root, RelOptInfo *rel, List *varinfos)
+{
+ ListCell *lc;
+ Bitmapset *attnums = NULL;
+ VariableStatData vardata;
+
+ foreach(lc, varinfos)
+ {
+ GroupVarInfo *varinfo = (GroupVarInfo *) lfirst(lc);
+
+ if (varinfo->rel != rel)
+ continue;
+
+ /* FIXME handle general expressions, not only plain Vars */
+
+ /*
+ * Examine the variable (or expression) so that we know which
+ * attribute we're dealing with - we need this for matching the
+ * ndistinct coefficient.
+ *
+ * FIXME We could probably remember this from estimate_num_groups.
+ */
+ examine_variable(root, varinfo->var, 0, &vardata);
+
+ if (HeapTupleIsValid(vardata.statsTuple))
+ {
+ Form_pg_statistic stats
+ = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
+
+ attnums = bms_add_member(attnums, stats->staattnum);
+
+ ReleaseVariableStats(vardata);
+ }
+ }
+
+ /* look for a matching ndistinct statistics */
+ foreach (lc, rel->mvstatlist)
+ {
+ int i;
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ /* skip statistics without ndistinct coefficient built */
+ if (!info->ndist_built)
+ continue;
+
+ /* only exact matches for now (same set of columns) */
+ if (bms_num_members(attnums) != info->stakeys->dim1)
+ continue;
+
+ /* check that all the columns of this statistics are grouped on */
+ for (i = 0; i < info->stakeys->dim1; i++)
+ if (!bms_is_member(info->stakeys->values[i], attnums))
+ break;
+
+ /* if the loop ran to completion, all the columns matched */
+ if (i == info->stakeys->dim1)
+ return info->mvoid;
+ }
+
+ return InvalidOid;
+}
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 9dbb3b6..d4b88e9 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o histogram.o mcv.o
+OBJS = common.o dependencies.o histogram.o mcv.o mvdist.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.ndistinct b/src/backend/utils/mvstats/README.ndistinct
new file mode 100644
index 0000000..32d1624
--- /dev/null
+++ b/src/backend/utils/mvstats/README.ndistinct
@@ -0,0 +1,83 @@
+ndistinct coefficients
+======================
+
+Estimating the number of distinct groups in a combination of columns is tricky,
+and the estimation error is often significant. By ndistinct coefficient we
+mean a ratio
+
+ q = ndistinct(a) * ndistinct(b) / ndistinct(a,b)
+
+where 'a' and 'b' are columns, and ndistinct(a) is (an estimate of) the number
+of distinct values in column 'a'. And ndistinct(a,b) is the same thing for the
+pair of columns.
+
+The meaning of the coefficient may be illustrated by answering the following
+question: Given a combination of columns (a,b), how many distinct values of 'b'
+match a chosen value of 'a' on average?
+
+Let's assume we know ndistinct(a) and ndistinct(a,b). Then the answer to the
+question clearly is
+
+ ndistinct(a,b) / ndistinct(a)
+
+and by using 'q' we may rewrite this as
+
+ ndistinct(b) / q
+
+so 'q' may be considered as a correction factor of the ndistinct estimate given
+a condition on one of the columns.
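+
+For example (made-up numbers): if 'a' and 'b' each have 100 distinct values
+and the columns are perfectly correlated, then ndistinct(a,b) = 100 and
+
+  q = (100 * 100) / 100 = 100
+
+so a value of 'a' matches ndistinct(b) / q = 1 distinct value of 'b' on
+average. For independent columns we'd get ndistinct(a,b) = 10000 and q = 1,
+i.e. each value of 'a' matches all 100 distinct values of 'b'.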
+
+This may be generalized to a combination of 'n' columns
+
+ [ndistinct(c1) * ... * ndistinct(cn)] / ndistinct(c1, ..., cn)
+
+and the meaning is very similar, except that we need to use conditions on (n-1)
+of the columns.
+
+
+Selectivity estimation
+----------------------
+
+As explained in the previous section, ndistinct coefficients may be used to
+estimate the cardinality of a column, given some a priori knowledge. Let's
+assume we need to estimate the selectivity of a condition
+
+ (a=1) AND (b=2)
+
+which we can expand like this
+
+ P(a=1 & b=2) = P(a=1) * P(b=2 | a=1)
+
+Let's also assume that the distributions are uniform, i.e. that
+
+ P(a=1) = 1/ndistinct(a)
+ P(b=2) = 1/ndistinct(b)
+ P(a=1 & b=2) = 1/ndistinct(a,b)
+
+ P(b=2 | a=1) = ndistinct(a) / ndistinct(a,b)
+
+which may be rewritten like
+
+ P(b=2 | a=1)
+ = ndistinct(a) / ndistinct(a,b)
+ = (1/ndistinct(b)) * [(ndistinct(a) * ndistinct(b)) / ndistinct(a,b)]
+ = (1/ndistinct(b)) * q
+
+and therefore
+
+ P(a=1 & b=2) = (1/ndistinct(a)) * (1/ndistinct(b)) * q
+
+This also illustrates 'q' as a correction coefficient.
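+
+Plugging in the numbers from the previous example (perfectly correlated
+columns with 100 distinct values each, i.e. q = 100):
+
+  P(a=1 & b=2) = (1/100) * (1/100) * 100 = 1/100
+
+which matches the expected 1/ndistinct(a,b) = 1/100, while the independence
+assumption alone would give 1/10000.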
+
+It also explains why we store the coefficient and not simply ndistinct(a,b).
+This way we can estimate the individual clauses and then correct
+the estimate by multiplying the result with 'q' - we don't have to mess with
+ndistinct estimates at all.
+
+Naturally, as the coefficient is derived from ndistinct(a,b), it may also be
+used to estimate GROUP BY clauses on the combination of columns, replacing the
+existing heuristics in estimate_num_groups().
+
+Note: Currently only the GROUP BY estimation is implemented. It's a bit unclear
+how to implement the clause estimation when there are other statistics (esp.
+MCV lists and/or functional dependencies) available.
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index d404914..6d4b09b 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -20,6 +20,8 @@ Currently we only have two kinds of multivariate statistics
(c) multivariate histograms (README.histogram)
+ (d) ndistinct coefficients
+
Compatible clause types
-----------------------
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index ffb76f4..2be980d 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -32,7 +32,8 @@ static List* list_mv_stats(Oid relid);
* and serializes them back into the catalog (as bytea values).
*/
void
-build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+build_mv_stats(Relation onerel, double totalrows,
+ int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats)
{
ListCell *lc;
@@ -53,6 +54,7 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
MVHistogram histogram = NULL;
+ double ndist = -1;
int numrows_filtered = numrows;
VacAttrStats **stats = NULL;
@@ -92,6 +94,9 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (stat->ndist_enabled)
+ ndist = build_mv_ndistinct(totalrows, numrows, rows, attrs, stats);
+
/* build the MCV list */
if (stat->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
@@ -101,7 +106,7 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, mcvlist, histogram, attrs, stats);
+ update_mv_stats(stat->mvoid, deps, mcvlist, histogram, ndist, attrs, stats);
}
}
@@ -183,6 +188,8 @@ list_mv_stats(Oid relid)
info->mcv_built = stats->mcv_built;
info->hist_enabled = stats->hist_enabled;
info->hist_built = stats->hist_built;
+ info->ndist_enabled = stats->ndist_enabled;
+ info->ndist_built = stats->ndist_built;
result = lappend(result, info);
}
@@ -252,7 +259,7 @@ find_mv_attnums(Oid mvoid, Oid *relid)
void
update_mv_stats(Oid mvoid,
MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
- int2vector *attrs, VacAttrStats **stats)
+ double ndistcoeff, int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -292,26 +299,36 @@ update_mv_stats(Oid mvoid,
= PointerGetDatum(data);
}
+ if (ndistcoeff > 1.0)
+ {
+ nulls[Anum_pg_mv_statistic_standist -1] = false;
+ values[Anum_pg_mv_statistic_standist-1] = Float8GetDatum(ndistcoeff);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
replaces[Anum_pg_mv_statistic_stahist-1] = true;
+ replaces[Anum_pg_mv_statistic_standist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
nulls[Anum_pg_mv_statistic_hist_built-1] = false;
+ nulls[Anum_pg_mv_statistic_ndist_built-1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_ndist_built-1] = true;
replaces[Anum_pg_mv_statistic_hist_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
+ values[Anum_pg_mv_statistic_ndist_built-1] = BoolGetDatum(ndistcoeff > 1.0);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
diff --git a/src/backend/utils/mvstats/mvdist.c b/src/backend/utils/mvstats/mvdist.c
new file mode 100644
index 0000000..59b8358
--- /dev/null
+++ b/src/backend/utils/mvstats/mvdist.c
@@ -0,0 +1,171 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvdist.c
+ * POSTGRES multivariate distinct coefficients
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mvdist.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include <math.h>
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+static double estimate_ndistinct(double totalrows, int numrows, int d, int f1);
+
+/*
+ * Compute ndistinct coefficient for the combination of attributes. This
+ * computes the ndistinct estimate using the same estimator used in analyze.c
+ * and then computes the coefficient.
+ */
+double
+build_mv_ndistinct(double totalrows, int numrows, HeapTuple *rows,
+ int2vector *attrs, VacAttrStats **stats)
+{
+ int i, j;
+ int f1, cnt, d;
+ int nmultiple = 0, summultiple = 0;
+ int numattrs = attrs->dim1;
+ MultiSortSupport mss = multi_sort_init(numattrs);
+ double ndistcoeff;
+
+ /*
+ * It's possible to sort the sample rows directly, but this seemed
+ * somewhat simpler / less error prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * numattrs);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ Assert(numattrs >= 2);
+
+ for (i = 0; i < numattrs; i++)
+ {
+ /* prepare the sort function for this dimension */
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* accumulate the data for this dimension into the array */
+ for (j = 0; j < numrows; j++)
+ {
+ items[j].values[i]
+ = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+ }
+ }
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /* count number of distinct combinations */
+
+ f1 = 0;
+ cnt = 1;
+ d = 1;
+ for (i = 1; i < numrows; i++)
+ {
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ {
+ if (cnt == 1)
+ f1 += 1;
+ else
+ {
+ nmultiple += 1;
+ summultiple += cnt;
+ }
+
+ d++;
+ cnt = 0;
+ }
+
+ cnt += 1;
+ }
+
+ if (cnt == 1)
+ f1 += 1;
+ else
+ {
+ nmultiple += 1;
+ summultiple += cnt;
+ }
+
+ ndistcoeff = 1 / estimate_ndistinct(totalrows, numrows, d, f1);
+
+ /*
+ * now use the per-attribute ndistinct estimates and incrementally
+ * compute q = (ndistinct(a) * ndistinct(b)) / ndistinct(a,b)
+ *
+ * FIXME Probably need to handle cases when one of the ndistinct
+ * estimates is negative, and also check that the combined
+ * ndistinct is greater than any of those partial values.
+ */
+ for (i = 0; i < numattrs; i++)
+ ndistcoeff *= stats[i]->stadistinct;
+
+ return ndistcoeff;
+}
+
+double
+load_mv_ndistinct(Oid mvoid)
+{
+ bool isnull = false;
+ Datum ndist;
+
+ /* Fetch the pg_mv_statistic tuple for the given statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->ndist_enabled && mvstat->ndist_built);
+#endif
+
+ ndist = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_standist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return DatumGetFloat8(ndist);
+}
+
+/* The Duj1 estimator (already used in analyze.c). */
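+/*
+ * A worked example with made-up numbers: with totalrows = 1,000,000, a
+ * sample of numrows = 30,000 rows, and d = 1,000 distinct combinations
+ * of which f1 = 500 were seen exactly once, we get
+ *
+ *   numer = 30000 * 1000 = 30,000,000
+ *   denom = (30000 - 500) + 500 * 30000 / 1000000 = 29500 + 15 = 29515
+ *
+ *   ndistinct = 30,000,000 / 29515 ~= 1016
+ */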
+static double
+estimate_ndistinct(double totalrows, int numrows, int d, int f1)
+{
+ double numer,
+ denom,
+ ndistinct;
+
+ numer = (double) numrows *(double) d;
+
+ denom = (double) (numrows - f1) +
+ (double) f1 * (double) numrows / totalrows;
+
+ ndistinct = numer / denom;
+
+ /* Clamp to sane range in case of roundoff error */
+ if (ndistinct < (double) d)
+ ndistinct = (double) d;
+
+ if (ndistinct > totalrows)
+ ndistinct = totalrows;
+
+ return floor(ndistinct + 0.5);
+}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index a5945af..ee353da 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -39,6 +39,7 @@ CATALOG(pg_mv_statistic,3381)
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
bool hist_enabled; /* build histogram? */
+ bool ndist_enabled; /* build ndist coefficient? */
/* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
@@ -48,6 +49,7 @@ CATALOG(pg_mv_statistic,3381)
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
bool hist_built; /* histogram was built */
+ bool ndist_built; /* ndistinct coeff built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -56,6 +58,7 @@ CATALOG(pg_mv_statistic,3381)
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
bytea stahist; /* MV histogram (serialized) */
+ float8 standcoeff; /* ndistinct coeff (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -71,21 +74,24 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 15
+#define Natts_pg_mv_statistic 18
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
#define Anum_pg_mv_statistic_deps_enabled 4
#define Anum_pg_mv_statistic_mcv_enabled 5
#define Anum_pg_mv_statistic_hist_enabled 6
-#define Anum_pg_mv_statistic_mcv_max_items 7
-#define Anum_pg_mv_statistic_hist_max_buckets 8
-#define Anum_pg_mv_statistic_deps_built 9
-#define Anum_pg_mv_statistic_mcv_built 10
-#define Anum_pg_mv_statistic_hist_built 11
-#define Anum_pg_mv_statistic_stakeys 12
-#define Anum_pg_mv_statistic_stadeps 13
-#define Anum_pg_mv_statistic_stamcv 14
-#define Anum_pg_mv_statistic_stahist 15
+#define Anum_pg_mv_statistic_ndist_enabled 7
+#define Anum_pg_mv_statistic_mcv_max_items 8
+#define Anum_pg_mv_statistic_hist_max_buckets 9
+#define Anum_pg_mv_statistic_deps_built 10
+#define Anum_pg_mv_statistic_mcv_built 11
+#define Anum_pg_mv_statistic_hist_built 12
+#define Anum_pg_mv_statistic_ndist_built 13
+#define Anum_pg_mv_statistic_stakeys 14
+#define Anum_pg_mv_statistic_stadeps 15
+#define Anum_pg_mv_statistic_stamcv 16
+#define Anum_pg_mv_statistic_stahist 17
+#define Anum_pg_mv_statistic_standist 18
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 46bece6..a2fafd2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -621,11 +621,13 @@ typedef struct MVStatisticInfo
bool deps_enabled; /* functional dependencies enabled */
bool mcv_enabled; /* MCV list enabled */
bool hist_enabled; /* histogram enabled */
+ bool ndist_enabled; /* ndistinct coefficient enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
bool mcv_built; /* MCV list built */
bool hist_built; /* histogram built */
+ bool ndist_built; /* ndistinct coefficient built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 35b2f8e..fb2c5d8 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -225,6 +225,7 @@ typedef MVSerializedHistogramData *MVSerializedHistogram;
MVDependencies load_mv_dependencies(Oid mvoid);
MCVList load_mv_mcvlist(Oid mvoid);
MVSerializedHistogram load_mv_histogram(Oid mvoid);
+double load_mv_ndistinct(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
@@ -266,11 +267,17 @@ MVHistogram
build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int numrows_total);
-void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+double
+build_mv_ndistinct(double totalrows, int numrows, HeapTuple *rows,
+ int2vector *attrs, VacAttrStats **stats);
+
+void build_mv_stats(Relation onerel, double totalrows,
+ int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
void update_mv_stats(Oid relid, MVDependencies dependencies,
MCVList mcvlist, MVHistogram histogram,
+ double ndistcoeff,
int2vector *attrs, VacAttrStats **stats);
#ifdef DEBUG_MVHIST
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 1a1a4ca..0ad935e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1377,7 +1377,8 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
length(s.stamcv) AS mcvbytes,
pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo,
length(s.stahist) AS histbytes,
- pg_mv_stats_histogram_info(s.stahist) AS histinfo
+ pg_mv_stats_histogram_info(s.stahist) AS histinfo,
+ s.standcoeff AS ndcoeff
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
--
2.1.0
0008-change-how-we-apply-selectivity-to-number-of-groups-.patch
From 050ab11a67b89383211c870e7d32259b1368f689 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Tue, 26 Jan 2016 18:14:33 +0100
Subject: [PATCH 8/9] change how we apply selectivity to number of groups
estimate
Instead of simply multiplying the ndistinct estimate with selectivity,
we instead use the formula for the expected number of distinct values
observed in 'k' rows when there are 'd' distinct values in the bin
d * (1 - ((d - 1) / d)^k)
This is the 'with replacement' variant, which seems appropriate for this use,
and it mostly assumes a uniform distribution of the distinct values. So if the
distribution is not uniform (e.g. there are very frequent groups) this
may be less accurate than the current algorithm in some cases, giving
over-estimates. But that's probably better than OOM.
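
A quick sanity check of the formula: with d = 1000 distinct values in the bin
and k = 1000 rows selected, the expected number of observed groups is

    1000 * (1 - (999/1000)^1000) ~= 1000 * (1 - 0.368) ~= 632

rather than the full 1000.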
---
src/backend/utils/adt/selfuncs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index a84dd2b..ce3ad19 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3465,7 +3465,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
/*
* Multiply by restriction selectivity.
*/
- reldistinct *= rel->rows / rel->tuples;
+ reldistinct = reldistinct * (1 - powl((reldistinct - 1) / reldistinct, rel->rows));
/*
* Update estimate of total distinct groups.
--
2.1.0
0009-fixup-of-regression-tests-plans-changes-by-group-by-.patch
From acb9b004e5e6a75e33f66b6d2f261f575fc515cb Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Sun, 28 Feb 2016 21:16:40 +0100
Subject: [PATCH 9/9] fixup of regression tests (plans changes by group by
estimation)
---
src/test/regress/expected/join.out | 20 ++++++++++----------
src/test/regress/expected/subselect.out | 25 +++++++++++--------------
src/test/regress/expected/union.out | 16 ++++++++--------
3 files changed, 29 insertions(+), 32 deletions(-)
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 59d7877..d9dd5ca 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3951,17 +3951,17 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
on d.a = s.id;
QUERY PLAN
---------------------------------------
- Merge Left Join
- Merge Cond: (d.a = s.id)
- -> Sort
- Sort Key: d.a
- -> Seq Scan on d
+ Merge Right Join
+ Merge Cond: (s.id = d.a)
-> Sort
Sort Key: s.id
-> Subquery Scan on s
-> HashAggregate
Group Key: b.id
-> Seq Scan on b
+ -> Sort
+ Sort Key: d.a
+ -> Seq Scan on d
(11 rows)
-- similarly, but keying off a DISTINCT clause
@@ -3970,17 +3970,17 @@ select d.* from d left join (select distinct * from b) s
on d.a = s.id;
QUERY PLAN
---------------------------------------------
- Merge Left Join
- Merge Cond: (d.a = s.id)
- -> Sort
- Sort Key: d.a
- -> Seq Scan on d
+ Merge Right Join
+ Merge Cond: (s.id = d.a)
-> Sort
Sort Key: s.id
-> Subquery Scan on s
-> HashAggregate
Group Key: b.id, b.c_id
-> Seq Scan on b
+ -> Sort
+ Sort Key: d.a
+ -> Seq Scan on d
(11 rows)
-- check join removal works when uniqueness of the join condition is enforced
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index de64ca7..0fc93d9 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -807,27 +807,24 @@ select * from int4_tbl where
explain (verbose, costs off)
select * from int4_tbl o where (f1, f1) in
(select f1, generate_series(1,2) / 10 g from int4_tbl i group by f1);
- QUERY PLAN
-----------------------------------------------------------------------
- Hash Join
+ QUERY PLAN
+----------------------------------------------------------------
+ Hash Semi Join
Output: o.f1
Hash Cond: (o.f1 = "ANY_subquery".f1)
-> Seq Scan on public.int4_tbl o
Output: o.f1
-> Hash
Output: "ANY_subquery".f1, "ANY_subquery".g
- -> HashAggregate
+ -> Subquery Scan on "ANY_subquery"
Output: "ANY_subquery".f1, "ANY_subquery".g
- Group Key: "ANY_subquery".f1, "ANY_subquery".g
- -> Subquery Scan on "ANY_subquery"
- Output: "ANY_subquery".f1, "ANY_subquery".g
- Filter: ("ANY_subquery".f1 = "ANY_subquery".g)
- -> HashAggregate
- Output: i.f1, (generate_series(1, 2) / 10)
- Group Key: i.f1
- -> Seq Scan on public.int4_tbl i
- Output: i.f1
-(18 rows)
+ Filter: ("ANY_subquery".f1 = "ANY_subquery".g)
+ -> HashAggregate
+ Output: i.f1, (generate_series(1, 2) / 10)
+ Group Key: i.f1
+ -> Seq Scan on public.int4_tbl i
+ Output: i.f1
+(15 rows)
select * from int4_tbl o where (f1, f1) in
(select f1, generate_series(1,2) / 10 g from int4_tbl i group by f1);
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 016571b..f2e297e 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -263,16 +263,16 @@ ORDER BY 1;
SELECT q2 FROM int8_tbl INTERSECT SELECT q1 FROM int8_tbl;
q2
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
SELECT q2 FROM int8_tbl INTERSECT ALL SELECT q1 FROM int8_tbl;
q2
------------------
+ 123
4567890123456789
4567890123456789
- 123
(3 rows)
SELECT q2 FROM int8_tbl EXCEPT SELECT q1 FROM int8_tbl ORDER BY 1;
@@ -305,16 +305,16 @@ SELECT q1 FROM int8_tbl EXCEPT SELECT q2 FROM int8_tbl;
SELECT q1 FROM int8_tbl EXCEPT ALL SELECT q2 FROM int8_tbl;
q1
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
SELECT q1 FROM int8_tbl EXCEPT ALL SELECT DISTINCT q2 FROM int8_tbl;
q1
------------------
+ 123
4567890123456789
4567890123456789
- 123
(3 rows)
SELECT q1 FROM int8_tbl EXCEPT ALL SELECT q1 FROM int8_tbl FOR NO KEY UPDATE;
@@ -343,8 +343,8 @@ SELECT f1 FROM float8_tbl EXCEPT SELECT f1 FROM int4_tbl ORDER BY 1;
SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl;
q1
-------------------
- 4567890123456789
123
+ 4567890123456789
456
4567890123456789
123
@@ -355,15 +355,15 @@ SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FR
SELECT q1 FROM int8_tbl INTERSECT (((SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl)));
q1
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
(((SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl))) UNION ALL SELECT q2 FROM int8_tbl;
q1
-------------------
- 4567890123456789
123
+ 4567890123456789
456
4567890123456789
123
@@ -419,8 +419,8 @@ HINT: There is a column named "q2" in table "*SELECT* 2", but it cannot be refe
SELECT q1 FROM int8_tbl EXCEPT (((SELECT q2 FROM int8_tbl ORDER BY q2 LIMIT 1)));
q1
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
--
--
2.1.0
On 2 March 2016 at 14:56, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
Attached is v10 of the patch series. There are 9 parts at the moment:
0001-teach-pull_-varno-varattno-_walker-about-RestrictInf.patch
0002-shared-infrastructure-and-functional-dependencies.patch
0003-clause-reduction-using-functional-dependencies.patch
0004-multivariate-MCV-lists.patch
0005-multivariate-histograms.patch
0006-multi-statistics-estimation.patch
0007-multivariate-ndistinct-coefficients.patch
0008-change-how-we-apply-selectivity-to-number-of-groups-.patch
0009-fixup-of-regression-tests-plans-changes-by-group-by-.patch

However, the first one is still just a temporary workaround that I plan to
address next, and the last 3 are all dealing with the ndistinct coefficients
(and shall be squashed into a single chunk).
README docs
-----------

Aside from fixing a few bugs, there are several major improvements, the main one being that I've moved most of the comments explaining how it all works into a set of regular README files, located in src/backend/utils/mvstats:
1) README.stats - Overview of available types of statistics, what
clauses can be estimated, how multiple statistics are combined etc.
This is probably the right place to start.

2) docs for each type of statistics currently available
README.dependencies - soft functional dependencies
README.mcv - MCV lists
README.histogram - histograms
README.ndistinct - ndistinct coefficients

The READMEs are added and modified through the patch series, so the best thing to do is apply all the patches and start reading.
I have not improved the user-oriented SGML documentation in this patch, that's one of the tasks I'd like to work on next. But the READMEs should give you a good idea how it's supposed to work, and there are some examples of use in the regression tests.
Significantly simplified places
-------------------------------

This patch version also significantly simplifies several places that were needlessly complex in the previous ones - firstly, the function evaluating clauses on multivariate histograms was rather bloated, so I've simplified it a lot. Similarly for the code in clauselist_selectivity() that combines multiple statistics to estimate a list of clauses - that's much simpler now too. And various other pieces.
That being said, I still think the code in clausesel.c can be simplified. I feel there's a lot of cruft, mostly due to unknowingly implementing something that could be solved by an existing function.
A prime example of that is inspecting the expression tree to check if we know how to estimate the clauses using the multivariate statistics. That sounds like a nice match for expression walker, but currently is done by custom code. I plan to look at that next.
Also, I'm not quite sure I understand what the varRelid parameter of clauselist_selectivity is for, so the code may be handling that wrong (seems to be working though).
ndistinct coefficients
----------------------

The one new piece in this patch is the GROUP BY estimation, based on the ndistinct coefficients. So for example you can do this:
CREATE TABLE t AS SELECT mod(i,1000) AS a, mod(i,1000) AS b
FROM generate_series(1,1000000) s(i);
ANALYZE t;
EXPLAIN SELECT * FROM t GROUP BY a, b;

which currently does this:
QUERY PLAN
-----------------------------------------------------------------------
Group (cost=127757.34..135257.34 rows=99996 width=8)
Group Key: a, b
-> Sort (cost=127757.34..130257.34 rows=1000000 width=8)
Sort Key: a, b
-> Seq Scan on t (cost=0.00..14425.00 rows=1000000 width=8)
(5 rows)

but we know that there are only 1000 groups because the columns are correlated. So let's create ndistinct statistics on the two columns:
CREATE STATISTICS s1 ON t (a,b) WITH (ndistinct);
ANALYZE t;

which results in estimates like this:
QUERY PLAN
-----------------------------------------------------------------
HashAggregate (cost=19425.00..19435.00 rows=1000 width=8)
Group Key: a, b
-> Seq Scan on t (cost=0.00..14425.00 rows=1000000 width=8)
(3 rows)

I'm not quite sure how to combine this type of statistics with MCV lists and histograms, so for now it's used only for GROUP BY.
Well, firstly, the patches all apply.
But I have a question (which is coming really late, but I'll ask it
anyway). Is it intended that CREATE STATISTICS will only be for
multivariate statistics? Or do you think we could add support for
expression statistics in future too?
e.g.
CREATE STATISTICS stats_comment_length ON comments (length(comment));
I also note that the docs contain this:
CREATE STATISTICS [ IF NOT EXISTS ] statistics_name ON table_name ( [
{ column_name } ] [, ...])
[ WITH ( statistics_parameter [= value] [, ... ] )
The open square bracket before WITH doesn't get closed. Also, it
indicates that columns are entirely optional, so () would be valid, but
that's not the case. Also, a space is missing after the first
ellipsis. So I think this should read:
CREATE STATISTICS [ IF NOT EXISTS ] statistics_name ON table_name (
{ column_name } [, ... ])
[ WITH ( statistics_parameter [= value] [, ... ] ) ]
Regards
Thom
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
On 03/02/2016 05:17 PM, Thom Brown wrote:
...
Well, firstly, the patches all apply.
But I have a question (which is coming really late, but I'll ask it
anyway). Is it intended that CREATE STATISTICS will only be for
multivariate statistics? Or do you think we could add support for
expression statistics in future too?

e.g.
CREATE STATISTICS stats_comment_length ON comments (length(comment));
Hmmm, that's not a use case I had in mind while working on the patch,
but it sounds interesting. I don't see why the syntax would not support
this - I'd like to add support for expressions into the multivariate
patch, but that will still require at least 2 columns to build
multivariate statistics. But perhaps it'd be possible to relax the "at
least 2 columns" requirement, and collect regular statistics somewhere.
So I don't see why the syntax could not work for that case too, but I'm
not going to work on that.
I also note that the docs contain this:
CREATE STATISTICS [ IF NOT EXISTS ] statistics_name ON table_name ( [
{ column_name } ] [, ...])
[ WITH ( statistics_parameter [= value] [, ... ] )

The open square bracket doesn't get closed. Also, it
indicates that columns are entirely options, so () would be valid, but
that's not the case. Also, a space is missing after the first
ellipsis. So I think this should read:

CREATE STATISTICS [ IF NOT EXISTS ] statistics_name ON table_name (
{ column_name } [, ... ])
[ WITH ( statistics_parameter [= value] [, ... ] ) ]
Yeah, will fix.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
attached is v11 of the patch - this is mostly a cleanup of v10, removing
redundant code, adding missing comments, removing obsolete FIXME/TODOs
and so on. Overall this shaves ~20kB from the patch (not a primary
objective, though).
The one thing this (hopefully) fixes is the handling of varRelid. Apparently
I got that slightly wrong in the previous versions.
One thing I'm not quite sure about is the schema of the new system catalog.
The existing catalog pg_statistic uses generic design with stakindN,
stanumbersN and stavaluesN columns, while the new catalog uses dedicated
columns for each type of stats (MCV, histogram, ...). Not sure whether
it's desirable to switch to the pg_statistic approach or not.
There are a few things I plan to look into next:
* possibly more cleanups in clausesel.c (I'm wondering if some pieces
should be moved to utils/mvstats/*.c)
* a few FIXMEs in the infrastructure (e.g. deriving a name when not
specified in CREATE STATISTICS)
* move the ndistinct coefficients after functional dependencies in
the patch series (but only use them for GROUP BY for now)
* extend the functional dependencies to handle multiple columns on
the left side (condition), i.e. dependencies like (a,b) -> c
* address a few remaining FIXMEs in MCV/histograms building
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-teach-pull_-varno-varattno-_walker-about-RestrictInf.patch
From 19defa4e8c1e578f3cf4099b0729357ecc333c5a Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Tue, 28 Apr 2015 19:56:33 +0200
Subject: [PATCH 1/9] teach pull_(varno|varattno)_walker about RestrictInfo
otherwise pull_varnos fails when processing OR clauses
---
src/backend/optimizer/util/var.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index dff52c4..80d01bd 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -197,6 +197,13 @@ pull_varnos_walker(Node *node, pull_varnos_context *context)
context->sublevels_up--;
return result;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo*)node;
+ context->varnos = bms_add_members(context->varnos,
+ rinfo->clause_relids);
+ return false;
+ }
return expression_tree_walker(node, pull_varnos_walker,
(void *) context);
}
@@ -245,6 +252,15 @@ pull_varattnos_walker(Node *node, pull_varattnos_context *context)
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *)node;
+
+ return expression_tree_walker((Node*)rinfo->clause,
+ pull_varattnos_walker,
+ (void*) context);
+ }
+
/* Should not find an unplanned subquery */
Assert(!IsA(node, Query));
--
2.1.0
0002-shared-infrastructure-and-functional-dependencies.patch
From 48412732b6e1c667fd6f0f7d025b941ad0e7c1c1 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 19:51:48 +0100
Subject: [PATCH 2/9] shared infrastructure and functional dependencies
Basic infrastructure shared by all kinds of multivariate stats, most
importantly:
- adds a new system catalog (pg_mv_statistic)
- CREATE STATISTICS name ON table (columns) WITH (options)
- DROP STATISTICS name
- implementation of functional dependencies (the simplest type of
multivariate statistics)
- building functional dependencies in ANALYZE
- updates regression tests (new catalog etc.)
This does not include any changes to the optimizer, i.e. it does not
influence the query planning (subject to follow-up patches).
The current implementation requires a valid 'ltopr' for the columns, so
that we can sort the sample rows in various ways, both in this patch
and in other kinds of statistics. Maybe this restriction could be relaxed
in the future, requiring just 'eqopr' in case of stats not sorting the
data (e.g. functional dependencies and MCV lists).
Maybe some of the stats (functional dependencies and MCV list with
limited functionality) might be made to work with hashes of the values,
which is sufficient for equality comparisons. But the queries would
require the equality operator anyway, so it's not really a weaker
requirement. The hashes might reduce space requirements, though.
The algorithm detecting the dependencies is rather simple and probably
needs improvements, so that it detects more complicated dependencies,
and also validation of the math.
The name 'functional dependencies' is more correct (than 'association
rules') as it's exactly the name used in relational theory (esp. Normal
Forms) for tracking column-level dependencies.
The multivariate statistics are automatically removed in two situations
(a) after a DROP TABLE (obviously)
(b) after ALTER TABLE ... DROP COLUMN, if the statistics would be
defined on less than 2 columns (remaining)
If there are at least two remaining columns, we keep the
statistics but perform cleanup on the next ANALYZE. The dropped columns
are removed from stakeys, and the new statistics is built on the
smaller set.
We can't do this at DROP COLUMN, because that'd leave us with invalid
statistics, or we'd have to throw it away although we can still use it.
This lazy approach lets us use the statistics although some of the
columns are dead.
This also adds a simple list of statistics to \d in psql.
This means the statistics are created within a schema by using a
qualified name (or using the default schema)
CREATE STATISTICS schema.statistics ON ...
and then dropped by specifying qualified name
DROP STATISTICS schema.statistics
or searching through search_path (just like with other objects).
This also gets rid of the "(opt_)stats_name" definitions in gram.y and
instead replaces them with just "opt_any_name", although the optional
case is not really handled currently - there's no generated name yet
(so either we should drop it or implement it).
I'm not entirely sure making statistics schema-specific is such a great
idea. Maybe it should be "global", but that does not seem right (e.g.
it makes multi-tenant systems based on schemas more difficult to
manage, because tenants would interact).
---
doc/src/sgml/ref/allfiles.sgml | 2 +
doc/src/sgml/ref/create_statistics.sgml | 174 ++++++++++
doc/src/sgml/ref/drop_statistics.sgml | 90 ++++++
doc/src/sgml/reference.sgml | 2 +
src/backend/catalog/Makefile | 1 +
src/backend/catalog/dependency.c | 11 +-
src/backend/catalog/heap.c | 102 ++++++
src/backend/catalog/namespace.c | 51 +++
src/backend/catalog/objectaddress.c | 22 ++
src/backend/catalog/system_views.sql | 11 +
src/backend/commands/Makefile | 6 +-
src/backend/commands/analyze.c | 21 ++
src/backend/commands/dropcmds.c | 4 +
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/statscmds.c | 331 +++++++++++++++++++
src/backend/commands/tablecmds.c | 8 +-
src/backend/nodes/copyfuncs.c | 16 +
src/backend/nodes/outfuncs.c | 18 ++
src/backend/optimizer/util/plancat.c | 63 ++++
src/backend/parser/gram.y | 34 +-
src/backend/tcop/utility.c | 11 +
src/backend/utils/Makefile | 2 +-
src/backend/utils/cache/relcache.c | 59 ++++
src/backend/utils/cache/syscache.c | 23 ++
src/backend/utils/mvstats/Makefile | 17 +
src/backend/utils/mvstats/README.dependencies | 222 +++++++++++++
src/backend/utils/mvstats/common.c | 356 +++++++++++++++++++++
src/backend/utils/mvstats/common.h | 75 +++++
src/backend/utils/mvstats/dependencies.c | 437 ++++++++++++++++++++++++++
src/bin/psql/describe.c | 44 +++
src/include/catalog/dependency.h | 5 +-
src/include/catalog/heap.h | 1 +
src/include/catalog/indexing.h | 7 +
src/include/catalog/namespace.h | 2 +
src/include/catalog/pg_mv_statistic.h | 73 +++++
src/include/catalog/pg_proc.h | 5 +
src/include/catalog/toasting.h | 1 +
src/include/commands/defrem.h | 4 +
src/include/nodes/nodes.h | 2 +
src/include/nodes/parsenodes.h | 12 +
src/include/nodes/relation.h | 28 ++
src/include/utils/mvstats.h | 70 +++++
src/include/utils/rel.h | 4 +
src/include/utils/relcache.h | 1 +
src/include/utils/syscache.h | 2 +
src/test/regress/expected/rules.out | 9 +
src/test/regress/expected/sanity_check.out | 1 +
47 files changed, 2432 insertions(+), 11 deletions(-)
create mode 100644 doc/src/sgml/ref/create_statistics.sgml
create mode 100644 doc/src/sgml/ref/drop_statistics.sgml
create mode 100644 src/backend/commands/statscmds.c
create mode 100644 src/backend/utils/mvstats/Makefile
create mode 100644 src/backend/utils/mvstats/README.dependencies
create mode 100644 src/backend/utils/mvstats/common.c
create mode 100644 src/backend/utils/mvstats/common.h
create mode 100644 src/backend/utils/mvstats/dependencies.c
create mode 100644 src/include/catalog/pg_mv_statistic.h
create mode 100644 src/include/utils/mvstats.h
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index bf95453..c0f7653 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -76,6 +76,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createSchema SYSTEM "create_schema.sgml">
<!ENTITY createSequence SYSTEM "create_sequence.sgml">
<!ENTITY createServer SYSTEM "create_server.sgml">
+<!ENTITY createStatistics SYSTEM "create_statistics.sgml">
<!ENTITY createTable SYSTEM "create_table.sgml">
<!ENTITY createTableAs SYSTEM "create_table_as.sgml">
<!ENTITY createTableSpace SYSTEM "create_tablespace.sgml">
@@ -119,6 +120,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropSchema SYSTEM "drop_schema.sgml">
<!ENTITY dropSequence SYSTEM "drop_sequence.sgml">
<!ENTITY dropServer SYSTEM "drop_server.sgml">
+<!ENTITY dropStatistics SYSTEM "drop_statistics.sgml">
<!ENTITY dropTable SYSTEM "drop_table.sgml">
<!ENTITY dropTableSpace SYSTEM "drop_tablespace.sgml">
<!ENTITY dropTransform SYSTEM "drop_transform.sgml">
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
new file mode 100644
index 0000000..a86eae3
--- /dev/null
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -0,0 +1,174 @@
+<!--
+doc/src/sgml/ref/create_statistics.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-CREATESTATISTICS">
+ <indexterm zone="sql-createstatistics">
+ <primary>CREATE STATISTICS</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>CREATE STATISTICS</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>CREATE STATISTICS</refname>
+  <refpurpose>define new multivariate statistics</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_name</replaceable> ON <replaceable class="PARAMETER">table_name</replaceable>
+    ( <replaceable class="PARAMETER">column_name</replaceable> [, ...] )
+    [ WITH ( <replaceable class="PARAMETER">statistics_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] ) ]
+</synopsis>
+
+ </refsynopsisdiv>
+
+ <refsect1 id="SQL-CREATESTATISTICS-description">
+ <title>Description</title>
+
+ <para>
+ <command>CREATE STATISTICS</command> will create a new multivariate
+   statistics on the table. The statistics will be created in the
+ current database. The statistics will be owned by the user issuing
+ the command.
+ </para>
+
+ <para>
+ If a schema name is given (for example, <literal>CREATE STATISTICS
+ myschema.mystat ...</>) then the statistics is created in the specified
+   schema. Otherwise it is created in the current schema. The name of
+   the statistics must be distinct from the name of any other statistics
+   in the same schema.
+ </para>
+
+ <para>
+   To be able to create statistics on a table, you must own that table.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>IF NOT EXISTS</></term>
+ <listitem>
+ <para>
+ Do not throw an error if a statistics with the same name already exists.
+ A notice is issued in this case. Note that there is no guarantee that
+ the existing statistics is anything like the one that would have been
+ created.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">statistics_name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the statistics to be created.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">table_name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the table the statistics should
+ be created on.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">column_name</replaceable></term>
+ <listitem>
+ <para>
+ The name of a column to be included in the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>WITH ( <replaceable class="PARAMETER">statistics_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] )</literal></term>
+ <listitem>
+ <para>
+ ...
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <refsect2 id="SQL-CREATESTATISTICS-parameters">
+ <title id="SQL-CREATESTATISTICS-parameters-title">Statistics Parameters</title>
+
+ <indexterm zone="sql-createstatistics-parameters">
+ <primary>statistics parameters</primary>
+ </indexterm>
+
+ <para>
+ The <literal>WITH</> clause can specify <firstterm>statistics parameters</>
+ for statistics. The currently available parameters are listed below.
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>dependencies</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables functional dependencies for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </refsect2>
+ </refsect1>
+
+ <refsect1 id="SQL-CREATESTATISTICS-notes">
+ <title>Notes</title>
+
+ <para>
+ ...
+ </para>
+
+ </refsect1>
+
+
+ <refsect1 id="SQL-CREATESTATISTICS-examples">
+ <title>Examples</title>
+
+ <para>
+ ...
+ </para>
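+
+  <para>
+   For example (a sketch only - the statistics, table and column names below
+   are illustrative), define statistics on two correlated columns and build
+   them with <command>ANALYZE</command>:
+<programlisting>
+CREATE STATISTICS stats_tab ON tab (a, b) WITH (dependencies = true);
+ANALYZE tab;
+</programlisting>
+  </para>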
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There's no <command>CREATE STATISTICS</command> command in the SQL standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-dropstatistics"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/ref/drop_statistics.sgml b/doc/src/sgml/ref/drop_statistics.sgml
new file mode 100644
index 0000000..4cc0b70
--- /dev/null
+++ b/doc/src/sgml/ref/drop_statistics.sgml
@@ -0,0 +1,90 @@
+<!--
+doc/src/sgml/ref/drop_statistics.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-DROPSTATISTICS">
+ <indexterm zone="sql-dropstatistics">
+ <primary>DROP STATISTICS</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>DROP STATISTICS</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>DROP STATISTICS</refname>
+  <refpurpose>remove multivariate statistics</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+DROP STATISTICS [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>DROP STATISTICS</command> removes statistics from the database.
+   Only the statistics owner, the schema owner, and a superuser can drop
+   statistics.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>IF EXISTS</literal></term>
+ <listitem>
+ <para>
+ Do not throw an error if the statistics does not exist. A notice is
+ issued in this case.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the statistics to drop.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </refsect1>
+
+ <refsect1>
+ <title>Examples</title>
+
+ <para>
+ ...
+ </para>
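+
+  <para>
+   For example, to remove hypothetical statistics named
+   <literal>stats_tab</literal>:
+<programlisting>
+DROP STATISTICS stats_tab;
+</programlisting>
+  </para>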
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There's no <command>DROP STATISTICS</command> command in the SQL standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createstatistics"></member>
+ </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 03020df..2b07b2d 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -104,6 +104,7 @@
&createSchema;
&createSequence;
&createServer;
+ &createStatistics;
&createTable;
&createTableAs;
&createTableSpace;
@@ -147,6 +148,7 @@
&dropSchema;
&dropSequence;
&dropServer;
+ &dropStatistics;
&dropTable;
&dropTableSpace;
&dropTSConfig;
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 25130ec..058b8a9 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -32,6 +32,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
+ pg_mv_statistic.h \
pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index c48e37b..8200454 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -40,6 +40,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -160,7 +161,8 @@ static const Oid object_classes[] = {
ExtensionRelationId, /* OCLASS_EXTENSION */
EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
PolicyRelationId, /* OCLASS_POLICY */
- TransformRelationId /* OCLASS_TRANSFORM */
+ TransformRelationId, /* OCLASS_TRANSFORM */
+ MvStatisticRelationId /* OCLASS_STATISTICS */
};
@@ -1272,6 +1274,10 @@ doDeletion(const ObjectAddress *object, int flags)
DropTransformById(object->objectId);
break;
+ case OCLASS_STATISTICS:
+ RemoveStatisticsById(object->objectId);
+ break;
+
default:
elog(ERROR, "unrecognized object class: %u",
object->classId);
@@ -2415,6 +2421,9 @@ getObjectClass(const ObjectAddress *object)
case TransformRelationId:
return OCLASS_TRANSFORM;
+
+ case MvStatisticRelationId:
+ return OCLASS_STATISTICS;
}
/* shouldn't get here */
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 6a4a9d9..e7d9aaa 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -47,6 +47,7 @@
#include "catalog/pg_constraint_fn.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_tablespace.h"
@@ -1613,7 +1614,10 @@ RemoveAttributeById(Oid relid, AttrNumber attnum)
heap_close(attr_rel, RowExclusiveLock);
if (attnum > 0)
+ {
RemoveStatistics(relid, attnum);
+ RemoveMVStatistics(relid, attnum);
+ }
relation_close(rel, NoLock);
}
@@ -1841,6 +1845,11 @@ heap_drop_with_catalog(Oid relid)
RemoveStatistics(relid, 0);
/*
+ * delete multi-variate statistics
+ */
+ RemoveMVStatistics(relid, 0);
+
+ /*
* delete attribute tuples
*/
DeleteAttributeTuples(relid);
@@ -2696,6 +2705,99 @@ RemoveStatistics(Oid relid, AttrNumber attnum)
/*
+ * RemoveMVStatistics --- remove entries in pg_mv_statistic for a rel
+ *
+ * If attnum is zero, remove all entries for rel; else remove only the one(s)
+ * for that column.
+ */
+void
+RemoveMVStatistics(Oid relid, AttrNumber attnum)
+{
+ Relation pgmvstatistic;
+ TupleDesc tupdesc = NULL;
+ SysScanDesc scan;
+ ScanKeyData key;
+ HeapTuple tuple;
+
+ /*
+	 * When dropping a column, we'll also drop statistics that would be
+	 * left with just a single remaining (undropped) column. To check
+	 * that, we need the tuple descriptor.
+ *
+ * We already have the relation locked (as we're running ALTER
+ * TABLE ... DROP COLUMN), so we'll just get the descriptor here.
+ */
+ if (attnum != 0)
+ {
+ Relation rel = relation_open(relid, NoLock);
+
+ /* multivariate stats are supported on tables and matviews */
+ if (rel->rd_rel->relkind == RELKIND_RELATION ||
+ rel->rd_rel->relkind == RELKIND_MATVIEW)
+ tupdesc = RelationGetDescr(rel);
+
+ relation_close(rel, NoLock);
+ }
+
+	/* with (attnum == 0) we're removing all statistics for the relation */
+	if ((attnum != 0) && (tupdesc == NULL))
+		return;
+
+ pgmvstatistic = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ ScanKeyInit(&key,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ scan = systable_beginscan(pgmvstatistic,
+ MvStatisticRelidIndexId,
+ true, NULL, 1, &key);
+
+ /* we must loop even when attnum != 0, in case of inherited stats */
+ while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+ {
+ bool delete = true;
+
+ if (attnum != 0)
+ {
+ Datum adatum;
+ bool isnull;
+ int i;
+ int ncolumns = 0;
+ ArrayType *arr;
+ int16 *attnums;
+
+ /* get the columns */
+ adatum = SysCacheGetAttr(MVSTATOID, tuple,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+ attnums = (int16*)ARR_DATA_PTR(arr);
+
+ for (i = 0; i < ARR_DIMS(arr)[0]; i++)
+ {
+				/* count the column unless it has been / is being dropped */
+ if ((! tupdesc->attrs[attnums[i]-1]->attisdropped) &&
+ (attnums[i] != attnum))
+ ncolumns += 1;
+ }
+
+			/* delete if fewer than two columns remain */
+ delete = (ncolumns < 2);
+ }
+
+ if (delete)
+ simple_heap_delete(pgmvstatistic, &tuple->t_self);
+ }
+
+ systable_endscan(scan);
+
+ heap_close(pgmvstatistic, RowExclusiveLock);
+}
+
+
+/*
* RelationTruncateIndexes - truncate all indexes associated
* with the heap relation to zero tuples.
*
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 446b2ac..dfd5bef 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -4201,3 +4201,54 @@ pg_is_other_temp_schema(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(isOtherTempNamespace(oid));
}
+
+Oid
+get_statistics_oid(List *names, bool missing_ok)
+{
+ char *schemaname;
+ char *stats_name;
+ Oid namespaceId;
+ Oid stats_oid = InvalidOid;
+ ListCell *l;
+
+ /* deconstruct the name list */
+ DeconstructQualifiedName(names, &schemaname, &stats_name);
+
+ if (schemaname)
+ {
+ /* use exact schema given */
+ namespaceId = LookupExplicitNamespace(schemaname, missing_ok);
+ if (missing_ok && !OidIsValid(namespaceId))
+ stats_oid = InvalidOid;
+ else
+ stats_oid = GetSysCacheOid2(MVSTATNAMENSP,
+ PointerGetDatum(stats_name),
+ ObjectIdGetDatum(namespaceId));
+ }
+ else
+ {
+ /* search for it in search path */
+ recomputeNamespacePath();
+
+ foreach(l, activeSearchPath)
+ {
+ namespaceId = lfirst_oid(l);
+
+ if (namespaceId == myTempNamespace)
+ continue; /* do not look in temp namespace */
+ stats_oid = GetSysCacheOid2(MVSTATNAMENSP,
+ PointerGetDatum(stats_name),
+ ObjectIdGetDatum(namespaceId));
+ if (OidIsValid(stats_oid))
+ break;
+ }
+ }
+
+ if (!OidIsValid(stats_oid) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("statistics \"%s\" does not exist",
+ NameListToString(names))));
+
+ return stats_oid;
+}
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index d2aaa6d..3a6a0b0 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -39,6 +39,7 @@
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
#include "catalog/pg_largeobject_metadata.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_opfamily.h"
@@ -438,9 +439,22 @@ static const ObjectPropertyType ObjectProperty[] =
Anum_pg_type_typacl,
ACL_KIND_TYPE,
true
+ },
+ {
+ MvStatisticRelationId,
+ MvStatisticOidIndexId,
+ MVSTATOID,
+ MVSTATNAMENSP,
+ Anum_pg_mv_statistic_staname,
+ Anum_pg_mv_statistic_stanamespace,
+ InvalidAttrNumber, /* XXX same owner as relation */
+ InvalidAttrNumber, /* no ACL (same as relation) */
+ -1, /* no ACL */
+ true
}
};
+
/*
* This struct maps the string object types as returned by
* getObjectTypeDescription into ObjType enum values. Note that some enum
@@ -913,6 +927,11 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
address = get_object_address_defacl(objname, objargs,
missing_ok);
break;
+ case OBJECT_STATISTICS:
+ address.classId = MvStatisticRelationId;
+ address.objectId = get_statistics_oid(objname, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, in case it thinks elog might return */
@@ -2185,6 +2204,9 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser")));
break;
+ case OBJECT_STATISTICS:
+ /* FIXME do the right owner checks here */
+ break;
default:
elog(ERROR, "unrecognized object type: %d",
(int) objtype);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index abf9a70..b8a264e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -158,6 +158,17 @@ CREATE VIEW pg_indexes AS
LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
+CREATE VIEW pg_mv_stats AS
+ SELECT
+ N.nspname AS schemaname,
+ C.relname AS tablename,
+ S.staname AS staname,
+ S.stakeys AS attnums,
+ length(S.stadeps) as depsbytes,
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
+ LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
+
CREATE VIEW pg_stats WITH (security_barrier) AS
SELECT
nspname AS schemaname,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index b1ac704..5151001 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -18,8 +18,8 @@ OBJS = aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
policy.o portalcmds.o prepare.o proclang.o \
- schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
- variable.o view.o
+ schemacmds.o seclabel.o sequence.o statscmds.o \
+ tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o \
+ user.o vacuum.o vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 070df29..cbaa4e1 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -27,6 +27,7 @@
#include "catalog/indexing.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "commands/dbcommands.h"
#include "commands/tablecmds.h"
@@ -55,7 +56,11 @@
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "utils/mvstats.h"
+#include "access/sysattr.h"
/* Per-index data for ANALYZE */
typedef struct AnlIndexData
@@ -460,6 +465,19 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
* all analyzable columns. We use a lower bound of 100 rows to avoid
* possible overflow in Vitter's algorithm. (Note: that will also be the
* target in the corner case where there are no analyzable columns.)
+ *
+	 * FIXME This sample sizing is mostly OK when computing stats for
+	 * individual columns, but for multivariate stats (histograms,
+	 * MCV lists, ...) it's rather insufficient. For stats on multiple
+	 * columns we need larger samples, because we build more detailed
+	 * stats (more MCV items / histogram buckets) to get good accuracy.
+	 * Maybe a sample proportional to the table size (say, 0.5% - 1%)
+	 * would be more appropriate than a fixed size. Also, the sample
+	 * size should be bound to the requested statistics size - e.g.
+	 * the number of MCV items or histogram buckets should require
+	 * several sample rows per item/bucket (so a sample of k*size).
*/
targrows = 100;
for (i = 0; i < attr_cnt; i++)
@@ -562,6 +580,9 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
update_attstats(RelationGetRelid(Irel[ind]), false,
thisdata->attr_cnt, thisdata->vacattrstats);
}
+
+ /* Build multivariate stats (if there are any). */
+ build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
}
/*
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index 522027a..cd65b58 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -292,6 +292,10 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
msg = gettext_noop("schema \"%s\" does not exist, skipping");
name = NameListToString(objname);
break;
+ case OBJECT_STATISTICS:
+ msg = gettext_noop("statistics \"%s\" does not exist, skipping");
+ name = NameListToString(objname);
+ break;
case OBJECT_TSPARSER:
if (!schema_does_not_exist_skipping(objname, &msg, &name))
{
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 9e32f8d..09061bb 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -110,6 +110,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"SCHEMA", true},
{"SEQUENCE", true},
{"SERVER", true},
+ {"STATISTICS", true},
{"TABLE", true},
{"TABLESPACE", false},
{"TRANSFORM", true},
@@ -1106,6 +1107,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_RULE:
case OBJECT_SCHEMA:
case OBJECT_SEQUENCE:
+ case OBJECT_STATISTICS:
case OBJECT_TABCONSTRAINT:
case OBJECT_TABLE:
case OBJECT_TRANSFORM:
@@ -1167,6 +1169,7 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
case OCLASS_POLICY:
+ case OCLASS_STATISTICS:
return true;
}
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
new file mode 100644
index 0000000..84a8b13
--- /dev/null
+++ b/src/backend/commands/statscmds.c
@@ -0,0 +1,331 @@
+/*-------------------------------------------------------------------------
+ *
+ * statscmds.c
+ * Commands for creating and altering multivariate statistics
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/statscmds.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/multixact.h"
+#include "access/reloptions.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "catalog/catalog.h"
+#include "catalog/dependency.h"
+#include "catalog/heap.h"
+#include "catalog/index.h"
+#include "catalog/indexing.h"
+#include "catalog/namespace.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_constraint.h"
+#include "catalog/pg_depend.h"
+#include "catalog/pg_foreign_table.h"
+#include "catalog/pg_inherits.h"
+#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
+#include "catalog/pg_namespace.h"
+#include "catalog/pg_opclass.h"
+#include "catalog/pg_tablespace.h"
+#include "catalog/pg_trigger.h"
+#include "catalog/pg_type.h"
+#include "catalog/pg_type_fn.h"
+#include "catalog/storage.h"
+#include "catalog/toasting.h"
+#include "commands/cluster.h"
+#include "commands/comment.h"
+#include "commands/defrem.h"
+#include "commands/event_trigger.h"
+#include "commands/policy.h"
+#include "commands/sequence.h"
+#include "commands/tablecmds.h"
+#include "commands/tablespace.h"
+#include "commands/trigger.h"
+#include "commands/typecmds.h"
+#include "commands/user.h"
+#include "executor/executor.h"
+#include "foreign/foreign.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/parsenodes.h"
+#include "optimizer/clauses.h"
+#include "optimizer/planner.h"
+#include "parser/parse_clause.h"
+#include "parser/parse_coerce.h"
+#include "parser/parse_collate.h"
+#include "parser/parse_expr.h"
+#include "parser/parse_oper.h"
+#include "parser/parse_relation.h"
+#include "parser/parse_type.h"
+#include "parser/parse_utilcmd.h"
+#include "parser/parser.h"
+#include "pgstat.h"
+#include "rewrite/rewriteDefine.h"
+#include "rewrite/rewriteHandler.h"
+#include "rewrite/rewriteManip.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "storage/lock.h"
+#include "storage/predicate.h"
+#include "storage/smgr.h"
+#include "utils/acl.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/inval.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/relcache.h"
+#include "utils/ruleutils.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+#include "utils/typcache.h"
+#include "utils/mvstats.h"
+
+
+/* used for sorting the attnums in CreateStatistics */
+static int
+compare_int16(const void *a, const void *b)
+{
+	/* don't use memcmp - byte order is not value order on little-endian */
+	return (int) *(const int16 *) a - (int) *(const int16 *) b;
+}
+
+/*
+ * Implements the CREATE STATISTICS name ON table (columns) WITH (options)
+ *
+ * TODO Check that the types support sort, although maybe we can live
+ * without it (and only build MCV list / association rules).
+ *
+ * TODO This should probably check for duplicate stats (i.e. same
+ * keys, same options). Although maybe it's useful to have
+ * multiple stats on the same columns with different options
+ * (say, a detailed MCV-only stats for some queries, histogram
+ * for others, etc.)
+ */
+ObjectAddress
+CreateStatistics(CreateStatsStmt *stmt)
+{
+ int i, j;
+ ListCell *l;
+ int16 attnums[INDEX_MAX_KEYS];
+ int numcols = 0;
+ ObjectAddress address = InvalidObjectAddress;
+ char *namestr;
+ NameData staname;
+ Oid statoid;
+ Oid namespaceId;
+
+ HeapTuple htup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ int2vector *stakeys;
+ Relation mvstatrel;
+ Relation rel;
+ ObjectAddress parentobject, childobject;
+
+ /* by default build nothing */
+ bool build_dependencies = false;
+
+ Assert(IsA(stmt, CreateStatsStmt));
+
+ /* resolve the pieces of the name (namespace etc.) */
+ namespaceId = QualifiedNameGetCreationNamespace(stmt->defnames, &namestr);
+ namestrcpy(&staname, namestr);
+
+ /*
+ * If if_not_exists was given and the statistics already exists, bail out.
+ */
+ if (stmt->if_not_exists &&
+ SearchSysCacheExists2(MVSTATNAMENSP,
+ PointerGetDatum(&staname),
+ ObjectIdGetDatum(namespaceId)))
+ {
+ ereport(NOTICE,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("statistics \"%s\" already exists, skipping",
+ namestr)));
+ return InvalidObjectAddress;
+ }
+
+ rel = heap_openrv(stmt->relation, AccessExclusiveLock);
+
+ /* transform the column names to attnum values */
+
+ foreach(l, stmt->keys)
+ {
+ char *attname = strVal(lfirst(l));
+ HeapTuple atttuple;
+
+ atttuple = SearchSysCacheAttName(RelationGetRelid(rel), attname);
+
+ if (!HeapTupleIsValid(atttuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" referenced in statistics does not exist",
+ attname)));
+
+		/* more than MVSTATS_MAX_DIMENSIONS columns not allowed */
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("cannot have more than %d keys in a statistics",
+ MVSTATS_MAX_DIMENSIONS)));
+
+ attnums[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->attnum;
+ ReleaseSysCache(atttuple);
+ numcols++;
+ }
+
+ /*
+	 * Check the lower bound (at least 2 columns); the upper bound was
+	 * already checked in the loop above.
+ */
+ if (numcols < 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("multivariate stats require 2 or more columns")));
+
+	/* look for duplicate columns */
+ for (i = 0; i < numcols; i++)
+ for (j = 0; j < numcols; j++)
+ if ((i != j) && (attnums[i] == attnums[j]))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("duplicate column name in statistics definition")));
+
+ /* parse the statistics options */
+ foreach (l, stmt->options)
+ {
+ DefElem *opt = (DefElem*)lfirst(l);
+
+ if (strcmp(opt->defname, "dependencies") == 0)
+ build_dependencies = defGetBoolean(opt);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized STATISTICS option \"%s\"",
+ opt->defname)));
+ }
+
+ /* check that at least some statistics were requested */
+ if (! build_dependencies)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies) was requested")));
+
+ /* sort the attnums and build int2vector */
+ qsort(attnums, numcols, sizeof(int16), compare_int16);
+ stakeys = buildint2vector(attnums, numcols);
+
+ /*
+ * Okay, let's create the pg_mv_statistic entry.
+ */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ /* no stats collected yet, so just the keys */
+ values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
+ values[Anum_pg_mv_statistic_staname -1] = NameGetDatum(&staname);
+ values[Anum_pg_mv_statistic_stanamespace -1] = ObjectIdGetDatum(namespaceId);
+
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+
+ values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+
+ nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* insert the tuple into pg_mv_statistic */
+ mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ htup = heap_form_tuple(mvstatrel->rd_att, values, nulls);
+
+ simple_heap_insert(mvstatrel, htup);
+
+ CatalogUpdateIndexes(mvstatrel, htup);
+
+ statoid = HeapTupleGetOid(htup);
+
+ heap_freetuple(htup);
+
+
+ /*
+ * Store a dependency too, so that statistics are dropped on DROP TABLE
+ */
+ parentobject.classId = RelationRelationId;
+	parentobject.objectId = RelationGetRelid(rel);
+ parentobject.objectSubId = 0;
+ childobject.classId = MvStatisticRelationId;
+ childobject.objectId = statoid;
+ childobject.objectSubId = 0;
+
+ recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
+
+ /*
+ * Also record dependency on the schema (to drop statistics on DROP SCHEMA)
+ */
+ parentobject.classId = NamespaceRelationId;
+	parentobject.objectId = namespaceId;
+ parentobject.objectSubId = 0;
+ childobject.classId = MvStatisticRelationId;
+ childobject.objectId = statoid;
+ childobject.objectSubId = 0;
+
+ recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
+
+
+	heap_close(mvstatrel, RowExclusiveLock);
+
+	/*
+	 * Invalidate relcache so that others see the new statistics. Do
+	 * this before closing the relation, while the Relation is valid.
+	 */
+	CacheInvalidateRelcache(rel);
+
+	relation_close(rel, NoLock);
+
+ ObjectAddressSet(address, MvStatisticRelationId, statoid);
+
+ return address;
+}
+
+
+/*
+ * Implements DROP STATISTICS, i.e. removes the pg_mv_statistic row
+ * with the given OID. The name is resolved to the OID by the generic
+ * object-drop machinery, which then calls this function (doDeletion).
+ */
+void
+RemoveStatisticsById(Oid statsOid)
+{
+ Relation relation;
+ HeapTuple tup;
+
+	/*
+	 * Delete the pg_mv_statistic tuple.
+	 */
+ relation = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ tup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(statsOid));
+ if (!HeapTupleIsValid(tup)) /* should not happen */
+ elog(ERROR, "cache lookup failed for statistics %u", statsOid);
+
+ simple_heap_delete(relation, &tup->t_self);
+
+ ReleaseSysCache(tup);
+
+ heap_close(relation, RowExclusiveLock);
+}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 96dc923..96ab02f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -37,6 +37,7 @@
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_tablespace.h"
@@ -95,7 +96,7 @@
#include "utils/syscache.h"
#include "utils/tqual.h"
#include "utils/typcache.h"
-
+#include "utils/mvstats.h"
/*
* ON COMMIT action list
@@ -143,8 +144,9 @@ static List *on_commits = NIL;
#define AT_PASS_ADD_COL 5 /* ADD COLUMN */
#define AT_PASS_ADD_INDEX 6 /* ADD indexes */
#define AT_PASS_ADD_CONSTR 7 /* ADD constraints, defaults */
-#define AT_PASS_MISC 8 /* other stuff */
-#define AT_NUM_PASSES 9
+#define AT_PASS_ADD_STATS 8 /* ADD statistics */
+#define AT_PASS_MISC 9 /* other stuff */
+#define AT_NUM_PASSES 10
typedef struct AlteredTableInfo
{
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index a9e9cc3..1a04024 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4124,6 +4124,19 @@ _copyAlterPolicyStmt(const AlterPolicyStmt *from)
return newnode;
}
+static CreateStatsStmt *
+_copyCreateStatsStmt(const CreateStatsStmt *from)
+{
+ CreateStatsStmt *newnode = makeNode(CreateStatsStmt);
+
+ COPY_NODE_FIELD(defnames);
+ COPY_NODE_FIELD(relation);
+ COPY_NODE_FIELD(keys);
+	COPY_NODE_FIELD(options);
+	COPY_SCALAR_FIELD(if_not_exists);
+
+ return newnode;
+}
+
/* ****************************************************************
* pg_list.h copy functions
* ****************************************************************
@@ -4999,6 +5012,9 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_CreateStatsStmt:
+ retval = _copyCreateStatsStmt(from);
+ break;
case T_FuncWithArgs:
retval = _copyFuncWithArgs(from);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 85acce8..474d2c7 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1968,6 +1968,21 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
}
static void
+_outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
+{
+ WRITE_NODE_TYPE("MVSTATISTICINFO");
+
+ /* NB: this isn't a complete set of fields */
+ WRITE_OID_FIELD(mvoid);
+
+ /* enabled statistics */
+ WRITE_BOOL_FIELD(deps_enabled);
+
+ /* built/available statistics */
+ WRITE_BOOL_FIELD(deps_built);
+}
+
+static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
/*
@@ -3409,6 +3424,9 @@ _outNode(StringInfo str, const void *obj)
case T_PlannerParamItem:
_outPlannerParamItem(str, obj);
break;
+ case T_MVStatisticInfo:
+ _outMVStatisticInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 0ea9fcf..b9de71d 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -28,6 +28,7 @@
#include "catalog/dependency.h"
#include "catalog/heap.h"
#include "catalog/pg_am.h"
+#include "catalog/pg_mv_statistic.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -40,7 +41,9 @@
#include "parser/parsetree.h"
#include "rewrite/rewriteManip.h"
#include "storage/bufmgr.h"
+#include "utils/builtins.h"
#include "utils/lsyscache.h"
+#include "utils/syscache.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
@@ -94,6 +97,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
Relation relation;
bool hasindex;
List *indexinfos = NIL;
+ List *stainfos = NIL;
/*
* We need not lock the relation since it was already locked, either by
@@ -387,6 +391,65 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
rel->indexlist = indexinfos;
+ if (true)
+ {
+ List *mvstatoidlist;
+ ListCell *l;
+
+ mvstatoidlist = RelationGetMVStatList(relation);
+
+ foreach(l, mvstatoidlist)
+ {
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ Oid mvoid = lfirst_oid(l);
+ Form_pg_mv_statistic mvstat;
+ MVStatisticInfo *info;
+
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (! HeapTupleIsValid(htup))
+ continue;
+
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ /* unavailable stats are not interesting for the planner */
+ if (mvstat->deps_built)
+ {
+ info = makeNode(MVStatisticInfo);
+
+ info->mvoid = mvoid;
+ info->rel = rel;
+
+ /* enabled statistics */
+ info->deps_enabled = mvstat->deps_enabled;
+
+ /* built/available statistics */
+ info->deps_built = mvstat->deps_built;
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ info->stakeys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+
+ stainfos = lcons(info, stainfos);
+ }
+
+ ReleaseSysCache(htup);
+ }
+
+ list_free(mvstatoidlist);
+ }
+
+ rel->mvstatlist = stainfos;
+
/* Grab foreign-table info using the relcache, while we have it */
if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
{
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b307b48..3be3f02 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -241,7 +241,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
- CreateSchemaStmt CreateSeqStmt CreateStmt CreateTableSpaceStmt
+ CreateSchemaStmt CreateSeqStmt CreateStmt CreateStatsStmt CreateTableSpaceStmt
CreateFdwStmt CreateForeignServerStmt CreateForeignTableStmt
CreateAssertStmt CreateTransformStmt CreateTrigStmt CreateEventTrigStmt
CreateUserStmt CreateUserMappingStmt CreateRoleStmt CreatePolicyStmt
@@ -809,6 +809,7 @@ stmt :
| CreateSchemaStmt
| CreateSeqStmt
| CreateStmt
+ | CreateStatsStmt
| CreateTableSpaceStmt
| CreateTransformStmt
| CreateTrigStmt
@@ -3436,6 +3437,36 @@ OptConsTableSpace: USING INDEX TABLESPACE name { $$ = $4; }
ExistingIndex: USING INDEX index_name { $$ = $3; }
;
+/*****************************************************************************
+ *
+ * QUERY :
+ * CREATE STATISTICS stats_name ON relname (columns) WITH (options)
+ *
+ *****************************************************************************/
+
+
+CreateStatsStmt: CREATE STATISTICS any_name ON qualified_name '(' columnList ')' opt_reloptions
+ {
+ CreateStatsStmt *n = makeNode(CreateStatsStmt);
+ n->defnames = $3;
+ n->relation = $5;
+ n->keys = $7;
+ n->options = $9;
+ n->if_not_exists = false;
+ $$ = (Node *)n;
+ }
+ | CREATE STATISTICS IF_P NOT EXISTS any_name ON qualified_name '(' columnList ')' opt_reloptions
+ {
+ CreateStatsStmt *n = makeNode(CreateStatsStmt);
+ n->defnames = $6;
+ n->relation = $8;
+ n->keys = $10;
+ n->options = $12;
+ n->if_not_exists = true;
+ $$ = (Node *)n;
+ }
+ ;
+
/*****************************************************************************
*
@@ -5621,6 +5652,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH DICTIONARY { $$ = OBJECT_TSDICTIONARY; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
+ | STATISTICS { $$ = OBJECT_STATISTICS; }
;
any_name_list:
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 045f7f0..2ba88e2 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1520,6 +1520,10 @@ ProcessUtilitySlow(Node *parsetree,
address = ExecSecLabelStmt((SecLabelStmt *) parsetree);
break;
+ case T_CreateStatsStmt: /* CREATE STATISTICS */
+ address = CreateStatistics((CreateStatsStmt *) parsetree);
+ break;
+
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(parsetree));
@@ -2160,6 +2164,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_TRANSFORM:
tag = "DROP TRANSFORM";
break;
+ case OBJECT_STATISTICS:
+ tag = "DROP STATISTICS";
+ break;
default:
tag = "???";
}
@@ -2527,6 +2534,10 @@ CreateCommandTag(Node *parsetree)
tag = "EXECUTE";
break;
+ case T_CreateStatsStmt:
+ tag = "CREATE STATISTICS";
+ break;
+
case T_DeallocateStmt:
{
DeallocateStmt *stmt = (DeallocateStmt *) parsetree;
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..eba0352 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,7 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr mvstats resowner sort time
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 130c06d..3bc4c8a 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -47,6 +47,7 @@
#include "catalog/pg_auth_members.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_database.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_proc.h"
@@ -3956,6 +3957,62 @@ RelationGetIndexList(Relation relation)
return result;
}
+
+List *
+RelationGetMVStatList(Relation relation)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result;
+ List *oldlist;
+ MemoryContext oldcxt;
+
+ /* Quick exit if we already computed the list. */
+ if (relation->rd_mvstatvalid != 0)
+ return list_copy(relation->rd_mvstatlist);
+
+ /*
+ * We build the list we intend to return (in the caller's context) while
+ * doing the scan. After successfully completing the scan, we copy that
+ * list into the relcache entry. This avoids cache-context memory leakage
+ * if we get some sort of error partway through.
+ */
+ result = NIL;
+
+	/* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(RelationGetRelid(relation)));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ /* TODO maybe include only already built statistics? */
+ result = insert_ordered_oid(result, HeapTupleGetOid(htup));
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* Now save a copy of the completed list in the relcache entry. */
+ oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+ oldlist = relation->rd_mvstatlist;
+ relation->rd_mvstatlist = list_copy(result);
+
+ relation->rd_mvstatvalid = true;
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Don't leak the old list, if there is one */
+ list_free(oldlist);
+
+ return result;
+}
+
/*
* insert_ordered_oid
* Insert a new Oid into a sorted list of Oids, preserving ordering
@@ -4920,6 +4977,8 @@ load_relcache_init_file(bool shared)
rel->rd_indexattr = NULL;
rel->rd_keyattr = NULL;
rel->rd_idattr = NULL;
+ rel->rd_mvstatvalid = false;
+ rel->rd_mvstatlist = NIL;
rel->rd_createSubid = InvalidSubTransactionId;
rel->rd_newRelfilenodeSubid = InvalidSubTransactionId;
rel->rd_amcache = NULL;
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 65ffe84..3c1bc4b 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -44,6 +44,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_language.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -502,6 +503,28 @@ static const struct cachedesc cacheinfo[] = {
},
4
},
+ {MvStatisticRelationId, /* MVSTATNAMENSP */
+ MvStatisticNameIndexId,
+ 2,
+ {
+ Anum_pg_mv_statistic_staname,
+ Anum_pg_mv_statistic_stanamespace,
+ 0,
+ 0
+ },
+ 4
+ },
+ {MvStatisticRelationId, /* MVSTATOID */
+ MvStatisticOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 4
+ },
{NamespaceRelationId, /* NAMESPACENAME */
NamespaceNameIndexId,
1,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
new file mode 100644
index 0000000..099f1ed
--- /dev/null
+++ b/src/backend/utils/mvstats/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/mvstats
+#
+# IDENTIFICATION
+# src/backend/utils/mvstats/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/mvstats
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = common.o dependencies.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.dependencies b/src/backend/utils/mvstats/README.dependencies
new file mode 100644
index 0000000..1f96fbc
--- /dev/null
+++ b/src/backend/utils/mvstats/README.dependencies
@@ -0,0 +1,222 @@
+Soft functional dependencies
+============================
+
+A type of multivariate statistics used to capture cases when one column (or
+possibly a combination of columns) determines values in another column. We may
+also say that one column implies the other one.
+
+A simple artificial example may be a table with two columns, created like this
+
+ CREATE TABLE t (a INT, b INT)
+ AS SELECT i, i/10 FROM generate_series(1,100000) s(i);
+
+Clearly, once we know the value for column 'a', the value for 'b' is trivially
+determined, as it's simply (a/10). A more practical example may be addresses,
+where (ZIP code -> city name), i.e. once we know the ZIP, we probably know the
+city it belongs to, as ZIP codes are usually assigned to one city. Larger cities
+may have multiple ZIP codes, so the dependency can't be reversed.
+
+Functional dependencies are a concept well described in relational theory,
+particularly in definition of normalization and "normal forms". Wikipedia has a
+nice definition of a functional dependency [1]:
+
+ In a given table, an attribute Y is said to have a functional dependency on
+ a set of attributes X (written X -> Y) if and only if each X value is
+ associated with precisely one Y value. For example, in an "Employee" table
+ that includes the attributes "Employee ID" and "Employee Date of Birth", the
+ functional dependency {Employee ID} -> {Employee Date of Birth} would hold.
+ It follows from the previous two sentences that each {Employee ID} is
+ associated with precisely one {Employee Date of Birth}.
+
+ [1] http://en.wikipedia.org/wiki/Database_normalization
+
+Many datasets might be normalized not to contain such dependencies, but often
+it's not practical for various reasons. In some cases it's actually a conscious
+design choice to model the dataset in a denormalized way, either because of
+performance or to make querying easier.
+
+The functional dependencies are called 'soft' because the implementation is
+meant to allow a small number of rows contradicting the dependency. Many actual
+data sets contain some errors, either because of data entry mistakes
+(a user mistyping the ZIP code) or issues in generating the data (e.g. a ZIP code
+mistakenly assigned to two cities in different states). A strict implementation
+would ignore dependencies on such noisy data, rendering the approach unusable on
+such data sets.
+
+
+Mining dependencies (ANALYZE)
+-----------------------------
+
+The current build algorithm is rather simple - for each pair (a,b) of columns,
+the data are sorted lexicographically (first by 'a', then by 'b'). Then for each
+group (rows with the same 'a' value) we decide whether the group is neutral,
+supporting or contradicting the dependency (a->b).
+
+A group is considered neutral when it's too small - e.g. when there's a single
+row in the group, there can't possibly be multiple values in 'b'. For this
+reason we ignore groups smaller than a threshold (currently 3 rows).
+
+For sufficiently large groups (3 rows or more), we count the number of distinct
+values in 'b'. When there's a single 'b' value, the group is considered to
+support the dependency (a->b); otherwise it's considered to contradict it.
+
+At the end, we compare the number of rows in supporting and contradicting groups,
+and if there are at least 10x as many supporting rows, we consider the
+functional dependency to be valid.
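+
+For illustration, here's a rough SQL sketch of the same test for the dependency
+(a -> b), run against the full table 't' from the example above (ANALYZE of
+course applies it to the sampled rows only):
+
+  SELECT supports, SUM(nrows) AS total_rows
+    FROM (SELECT a,
+                 COUNT(*) AS nrows,
+                 (COUNT(DISTINCT b) = 1) AS supports
+            FROM t
+           GROUP BY a
+          HAVING COUNT(*) >= 3) g
+   GROUP BY supports;
+
+Groups below the size threshold (3 rows) are neutral, hence the HAVING clause;
+the dependency is accepted if the supporting row count is at least 10x the
+contradicting one.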
+
+
+A negative property of this algorithm is that it's a bit fragile with respect
+to the sample - there may be data sets producing quite different results for
+each ANALYZE execution (as even a single row may change the outcome of the
+final 10x test).
+
+It was proposed to make the dependencies "fuzzy" - e.g. track some coefficient
+between [0,1] determining how much the dependency holds. That would however mean
+we have to keep all the dependencies, as eliminating them based on the value of
+the coefficient (e.g. throwing away dependencies <= 0.5) would result in exactly
+the same fragility issues. This would also make it more complicated to combine
+dependencies. So this does not seem like a practical approach.
+
+A better approach might be to replace the constants (min_group_size=3 and 10x)
+with values somehow related to the particular data set.
+
+
+Clause reduction (planner/optimizer)
+------------------------------------
+
+Applying the functional dependencies is quite simple - given a list of equality
+clauses, check which clauses are redundant (i.e. implied by some other clause).
+For example, given the clause list
+
+  (a = 1) AND (b = 2) AND (c = 3)
+
+and the dependency (a->b), the list of clauses may be simplified to
+
+  (a = 1) AND (c = 3)
+
+Functional dependencies may only be applied to equality clauses; all other types
+of clauses are ignored. See clauselist_apply_dependencies() for more details.
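+
+For example, a sketch of the whole flow using the table 't' from above (where
+b = a/10, so the two clauses below are compatible; the statistics name 's1'
+is illustrative):
+
+  CREATE STATISTICS s1 ON t (a, b) WITH (dependencies = true);
+  ANALYZE t;
+
+  -- with (a -> b) detected, the estimate should be driven by the
+  -- selectivity of (a = 1) alone, not the product of both selectivities
+  EXPLAIN SELECT * FROM t WHERE (a = 1) AND (b = 0);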
+
+
+Compatibility of clauses
+------------------------
+
+The reduction assumes the clauses really are redundant, i.e. that the value in
+the reduced clause (b=2) is the value determined by (a=1). If that's not the
+case and the values are "incompatible", the result will be an over-estimate.
+
+This may happen for example when using conditions on ZIP and city name with
+mismatching values (a ZIP for a different city), etc. In such a case the result
+set will be empty, but we'll estimate the selectivity using the ZIP condition.
+
+In this case the default estimation, based on the AVIA principle (attribute
+value independence assumption), happens to work better - but mostly by chance.
+
+
+Dependencies vs. MCV/histogram
+------------------------------
+
+In some cases the "compatibility" of the conditions might be verified using the
+other types of multivariate stats - MCV lists and histograms.
+
+For MCV lists the verification might be very simple - peek into the list for
+items matching the clause on the 'a' column (e.g. ZIP code), and if such an
+item is found, check that the 'b' column matches the other clause. If it does
+not, the clauses are contradictory. If no such item is found, we can't really
+say anything, except maybe restricting the selectivity using the MCV data
+(e.g. using min/max selectivity, or something like that).
+
+With histograms, it might work similarly - we can't check the values directly
+(because histograms use buckets, unlike MCV lists, which store the actual
+values). So we can only observe the buckets matching the clauses - if those
+buckets have very low frequency, it probably means the two clauses are
+incompatible.
+
+It's unclear what 'low frequency' is, but if one of the clauses is implied
+(automatically true because of the other clause), then
+
+ selectivity[clause(A)] = selectivity[clause(A) & clause(B)]
+
+So we might compute the selectivity of the first clause - for example using
+regular statistics - and then check whether the selectivity computed from the
+histogram is about the same (or significantly lower).
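+
+In SQL terms, the idea is roughly the following (using a hypothetical
+'addresses' table; if (zip -> city) holds and the two clauses are compatible,
+the two counts should be about the same):
+
+  SELECT COUNT(*) FROM addresses WHERE zip = '12345';
+  SELECT COUNT(*) FROM addresses WHERE zip = '12345' AND city = 'Springfield';
+
+A significantly lower second count suggests the clauses are incompatible.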
+
+The problem is that histograms work well only when the data ordering matches the
+natural meaning. For values that serve as labels - like city names or ZIP codes,
+or even generated IDs, histograms really don't work all that well. For example,
+sorting cities by name won't match the sorting of ZIP codes, rendering the
+histogram unusable.
+
+So MCVs are probably going to work much better, because they don't really assume
+any sort of ordering. And it's probably more appropriate for the label-like data.
+
+A good question however is why even use functional dependencies in such cases
+and not simply use the MCV/histogram instead. One reason is that the functional
+dependencies allow falling back to regular stats, and often produce more
+accurate estimates - especially compared to histograms, which are quite bad
+at estimating equality clauses.
+
+
+Limitations
+-----------
+
+Let's look at the main limitations of functional dependencies, especially
+those related to the current implementation.
+
+The current implementation supports only dependencies between two columns, but
+this is merely a simplification of the initial implementation. It's certainly
+useful to mine for dependencies involving multiple columns on the 'left' side,
+i.e. the condition of the dependency - that is, dependencies like (a,b -> c).
+
+The implementation may/should be smart enough not to mine redundant dependencies,
+e.g. (a->b) and (a,c -> b), because the latter is a trivial consequence of the
+former one (if values of 'a' determine 'b', adding another column won't change
+that relationship). The ANALYZE should first analyze 1:1 dependencies, then 2:1
+dependencies (and skip the already identified ones), etc.
+
+For example the dependency
+
+ (city name -> zip code)
+
+is much stronger, i.e. whenever it holds, then
+
+ (city name, state name -> zip code)
+
+holds too. But in case there are cities with the same name in different states,
+then only the latter dependency will be valid.
+
+Of course, there probably are cities with the same name within a single state,
+but hopefully this is a relatively rare occurrence (and thus we'll still detect
+the 'soft' dependency).
+
+Handling multiple columns on the right side of the dependency is not necessary,
+as those dependencies may simply be decomposed into a set of dependencies with
+the same meaning, one for each column on the right side. For example
+
+ (a -> b,c)
+
+is exactly the same as
+
+ (a -> b) & (a -> c)
+
+Of course, storing the first form may be more efficient than storing multiple
+'simple' dependencies separately.
+
+
+TODO Support dependencies with multiple columns on left/right.
+
+TODO Investigate using histogram and MCV list to verify the dependencies.
+
+TODO Investigate statistical testing of the distribution (to decide whether it
+ makes sense to build the histogram/MCV list).
+
+TODO Using a min/max of selectivities would probably make more sense for the
+ associated columns.
+
+TODO Consider eliminating the implied columns from the histogram and MCV lists
+ (but maybe that's not a good idea, because that'd make it impossible to use
+ these stats for non-equality clauses and also it wouldn't be possible to
+ use the stats for verification of the dependencies).
+
+TODO The reduction probably might be extended to also handle IS NULL clauses,
+ assuming we fix the ANALYZE to properly handle NULL values. We however
+ won't be able to reduce IS NOT NULL (unless I'm missing something).
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
new file mode 100644
index 0000000..a755c49
--- /dev/null
+++ b/src/backend/utils/mvstats/common.c
@@ -0,0 +1,356 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.c
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+static List* list_mv_stats(Oid relid);
+
+
+/*
+ * Compute requested multivariate stats, using the rows sampled for the
+ * plain (single-column) stats.
+ *
+ * This fetches a list of stats from pg_mv_statistic, computes the stats
+ * and serializes them back into the catalog (as bytea values).
+ */
+void
+build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats)
+{
+ ListCell *lc;
+ List *mvstats;
+
+ TupleDesc tupdesc = RelationGetDescr(onerel);
+
+ /*
+	 * Fetch defined MV stats from pg_mv_statistic, and then compute
+	 * the MV statistics (functional dependencies for now).
+ */
+ mvstats = list_mv_stats(RelationGetRelid(onerel));
+
+ foreach (lc, mvstats)
+ {
+ int j;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
+ MVDependencies deps = NULL;
+
+ VacAttrStats **stats = NULL;
+ int numatts = 0;
+
+ /* int2 vector of attnums the stats should be computed on */
+ int2vector * attrs = stat->stakeys;
+
+ /* see how many of the columns are not dropped */
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ numatts += 1;
+
+ /* if there are dropped attributes, build a filtered int2vector */
+ if (numatts != attrs->dim1)
+ {
+ int16 *tmp = palloc0(numatts * sizeof(int16));
+ int attnum = 0;
+
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ tmp[attnum++] = attrs->values[j];
+
+ pfree(attrs);
+ attrs = buildint2vector(tmp, numatts);
+ }
+
+ /* filter only the interesting vacattrstats records */
+ stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /* check allowed number of dimensions */
+ Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Analyze functional dependencies of columns.
+ */
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
+
+		/* store the functional dependencies in the catalog */
+ update_mv_stats(stat->mvoid, deps, attrs);
+ }
+}
+
+/*
+ * Lookup the VacAttrStats info for the selected columns, with indexes
+ * matching the attrs vector (to make it easy to work with when
+ * computing multivariate stats).
+ */
+static VacAttrStats **
+lookup_var_attr_stats(int2vector *attrs, int natts, VacAttrStats **vacattrstats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ VacAttrStats **stats = (VacAttrStats**)palloc0(numattrs * sizeof(VacAttrStats*));
+
+ /* lookup VacAttrStats info for the requested columns (same attnum) */
+ for (i = 0; i < numattrs; i++)
+ {
+ stats[i] = NULL;
+ for (j = 0; j < natts; j++)
+ {
+ if (attrs->values[i] == vacattrstats[j]->tupattnum)
+ {
+ stats[i] = vacattrstats[j];
+ break;
+ }
+ }
+
+ /*
+ * Check that we found the info, that the attnum matches and
+ * that there's the requested 'lt' operator and that the type
+ * is 'passed-by-value'.
+ */
+ Assert(stats[i] != NULL);
+ Assert(stats[i]->tupattnum == attrs->values[i]);
+
+ /* FIXME This is a rather ugly way to check for 'ltopr' (which
+ * is defined for 'scalar' attributes).
+ */
+ Assert(((StdAnalyzeData *)stats[i]->extra_data)->ltopr != InvalidOid);
+ }
+
+ return stats;
+}
+
+/*
+ * Fetch list of MV stats defined on a table, without the actual data
+ * for histograms, MCV lists etc.
+ */
+static List*
+list_mv_stats(Oid relid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having indrelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ Form_pg_mv_statistic stats = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ info->mvoid = HeapTupleGetOid(htup);
+ info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_built = stats->deps_built;
+
+ result = lappend(result, info);
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which was used as an inspiration for this one)? */
+
+ return result;
+}
+
+void
+update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+{
+ HeapTuple stup,
+ oldtup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ bool replaces[Natts_pg_mv_statistic];
+
+ Relation sd = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ memset(nulls, 1, Natts_pg_mv_statistic * sizeof(bool));
+ memset(replaces, 0, Natts_pg_mv_statistic * sizeof(bool));
+ memset(values, 0, Natts_pg_mv_statistic * sizeof(Datum));
+
+ /*
+ * Construct a new pg_mv_statistic tuple - replace only the dependencies
+ * value, depending on whether it actually was computed.
+ */
+ if (dependencies != NULL)
+ {
+ nulls[Anum_pg_mv_statistic_stadeps -1] = false;
+ values[Anum_pg_mv_statistic_stadeps - 1]
+ = PointerGetDatum(serialize_mv_dependencies(dependencies));
+ }
+
+ /* always replace the value (either by bytea or NULL) */
+ replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* always change the availability flags */
+ nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_stakeys-1] = false;
+
+ /* use the new attnums, in case we removed some dropped ones */
+ replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_stakeys -1] = true;
+
+ values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
+
+ /* Is there already a pg_mv_statistic tuple for this attribute? */
+ oldtup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ if (HeapTupleIsValid(oldtup))
+ {
+ /* Yes, replace it */
+ stup = heap_modify_tuple(oldtup,
+ RelationGetDescr(sd),
+ values,
+ nulls,
+ replaces);
+ ReleaseSysCache(oldtup);
+ simple_heap_update(sd, &stup->t_self, stup);
+ }
+ else
+ elog(ERROR, "invalid pg_mv_statistic record (oid=%d)", mvoid);
+
+ /* update indexes too */
+ CatalogUpdateIndexes(sd, stup);
+
+ heap_freetuple(stup);
+
+ heap_close(sd, RowExclusiveLock);
+}
+
+/* multi-variate stats comparator */
+
+/*
+ * qsort_arg comparator for sorting Datums (MV stats)
+ *
+ * This does not maintain the tupnoLink array.
+ */
+int
+compare_scalars_simple(const void *a, const void *b, void *arg)
+{
+ Datum da = *(Datum*)a;
+ Datum db = *(Datum*)b;
+ SortSupport ssup = (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting data when partitioning a MV bucket
+ */
+int
+compare_scalars_partition(const void *a, const void *b, void *arg)
+{
+ Datum da = ((ScalarItem*)a)->value;
+ Datum db = ((ScalarItem*)b)->value;
+ SortSupport ssup = (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/* initialize multi-dimensional sort */
+MultiSortSupport
+multi_sort_init(int ndims)
+{
+ MultiSortSupport mss;
+
+ Assert(ndims >= 2);
+
+ mss = (MultiSortSupport)palloc0(offsetof(MultiSortSupportData, ssup)
+ + sizeof(SortSupportData)*ndims);
+
+ mss->ndims = ndims;
+
+ return mss;
+}
+
+/*
+ * add sort info for dimension 'dim' (index into vacattrstats) to mss,
+ * at the position 'sortdim'
+ */
+void
+multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats)
+{
+ /* first, lookup StdAnalyzeData for the dimension (attribute) */
+ SortSupportData ssup;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)vacattrstats[dim]->extra_data;
+
+ Assert(mss != NULL);
+ Assert(sortdim < mss->ndims);
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup);
+
+ mss->ssup[sortdim] = ssup;
+}
+
+/* compare all the dimensions in the selected order */
+int
+multi_sort_compare(const void *a, const void *b, void *arg)
+{
+ int i;
+ SortItem *ia = (SortItem*)a;
+ SortItem *ib = (SortItem*)b;
+
+ MultiSortSupport mss = (MultiSortSupport)arg;
+
+ for (i = 0; i < mss->ndims; i++)
+ {
+ int compare;
+
+ compare = ApplySortComparator(ia->values[i], ia->isnull[i],
+ ib->values[i], ib->isnull[i],
+ &mss->ssup[i]);
+
+ if (compare != 0)
+ return compare;
+ }
+
+ /* equal by default */
+ return 0;
+}
+
+/* compare selected dimension */
+int
+multi_sort_compare_dim(int dim, const SortItem *a, const SortItem *b,
+ MultiSortSupport mss)
+{
+ return ApplySortComparator(a->values[dim], a->isnull[dim],
+ b->values[dim], b->isnull[dim],
+ &mss->ssup[dim]);
+}
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
new file mode 100644
index 0000000..6d5465b
--- /dev/null
+++ b/src/backend/utils/mvstats/common.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.h
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/tuptoaster.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_mv_statistic.h"
+#include "foreign/fdwapi.h"
+#include "postmaster/autovacuum.h"
+#include "storage/lmgr.h"
+#include "utils/datum.h"
+#include "utils/sortsupport.h"
+#include "utils/syscache.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "access/sysattr.h"
+
+#include "utils/mvstats.h"
+
+/* FIXME private structure copied from analyze.c */
+
+typedef struct
+{
+ Oid eqopr; /* '=' operator for datatype, if any */
+ Oid eqfunc; /* and associated function */
+ Oid ltopr; /* '<' operator for datatype, if any */
+} StdAnalyzeData;
+
+typedef struct
+{
+ Datum value; /* a data value */
+ int tupno; /* position index for tuple it came from */
+} ScalarItem;
+
+/* multi-sort */
+typedef struct MultiSortSupportData {
+ int ndims; /* number of dimensions supported by the sort */
+ SortSupportData ssup[1]; /* sort support data for each dimension */
+} MultiSortSupportData;
+
+typedef MultiSortSupportData* MultiSortSupport;
+
+typedef struct SortItem {
+ Datum *values;
+ bool *isnull;
+} SortItem;
+
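+/*
+ * Illustrative usage of the multi-sort support, mirroring what
+ * build_mv_dependencies does for a pair of columns (names illustrative):
+ *
+ *    MultiSortSupport mss = multi_sort_init(2);
+ *
+ *    multi_sort_add_dimension(mss, 0, dima, vacattrstats);
+ *    multi_sort_add_dimension(mss, 1, dimb, vacattrstats);
+ *
+ *    qsort_arg((void *) items, numrows, sizeof(SortItem),
+ *              multi_sort_compare, mss);
+ */
+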
+MultiSortSupport multi_sort_init(int ndims);
+
+void multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats);
+
+int multi_sort_compare(const void *a, const void *b, void *arg);
+
+int multi_sort_compare_dim(int dim, const SortItem *a,
+ const SortItem *b, MultiSortSupport mss);
+
+/* comparators, used when constructing multivariate stats */
+int compare_scalars_simple(const void *a, const void *b, void *arg);
+int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
new file mode 100644
index 0000000..2a064a0
--- /dev/null
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -0,0 +1,437 @@
+/*-------------------------------------------------------------------------
+ *
+ * dependencies.c
+ * POSTGRES multivariate functional dependencies
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/dependencies.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Detect functional dependencies between columns.
+ *
+ * TODO This builds a complete set of dependencies, i.e. including transitive
+ * dependencies - if we identify [A => B] and [B => C], we're likely to
+ * identify [A => C] too. It might be better to keep only the minimal set
+ * of dependencies, i.e. prune all the dependencies that we can recreate
+ * by transitivity.
+ *
+ * There are two conceptual ways to do that:
+ *
+ * (a) generate all the rules, and then prune the rules that may be
+ * recreated by combining other dependencies, or
+ *
+ * (b) perform the 'is combination of other dependencies' check before
+ * actually doing the work
+ *
+ * The second option has the advantage that we don't really need to perform
+ * the sort/count. It's not sufficient alone, though, because we may
+ * discover the dependencies in the wrong order. For example we may find
+ *
+ * (a -> b), (a -> c) and then (b -> c)
+ *
+ * None of those dependencies is a combination of the already known ones,
+ * yet (a -> c) is a combination of (a -> b) and (b -> c).
+ *
+ *
+ * FIXME Currently we simply replace NULL values with 0 and then handle it as
+ * a regular value, but that lumps NULLs together with actual 0 values. That's
+ * clearly incorrect - we need to handle NULL values as a separate value.
+ */
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ /* result */
+ int ndeps = 0;
+ MVDependencies dependencies = NULL;
+ MultiSortSupport mss = multi_sort_init(2); /* 2 dimensions for now */
+
+ /* TODO Maybe this should be somehow related to the number of
+ * distinct values in the two columns we're currently analyzing.
+ * Assuming the distribution is uniform, we can estimate the
+ * average group size and use it as a threshold. Or something
+ * like that. Seems better than a static approach.
+ */
+ int min_group_size = 3;
+
+ /* dimension indexes we'll check for associations [a => b] */
+ int dima, dimb;
+
+ /*
+ * We'll reuse the same array for all the 2-column combinations.
+ *
+ * It's possible to sort the sample rows directly, but this seemed
+ * somewhat simpler / less error prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * 2);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * 2);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * 2];
+ items[i].isnull = &isnull[i * 2];
+ }
+
+ Assert(numattrs >= 2);
+
+ /*
+ * Evaluate all possible combinations of [A => B], using a simple algorithm:
+ *
+ * (a) sort the data by [A,B]
+ * (b) split the data into groups by A (new group whenever a value changes)
+ * (c) count different values in the B column (again, value changes)
+ *
+ * TODO It should be rather simple to merge [A => B] and [A => C] into
+ * [A => B,C]. Just keep A constant, collect all the "implied" columns
+ * and you're done.
+ */
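+ /*
+ * A worked example (illustration only): with min_group_size = 3, the
+ * data sorted by [A,B] may look like
+ *
+ *     A:  1 1 1 2 2 2 3 3
+ *     B:  x x x y y z w w
+ *
+ * The group A=1 has a single B value and at least min_group_size rows,
+ * so it supports [A => B]. The group A=2 mixes two B values, so it (and
+ * its rows) count as contradicting. The group A=3 has no violations but
+ * is too small, so it counts as neither.
+ */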
+ for (dima = 0; dima < numattrs; dima++)
+ {
+ /* prepare the sort function for the first dimension */
+ multi_sort_add_dimension(mss, 0, dima, stats);
+
+ for (dimb = 0; dimb < numattrs; dimb++)
+ {
+ SortItem current;
+
+ /* number of groups supporting / contradicting the dependency */
+ int n_supporting = 0;
+ int n_contradicting = 0;
+
+ /* counters valid within a group */
+ int group_size = 0;
+ int n_violations = 0;
+
+ int n_supporting_rows = 0;
+ int n_contradicting_rows = 0;
+
+ /* make sure the columns are different (i.e. skip A => A) */
+ if (dima == dimb)
+ continue;
+
+ /* prepare the sort function for the second dimension */
+ multi_sort_add_dimension(mss, 1, dimb, stats);
+
+ /* reset the values and isnull flags */
+ memset(values, 0, sizeof(Datum) * numrows * 2);
+ memset(isnull, 0, sizeof(bool) * numrows * 2);
+
+ /* accumulate all the data for both columns into an array and sort it */
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values[0]
+ = heap_getattr(rows[i], attrs->values[dima],
+ stats[dima]->tupDesc, &items[i].isnull[0]);
+
+ items[i].values[1]
+ = heap_getattr(rows[i], attrs->values[dimb],
+ stats[dimb]->tupDesc, &items[i].isnull[1]);
+ }
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Walk through the array, split it into rows according to
+ * the A value, and count distinct values in the other one.
+ * If there's a single B value for the whole group, we count
+ * it as supporting the association, otherwise we count it
+ * as contradicting.
+ *
+ * Furthermore we require a group to have at least a certain
+ * number of rows to be considered useful for supporting the
+ * dependency. But when a group is contradicting, we always count it.
+ */
+
+ /* start with values from the first row */
+ current = items[0];
+ group_size = 1;
+
+ for (i = 1; i < numrows; i++)
+ {
+ /* end of the group */
+ if (multi_sort_compare_dim(0, &items[i], &current, mss) != 0)
+ {
+ /*
+ * If there are no contradicting rows, count it as
+ * supporting (otherwise contradicting), but only if
+ * the group is large enough.
+ *
+ * The requirement of a minimum group size makes it
+ * impossible to identify [unique,unique] cases, but
+ * that's probably a different case. This is more
+ * about [zip => city] associations etc.
+ *
+ * If there are violations, count the group/rows as
+ * a violation.
+ *
+ * It may be neither, if the group is too small (does
+ * not contain at least min_group_size rows).
+ */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations > 0)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /* current values start a new group */
+ n_violations = 0;
+ group_size = 0;
+ }
+ /* mismatch of a B value is contradicting */
+ else if (multi_sort_compare_dim(1, &items[i], &current, mss) != 0)
+ {
+ n_violations += 1;
+ }
+
+ current = items[i];
+ group_size += 1;
+ }
+
+ /* handle the last group (just like above) */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /*
+ * See if the number of rows supporting the association is at least
+ * 10x the number of rows violating the hypothetical dependency.
+ *
+ * TODO This is a rather arbitrary limit - I guess it's possible to do
+ * some math to come up with a better rule (e.g. testing a hypothesis
+ * 'this is due to randomness'). We can create a contingency table
+ * from the values and use it for testing. Possibly only when
+ * there are no contradicting rows?
+ *
+ * TODO Also, if (a => b) and (b => a) at the same time, it pretty much
+ * means there's a 1:1 relation (or one is a 'label'), making the
+ * conditions rather redundant. Although it's possible that the
+ * query uses an incompatible combination of values.
+ */
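+ /*
+ * For instance, for the example data sketched before the loop above,
+ * this test fails (3 supporting vs. 3 contradicting rows), so no
+ * dependency gets stored.
+ */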
+ if (n_supporting_rows > (n_contradicting_rows * 10))
+ {
+ if (dependencies == NULL)
+ {
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+ dependencies->magic = MVSTAT_DEPS_MAGIC;
+ }
+ else
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData, deps)
+ + sizeof(MVDependency) * (dependencies->ndeps + 1));
+
+ /* add the new dependency to the list */
+ dependencies->deps[ndeps] = (MVDependency)palloc0(sizeof(MVDependencyData));
+ dependencies->deps[ndeps]->a = attrs->values[dima];
+ dependencies->deps[ndeps]->b = attrs->values[dimb];
+
+ dependencies->ndeps = (++ndeps);
+ }
+ }
+ }
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+ pfree(stats);
+ pfree(mss);
+
+ return dependencies;
+}
+
+/*
+ * Store the dependencies into a bytea, so that it can be stored in the
+ * pg_mv_statistic catalog.
+ *
+ * Currently this only supports simple two-column rules, and stores them
+ * as a sequence of attnum pairs. In the future, this needs to be made
+ * more complex to support multiple columns on both sides of the
+ * implication (using AND on left, OR on right).
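+ *
+ * The resulting layout is thus (assuming, as on common platforms, no
+ * padding between the magic and ndeps fields):
+ *
+ *    varlena header | magic (uint32) | ndeps (int32) | ndeps pairs of int16 (a,b)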
+ */
+bytea *
+serialize_mv_dependencies(MVDependencies dependencies)
+{
+ int i;
+
+ /* we need to store ndeps, and each needs 2 * int16 */
+ Size len = VARHDRSZ + offsetof(MVDependenciesData, deps)
+ + dependencies->ndeps * (sizeof(int16) * 2);
+
+ bytea * output = (bytea*)palloc0(len);
+
+ char * tmp = VARDATA(output);
+
+ SET_VARSIZE(output, len);
+
+ /* first, store the number of dimensions / items */
+ memcpy(tmp, dependencies, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ /* walk through the dependencies and copy both columns into the bytea */
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ memcpy(tmp, &(dependencies->deps[i]->a), sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(tmp, &(dependencies->deps[i]->b), sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return output;
+}
+
+/*
+ * Reads serialized dependencies into MVDependencies structure.
+ */
+MVDependencies
+deserialize_mv_dependencies(bytea * data)
+{
+ int i;
+ Size expected_size;
+ MVDependencies dependencies;
+ char *tmp;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVDependenciesData,deps))
+ elog(ERROR, "invalid MVDependencies size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVDependenciesData,deps));
+
+ /* read the MVDependencies header */
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(dependencies, tmp, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ if (dependencies->magic != MVSTAT_DEPS_MAGIC)
+ {
+ pfree(dependencies);
+ elog(WARNING, "not a MV Dependencies (magic number mismatch)");
+ return NULL;
+ }
+
+ Assert(dependencies->ndeps > 0);
+
+ /* what bytea size do we expect for those parameters */
+ expected_size = offsetof(MVDependenciesData,deps) +
+ dependencies->ndeps * sizeof(int16) * 2;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid dependencies size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* allocate space for the dependency pointers */
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData,deps)
+ + (dependencies->ndeps * sizeof(MVDependency)));
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ dependencies->deps[i] = (MVDependency)palloc0(sizeof(MVDependencyData));
+
+ memcpy(&(dependencies->deps[i]->a), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(&(dependencies->deps[i]->b), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return dependencies;
+}
+
+/* print some basic info about dependencies (number of dependencies) */
+Datum
+pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ result = palloc0(128);
+ snprintf(result, 128, "dependencies=%d", dependencies->ndeps);
+
+ /* FIXME free the deserialized data (pfree is not enough) */
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/*
+ * print the dependencies
+ *
+ * TODO Would be nice if this knew the actual column names (instead of
+ * the attnums).
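+ *
+ * The result is a comma-separated list of "a => b" attnum pairs, e.g.
+ * "2 => 3, 3 => 2" for two mutually dependent columns.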
+ *
+ * FIXME This is really ugly and does not really check the lengths and
+ * strcpy/snprintf return values properly. Needs to be fixed.
+ */
+Datum
+pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
+{
+ int i = 0;
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result = NULL;
+ int len = 0;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ MVDependency dependency = dependencies->deps[i];
+ char buffer[128];
+
+ int tmp = snprintf(buffer, 128, "%s%d => %d",
+ ((i == 0) ? "" : ", "), dependency->a, dependency->b);
+
+ if (tmp < 127)
+ {
+ if (result == NULL)
+ result = palloc0(len + tmp + 1);
+ else
+ result = repalloc(result, len + tmp + 1);
+
+ strcpy(result + len, buffer);
+ len += tmp;
+ }
+ }
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index fd8dc91..4f106c3 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2104,6 +2104,50 @@ describeOneTableDetails(const char *schemaname,
PQclear(result);
}
+ /* print any multivariate statistics */
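+ /*
+ * The resulting \d footer looks like this (hypothetical statistics
+ * name, assuming functional dependencies were requested):
+ *
+ *     Statistics:
+ *         "public.stats_a_b" (dependencies) ON (a, b)
+ */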
+ if (pset.sversion >= 90500)
+ {
+ printfPQExpBuffer(&buf,
+ "SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
+ " deps_enabled,\n"
+ " deps_built,\n"
+ " (SELECT string_agg(attname::text,', ')\n"
+ " FROM ((SELECT unnest(stakeys) AS attnum) s\n"
+ " JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
+ "FROM pg_mv_statistic stat WHERE starelid = '%s' ORDER BY 1;",
+ oid);
+
+ result = PSQLexec(buf.data);
+ if (!result)
+ goto error_return;
+ else
+ tuples = PQntuples(result);
+
+ if (tuples > 0)
+ {
+ printTableAddFooter(&cont, _("Statistics:"));
+ for (i = 0; i < tuples; i++)
+ {
+ printfPQExpBuffer(&buf, " ");
+
+ /* statistics name (qualified with namespace) */
+ appendPQExpBuffer(&buf, "\"%s.%s\" ",
+ PQgetvalue(result, i, 1),
+ PQgetvalue(result, i, 2));
+
+ /* options */
+ if (!strcmp(PQgetvalue(result, i, 4), "t"))
+ appendPQExpBuffer(&buf, "(dependencies)");
+
+ appendPQExpBuffer(&buf, " ON (%s)",
+ PQgetvalue(result, i, 6));
+
+ printTableAddFooter(&cont, buf.data);
+ }
+ }
+ PQclear(result);
+ }
+
/* print rules */
if (tableinfo.hasrules && tableinfo.relkind != 'm')
{
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 049bf9f..12211fe 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -153,10 +153,11 @@ typedef enum ObjectClass
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
OCLASS_POLICY, /* pg_policy */
- OCLASS_TRANSFORM /* pg_transform */
+ OCLASS_TRANSFORM, /* pg_transform */
+ OCLASS_STATISTICS /* pg_mv_statistics */
} ObjectClass;
-#define LAST_OCLASS OCLASS_TRANSFORM
+#define LAST_OCLASS OCLASS_STATISTICS
/* in dependency.c */
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index b80d8d8..5ae42f7 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -119,6 +119,7 @@ extern void RemoveAttrDefault(Oid relid, AttrNumber attnum,
DropBehavior behavior, bool complain, bool internal);
extern void RemoveAttrDefaultById(Oid attrdefId);
extern void RemoveStatistics(Oid relid, AttrNumber attnum);
+extern void RemoveMVStatistics(Oid relid, AttrNumber attnum);
extern Form_pg_attribute SystemAttributeDefinition(AttrNumber attno,
bool relhasoids);
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index ab2c1a8..a768bb5 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -173,6 +173,13 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
#define LargeObjectMetadataOidIndexId 2996
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_oid_index, 3380, on pg_mv_statistic using btree(oid oid_ops));
+#define MvStatisticOidIndexId 3380
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_name_index, 3997, on pg_mv_statistic using btree(staname name_ops, stanamespace oid_ops));
+#define MvStatisticNameIndexId 3997
+DECLARE_INDEX(pg_mv_statistic_relid_index, 3379, on pg_mv_statistic using btree(starelid oid_ops));
+#define MvStatisticRelidIndexId 3379
+
DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
#define NamespaceNameIndexId 2684
DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index 2ccb3a7..44cf9c6 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -137,6 +137,8 @@ extern Oid get_collation_oid(List *collname, bool missing_ok);
extern Oid get_conversion_oid(List *conname, bool missing_ok);
extern Oid FindDefaultConversionProc(int32 for_encoding, int32 to_encoding);
+extern Oid get_statistics_oid(List *names, bool missing_ok);
+
/* initialization & transaction cleanup code */
extern void InitializeSearchPath(void);
extern void AtEOXact_Namespace(bool isCommit, bool parallel);
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
new file mode 100644
index 0000000..a568a07
--- /dev/null
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -0,0 +1,73 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_mv_statistic.h
+ * definition of the system "multivariate statistic" relation (pg_mv_statistic)
+ * along with the relation's initial contents.
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_mv_statistic.h
+ *
+ * NOTES
+ * the genbki.pl script reads this file and generates .bki
+ * information from the DATA() statements.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_MV_STATISTIC_H
+#define PG_MV_STATISTIC_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_mv_statistic definition. cpp turns this into
+ * typedef struct FormData_pg_mv_statistic
+ * ----------------
+ */
+#define MvStatisticRelationId 3381
+
+CATALOG(pg_mv_statistic,3381)
+{
+ /* These fields form the unique key for the entry: */
+ Oid starelid; /* relation containing attributes */
+ NameData staname; /* statistics name */
+ Oid stanamespace; /* OID of namespace containing this statistics */
+
+ /* statistics requested to build */
+ bool deps_enabled; /* analyze dependencies? */
+
+ /* statistics that are available (if requested) */
+ bool deps_built; /* dependencies were built */
+
+ /* variable-length fields start here, but we allow direct access to stakeys */
+ int2vector stakeys; /* array of column keys */
+
+#ifdef CATALOG_VARLEN
+ bytea stadeps; /* dependencies (serialized) */
+#endif
+
+} FormData_pg_mv_statistic;
+
+/* ----------------
+ * Form_pg_mv_statistic corresponds to a pointer to a tuple with
+ * the format of pg_mv_statistic relation.
+ * ----------------
+ */
+typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
+
+/* ----------------
+ * compiler constants for pg_mv_statistic
+ * ----------------
+ */
+#define Natts_pg_mv_statistic 7
+#define Anum_pg_mv_statistic_starelid 1
+#define Anum_pg_mv_statistic_staname 2
+#define Anum_pg_mv_statistic_stanamespace 3
+#define Anum_pg_mv_statistic_deps_enabled 4
+#define Anum_pg_mv_statistic_deps_built 5
+#define Anum_pg_mv_statistic_stakeys 6
+#define Anum_pg_mv_statistic_stadeps 7
+
+#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 62b9125..20d565c 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2666,6 +2666,11 @@ DESCR("current user privilege on any column by rel name");
DATA(insert OID = 3029 ( has_any_column_privilege PGNSP PGUID 12 10 0 0 0 f f f f t f s s 2 0 16 "26 25" _null_ _null_ _null_ _null_ _null_ has_any_column_privilege_id _null_ _null_ _null_ ));
DESCR("current user privilege on any column by rel oid");
+DATA(insert OID = 3998 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies info");
+DATA(insert OID = 3999 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies show");
+
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
DATA(insert OID = 1929 ( pg_stat_get_tuples_returned PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_tuples_returned _null_ _null_ _null_ ));
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index b7a38ce..a52096b 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -49,6 +49,7 @@ extern void BootstrapToastTable(char *relName,
DECLARE_TOAST(pg_attrdef, 2830, 2831);
DECLARE_TOAST(pg_constraint, 2832, 2833);
DECLARE_TOAST(pg_description, 2834, 2835);
+DECLARE_TOAST(pg_mv_statistic, 3577, 3578);
DECLARE_TOAST(pg_proc, 2836, 2837);
DECLARE_TOAST(pg_rewrite, 2838, 2839);
DECLARE_TOAST(pg_seclabel, 3598, 3599);
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 54f67e9..99a6a62 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -75,6 +75,10 @@ extern ObjectAddress DefineOperator(List *names, List *parameters);
extern void RemoveOperatorById(Oid operOid);
extern ObjectAddress AlterOperator(AlterOperatorStmt *stmt);
+/* commands/statscmds.c */
+extern ObjectAddress CreateStatistics(CreateStatsStmt *stmt);
+extern void RemoveStatisticsById(Oid statsOid);
+
/* commands/aggregatecmds.c */
extern ObjectAddress DefineAggregate(List *name, List *args, bool oldstyle,
List *parameters, const char *queryString);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index c407fa2..2226aad 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -251,6 +251,7 @@ typedef enum NodeTag
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
+ T_MVStatisticInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
@@ -386,6 +387,7 @@ typedef enum NodeTag
T_CreatePolicyStmt,
T_AlterPolicyStmt,
T_CreateTransformStmt,
+ T_CreateStatsStmt,
/*
* TAGS FOR PARSE TREE NODES (parsenodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2fd0629..e1807fb 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -601,6 +601,17 @@ typedef struct ColumnDef
int location; /* parse location, or -1 if none/unknown */
} ColumnDef;
+typedef struct CreateStatsStmt
+{
+ NodeTag type;
+ List *defnames; /* qualified name (list of Value strings) */
+ RangeVar *relation; /* relation to build statistics on */
+ List *keys; /* String nodes naming referenced column(s) */
+ List *options; /* list of DefElem nodes */
+ bool if_not_exists; /* just do nothing if statistics already exists? */
+} CreateStatsStmt;
+
+
/*
* TableLikeClause - CREATE TABLE ( ... LIKE ... ) clause
*/
@@ -1410,6 +1421,7 @@ typedef enum ObjectType
OBJECT_RULE,
OBJECT_SCHEMA,
OBJECT_SEQUENCE,
+ OBJECT_STATISTICS,
OBJECT_TABCONSTRAINT,
OBJECT_TABLE,
OBJECT_TABLESPACE,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index af8cb6b..de86d01 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -503,6 +503,7 @@ typedef struct RelOptInfo
List *lateral_vars; /* LATERAL Vars and PHVs referenced by rel */
Relids lateral_referencers; /* rels that reference me laterally */
List *indexlist; /* list of IndexOptInfo */
+ List *mvstatlist; /* list of MVStatisticInfo */
BlockNumber pages; /* size estimates derived from pg_class */
double tuples;
double allvisfrac;
@@ -600,6 +601,33 @@ typedef struct IndexOptInfo
void (*amcostestimate) (); /* AM's cost estimator */
} IndexOptInfo;
+/*
+ * MVStatisticInfo
+ * Information about multivariate stats for planning/optimization
+ *
+ * This contains information about which columns are covered by the
+ * statistics (stakeys), which options were requested while adding the
+ * statistics (*_enabled), and which kinds of statistics were actually
+ * built and are available for the optimizer (*_built).
+ */
+typedef struct MVStatisticInfo
+{
+ NodeTag type;
+
+ Oid mvoid; /* OID of the statistics row */
+ RelOptInfo *rel; /* back-link to the statistics' table */
+
+ /* enabled statistics */
+ bool deps_enabled; /* functional dependencies enabled */
+
+ /* built/available statistics */
+ bool deps_built; /* functional dependencies built */
+
+ /* columns in the statistics (attnums) */
+ int2vector *stakeys; /* attnums of the columns covered */
+
+} MVStatisticInfo;
+
/*
* EquivalenceClasses
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
new file mode 100644
index 0000000..7ebd961
--- /dev/null
+++ b/src/include/utils/mvstats.h
@@ -0,0 +1,70 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvstats.h
+ * Multivariate statistics and selectivity estimation functions.
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/mvstats.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MVSTATS_H
+#define MVSTATS_H
+
+#include "fmgr.h"
+#include "commands/vacuum.h"
+
+
+#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
+
+/* An associative rule, tracking [a => b] dependency.
+ *
+ * TODO Make this work with multiple columns on both sides.
+ */
+typedef struct MVDependencyData {
+ int16 a;
+ int16 b;
+} MVDependencyData;
+
+typedef MVDependencyData* MVDependency;
+
+typedef struct MVDependenciesData {
+ uint32 magic; /* magic constant marker */
+ int32 ndeps; /* number of dependencies */
+ MVDependency deps[1]; /* XXX why not a pointer? */
+} MVDependenciesData;
+
+typedef MVDependenciesData* MVDependencies;
+
+#define MVSTAT_DEPS_MAGIC 0xB4549A2C /* marks serialized bytea */
+#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
+
+/*
+ * TODO Maybe fetching the histogram/MCV list separately is inefficient?
+ * Consider adding a single `fetch_stats` method, fetching all
+ * stats specified using flags (or something like that).
+ */
+
+bytea * serialize_mv_dependencies(MVDependencies dependencies);
+
+/* (de)serialization of stats to/from bytea */
+MVDependencies deserialize_mv_dependencies(bytea * data);
+
+/* FIXME this probably belongs somewhere else (not to operations stats) */
+extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats);
+
+void update_mv_stats(Oid relid, MVDependencies dependencies, int2vector *attrs);
+
+#endif
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index f2bebf2..8771f9c 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -61,6 +61,7 @@ typedef struct RelationData
bool rd_isvalid; /* relcache entry is valid */
char rd_indexvalid; /* state of rd_indexlist: 0 = not valid, 1 =
* valid, 2 = temporarily forced */
+ bool rd_mvstatvalid; /* state of rd_mvstatlist: true/false */
/*
* rd_createSubid is the ID of the highest subtransaction the rel has
@@ -93,6 +94,9 @@ typedef struct RelationData
List *rd_indexlist; /* list of OIDs of indexes on relation */
Oid rd_oidindex; /* OID of unique index on OID, if any */
Oid rd_replidindex; /* OID of replica identity index, if any */
+
+ /* data managed by RelationGetMVStatList: */
+ List *rd_mvstatlist; /* list of OIDs of multivariate stats */
/* data managed by RelationGetIndexAttrBitmap: */
Bitmapset *rd_indexattr; /* identifies columns used in indexes */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 1b48304..9f03c8d 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -38,6 +38,7 @@ extern void RelationClose(Relation relation);
* Routines to compute/retrieve additional cached information
*/
extern List *RelationGetIndexList(Relation relation);
+extern List *RelationGetMVStatList(Relation relation);
extern Oid RelationGetOidIndex(Relation relation);
extern Oid RelationGetReplicaIndex(Relation relation);
extern List *RelationGetIndexExpressions(Relation relation);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 256615b..0e0658d 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -66,6 +66,8 @@ enum SysCacheIdentifier
INDEXRELID,
LANGNAME,
LANGOID,
+ MVSTATNAMENSP,
+ MVSTATOID,
NAMESPACENAME,
NAMESPACEOID,
OPERNAMENSP,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 81bc5c9..84b4425 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1368,6 +1368,15 @@ pg_matviews| SELECT n.nspname AS schemaname,
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
WHERE (c.relkind = 'm'::"char");
+pg_mv_stats| SELECT n.nspname AS schemaname,
+ c.relname AS tablename,
+ s.staname,
+ s.stakeys AS attnums,
+ length(s.stadeps) AS depsbytes,
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ FROM ((pg_mv_statistic s
+ JOIN pg_class c ON ((c.oid = s.starelid)))
+ LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
pg_policies| SELECT n.nspname AS schemaname,
c.relname AS tablename,
pol.polname AS policyname,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index eb0bc88..92a0d8a 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -113,6 +113,7 @@ pg_inherits|t
pg_language|t
pg_largeobject|t
pg_largeobject_metadata|t
+pg_mv_statistic|t
pg_namespace|t
pg_opclass|t
pg_operator|t
--
2.1.0
Attachment: 0003-clause-reduction-using-functional-dependencies.patch
From 282579eef3de01e0d31ed5f7067045a4f97fbfb8 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 19:42:18 +0200
Subject: [PATCH 3/9] clause reduction using functional dependencies
During planning, use functional dependencies to decide which clauses to
skip during cardinality estimation. Initial and rather simplistic
implementation.
This only works with regular WHERE clauses, not with clauses used as
join conditions.
Note: clause_is_mv_compatible() needs to identify the relation (so
that we can fetch the list of multivariate stats by OID).
planner_rt_fetch() seems like the appropriate way to get the relation
OID, but apparently it only works with simple vars. Maybe
examine_variable() would make this work with more complex vars too?
Includes regression tests analyzing functional dependencies (part of
ANALYZE) on several datasets (no dependencies, no transitive
dependencies, ...).
The tests check that a query with conditions on two columns, where one
(B) is functionally dependent on the other (A), correctly ignores the
clause on (B) and chooses a bitmap index scan instead of a plain index
scan (which is what happens otherwise, thanks to the assumption of
independence).
Note: Functional dependencies only work with equality clauses, no
inequalities etc.
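
A quick illustration (hypothetical table and column names; the DDL
syntax comes from the earlier patches in this series):

    ALTER TABLE zips ADD STATISTICS (dependencies) ON (zip, city);
    ANALYZE zips;

    -- the clause on "city" is implied by (zip => city), so only the
    -- "zip" clause is used for the estimate
    EXPLAIN SELECT * FROM zips WHERE zip = 12345 AND city = 678;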
---
src/backend/optimizer/path/clausesel.c | 891 +++++++++++++++++++++++++-
src/backend/utils/mvstats/README.stats | 36 ++
src/backend/utils/mvstats/common.c | 5 +-
src/backend/utils/mvstats/dependencies.c | 24 +
src/include/utils/mvstats.h | 16 +-
src/test/regress/expected/mv_dependencies.out | 172 +++++
src/test/regress/parallel_schedule | 3 +
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_dependencies.sql | 150 +++++
9 files changed, 1293 insertions(+), 5 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.stats
create mode 100644 src/test/regress/expected/mv_dependencies.out
create mode 100644 src/test/regress/sql/mv_dependencies.sql
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 02660c2..80708fe 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -14,14 +14,19 @@
*/
#include "postgres.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/plancat.h"
+#include "optimizer/var.h"
#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
+#include "utils/mvstats.h"
#include "utils/selfuncs.h"
+#include "utils/typcache.h"
/*
@@ -41,6 +46,23 @@ typedef struct RangeQueryClause
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
+#define MV_CLAUSE_TYPE_FDEP 0x01
+
+static bool clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum);
+
+static Bitmapset *collect_mv_attnums(List *clauses, Index relid);
+
+static int count_mv_attnums(List *clauses, Index relid);
+
+static int count_varnos(List *clauses, Index *relid);
+
+static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Index relid, List *stats);
+
+static bool has_stats(List *stats, int type);
+
+static List * find_stats(PlannerInfo *root, Index relid);
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
@@ -60,7 +82,19 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
* subclauses. However, that's only right if the subclauses have independent
* probabilities, and in reality they are often NOT independent. So,
* we want to be smarter where we can.
-
+ *
+ * The first thing we try is to apply multivariate statistics, in a way that
+ * minimizes the overhead when there are no multivariate stats on the
+ * relation. Thus we do several simple (and inexpensive) checks first, to
+ * verify that suitable multivariate statistics exist.
+ *
+ * If we identify suitable multivariate statistics, we try to apply them.
+ * Currently we only have (soft) functional dependencies, so we try to reduce
+ * the list of clauses.
+ *
+ * Then we remove the clauses estimated using multivariate stats, and process
+ * the rest of the clauses using the regular per-column stats.
+ *
* Currently, the only extra smarts we have is to recognize "range queries",
* such as "x > 34 AND x < 42". Clauses are recognized as possible range
* query components if they are restriction opclauses whose operators have
@@ -99,6 +133,22 @@ clauselist_selectivity(PlannerInfo *root,
RangeQueryClause *rqlist = NULL;
ListCell *l;
+ /* varno of the single relation referenced by the clauses (for mv stats) */
+ Index relid = 0;
+
+ /* list of multivariate stats on the relation */
+ List *stats = NIL;
+
+ /*
+ * To fetch the statistics, we first need to determine the rel. Currently
+ * we only support estimates of simple restrictions with all Vars
+ * referencing a single baserel. However set_baserel_size_estimates() sets
+ * varRelid=0, so we have to actually inspect the clauses using pull_varnos
+ * and see if there's just a single varno referenced.
+ */
+ if ((count_varnos(clauses, &relid) == 1) && ((varRelid == 0) || (varRelid == relid)))
+ stats = find_stats(root, relid);
+
/*
* If there's exactly one clause, then no use in trying to match up pairs,
* so just go directly to clause_selectivity().
@@ -108,6 +158,24 @@ clauselist_selectivity(PlannerInfo *root,
varRelid, jointype, sjinfo);
/*
+ * Apply functional dependencies, but first check that there are some stats
+ * with functional dependencies built (by simply walking the stats list),
+ * and that there are two or more attributes referenced by clauses that
+ * may be reduced using functional dependencies.
+ *
+ * We would find that anyway when trying to actually apply the functional
+ * dependencies, but let's do the cheap checks first.
+ *
+ * After applying the functional dependencies we get the remaining clauses
+ * that need to be estimated by other types of stats (MCV, histograms etc).
+ */
+ if (has_stats(stats, MV_CLAUSE_TYPE_FDEP) &&
+ (count_mv_attnums(clauses, relid) >= 2))
+ {
+ clauses = clauselist_apply_dependencies(root, clauses, relid, stats);
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -763,3 +831,824 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Pull varattnos from the clauses, similarly to pull_varattnos() but:
+ *
+ * (a) only get attributes for a particular relation (relid)
+ * (b) ignore system attributes (we can't build stats on them anyway)
+ *
+ * This makes it possible to directly compare the result with attnum
+ * values from pg_attribute etc.
+ */
+static Bitmapset *
+get_varattnos(Node * node, Index relid)
+{
+ int k;
+ Bitmapset *varattnos = NULL;
+ Bitmapset *result = NULL;
+
+ /* get the varattnos */
+ pull_varattnos(node, relid, &varattnos);
+
+ k = -1;
+ while ((k = bms_next_member(varattnos, k)) >= 0)
+ {
+ if (k + FirstLowInvalidHeapAttributeNumber > 0)
+ result = bms_add_member(result, k + FirstLowInvalidHeapAttributeNumber);
+ }
+
+ bms_free(varattnos);
+
+ return result;
+}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ */
+static Bitmapset *
+collect_mv_attnums(List *clauses, Index relid)
+{
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(l);
+
+ /* ignore the result for now - we only need the info */
+ if (clause_is_mv_compatible(clause, relid, &attnum))
+ attnums = bms_add_member(attnums, attnum);
+ }
+
+ /*
+ * If there are not at least two attributes referenced by the clause(s),
+ * we can throw everything out (as we'll revert to simple stats).
+ */
+ if (bms_num_members(attnums) <= 1)
+ {
+ if (attnums != NULL)
+ pfree(attnums);
+ attnums = NULL;
+ }
+
+ return attnums;
+}
+
+/*
+ * Count the number of attributes in clauses compatible with multivariate stats.
+ */
+static int
+count_mv_attnums(List *clauses, Index relid)
+{
+ int c;
+ Bitmapset *attnums = collect_mv_attnums(clauses, relid);
+
+ c = bms_num_members(attnums);
+
+ bms_free(attnums);
+
+ return c;
+}
+
+/*
+ * Count varnos referenced in the clauses, and if there's a single varno then
+ * return the index in 'relid'.
+ */
+static int
+count_varnos(List *clauses, Index *relid)
+{
+ int cnt;
+ Bitmapset *varnos = NULL;
+
+ varnos = pull_varnos((Node *) clauses);
+ cnt = bms_num_members(varnos);
+
+ /* if there's a single varno in the clauses, remember it */
+ if (cnt == 1)
+ *relid = bms_singleton_member(varnos);
+
+ bms_free(varnos);
+
+ return cnt;
+}
+
+typedef struct
+{
+ Index varno; /* relid we're interested in */
+ Bitmapset *varattnos; /* attnums referenced by the clauses */
+} mv_compatible_context;
+
+/*
+ * Recursive walker that checks compatibility of the clause with multivariate
+ * statistics, and collects attnums from the Vars.
+ *
+ * XXX The original idea was to combine this with expression_tree_walker, but
+ * I've been unable to make that work - it seems that it does not quite allow
+ * checking the structure. Hence the explicit calls to the walker.
+ */
+static bool
+mv_compatible_walker(Node *node, mv_compatible_context *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+
+ /* Pseudoconstants are not really interesting here. */
+ if (rinfo->pseudoconstant)
+ return true;
+
+ /* clauses referencing multiple varnos are incompatible */
+ if (bms_membership(rinfo->clause_relids) != BMS_SINGLETON)
+ return true;
+
+ /* check the clause inside the RestrictInfo */
+ return mv_compatible_walker((Node*)rinfo->clause, (void *) context);
+ }
+
+ if (IsA(node, Var))
+ {
+ Var * var = (Var*)node;
+
+ /*
+ * Also, the variable needs to reference the right relid (this might be
+ * unnecessary given the other checks, but let's be sure).
+ */
+ if (var->varno != context->varno)
+ return true;
+
+ /* Also skip system attributes (we don't allow stats on those). */
+ if (! AttrNumberIsForUserDefinedAttr(var->varattno))
+ return true;
+
+ /* Seems fine, so let's remember the attnum. */
+ context->varattnos = bms_add_member(context->varattnos, var->varattno);
+
+ return false;
+ }
+
+ /*
+ * And finally the operator expressions - we only allow simple expressions
+ * with two arguments, where one is a Var and the other is a constant, and
+ * it's a simple comparison (which we detect using estimator function).
+ */
+ if (is_opclause(node))
+ {
+ OpExpr *expr = (OpExpr *) node;
+ Var *var;
+ bool varonleft = true;
+ bool ok;
+
+ /*
+ * Only expressions with two arguments are considered compatible.
+ *
+ * XXX Possibly unnecessary (can OpExpr have different arg count?).
+ */
+ if (list_length(expr->args) != 2)
+ return true;
+
+ /* see if it actually has the right shape (one Var, one Const) */
+ ok = (NumRelids((Node*)expr) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ /* unsupported structure (two variables or so) */
+ if (! ok)
+ return true;
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the clause.
+ * Otherwise note the relid and attnum for the variable. This uses the
+ * function for estimating selectivity, ont the operator directly (a bit
+ * awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_EQSEL:
+
+ /* equality conditions are compatible with all statistics */
+ break;
+
+ default:
+
+ /* unknown estimator */
+ return true;
+ }
+
+ var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+
+ return mv_compatible_walker((Node *) var, context);
+ }
+
+ /* Node not explicitly supported, so terminate */
+ return true;
+}
+
+/*
+ * Determines whether the clause is compatible with multivariate stats,
+ * and if it is, returns some additional information - varno (index
+ * into simple_rte_array) and a bitmap of attributes. This is then
+ * used to fetch related multivariate statistics.
+ *
+ * At this moment we only support basic conditions of the form
+ *
+ * variable OP constant
+ *
+ * where OP is currently just the "=" operator (which is determined by
+ * looking at the associated function for estimating selectivity, just
+ * like with the single-dimensional case).
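+ *
+ * So for example (a = 1) is compatible, while (a = b), (a + b = 3) or
+ * (a < 10) currently are not.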
+ *
+ * TODO Support 'OR clauses' - shouldn't be all that difficult to
+ * evaluate them using multivariate stats.
+ */
+static bool
+clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum)
+{
+ mv_compatible_context context;
+
+ context.varno = relid;
+ context.varattnos = NULL; /* no attnums */
+
+ if (mv_compatible_walker(clause, (void *) &context))
+ return false;
+
+ /* remember the newly collected attnums */
+ *attnum = bms_singleton_member(context.varattnos);
+
+ return true;
+}
+
+/*
+ * collect attnums from functional dependencies
+ *
+ * Walk through all statistics on the relation, and collect attnums covered
+ * by those with functional dependencies. We only look at columns specified
+ * when creating the statistics, not at columns actually referenced by the
+ * dependencies (which may only be a subset of the attributes).
+ */
+static Bitmapset*
+fdeps_collect_attnums(List *stats)
+{
+ ListCell *lc;
+ Bitmapset *attnums = NULL;
+
+ foreach (lc, stats)
+ {
+ int j;
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ int2vector *stakeys = info->stakeys;
+
+ /* skip stats without functional dependencies built */
+ if (! info->deps_built)
+ continue;
+
+ for (j = 0; j < stakeys->dim1; j++)
+ attnums = bms_add_member(attnums, stakeys->values[j]);
+ }
+
+ return attnums;
+}
+
+/* transforms bitmapset into an array (index => value) */
+static int*
+make_idx_to_attnum_mapping(Bitmapset *attnums)
+{
+ int attidx = 0;
+ int attnum;
+
+ int *mapping = (int*)palloc0(bms_num_members(attnums) * sizeof(int));
+
+ attnum = -1;
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ mapping[attidx++] = attnum;
+
+ Assert(attidx == bms_num_members(attnums));
+
+ return mapping;
+}
+
+/* transforms bitmapset into an array (value => index) */
+static int*
+make_attnum_to_idx_mapping(Bitmapset *attnums)
+{
+ int attidx = 0;
+ int attnum;
+ int maxattnum = -1;
+ int *mapping;
+
+ attnum = -1;
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ maxattnum = attnum;
+
+ mapping = (int*)palloc0((maxattnum+1) * sizeof(int));
+
+ attnum = -1;
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ mapping[attnum] = attidx++;
+
+ Assert(attidx == bms_num_members(attnums));
+
+ return mapping;
+}
+
+/* build adjacency matrix for the dependencies */
+static bool*
+build_adjacency_matrix(List *stats, Bitmapset *attnums,
+ int *idx_to_attnum, int *attnum_to_idx)
+{
+ ListCell *lc;
+ int natts = bms_num_members(attnums);
+ bool *matrix = (bool*)palloc0(natts * natts * sizeof(bool));
+
+ foreach (lc, stats)
+ {
+ int j;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
+ MVDependencies dependencies = NULL;
+
+ /* skip stats without functional dependencies built */
+ if (! stat->deps_built)
+ continue;
+
+ /* fetch and deserialize dependencies */
+ dependencies = load_mv_dependencies(stat->mvoid);
+ if (dependencies == NULL)
+ {
+ elog(WARNING, "failed to deserialize func deps %d", stat->mvoid);
+ continue;
+ }
+
+ /* set matrix[a,b] to 'true' if 'a=>b' */
+ for (j = 0; j < dependencies->ndeps; j++)
+ {
+ int aidx = attnum_to_idx[dependencies->deps[j]->a];
+ int bidx = attnum_to_idx[dependencies->deps[j]->b];
+
+ /* a=> b */
+ matrix[aidx * natts + bidx] = true;
+ }
+ }
+
+ return matrix;
+}
+
+/*
+ * multiply the adjacency matrix
+ *
+ * By multiplying the adjacency matrix, we derive dependencies implied by those
+ * stored in the catalog (but possibly in several separate rows). We need to
+ * repeat the multiplication until no new dependencies are discovered. The
+ * maximum number of multiplications is equal to the number of attributes.
+ *
+ * This is based on modeling the functional dependencies as edges in a directed
+ * graph with attributes as vertices.
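+ *
+ * For example, if one statistics entry stores (a => b) and another stores
+ * (b => c), one round of multiplication derives the transitive (a => c):
+ *
+ *        a b c            a b c
+ *    a [ . T . ]      a [ . T T ]
+ *    b [ . . T ]  =>  b [ . . T ]
+ *    c [ . . . ]      c [ . . . ]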
+ */
+static void
+multiply_adjacency_matrix(bool *matrix, int natts)
+{
+ int i;
+
+ /* repeat the multiplication up to natts-times */
+ for (i = 0; i < natts; i++)
+ {
+ bool changed = false; /* no changes in this round */
+ int k, l, m;
+
+ /* k => l */
+ for (k = 0; k < natts; k++)
+ {
+ for (l = 0; l < natts; l++)
+ {
+ /* skip already known dependencies */
+ if (matrix[k * natts + l])
+ continue;
+
+ /*
+ * compute (k,l) in the multiplied matrix
+ *
+ * We don't really care about the exact value, just true/false,
+ * so terminate the loop once we get a hit. Also, this makes it
+ * safe to modify the matrix in-place.
+ */
+ for (m = 0; m < natts; m++)
+ {
+ if (matrix[k * natts + m] && matrix[m * natts + l])
+ {
+ matrix[k * natts + l] = true;
+ changed = true;
+ break;
+ }
+ }
+ }
+ }
+
+ /* no transitive dependency added in this round, so terminate */
+ if (! changed)
+ break;
+ }
+}
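
For illustration, here is a self-contained sketch of the same fixed-point
iteration, deriving the transitive dependency (a => c) from stored
(a => b) and (b => c). The explicit iteration bound is dropped, as the
loop terminates at the fixed point anyway:

#include <stdbool.h>
#include <stdio.h>

#define NATTS 3

int main(void)
{
    /* attributes 0=a, 1=b, 2=c; stored dependencies a=>b and b=>c */
    bool matrix[NATTS][NATTS] = {{false}};
    bool changed = true;
    int k, l, m;

    matrix[0][1] = true;    /* a => b */
    matrix[1][2] = true;    /* b => c */

    while (changed)
    {
        changed = false;
        for (k = 0; k < NATTS; k++)
            for (l = 0; l < NATTS; l++)
                for (m = 0; m < NATTS && !matrix[k][l]; m++)
                    if (matrix[k][m] && matrix[m][l])
                    {
                        matrix[k][l] = true;
                        changed = true;
                    }
    }

    printf("a => c derived: %d\n", matrix[0][2]);   /* prints 1 */
    return 0;
}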
+
+/*
+ * Reduce clauses using functional dependencies
+ *
+ * Walk through clauses and eliminate the redundant ones (implied by other
+ * clauses). This is done by first deriving a transitive closure of all the
+ * functional dependencies (by multiplying the adjacency matrix).
+ */
+static List*
+fdeps_reduce_clauses(List *clauses, Bitmapset *attnums, bool *matrix,
+ int *idx_to_attnum, int *attnum_to_idx, Index relid)
+{
+ int i;
+ ListCell *lc;
+ List *reduced_clauses = NIL;
+
+ int nmvclauses; /* size of the arrays */
+ bool *reduced;
+ AttrNumber *mvattnums;
+ Node **mvclauses;
+
+ int natts = bms_num_members(attnums);
+
+ /*
+ * Preallocate space for all clauses (the list only contains
+ * compatible clauses at this point). This makes it somewhat easier
+ * to access the stats / attnums randomly.
+ *
+ * XXX This assumes each clause references exactly one Var, so the
+ * arrays are sized accordingly - for functional dependencies
+ * this is safe, because it only works with Var=Const.
+ */
+ mvclauses = (Node**)palloc0(list_length(clauses) * sizeof(Node*));
+ mvattnums = (AttrNumber*)palloc0(list_length(clauses) * sizeof(AttrNumber));
+ reduced = (bool*)palloc0(list_length(clauses) * sizeof(bool));
+
+ /* fill the arrays */
+ nmvclauses = 0;
+ foreach (lc, clauses)
+ {
+ Node *clause = (Node *) lfirst(lc);
+ Bitmapset *varattnos = get_varattnos(clause, relid);
+
+ mvclauses[nmvclauses] = clause;
+ mvattnums[nmvclauses] = bms_singleton_member(varattnos);
+ nmvclauses++;
+ }
+
+ Assert(nmvclauses == list_length(clauses));
+
+ /* now try to reduce the clauses (using the dependencies) */
+ for (i = 0; i < nmvclauses; i++)
+ {
+ int j;
+
+ /* not covered by dependencies */
+ if (! bms_is_member(mvattnums[i], attnums))
+ continue;
+
+ /* this clause was already reduced, so let's skip it */
+ if (reduced[i])
+ continue;
+
+ /* walk the potentially 'implied' clauses */
+ for (j = 0; j < nmvclauses; j++)
+ {
+ int aidx, bidx;
+
+ /* not covered by dependencies */
+ if (! bms_is_member(mvattnums[j], attnums))
+ continue;
+
+ aidx = attnum_to_idx[mvattnums[i]];
+ bidx = attnum_to_idx[mvattnums[j]];
+
+ /* can't reduce the clause by itself, or if already reduced */
+ if ((i == j) || reduced[j])
+ continue;
+
+ /* mark the clause as reduced (if aidx => bidx) */
+ reduced[j] = matrix[aidx * natts + bidx];
+ }
+ }
+
+ /* now walk through the clauses, and keep only those not reduced */
+ for (i = 0; i < nmvclauses; i++)
+ if (! reduced[i])
+ reduced_clauses = lappend(reduced_clauses, mvclauses[i]);
+
+ pfree(reduced);
+ pfree(mvclauses);
+ pfree(mvattnums);
+
+ return reduced_clauses;
+}
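
To see why the 'reduced' bookkeeping handles cycles correctly, consider a
standalone sketch with two attributes, one equality clause on each, and
the cyclic dependencies (a => b) and (b => a) - exactly one of the two
clauses gets reduced:

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    /* attributes 0=a, 1=b; cyclic dependencies a=>b and b=>a */
    bool matrix[2][2] = {{false, true}, {true, false}};
    bool reduced[2] = {false, false};   /* one clause per attribute */
    int i, j;

    for (i = 0; i < 2; i++)
    {
        if (reduced[i])
            continue;       /* an eliminated clause can't reduce others */

        for (j = 0; j < 2; j++)
        {
            if ((i == j) || reduced[j])
                continue;
            if (matrix[i][j])
                reduced[j] = true;
        }
    }

    /* prints "0 1" - the clause on a survives, the clause on b is gone */
    printf("%d %d\n", reduced[0], reduced[1]);
    return 0;
}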
+
+/*
+ * filter clauses that are interesting for the reduction step
+ *
+ * Functional dependencies can only work with equality clauses with attributes
+ * covered by at least one of the statistics, so we walk through the clauses
+ * and copy the uninteresting ones directly to the result (reduced) clauses.
+ *
+ * That includes clauses that:
+ * (a) are not mv-compatible
+ * (b) reference more than a single attnum
+ * (c) use attnum not covered by functional dependencies
+ *
+ * The clauses interesting for the reduction step are copied to deps_clauses.
+ *
+ * root - planner root
+ * clauses - list of clauses (input)
+ * deps_attnums - attributes covered by dependencies
+ * reduced_clauses - resulting clauses (not subject to reduction step)
+ * deps_clauses - clauses to be processed by reduction
+ * relid - relid of the baserel
+ *
+ * The return value is a bitmap of attnums referenced by deps_clauses.
+ */
+static Bitmapset *
+fdeps_filter_clauses(PlannerInfo *root,
+ List *clauses, Bitmapset *deps_attnums,
+ List **reduced_clauses, List **deps_clauses,
+ Index relid)
+{
+ ListCell *lc;
+ Bitmapset *clause_attnums = NULL;
+
+ foreach (lc, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(lc);
+
+ if (! clause_is_mv_compatible(clause, relid, &attnum))
+
+ /* clause incompatible with functional dependencies */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else if (! bms_is_member(attnum, deps_attnums))
+
+ /* clause not covered by the dependencies */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else
+ {
+ *deps_clauses = lappend(*deps_clauses, clause);
+ clause_attnums = bms_add_member(clause_attnums, attnum);
+ }
+ }
+
+ return clause_attnums;
+}
+
+/*
+ * reduce list of equality clauses using soft functional dependencies
+ *
+ * We simply walk through the list of functional dependencies, and for each one
+ * check whether the dependency 'matches' the clauses, i.e. if there's a clause
+ * matching the condition. If yes, we attempt to remove all clauses matching
+ * the implied part of the dependency from the list.
+ *
+ * This only reduces equality clauses, and ignores all the other types. We might
+ * extend it to handle IS NULL clauses in the future.
+ *
+ * We also assume the equality clauses are 'compatible'. For example we can't
+ * identify when the clauses use a mismatching zip code and city name. In such a
+ * case the usual approach (product of selectivities) would produce a better
+ * estimate, although mostly by chance.
+ *
+ * The implementation needs to be careful about cyclic dependencies, e.g. when
+ *
+ * (a -> b) and (b -> a)
+ *
+ * at the same time, which means there's a 1:1 relationship between the columns.
+ * In this case we must not reduce clauses on both attributes at the same time.
+ *
+ * TODO Currently we only apply functional dependencies at the same level, but
+ * maybe we could transfer the clauses from upper levels to the subtrees?
+ * For example let's say we have (a->b) dependency, and condition
+ *
+ * (a=1) AND (b=2 OR c=3)
+ *
+ * Currently, we won't be able to perform any reduction, because we'll
+ * consider (a=1) and (b=2 OR c=3) independently. But maybe we could pass
+ * (a=1) into the other expression, and only check it against conditions
+ * of the functional dependencies?
+ *
+ * In this case we'd end up with
+ *
+ * (a=1)
+ *
+ * as we'd consider (b=2) implied thanks to the rule, rendering the whole
+ * OR clause valid.
+ */
+static List *
+clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Index relid, List *stats)
+{
+ List *reduced_clauses = NIL;
+
+ /*
+ * matrix of (natts x natts), 1 means x=>y
+ *
+ * This serves two purposes - first, it merges dependencies from all
+ * the statistics, second it makes generating all the transitive
+ * dependencies easier.
+ *
+ * We need to build this only for attributes from the dependencies,
+ * not for all attributes in the table.
+ *
+ * We can't do that only for attributes from the clauses, because we
+ * want to build transitive dependencies (including those going
+ * through attributes not listed in the stats).
+ *
+ * This only works for A=>B dependencies, not sure how to do that
+ * for complex dependencies.
+ */
+ bool *deps_matrix;
+ int deps_natts; /* size of the matrix */
+
+ /* mapping attnum <=> matrix index */
+ int *deps_idx_to_attnum;
+ int *deps_attnum_to_idx;
+
+ /* attnums in dependencies and clauses (and intersection) */
+ List *deps_clauses = NIL;
+ Bitmapset *deps_attnums = NULL;
+ Bitmapset *clause_attnums = NULL;
+ Bitmapset *intersect_attnums = NULL;
+
+ /*
+ * Is there at least one statistics with functional dependencies?
+ * If not, return the original clauses right away.
+ *
+ * XXX Isn't this pointless, thanks to exactly the same check in
+ * clauselist_selectivity()? Can we trigger the condition here?
+ */
+ if (! has_stats(stats, MV_CLAUSE_TYPE_FDEP))
+ return clauses;
+
+ /*
+ * Build the dependency matrix, i.e. attribute adjacency matrix,
+ * where 1 means (a=>b). Once we have the adjacency matrix, we'll
+ * multiply it by itself, to get transitive dependencies.
+ *
+ * Note: This is pretty much transitive closure from graph theory.
+ *
+ * First, let's see what attributes are covered by functional
+ * dependencies (sides of the adjacency matrix), and also the maximum
+ * attnum (which determines the size of the attnum => index mapping).
+ */
+ deps_attnums = fdeps_collect_attnums(stats);
+
+ /*
+ * Walk through the clauses - clauses that are (one of)
+ *
+ * (a) not mv-compatible
+ * (b) are using more than a single attnum
+ * (c) using attnum not covered by functional dependencies
+ *
+ * may be copied directly to the result. The interesting clauses are
+ * kept in 'deps_clauses' and will be processed later.
+ */
+ clause_attnums = fdeps_filter_clauses(root, clauses, deps_attnums,
+ &reduced_clauses, &deps_clauses, relid);
+
+ /*
+ * We need at least two clauses referencing two different attributes
+ * to perform the reduction.
+ */
+ if ((list_length(deps_clauses) < 2) || (bms_num_members(clause_attnums) < 2))
+ {
+ bms_free(clause_attnums);
+ list_free(reduced_clauses);
+ list_free(deps_clauses);
+
+ return clauses;
+ }
+
+ /*
+ * We need at least two matching attributes in the clauses and
+ * dependencies, otherwise we can't really reduce anything.
+ */
+ intersect_attnums = bms_intersect(clause_attnums, deps_attnums);
+ if (bms_num_members(intersect_attnums) < 2)
+ {
+ bms_free(clause_attnums);
+ bms_free(deps_attnums);
+ bms_free(intersect_attnums);
+
+ list_free(deps_clauses);
+ list_free(reduced_clauses);
+
+ return clauses;
+ }
+
+ /*
+ * Build mapping between matrix indexes and attnums, and then the
+ * adjacency matrix itself.
+ */
+ deps_idx_to_attnum = make_idx_to_attnum_mapping(deps_attnums);
+ deps_attnum_to_idx = make_attnum_to_idx_mapping(deps_attnums);
+
+ /* build the adjacency matrix */
+ deps_matrix = build_adjacency_matrix(stats, deps_attnums,
+ deps_idx_to_attnum,
+ deps_attnum_to_idx);
+
+ deps_natts = bms_num_members(deps_attnums);
+
+ /*
+ * Multiply the matrix N-times (N = size of the matrix), so that we
+ * get all the transitive dependencies. That makes the next step
+ * much easier and faster.
+ *
+ * This is essentially an adjacency matrix from graph theory, and
+ * by multiplying it we get transitive edges. We don't really care
+ * about the exact number (number of paths between vertices) though,
+ * so we can do the multiplication in-place (we don't care whether
+ * we found the dependency in this round or in the previous one).
+ *
+ * Track how many new dependencies were added, and stop when 0, but
+ * we can't multiply more than N-times (longest path in the graph).
+ */
+ multiply_adjacency_matrix(deps_matrix, deps_natts);
+
+ /*
+ * Walk through the clauses, and see which other clauses we may
+ * reduce. The matrix contains all transitive dependencies, which
+ * makes this very fast.
+ *
+ * We have to be careful not to reduce the clause using itself, or
+ * reducing all clauses forming a cycle (so we have to skip already
+ * eliminated clauses).
+ *
+ * I'm not sure whether this guarantees finding the best solution,
+ * i.e. reducing the most clauses, but it probably does (thanks to
+ * having all the transitive dependencies).
+ */
+ deps_clauses = fdeps_reduce_clauses(deps_clauses,
+ deps_attnums, deps_matrix,
+ deps_idx_to_attnum,
+ deps_attnum_to_idx, relid);
+
+ /* join the two lists of clauses */
+ reduced_clauses = list_union(reduced_clauses, deps_clauses);
+
+ pfree(deps_matrix);
+ pfree(deps_idx_to_attnum);
+ pfree(deps_attnum_to_idx);
+
+ bms_free(deps_attnums);
+ bms_free(clause_attnums);
+ bms_free(intersect_attnums);
+
+ return reduced_clauses;
+}
+
+/*
+ * Check whether there are stats with at least one of the requested types.
+ */
+static bool
+has_stats(List *stats, int type)
+{
+ ListCell *s;
+
+ foreach (s, stats)
+ {
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Look up statistics for a given baserel.
+ */
+static List *
+find_stats(PlannerInfo *root, Index relid)
+{
+ Assert(root->simple_rel_array[relid] != NULL);
+
+ return root->simple_rel_array[relid]->mvstatlist;
+}
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
new file mode 100644
index 0000000..a38ea7b
--- /dev/null
+++ b/src/backend/utils/mvstats/README.stats
@@ -0,0 +1,36 @@
+Multivariate statistics
+=======================
+
+When estimating various quantities (e.g. condition selectivities) the default
+approach relies on the assumption of independence. In practice that's often
+not true, resulting in estimation errors.
+
+Multivariate stats track different types of dependencies between the columns,
+hopefully improving the estimates.
+
+Currently we only have one kind of multivariate statistics - soft functional
+dependencies, and we use it to improve estimates of equality clauses. See
+README.dependencies for details.
+
+
+Selectivity estimation
+----------------------
+
+When estimating selectivity, we aim to achieve several things:
+
+ (a) maximize the estimate accuracy
+
+ (b) minimize the overhead, especially when no suitable multivariate stats
+ exist (so if you are not using multivariate stats, there's no overhead)
+
+Thus clauselist_selectivity() performs several inexpensive checks first, before
+attempting the more expensive estimation:
+
+ (1) check if there are multivariate stats on the relation
+
+ (2) check that there are at least two attributes referenced by clauses compatible
+ with multivariate statistics (equality clauses for func. dependencies)
+
+ (3) perform reduction of equality clauses using func. dependencies
+
+ (4) estimate the reduced list of clauses using regular statistics
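
To illustrate steps (3) and (4) with hypothetical numbers, take the zip
code / city example, where the zip code functionally determines the city:

    WHERE zip = '12345' AND city = 'Prague'

With per-column selectivities P(zip = '12345') = 0.001 and
P(city = 'Prague') = 0.01, the independence assumption yields
0.001 * 0.01 = 0.00001, i.e. a 100x underestimate. Step (3) drops the
implied city clause, so step (4) estimates just P(zip = '12345') = 0.001.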
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index a755c49..bd200bc 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -84,7 +84,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
/*
* Analyze functional dependencies of columns.
*/
- deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (stat->deps_enabled)
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
/* store the histogram / MCV list in the catalog */
update_mv_stats(stat->mvoid, deps, attrs);
@@ -163,6 +164,7 @@ list_mv_stats(Oid relid)
info->mvoid = HeapTupleGetOid(htup);
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
result = lappend(result, info);
@@ -274,6 +276,7 @@ compare_scalars_partition(const void *a, const void *b, void *arg)
return ApplySortComparator(da, false, db, false, ssup);
}
+
/* initialize multi-dimensional sort */
MultiSortSupport
multi_sort_init(int ndims)
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
index 2a064a0..c80ba33 100644
--- a/src/backend/utils/mvstats/dependencies.c
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -435,3 +435,27 @@ pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(result));
}
+
+MVDependencies
+load_mv_dependencies(Oid mvoid)
+{
+ bool isnull = false;
+ Datum deps;
+
+ /* Fetch the pg_mv_statistic tuple for the requested statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->deps_enabled && mvstat->deps_built);
+#endif
+
+ deps = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stadeps, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_dependencies(DatumGetByteaP(deps));
+}
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 7ebd961..cc43a79 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -17,12 +17,20 @@
#include "fmgr.h"
#include "commands/vacuum.h"
+/*
+ * Degree of how much MCV item / histogram bucket matches a clause.
+ * This is then considered when computing the selectivity.
+ */
+#define MVSTATS_MATCH_NONE 0 /* no match at all */
+#define MVSTATS_MATCH_PARTIAL 1 /* partial match */
+#define MVSTATS_MATCH_FULL 2 /* full match */
#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
-/* An associative rule, tracking [a => b] dependency.
- *
- * TODO Make this work with multiple columns on both sides.
+
+/*
+ * Functional dependencies, tracking column-level relationships (values
+ * in one column determine values in another one).
*/
typedef struct MVDependencyData {
int16 a;
@@ -48,6 +56,8 @@ typedef MVDependenciesData* MVDependencies;
* stats specified using flags (or something like that).
*/
+MVDependencies load_mv_dependencies(Oid mvoid);
+
bytea * serialize_mv_dependencies(MVDependencies dependencies);
/* deserialization of stats (serialization is private to analyze) */
diff --git a/src/test/regress/expected/mv_dependencies.out b/src/test/regress/expected/mv_dependencies.out
new file mode 100644
index 0000000..e759997
--- /dev/null
+++ b/src/test/regress/expected/mv_dependencies.out
@@ -0,0 +1,172 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s1 ON functional_dependencies (unknown_column) WITH (dependencies);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s1 ON functional_dependencies (a) WITH (dependencies);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a,a) WITH (dependencies);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a, a, b) WITH (dependencies);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- correct command
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (dependencies);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+ QUERY PLAN
+---------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE functional_dependencies;
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s2 ON functional_dependencies (a, b, c) WITH (dependencies);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+DROP TABLE functional_dependencies;
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s3 ON functional_dependencies (a, b, c, d) WITH (dependencies);
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+----------------------------------------
+ t | t | 2 => 1, 3 => 1, 3 => 2, 4 => 1, 4 => 2
+(1 row)
+
+DROP TABLE functional_dependencies;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index bec0316..4f2ffb8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -110,3 +110,6 @@ test: event_trigger
# run stats by itself because its delay may be insufficient under heavy load
test: stats
+
+# run tests of multivariate stats
+test: mv_dependencies
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 7e9b319..097a04f 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -162,3 +162,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: mv_dependencies
diff --git a/src/test/regress/sql/mv_dependencies.sql b/src/test/regress/sql/mv_dependencies.sql
new file mode 100644
index 0000000..48dea4d
--- /dev/null
+++ b/src/test/regress/sql/mv_dependencies.sql
@@ -0,0 +1,150 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s1 ON functional_dependencies (unknown_column) WITH (dependencies);
+
+-- single column
+CREATE STATISTICS s1 ON functional_dependencies (a) WITH (dependencies);
+
+-- single column, duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a,a) WITH (dependencies);
+
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a, a, b) WITH (dependencies);
+
+-- unknown option
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (unknown_option);
+
+-- correct command
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (dependencies);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+
+DROP TABLE functional_dependencies;
+
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s2 ON functional_dependencies (a, b, c) WITH (dependencies);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+
+DROP TABLE functional_dependencies;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s3 ON functional_dependencies (a, b, c, d) WITH (dependencies);
+
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+DROP TABLE functional_dependencies;
--
2.1.0
Attachment: 0004-multivariate-MCV-lists.patch
From c15fa03dbc0be00f80f12545b1468a8ca55a57f5 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 16:52:15 +0200
Subject: [PATCH 4/9] multivariate MCV lists
- extends the pg_mv_statistic catalog (add 'mcv' fields)
- building the MCV lists during ANALYZE
- simple estimation while planning the queries
Includes regression tests, largely mirroring those for functional
dependencies.
---
doc/src/sgml/ref/create_statistics.sgml | 18 +
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/statscmds.c | 45 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 829 ++++++++++++++++++++++-
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/README.mcv | 137 ++++
src/backend/utils/mvstats/README.stats | 89 ++-
src/backend/utils/mvstats/common.c | 104 ++-
src/backend/utils/mvstats/common.h | 11 +-
src/backend/utils/mvstats/mcv.c | 1094 +++++++++++++++++++++++++++++++
src/bin/psql/describe.c | 25 +-
src/include/catalog/pg_mv_statistic.h | 18 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 69 +-
src/test/regress/expected/mv_mcv.out | 207 ++++++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_mcv.sql | 178 +++++
22 files changed, 2776 insertions(+), 73 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.mcv
create mode 100644 src/backend/utils/mvstats/mcv.c
create mode 100644 src/test/regress/expected/mv_mcv.out
create mode 100644 src/test/regress/sql/mv_mcv.sql
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index a86eae3..193e4b0 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -132,6 +132,24 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>max_mcv_items</> (<type>integer</>)</term>
+ <listitem>
+ <para>
+ Maximum number of MCV list items.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>mcv</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables MCV list for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect2>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b8a264e..2d570ee 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -165,7 +165,9 @@ CREATE VIEW pg_mv_stats AS
S.staname AS staname,
S.stakeys AS attnums,
length(S.stadeps) as depsbytes,
- pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
+ length(S.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 84a8b13..90bfaed 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -136,7 +136,13 @@ CreateStatistics(CreateStatsStmt *stmt)
ObjectAddress parentobject, childobject;
/* by default build nothing */
- bool build_dependencies = false;
+ bool build_dependencies = false,
+ build_mcv = false;
+
+ int32 max_mcv_items = -1;
+
+ /* options required because of other options */
+ bool require_mcv = false;
Assert(IsA(stmt, CreateStatsStmt));
@@ -212,6 +218,29 @@ CreateStatistics(CreateStatsStmt *stmt)
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "mcv") == 0)
+ build_mcv = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_mcv_items") == 0)
+ {
+ max_mcv_items = defGetInt32(opt);
+
+ /* this option requires 'mcv' to be enabled */
+ require_mcv = true;
+
+ /* sanity check */
+ if (max_mcv_items < MVSTAT_MCVLIST_MIN_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items must be at least %d",
+ MVSTAT_MCVLIST_MIN_ITEMS)));
+
+ else if (max_mcv_items > MVSTAT_MCVLIST_MAX_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items is %d",
+ MVSTAT_MCVLIST_MAX_ITEMS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -220,10 +249,16 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! build_dependencies)
+ if (! (build_dependencies || build_mcv))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies, mcv) was requested")));
+
+ /* now do some checking of the options */
+ if (require_mcv && (! build_mcv))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies) was requested")));
+ errmsg("option 'mcv' is required by other options(s)")));
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
@@ -243,8 +278,12 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+ values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+
+ values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+ nulls[Anum_pg_mv_statistic_stamcv -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 474d2c7..e3983fd 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1977,9 +1977,11 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
+ WRITE_BOOL_FIELD(mcv_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
+ WRITE_BOOL_FIELD(mcv_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 80708fe..977f88e 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -15,6 +15,7 @@
#include "postgres.h"
#include "access/sysattr.h"
+#include "catalog/pg_collation.h"
#include "catalog/pg_operator.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
@@ -47,23 +48,51 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
#define MV_CLAUSE_TYPE_FDEP 0x01
+#define MV_CLAUSE_TYPE_MCV 0x02
-static bool clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum);
+static bool clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums,
+ int type);
-static Bitmapset *collect_mv_attnums(List *clauses, Index relid);
+static Bitmapset *collect_mv_attnums(List *clauses, Index relid, int type);
-static int count_mv_attnums(List *clauses, Index relid);
+static int count_mv_attnums(List *clauses, Index relid, int type);
static int count_varnos(List *clauses, Index *relid);
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Index relid, List *stats);
+static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
+
+static List *clauselist_mv_split(PlannerInfo *root, Index relid,
+ List *clauses, List **mvclauses,
+ MVStatisticInfo *mvstats, int types);
+
+static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
+
+static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats,
+ bool *fullmatch, Selectivity *lowsel);
+
+static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, Index relid);
+/* used for merging bitmaps - AND (min), OR (max) */
+#define MAX(x, y) (((x) > (y)) ? (x) : (y))
+#define MIN(x, y) (((x) < (y)) ? (x) : (y))
+
+#define UPDATE_RESULT(m,r,isor) \
+ (m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
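
To illustrate the intended semantics of UPDATE_RESULT, here is a
standalone sketch reusing the same definitions - an AND-ed list starts at
'full match' and each new clause result can only weaken it, while an
OR-ed list starts at 'no match' and can only get stronger:

#include <stdio.h>

#define MVSTATS_MATCH_NONE    0
#define MVSTATS_MATCH_PARTIAL 1
#define MVSTATS_MATCH_FULL    2

#define MAX(x, y) (((x) > (y)) ? (x) : (y))
#define MIN(x, y) (((x) < (y)) ? (x) : (y))

#define UPDATE_RESULT(m,r,isor) \
    (m) = (isor) ? (MAX(m,r)) : (MIN(m,r))

int main(void)
{
    char and_match = MVSTATS_MATCH_FULL;
    char or_match = MVSTATS_MATCH_NONE;

    /* merge a partial match into both running results */
    UPDATE_RESULT(and_match, MVSTATS_MATCH_PARTIAL, false);  /* min */
    UPDATE_RESULT(or_match, MVSTATS_MATCH_PARTIAL, true);    /* max */

    printf("AND: %d, OR: %d\n", and_match, or_match);  /* "AND: 1, OR: 1" */
    return 0;
}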
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -89,11 +118,13 @@ static List * find_stats(PlannerInfo *root, Index relid);
* to verify that suitable multivariate statistics exist.
*
* If we identify such multivariate statistics apply, we try to apply them.
- * Currently we only have (soft) functional dependencies, so we try to reduce
- * the list of clauses.
*
- * Then we remove the clauses estimated using multivariate stats, and process
- * the rest of the clauses using the regular per-column stats.
+ * First we try to reduce the list of clauses by applying (soft) functional
+ * dependencies, and then we try to estimate the selectivity of the reduced
+ * list of clauses using the multivariate MCV list.
+ *
+ * Finally we remove the portion of clauses estimated using multivariate stats,
+ * and process the rest of the clauses using the regular per-column stats.
*
* Currently, the only extra smarts we have is to recognize "range queries",
* such as "x > 34 AND x < 42". Clauses are recognized as possible range
@@ -170,12 +201,46 @@ clauselist_selectivity(PlannerInfo *root,
* that need to be estimated by other types of stats (MCV, histograms etc).
*/
if (has_stats(stats, MV_CLAUSE_TYPE_FDEP) &&
- (count_mv_attnums(clauses, relid) >= 2))
+ (count_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_FDEP) >= 2))
{
clauses = clauselist_apply_dependencies(root, clauses, relid, stats);
}
/*
+ * Check that there are statistics with MCV list or histogram, and also the
+ * number of attributes covered by these types of statistics.
+ *
+ * If there are no such stats or not enough attributes, don't waste time
+ * with the multivariate code and simply skip to estimation using the
+ * regular per-column stats.
+ */
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV) &&
+ (count_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV) >= 2))
+ {
+ /* collect attributes from the compatible conditions */
+ Bitmapset *mvattnums = collect_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV);
+
+ /* and search for the statistic covering the most attributes */
+ MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+
+ if (mvstat != NULL) /* we have a matching stats */
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ /* split the clauselist into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, relid, clauses, &mvclauses,
+ mvstat, MV_CLAUSE_TYPE_MCV);
+
+ /* we've chosen the statistics to match the clauses */
+ Assert(mvclauses != NIL);
+
+ /* compute the multivariate stats */
+ s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ }
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -832,6 +897,69 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * estimate selectivity of clauses using multivariate statistic
+ *
+ * Perform estimation of the clauses using a MCV list.
+ *
+ * This assumes all the clauses are compatible with the selected statistics
+ * (e.g. only reference columns covered by the statistics, use supported
+ * operator, etc.).
+ *
+ * TODO We may support some additional conditions, most importantly those
+ * matching multiple columns (e.g. "a = b" or "a < b").
+ *
+ * TODO Clamp the selectivity by min of the per-clause selectivities (i.e. the
+ * selectivity of the most restrictive clause), because that's the maximum
+ * we can ever get from an ANDed list of clauses. This would probably prevent
+ * issues with hitting too many buckets in low-precision histograms.
+ *
+ * TODO We may remember the lowest frequency in the MCV list, and then later use
+ * it as an upper bound for the selectivity (had there been a more
+ * frequent item, it'd be in the MCV list). This might improve cases with
+ * low-detail histograms.
+ *
+ * TODO We may also derive some additional boundaries for the selectivity from
+ * the MCV list, because
+ *
+ * (a) if we have a "full equality condition" (one equality condition on
+ * each column of the statistic) and we found a match in the MCV list,
+ * then this is the final selectivity (and pretty accurate),
+ *
+ * (b) if we have a "full equality condition" and we haven't found a match
+ * in the MCV list, then the selectivity is below the lowest frequency
+ * found in the MCV list,
+ *
+ * TODO When applying the clauses to the histogram/MCV list, we can do
+ * that from the most selective clauses first, because that'll
+ * eliminate the buckets/items sooner (so we'll be able to skip
+ * them without inspection, which is more expensive). But this
+ * requires really knowing the per-clause selectivities in advance,
+ * and that's not what we do now.
+ */
+static Selectivity
+clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+{
+ bool fullmatch = false;
+
+ /*
+ * Lowest frequency in the MCV list (may be used as an upper bound
+ * for full equality conditions that did not match any MCV item).
+ */
+ Selectivity mcv_low = 0.0;
+
+ /* TODO Evaluate simple 1D selectivities, use the smallest one as
+ * an upper bound, product as lower bound, and sort the
+ * clauses in ascending order by selectivity (to optimize the
+ * MCV/histogram evaluation).
+ */
+
+ /* Evaluate the MCV selectivity */
+ return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ &fullmatch, &mcv_low);
+}
+
/*
* Pull varattnos from the clauses, similarly to pull_varattnos() but:
*
@@ -869,28 +997,26 @@ get_varattnos(Node * node, Index relid)
* Collect attributes from mv-compatible clauses.
*/
static Bitmapset *
-collect_mv_attnums(List *clauses, Index relid)
+collect_mv_attnums(List *clauses, Index relid, int types)
{
Bitmapset *attnums = NULL;
ListCell *l;
/*
- * Walk through the clauses and identify the ones we can estimate
- * using multivariate stats, and remember the relid/columns. We'll
- * then cross-check if we have suitable stats, and only if needed
- * we'll split the clauses into multivariate and regular lists.
+ * Walk through the clauses and identify the ones we can estimate using
+ * multivariate stats, and remember the relid/columns. We'll then
+ * cross-check if we have suitable stats, and only if needed we'll split
+ * the clauses into multivariate and regular lists.
*
- * For now we're only interested in RestrictInfo nodes with nested
- * OpExpr, using either a range or equality.
+ * For now we're only interested in RestrictInfo nodes with nested OpExpr,
+ * using either a range or equality.
*/
foreach (l, clauses)
{
- AttrNumber attnum;
Node *clause = (Node *) lfirst(l);
- /* ignore the result for now - we only need the info */
- if (clause_is_mv_compatible(clause, relid, &attnum))
- attnums = bms_add_member(attnums, attnum);
+ /* ignore the result here - we only need the attnums */
+ clause_is_mv_compatible(clause, relid, &attnums, types);
}
/*
@@ -911,10 +1037,10 @@ collect_mv_attnums(List *clauses, Index relid)
* Count the number of attributes in clauses compatible with multivariate stats.
*/
static int
-count_mv_attnums(List *clauses, Index relid)
+count_mv_attnums(List *clauses, Index relid, int type)
{
int c;
- Bitmapset *attnums = collect_mv_attnums(clauses, relid);
+ Bitmapset *attnums = collect_mv_attnums(clauses, relid, type);
c = bms_num_members(attnums);
@@ -944,9 +1070,183 @@ count_varnos(List *clauses, Index *relid)
return cnt;
}
+
+/*
+ * We're looking for statistics matching at least 2 attributes, referenced in
+ * clauses compatible with multivariate statistics. The current selection
+ * criteria is very simple - we choose the statistics referencing the most
+ * attributes.
+ *
+ * If there are multiple statistics referencing the same number of columns
+ * (from the clauses), the one with fewer source columns (as listed in the
+ * CREATE STATISTICS command) wins; otherwise the first one wins.
+ *
+ * This is a very simple criterion, and has several weaknesses:
+ *
+ * (a) does not consider the accuracy of the statistics
+ *
+ * If there are two histograms built on the same set of columns, but one
+ * has 100 buckets and the other one has 1000 buckets (thus likely
+ * providing better estimates), this is not currently considered.
+ *
+ * (b) does not consider the type of statistics
+ *
+ * If there are three statistics - one containing just a MCV list, another
+ * one with just a histogram and a third one with both, we treat them equally.
+ *
+ * (c) does not consider the number of clauses
+ *
+ * As explained, only the number of referenced attributes counts, so if
+ * there are multiple clauses on a single attribute, this still counts as
+ * a single attribute.
+ *
+ * (d) does not consider type of condition
+ *
+ * Some clauses may work better with some statistics - for example equality
+ * clauses probably work better with MCV lists than with histograms. But
+ * IS [NOT] NULL conditions may often work better with histograms (thanks
+ * to NULL-buckets).
+ *
+ * So for example with five WHERE conditions
+ *
+ * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
+ *
+ * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be selected
+ * as it references the most columns.
+ *
+ * Once we have selected the multivariate statistics, we split the list of
+ * clauses into two parts - conditions that are compatible with the selected
+ * stats, and conditions that will be estimated using simple statistics.
+ *
+ * From the example above, conditions
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
+ *
+ * will be estimated using the multivariate statistics (a,b,c,d) while the last
+ * condition (e = 1) will get estimated using the regular ones.
+ *
+ * There are various alternative selection criteria (e.g. counting conditions
+ * instead of just referenced attributes), but eventually the best option should
+ * be to combine multiple statistics. But that's much harder to do correctly.
+ *
+ * TODO Select multiple statistics and combine them when computing the estimate.
+ *
+ * TODO This will probably have to consider compatibility of clauses, because
+ * 'dependencies' will probably work only with equality clauses.
+ */
+static MVStatisticInfo *
+choose_mv_statistics(List *stats, Bitmapset *attnums)
+{
+ int i;
+ ListCell *lc;
+
+ MVStatisticInfo *choice = NULL;
+
+ int current_matches = 1; /* goal #1: maximize */
+ int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+
+ /*
+ * Walk through the list of statistics, and for each one count the referenced
+ * attributes (encoded in the 'attnums' bitmap).
+ */
+ foreach (lc, stats)
+ {
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ /* columns matching this statistics */
+ int matches = 0;
+
+ int2vector * attrs = info->stakeys;
+ int numattrs = attrs->dim1;
+
+ /* skip dependencies-only stats */
+ if (! info->mcv_built)
+ continue;
+
+ /* count columns covered by the statistics */
+ for (i = 0; i < numattrs; i++)
+ if (bms_is_member(attrs->values[i], attnums))
+ matches++;
+
+ /*
+ * Use this statistics when it increases the number of matched
+ * attributes, or matches the same number of attributes with fewer columns.
+ */
+ if ((matches > current_matches) ||
+ ((matches == current_matches) && (current_dims > numattrs)))
+ {
+ choice = info;
+ current_matches = matches;
+ current_dims = numattrs;
+ }
+ }
+
+ return choice;
+}
+
+
+/*
+ * This splits the clauses list into two parts - one containing clauses that
+ * will be evaluated using the chosen statistics, and the remaining clauses
+ * (either non-mvcompatible, or not related to the histogram).
+ */
+static List *
+clauselist_mv_split(PlannerInfo *root, Index relid,
+ List *clauses, List **mvclauses,
+ MVStatisticInfo *mvstats, int types)
+{
+ int i;
+ ListCell *l;
+ List *non_mvclauses = NIL;
+
+ /* FIXME is there a better way to get info on int2vector? */
+ int2vector * attrs = mvstats->stakeys;
+ int numattrs = mvstats->stakeys->dim1;
+
+ Bitmapset *mvattnums = NULL;
+
+ /* build bitmap of attributes, so we can do bms_is_subset later */
+ for (i = 0; i < numattrs; i++)
+ mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+
+ /* erase the list of mv-compatible clauses */
+ *mvclauses = NIL;
+
+ foreach (l, clauses)
+ {
+ bool match = false; /* by default not mv-compatible */
+ Bitmapset *attnums = NULL;
+ Node *clause = (Node *) lfirst(l);
+
+ if (clause_is_mv_compatible(clause, relid, &attnums, types))
+ {
+ /* are all the attributes part of the selected stats? */
+ if (bms_is_subset(attnums, mvattnums))
+ match = true;
+ }
+
+ /*
+ * The clause matches the selected stats, so put it to the list of
+ * mv-compatible clauses. Otherwise, keep it in the list of 'regular'
+ * clauses (that may be selected later).
+ */
+ if (match)
+ *mvclauses = lappend(*mvclauses, clause);
+ else
+ non_mvclauses = lappend(non_mvclauses, clause);
+ }
+
+ /*
+ * Return the remaining clauses, to be estimated using the regular
+ * per-column statistics.
+ */
+ return non_mvclauses;
+
+}
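
The split boils down to a subset test on attribute sets. A minimal sketch
of the bms_is_subset() logic, using plain bitmasks and hypothetical
columns a..d numbered 1..4, with statistics on (a,b,c):

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    /* attnums encoded as bitmasks (bit n = attnum n) */
    unsigned stats_attnums   = (1u << 1) | (1u << 2) | (1u << 3);  /* a,b,c */
    unsigned clause_attnums1 = (1u << 1) | (1u << 2);              /* a,b */
    unsigned clause_attnums2 = (1u << 1) | (1u << 4);              /* a,d */

    /* subset test: no bit may be set outside the statistics' columns */
    bool match1 = ((clause_attnums1 & ~stats_attnums) == 0);   /* true */
    bool match2 = ((clause_attnums2 & ~stats_attnums) == 0);   /* false */

    printf("%d %d\n", match1, match2);  /* prints "1 0" */
    return 0;
}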
typedef struct
{
+ int types; /* types of statistics to consider */
Index varno; /* relid we're interested in */
Bitmapset *varattnos; /* attnums referenced by the clauses */
} mv_compatible_context;
@@ -964,23 +1264,66 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
{
if (node == NULL)
return false;
-
+
if (IsA(node, RestrictInfo))
{
RestrictInfo *rinfo = (RestrictInfo *) node;
-
+
/* Pseudoconstants are not really interesting here. */
if (rinfo->pseudoconstant)
return true;
-
+
/* clauses referencing multiple varnos are incompatible */
if (bms_membership(rinfo->clause_relids) != BMS_SINGLETON)
return true;
-
+
/* check the clause inside the RestrictInfo */
return mv_compatible_walker((Node*)rinfo->clause, (void *) context);
}
+ if (or_clause(node) || and_clause(node) || not_clause(node))
+ {
+ /*
+ * AND/OR/NOT-clauses are supported if all sub-clauses are supported
+ *
+ * TODO We might support mixed case, where some of the clauses are
+ * supported and some are not, and treat all supported subclauses
+ * as a single clause, compute its selectivity using mv stats,
+ * and compute the total selectivity using the current algorithm.
+ *
+ * TODO For RestrictInfo above an OR-clause, we might use the orclause
+ * with nested RestrictInfo - we won't have to call pull_varnos()
+ * for each clause, saving time.
+ *
+ * TODO Perhaps this needs a bit more thought for functional
+ * dependencies? Those don't quite work for NOT cases.
+ */
+ BoolExpr *expr = (BoolExpr *) node;
+ ListCell *lc;
+
+ foreach (lc, expr->args)
+ {
+ if (mv_compatible_walker((Node *) lfirst(lc), context))
+ return true;
+ }
+
+ return false;
+ }
+
+ if (IsA(node, NullTest))
+ {
+ NullTest* nt = (NullTest*)node;
+
+ /*
+ * Only simple (Var IS NULL) expressions are supported for now. Maybe we
+ * could use examine_variable() to relax this?
+ */
+ if (! IsA(nt->arg, Var))
+ return true;
+
+ return mv_compatible_walker((Node*)(nt->arg), context);
+ }
+
if (IsA(node, Var))
{
Var * var = (Var*)node;
@@ -1031,7 +1374,7 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
/* unsupported structure (two variables or so) */
if (! ok)
return true;
-
+
/*
* If it's not a "<" or ">" or "=" operator, just ignore the clause.
* Otherwise note the relid and attnum for the variable. This uses the
@@ -1041,10 +1384,18 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
switch (get_oprrest(expr->opno))
{
case F_EQSEL:
-
/* equality conditions are compatible with all statistics */
break;
+ case F_SCALARLTSEL:
+ case F_SCALARGTSEL:
+
+ /* not compatible with functional dependencies */
+ if (! (context->types & MV_CLAUSE_TYPE_MCV))
+ return true; /* terminate */
+
+ break;
+
default:
/* unknown estimator */
@@ -1055,11 +1406,11 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
return mv_compatible_walker((Node *) var, context);
}
-
+
/* Node not explicitly supported, so terminate */
return true;
}
-
+
/*
* Determines whether the clause is compatible with multivariate stats,
* and if it is, returns some additional information - varno (index
@@ -1078,10 +1429,11 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
* evaluate them using multivariate stats.
*/
static bool
-clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum)
+clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums, int types)
{
mv_compatible_context context;
+ context.types = types;
context.varno = relid;
context.varattnos = NULL; /* no attnums */
@@ -1089,7 +1441,7 @@ clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum)
return false;
/* remember the newly collected attnums */
- *attnum = bms_singleton_member(context.varattnos);
+ *attnums = bms_add_members(*attnums, context.varattnos);
return true;
}
@@ -1394,24 +1746,39 @@ fdeps_filter_clauses(PlannerInfo *root,
foreach (lc, clauses)
{
- AttrNumber attnum;
+ Bitmapset *attnums = NULL;
Node *clause = (Node *) lfirst(lc);
- if (! clause_is_mv_compatible(clause, relid, &attnum))
+ if (! clause_is_mv_compatible(clause, relid, &attnums,
+ MV_CLAUSE_TYPE_FDEP))
/* clause incompatible with functional dependencies */
*reduced_clauses = lappend(*reduced_clauses, clause);
- else if (! bms_is_member(attnum, deps_attnums))
+ else if (bms_num_members(attnums) > 1)
+
+ /*
+ * clause referencing multiple attributes (strange; should
+ * this be handled by clause_is_mv_compatible directly?)
+ */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else if (! bms_is_member(bms_singleton_member(attnums), deps_attnums))
/* clause not covered by the dependencies */
*reduced_clauses = lappend(*reduced_clauses, clause);
else
{
+ /* ok, clause compatible with existing dependencies */
+ Assert(bms_num_members(attnums) == 1);
+
*deps_clauses = lappend(*deps_clauses, clause);
- clause_attnums = bms_add_member(clause_attnums, attnum);
+ clause_attnums = bms_add_member(clause_attnums,
+ bms_singleton_member(attnums));
}
+
+ bms_free(attnums);
}
return clause_attnums;
@@ -1637,6 +2004,9 @@ has_stats(List *stats, int type)
if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
return true;
+
+ if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
+ return true;
}
return false;
@@ -1652,3 +2022,392 @@ find_stats(PlannerInfo *root, Index relid)
return root->simple_rel_array[relid]->mvstatlist;
}
+
+/*
+ * Estimate selectivity of clauses using a MCV list.
+ *
+ * If there's no MCV list for the stats, the function returns 0.0.
+ *
+ * While computing the estimate, the function checks whether all the
+ * columns were matched with an equality condition. If that's the case,
+ * we can skip processing the histogram, as there can be no rows in
+ * it with the same values - all the rows matching the condition are
+ * represented by the MCV item. This can only happen with equality
+ * on all the attributes.
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all items as 'match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the items
+ * 4) skip items that are already 'no match'
+ * 5) check clause for items that still match
+ * 6) sum frequencies for items to get selectivity
+ *
+ * The function also returns the frequency of the least frequent item
+ * on the MCV list, which may be useful for clamping the estimate from the
+ * histogram (all items not present in the MCV list are less frequent).
+ * This however seems useful only for cases with conditions on all
+ * attributes.
+ *
+ * TODO This only handles AND-ed clauses, but it might work for OR-ed
+ * lists too - it just needs to reverse the logic a bit. I.e. start
+ * with 'no match' for all items, and mark the items as a match
+ * as the clauses are processed (and skip items that are 'match').
+ */
+static Selectivity
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats, bool *fullmatch,
+ Selectivity *lowsel)
+{
+ int i;
+ Selectivity s = 0.0;
+ Selectivity u = 0.0;
+
+ MCVList mcvlist = NULL;
+ int nmatches = 0;
+
+ /* match/mismatch bitmap for each MCV item */
+ char * matches = NULL;
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 2);
+
+ /* there's no MCV list built yet */
+ if (! mvstats->mcv_built)
+ return 0.0;
+
+ mcvlist = load_mv_mcvlist(mvstats->mvoid);
+
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* by default all the MCV items match the clauses fully */
+ matches = palloc0(sizeof(char) * mcvlist->nitems);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
+
+ /* number of matching MCV items */
+ nmatches = mcvlist->nitems;
+
+ nmatches = update_match_bitmap_mcvlist(root, clauses,
+ mvstats->stakeys, mcvlist,
+ nmatches, matches,
+ lowsel, fullmatch, false);
+
+ /* sum frequencies for all the matching MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* used to 'scale' for MCV lists not covering all tuples */
+ u += mcvlist->items[i]->frequency;
+
+ if (matches[i] != MVSTATS_MATCH_NONE)
+ s += mcvlist->items[i]->frequency;
+ }
+
+ pfree(matches);
+ pfree(mcvlist);
+
+ return s*u;
+}
+
+/*
+ * Evaluate clauses using the MCV list, and update the match bitmap.
+ *
+ * The bitmap may already be partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * TODO This works with 'bitmap' where each bit is represented as a char,
+ * which is slightly wasteful. Instead, we could use a regular
+ * bitmap, reducing the size to ~1/8. Another thing is merging the
+ * bitmaps using & and |, which might be faster than min/max.
+ */
+static int
+update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ Bitmapset *eqmatches = NULL; /* attributes with equality matches */
+
+ /* The bitmap may be partially built. */
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mcvlist->nitems);
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* No more matches possible (AND), or everything already matches (OR). */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ return nmatches;
+
+ /*
+ * find the lowest frequency in the MCV list
+ *
+ * We need to do that here, because we do various tricks in the following
+ * code - skipping items already ruled out, etc.
+ *
+ * XXX A loop is necessary because the MCV list is not sorted by frequency.
+ */
+ *lowsel = 1.0;
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = mcvlist->items[i];
+
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+ }
+
+ /*
+ * Loop through the list of clauses, and for each of them evaluate
+ * all the MCV items not yet eliminated by the preceding clauses.
+ */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* if there are no remaining matches possible, we can stop */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /* it's either an OpExpr, a NullTest, or an AND/OR clause */
+ if (is_opclause(clause))
+ {
+ OpExpr *expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+ FmgrInfo opproc;
+
+ /* get procedure computing operator selectivity */
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+
+ FmgrInfo gtproc;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_GT_OPR);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->gt_opr), &gtproc);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ bool mismatch = false;
+ MCVItem item = mcvlist->items[i];
+
+ /*
+ * If there are no more matches (AND) or no remaining unmatched
+ * items (OR), we can stop processing this clause.
+ */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /*
+ * For AND-lists, we can also mark NULL items as 'no match' (and
+ * then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && item->isnull[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /* skip MCV items that were already ruled out */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ switch (oprrest)
+ {
+ case F_EQSEL:
+ /*
+ * We don't care about isgt in equality, because it does not
+ * matter whether it's (var = const) or (const = var).
+ */
+ mismatch = ! DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ if (! mismatch)
+ eqmatches = bms_add_member(eqmatches, idx);
+
+ break;
+
+ case F_SCALARLTSEL: /* column < constant */
+ case F_SCALARGTSEL: /* column > constant */
+
+ /*
+ * Evaluate the operator with the constant on the left, i.e. as
+ * (const op value) - if that evaluates to true, the MCV item
+ * does not match the clause (var op const).
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ /* invert the result if isgt=true */
+ mismatch = (isgt) ? (! mismatch) : mismatch;
+ break;
+ }
+
+ /* XXX The conditions on matches[i] are not needed, as we
+ * skip MCV items that can't become true/false, depending
+ * on the current flag. See beginning of the loop over
+ * MCV items.
+ */
+
+ if ((is_or) && (matches[i] == MVSTATS_MATCH_NONE) && (! mismatch))
+ {
+ /* OR - was MATCH_NONE, but will be MATCH_FULL */
+ matches[i] = MVSTATS_MATCH_FULL;
+ ++nmatches;
+ continue;
+ }
+ else if ((! is_or) && (matches[i] == MVSTATS_MATCH_FULL) && mismatch)
+ {
+ /* AND - was MATCH_FULL, but will be MATCH_NONE */
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ continue;
+ }
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = mcvlist->items[i];
+
+ /* if there are no more matches, we can stop processing this clause */
+ if (nmatches == 0)
+ break;
+
+ /* skip MCV items that were already ruled out */
+ if (matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /* if the clause mismatches the MCV item, set it as MATCH_NONE */
+ if (((expr->nulltesttype == IS_NULL) && (! item->isnull[idx])) ||
+ ((expr->nulltesttype == IS_NOT_NULL) && (item->isnull[idx])))
+ {
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ }
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each MCV item */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching MCV items */
+ or_nmatches = mcvlist->nitems;
+
+ /* by default none of the MCV items matches the clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the OR-clauses */
+ or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ stakeys, mcvlist,
+ or_nmatches, or_matches,
+ lowsel, fullmatch, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ nmatches = 0;
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+
+ /* recount the matches, as the merge may have changed the bitmap */
+ if (matches[i] == MVSTATS_MATCH_FULL)
+ nmatches++;
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ {
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+ }
+
+ /*
+ * If all the columns were matched by equality, it's a full match.
+ * In this case at most one MCV item can match the clauses (if two
+ * items matched, they would have to be duplicates of each other).
+ */
+ *fullmatch = (bms_num_members(eqmatches) == mcvlist->ndimensions);
+
+ /* free the allocated pieces */
+ if (eqmatches)
+ pfree(eqmatches);
+
+ return nmatches;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index b9de71d..a92f889 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -416,7 +416,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built)
+ if (mvstat->deps_built || mvstat->mcv_built)
{
info = makeNode(MVStatisticInfo);
@@ -425,9 +425,11 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
+ info->mcv_enabled = mvstat->mcv_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
+ info->mcv_built = mvstat->mcv_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 099f1ed..f9bf10c 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o
+OBJS = common.o dependencies.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.mcv b/src/backend/utils/mvstats/README.mcv
new file mode 100644
index 0000000..e93cfe4
--- /dev/null
+++ b/src/backend/utils/mvstats/README.mcv
@@ -0,0 +1,137 @@
+MCV lists
+=========
+
+Multivariate MCV (most-common values) lists are a straightforward extension of
+regular MCV list, tracking most frequent combinations of values for a group of
+attributes.
+
+This works particularly well for columns with a small number of distinct values,
+as the list may include all the combinations and approximate the distribution
+very accurately.
+
+For columns with a large number of distinct values (e.g. those with continuous
+domains), the list will only track the most frequent combinations. If the
+distribution is mostly uniform (all combinations about equally frequent), the
+MCV list will be empty.
+
+Estimates of some clauses (e.g. equality) based on MCV lists are more accurate
+than when using histograms.
+
+Also, MCV lists don't necessarily require sorting of the values (the fact that
+we use sorting when building them is an implementation detail), but even more
+importantly the ordering is not built into the approximation (while histograms
+are built on ordering). So MCV lists work well even for attributes where the
+ordering of the data type is disconnected from the meaning of the data. For
+example we know how to sort strings, but it's unlikely to make much sense for
+city names (or other label-like attributes).
+
+
+Selectivity estimation
+----------------------
+
+The estimation, implemented in clauselist_mv_selectivity_mcvlist(), is quite
+simple in principle - we need to identify MCV items matching all the clauses
+and sum frequencies of all those items.
+
+Currently MCV lists support estimation of the following clause types:
+
+ (a) equality clauses WHERE (a = 1) AND (b = 2)
+ (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ (d) OR clauses WHERE (a < 1) OR (b >= 2)
+
+It's possible to add support for additional clauses, for example:
+
+ (e) multi-var clauses WHERE (a > b)
+
+and possibly others. These are tasks for the future, not yet implemented.
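+
+As a concrete (purely illustrative) example, with statistics on (a,b,c) a
+single MCV list may be used to estimate all the clauses in
+
+ WHERE (a = 1) AND ((b < 10) OR (c IS NOT NULL))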
+
+
+Estimating equality clauses
+---------------------------
+
+When computing selectivity estimate for equality clauses
+
+ (a = 1) AND (b = 2)
+
+we can compute the estimate almost exactly, provided two conditions are met:
+
+ (1) there's an equality condition on all attributes of the statistic
+
+ (2) we find a matching item in the MCV list
+
+In this case we know the MCV item represents all tuples matching the clauses,
+and the selectivity estimate is complete (i.e. we don't need to perform
+estimation using the histogram). This is what we call 'full match'.
+
+When only (1) holds, but there's no matching MCV item, we don't know whether
+there are no such rows or whether they are just not frequent enough. We can
+however use the frequency of the least frequent MCV item as an upper bound
+for the selectivity.
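+
+For example, if the least frequent MCV item has frequency 0.001, the
+selectivity of such a non-matching combination can be clamped to 0.001 -
+a more frequent combination would have made it into the list.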
+
+For a combination of equality conditions (not full-match case) we can clamp the
+selectivity by the minimum of selectivities for each condition. For example if
+we know the number of distinct values for each column, we can use 1/ndistinct
+as a per-column estimate. Or rather 1/ndistinct + selectivity derived from the
+MCV list.
+
+We should probably also use the 'residual ndistinct', i.e. exclude the items
+included in the MCV list (and likewise the residual frequency):
+
+ f = (1.0 - sum(MCV frequencies)) / (ndistinct - ndistinct(MCV list))
+
+but it's worth pointing out the ndistinct values here are multivariate, i.e.
+for the combination of columns referenced by the equality conditions.
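+
+For example (numbers purely illustrative), with an MCV list covering 80% of
+the rows and containing 100 items, and a multivariate ndistinct of 1100:
+
+ f = (1.0 - 0.8) / (1100 - 100) = 0.0002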
+
+Note: Only the "full match" limit is currently implemented.
+
+
+Hashed MCV (not yet implemented)
+--------------------------------
+
+Regular MCV lists have to include actual values for each item, so if those items
+are large the list may be quite large. This is especially true for multi-variate
+MCV lists, although the current implementation partially mitigates this by
+de-duplicating the values before storing them on disk.
+
+It's possible to only store hashes (32-bit values) instead of the actual values,
+significantly reducing the space requirements. Obviously, this would only make
+the MCV lists useful for estimating equality conditions (assuming the 32-bit
+hashes make the collisions rare enough).
+
+This might also complicate matching the columns to available stats.
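+
+A minimal sketch of how the hashing might look for a varlena value (using
+hash_any from access/hash.h; the function name is hypothetical):
+
+ /* hypothetical: reduce a varlena Datum to a 32-bit hash */
+ static uint32
+ mcv_hash_varlena(Datum value)
+ {
+ return DatumGetUInt32(hash_any((unsigned char *) VARDATA_ANY(value),
+ VARSIZE_ANY_EXHDR(value)));
+ }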
+
+
+TODO Consider implementing hashed MCV list, storing just 32-bit hashes instead
+ of the actual values. This type of MCV list will be useful only for
+ estimating equality clauses, and will reduce space requirements for large
+ varlena types (in such cases we usually only want equality anyway).
+
+TODO Currently there's no logic to consider building only a MCV list (and not
+ building the histogram at all), except for making this decision manually in
+ ADD STATISTICS.
+
+
+Inspecting the MCV list
+-----------------------
+
+Inspecting the regular (per-attribute) MCV lists is trivial, as it's enough
+to select the columns from pg_stats - the data is encoded as anyarrays, so we
+simply get the text representation of the arrays.
+
+With multivariate MCV lists it's not that simple due to the possible mix of
+data types. It might be possible to produce similar array-like representation,
+but that'd unnecessarily complicate further processing and analysis of the MCV
+list. Instead, there's a SRF function providing values, frequencies etc.
+
+ SELECT * FROM pg_mv_mcv_items(oid);
+
+It has a single input parameter:
+
+ oid - OID of the MCV list (pg_mv_statistic.staoid)
+
+and produces a table with these columns:
+
+ - item ID (0...nitems-1)
+ - values (string array)
+ - nulls only (boolean array)
+ - frequency (double precision)
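+
+For example (the OID value is hypothetical - look it up in pg_mv_statistic
+first):
+
+ SELECT * FROM pg_mv_mcv_items(16412);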
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index a38ea7b..5c5c59a 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -8,9 +8,50 @@ not true, resulting in estimation errors.
Multivariate stats track different types of dependencies between the columns,
hopefully improving the estimates.
-Currently we only have one kind of multivariate statistics - soft functional
-dependencies, and we use it to improve estimates of equality clauses. See
-README.dependencies for details.
+
+Types of statistics
+-------------------
+
+Currently there are two kinds of multivariate statistics:
+
+ (a) soft functional dependencies (README.dependencies)
+
+ (b) MCV lists (README.mcv)
+
+
+Compatible clause types
+-----------------------
+
+Each type of statistics may be used to estimate some subset of clause types.
+
+ (a) functional dependencies - equality clauses (AND), possibly IS NULL
+
+ (b) MCV list - equality and inequality clauses, IS [NOT] NULL, AND/OR
+
+Currently only simple operator clauses (Var op Const) are supported, but it's
+possible to support more complex clause types, e.g. (Var op Var).
+
+
+Complex clauses
+---------------
+
+We also support estimating more complex clauses - essentially AND/OR clauses
+with (Var op Const) as leaves, as long as all the referenced attributes are
+covered by a single statistics.
+
+For example this condition
+
+ (a=1) AND ((b=2) OR ((c=3) AND (d=4)))
+
+may be estimated using statistics on (a,b,c,d). If we only have statistics on
+(b,c,d) we may estimate the second part, and estimate (a=1) using simple stats.
+
+If we only have statistics on (a,b,c), we can't apply them at this point, but
+it's worth pointing out that clauselist_selectivity() works recursively - when
+handling the second part (the OR-clause), we'll be able to apply the statistics.
+
+Note: The multi-statistics estimation patch also makes it possible to pass some
+clauses as 'conditions' into the deeper parts of the expression tree.
Selectivity estimation
@@ -23,14 +64,48 @@ When estimating selectivity, we aim to achieve several things:
(b) minimize the overhead, especially when no suitable multivariate stats
exist (so if you are not using multivariate stats, there's no overhead)
-This clauselist_selectivity() performs several inexpensive checks first, before
+Thus clauselist_selectivity() performs several inexpensive checks first, before
even attempting to do the more expensive estimation.
(1) check if there are multivariate stats on the relation
- (2) check there are at least two attributes referenced by clauses compatible
- with multivariate statistics (equality clauses for func. dependencies)
+ (2) check that there are functional dependencies on the table, and that
+ there are at least two attributes referenced by compatible clauses
+ (equality clauses for func. dependencies)
(3) perform reduction of equality clauses using func. dependencies
- (4) estimate the reduced list of clauses using regular statistics
+ (4) check that there are multivariate MCV lists on the table, and that
+ there are at least two attributes referenced by compatible clauses
+ (equalities, inequalities, etc.)
+
+ (5) find the best multivariate statistics (matching the most conditions)
+ and use it to compute the estimate
+
+ (6) estimate the remaining clauses (not estimated using multivariate stats)
+ using the regular per-column statistics
+
+Whenever we find there are no suitable stats, we skip the expensive steps.
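+
+For example (hypothetical), with statistics on (a,b) and a query
+
+ WHERE (a = 1) AND (b = 2) AND (c < 10)
+
+the first two clauses are estimated using the multivariate MCV list in step
+(5), while (c < 10) falls through to step (6) and uses per-column statistics.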
+
+
+Further (possibly crazy) ideas
+------------------------------
+
+Currently the clauses are only estimated using a single statistics, even if
+there are multiple candidate statistics - for example assume we have statistics
+on (a,b,c) and (b,c,d), and estimate conditions
+
+ (b = 1) AND (c = 2)
+
+Then both statistics may be used, but we only use one of them. Maybe we could
+compute estimates using all the candidate stats, and somehow aggregate them
+into the final estimate by using average or median.
+
+Some stats may give better estimates than others, but it's very difficult to say
+in advance which stats are the best (it depends on the number of buckets, number
+of additional columns not referenced in the clauses, type of condition etc.).
+
+But of course, this may result in expensive estimation (CPU-wise).
+
+So we might add a GUC to choose between the simple (single statistics) and the
+multi-statistic estimation, possibly as a table-level parameter (ALTER TABLE ...).
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index bd200bc..d1da714 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -16,12 +16,14 @@
#include "common.h"
+#include "utils/array.h"
+
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
- int natts, VacAttrStats **vacattrstats);
+ int natts,
+ VacAttrStats **vacattrstats);
static List* list_mv_stats(Oid relid);
-
/*
* Compute requested multivariate stats, using the rows sampled for the
* plain (single-column) stats.
@@ -49,6 +51,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int j;
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
+ MCVList mcvlist = NULL;
+ int numrows_filtered = 0;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -87,8 +91,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ /* build the MCV list */
+ if (stat->mcv_enabled)
+ mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, attrs);
+ update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
}
}
@@ -166,6 +174,8 @@ list_mv_stats(Oid relid)
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
+ info->mcv_enabled = stats->mcv_enabled;
+ info->mcv_built = stats->mcv_built;
result = lappend(result, info);
}
@@ -180,8 +190,56 @@ list_mv_stats(Oid relid)
return result;
}
+
+/*
+ * Find attnums of MV stats using the mvoid.
+ */
+int2vector*
+find_mv_attnums(Oid mvoid, Oid *relid)
+{
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ HeapTuple htup;
+ int2vector *keys;
+
+ /* fetch the pg_mv_statistic tuple for the given mvoid */
+ htup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+ /* starelid */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_starelid, &isnull);
+ Assert(!isnull);
+
+ *relid = DatumGetObjectId(adatum);
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ keys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+ ReleaseSysCache(htup);
+
+ /* TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which served as an inspiration for this one). */
+
+ return keys;
+}
+
+
void
-update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+update_mv_stats(Oid mvoid,
+ MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -206,18 +264,29 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
= PointerGetDatum(serialize_mv_dependencies(dependencies));
}
+ if (mcvlist != NULL)
+ {
+ bytea * data = serialize_mv_mcvlist(mcvlist, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stamcv -1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+ replaces[Anum_pg_mv_statistic_stamcv -1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
@@ -246,6 +315,21 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
heap_close(sd, RowExclusiveLock);
}
+
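+/*
+ * Returns the index (dimension) of the attribute within the stakeys
+ * vector, by counting the entries preceding it. This assumes the
+ * vector is sorted and actually contains varattno.
+ */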
+int
+mv_get_index(AttrNumber varattno, int2vector * stakeys)
+{
+ int i, idx = 0;
+ for (i = 0; i < stakeys->dim1; i++)
+ {
+ if (stakeys->values[i] < varattno)
+ idx += 1;
+ else
+ break;
+ }
+ return idx;
+}
+
/* multi-variate stats comparator */
/*
@@ -256,11 +340,15 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
int
compare_scalars_simple(const void *a, const void *b, void *arg)
{
- Datum da = *(Datum*)a;
- Datum db = *(Datum*)b;
- SortSupport ssup= (SortSupport) arg;
+ return compare_datums_simple(*(Datum*)a,
+ *(Datum*)b,
+ (SortSupport)arg);
+}
- return ApplySortComparator(da, false, db, false, ssup);
+int
+compare_datums_simple(Datum a, Datum b, SortSupport ssup)
+{
+ return ApplySortComparator(a, false, b, false, ssup);
}
/*
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
index 6d5465b..f4309f7 100644
--- a/src/backend/utils/mvstats/common.h
+++ b/src/backend/utils/mvstats/common.h
@@ -46,7 +46,15 @@ typedef struct
Datum value; /* a data value */
int tupno; /* position index for tuple it came from */
} ScalarItem;
-
+
+/* (de)serialization info */
+typedef struct DimensionInfo {
+ int nvalues; /* number of deduplicated values */
+ int nbytes; /* number of bytes (serialized) */
+ int typlen; /* pg_type.typlen */
+ bool typbyval; /* pg_type.typbyval */
+} DimensionInfo;
+
/* multi-sort */
typedef struct MultiSortSupportData {
int ndims; /* number of dimensions supported by the */
@@ -71,5 +79,6 @@ int multi_sort_compare_dim(int dim, const SortItem *a,
const SortItem *b, MultiSortSupport mss);
/* comparators, used when constructing multivariate stats */
+int compare_datums_simple(Datum a, Datum b, SortSupport ssup);
int compare_scalars_simple(const void *a, const void *b, void *arg);
int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/mcv.c b/src/backend/utils/mvstats/mcv.c
new file mode 100644
index 0000000..551c934
--- /dev/null
+++ b/src/backend/utils/mvstats/mcv.c
@@ -0,0 +1,1094 @@
+/*-------------------------------------------------------------------------
+ *
+ * mcv.c
+ * POSTGRES multivariate MCV lists
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mcv.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+
+/*
+ * Each serialized item needs to store (in this order):
+ *
+ * - indexes (ndim * sizeof(uint16))
+ * - null flags (ndim * sizeof(bool))
+ * - frequency (sizeof(double))
+ *
+ * So in total:
+ *
+ * ndim * (sizeof(uint16) + sizeof(bool)) + sizeof(double)
+ */
+#define ITEM_SIZE(ndims) \
+ ((ndims) * (sizeof(uint16) + sizeof(bool)) + sizeof(double))
+
+/* pointers into a flat serialized item of ITEM_SIZE(n) bytes */
+#define ITEM_INDEXES(item) ((uint16*)item)
+#define ITEM_NULLS(item,ndims) ((bool*)(ITEM_INDEXES(item) + ndims))
+#define ITEM_FREQUENCY(item,ndims) ((double*)(ITEM_NULLS(item,ndims) + ndims))
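+
+/*
+ * For example, with ndims = 2 a serialized item is laid out like this
+ * (assuming the usual 1B bool and 8B double):
+ *
+ * bytes 0-3 two uint16 value indexes (one per dimension)
+ * bytes 4-5 two bool NULL flags
+ * bytes 6-13 the double frequency
+ *
+ * i.e. ITEM_SIZE(2) = 2 * (2 + 1) + 8 = 14 bytes.
+ */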
+
+/*
+ * Builds MCV list from sample rows, and removes rows represented by
+ * the MCV list from the sample (the number of remaining sample rows is
+ * returned by the numrows_filtered parameter).
+ *
+ * The method is quite simple - in short, it performs these steps:
+ *
+ * (1) sort the data (default collation, '<' for the data type)
+ *
+ * (2) count distinct groups, decide how many to keep
+ *
+ * (3) build the MCV list using the threshold determined in (2)
+ *
+ * (4) remove rows represented by the MCV from the sample
+ *
+ * For more details, see the comments in the code.
+ *
+ * FIXME Use max_mcv_items from ALTER TABLE ADD STATISTICS command.
+ *
+ * FIXME Single-dimensional MCV is sorted by frequency (descending). We
+ * should do that too, because when walking through the list we
+ * want to check the most frequent items first.
+ *
+ * TODO We're using Datum (8B), even for narrower data types (e.g. int4 or
+ * float4). Maybe we could save some space here, but the bytea
+ * compression should handle it just fine.
+ *
+ * TODO This probably should not use the ndistinct computed from the
+ * sample directly, but rather an estimate of the number of
+ * distinct values in the whole table, no?
+ */
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ int ndistinct = 0;
+ int mcv_threshold = 0;
+ int count = 0;
+ int nitems = 0;
+
+ MCVList mcvlist = NULL;
+
+ /* Sort by multiple columns (using array of SortSupport) */
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * Preallocate space for all the items as a single chunk, and point
+ * the items to the appropriate parts of the array.
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * numattrs);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * numattrs);
+
+ /* keep all the rows by default (as if there was no MCV list) */
+ *numrows_filtered = numrows;
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* load the values/null flags from sample rows */
+ for (j = 0; j < numrows; j++)
+ for (i = 0; i < numattrs; i++)
+ items[j].values[i] = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+
+ /* prepare the sort functions for all the attributes */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* do the sort, using the multi-sort */
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Count the number of distinct groups - just walk through the
+ * sorted list and count the number of key changes. We use this to
+ * determine the threshold (125% of the average frequency).
+ */
+ ndistinct = 1;
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ ndistinct += 1;
+
+ /*
+ * Determine how many groups actually exceed the threshold, and then
+ * walk the array again and collect them into an array. We'll always
+ * require at least 4 rows per group.
+ *
+ * But if we can fit all the distinct values in the MCV list (i.e.
+ * if there are fewer distinct groups than MVSTAT_MCVLIST_MAX_ITEMS),
+ * we'll require only 2 rows per group.
+ *
+ * TODO For now the threshold is the same as in the single-column
+ * case (average + 25%), but maybe that's worth revisiting
+ * for the multivariate case.
+ *
+ * TODO We can do this only if we believe we got all the distinct
+ * values of the table.
+ *
+ * FIXME This should really reference mcv_max_items (from catalog)
+ * instead of the constant MVSTAT_MCVLIST_MAX_ITEMS.
+ */
+ mcv_threshold = 1.25 * numrows / ndistinct;
+ mcv_threshold = (mcv_threshold < 4) ? 4 : mcv_threshold;
+
+ if (ndistinct <= MVSTAT_MCVLIST_MAX_ITEMS)
+ mcv_threshold = 2;
+
+ /*
+ * Walk through the sorted data again, and see how many groups
+ * reach the mcv_threshold (and become an item in the MCV list).
+ */
+ count = 1;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or new group, so check if we exceed mcv_threshold */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* group hits the threshold, count the group as MCV item */
+ if (count >= mcv_threshold)
+ nitems += 1;
+
+ count = 1;
+ }
+ else /* within group, so increase the number of items */
+ count += 1;
+ }
+
+ /* we know the number of MCV list items, so let's build the list */
+ if (nitems > 0)
+ {
+ /* allocate the MCV list structure, set parameters we know */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ mcvlist->magic = MVSTAT_MCV_MAGIC;
+ mcvlist->type = MVSTAT_MCV_TYPE_BASIC;
+ mcvlist->ndimensions = numattrs;
+ mcvlist->nitems = nitems;
+
+ /*
+ * Preallocate the Datum/isnull arrays (not as a single chunk, as
+ * we'll pass this outside this method and thus it needs to be
+ * easy to pfree() the data - we wouldn't know where the arrays
+ * start otherwise).
+ *
+ * TODO Maybe the reasoning that we can't allocate a single
+ * piece because we're passing it out is bogus? Who'd
+ * free a single item of the MCV list, anyway?
+ *
+ * TODO Maybe with a proper encoding (stuffing all the values
+ * into a list-level array), this will no longer be true?
+ */
+ mcvlist->items = (MCVItem*)palloc0(sizeof(MCVItem)*nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ mcvlist->items[i] = (MCVItem)palloc0(sizeof(MCVItemData));
+ mcvlist->items[i]->values = (Datum*)palloc0(sizeof(Datum)*numattrs);
+ mcvlist->items[i]->isnull = (bool*)palloc0(sizeof(bool)*numattrs);
+ }
+
+ /*
+ * Repeat the same loop as above, but this time copy the data
+ * into the MCV list (for items exceeding the threshold).
+ *
+ * TODO Maybe we could simply remember indexes of the last item
+ * in each group (from the previous loop)?
+ */
+ count = 1;
+ nitems = 0;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or a new group */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* count the MCV item if exceeding the threshold (and copy into the array) */
+ if (count >= mcv_threshold)
+ {
+ /* just pointer to the proper place in the list */
+ MCVItem item = mcvlist->items[nitems];
+
+ /* copy values from the _previous_ group (its last item) */
+ memcpy(item->values, items[(i-1)].values, sizeof(Datum) * numattrs);
+ memcpy(item->isnull, items[(i-1)].isnull, sizeof(bool) * numattrs);
+
+ /* and finally the group frequency */
+ item->frequency = (double)count / numrows;
+
+ /* next item */
+ nitems += 1;
+ }
+
+ count = 1;
+ }
+ else /* same group, just increase the number of items */
+ count += 1;
+ }
+
+ /* make sure the loops are consistent */
+ Assert(nitems == mcvlist->nitems);
+
+ /*
+ * Remove the rows matching the MCV list (i.e. keep only rows
+ * that are not represented by the MCV list).
+ *
+ * FIXME This implementation is rather naive, effectively O(N^2).
+ * As the MCV list grows, the check will take longer and
+ * longer. And as the number of sampled rows increases (by
+ * increasing statistics target), it will take longer and
+ * longer. One option is to sort the MCV items first and
+ * then perform a binary search.
+ *
+ * A better option would be keeping the ID of the row in
+ * the sort item, and then just walk through the items and
+ * mark rows to remove (in a bitmap of the same size).
+ * There's not space for that in SortItem at this moment,
+ * but it's trivial to add 'private' pointer, or just
+ * using another structure with extra field (starting with
+ * SortItem, so that the comparators etc. still work).
+ *
+ * Another option is to use the sorted array of items
+ * (because that's how we sorted the source data), and
+ * simply do a bsearch() into it. If we find a matching
+ * item, the row belongs to the MCV list.
+ */
+ if (nitems == ndistinct) /* all rows are covered by MCV items */
+ *numrows_filtered = 0;
+ else /* (nitems < ndistinct) && (nitems > 0) */
+ {
+ int nfiltered = 0;
+ HeapTuple *rows_filtered = (HeapTuple*)palloc0(sizeof(HeapTuple) * numrows);
+
+ /* used for the searches */
+ SortItem item, mcvitem;
+
+ item.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ item.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /*
+ * FIXME we don't need to allocate this, we can reference
+ * the MCV item directly ...
+ */
+ mcvitem.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ mcvitem.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* walk through the tuples, compare the values to MCV items */
+ for (i = 0; i < numrows; i++)
+ {
+ bool match = false;
+
+ /* collect the key values from the row */
+ for (j = 0; j < numattrs; j++)
+ item.values[j] = heap_getattr(rows[i], attrs->values[j],
+ stats[j]->tupDesc, &item.isnull[j]);
+
+ /* scan through the MCV list for matches */
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ /*
+ * TODO Create a SortItem/MCVItem comparator so that
+ * we don't need to do memcpy() like crazy.
+ */
+ memcpy(mcvitem.values, mcvlist->items[j]->values,
+ numattrs * sizeof(Datum));
+ memcpy(mcvitem.isnull, mcvlist->items[j]->isnull,
+ numattrs * sizeof(bool));
+
+ if (multi_sort_compare(&item, &mcvitem, mss) == 0)
+ {
+ match = true;
+ break;
+ }
+ }
+
+ /* if no match in the MCV list, copy the row into the filtered ones */
+ if (! match)
+ memcpy(&rows_filtered[nfiltered++], &rows[i], sizeof(HeapTuple));
+ }
+
+ /* replace the rows and remember how many rows we kept */
+ memcpy(rows, rows_filtered, sizeof(HeapTuple) * nfiltered);
+ *numrows_filtered = nfiltered;
+
+ /* free all the data used here */
+ pfree(rows_filtered);
+ pfree(item.values);
+ pfree(item.isnull);
+ pfree(mcvitem.values);
+ pfree(mcvitem.isnull);
+ }
+ }
+
+ pfree(values);
+ pfree(items);
+ pfree(isnull);
+
+ return mcvlist;
+}
+
+
+/* fetch the MCV list (as a bytea) from the pg_mv_statistic catalog */
+MCVList
+load_mv_mcvlist(Oid mvoid)
+{
+ bool isnull = false;
+ Datum mcvlist;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* fetch the pg_mv_statistic tuple for the given mvoid */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->mcv_enabled && mvstat->mcv_built);
+#endif
+
+ mcvlist = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stamcv, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_mcvlist(DatumGetByteaP(mcvlist));
+}
+
+/* print some basic info about the MCV list
+ *
+ * TODO Add info about what part of the table this covers.
+ */
+Datum
+pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MCVList mcvlist = deserialize_mv_mcvlist(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nitems=%d", mcvlist->nitems);
+
+ pfree(mcvlist);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* used to pass context into bsearch() */
+static SortSupport ssup_private = NULL;
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Serialize MCV list into a bytea value. The basic algorithm is simple:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ * (a) collect all (non-NULL) attribute values from all MCV items
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all MCV list items
+ * (a) replace values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, because we're mixing
+ * different datatypes, and we don't know what equality means for them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ * We'll use uint16 values for the indexes in step (3), as we don't
+ * allow more than 8k MCV items (see list max_mcv_items). We might
+ * increase this to 65k and still fit into uint16.
+ *
+ * We don't really expect compression as high as with histograms,
+ * because we're not doing any bucket splits etc. (which is the source
+ * of high redundancy there), but we need to do it anyway as we need
+ * to serialize varlena values etc. We might invent another way to
+ * serialize MCV lists, but let's keep it consistent.
+ *
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
+bytea *
+serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i, j;
+ int ndims = mcvlist->ndimensions;
+ int itemsize = ITEM_SIZE(ndims);
+
+ Size total_length = 0;
+
+ char *item = palloc0(itemsize);
+
+ /* serialized items (indexes into arrays, etc.) */
+ bytea *output;
+ char *data = NULL;
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /* allocate space for all values, including NULLs (won't use them) */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * mcvlist->nitems);
+
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ if (! mcvlist->items[j]->isnull[i]) /* skip NULL values */
+ {
+ values[i][counts[i]] = mcvlist->items[j]->values[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* do not exceed UINT16_MAX */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typbyval || (info[i].typlen > 0))
+ /* passed by value, or by reference with a fixed length */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so simply strlen */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j]));
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * MCV list, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nitems (4B)
+ * - info (ndim * sizeof(DimensionInfo)
+ * - arrays of values for each dimension
+ * - serialized items (nitems * itemsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data.
+ */
+ total_length = (sizeof(int32) + offsetof(MCVListData, items)
+ + ndims * sizeof(DimensionInfo)
+ + mcvlist->nitems * itemsize);
+
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 1MB */
+ if (total_length > 1024 * 1024)
+ elog(ERROR, "serialized MCV exceeds 1MB (%ld)", total_length);
+
+ /* allocate space for the serialized MCV list, set header fields */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of where to write next */
+ data = VARDATA(output);
+
+ memcpy(data, mcvlist, offsetof(MCVListData, items));
+ data += offsetof(MCVListData, items);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - itemsize);
+
+ /* reset the values for each item */
+ memset(item, 0, itemsize);
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! mcvlist->items[i]->isnull[j])
+ {
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ v = (Datum*)bsearch(&mcvlist->items[i]->values[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ ITEM_INDEXES(item)[j] = (v - values[j]);
+
+ /* check the index is within expected bounds */
+ Assert(ITEM_INDEXES(item)[j] >= 0);
+ Assert(ITEM_INDEXES(item)[j] < info[j].nvalues);
+ }
+ }
+
+ /* copy NULL and frequency flags into the item */
+ memcpy(ITEM_NULLS(item, ndims),
+ mcvlist->items[i]->isnull, sizeof(bool) * ndims);
+ memcpy(ITEM_FREQUENCY(item, ndims),
+ &mcvlist->items[i]->frequency, sizeof(double));
+
+ /* copy the item into the array */
+ memcpy(data, item, itemsize);
+
+ data += itemsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ return output;
+}
+
+/*
+ * Inverse to serialize_mv_mcvlist() - see the comment there.
+ *
+ * We'll do full deserialization, because we don't really expect high
+ * duplication of values, so caching would not be as efficient as with
+ * histograms.
+ */
+MCVList
+deserialize_mv_mcvlist(bytea * data)
+{
+ int i, j;
+ Size expected_size;
+ MCVList mcvlist;
+ char *tmp;
+
+ int ndims, nitems, itemsize;
+ DimensionInfo *info = NULL;
+
+ uint16 *indexes = NULL;
+ Datum **values = NULL;
+
+ /* local allocation buffer (used only for deserialization) */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ /* buffer used for the result */
+ int rbufflen;
+ char *rbuff;
+ char *rptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MCVListData,items))
+ elog(ERROR, "invalid MCV Size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MCVListData,items));
+
+ /* read the MCV list header */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(mcvlist, tmp, offsetof(MCVListData,items));
+ tmp += offsetof(MCVListData,items);
+
+ if (mcvlist->magic != MVSTAT_MCV_MAGIC)
+ elog(ERROR, "invalid MCV magic %d (expected %dd)",
+ mcvlist->magic, MVSTAT_MCV_MAGIC);
+
+ if (mcvlist->type != MVSTAT_MCV_TYPE_BASIC)
+ elog(ERROR, "invalid MCV type %d (expected %dd)",
+ mcvlist->type, MVSTAT_MCV_TYPE_BASIC);
+
+ nitems = mcvlist->nitems;
+ ndims = mcvlist->ndimensions;
+ itemsize = ITEM_SIZE(ndims);
+
+ Assert(nitems > 0);
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Compute the expected size for those parameters (it's incomplete
+ * at this point, as we have yet to add the sizes of the value
+ * arrays from the DimensionInfo records).
+ */
+ expected_size = offsetof(MCVListData,items) +
+ ndims * sizeof(DimensionInfo) +
+ (nitems * itemsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /*
+ * We'll allocate one large chunk of memory for the intermediate
+ * data, needed only for deserializing the MCV list, and we'll use
+ * a local dense allocation to minimize the palloc overhead.
+ *
+ * Let's see how much space we'll actually need, and also include
+ * space for the array with pointers.
+ */
+ bufflen = sizeof(Datum*) * ndims; /* space for pointers */
+
+ for (i = 0; i < ndims; i++)
+ /* for full-size byval types, we reuse the serialized value */
+ if (! (info[i].typbyval && info[i].typlen == sizeof(Datum)))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+
+ buff = palloc0(bufflen);
+ ptr = buff;
+
+ values = (Datum**)buff;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mcv_list(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. This needs to
+ * copy the pieces.
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum - simply reuse the array */
+ if (info[i].typlen == sizeof(Datum))
+ {
+ values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value into the local array */
+ memcpy(&values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ }
+ else
+ {
+ /* all the varlena data need a chunk from the buffer */
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ /* passed by reference, but fixed length (name, tid, ...) */
+ if (info[i].typlen > 0)
+ {
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ /* we should exhaust the buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ /* allocate space for the MCV items in a single piece */
+ rbufflen = (sizeof(MCVItem) + sizeof(MCVItemData) +
+ sizeof(Datum)*ndims + sizeof(bool)*ndims) * nitems;
+
+ rbuff = palloc(rbufflen);
+ rptr = rbuff;
+
+ mcvlist->items = (MCVItem*)rbuff;
+ rptr += (sizeof(MCVItem) * nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ MCVItem item = (MCVItem)rptr;
+ rptr += (sizeof(MCVItemData));
+
+ item->values = (Datum*)rptr;
+ rptr += (sizeof(Datum)*ndims);
+
+ item->isnull = (bool*)rptr;
+ rptr += (sizeof(bool) *ndims);
+
+ /* just point to the right place */
+ indexes = ITEM_INDEXES(tmp);
+
+ memcpy(item->isnull, ITEM_NULLS(tmp, ndims), sizeof(bool) * ndims);
+ memcpy(&item->frequency, ITEM_FREQUENCY(tmp, ndims), sizeof(double));
+
+#ifdef USE_ASSERT_CHECKING
+ for (j = 0; j < ndims; j++)
+ Assert(indexes[j] <= UINT16_MAX);
+#endif
+
+ /* translate the values */
+ for (j = 0; j < ndims; j++)
+ if (! item->isnull[j])
+ item->values[j] = values[j][indexes[j]];
+
+ mcvlist->items[i] = item;
+
+ tmp += ITEM_SIZE(ndims);
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+ }
+
+ /* check that we processed all the data */
+ Assert(tmp == (char*)data + VARSIZE_ANY(data));
+
+ /* release the temporary buffer */
+ pfree(buff);
+
+ return mcvlist;
+}
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
+
+/*
+ * SRF with details about the items of a MCV list:
+ *
+ * - item ID (0...nitems-1)
+ * - values (string array)
+ * - nulls only (boolean array)
+ * - frequency (double precision)
+ *
+ * The input is the OID of the statistics, and no rows are returned
+ * if the statistics contains no MCV list.
+ */
+PG_FUNCTION_INFO_V1(pg_mv_mcv_items);
+
+Datum
+pg_mv_mcv_items(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MCVList mcvlist;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ mcvlist = load_mv_mcvlist(PG_GETARG_OID(0));
+
+ funcctx->user_fctx = mcvlist;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = mcvlist->nitems;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples
+ * from raw C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MCVList mcvlist;
+ MCVItem item;
+
+ mcvlist = (MCVList)funcctx->user_fctx;
+
+ Assert(call_cntr < mcvlist->nitems);
+
+ item = mcvlist->items[call_cntr];
+
+ stakeys = find_mv_attnums(PG_GETARG_OID(0), &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple.
+ * This should be an array of C strings which will
+ * be processed later by the type input functions.
+ */
+ values = (char **) palloc(4 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+
+ /* frequency */
+ values[3] = (char *) palloc(64 * sizeof(char));
+
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * mcvlist->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * mcvlist->ndimensions);
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* item ID */
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ Datum val, valout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == mcvlist->ndimensions-1)
+ format = "%s, %s}";
+
+ val = item->values[i];
+ valout = FunctionCall1(&fmgrinfo[i], val);
+
+ snprintf(buff, 1024, format, values[1], DatumGetPointer(valout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], item->isnull[i] ? "t" : "f");
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[3], 64, "%f", item->frequency); /* frequency */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 4f106c3..6339631 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2109,8 +2109,9 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
- " deps_enabled,\n"
- " deps_built,\n"
+ " deps_enabled, mcv_enabled,\n"
+ " deps_built, mcv_built,\n"
+ " mcv_max_items,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
@@ -2128,6 +2129,8 @@ describeOneTableDetails(const char *schemaname,
printTableAddFooter(&cont, _("Statistics:"));
for (i = 0; i < tuples; i++)
{
+ bool first = true;
+
printfPQExpBuffer(&buf, " ");
/* statistics name (qualified with namespace) */
@@ -2137,10 +2140,22 @@ describeOneTableDetails(const char *schemaname,
/* options */
if (!strcmp(PQgetvalue(result, i, 4), "t"))
- appendPQExpBuffer(&buf, "(dependencies)");
+ {
+ appendPQExpBuffer(&buf, "(dependencies");
+ first = false;
+ }
+
+ if (!strcmp(PQgetvalue(result, i, 5), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", mcv");
+ else
+ appendPQExpBuffer(&buf, "(mcv");
+ first = false;
+ }
- appendPQExpBuffer(&buf, " ON (%s)",
- PQgetvalue(result, i, 6));
+ appendPQExpBuffer(&buf, ") ON (%s)",
+ PQgetvalue(result, i, 9));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index a568a07..fd7107d 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -37,15 +37,21 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
+ bool mcv_enabled; /* build MCV list? */
+
+ /* MCV size */
+ int32 mcv_max_items; /* max MCV items */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
+ bool mcv_built; /* MCV list was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
+ bytea stamcv; /* MCV list (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -61,13 +67,17 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 7
+#define Natts_pg_mv_statistic 11
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
#define Anum_pg_mv_statistic_deps_enabled 4
-#define Anum_pg_mv_statistic_deps_built 5
-#define Anum_pg_mv_statistic_stakeys 6
-#define Anum_pg_mv_statistic_stadeps 7
+#define Anum_pg_mv_statistic_mcv_enabled 5
+#define Anum_pg_mv_statistic_mcv_max_items 6
+#define Anum_pg_mv_statistic_deps_built 7
+#define Anum_pg_mv_statistic_mcv_built 8
+#define Anum_pg_mv_statistic_stakeys 9
+#define Anum_pg_mv_statistic_stadeps 10
+#define Anum_pg_mv_statistic_stamcv 11
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 20d565c..66b4bcd 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2670,6 +2670,10 @@ DATA(insert OID = 3998 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0
DESCR("multivariate stats: functional dependencies info");
DATA(insert OID = 3999 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
DESCR("multivariate stats: functional dependencies show");
+DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_mcvlist_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: MCV list info");
+DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
+DESCR("details about MCV list items");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index de86d01..5ae6b3c 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -619,9 +619,11 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
+ bool mcv_enabled; /* MCV list enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
+ bool mcv_built; /* MCV list built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index cc43a79..4535db7 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -51,30 +51,89 @@ typedef MVDependenciesData* MVDependencies;
#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
/*
+ * Multivariate MCV (most-common value) lists
+ *
+ * A straightforward extension of MCV items - i.e. a list (array) of
+ * combinations of attribute values, together with a frequency and
+ * null flags.
+ */
+typedef struct MCVItemData {
+ double frequency; /* frequency of this combination */
+ bool *isnull; /* flags of NULL values (up to 32 columns) */
+ Datum *values; /* variable-length (ndimensions) */
+} MCVItemData;
+
+typedef MCVItemData *MCVItem;
+
+/* multivariate MCV list - essentially an array of MCV items */
+typedef struct MCVListData {
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of MCV list (BASIC) */
+ uint32 ndimensions; /* number of dimensions */
+ uint32 nitems; /* number of MCV items in the array */
+ MCVItem *items; /* array of MCV items */
+} MCVListData;
+
+typedef MCVListData *MCVList;
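+
+/*
+ * A minimal sketch of inspecting a deserialized MCV list (assuming
+ * 'data' holds the bytea fetched from the stamcv catalog column):
+ *
+ * MCVList mcvlist = deserialize_mv_mcvlist(data);
+ * for (i = 0; i < mcvlist->nitems; i++)
+ * {
+ * MCVItem item = mcvlist->items[i];
+ * ... use item->frequency, item->values[j], item->isnull[j] ...
+ * }
+ */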
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_MCV_MAGIC 0xE1A651C2 /* marks serialized bytea */
+#define MVSTAT_MCV_TYPE_BASIC 1 /* basic MCV list type */
+
+/*
+ * Limits used for mcv_max_items option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_MCVLIST_MIN_ITEMS, and we cannot
+ * have more than MVSTAT_MCVLIST_MAX_ITEMS items.
+ *
+ * This is just a boundary for the 'max' threshold - the actual list
+ * may of course contain fewer items than MVSTAT_MCVLIST_MIN_ITEMS.
+ */
+#define MVSTAT_MCVLIST_MIN_ITEMS 128 /* min items in MCV list */
+#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
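+
+/*
+ * A usage sketch (with a hypothetical table 't'), as exercised by the
+ * regression tests:
+ *
+ * CREATE STATISTICS s1 ON t (a, b, c) WITH (mcv, max_mcv_items = 1024);
+ */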
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
*/
MVDependencies load_mv_dependencies(Oid mvoid);
+MCVList load_mv_mcvlist(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
+bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
+MCVList deserialize_mv_mcvlist(bytea * data);
+
+/*
+ * Returns index of the attribute number within the vector (i.e. a
+ * dimension within the stats).
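+ *
+ * For example, with stakeys = (2, 5, 7), mv_get_index(5, stakeys) is
+ * expected to return 1 (the second dimension, 0-based).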
+ */
+int mv_get_index(AttrNumber varattno, int2vector * stakeys);
+
+int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
/* FIXME this probably belongs somewhere else (not to operations stats) */
extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_mcv_items(PG_FUNCTION_ARGS);
MVDependencies
-build_mv_dependencies(int numrows, HeapTuple *rows,
- int2vector *attrs,
- VacAttrStats **stats);
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats);
+
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered);
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
- int natts, VacAttrStats **vacattrstats);
+ int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, int2vector *attrs);
+void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats);
#endif
diff --git a/src/test/regress/expected/mv_mcv.out b/src/test/regress/expected/mv_mcv.out
new file mode 100644
index 0000000..56748e3
--- /dev/null
+++ b/src/test/regress/expected/mv_mcv.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s1 ON mcv_list (unknown_column) WITH (mcv);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s1 ON mcv_list (a) WITH (mcv);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s1 ON mcv_list (a, a) WITH (mcv);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON mcv_list (a, a, b) WITH (mcv);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing MCV statistics
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (dependencies, max_mcv_items=200);
+ERROR: option 'mcv' is required by other options(s)
+-- invalid mcv_max_items value / too low
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10);
+ERROR: max number of MCV items must be at least 128
+-- invalid mcv_max_items value / too high
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10000);
+ERROR: max number of MCV items is 8192
+-- correct command
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s2 ON mcv_list (a, b, c) WITH (mcv);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s3 ON mcv_list (a, b, c, d) WITH (mcv);
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1200
+(1 row)
+
+DROP TABLE mcv_list;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 84b4425..66071d8 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1373,7 +1373,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
s.staname,
s.stakeys AS attnums,
length(s.stadeps) AS depsbytes,
- pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
+ length(s.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 4f2ffb8..85d94f1 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,4 +112,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies
+test: mv_dependencies mv_mcv
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 097a04f..6584d73 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -163,3 +163,4 @@ test: xml
test: event_trigger
test: stats
test: mv_dependencies
+test: mv_mcv
diff --git a/src/test/regress/sql/mv_mcv.sql b/src/test/regress/sql/mv_mcv.sql
new file mode 100644
index 0000000..af4c9f4
--- /dev/null
+++ b/src/test/regress/sql/mv_mcv.sql
@@ -0,0 +1,178 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s1 ON mcv_list (unknown_column) WITH (mcv);
+
+-- single column
+CREATE STATISTICS s1 ON mcv_list (a) WITH (mcv);
+
+-- single column, duplicated
+CREATE STATISTICS s1 ON mcv_list (a, a) WITH (mcv);
+
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON mcv_list (a, a, b) WITH (mcv);
+
+-- unknown option
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (unknown_option);
+
+-- missing MCV statistics
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (dependencies, max_mcv_items=200);
+
+-- invalid mcv_max_items value / too low
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10);
+
+-- invalid mcv_max_items value / too high
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10000);
+
+-- correct command
+CREATE STATISTICS s1 ON mcv_list (a, b, c) WITH (mcv);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+
+DROP TABLE mcv_list;
+
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s2 ON mcv_list (a, b, c) WITH (mcv);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mcv_list;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s3 ON mcv_list (a, b, c, d) WITH (mcv);
+
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+DROP TABLE mcv_list;
--
2.1.0
0005-multivariate-histograms.patch
From 31ff6cd36727d73e72aaa5fa1a0c52da460dae5b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 20:18:24 +0100
Subject: [PATCH 5/9] multivariate histograms
- extends the pg_mv_statistic catalog (add 'hist' fields)
- building the histograms during ANALYZE
- simple estimation while planning the queries
Includes regression tests mostly equal to those for functional
dependencies / MCV lists.
---
doc/src/sgml/ref/create_statistics.sgml | 18 +
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/statscmds.c | 44 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 571 +++++++-
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/README.histogram | 287 ++++
src/backend/utils/mvstats/README.stats | 2 +
src/backend/utils/mvstats/common.c | 37 +-
src/backend/utils/mvstats/histogram.c | 2032 ++++++++++++++++++++++++++++
src/bin/psql/describe.c | 17 +-
src/include/catalog/pg_mv_statistic.h | 24 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 136 +-
src/test/regress/expected/mv_histogram.out | 207 +++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_histogram.sql | 176 +++
21 files changed, 3538 insertions(+), 38 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.histogram
create mode 100644 src/backend/utils/mvstats/histogram.c
create mode 100644 src/test/regress/expected/mv_histogram.out
create mode 100644 src/test/regress/sql/mv_histogram.sql
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index 193e4b0..fd3382e 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -133,6 +133,24 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</varlistentry>
<varlistentry>
+ <term><literal>histogram</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables histogram for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>max_buckets</> (<type>integer</>)</term>
+ <listitem>
+ <para>
+ Maximum number of histogram buckets.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>max_mcv_items</> (<type>integer</>)</term>
<listitem>
<para>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2d570ee..6afdee0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -167,7 +167,9 @@ CREATE VIEW pg_mv_stats AS
length(S.stadeps) as depsbytes,
pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
length(S.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo,
+ length(S.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 90bfaed..b974655 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -137,12 +137,15 @@ CreateStatistics(CreateStatsStmt *stmt)
/* by default build nothing */
bool build_dependencies = false,
- build_mcv = false;
+ build_mcv = false,
+ build_histogram = false;
- int32 max_mcv_items = -1;
+ int32 max_buckets = -1,
+ max_mcv_items = -1;
/* options required because of other options */
- bool require_mcv = false;
+ bool require_mcv = false,
+ require_histogram = false;
Assert(IsA(stmt, CreateStatsStmt));
@@ -241,6 +244,29 @@ CreateStatistics(CreateStatsStmt *stmt)
MVSTAT_MCVLIST_MAX_ITEMS)));
}
+ else if (strcmp(opt->defname, "histogram") == 0)
+ build_histogram = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_buckets") == 0)
+ {
+ max_buckets = defGetInt32(opt);
+
+ /* this option requires 'histogram' to be enabled */
+ require_histogram = true;
+
+ /* sanity check */
+ if (max_buckets < MVSTAT_HIST_MIN_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is %d",
+ MVSTAT_HIST_MIN_BUCKETS)));
+
+ else if (max_buckets > MVSTAT_HIST_MAX_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("maximum number of buckets is %d",
+ MVSTAT_HIST_MAX_BUCKETS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -249,10 +275,10 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv))
+ if (! (build_dependencies || build_mcv || build_histogram))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -260,6 +286,11 @@ CreateStatistics(CreateStatsStmt *stmt)
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("option 'mcv' is required by other options(s)")));
+ if (require_histogram && (! build_histogram))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option 'histogram' is required by other options(s)")));
+
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
stakeys = buildint2vector(attnums, numcols);
@@ -279,11 +310,14 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+ values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
+ values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
+ nulls[Anum_pg_mv_statistic_stahist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e3983fd..d3a96f0 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1978,10 +1978,12 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
WRITE_BOOL_FIELD(mcv_enabled);
+ WRITE_BOOL_FIELD(hist_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
WRITE_BOOL_FIELD(mcv_built);
+ WRITE_BOOL_FIELD(hist_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 977f88e..0de2418 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -49,6 +49,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
+#define MV_CLAUSE_TYPE_HIST 0x04
static bool clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums,
int type);
@@ -74,6 +75,8 @@ static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
List *clauses, MVStatisticInfo *mvstats,
bool *fullmatch, Selectivity *lowsel);
+static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -81,6 +84,12 @@ static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
Selectivity *lowsel, bool *fullmatch,
bool is_or);
+static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, Index relid);
@@ -93,6 +102,7 @@ static List * find_stats(PlannerInfo *root, Index relid);
#define UPDATE_RESULT(m,r,isor) \
(m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -121,7 +131,7 @@ static List * find_stats(PlannerInfo *root, Index relid);
*
* First we try to reduce the list of clauses by applying (soft) functional
* dependencies, and then we try to estimate the selectivity of the reduced
- * list of clauses using the multivariate MCV list.
+ * list of clauses using the multivariate MCV list and histograms.
*
* Finally we remove the portion of clauses estimated using multivariate stats,
* and process the rest of the clauses using the regular per-column stats.
@@ -214,11 +224,13 @@ clauselist_selectivity(PlannerInfo *root,
* with the multivariate code and simply skip to estimation using the
* regular per-column stats.
*/
- if (has_stats(stats, MV_CLAUSE_TYPE_MCV) &&
- (count_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV) >= 2))
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) &&
+ (count_mv_attnums(clauses, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2))
{
/* collect attributes from the compatible conditions */
- Bitmapset *mvattnums = collect_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV);
+ Bitmapset *mvattnums = collect_mv_attnums(clauses, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
/* and search for the statistic covering the most attributes */
MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
@@ -230,7 +242,7 @@ clauselist_selectivity(PlannerInfo *root,
/* split the clauselist into regular and mv-clauses */
clauses = clauselist_mv_split(root, relid, clauses, &mvclauses,
- mvstat, MV_CLAUSE_TYPE_MCV);
+ mvstat, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
/* we've chosen the histogram to match the clauses */
Assert(mvclauses != NIL);
@@ -942,6 +954,7 @@ static Selectivity
clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
{
bool fullmatch = false;
+ Selectivity s1 = 0.0, s2 = 0.0;
/*
* Lowest frequency in the MCV list (may be used as an upper bound
@@ -955,9 +968,24 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
* MCV/histogram evaluation).
*/
- /* Evaluate the MCV selectivity */
- return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ /* Evaluate the MCV first. */
+ s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
&fullmatch, &mcv_low);
+
+ /*
+ * If we got a full equality match on the MCV list, we're done (and
+ * the estimate is pretty good).
+ */
+ if (fullmatch && (s1 > 0.0))
+ return s1;
+
+ /* TODO if (fullmatch) without matching MCV item, use the mcv_low
+ * selectivity as upper bound */
+
+ s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+
+ /* TODO clamp to <= 1.0 (or more strictly, when possible) */
+ return s1 + s2;
}
/*
@@ -1160,7 +1188,7 @@ choose_mv_statistics(List *stats, Bitmapset *attnums)
int numattrs = attrs->dim1;
/* skip dependencies-only stats */
- if (! info->mcv_built)
+ if (! (info->mcv_built || info->hist_built))
continue;
/* count columns covered by the histogram */
@@ -1391,7 +1419,7 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
case F_SCALARGTSEL:
/* not compatible with functional dependencies */
- if (! (context->types & MV_CLAUSE_TYPE_MCV))
+ if (! (context->types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST)))
return true; /* terminate */
break;
@@ -2007,6 +2035,9 @@ has_stats(List *stats, int type)
if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
return true;
+
+ if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
+ return true;
}
return false;
@@ -2411,3 +2442,525 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
return nmatches;
}
+
+/*
+ * Estimate selectivity of clauses using a histogram.
+ *
+ * If there's no histogram for the stats, the function returns 0.0.
+ *
+ * The general idea of this method is similar to how MCV lists are
+ * processed, except that this introduces the concept of a partial
+ * match (MCV only works with full match / mismatch).
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all buckets as 'full match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the buckets
+ * 4) skip buckets that are already 'no match'
+ * 5) check clause for buckets that still match (at least partially)
+ * 6) sum frequencies for buckets to get selectivity
+ *
+ * Unlike MCV lists, histograms have a concept of a partial match. In
+ * that case we count 1/2 of the bucket frequency, to minimize the
+ * average error. The MV histograms are usually less detailed than the
+ * per-column ones, meaning the sum is often quite high (thanks to
+ * combining a lot of "partially hit" buckets).
+ *
+ * Maybe we could use per-bucket information with number of distinct
+ * values it contains (for each dimension), and then use that to correct
+ * the estimate (so with 10 distinct values, we'd use 1/10 of the bucket
+ * frequency). We might also scale the value depending on the actual
+ * ndistinct estimate (not just the values observed in the sample).
+ *
+ * Another option would be to multiply the selectivities, i.e. if we get
+ * 'partial match' for a bucket for multiple conditions, we might use
+ * 0.5^k (where k is the number of conditions), instead of 0.5. This
+ * probably does not minimize the average error, though.
+ *
+ * TODO This might use a similar shortcut to MCV lists - count buckets
+ * marked as partial/full match, and terminate once this drops to 0.
+ * Not sure if it's really worth it - for MCV lists a situation like
+ * this is not uncommon, but for histograms it's not that clear.
+ */
+static Selectivity
+clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats)
+{
+ int i;
+ Selectivity s = 0.0;
+ Selectivity u = 0.0;
+
+ int nmatches = 0;
+ char *matches = NULL;
+
+ MVSerializedHistogram mvhist = NULL;
+
+ /* there's no histogram */
+ if (! mvstats->hist_built)
+ return 0.0;
+
+ /* load the histogram (the hist_built check above guarantees it exists) */
+ mvhist = load_mv_histogram(mvstats->mvoid);
+
+ Assert (mvhist != NULL);
+ Assert (clauses != NIL);
+ Assert (list_length(clauses) >= 2);
+
+ /*
+ * Bitmap of bucket matches (mismatch, partial, full). By default
+ * all buckets fully match, and the clauses gradually eliminate them.
+ */
+ matches = palloc0(sizeof(char) * mvhist->nbuckets);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+
+ nmatches = mvhist->nbuckets;
+
+ /* build the match bitmap */
+ update_match_bitmap_histogram(root, clauses,
+ mvstats->stakeys, mvhist,
+ nmatches, matches, false);
+
+ /* now, walk through the buckets and sum the selectivities */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * Find out what part of the data is covered by the histogram,
+ * so that we can 'scale' the selectivity properly (e.g. when
+ * only 50% of the sample got into the histogram, and the rest
+ * is in a MCV list).
+ *
+ * TODO This might be handled by keeping a global "frequency"
+ * for the whole histogram, which might save us some time
+ * spent accessing the not-matching part of the histogram.
+ * Although it's likely in a cache, so it's very fast.
+ */
+ u += mvhist->buckets[i]->ntuples;
+
+ if (matches[i] == MVSTATS_MATCH_FULL)
+ s += mvhist->buckets[i]->ntuples;
+ else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+ s += 0.5 * mvhist->buckets[i]->ntuples;
+ }
+
+#ifdef DEBUG_MVHIST
+ debug_histogram_matches(mvhist, matches);
+#endif
+
+ /* release the allocated bitmap and deserialized histogram */
+ pfree(matches);
+ pfree(mvhist);
+
+ return s * u;
+}
+
+/* cached result of bucket boundary comparison for a single dimension */
+
+#define HIST_CACHE_NOT_FOUND 0x00
+#define HIST_CACHE_FALSE 0x01
+#define HIST_CACHE_TRUE 0x03
+#define HIST_CACHE_MASK 0x02
+
+static char
+bucket_contains_value(FmgrInfo ltproc, Datum constvalue,
+ Datum min_value, Datum max_value,
+ int min_index, int max_index,
+ bool min_include, bool max_include,
+ char * callcache)
+{
+ bool a, b;
+
+ char min_cached = callcache[min_index];
+ char max_cached = callcache[max_index];
+
+ /*
+ * First some quick checks on equality - if either (inclusive) boundary
+ * equals the constant, we have a partial match (so no need to call the
+ * comparator).
+ */
+ if (((min_value == constvalue) && (min_include)) ||
+ ((max_value == constvalue) && (max_include)))
+ return MVSTATS_MATCH_PARTIAL;
+
+ /* Keep the values 0/1 because of the XOR at the end. */
+ a = ((min_cached & HIST_CACHE_MASK) >> 1);
+ b = ((max_cached & HIST_CACHE_MASK) >> 1);
+
+ /*
+ * If the result for the bucket lower bound is not in the cache, evaluate
+ * the function and store the result in the cache.
+ */
+ if (! min_cached)
+ {
+ a = DatumGetBool(FunctionCall2Coll(&ltproc,
+ DEFAULT_COLLATION_OID,
+ constvalue, min_value));
+ /* remember the result */
+ callcache[min_index] = (a) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ /* And do the same for the upper bound. */
+ if (! max_cached)
+ {
+ b = DatumGetBool(FunctionCall2Coll(&ltproc,
+ DEFAULT_COLLATION_OID,
+ constvalue, max_value));
+ /* remember the result */
+ callcache[max_index] = (b) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
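+ /*
+ * For example, for a bucket [10, 20] and constant 15, (15 < 10) is
+ * false while (15 < 20) is true, so the XOR yields a partial match.
+ * For constant 30 both comparisons are false, so the bucket can't
+ * match at all.
+ */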
+ return (a ^ b) ? MVSTATS_MATCH_PARTIAL : MVSTATS_MATCH_NONE;
+}
+
+static char
+bucket_is_smaller_than_value(FmgrInfo opproc, Datum constvalue,
+ Datum min_value, Datum max_value,
+ int min_index, int max_index,
+ bool min_include, bool max_include,
+ char * callcache, bool isgt)
+{
+ char min_cached = callcache[min_index];
+ char max_cached = callcache[max_index];
+
+ /* Keep the values 0/1 because of the XOR at the end. */
+ bool a = ((min_cached & HIST_CACHE_MASK) >> 1);
+ bool b = ((max_cached & HIST_CACHE_MASK) >> 1);
+
+ if (! min_cached)
+ {
+ a = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ min_value,
+ constvalue));
+ /* remember the result */
+ callcache[min_index] = (a) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ if (! max_cached)
+ {
+ b = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ max_value,
+ constvalue));
+ /* remember the result */
+ callcache[max_index] = (b) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ /*
+ * Now, we need to combine both results into the final answer, and we need
+ * to be careful about the 'isgt' flag, which effectively inverts the meaning.
+ *
+ * First, we handle the case when each boundary returns different results.
+ * In that case the outcome can only be 'partial' match.
+ */
+ if (a != b)
+ return MVSTATS_MATCH_PARTIAL;
+
+ /*
+ * When the results are the same, then it depends on the 'isgt' value. There
+ * are four options:
+ *
+ * isgt=false a=b=true => full match
+ * isgt=false a=b=false => empty
+ * isgt=true a=b=true => empty
+ * isgt=true a=b=false => full match
+ *
+ * We can cheat a bit - since we know that (a == b), we just use 'a'.
+ */
+ if (isgt)
+ return (!a) ? MVSTATS_MATCH_FULL : MVSTATS_MATCH_NONE;
+ else
+ return ( a) ? MVSTATS_MATCH_FULL : MVSTATS_MATCH_NONE;
+}
+
+/*
+ * Evaluate clauses using the histogram, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * Note: This is not a simple bitmap in the sense that there are more
+ * than two possible values for each item - no match, partial
+ * match and full match. So we need 2 bits per item.
+ *
+ * TODO This works with 'bitmap' where each item is represented as a
+ * char, which is slightly wasteful. Instead, we could use a bitmap
+ * with 2 bits per item, reducing the size to ~1/4. By using values
+ * 0, 1 and 3 (instead of 0, 1 and 2), the operations (merging etc.)
+ * might be performed just like for simple bitmap by using & and |,
+ * which might be faster than min/max.
+ */
+static int
+update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ /*
+ * Used for caching function calls, only once per deduplicated value.
+ *
+ * We know may have up to (2 * nbuckets) values per dimension. It's
+ * probably overkill, but let's allocate that once for all clauses,
+ * to minimize overhead.
+ *
+ * Also, we only need two bits per value, but this allocates byte
+ * per value. Might be worth optimizing.
+ *
+ * 0x00 - not yet called
+ * 0x01 - called, result is 'false'
+ * 0x03 - called, result is 'true'
+ */
+ char *callcache = palloc(2 * mvhist->nbuckets);
+
+ Assert(mvhist != NULL);
+ Assert(mvhist->nbuckets > 0);
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mvhist->nbuckets);
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+
+ /* loop through the clauses and do the estimation */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* it's either OpClause, or NullTest */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ FmgrInfo opproc; /* operator */
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ /* reset the cache (per clause) */
+ memset(callcache, 0, 2 * mvhist->nbuckets);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+ FmgrInfo ltproc;
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_LT_OPR);
+
+ /* lookup dimension for the attribute */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), &ltproc);
+
+ /*
+ * Check this for all buckets that still have "true" in the bitmap
+ *
+ * We already know the clauses use suitable operators (because that's
+ * how we filtered them).
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ char res = MVSTATS_MATCH_NONE;
+
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /* histogram boundaries */
+ Datum minval, maxval;
+ bool mininclude, maxinclude;
+ int minidx, maxidx;
+
+ /*
+ * For AND-lists, we can also mark NULL buckets as 'no match'
+ * (and then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && bucket->nullsonly[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match).
+ * We can't really do anything about the MATCH_PARTIAL buckets.
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* lookup the values and cache of function calls */
+ minidx = bucket->min[idx];
+ maxidx = bucket->max[idx];
+
+ minval = mvhist->values[idx][bucket->min[idx]];
+ maxval = mvhist->values[idx][bucket->max[idx]];
+
+ mininclude = bucket->min_inclusive[idx];
+ maxinclude = bucket->max_inclusive[idx];
+
+ /*
+ * TODO Maybe it's possible to add here a similar optimization
+ * as for the MCV lists:
+ *
+ * (nmatches == 0) && AND-list => all eliminated (FALSE)
+ * (nmatches == N) && OR-list => all eliminated (TRUE)
+ *
+ * But it's more complex because of the partial matches.
+ */
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ *
+ * TODO I'm really unsure whether the handling of the 'isgt' flag (that is,
+ * clauses with reverse order of variable/constant) is correct. I wouldn't
+ * be surprised if there was some mixup. Using the lt/gt operators
+ * instead of messing with the opproc could make it simpler.
+ * It would however be using a different operator than the query,
+ * although it's not any shadier than using the selectivity function
+ * as is done currently.
+ */
+ switch (oprrest)
+ {
+ case F_SCALARLTSEL: /* Var < Const */
+ case F_SCALARGTSEL: /* Var > Const */
+
+ res = bucket_is_smaller_than_value(opproc, cst->constvalue,
+ minval, maxval,
+ minidx, maxidx,
+ mininclude, maxinclude,
+ callcache, isgt);
+ break;
+
+ case F_EQSEL:
+
+ /*
+ * We only check whether the value is within the bucket, using the
+ * lt operator, and we also check for equality with the boundaries.
+ */
+
+ res = bucket_contains_value(ltproc, cst->constvalue,
+ minval, maxval,
+ minidx, maxidx,
+ mininclude, maxinclude,
+ callcache);
+ break;
+ }
+
+ UPDATE_RESULT(matches[i], res, is_or);
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the buckets and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining buckets that might possibly match.
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match).
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* if the clause mismatches the bucket, set it as MATCH_NONE */
+ if ((expr->nulltesttype == IS_NULL)
+ && (! bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+
+ else if ((expr->nulltesttype == IS_NOT_NULL) &&
+ (bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each bucket */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching buckets */
+ or_nmatches = mvhist->nbuckets;
+
+ /* by default none of the buckets matches the clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the OR-clauses */
+ or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ stakeys, mvhist,
+ or_nmatches, or_matches, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
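+ * E.g. AND-merging a full match with a partial match yields a partial
+ * match (MIN), while OR-merging them yields the full match (MAX).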
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+
+ /* free the call cache */
+ pfree(callcache);
+
+ return nmatches;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index a92f889..d46aed2 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -416,7 +416,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built || mvstat->mcv_built)
+ if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built)
{
info = makeNode(MVStatisticInfo);
@@ -426,10 +426,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
info->mcv_enabled = mvstat->mcv_enabled;
+ info->hist_enabled = mvstat->hist_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
info->mcv_built = mvstat->mcv_built;
+ info->hist_built = mvstat->hist_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index f9bf10c..9dbb3b6 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o mcv.o
+OBJS = common.o dependencies.o histogram.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.histogram b/src/backend/utils/mvstats/README.histogram
new file mode 100644
index 0000000..8234d2c
--- /dev/null
+++ b/src/backend/utils/mvstats/README.histogram
@@ -0,0 +1,287 @@
+Multivariate histograms
+=======================
+
+Histograms on individual attributes consist of buckets represented by ranges,
+covering the domain of the attribute. That is, each bucket is a [min,max]
+interval, and contains all values in this range. The histogram is built in such
+a way that all buckets have about the same frequency.
+
+Multivariate histograms are an extension into n-dimensional space - the buckets
+are n-dimensional intervals (i.e. n-dimensional rectagles), covering the domain
+of the combination of attributes. That is, each bucket has a vector of lower
+and upper boundaries, denoted min[i] and max[i] (where i = 1..n).
+
+In addition to the boundaries, each bucket tracks additional info:
+
+ * frequency (fraction of tuples in the bucket)
+ * whether the boundaries are inclusive or exclusive
+ * whether the dimension contains only NULL values
+ * number of distinct values in each dimension (for building only)
+
+It's possible that in the future we'll multiple histogram types, with different
+features. We do however expect all the types to share the same representation
+(buckets as ranges) and only differ in how we build them.
+
+The current implementation builds non-overlapping buckets, but that may not be
+true for other histogram types, so the code should not rely on this assumption.
+There are interesting types of histograms (and algorithms) with overlapping
+buckets.
+
+When used on low-cardinality data, histograms usually perform considerably worse
+than MCV lists (which are a good fit for this kind of data). This is especially
+true for label-like values, where the ordering of the values is mostly
+unrelated to the meaning of the data, while proper ordering is crucial for
+histograms.
+
+On high-cardinality data the histograms are usually a better choice, because MCV
+lists can't represent the distribution accurately enough.
+
+
+Selectivity estimation
+----------------------
+
+The estimation is implemented in clauselist_mv_selectivity_histogram(), and
+works very similarly to clauselist_mv_selectivity_mcvlist().
+
+The main difference is that while MCV lists support exact matches, histograms
+often result in approximate matches - e.g. for equality we can only say whether
+the constant falls into the bucket's range, not whether it's actually present
+or what fraction of the bucket it represents. In this case we rely on
+some defaults just like in the per-column histograms.
+
+The current implementation uses histograms to estimate these types of clauses
+(think of WHERE conditions):
+
+ (a) equality clauses WHERE (a = 1) AND (b = 2)
+ (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ (d) OR-clauses WHERE (a = 1) OR (b = 2)
+
+Similarly to MCV lists, it's possible to add support for additional types of
+clauses, for example:
+
+ (e) multi-var clauses WHERE (a > b)
+
+and so on. These are tasks for the future, not yet implemented.
+
+
+When evaluating a clause on a bucket, we may get one of three results:
+
+ (a) FULL_MATCH - The bucket definitely matches the clause.
+
+ (b) PARTIAL_MATCH - The bucket matches the clause, but not necessarily all
+ the tuples it represents.
+
+ (c) NO_MATCH - The bucket definitely does not match the clause.
+
+This may be illustrated using a range [1, 5], which is essentially a 1-D bucket.
+With a few example clauses:
+
+ WHERE (a < 10) => FULL_MATCH (all range values are below
+ 10, so the whole bucket matches)
+
+ WHERE (a < 3) => PARTIAL_MATCH (there may be values matching
+ the clause, but we don't know how many)
+
+ WHERE (a < 0) => NO_MATCH (the whole range is above 0, so
+ no values from the bucket can match)
+
+Some clauses can produce only a subset of these results - for example equality
+clauses never produce FULL_MATCH, as we always hit only part of the bucket
+(we can't match both boundaries at the same time). This results in less accurate
+estimates compared to MCV lists, where we can hit an MCV item exactly (there's
+no PARTIAL match in MCV).
+
+There are also clauses that may not produce any PARTIAL_MATCH results. A nice
+example is the 'IS [NOT] NULL' clause, which either matches the bucket
+completely (FULL_MATCH) or not at all (NO_MATCH), thanks to how the NULL-buckets
+are constructed.
+
+Computing the total selectivity estimate is trivial - simply sum selectivities
+from all the FULL_MATCH and PARTIAL_MATCH buckets (but for buckets marked with
+PARTIAL_MATCH, multiply the frequency by 0.5 to minimize the average error).
+
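+A minimal sketch of that final summation (the function name and the
+MVSTATS_MATCH_PARTIAL constant are illustrative assumptions, not
+necessarily the identifiers used by the patch):
+
+    static Selectivity
+    histogram_selectivity(MVSerializedHistogram mvhist, char *matches)
+    {
+        int         i;
+        Selectivity s = 0.0;
+
+        for (i = 0; i < mvhist->nbuckets; i++)
+        {
+            if (matches[i] == MVSTATS_MATCH_FULL)
+                s += mvhist->buckets[i]->ntuples;
+            else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+                s += 0.5 * mvhist->buckets[i]->ntuples;
+        }
+
+        return s;
+    }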
+
+Building a histogram
+---------------------
+
+The algorithm for building a histogram is, in general, quite simple:
+
+ (a) create an initial bucket (containing all sample rows)
+
+ (b) create NULL buckets (by splitting the initial bucket)
+
+ (c) repeat
+
+ (1) choose bucket to split next
+
+ (2) terminate if no bucket that can be split is found, or if we've
+ reached the maximum number of buckets (16384)
+
+ (3) choose dimension to partition the bucket by
+
+ (4) partition the bucket by the selected dimension
+
+The main complexity is hidden in steps (c.1) and (c.3), i.e. how we choose the
+bucket and dimension for the split.
+
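+In code, step (c) is essentially this loop (a slightly simplified version
+of the loop in build_mv_histogram):
+
+    while (histogram->nbuckets < MVSTAT_HIST_MAX_BUCKETS)
+    {
+        MVBucket bucket = select_bucket_to_partition(histogram->nbuckets,
+                                                     histogram->buckets);
+
+        /* no bucket eligible for a split, terminate */
+        if (bucket == NULL)
+            break;
+
+        histogram->buckets[histogram->nbuckets++]
+            = partition_bucket(bucket, attrs, stats,
+                               ndistvalues, distvalues);
+    }
+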
+Similarly to one-dimensional histograms, we want to produce buckets with roughly
+the same frequency. We also need to produce "regular" buckets, because buckets
+with one "side" much longer than the others are very likely to match a lot of
+conditions (which increases error, even if the bucket frequency is very low).
+
+To achieve this, we choose the largest bucket (containing the most sample rows),
+but we only choose buckets that can actually be split (have at least 3 different
+combinations of values).
+
+Then we choose the "longest" dimension of the bucket, with length measured by
+the number of distinct values in the sample.
+
+For details see functions select_bucket_to_partition() and partition_bucket().
+
+The current limit on the number of buckets (16384) is mostly arbitrary, but chosen
+so that it guarantees we don't exceed the number of distinct values indexable by
+uint16 in any of the dimensions. In practice we could handle more buckets as we
+index each dimension separately and the splits should use the dimensions evenly.
+
+Also, histograms this large (with 16k values in multiple dimensions) would be
+quite expensive to build and process, so the 16k limit is rather reasonable.
+
+The actual number of buckets is also related to statistics target, because we
+require MIN_BUCKET_ROWS (10) tuples per bucket before a split, so we can't have
+more than (2 * 300 * target / 10) buckets. For the default target (100) this
+evaluates to ~6k.
+
+
+NULL handling (create_null_buckets)
+-----------------------------------
+
+When building histograms on a single attribute, we first filter out NULL values.
+In the multivariate case, we can't really do that because the rows may contain
+a mix of NULL and non-NULL values in different columns (so we can't simply
+filter all of them out).
+
+For this reason, the histograms are built so that in each bucket, every
+dimension contains either only NULL or only non-NULL values. Building the
+NULL-buckets happens as the first step of the build, in the
+create_null_buckets() function. The number of NULL buckets produced by this
+function has a clear upper bound of 2^N, where N is the number of dimensions
+(attributes the histogram is built on) - or rather 2^K, where K is the number
+of attributes not marked as NOT NULL. For example with N=2 attributes (a,b)
+there may be up to four such buckets: (NULL, NULL), (NULL, non-NULL),
+(non-NULL, NULL) and (non-NULL, non-NULL).
+
+The buckets with NULL dimensions are then subject to the same build algorithm
+(i.e. may be split into smaller buckets) just like any other bucket, but may
+only be split by a non-NULL dimension.
+
+
+Serialization
+-------------
+
+To store the histogram in the pg_mv_statistic catalog, it is serialized into a
+more efficient form. We also use this serialized representation during
+estimation, i.e. we don't fully deserialize the histogram.
+
+For example the boundary values are deduplicated to minimize the required space.
+How much redundancy is there, actually? Let's assume there are no NULL values,
+so we start with a single bucket - in that case we have 2*N boundaries. Each
+time we split a bucket we introduce one new value (in the "middle" of one of
+the dimensions), and keep boundaries for all the other dimensions. So after K
+splits, we have up to
+
+ 2*N + K
+
+unique boundary values (we may have fewer values, if the same value is used for
+several splits). But after K splits we do have (K+1) buckets, so
+
+ (K+1) * 2 * N
+
+boundary values. Using e.g. N=4 and K=999, we arrive at these numbers:
+
+ 2*N + K = 1007
+ (K+1) * 2 * N = 8000
+
+which means a lot of redundancy. It's somewhat counter-intuitive that the number
+of distinct values does not really depend on the number of dimensions (except
+for the initial bucket, but that's negligible compared to the total).
+
+By deduplicating the values and replacing them with 16-bit indexes (uint16), we
+reduce the required space to
+
+ 1007 * 8 + 8000 * 2 ~= 24kB
+
+which is significantly less than 64kB required for the 'raw' histogram (assuming
+the values are 8B).
+
+While the bytea compression (pglz) might achieve the same reduction of space,
+the deduplicated representation is used to optimize the estimation by caching
+results of function calls for already visited values. This significantly
+reduces the number of calls to (often quite expensive) operators.
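+
+For example (an illustrative sketch, not the actual code - opproc, constval,
+dim and idx stand for pieces of the surrounding estimation code): because
+each boundary is a uint16 index into the deduplicated array, the result of
+evaluating an operator on a boundary value can be cached in an array indexed
+the same way:
+
+    /* one slot per deduplicated value; 0 = not evaluated, 1 = true, 2 = false */
+    char *cache = palloc0(mvhist->nvalues[dim]);
+
+    if (cache[idx] == 0)
+        cache[idx] = DatumGetBool(FunctionCall2Coll(&opproc,
+                                      DEFAULT_COLLATION_OID,
+                                      mvhist->values[dim][idx],
+                                      constval)) ? 1 : 2;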
+
+Note: Of course, this reasoning only holds for histograms built by the algorithm
+that simply splits the buckets in half. Other histograms types (e.g. containing
+overlapping buckets) may behave differently and require different serialization.
+
+Serialized histograms are marked with 'magic' constant, to make it easier to
+check the bytea value really is a serialized histogram.
+
+
+varlena compression
+-------------------
+
+This serialization may however disable automatic varlena compression, because
+the array of unique values is placed at the beginning of the serialized form.
+That is exactly the chunk pglz examines to decide whether the data is
+compressible, and it will probably conclude it's not very compressible. This is
+similar to the issue we initially had with JSONB.
+
+Maybe storing buckets first would make it work, as the buckets may be better
+compressible.
+
+On the other hand the serialization is actually a context-aware compression,
+usually compressing to ~30% (or even less, with large data types). So the lack
+of additional pglz compression may be acceptable.
+
+
+Deserialization
+---------------
+
+The deserialization is not a perfect inverse of the serialization, as we keep
+the deduplicated arrays. This reduces the amount of memory and also allows
+optimizations during estimation (e.g. we can cache results for the distinct
+values, saving expensive function calls).
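+
+So to get the actual lower boundary of a bucket in dimension 'dim', a single
+extra array lookup is needed (using the names from histogram.c):
+
+    Datum minval = histogram->values[dim][bucket->min[dim]];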
+
+
+Inspecting the histogram
+------------------------
+
+Inspecting the regular (per-attribute) histograms is trivial, as it's enough
+to select the columns from pg_stats - the data is encoded as anyarray, so we
+simply get the text representation of the array.
+
+With multivariate histograms it's not that simple due to the possible mix of
+data types in the histogram. It might be possible to produce similar array-like
+text representation, but that'd unnecessarily complicate further processing
+and analysis of the histogram. Instead, there's a SRF function that allows
+access to lower/upper boundaries, frequencies etc.
+
+ SELECT * FROM pg_mv_histogram_buckets(oid, otype);
+
+It has two input parameters:
+
+ oid - OID of the histogram (pg_mv_statistic.staoid)
+ otype - type of output
+
+and produces a table with these columns:
+
+ - bucket ID (0...nbuckets-1)
+ - lower bucket boundaries (string array)
+ - upper bucket boundaries (string array)
+ - nulls only dimensions (boolean array)
+ - lower boundary inclusive (boolean array)
+ - upper boundary inclusive (boolean array)
+ - frequency (double precision)
+
+The 'otype' accepts three values, determining what will be returned in the
+lower/upper boundary arrays:
+
+ - 0 - values stored in the histogram, encoded as text
+ - 1 - indexes into the deduplicated arrays
+ - 2 - indexes into the deduplicated arrays, scaled to [0,1]
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index 5c5c59a..3e4f4d1 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -18,6 +18,8 @@ Currently we only have two kinds of multivariate statistics
(b) MCV lists (README.mcv)
+ (c) multivariate histograms (README.histogram)
+
Compatible clause types
-----------------------
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index d1da714..ffb76f4 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -13,11 +13,11 @@
*
*-------------------------------------------------------------------------
*/
+#include "postgres.h"
+#include "utils/array.h"
#include "common.h"
-#include "utils/array.h"
-
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
int natts,
VacAttrStats **vacattrstats);
@@ -52,7 +52,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
- int numrows_filtered = 0;
+ MVHistogram histogram = NULL;
+ int numrows_filtered = numrows;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -95,8 +96,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+ /* build a multivariate histogram on the columns */
+ if ((numrows_filtered > 0) && (stat->hist_enabled))
+ histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
+ update_mv_stats(stat->mvoid, deps, mcvlist, histogram, attrs, stats);
}
}
@@ -176,6 +181,8 @@ list_mv_stats(Oid relid)
info->deps_built = stats->deps_built;
info->mcv_enabled = stats->mcv_enabled;
info->mcv_built = stats->mcv_built;
+ info->hist_enabled = stats->hist_enabled;
+ info->hist_built = stats->hist_built;
result = lappend(result, info);
}
@@ -190,7 +197,6 @@ list_mv_stats(Oid relid)
return result;
}
-
/*
* Find attnims of MV stats using the mvoid.
*/
@@ -236,9 +242,16 @@ find_mv_attnums(Oid mvoid, Oid *relid)
}
+/*
+ * FIXME This adds statistics, but we need to drop statistics when the
+ * table is dropped. Not sure what to do when a column is dropped.
+ * Either we can (a) remove all stats on that column, (b) remove
+ * the column from defined stats and force rebuild, (c) remove the
+ * column on next ANALYZE. Or maybe something else?
+ */
void
update_mv_stats(Oid mvoid,
- MVDependencies dependencies, MCVList mcvlist,
+ MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
@@ -271,22 +284,34 @@ update_mv_stats(Oid mvoid,
values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
}
+ if (histogram != NULL)
+ {
+ bytea * data = serialize_mv_histogram(histogram, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stahist-1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stahist - 1]
+ = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
+ replaces[Anum_pg_mv_statistic_stahist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
+ nulls[Anum_pg_mv_statistic_hist_built-1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_hist_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
+ values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
diff --git a/src/backend/utils/mvstats/histogram.c b/src/backend/utils/mvstats/histogram.c
new file mode 100644
index 0000000..9e5620a
--- /dev/null
+++ b/src/backend/utils/mvstats/histogram.c
@@ -0,0 +1,2032 @@
+/*-------------------------------------------------------------------------
+ *
+ * histogram.c
+ * POSTGRES multivariate histograms
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/histogram.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+#include <math.h>
+
+
+static MVBucket create_initial_mv_bucket(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+static MVBucket select_bucket_to_partition(int nbuckets, MVBucket * buckets);
+
+static MVBucket partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues);
+
+static MVBucket copy_mv_bucket(MVBucket bucket, uint32 ndimensions);
+
+static void update_bucket_ndistinct(MVBucket bucket, int2vector *attrs,
+ VacAttrStats ** stats);
+
+static void update_dimension_ndistinct(MVBucket bucket, int dimension,
+ int2vector *attrs,
+ VacAttrStats ** stats,
+ bool update_boundaries);
+
+static void create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats);
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Each serialized bucket needs to store (in this order):
+ *
+ * - number of tuples (float)
+ * - min inclusive flags (ndim * sizeof(bool))
+ * - max inclusive flags (ndim * sizeof(bool))
+ * - null dimension flags (ndim * sizeof(bool))
+ * - min boundary indexes (ndim * sizeof(uint16))
+ * - max boundary indexes (ndim * sizeof(uint16))
+ *
+ * So in total:
+ *
+ * ndim * (2 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float)
+ */
+#define BUCKET_SIZE(ndims) \
+ (ndims * (2 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float))
+
+/* pointers into a flat serialized bucket of BUCKET_SIZE(n) bytes */
+#define BUCKET_NTUPLES(b) ((float*)b)
+#define BUCKET_MIN_INCL(b,n) ((bool*)(b + sizeof(float)))
+#define BUCKET_MAX_INCL(b,n) (BUCKET_MIN_INCL(b,n) + n)
+#define BUCKET_NULLS_ONLY(b,n) (BUCKET_MAX_INCL(b,n) + n)
+#define BUCKET_MIN_INDEXES(b,n) ((uint16*)(BUCKET_NULLS_ONLY(b,n) + n))
+#define BUCKET_MAX_INDEXES(b,n) ((BUCKET_MIN_INDEXES(b,n) + n))
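+
+/*
+ * For example, with ndims=2 a serialized bucket has this layout:
+ *
+ * float  ntuples       (bucket frequency)
+ * bool   min_incl[2]   (min boundaries inclusive?)
+ * bool   max_incl[2]   (max boundaries inclusive?)
+ * bool   nulls_only[2] (NULL-only dimensions?)
+ * uint16 min_idx[2]    (indexes of min boundaries)
+ * uint16 max_idx[2]    (indexes of max boundaries)
+ */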
+
+/* can't split bucket with less than 10 rows */
+#define MIN_BUCKET_ROWS 10
+
+/*
+ * Data used while building the histogram.
+ */
+typedef struct HistogramBuildData {
+
+ float ndistinct; /* frequency of distinct values */
+
+ HeapTuple *rows; /* array of sample rows */
+ uint32 numrows; /* number of sample rows (array size) */
+
+ /*
+ * Number of distinct values in each dimension. This is used when
+ * building the histogram (and is not serialized/deserialized).
+ */
+ uint32 *ndistincts;
+
+} HistogramBuildData;
+
+typedef HistogramBuildData *HistogramBuild;
+
+/*
+ * Building a multivariate histogram. In short, we first create a single
+ * bucket containing all the rows, and then repeatedly split it, each time
+ * searching for the bucket / dimension most in need of a split.
+ *
+ * The current criterion is rather simple, chosen so that the algorithm
+ * produces buckets of about equal frequency and regular size.
+ *
+ * See the discussion at select_bucket_to_partition and partition_bucket
+ * for more details about the algorithm.
+ *
+ * The current algorithm works like this:
+ *
+ * build NULL-buckets (create_null_buckets)
+ *
+ * while [not reaching maximum number of buckets]
+ *
+ * choose bucket to partition (largest bucket)
+ * if no bucket to partition
+ * terminate the algorithm
+ *
+ * choose bucket dimension to partition (largest dimension)
+ * split the bucket into two buckets
+ */
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ int *ndistvalues;
+ Datum **distvalues;
+
+ MVHistogram histogram = (MVHistogram)palloc0(sizeof(MVHistogramData));
+
+ HeapTuple * rows_copy = (HeapTuple*)palloc0(numrows * sizeof(HeapTuple));
+ memcpy(rows_copy, rows, sizeof(HeapTuple) * numrows);
+
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ histogram->ndimensions = numattrs;
+
+ histogram->magic = MVSTAT_HIST_MAGIC;
+ histogram->type = MVSTAT_HIST_TYPE_BASIC;
+ histogram->nbuckets = 1;
+
+ /* allocate the maximum number of buckets up front (simpler than repalloc for short-lived objects) */
+ histogram->buckets
+ = (MVBucket*)palloc0(MVSTAT_HIST_MAX_BUCKETS * sizeof(MVBucket));
+
+ /* create the initial bucket, covering the whole sample set */
+ histogram->buckets[0]
+ = create_initial_mv_bucket(numrows, rows_copy, attrs, stats);
+
+ /*
+ * Collect info on distinct values in each dimension (used later
+ * to select dimension to partition).
+ */
+ ndistvalues = (int*)palloc0(sizeof(int) * numattrs);
+ distvalues = (Datum**)palloc0(sizeof(Datum*) * numattrs);
+
+ for (i = 0; i < numattrs; i++)
+ {
+ int j;
+ int nvals;
+ Datum *tmp;
+
+ SortSupportData ssup;
+ StdAnalyzeData *mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ nvals = 0;
+ tmp = (Datum*)palloc0(sizeof(Datum) * numrows);
+
+ for (j = 0; j < numrows; j++)
+ {
+ bool isnull;
+
+ /* fetch the value, skipping rows with NULL in this attribute */
+ Datum value = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &isnull);
+
+ if (isnull)
+ continue;
+
+ tmp[nvals++] = value;
+ }
+
+ /* do the sort and stuff only if there are non-NULL values */
+ if (nvals > 0)
+ {
+ /* sort the array of values */
+ qsort_arg((void *) tmp, nvals, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /* count distinct values */
+ ndistvalues[i] = 1;
+ for (j = 1; j < nvals; j++)
+ if (compare_scalars_simple(&tmp[j], &tmp[j-1], &ssup) != 0)
+ ndistvalues[i] += 1;
+
+ /* we counted the distinct values above, so allocate exactly that many */
+ distvalues[i] = (Datum*)palloc0(sizeof(Datum) * ndistvalues[i]);
+
+ /* now collect distinct values into the array */
+ distvalues[i][0] = tmp[0];
+ ndistvalues[i] = 1;
+
+ for (j = 1; j < nvals; j++)
+ {
+ if (compare_scalars_simple(&tmp[j], &tmp[j-1], &ssup) != 0)
+ {
+ distvalues[i][ndistvalues[i]] = tmp[j];
+ ndistvalues[i] += 1;
+ }
+ }
+ }
+
+ pfree(tmp);
+ }
+
+ /*
+ * The initial bucket may contain NULL values, so we have to create
+ * buckets with NULL-only dimensions.
+ *
+ * FIXME We may need up to 2^ndims buckets - check that there are
+ * enough buckets (MVSTAT_HIST_MAX_BUCKETS >= 2^ndims).
+ */
+ create_null_buckets(histogram, 0, attrs, stats);
+
+ while (histogram->nbuckets < MVSTAT_HIST_MAX_BUCKETS)
+ {
+ MVBucket bucket = select_bucket_to_partition(histogram->nbuckets,
+ histogram->buckets);
+
+ /* no more buckets to partition */
+ if (bucket == NULL)
+ break;
+
+ histogram->buckets[histogram->nbuckets]
+ = partition_bucket(bucket, attrs, stats,
+ ndistvalues, distvalues);
+
+ histogram->nbuckets += 1;
+ }
+
+ /* finalize the frequencies etc. */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ HistogramBuild build_data
+ = ((HistogramBuild)histogram->buckets[i]->build_data);
+
+ /*
+ * The frequency has to be computed from the whole sample, in
+ * case some of the rows were used for MCV (and thus are missing
+ * from the histogram).
+ */
+ histogram->buckets[i]->ntuples
+ = (build_data->numrows * 1.0) / numrows_total;
+ }
+
+ return histogram;
+}
+
+/* fetch the histogram (as a bytea) from the pg_mv_statistic catalog */
+MVSerializedHistogram
+load_mv_histogram(Oid mvoid)
+{
+ bool isnull = false;
+ Datum histogram;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* Fetch the pg_mv_statistic tuple for the given statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->hist_enabled && mvstat->hist_built);
+#endif
+
+ histogram = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stahist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_histogram(DatumGetByteaP(histogram));
+}
+
+/* print some basic info about the histogram */
+Datum
+pg_mv_stats_histogram_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVSerializedHistogram hist = deserialize_mv_histogram(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nbuckets=%d", hist->nbuckets);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+
+/* used to pass sort-support context into bsearch_comparator() */
+static SortSupport ssup_private = NULL;
+
+/*
+ * Serialize the MV histogram into a bytea value. The basic algorithm is quite
+ * simple, and mostly mimics the MCV serialization:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ *
+ * (a) collect all (non-NULL) attribute values from all buckets
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all buckets
+ *
+ * (a) replace min/max values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, as we're mixing different
+ * datatypes, and we need to use the right operators to compare/sort them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ *
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
+bytea *
+serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i = 0, j = 0;
+ Size total_length = 0;
+
+ bytea *output = NULL;
+ char *data = NULL;
+
+ int nbuckets = histogram->nbuckets;
+ int ndims = histogram->ndimensions;
+
+ /* buffer for a single serialized bucket */
+ int bucketsize = BUCKET_SIZE(ndims);
+ char *bucket = palloc0(bucketsize);
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension separately */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /*
+ * Allocate space for all min/max values, including NULLs
+ * (we won't use them, but we don't know how many there are),
+ * and then collect all non-NULL values.
+ */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * nbuckets * 2);
+
+ for (j = 0; j < histogram->nbuckets; j++)
+ {
+ /* skip buckets where this dimension is NULL-only */
+ if (! histogram->buckets[j]->nullsonly[i])
+ {
+ values[i][counts[i]] = histogram->buckets[j]->min[i];
+ counts[i] += 1;
+
+ values[i][counts[i]] = histogram->buckets[j]->max[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* make sure we fit into uint16 */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typlen > 0)
+ /* byval or byref, but with fixed length (name, tid, ...) */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so simply strlen */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j]));
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * histogram, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nbuckets (4B)
+ * - info (ndim * sizeof(DimensionInfo))
+ * - arrays of values for each dimension
+ * - serialized buckets (nbuckets * bucketsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data (and buckets).
+ */
+ total_length = (sizeof(int32) + offsetof(MVHistogramData, buckets)
+ + ndims * sizeof(DimensionInfo)
+ + nbuckets * bucketsize);
+
+ /* account for the deduplicated data */
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 10MB */
+ if (total_length > (10 * 1024 * 1024))
+ elog(ERROR, "serialized histogram exceeds 10MB (%ld > %d)",
+ total_length, (10 * 1024 * 1024));
+
+ /* allocate space for the serialized histogram list, set header */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write data */
+ data = VARDATA(output);
+
+ memcpy(data, histogram, offsetof(MVHistogramData, buckets));
+ data += offsetof(MVHistogramData, buckets);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typlen > 0)
+ {
+ /* passed by value or by reference, but fixed length */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the histogram buckets */
+ for (i = 0; i < nbuckets; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - bucketsize);
+
+ /* reset the values for each item */
+ memset(bucket, 0, bucketsize);
+
+ *BUCKET_NTUPLES(bucket) = histogram->buckets[i]->ntuples;
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! histogram->buckets[i]->nullsonly[j])
+ {
+ uint16 idx;
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ /* min boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->min[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MIN_INDEXES(bucket, ndims)[j] = idx;
+
+ /* max boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->max[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MAX_INDEXES(bucket, ndims)[j] = idx;
+ }
+ }
+
+ /* copy flags (nulls, min/max inclusive) */
+ memcpy(BUCKET_NULLS_ONLY(bucket, ndims),
+ histogram->buckets[i]->nullsonly, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MIN_INCL(bucket, ndims),
+ histogram->buckets[i]->min_inclusive, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MAX_INCL(bucket, ndims),
+ histogram->buckets[i]->max_inclusive, sizeof(bool) * ndims);
+
+ /* copy the item into the array */
+ memcpy(data, bucket, bucketsize);
+
+ data += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ /* FIXME free the values/counts arrays here */
+
+ return output;
+}
+
+/*
+ * Returns histogram in a partially-serialized form (keeps the boundary
+ * values deduplicated, so that it's possible to optimize the estimation
+ * part by caching function call results between buckets etc.).
+ */
+MVSerializedHistogram
+deserialize_mv_histogram(bytea * data)
+{
+ int i = 0, j = 0;
+
+ Size expected_size;
+ char *tmp = NULL;
+
+ MVSerializedHistogram histogram;
+ DimensionInfo *info;
+
+ int nbuckets;
+ int ndims;
+ int bucketsize;
+
+ /* temporary deserialization buffer */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVSerializedHistogramData,buckets))
+ elog(ERROR, "invalid histogram size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVSerializedHistogramData,buckets));
+
+ /* read the histogram header */
+ histogram
+ = (MVSerializedHistogram)palloc(sizeof(MVSerializedHistogramData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(histogram, tmp, offsetof(MVSerializedHistogramData, buckets));
+ tmp += offsetof(MVSerializedHistogramData, buckets);
+
+ if (histogram->magic != MVSTAT_HIST_MAGIC)
+ elog(ERROR, "invalid histogram magic %d (expected %dd)",
+ histogram->magic, MVSTAT_HIST_MAGIC);
+
+ if (histogram->type != MVSTAT_HIST_TYPE_BASIC)
+ elog(ERROR, "invalid histogram type %d (expected %dd)",
+ histogram->type, MVSTAT_HIST_TYPE_BASIC);
+
+ nbuckets = histogram->nbuckets;
+ ndims = histogram->ndimensions;
+ bucketsize = BUCKET_SIZE(ndims);
+
+ Assert((nbuckets > 0) && (nbuckets <= MVSTAT_HIST_MAX_BUCKETS));
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Compute the size we expect with these parameters. It's incomplete
+ * at this point, as we still have to add the sizes of the value
+ * arrays (from the DimensionInfo records).
+ */
+ expected_size = offsetof(MVSerializedHistogramData,buckets) +
+ ndims * sizeof(DimensionInfo) +
+ (nbuckets * bucketsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - the data does not seem corrupted */
+
+ /* now let's allocate a single buffer for all the values and counts */
+
+ bufflen = (sizeof(int) + sizeof(Datum*)) * ndims;
+ for (i = 0; i < ndims; i++)
+ {
+ /* don't allocate space for byval types with size matching Datum */
+ if (! (info[i].typbyval && (info[i].typlen == sizeof(Datum))))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+ }
+
+ /* also, include space for the result, tracking the buckets */
+ bufflen += nbuckets * (
+ sizeof(MVSerializedBucket) + /* bucket pointer */
+ sizeof(MVSerializedBucketData)); /* bucket data */
+
+ buff = palloc0(bufflen);
+ ptr = buff;
+
+ histogram->nvalues = (int*)ptr;
+ ptr += (sizeof(int) * ndims);
+
+ histogram->values = (Datum**)ptr;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers into the original data array (for types
+ * not passed by value), so if someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MVSerializedHistogram hist = deserialize_mv_histogram(data);
+ * pfree(data);
+ *
+ * then 'hist' references the freed memory. This needs to
+ * copy the pieces.
+ *
+ * TODO same as in MCV deserialization / consider moving to common.c
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ histogram->nvalues[i] = info[i].nvalues;
+
+ if (info[i].typbyval && info[i].typlen == sizeof(Datum))
+ {
+ /* passed by value / Datum - simply reuse the array */
+ histogram->values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ /* all the varlena data need a chunk from the buffer */
+ histogram->values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ if (info[i].typbyval)
+ {
+ /* passed by value, but smaller than Datum */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value, as it's smaller than Datum */
+ memcpy(&histogram->values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ histogram->buckets = (MVSerializedBucket*)ptr;
+ ptr += (sizeof(MVSerializedBucket) * nbuckets);
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ MVSerializedBucket bucket = (MVSerializedBucket)ptr;
+ ptr += sizeof(MVSerializedBucketData);
+
+ bucket->ntuples = *BUCKET_NTUPLES(tmp);
+ bucket->nullsonly = BUCKET_NULLS_ONLY(tmp, ndims);
+ bucket->min_inclusive = BUCKET_MIN_INCL(tmp, ndims);
+ bucket->max_inclusive = BUCKET_MAX_INCL(tmp, ndims);
+
+ bucket->min = BUCKET_MIN_INDEXES(tmp, ndims);
+ bucket->max = BUCKET_MAX_INDEXES(tmp, ndims);
+
+ histogram->buckets[i] = bucket;
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+
+ tmp += bucketsize;
+ }
+
+ /* at this point we expect to have consumed expected_size exactly */
+ Assert((tmp - VARDATA(data)) == expected_size);
+
+ /* we should exhaust the output buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ return histogram;
+}
+
+/*
+ * Build the initial bucket, which will be then split into smaller ones.
+ */
+static MVBucket
+create_initial_mv_bucket(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+ HistogramBuild data = NULL;
+
+ /* TODO allocate bucket as a single piece, including all the fields. */
+ MVBucket bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ Assert(numrows > 0);
+ Assert(rows != NULL);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /* allocate the per-dimension arrays */
+
+ /* flags for null-only dimensions */
+ bucket->nullsonly = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ bucket->min_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+ bucket->max_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* lower/upper boundaries */
+ bucket->min = (Datum*)palloc0(numattrs * sizeof(Datum));
+ bucket->max = (Datum*)palloc0(numattrs * sizeof(Datum));
+
+ /* build-data */
+ data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /* number of distinct values (per dimension) */
+ data->ndistincts = (uint32*)palloc0(numattrs * sizeof(uint32));
+
+ /* all the sample rows fall into the initial bucket */
+ data->numrows = numrows;
+ data->rows = rows;
+
+ bucket->build_data = data;
+
+ /*
+ * Update the number of ndistinct combinations in the bucket (which
+ * we use when selecting bucket to partition), and then number of
+ * distinct values for each partition (which we use when choosing
+ * which dimension to split).
+ */
+ update_bucket_ndistinct(bucket, attrs, stats);
+
+ /* Update ndistinct (and also set min/max) for all dimensions. */
+ for (i = 0; i < numattrs; i++)
+ update_dimension_ndistinct(bucket, i, attrs, stats, true);
+
+ return bucket;
+}
+
+/*
+ * Choose the bucket to partition next.
+ *
+ * The current criterion is rather simple, chosen so that the algorithm
+ * produces buckets of about equal frequency and regular size. We
+ * select the bucket containing the most sample rows (among those that
+ * can still be split), and then split it by the longest dimension.
+ *
+ * The distinct values are uniformly mapped to the [0,1] interval, and
+ * this is used to compute the length of each value range.
+ *
+ * NOTE: This is not the same array used for deduplication, as this
+ * contains values for all the tuples from the sample, not just
+ * the boundary values.
+ *
+ * Returns either pointer to the bucket selected to be partitioned,
+ * or NULL if there are no buckets that may be split (i.e. all buckets
+ * contain a single distinct value).
+ *
+ * TODO Consider other partitioning criteria (v-optimal, maxdiff etc.).
+ * For example use the "bucket volume" (product of dimension
+ * lengths) to select the bucket.
+ *
+ * We need buckets containing about the same number of tuples (so
+ * about the same frequency), as that limits the error when we
+ * match the bucket partially (in that case use 1/2 the bucket).
+ *
+ * We also need buckets with "regular" size, i.e. not "narrow" in
+ * some dimensions and "wide" in the others, because that makes
+ * partial matches more likely and increases the estimation error,
+ * especially when the clauses match many buckets partially. This
+ * is especially serious for OR-clauses, because in that case any
+ * of them may add the bucket as a (partial) match. With AND-clauses
+ * all the clauses have to match the bucket, which makes this issue
+ * somewhat less pressing.
+ *
+ * For example this table:
+ *
+ * CREATE TABLE t AS SELECT i AS a, i AS b
+ * FROM generate_series(1,1000000) s(i);
+ * ALTER TABLE t ADD STATISTICS (histogram) ON (a,b);
+ * ANALYZE t;
+ *
+ * It's a very specific (and perhaps artificial) example, because
+ * every bucket always has exactly the same number of distinct
+ * values in all dimensions, which makes the partitioning tricky.
+ *
+ * Then:
+ *
+ * SELECT * FROM t WHERE a < 10 AND b < 10;
+ *
+ * is estimated to return ~120 rows, while in reality it returns 9.
+ *
+ * QUERY PLAN
+ * ----------------------------------------------------------------
+ * Seq Scan on t (cost=0.00..19425.00 rows=117 width=8)
+ * (actual time=0.185..270.774 rows=9 loops=1)
+ * Filter: ((a < 10) AND (b < 10))
+ * Rows Removed by Filter: 999991
+ *
+ * while the query using OR clauses is estimated like this:
+ *
+ * QUERY PLAN
+ * ----------------------------------------------------------------
+ * Seq Scan on t (cost=0.00..19425.00 rows=8100 width=8)
+ * (actual time=0.118..189.919 rows=9 loops=1)
+ * Filter: ((a < 10) OR (b < 10))
+ * Rows Removed by Filter: 999991
+ *
+ * which is clearly much worse. This happens because the histogram
+ * contains buckets like this:
+ *
+ * bucket 592 [3 30310] [30134 30593] => [0.000233]
+ *
+ * i.e. the length of "a" dimension is (30310-3)=30307, while the
+ * length of "b" is (30593-30134)=459. So the "b" dimension is much
+ * narrower than "a". Of course, there are buckets where "b" is the
+ * wider dimension.
+ *
+ * This is partially mitigated by selecting the "longest" dimension
+ * in partition_bucket() but that only happens after we already
+ * selected the bucket. So if we never select the bucket, we can't
+ * really fix it there.
+ *
+ * The other reason why this particular example behaves so poorly
+ * is due to the way we split the partition in partition_bucket().
+ * Currently we attempt to divide the bucket into two parts with
+ * the same number of sampled tuples (frequency), but that does not
+ * work well when all the tuples are squashed on one end of the
+ * bucket (e.g. exactly at the diagonal, as a=b). In that case we
+ * split the bucket into a tiny bucket on the diagonal, and a huge
+ * remaining part of the bucket, which is still going to be narrow
+ * and we're unlikely to fix that.
+ *
+ * So perhaps we need two partitioning strategies - one aiming to
+ * split buckets with high frequency (number of sampled rows), the
+ * other aiming to split "large" buckets. And alternating between
+ * them, somehow.
+ *
+ * TODO Allowing the bucket to degenerate to a single combination of
+ * values makes it rather strange MCV list. Maybe we should use
+ * higher lower boundary, or maybe make the selection criteria
+ * more complex (e.g. consider number of rows in the bucket, etc.).
+ *
+ * That however is different from buckets 'degenerated' only for
+ * some dimensions (e.g. half of them), which is perfectly
+ * appropriate for statistics on a combination of low and high
+ * cardinality columns.
+ *
+ * TODO Consider using similar lower boundary for row count as for simple
+ * histograms, i.e. 300 tuples per bucket.
+ */
+static MVBucket
+select_bucket_to_partition(int nbuckets, MVBucket * buckets)
+{
+ int i;
+ int numrows = 0;
+ MVBucket bucket = NULL;
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ HistogramBuild data = (HistogramBuild)buckets[i]->build_data;
+ /* if the number of rows is higher, use this bucket */
+ if ((data->ndistinct > 2) &&
+ (data->numrows > numrows) &&
+ (data->numrows >= MIN_BUCKET_ROWS))
+ {
+ bucket = buckets[i];
+ numrows = data->numrows;
+ }
+ }
+
+ /* may be NULL if there are no buckets with (ndistinct > 2) */
+ return bucket;
+}
+
+/*
+ * A simple bucket partitioning implementation - we choose the longest
+ * bucket dimension, measured using the array of distinct values built
+ * at the very beginning of the build.
+ *
+ * We map all the distinct values to the [0,1] interval, uniformly
+ * distributed, and then use this to measure length. It's essentially
+ * the number of distinct values within the range, normalized to [0,1].
+ *
+ * Then we choose a 'middle' value splitting the bucket into two parts
+ * with roughly the same frequency.
+ *
+ * This splits the bucket by tweaking the existing one, and returning
+ * the new bucket (essentially shrinking the existing one in-place and
+ * returning the other "half" as a new bucket). The caller is responsible
+ * for adding the new bucket into the list of buckets.
+ *
+ * There are multiple histogram variants, differing mainly in the
+ * partitioning criteria, i.e. in how to choose the bucket and the
+ * dimension most in need of a split.
+ * "rK-Hist : an R-Tree based histogram for multi-dimensional selectivity
+ * estimation" thesis by J. A. Lopez, Concordia University, p.34-37 (and
+ * possibly p. 32-34 for explanation of the terms).
+ *
+ * TODO It requires care to prevent splitting only one dimension and not
+ * splitting another one at all (which might happen easily in case
+ * of strongly dependent columns - e.g. y=x). The current algorithm
+ * minimizes this, but may still happen for perfectly dependent
+ * examples (when all the dimensions have equal length, the first
+ * one will be selected).
+ *
+ * TODO Should probably consider statistics target for the columns (e.g.
+ * to split dimensions with higher statistics target more frequently).
+ */
+static MVBucket
+partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues)
+{
+ int i;
+ int dimension;
+ int numattrs = attrs->dim1;
+
+ Datum split_value;
+ MVBucket new_bucket;
+ HistogramBuild new_data;
+
+ /* needed for sort, when looking for the split value */
+ bool isNull;
+ int nvalues = 0;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ StdAnalyzeData * mystats = NULL;
+ ScalarItem * values = (ScalarItem*)palloc0(data->numrows * sizeof(ScalarItem));
+ SortSupportData ssup;
+
+ /* looking for the split value */
+ int nrows = 1; /* number of rows below current value */
+ double delta;
+
+ /* needed when splitting the values */
+ HeapTuple * oldrows = data->rows;
+ int oldnrows = data->numrows;
+
+ /*
+ * We can't split buckets with a single distinct value (this also
+ * disqualifies NULL-only dimensions). Also, there have to be multiple
+ * sample rows (otherwise there couldn't be multiple distinct values).
+ */
+ Assert(data->ndistinct > 1);
+ Assert(data->numrows > 1);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Look for the next dimension to split.
+ */
+ delta = 0.0;
+ dimension = -1;
+
+ for (i = 0; i < numattrs; i++)
+ {
+ Datum *a, *b;
+
+ mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ /* can't split NULL-only dimension */
+ if (bucket->nullsonly[i])
+ continue;
+
+ /* can't split dimension with a single ndistinct value */
+ if (data->ndistincts[i] <= 1)
+ continue;
+
+ /* sort support for the bsearch_comparator */
+ ssup_private = &ssup;
+
+ /* search for min boundary in the distinct list */
+ a = (Datum*)bsearch(&bucket->min[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), bsearch_comparator);
+
+ b = (Datum*)bsearch(&bucket->max[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), bsearch_comparator);
+
+ /* if this dimension is 'longer', partition by it */
+ if (((b-a)*1.0 / ndistvalues[i]) > delta)
+ {
+ delta = ((b-a)*1.0 / ndistvalues[i]);
+ dimension = i;
+ }
+ }
+
+ /*
+ * If we haven't found a dimension here, we've done something
+ * wrong in select_bucket_to_partition.
+ */
+ Assert(dimension != -1);
+
+ /*
+ * Walk through the selected dimension, collect and sort the values
+ * and then choose the value to use as the new boundary.
+ */
+ mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (i = 0; i < data->numrows; i++)
+ {
+ /* remember the index of the sample row, to make the partitioning simpler */
+ values[nvalues].value = heap_getattr(data->rows[i], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+ values[nvalues].tupno = i;
+
+ /* no NULL values allowed here (we don't do splits by null-only dimensions) */
+ Assert(!isNull);
+
+ nvalues++;
+ }
+
+ /* sort the array of values */
+ qsort_arg((void *) values, nvalues, sizeof(ScalarItem),
+ compare_scalars_partition, (void *) &ssup);
+
+ /*
+ * We know there are data->ndistincts[dimension] distinct values
+ * in this dimension, and we want to split this into half, so walk
+ * through the array and stop once we see (ndistinct/2) values.
+ *
+ * We always choose the "next" value, i.e. (n/2+1)-th distinct value,
+ * and use it as an exclusive upper boundary (and inclusive lower
+ * boundary).
+ *
+ * TODO Maybe we should use "average" of the two middle distinct
+ * values (at least for even distinct counts), but that would
+ * require being able to do an average (which does not work
+ * for non-arithmetic types).
+ *
+ * TODO Another option is to look for a split that'd give about
+ * 50% tuples (not distinct values) in each partition. That
+ * might work better when there are a few very frequent
+ * values, and many rare ones.
+ */
+ delta = fabs(data->numrows);
+ split_value = values[0].value;
+
+ for (i = 1; i < data->numrows; i++)
+ {
+ if (compare_datums_simple(values[i-1].value, values[i].value, &ssup) != 0)
+ {
+ /* are we closer to splitting the bucket in half? */
+ if (fabs(i - data->numrows/2.0) < delta)
+ {
+ /* let's assume we'll use this value for the split */
+ split_value = values[i].value;
+ delta = fabs(i - data->numrows/2.0);
+ nrows = i;
+ }
+ }
+ }
+
+ Assert(nrows > 0);
+ Assert(nrows < data->numrows);
+
+ /* create the new bucket as an (incomplete) copy of the one being partitioned */
+ new_bucket = copy_mv_bucket(bucket, numattrs);
+ new_data = (HistogramBuild)new_bucket->build_data;
+
+ /*
+ * Do the actual split of the chosen dimension, using the split value as the
+ * upper bound for the existing bucket, and lower bound for the new one.
+ */
+ bucket->max[dimension] = split_value;
+ new_bucket->min[dimension] = split_value;
+
+ bucket->max_inclusive[dimension] = false;
+ new_bucket->min_inclusive[dimension] = true;
+
+ /*
+ * Redistribute the sample tuples using the 'ScalarItem->tupno'
+ * index. We know 'nrows' rows should remain in the original
+ * bucket and the rest goes to the new one.
+ */
+
+ data->rows = (HeapTuple*)palloc0(nrows * sizeof(HeapTuple));
+ new_data->rows = (HeapTuple*)palloc0((oldnrows - nrows) * sizeof(HeapTuple));
+
+ data->numrows = nrows;
+ new_data->numrows = (oldnrows - nrows);
+
+ /*
+ * The first nrows should go to the first bucket, the rest should
+ * go to the new one. Use the tupno field to get the actual HeapTuple
+ * row from the original array of sample rows.
+ */
+ for (i = 0; i < nrows; i++)
+ memcpy(&data->rows[i], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ for (i = nrows; i < oldnrows; i++)
+ memcpy(&new_data->rows[i-nrows], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(new_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each partition.
+ */
+ for (i = 0; i < numattrs; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(new_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+ pfree(values);
+
+ return new_bucket;
+}
+
+/*
+ * Copy a histogram bucket. The copy does not include the build-time
+ * data, i.e. sampled rows etc.
+ */
+static MVBucket
+copy_mv_bucket(MVBucket bucket, uint32 ndimensions)
+{
+ /* TODO allocate as a single piece (including all the fields) */
+ MVBucket new_bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+ HistogramBuild data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /*
+ * Copy only the attributes that will stay the same after the split;
+ * we'll recompute the rest once the split is done.
+ */
+
+ /* allocate the per-dimension arrays */
+ new_bucket->nullsonly = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ new_bucket->min_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+ new_bucket->max_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* lower/upper boundaries */
+ new_bucket->min = (Datum*)palloc0(ndimensions * sizeof(Datum));
+ new_bucket->max = (Datum*)palloc0(ndimensions * sizeof(Datum));
+
+ /* copy data */
+ memcpy(new_bucket->nullsonly, bucket->nullsonly, ndimensions * sizeof(bool));
+
+ memcpy(new_bucket->min_inclusive, bucket->min_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->min, bucket->min, ndimensions*sizeof(Datum));
+
+ memcpy(new_bucket->max_inclusive, bucket->max_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->max, bucket->max, ndimensions*sizeof(Datum));
+
+ /* allocate and copy the interesting part of the build data */
+ data->ndistincts = (uint32*)palloc0(ndimensions * sizeof(uint32));
+
+ new_bucket->build_data = data;
+
+ return new_bucket;
+}
+
+/*
+ * Counts the number of distinct combinations of values in the bucket.
+ * The values are copied into an array of SortItems and sorted using the
+ * per-dimension sort support (multi_sort_compare), so the proper
+ * comparison operators are used even for pass-by-reference types.
+ *
+ * TODO This might evaluate and store the distinct counts for all
+ * possible attribute combinations. The assumption is this might be
+ * useful for estimating things like GROUP BY cardinalities (e.g.
+ * in cases when some buckets contain a lot of low-frequency
+ * combinations, and other buckets contain few high-frequency ones).
+ *
+ * But it's unclear whether it's worth the price. Computing this
+ * is actually quite cheap, because it may be evaluated at the very
+ * end, when the buckets are rather small (so sorting it in 2^N ways
+ * is not a big deal). Assuming the partitioning algorithm does not
+ * use these values to do the decisions, of course (the current
+ * algorithm does not).
+ *
+ * The overhead with storing, fetching and parsing the data is more
+ * concerning - adding 2^N values per bucket (even if it's just
+ * a 1B or 2B value) would significantly bloat the histogram, and
+ * thus its impact on the optimizer, which is not really desirable.
+ *
+ * TODO This only updates the ndistinct for the sample (or bucket), but
+ * we eventually need an estimate of the total number of distinct
+ * values in the dataset. It's possible to either use the current
+ * 1D approach (i.e., if it's more than 10% of the sample, assume
+ * it's proportional to the number of rows), or to implement the
+ * estimator suggested in the article, supposedly giving 'optimal'
+ * estimates (w.r.t. probability of error).
+ */
+static void
+update_bucket_ndistinct(MVBucket bucket, int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ int numrows = data->numrows;
+
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * XXX We might collect these values while first processing the
+ * sample rows, saving the extra heap_getattr() calls done here.
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(numrows * sizeof(Datum) * numattrs);
+ bool *isnull = (bool*)palloc0(numrows * sizeof(bool) * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* prepare the sort functions for all the dimensions */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* collect the values */
+ for (i = 0; i < numrows; i++)
+ for (j = 0; j < numattrs; j++)
+ items[i].values[j]
+ = heap_getattr(data->rows[i], attrs->values[j],
+ stats[j]->tupDesc, &items[i].isnull[j]);
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ data->ndistinct = 1;
+
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ data->ndistinct += 1;
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+}
+
+/*
+ * Count distinct values per bucket dimension.
+ */
+static void
+update_dimension_ndistinct(MVBucket bucket, int dimension, int2vector *attrs,
+ VacAttrStats ** stats, bool update_boundaries)
+{
+ int j;
+ int nvalues = 0;
+ bool isNull;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ Datum * values = (Datum*)palloc0(data->numrows * sizeof(Datum));
+ SortSupportData ssup;
+
+ StdAnalyzeData * mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* if this is a known NULL-only dimension, there's a single distinct value */
+ if (bucket->nullsonly[dimension])
+ {
+ data->ndistincts[dimension] = 1;
+ pfree(values);
+ return;
+ }
+
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (j = 0; j < data->numrows; j++)
+ {
+ values[nvalues] = heap_getattr(data->rows[j], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+
+ /* ignore NULL values */
+ if (! isNull)
+ nvalues++;
+ }
+
+ /* there's always at least 1 distinct value (may be NULL) */
+ data->ndistincts[dimension] = 1;
+
+ /*
+ * If there are only NULL values in the column, mark the dimension
+ * as NULL-only and bail out.
+ */
+ if (nvalues == 0)
+ {
+ pfree(values);
+ bucket->nullsonly[dimension] = true;
+ return;
+ }
+
+ /* sort the array of (pass-by-value) datums */
+ qsort_arg((void *) values, nvalues, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /*
+ * Update min/max boundaries to the smallest bounding box. Generally, this
+ * needs to be done only when constructing the initial bucket.
+ */
+ if (update_boundaries)
+ {
+ /* store the min/max values */
+ bucket->min[dimension] = values[0];
+ bucket->min_inclusive[dimension] = true;
+
+ bucket->max[dimension] = values[nvalues-1];
+ bucket->max_inclusive[dimension] = true;
+ }
+
+ /*
+ * Walk through the array and count distinct values by comparing
+ * succeeding values.
+ *
+ * FIXME This only works for pass-by-value types (i.e. not VARCHARs
+ * etc.). Although thanks to the deduplication it might work
+ * even for those types (equal values will get the same item
+ * in the deduplicated array).
+ */
+ for (j = 1; j < nvalues; j++)
+ {
+ if (values[j] != values[j-1])
+ data->ndistincts[dimension] += 1;
+ }
+
+ pfree(values);
+}
+
+/*
+ * A properly built histogram must not contain buckets mixing NULL and
+ * non-NULL values in a single dimension. Each dimension is either
+ * marked as 'nulls only' (and then contains only NULL values), or it
+ * contains no NULL values at all.
+ *
+ * Therefore, if the sample contains NULL values in any of the columns,
+ * it's necessary to build those NULL-buckets. This is done in an
+ * iterative way using this algorithm, operating on a single bucket:
+ *
+ * (1) Check that all dimensions are well-formed (not mixing NULL
+ * and non-NULL values).
+ *
+ * (2) If all dimensions are well-formed, terminate.
+ *
+ * (3) If the dimension contains only NULL values, but is not
+ * marked as NULL-only, mark it as NULL-only and run the
+ * algorithm again (on this bucket).
+ *
+ * (4) If the dimension mixes NULL and non-NULL values, split the
+ * bucket into two parts - one with NULL values, one with
+ * non-NULL values (replacing the current one). Then run
+ * the algorithm on both buckets.
+ *
+ * This is executed in a recursive manner, but the number of executions
+ * should be quite low - limited by the number of NULL-buckets. Also,
+ * in each branch the number of nested calls is limited by the number
+ * of dimensions (attributes) of the histogram.
+ *
+ * At the end, there should be buckets with no mixed dimensions. The
+ * number of buckets produced by this algorithm is rather limited - with
+ * N dimensions, there may be only 2^N such buckets (each dimension may
+ * be either NULL or non-NULL). So with 8 dimensions (current value of
+ * MVSTATS_MAX_DIMENSIONS) there may be only 256 such buckets.
+ *
+ * After this, a 'regular' bucket-split algorithm shall run, further
+ * optimizing the histogram.
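+ *
+ * As an illustration: with 2 dimensions and NULL values present in
+ * both columns, a single initial bucket may be split into up to four
+ * buckets - (NULL, NULL), (NULL, x), (x, NULL) and (x, y) - before
+ * the regular partitioning starts.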
+ */
+static void
+create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int null_dim = -1;
+ int null_count = 0;
+ bool null_found = false;
+ MVBucket bucket, null_bucket;
+ int null_idx, curr_idx;
+ HistogramBuild data, null_data;
+
+ /* remember original values from the bucket */
+ int numrows;
+ HeapTuple *oldrows = NULL;
+
+ Assert(bucket_idx < histogram->nbuckets);
+ Assert(histogram->ndimensions == attrs->dim1);
+
+ bucket = histogram->buckets[bucket_idx];
+ data = (HistogramBuild)bucket->build_data;
+
+
+ /*
+ * Walk through all rows / dimensions, and stop once we find NULL
+ * in a dimension not yet marked as NULL-only.
+ */
+ for (i = 0; i < data->numrows; i++)
+ {
+ /*
+ * FIXME We don't need to start from the first attribute
+ * here - we can start from the last known dimension.
+ */
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ /* Is this a NULL-only dimension? If yes, skip. */
+ if (bucket->nullsonly[j])
+ continue;
+
+ /* found a NULL in that dimension? */
+ if (heap_attisnull(data->rows[i], attrs->values[j]))
+ {
+ null_found = true;
+ null_dim = j;
+ break;
+ }
+ }
+
+ /* terminate if we found attribute with NULL values */
+ if (null_found)
+ break;
+ }
+
+ /* no regular dimension contains NULL values => we're done */
+ if (! null_found)
+ return;
+
+ /* walk through the rows again, count NULL values in 'null_dim' */
+ for (i = 0; i < data->numrows; i++)
+ {
+ if (heap_attisnull(data->rows[i], attrs->values[null_dim]))
+ null_count += 1;
+ }
+
+ Assert(null_count <= data->numrows);
+
+ /*
+ * If (null_count == numrows) the dimension already is NULL-only,
+ * but is not yet marked as such. It's enough to mark it and
+ * repeat the process recursively (until we run out of dimensions).
+ */
+ if (null_count == data->numrows)
+ {
+ bucket->nullsonly[null_dim] = true;
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+ return;
+ }
+
+ /*
+ * We have to split the bucket into two - one with NULL values in
+ * the dimension, one with non-NULL values. We don't need to sort
+ * the data or anything, but otherwise it's similar to what's done
+ * in partition_bucket().
+ */
+
+ /* create bucket with NULL-only dimension 'dim' */
+ null_bucket = copy_mv_bucket(bucket, histogram->ndimensions);
+ null_data = (HistogramBuild)null_bucket->build_data;
+
+ /* remember the current array info */
+ oldrows = data->rows;
+ numrows = data->numrows;
+
+ /* we'll keep non-NULL values in the current bucket */
+ data->numrows = (numrows - null_count);
+ data->rows
+ = (HeapTuple*)palloc0(data->numrows * sizeof(HeapTuple));
+
+ /* and the NULL values will go to the new one */
+ null_data->numrows = null_count;
+ null_data->rows
+ = (HeapTuple*)palloc0(null_data->numrows * sizeof(HeapTuple));
+
+ /* mark the dimension as NULL-only (in the new bucket) */
+ null_bucket->nullsonly[null_dim] = true;
+
+ /* walk through the sample rows and distribute them accordingly */
+ null_idx = 0;
+ curr_idx = 0;
+ for (i = 0; i < numrows; i++)
+ {
+ if (heap_attisnull(oldrows[i], attrs->values[null_dim]))
+ /* NULL => copy to the new bucket */
+ memcpy(&null_data->rows[null_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ else
+ memcpy(&data->rows[curr_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ }
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(null_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each
+ * bucket (NULL is not a value, so 0, and the other bucket got
+ * all the ndistinct values).
+ */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(null_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+
+ /* add the NULL bucket to the histogram */
+ histogram->buckets[histogram->nbuckets++] = null_bucket;
+
+ /*
+ * And now run the function recursively on both buckets (the new
+ * one first, because the call may change number of buckets, and
+ * it's used as an index).
+ */
+ create_null_buckets(histogram, (histogram->nbuckets-1), attrs, stats);
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+}
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
+
+/*
+ * SRF with details about buckets of a histogram:
+ *
+ * - bucket ID (0...nbuckets-1)
+ * - min values (string array)
+ * - max values (string array)
+ * - nulls only (boolean array)
+ * - min inclusive flags (boolean array)
+ * - max inclusive flags (boolean array)
+ * - frequency (double precision)
+ *
+ * The first input is the OID of the statistics; no rows are returned
+ * if the statistics contain no histogram (or if there are no
+ * statistics for the given OID).
+ *
+ * The second parameter (type) determines what values will be returned
+ * in the (minvals,maxvals). There are three possible values:
+ *
+ * 0 (actual values)
+ * -----------------
+ * - prints actual values
+ * - using the output function of the data type (as string)
+ * - handy for investigating the histogram
+ *
+ * 1 (distinct index)
+ * ------------------
+ * - prints index of the distinct value (into the serialized array)
+ * - makes it easier to spot neighbor buckets, etc.
+ * - handy for plotting the histogram
+ *
+ * 2 (normalized distinct index)
+ * -----------------------------
+ * - prints index of the distinct value, but normalized into [0,1]
+ * - similar to 1, but shows how 'long' the bucket range is
+ * - handy for plotting the histogram
+ *
+ * When plotting the histogram, be careful as the (1) and (2) options
+ * skew the lengths by distributing the distinct values uniformly. For
+ * data types without a clear meaning of 'distance' (e.g. strings) that
+ * is not a big deal, but for numbers it may be confusing.
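+ *
+ * Example usage (illustrative, assuming a statistics object named 's1'):
+ *
+ * SELECT * FROM pg_mv_histogram_buckets(
+ * (SELECT oid FROM pg_mv_statistic WHERE staname = 's1'), 0);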
+ */
+PG_FUNCTION_INFO_V1(pg_mv_histogram_buckets);
+
+Datum
+pg_mv_histogram_buckets(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ Oid mvoid = PG_GETARG_OID(0);
+ int otype = PG_GETARG_INT32(1);
+
+ if ((otype < 0) || (otype > 2))
+ elog(ERROR, "invalid output type specified");
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MVSerializedHistogram histogram;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ histogram = load_mv_histogram(mvoid);
+
+ funcctx->user_fctx = histogram;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = histogram->nbuckets;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples
+ * from raw C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+ double bucket_size = 1.0;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MVSerializedHistogram histogram;
+ MVSerializedBucket bucket;
+
+ histogram = (MVSerializedHistogram)funcctx->user_fctx;
+
+ Assert(call_cntr < histogram->nbuckets);
+
+ bucket = histogram->buckets[call_cntr];
+
+ stakeys = find_mv_attnums(mvoid, &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple.
+ * This should be an array of C strings which will
+ * be processed later by the type input functions.
+ */
+ values = (char **) palloc(9 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+ values[3] = (char *) palloc0(1024 * sizeof(char));
+ values[4] = (char *) palloc0(1024 * sizeof(char));
+ values[5] = (char *) palloc0(1024 * sizeof(char));
+
+ values[6] = (char *) palloc(64 * sizeof(char));
+ values[7] = (char *) palloc(64 * sizeof(char));
+ values[8] = (char *) palloc(64 * sizeof(char));
+
+ /* XXX only needed when printing the actual values (otype == 0) */
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * histogram->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * histogram->ndimensions);
+
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* bucket ID */
+
+ /*
+ * Print the bucket boundaries - either as the actual values (using
+ * the output function of the attribute type), or as indexes into
+ * the deduplicated arrays (which are sorted, so the indexes are
+ * meaningful), depending on the requested output type.
+ */
+
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bucket_size *= (bucket->max[i] - bucket->min[i]) * 1.0
+ / (histogram->nvalues[i]-1);
+
+ /* print the actual values, i.e. use output function etc. */
+ if (otype == 0)
+ {
+ Datum minval, maxval;
+ Datum minout, maxout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %s}";
+
+ minval = histogram->values[i][bucket->min[i]];
+ minout = FunctionCall1(&fmgrinfo[i], minval);
+
+ maxval = histogram->values[i][bucket->max[i]];
+ maxout = FunctionCall1(&fmgrinfo[i], maxval);
+
+ snprintf(buff, 1024, format, values[1], DatumGetCString(minout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], DatumGetCString(maxout));
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+ else if (otype == 1)
+ {
+ format = "%s, %d";
+ if (i == 0)
+ format = "{%s%d";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %d}";
+
+ snprintf(buff, 1024, format, values[1], bucket->min[i]);
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], bucket->max[i]);
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+ else
+ {
+ format = "%s, %f";
+ if (i == 0)
+ format = "{%s%f";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %f}";
+
+ snprintf(buff, 1024, format, values[1],
+ bucket->min[i] * 1.0 / (histogram->nvalues[i]-1));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2],
+ bucket->max[i] * 1.0 / (histogram->nvalues[i]-1));
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %s}";
+
+ snprintf(buff, 1024, format, values[3], bucket->nullsonly[i] ? "t" : "f");
+ strncpy(values[3], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[4], bucket->min_inclusive[i] ? "t" : "f");
+ strncpy(values[4], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[5], bucket->max_inclusive[i] ? "t" : "f");
+ strncpy(values[5], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[6], 64, "%f", bucket->ntuples); /* frequency */
+ snprintf(values[7], 64, "%f", bucket->ntuples / bucket_size); /* density */
+ snprintf(values[8], 64, "%f", bucket_size); /* bucket_size */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (not strictly necessary, the memory context will free it) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+ pfree(values[4]);
+ pfree(values[5]);
+ pfree(values[6]);
+ pfree(values[7]);
+ pfree(values[8]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
+
+#ifdef DEBUG_MVHIST
+/*
+ * prints debugging info about matched histogram buckets (full/partial)
+ *
+ * XXX Currently works only for INT data type.
+ */
+void
+debug_histogram_matches(MVSerializedHistogram mvhist, char *matches)
+{
+ int i, j;
+
+ float ffull = 0, fpartial = 0;
+ int nfull = 0, npartial = 0;
+
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ char ranges[1024];
+
+ if (! matches[i])
+ continue;
+
+ /* increment the counters */
+ nfull += (matches[i] == MVSTATS_MATCH_FULL) ? 1 : 0;
+ npartial += (matches[i] == MVSTATS_MATCH_PARTIAL) ? 1 : 0;
+
+ /* and also update the frequencies */
+ ffull += (matches[i] == MVSTATS_MATCH_FULL) ? bucket->ntuples : 0;
+ fpartial += (matches[i] == MVSTATS_MATCH_PARTIAL) ? bucket->ntuples : 0;
+
+ memset(ranges, 0, sizeof(ranges));
+
+ /* build ranges for all the dimensions */
+ for (j = 0; j < mvhist->ndimensions; j++)
+ {
+ /* append to the buffer (sprintf from/to the same buffer is undefined) */
+ snprintf(ranges + strlen(ranges), sizeof(ranges) - strlen(ranges),
+ " [%d %d]",
+ DatumGetInt32(mvhist->values[j][bucket->min[j]]),
+ DatumGetInt32(mvhist->values[j][bucket->max[j]]));
+ }
+
+ elog(WARNING, "bucket %d %s => %d [%f]", i, ranges, matches[i], bucket->ntuples);
+ }
+
+ elog(WARNING, "full=%f partial=%f (%f)", ffull, fpartial, (ffull + 0.5 * fpartial));
+}
+#endif
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 6339631..3543239 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2109,9 +2109,9 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
- " deps_enabled, mcv_enabled,\n"
- " deps_built, mcv_built,\n"
- " mcv_max_items,\n"
+ " deps_enabled, mcv_enabled, hist_enabled,\n"
+ " deps_built, mcv_built, hist_built,\n"
+ " mcv_max_items, hist_max_buckets,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
@@ -2154,8 +2154,17 @@ describeOneTableDetails(const char *schemaname,
first = false;
}
+ if (!strcmp(PQgetvalue(result, i, 6), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", histogram");
+ else
+ appendPQExpBuffer(&buf, "(histogram");
+ first = false;
+ }
+
appendPQExpBuffer(&buf, ") ON (%s)",
- PQgetvalue(result, i, 9));
+ PQgetvalue(result, i, 12));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index fd7107d..a5945af 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -38,13 +38,16 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
+ bool hist_enabled; /* build histogram? */
- /* MCV size */
+ /* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
+ int32 hist_max_buckets; /* max histogram buckets */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
+ bool hist_built; /* histogram was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -52,6 +55,7 @@ CATALOG(pg_mv_statistic,3381)
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
+ bytea stahist; /* MV histogram (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -67,17 +71,21 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 11
+#define Natts_pg_mv_statistic 15
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
#define Anum_pg_mv_statistic_deps_enabled 4
#define Anum_pg_mv_statistic_mcv_enabled 5
-#define Anum_pg_mv_statistic_mcv_max_items 6
-#define Anum_pg_mv_statistic_deps_built 7
-#define Anum_pg_mv_statistic_mcv_built 8
-#define Anum_pg_mv_statistic_stakeys 9
-#define Anum_pg_mv_statistic_stadeps 10
-#define Anum_pg_mv_statistic_stamcv 11
+#define Anum_pg_mv_statistic_hist_enabled 6
+#define Anum_pg_mv_statistic_mcv_max_items 7
+#define Anum_pg_mv_statistic_hist_max_buckets 8
+#define Anum_pg_mv_statistic_deps_built 9
+#define Anum_pg_mv_statistic_mcv_built 10
+#define Anum_pg_mv_statistic_hist_built 11
+#define Anum_pg_mv_statistic_stakeys 12
+#define Anum_pg_mv_statistic_stadeps 13
+#define Anum_pg_mv_statistic_stamcv 14
+#define Anum_pg_mv_statistic_stahist 15
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 66b4bcd..7e915bd 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2674,6 +2674,10 @@ DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f
DESCR("multi-variate statistics: MCV list info");
DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
DESCR("details about MCV list items");
+DATA(insert OID = 3375 ( pg_mv_stats_histogram_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_histogram_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: histogram info");
+DATA(insert OID = 3374 ( pg_mv_histogram_buckets PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 2 0 2249 "26 23" "{26,23,23,1009,1009,1000,1000,1000,701,701,701}" "{i,i,o,o,o,o,o,o,o,o,o}" "{oid,otype,index,minvals,maxvals,nullsonly,mininclusive,maxinclusive,frequency,density,bucket_size}" _null_ _null_ pg_mv_histogram_buckets _null_ _null_ _null_ ));
+DESCR("details about histogram buckets");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5ae6b3c..46bece6 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -620,10 +620,12 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
bool mcv_enabled; /* MCV list enabled */
+ bool hist_enabled; /* histogram enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
bool mcv_built; /* MCV list built */
+ bool hist_built; /* histogram built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 4535db7..f05a517 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -92,6 +92,123 @@ typedef MCVListData *MCVList;
#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
/*
+ * Multivariate histograms
+ */
+typedef struct MVBucketData {
+
+ /* Frequency of this bucket. */
+ float ntuples; /* frequency (fraction of sample tuples) */
+
+ /*
+ * Information about dimensions containing only NULL values.
+ */
+ bool *nullsonly;
+
+ /* lower boundaries - values and information about the inequalities */
+ Datum *min;
+ bool *min_inclusive;
+
+ /* upper boundaries - values and information about the inequalities */
+ Datum *max;
+ bool *max_inclusive;
+
+ /* used when building the histogram (not serialized/deserialized) */
+ void *build_data;
+
+} MVBucketData;
+
+typedef MVBucketData *MVBucket;
+
+
+typedef struct MVHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ MVBucket *buckets; /* array of buckets */
+
+} MVHistogramData;
+
+typedef MVHistogramData *MVHistogram;
+
+/*
+ * Histogram in a partially serialized form, with deduplicated boundary
+ * values etc.
+ *
+ * TODO add more detailed description here
+ */
+
+typedef struct MVSerializedBucketData {
+
+ /* Frequency of this bucket. */
+ float ntuples; /* frequency (fraction of sample tuples) */
+
+ /*
+ * Information about dimensions containing only NULL values.
+ */
+ bool *nullsonly;
+
+ /* indexes of lower boundaries - values and information about the
+ * inequalities (exclusive vs. inclusive) */
+ uint16 *min;
+ bool *min_inclusive;
+
+ /* indexes of upper boundaries - values and information about the
+ * inequalities (exclusive vs. inclusive) */
+ uint16 *max;
+ bool *max_inclusive;
+
+} MVSerializedBucketData;
+
+typedef MVSerializedBucketData *MVSerializedBucket;
+
+typedef struct MVSerializedHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ /*
+ * Keep this the same as in MVHistogramData, because deserialization
+ * relies on the fields being at the same offsets.
+ */
+ MVSerializedBucket *buckets; /* array of buckets */
+
+ /*
+ * serialized boundary values, one array per dimension, deduplicated
+ * (the min/max indexes point into these arrays)
+ */
+ int *nvalues;
+ Datum **values;
+
+} MVSerializedHistogramData;
+
+typedef MVSerializedHistogramData *MVSerializedHistogram;
+
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_HIST_MAGIC 0x7F8C5670 /* marks serialized bytea */
+#define MVSTAT_HIST_TYPE_BASIC 1 /* basic histogram type */
+
+/*
+ * Limits used for max_buckets option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_HIST_MIN_BUCKETS, and we cannot
+ * have more than MVSTAT_HIST_MAX_BUCKETS buckets.
+ *
+ * This is just a boundary for the 'max' threshold - the actual
+ * histogram may use fewer buckets than MVSTAT_HIST_MAX_BUCKETS.
+ *
+ * TODO The MVSTAT_HIST_MIN_BUCKETS should be related to the number of
+ * attributes (MVSTATS_MAX_DIMENSIONS) because of NULL-buckets.
+ * There should be at least 2^N buckets, otherwise we may be unable
+ * to build the NULL buckets.
+ */
+#define MVSTAT_HIST_MIN_BUCKETS 128 /* min number of buckets */
+#define MVSTAT_HIST_MAX_BUCKETS 16384 /* max number of buckets */
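+
+/*
+ * For example (illustrative only), the threshold may be tuned like this:
+ *
+ * CREATE STATISTICS s ON t (a, b) WITH (histogram, max_buckets = 1024);
+ */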
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
@@ -99,20 +216,25 @@ typedef MCVListData *MCVList;
MVDependencies load_mv_dependencies(Oid mvoid);
MCVList load_mv_mcvlist(Oid mvoid);
+MVSerializedHistogram load_mv_histogram(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
VacAttrStats **stats);
+bytea * serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
MCVList deserialize_mv_mcvlist(bytea * data);
+MVSerializedHistogram deserialize_mv_histogram(bytea * data);
/*
* Returns index of the attribute number within the vector (i.e. a
* dimension within the stats).
*/
int mv_get_index(AttrNumber varattno, int2vector * stakeys);
int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
@@ -121,6 +243,8 @@ extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_mcvlist_items(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_histogram_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_histogram_buckets(PG_FUNCTION_ARGS);
MVDependencies
build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
@@ -130,10 +254,20 @@ MCVList
build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int *numrows_filtered);
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total);
+
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+void update_mv_stats(Oid relid, MVDependencies dependencies,
+ MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats);
+#ifdef DEBUG_MVHIST
+extern void debug_histogram_matches(MVSerializedHistogram mvhist, char *matches);
+#endif
+
#endif
diff --git a/src/test/regress/expected/mv_histogram.out b/src/test/regress/expected/mv_histogram.out
new file mode 100644
index 0000000..a34edb8
--- /dev/null
+++ b/src/test/regress/expected/mv_histogram.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s1 ON mv_histogram (unknown_column) WITH (histogram);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s1 ON mv_histogram (a) WITH (histogram);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s1 ON mv_histogram (a, a) WITH (histogram);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON mv_histogram (a, a, b) WITH (histogram);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing histogram statistics
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (dependencies, max_buckets=200);
+ERROR: option 'histogram' is required by other options(s)
+-- invalid max_buckets value / too low
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=10);
+ERROR: minimum number of buckets is 128
+-- invalid max_buckets value / too high
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=100000);
+ERROR: maximum number of buckets is 16384
+-- correct command
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (histogram);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s2 ON mv_histogram (a, b, c) WITH (histogram);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s3 ON mv_histogram (a, b, c, d) WITH (histogram);
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+DROP TABLE mv_histogram;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 66071d8..1a1a4ca 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1375,7 +1375,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
length(s.stadeps) AS depsbytes,
pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
length(s.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo,
+ length(s.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(s.stahist) AS histinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 85d94f1..a885235 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,4 +112,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies mv_mcv
+test: mv_dependencies mv_mcv mv_histogram
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 6584d73..2efdcd7 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -164,3 +164,4 @@ test: event_trigger
test: stats
test: mv_dependencies
test: mv_mcv
+test: mv_histogram
diff --git a/src/test/regress/sql/mv_histogram.sql b/src/test/regress/sql/mv_histogram.sql
new file mode 100644
index 0000000..02f49b4
--- /dev/null
+++ b/src/test/regress/sql/mv_histogram.sql
@@ -0,0 +1,176 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s1 ON mv_histogram (unknown_column) WITH (histogram);
+
+-- single column
+CREATE STATISTICS s1 ON mv_histogram (a) WITH (histogram);
+
+-- single column, duplicated
+CREATE STATISTICS s1 ON mv_histogram (a, a) WITH (histogram);
+
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON mv_histogram (a, a, b) WITH (histogram);
+
+-- unknown option
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (unknown_option);
+
+-- missing histogram statistics
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (dependencies, max_buckets=200);
+
+-- invalid max_buckets value / too low
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=10);
+
+-- invalid max_buckets value / too high
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=100000);
+
+-- correct command
+CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (histogram);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+
+DROP TABLE mv_histogram;
+
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s2 ON mv_histogram (a, b, c) WITH (histogram);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mv_histogram;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s3 ON mv_histogram (a, b, c, d) WITH (histogram);
+
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+DROP TABLE mv_histogram;
--
2.1.0
Attachment: 0006-multi-statistics-estimation.patch (text/x-patch)
From dec65426b12adcceb6303692b07bb4f5c3e564e2 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Fri, 6 Feb 2015 01:42:38 +0100
Subject: [PATCH 6/9] multi-statistics estimation
The general idea is that a probability (which is what selectivity is)
can be split into a product of conditional probabilities like this:
P(A & B & C) = P(A & B) * P(C|A & B)
If we assume that C is independent of B given A, the last term may be
simplified like this:
P(A & B & C) = P(A & B) * P(C|A)
so we only need probabilities on [A,B] and [A,C] to compute the original
probability.
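For example (illustrative only), with two statistics on the same table
CREATE STATISTICS s_ab ON t (a, b) WITH (mcv);
CREATE STATISTICS s_ac ON t (a, c) WITH (mcv);
a condition WHERE (a = 1) AND (b = 1) AND (c = 1) may be estimated as
P(a=1 & b=1) * P(c=1 | a=1), using s_ab for the first term and s_ac for
the conditional one.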
The implementation works in the other direction, though. We know what
probability P(A & B & C) we need to compute, and also what statistics
are available.
So we search for a combination of statistics covering the clauses in
an optimal way (most clauses covered, most dependencies exploited).
There are two possible approaches - exhaustive and greedy. The
exhaustive one walks through all permutations of the statistics, so
it's guaranteed to find the optimal solution, but it soon gets very
slow as it's roughly O(N!). Dynamic programming may improve that a
bit, but it's still far too expensive for large numbers of statistics
(on a single table).
The greedy algorithm is very simple - in every step it chooses the
locally best statistic. That may not guarantee the globally optimal
solution (but maybe it does?), yet it only needs N steps to find a
solution, so it's very fast (processing the selected stats is usually
way more expensive).
There's a GUC for selecting the search algorithm
mvstat_search = {'greedy', 'exhaustive'}
The default value is 'greedy' as that's much safer (with respect to
runtime). See choose_mv_statistics().
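For example, using the GUC as described above, the exhaustive search
may be tried in a session like this:
SET mvstat_search = 'exhaustive';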
Once we have found a sequence of statistics, we apply them to the
clauses using the conditional probabilities. We process the selected
stats one by one, and for each we select the estimated clauses and
conditions. See clauselist_selectivity() for more details.
Limitations
-----------
It's still true that each clause at a given level has to be covered by
a single MV statistics. So with this query
WHERE (clause1) AND (clause2) AND (clause3 OR clause4)
each parenthesized clause has to be covered by a single multivariate
statistics.
Clauses not covered by a single statistics at this level will be passed
to clause_selectivity() but this will treat them as a collection of
simpler clauses (connected by AND or OR), and the clauses from the
previous level will be used as conditions.
So using the same example, the last clause will be passed to
clause_selectivity() with 'clause1' and 'clause2' as conditions, and it
will be processed using multivariate stats if possible.
The other limitation is that all the expressions within a clause have
to be mv-compatible - there can't be a mix of mv-compatible and
incompatible expressions. If this is violated, the clause may be passed
to the next level (just like a list of clauses not covered by a single
statistics), which splits it into clauses handled by multivariate stats
and clauses handled by regular statistics.
rework clauselist_selectivity_or to handle OR-clauses correctly
We might invent a completely new set of functions here, resembling
clauselist_selectivity but adapting the ideas to OR-clauses.
But luckily we know that each OR-clause
(a OR b OR c)
may be rewritten as an equivalent AND-clause using negation:
NOT ((NOT a) AND (NOT b) AND (NOT c))
And that's something we can pass to clauselist_selectivity.
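In terms of selectivities this means computing
P(a OR b OR c) = 1 - P((NOT a) AND (NOT b) AND (NOT c))
where the right-hand side is handled by the existing AND-clause
machinery.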
---
contrib/file_fdw/file_fdw.c | 3 +-
contrib/postgres_fdw/postgres_fdw.c | 6 +-
src/backend/optimizer/path/clausesel.c | 1990 ++++++++++++++++++++++++++------
src/backend/optimizer/path/costsize.c | 23 +-
src/backend/optimizer/util/orclauses.c | 4 +-
src/backend/utils/adt/selfuncs.c | 17 +-
src/backend/utils/misc/guc.c | 20 +
src/backend/utils/mvstats/README.stats | 166 +++
src/include/optimizer/cost.h | 6 +-
src/include/utils/mvstats.h | 8 +
10 files changed, 1887 insertions(+), 356 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index dc035d7..8f11b7a 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -969,7 +969,8 @@ estimate_size(PlannerInfo *root, RelOptInfo *baserel,
baserel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index d79e4cc..2f4af21 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -498,7 +498,8 @@ postgresGetForeignRelSize(PlannerInfo *root,
fpinfo->local_conds,
baserel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
@@ -2149,7 +2150,8 @@ estimate_path_cost_size(PlannerInfo *root,
local_param_join_conds,
foreignrel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
local_sel *= fpinfo->local_conds_sel;
rows = clamp_row_est(rows * local_sel);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 0de2418..c1b8999 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -29,6 +29,8 @@
#include "utils/selfuncs.h"
#include "utils/typcache.h"
+#include "miscadmin.h"
+
/*
* Data structure for accumulating info about possible range-query
@@ -44,6 +46,13 @@ typedef struct RangeQueryClause
Selectivity hibound; /* Selectivity of a var < something clause */
} RangeQueryClause;
+static Selectivity clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
+
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
@@ -60,23 +69,25 @@ static int count_mv_attnums(List *clauses, Index relid, int type);
static int count_varnos(List *clauses, Index *relid);
+static List *clauses_matching_statistic(List **clauses, MVStatisticInfo *statistic,
+ Index relid, int types, bool remove);
+
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Index relid, List *stats);
-static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
-
-static List *clauselist_mv_split(PlannerInfo *root, Index relid,
- List *clauses, List **mvclauses,
- MVStatisticInfo *mvstats, int types);
-
static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats, List *clauses,
+ List *conditions, bool is_or);
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats,
- bool *fullmatch, Selectivity *lowsel);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or, bool *fullmatch,
+ Selectivity *lowsel);
static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -90,10 +101,33 @@ static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
int nmatches, char * matches,
bool is_or);
+/*
+ * Describes a combination of multiple statistics to cover attributes
+ * referenced by the clauses. The array 'stats' (with nstats elements)
+ * lists attributes (in the order as they are applied), and number of
+ * clause attributes covered by this solution.
+ *
+ * choose_mv_statistics_exhaustive() uses this to track both the current
+ * and the best solutions, while walking through the state of possible
+ * combination.
+ */
+typedef struct mv_solution_t {
+ int nclauses; /* number of clauses covered */
+ int nconditions; /* number of conditions covered */
+ int nstats; /* number of stats applied */
+ int *stats; /* stats (in the apply order) */
+} mv_solution_t;
+
+static List *choose_mv_statistics(PlannerInfo *root, Index relid,
+ List *mvstats, List *clauses, List *conditions);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, Index relid);
+static bool stats_type_matches(MVStatisticInfo *stat, int type);
+
+int mvstat_search_type = MVSTAT_SEARCH_GREEDY;
/* used for merging bitmaps - AND (min), OR (max) */
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
@@ -168,14 +202,15 @@ clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 1.0;
RangeQueryClause *rqlist = NULL;
ListCell *l;
/* processing mv stats */
- Oid relid = InvalidOid;
+ Index relid = InvalidOid;
/* list of multivariate stats on the relation */
List *stats = NIL;
@@ -191,12 +226,13 @@ clauselist_selectivity(PlannerInfo *root,
stats = find_stats(root, relid);
/*
- * If there's exactly one clause, then no use in trying to match up pairs,
- * so just go directly to clause_selectivity().
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, or matching multivariate statistics, so just go directly
+ * to clause_selectivity().
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
- varRelid, jointype, sjinfo);
+ varRelid, jointype, sjinfo, conditions);
/*
* Apply functional dependencies, but first check that there are some stats
@@ -228,31 +264,96 @@ clauselist_selectivity(PlannerInfo *root,
(count_mv_attnums(clauses, relid,
MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2))
{
- /* collect attributes from the compatible conditions */
- Bitmapset *mvattnums = collect_mv_attnums(clauses, relid,
- MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+ ListCell *s;
+
+ /*
+ * Copy the conditions we got from the upper part of the expression tree
+ * so that we can add local conditions to it (we need to keep the
+ * original list intact, for sibling expressions - other expressions
+ * at the same level).
+ */
+ List *conditions_local = list_copy(conditions);
- /* and search for the statistic covering the most attributes */
- MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+ /* find the best combination of statistics */
+ List *solution = choose_mv_statistics(root, relid, stats,
+ clauses, conditions);
- if (mvstat != NULL) /* we have a matching stats */
+ /*
+ * We have a good solution, which is merely a list of statistics that
+ * we need to apply. We'll apply the statistics one by one (in the order
+ * as they appear in the list), and for each statistic we'll
+ *
+ * (1) find clauses compatible with the statistic (and remove them
+ * from the list)
+ *
+ * (2) find local conditions compatible with the statistic
+ *
+ * (3) do the estimation P(clauses | conditions)
+ *
+ * (4) append the estimated clauses to local conditions
+ *
+ * continuously modify
+ */
+ foreach (s, solution)
{
- /* clauses compatible with multi-variate stats */
- List *mvclauses = NIL;
+ MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
- /* split the clauselist into regular and mv-clauses */
- clauses = clauselist_mv_split(root, relid, clauses, &mvclauses,
- mvstat, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+ /* clauses compatible with the statistic we're applying right now */
+ List *stat_clauses = NIL;
+ List *stat_conditions = NIL;
- /* we've chosen the histogram to match the clauses */
- Assert(mvclauses != NIL);
+ /*
+ * Find clauses and conditions matching the statistic - the clauses
+ * need to be removed from the list, while conditions should remain
+ * there (so that we can apply them repeatedly).
+ */
+ stat_clauses
+ = clauses_matching_statistic(&clauses, mvstat, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST,
+ true);
+
+ stat_conditions
+ = clauses_matching_statistic(&conditions_local, mvstat, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST,
+ false);
+
+ /*
+ * If we got no clauses to estimate, we've done something wrong,
+ * either during the optimization, while detecting compatible clauses,
+ * or somewhere else.
+ *
+ * Also, we need at least two attributes in clauses and conditions.
+ */
+ Assert(stat_clauses != NIL);
+ Assert(count_mv_attnums(list_union(stat_clauses, stat_conditions),
+ relid, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2);
/* compute the multivariate stats */
- s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ s1 *= clauselist_mv_selectivity(root, mvstat,
+ stat_clauses, stat_conditions,
+ false); /* AND */
+
+ /*
+ * Add the new clauses to the local conditions, so that we can use
+ * them for the subsequent statistics. We only add the clauses,
+ * because the conditions are already there (or should be).
+ */
+ conditions_local = list_concat(conditions_local, stat_clauses);
}
+
+ /* from now on, work only with the 'local' list of conditions */
+ conditions = conditions_local;
}
/*
+ * If the multivariate estimation left exactly one clause, there's no use
+ * in trying to match up pairs, so just go directly to clause_selectivity().
+ */
+ if (list_length(clauses) == 1)
+ return s1 * clause_selectivity(root, (Node *) linitial(clauses),
+ varRelid, jointype, sjinfo, conditions);
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -264,7 +365,8 @@ clauselist_selectivity(PlannerInfo *root,
Selectivity s2;
/* Always compute the selectivity using clause_selectivity */
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo);
+ s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo,
+ conditions);
/*
* Check for being passed a RestrictInfo.
@@ -423,6 +525,55 @@ clauselist_selectivity(PlannerInfo *root,
}
/*
+ * Similar to clauselist_selectivity(), but for OR-clauses. We can't simply
+ * reuse the multi-statistic estimation logic used for AND-clauses, at least
+ * not directly, because there are a few key differences:
+ *
+ * - functional dependencies don't really apply to OR-clauses
+ *
+ * - clauselist_selectivity() is based on decomposing the selectivity into
+ * a sequence of conditional probabilities (selectivities), but that can
+ * be done only for AND-clauses
+ *
+ * We might invent a similar infrastructure for optimizing OR-clauses, doing
+ * something similar to what clauselist_selectivity does for AND-clauses, but
+ * luckily, thanks to De Morgan's laws, we know that each OR-clause
+ *
+ * (a OR b OR c)
+ *
+ * may be rewritten as an equivalent negated AND-clause:
+ *
+ * NOT ((NOT a) AND (NOT b) AND (NOT c))
+ *
+ * And that's something we can pass to clauselist_selectivity and let it do
+ * all the heavy lifting.
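+ *
+ * In terms of selectivities, that means computing
+ *
+ * P(a OR b OR c) = 1.0 - P((NOT a) AND (NOT b) AND (NOT c))
+ *
+ * so e.g. (hypothetical numbers) if the AND of the negated clauses is
+ * estimated as 0.4, the OR-clause gets estimated as 1.0 - 0.4 = 0.6.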
+ */
+static Selectivity
+clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
+{
+ List *args = NIL;
+ ListCell *l;
+ Expr *expr;
+
+ /* build arguments for the AND-clause by negating args of the OR-clause */
+ foreach (l, clauses)
+ args = lappend(args, makeBoolExpr(NOT_EXPR, list_make1(lfirst(l)), -1));
+
+ /* and then build an AND-clause over the negated args */
+ expr = makeBoolExpr(AND_EXPR, args, -1);
+
+ /* instead of constructing NOT expression, just do (1.0 - s) */
+ return 1.0 - clauselist_selectivity(root, list_make1(expr), varRelid,
+ jointype, sjinfo, conditions);
+}
+
+/*
* addRangeClause --- add a new range clause for clauselist_selectivity
*
* Here is where we try to match up pairs of range-query clauses
@@ -629,7 +780,8 @@ clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 0.5; /* default for any unhandled clause type */
RestrictInfo *rinfo = NULL;
@@ -749,7 +901,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) get_notclausearg((Expr *) clause),
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (and_clause(clause))
{
@@ -758,29 +911,18 @@ clause_selectivity(PlannerInfo *root,
((BoolExpr *) clause)->args,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (or_clause(clause))
{
- /*
- * Selectivities for an OR clause are computed as s1+s2 - s1*s2 to
- * account for the probable overlap of selected tuple sets.
- *
- * XXX is this too conservative?
- */
- ListCell *arg;
-
- s1 = 0.0;
- foreach(arg, ((BoolExpr *) clause)->args)
- {
- Selectivity s2 = clause_selectivity(root,
- (Node *) lfirst(arg),
- varRelid,
- jointype,
- sjinfo);
-
- s1 = s1 + s2 - s1 * s2;
- }
+ /* just call to clauselist_selectivity_or() */
+ s1 = clauselist_selectivity_or(root,
+ ((BoolExpr *) clause)->args,
+ varRelid,
+ jointype,
+ sjinfo,
+ conditions);
}
else if (is_opclause(clause) || IsA(clause, DistinctExpr))
{
@@ -870,7 +1012,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((RelabelType *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (IsA(clause, CoerceToDomain))
{
@@ -879,7 +1022,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((CoerceToDomain *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else
{
@@ -943,15 +1087,16 @@ clause_selectivity(PlannerInfo *root,
* in the MCV list, then the selectivity is below the lowest frequency
* found in the MCV list,
*
- * TODO When applying the clauses to the histogram/MCV list, we can do
- * that from the most selective clauses first, because that'll
- * eliminate the buckets/items sooner (so we'll be able to skip
- * them without inspection, which is more expensive). But this
- * requires really knowing the per-clause selectivities in advance,
- * and that's not what we do now.
+ * TODO When applying the clauses to the histogram/MCV list, we can do that from
+ * the most selective clauses first, because that'll eliminate the
+ * buckets/items sooner (so we'll be able to skip them without inspection,
+ * which is more expensive). But this requires really knowing the
+ * per-clause selectivities in advance, and that's not what we do now.
*/
static Selectivity
-clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+clauselist_mv_selectivity(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
bool fullmatch = false;
Selectivity s1 = 0.0, s2 = 0.0;
@@ -969,7 +1114,8 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
*/
/* Evaluate the MCV first. */
- s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ s1 = clauselist_mv_selectivity_mcvlist(root, mvstats,
+ clauses, conditions, is_or,
&fullmatch, &mcv_low);
/*
@@ -982,7 +1128,8 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
/* TODO if (fullmatch) without matching MCV item, use the mcv_low
* selectivity as upper bound */
- s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+ s2 = clauselist_mv_selectivity_histogram(root, mvstats,
+ clauses, conditions, is_or);
/* TODO clamp to <= 1.0 (or more strictly, when possible) */
return s1 + s2;
@@ -1016,260 +1163,1325 @@ get_varattnos(Node * node, Index relid)
k + FirstLowInvalidHeapAttributeNumber);
}
- bms_free(varattnos);
+ bms_free(varattnos);
+
+ return result;
+}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ */
+static Bitmapset *
+collect_mv_attnums(List *clauses, Index relid, int types)
+{
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ Node *clause = (Node *) lfirst(l);
+
+ /* ignore the result here - we only need the attnums */
+ clause_is_mv_compatible(clause, relid, &attnums, types);
+ }
+
+ /*
+ * If there are not at least two attributes referenced by the clause(s),
+ * we can throw everything out (as we'll revert to simple stats).
+ */
+ if (bms_num_members(attnums) <= 1)
+ {
+ bms_free(attnums);
+ attnums = NULL;
+ }
+
+ return attnums;
+}
+
+/*
+ * Count the number of attributes in clauses compatible with multivariate stats.
+ */
+static int
+count_mv_attnums(List *clauses, Index relid, int type)
+{
+ int c;
+ Bitmapset *attnums = collect_mv_attnums(clauses, relid, type);
+
+ c = bms_num_members(attnums);
+
+ bms_free(attnums);
+
+ return c;
+}
+
+/*
+ * Count varnos referenced in the clauses, and if there's a single varno then
+ * store it in 'relid'.
+ */
+static int
+count_varnos(List *clauses, Index *relid)
+{
+ int cnt;
+ Bitmapset *varnos = NULL;
+
+ varnos = pull_varnos((Node *) clauses);
+ cnt = bms_num_members(varnos);
+
+ /* if there's a single varno in the clauses, remember it */
+ if (bms_num_members(varnos) == 1)
+ *relid = bms_singleton_member(varnos);
+
+ bms_free(varnos);
+
+ return cnt;
+}
+
+static List *
+clauses_matching_statistic(List **clauses, MVStatisticInfo *statistic,
+ Index relid, int types, bool remove)
+{
+ int i;
+ Bitmapset *stat_attnums = NULL;
+ List *matching_clauses = NIL;
+ ListCell *lc;
+
+ /* build attnum bitmapset for this statistics */
+ for (i = 0; i < statistic->stakeys->dim1; i++)
+ stat_attnums = bms_add_member(stat_attnums,
+ statistic->stakeys->values[i]);
+
+ /*
+ * We can't use foreach here, because we may need to remove some of the
+ * clauses if (remove=true).
+ */
+ lc = list_head(*clauses);
+ while (lc)
+ {
+ Node *clause = (Node*)lfirst(lc);
+ Bitmapset *attnums = NULL;
+
+ /* must advance lc before list_delete possibly pfree's it */
+ lc = lnext(lc);
+
+ /*
+ * skip clauses that are not compatible with stats (just leave them
+ * in the original list)
+ *
+ * XXX Perhaps this should check what stats are actually available in
+ * the statistics (not a big deal now, because MCV and histograms
+ * handle the same types of conditions).
+ */
+ if (! clause_is_mv_compatible(clause, relid, &attnums, types))
+ {
+ bms_free(attnums);
+ continue;
+ }
+
+ /* if the clause is covered by the statistic, add it to the list */
+ if (bms_is_subset(attnums, stat_attnums))
+ {
+ matching_clauses = lappend(matching_clauses, clause);
+
+ /* if remove=true, remove the matching item from the main list */
+ if (remove)
+ *clauses = list_delete_ptr(*clauses, clause);
+ }
+
+ bms_free(attnums);
+ }
+
+ bms_free(stat_attnums);
+
+ return matching_clauses;
+}
+
+/*
+ * Selects the best combination of multivariate statistics, in an exhaustive
+ * way, where 'best' means:
+ *
+ * (a) covering the most attributes (referenced by clauses)
+ * (b) using the least number of multivariate stats
+ * (c) using the most conditions to exploit dependency
+ *
+ * Don't call this directly but through choose_mv_statistics(), which does some
+ * additional tricks to minimize the runtime.
+ *
+ *
+ * Algorithm
+ * ---------
+ * The algorithm is a recursive implementation of backtracking, with maximum
+ * depth equal to the number of multi-variate statistics available on the table.
+ * It actually explores all valid combinations of stats.
+ *
+ * Whenever it considers adding the next statistics, the clauses it matches are
+ * divided into 'conditions' (clauses already matched by at least one previous
+ * statistics) and clauses that are estimated.
+ *
+ * Then several checks are performed:
+ *
+ * (a) The statistics covers at least 2 columns, referenced in the estimated
+ * clauses (otherwise multi-variate stats are useless).
+ *
+ * (b) The statistics covers at least 1 new column, i.e. a column not referenced
+ * by the already used stats (and the new column has to be referenced by
+ * the clauses, of course). Otherwise the statistics would not add any new
+ * information.
+ *
+ * There are some other sanity checks (e.g. stats must not be used twice etc.).
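+ *
+ * For example (hypothetical case), with clauses on columns (a,b,c) and
+ * statistics S1 on (a,b), S2 on (b,c) and S3 on (a,b,c), the backtracking
+ * explores solutions like [S3], [S1], [S1,S2], [S2], [S2,S1] and keeps
+ * the one covering the most clauses with the fewest statistics (here
+ * [S3], covering all three columns with a single statistics).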
+ *
+ *
+ * Weaknesses
+ * ----------
+ * The current implementation uses a rather simple optimality criterion, so it
+ * may not make the best choice when
+ *
+ * (a) There may be multiple solutions with the same number of covered
+ * attributes and number of statistics (e.g. the same solution but with
+ * statistics in a different order). It's unclear which solution is the best
+ * one - in a sense all of them are equal.
+ *
+ * TODO It might be possible to compute estimate for each of those solutions,
+ * and then combine them to get the final estimate (e.g. by using average
+ * or median).
+ *
+ * (b) Does not consider that some types of stats are a better match for some
+ * types of clauses (e.g. MCV list is generally a better match for equality
+ * conditions than a histogram).
+ *
+ * But maybe this is pointless - generally, each column is either a label
+ * (it's not important whether because of the data type or how it's used),
+ * or a value with ordering that makes sense. So either a MCV list is more
+ * appropriate (labels) or a histogram (values with orderings).
+ *
+ * Not sure what to do with statistics on columns mixing both types of data
+ * (some columns would work best with MCVs, some with histograms). Maybe we
+ * could invent a new type of statistics combining MCV list and histogram
+ * (keeping a small histogram for each MCV item, and a separate histogram
+ * for values not on the MCV list).
+ *
+ * TODO The algorithm should probably count number of Vars (not just attnums)
+ * when computing the 'score' of each solution. Computing the ratio of
+ * (num of all vars) / (num of condition vars) as a measure of how well
+ * the solution uses conditions might be useful.
+ */
+static void
+choose_mv_statistics_exhaustive(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ /* this may run for a long time, so let's make it interruptible */
+ CHECK_FOR_INTERRUPTS();
+
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ /*
+ * Now try to apply each statistics, matching at least two attributes,
+ * unless it's already used in one of the previous steps.
+ */
+ for (i = 0; i < nmvstats; i++)
+ {
+ int c;
+
+ int ncovered_clauses = 0; /* number of covered clauses */
+ int ncovered_conditions = 0; /* number of covered conditions */
+ int nattnums = 0; /* number of covered attributes */
+
+ Bitmapset *all_attnums = NULL;
+ Bitmapset *new_attnums = NULL;
+
+ /* skip statistics that were already used or eliminated */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /*
+ * See if we have clauses covered by this statistics, but not
+ * yet covered by any of the preceding ones.
+ */
+ for (c = 0; c < nclauses; c++)
+ {
+ bool covered = false;
+ Bitmapset *clause_attnums = clauses_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this clause is not covered by this stats, we can't
+ * use the stats to estimate that at all.
+ */
+ if (! cover_map[i * nclauses + c])
+ continue;
+
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add the
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
+
+ /* let's see if it's covered by any of the previous stats */
+ for (j = 0; j < step; j++)
+ {
+ /* already covered by the previous stats */
+ if (cover_map[current->stats[j] * nclauses + c])
+ covered = true;
+
+ if (covered)
+ break;
+ }
+
+ /* if already covered, continue with the next clause */
+ if (covered)
+ {
+ ncovered_conditions += 1;
+ continue;
+ }
+
+ /*
+ * OK, this clause is covered by this statistics (and not by
+ * any of the previous ones)
+ */
+ ncovered_clauses += 1;
+
+ /* add the attnums into attnums from 'new clauses' */
+ // new_attnums = bms_union(new_attnums, clause_attnums);
+ }
+
+ /* can't have more new clauses than original clauses */
+ Assert(nclauses >= ncovered_clauses);
+ Assert(ncovered_clauses >= 0); /* mostly paranoia */
+
+ nattnums = bms_num_members(all_attnums);
+
+ /* free all the bitmapsets - we don't need them anymore */
+ bms_free(all_attnums);
+ bms_free(new_attnums);
+
+ all_attnums = NULL;
+ new_attnums = NULL;
+
+ /*
+ * Now do the same for the conditions - see which of them are
+ * covered by this statistics.
+ */
+ for (c = 0; c < nconditions; c++)
+ {
+ Bitmapset *clause_attnums = conditions_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this clause is not covered by this stats, we can't
+ * use the stats to estimate that at all.
+ */
+ if (! condition_map[i * nconditions + c])
+ continue;
+
+ /* count this as a condition */
+ ncovered_conditions += 1;
+
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add the
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
+ }
+
+ /*
+ * Let's mark the statistics as 'ruled out' - either we'll use
+ * it (and proceed to the next step), or it's incompatible.
+ */
+ ruled_out[i] = step;
+
+ /*
+ * There are no clauses usable with this statistics (not already
+ * covered by some of the previous stats).
+ *
+ * Similarly, if the clauses only use a single attribute, we
+ * can't really use that.
+ */
+ if ((ncovered_clauses == 0) || (nattnums < 2))
+ continue;
+
+ /*
+ * TODO Not sure if it's possible to add a clause referencing
+ * only attributes already covered by previous stats?
+ * Introducing only some new dependency, not a new
+ * attribute. Couldn't come up with an example, though.
+ * Might be worth adding some assert.
+ */
+
+ /*
+ * got a suitable statistics - let's update the current solution,
+ * maybe use it as the best solution
+ */
+ current->nclauses += ncovered_clauses;
+ current->nconditions += ncovered_conditions;
+ current->nstats += 1;
+ current->stats[step] = i;
+
+ /*
+ * We can never cover more clauses, or use more stats, than we
+ * actually have at the beginning.
+ */
+ Assert(nclauses >= current->nclauses);
+ Assert(nmvstats >= current->nstats);
+ Assert(step < nmvstats);
+
+ if (*best == NULL)
+ {
+ *best = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ (*best)->nstats = 0;
+ (*best)->nclauses = 0;
+ (*best)->nconditions = 0;
+ }
+
+ /*
+ * See if it's better than the current 'best' solution - i.e. covers
+ * more clauses, or the same number of clauses with fewer statistics
+ * (as per the optimality criteria described above).
+ */
+ if ((current->nclauses > (*best)->nclauses) ||
+ ((current->nclauses == (*best)->nclauses) &&
+ ((current->nstats < (*best)->nstats))))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+
+ /*
+ * The recursion only makes sense if we haven't covered all the
+ * attributes (then adding stats is not really possible).
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_exhaustive(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= ncovered_clauses;
+ current->nconditions -= ncovered_conditions;
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[i] = -1;
+
+ Assert(current->nclauses >= 0);
+ Assert(current->nstats >= 0);
+ }
+
+ /* reset all statistics as 'incompatible' in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+}
+
+/*
+ * Greedy search for a multivariate solution - a sequence of statistics covering
+ * the clauses. This chooses the "best" statistics at each step, so the
+ * resulting solution may not be the best solution globally, but this produces
+ * the solution in only N steps (where N is the number of statistics), while
+ * the exhaustive approach may have to walk through ~N! combinations (although
+ * some of those are terminated early).
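+ *
+ * At each step this picks the statistics maximizing the gain
+ *
+ * gain = num_cond_columns / num_cov_columns
+ *
+ * i.e. the fraction of covered column references that are already
+ * covered by conditions (and thus exploitable as dependencies). So,
+ * with hypothetical numbers, a statistics covering 4 column references
+ * with 2 of them in conditions (gain 0.5) beats a statistics covering
+ * 3 references with none in conditions (gain 0.0).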
+ *
+ * See the comments at choose_mv_statistics_exhaustive() as this does the same
+ * thing (but in a different way).
+ *
+ * Don't call this directly, but through choose_mv_statistics().
+ *
+ * TODO There are probably other metrics we might use - e.g. using number of
+ * columns (num_cond_columns / num_cov_columns), which might work better
+ * with a mix of simple and complex clauses.
+ *
+ * TODO Also the choice at the very first step should be handled in a special
+ * way, because there will be 0 conditions at that moment, so there needs
+ * to be some other criteria - e.g. using the simplest (or most complex?)
+ * clause might be a good idea.
+ *
+ * TODO We might also select multiple stats using different criteria, and branch
+ * the search. This is however tricky, because if we choose k statistics at
+ * each step, we get k^N branches to walk through (with N steps). That's
+ * not really good with a large number of stats (yet better than exhaustive
+ * search).
+ */
+static void
+choose_mv_statistics_greedy(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
+ int best_stat = -1;
+ double gain, max_gain = -1.0;
+
+ /*
+ * Bitmap tracking which clauses are already covered (by the previous
+ * statistics) and may thus serve only as a condition in this step.
+ */
+ bool *covered_clauses = (bool*)palloc0(nclauses);
+
+ /*
+ * Number of clauses and columns covered by each statistics - this
+ * includes both conditions and clauses covered by the statistics for
+ * the first time. The number of columns may count some columns
+ * repeatedly - if a column is shared by multiple clauses, it will
+ * be counted once for each clause (covered by the statistics).
+ * So with two clauses [(a=1 OR b=2),(a<2 OR c>1)] the column "a"
+ * will be counted twice (if both clauses are covered).
+ *
+ * The values for ruled-out statistics (that can't be applied) are
+ * not computed, because that'd be pointless.
+ */
+ int *num_cov_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cov_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Same as above, but this only includes clauses that are already
+ * covered by the previous stats (and the current one).
+ */
+ int *num_cond_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cond_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Number of attributes for each clause.
+ *
+ * TODO Might be computed in choose_mv_statistics() and then passed
+ * here, but then the function would not have the same signature
+ * as _exhaustive().
+ */
+ int *attnum_counts = (int*)palloc0(sizeof(int) * nclauses);
+ int *attnum_cond_counts = (int*)palloc0(sizeof(int) * nconditions);
+
+ CHECK_FOR_INTERRUPTS();
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ /* compute attributes (columns) for each clause */
+ for (i = 0; i < nclauses; i++)
+ attnum_counts[i] = bms_num_members(clauses_attnums[i]);
+
+ /* compute attributes (columns) for each condition */
+ for (i = 0; i < nconditions; i++)
+ attnum_cond_counts[i] = bms_num_members(conditions_attnums[i]);
+
+ /* see which clauses are already covered at this point (by previous stats) */
+ for (i = 0; i < step; i++)
+ for (j = 0; j < nclauses; j++)
+ covered_clauses[j] |= (cover_map[current->stats[i] * nclauses + j]);
+
+ /* which remaining statistics covers most clauses / uses most conditions? */
+ for (i = 0; i < nmvstats; i++)
+ {
+ Bitmapset *attnums_covered = NULL;
+ Bitmapset *attnums_conditions = NULL;
+
+ /* skip stats that are already ruled out (either used or inapplicable) */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* count covered clauses and conditions (for the statistics) */
+ for (j = 0; j < nclauses; j++)
+ {
+ if (cover_map[i * nclauses + j])
+ {
+ Bitmapset *attnums_new
+ = bms_union(attnums_covered, clauses_attnums[j]);
+
+ /* get rid of the old bitmap and keep the unified result */
+ bms_free(attnums_covered);
+ attnums_covered = attnums_new;
+
+ num_cov_clauses[i] += 1;
+ num_cov_columns[i] += attnum_counts[j];
+
+ /* is the clause already covered (i.e. a condition)? */
+ if (covered_clauses[j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_counts[j];
+ attnums_new = bms_union(attnums_conditions,
+ clauses_attnums[j]);
+
+ bms_free(attnums_conditions);
+ attnums_conditions = attnums_new;
+ }
+ }
+ }
+
+ /* if all covered clauses are covered by prev stats (thus conditions) */
+ if (num_cov_clauses[i] == num_cond_clauses[i])
+ ruled_out[i] = step;
+
+ /* same if there are no new attributes */
+ else if (bms_num_members(attnums_conditions) == bms_num_members(attnums_covered))
+ ruled_out[i] = step;
+
+ bms_free(attnums_covered);
+ bms_free(attnums_conditions);
+
+ /* if the statistics is inapplicable, try the next one */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* now let's walk through conditions and count the covered */
+ for (j = 0; j < nconditions; j++)
+ {
+ if (condition_map[i * nconditions + j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_cond_counts[j];
+ }
+ }
+
+ /* otherwise see if this improves the interesting metrics */
+ gain = num_cond_columns[i] / (double)num_cov_columns[i];
+
+ if (gain > max_gain)
+ {
+ max_gain = gain;
+ best_stat = i;
+ }
+ }
+
+ /*
+ * Have we found a suitable statistics? Add it to the solution and
+ * try next step.
+ */
+ if (best_stat != -1)
+ {
+ /* mark the statistics, so that we skip it in next steps */
+ ruled_out[best_stat] = step;
+
+ /* allocate current solution if necessary */
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ current->nclauses += num_cov_clauses[best_stat];
+ current->nconditions += num_cond_clauses[best_stat];
+ current->stats[step] = best_stat;
+ current->nstats++;
+
+ if (*best == NULL)
+ {
+ (*best) = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ else
+ {
+ /* see if this is a better solution */
+ double current_gain = (double)current->nconditions / current->nclauses;
+ double best_gain = (double)(*best)->nconditions / (*best)->nclauses;
+
+ if ((current_gain > best_gain) ||
+ ((current_gain == best_gain) && (current->nstats < (*best)->nstats)))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ }
+
+ /*
+ * The recursion only makes sense if we haven't covered all the
+ * attributes (then adding stats is not really possible).
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_greedy(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= num_cov_clauses[best_stat];
+ current->nconditions -= num_cond_clauses[best_stat];
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[best_stat] = -1;
+ }
+
+ /* reset all statistics eliminated in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+ /* free everything allocated in this step */
+ pfree(covered_clauses);
+ pfree(attnum_counts);
+ pfree(num_cov_clauses);
+ pfree(num_cov_columns);
+ pfree(num_cond_clauses);
+ pfree(num_cond_columns);
+}
+
+/*
+ * Remove clauses not covered by any of the available statistics
+ *
+ * This helps us to reduce the amount of work done in choose_mv_statistics()
+ * by not having to deal with clauses that can't possibly be useful.
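+ *
+ * For example, if the only remaining statistics is on columns (a,b),
+ * a clause (c = 1) can't possibly be covered by any multivariate
+ * statistics, so there's no point in keeping it in the optimization.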
+ */
+static List *
+filter_clauses(PlannerInfo *root, Index relid, int type,
+ List *stats, List *clauses, Bitmapset **attnums)
+{
+ ListCell *c;
+ ListCell *s;
+
+ /* results (list of compatible clauses, attnums) */
+ List *rclauses = NIL;
+
+ foreach (c, clauses)
+ {
+ Node *clause = (Node*)lfirst(c);
+ Bitmapset *clause_attnums = NULL;
+
+ /*
+ * We do assume that thanks to previous checks, we should not run into
+ * clauses that are incompatible with multivariate stats here. We also
+ * need to collect the attnums for the clause.
+ *
+ * XXX Maybe turn this into an assert?
+ */
+ if (! clause_is_mv_compatible(clause, relid, &clause_attnums, type))
+ elog(ERROR, "should not get non-mv-compatible clause");
+
+ /* Is there a multivariate statistics covering the clause? */
+ foreach (s, stats)
+ {
+ int k, matches = 0;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ /* skip statistics not matching the required type */
+ if (! stats_type_matches(stat, type))
+ continue;
+
+ /*
+ * see if all clause attributes are covered by the statistic
+ *
+ * We'll do that in the opposite direction, i.e. we'll see how many
+ * attributes of the statistic are referenced in the clause, and then
+ * compare the counts.
+ */
+ for (k = 0; k < stat->stakeys->dim1; k++)
+ if (bms_is_member(stat->stakeys->values[k], clause_attnums))
+ matches += 1;
+
+ /*
+ * If the number of matches is equal to attributes referenced by the
+ * clause, then the clause is covered by the statistic.
+ */
+ if (bms_num_members(clause_attnums) == matches)
+ {
+ *attnums = bms_union(*attnums, clause_attnums);
+ rclauses = lappend(rclauses, clause);
+ break;
+ }
+ }
+
+ bms_free(clause_attnums);
+ }
+
+ /* we can't have more compatible conditions than source conditions */
+ Assert(list_length(clauses) >= list_length(rclauses));
+
+ return rclauses;
+}
+
+/*
+ * Remove statistics not covering any new clauses
+ *
+ * Statistics not covering any new clauses (conditions don't count) are not
+ * really useful, so let's ignore them. Also, we need the statistics to
+ * reference at least two different attributes (both in conditions and clauses
+ * combined), and at least one of them in the clauses alone.
+ *
+ * This check might be made more strict by checking against individual clauses,
+ * because by using the bitmapsets of all attnums we may actually use attnums
+ * from clauses that are not covered by the statistics. For example, we may
+ * have a condition
+ *
+ * (a=1 AND b=2)
+ *
+ * and a new clause
+ *
+ * (c=1 AND d=1)
+ *
+ * With only bitmapsets, statistics on [b,c] will pass through this (assuming
+ * there are some statistics covering both clauses).
+ *
+ * Parameters:
+ *
+ * stats - list of statistics to filter
+ * new_attnums - attnums referenced in new clauses
+ * all_attnums - attnums referenced by conditions and new clauses combined
+ *
+ * Returns filtered list of statistics.
+ *
+ * TODO Do the more strict check, i.e. walk through individual clauses and
+ * conditions and only use those covered by the statistics.
+ */
+static List *
+filter_stats(List *stats, Bitmapset *new_attnums, Bitmapset *all_attnums)
+{
+ ListCell *s;
+ List *stats_filtered = NIL;
+
+ foreach (s, stats)
+ {
+ int k;
+ int matches_new = 0,
+ matches_all = 0;
+
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ /* see how many attributes the statistics covers */
+ for (k = 0; k < stat->stakeys->dim1; k++)
+ {
+ /* attributes from new clauses */
+ if (bms_is_member(stat->stakeys->values[k], new_attnums))
+ matches_new += 1;
+
+ /* attributes from conditions and new clauses combined */
+ if (bms_is_member(stat->stakeys->values[k], all_attnums))
+ matches_all += 1;
+ }
+
+ /* check we have enough attributes for this statistics */
+ if ((matches_new >= 1) && (matches_all >= 2))
+ stats_filtered = lappend(stats_filtered, stat);
+ }
+
+ /* we can't have more useful stats than we had originally */
+ Assert(list_length(stats) >= list_length(stats_filtered));
+
+ return stats_filtered;
+}
+
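+/*
+ * Convert a list of MVStatisticInfo elements into a plain array (which
+ * is easier to index during the optimization), setting *nmvstats to
+ * the number of elements.
+ */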
+static MVStatisticInfo *
+make_stats_array(List *stats, int *nmvstats)
+{
+ int i;
+ ListCell *l;
+
+ MVStatisticInfo *mvstats = NULL;
+ *nmvstats = list_length(stats);
+
+ mvstats
+ = (MVStatisticInfo*)palloc0((*nmvstats) * sizeof(MVStatisticInfo));
+
+ i = 0;
+ foreach (l, stats)
+ {
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(l);
+ memcpy(&mvstats[i++], stat, sizeof(MVStatisticInfo));
+ }
+
+ return mvstats;
+}
+
+static Bitmapset **
+make_stats_attnums(MVStatisticInfo *mvstats, int nmvstats)
+{
+ int i, j;
+ Bitmapset **stats_attnums = NULL;
+
+ Assert(nmvstats > 0);
- return result;
+ /* build bitmaps of attnums for the stats (easier to compare) */
+ stats_attnums = (Bitmapset **)palloc0(nmvstats * sizeof(Bitmapset*));
+
+ for (i = 0; i < nmvstats; i++)
+ for (j = 0; j < mvstats[i].stakeys->dim1; j++)
+ stats_attnums[i]
+ = bms_add_member(stats_attnums[i],
+ mvstats[i].stakeys->values[j]);
+
+ return stats_attnums;
}
+
/*
- * Collect attributes from mv-compatible clauses.
+ * Remove redundant statistics
+ *
+ * If there are multiple statistics covering the same set of columns (counting
+ * only those referenced by clauses and conditions), we only need to apply
+ * one of them, further reducing the size of the optimization problem.
+ *
+ * Thus when redundant stats are detected, we keep the smaller one (the one with
+ * fewer columns), based on the assumption that it's more accurate and also
+ * faster to process. That may be untrue for two reasons - first, the accuracy
+ * really depends on number of buckets/MCV items, not the number of columns.
+ * Second, some types of statistics may work better for certain types of clauses
+ * (e.g. MCV lists for equality conditions) etc.
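+ *
+ * For example, if the clauses only reference columns (a,b), then
+ * statistics on (a,b) and on (a,b,c) cover exactly the same attributes
+ * here, and we keep just the (a,b) one (the smaller of the two).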
*/
-static Bitmapset *
-collect_mv_attnums(List *clauses, Index relid, int types)
+static List*
+filter_redundant_stats(List *stats, List *clauses, List *conditions)
{
- Bitmapset *attnums = NULL;
- ListCell *l;
+ int i, j, nmvstats;
+
+ MVStatisticInfo *mvstats;
+ bool *redundant;
+ Bitmapset **stats_attnums;
+ Bitmapset *varattnos;
+ Index relid;
+
+ Assert(list_length(stats) > 0);
+ Assert(list_length(clauses) > 0);
/*
- * Walk through the clauses and identify the ones we can estimate using
- * multivariate stats, and remember the relid/columns. We'll then
- * cross-check if we have suitable stats, and only if needed we'll split
- * the clauses into multivariate and regular lists.
+ * We'll convert the list of statistics into an array now, because
+ * the reduction of redundant statistics is easier to do that way
+ * (we can mark previous stats as redundant, etc.).
+ */
+ mvstats = make_stats_array(stats, &nmvstats);
+ stats_attnums = make_stats_attnums(mvstats, nmvstats);
+
+ /* by default, none of the stats is redundant (so palloc0) */
+ redundant = palloc0(nmvstats * sizeof(bool));
+
+ /*
+ * We only expect a single relid here, and also we should get the
+ * same relid from clauses and conditions (but we get it from
+ * clauses, because those are certainly non-empty).
+ */
+ relid = bms_singleton_member(pull_varnos((Node*)clauses));
+
+ /*
+ * Get the varattnos from both conditions and clauses.
+ *
+ * This skips system attributes, although that should be impossible
+ * thanks to previous filtering out of incompatible clauses.
*
- * For now we're only interested in RestrictInfo nodes with nested OpExpr,
- * using either a range or equality.
+ * XXX Is that really true?
*/
- foreach (l, clauses)
+ varattnos = bms_union(get_varattnos((Node*)clauses, relid),
+ get_varattnos((Node*)conditions, relid));
+
+ for (i = 1; i < nmvstats; i++)
{
- Node *clause = (Node *) lfirst(l);
+ /* intersect with current statistics */
+ Bitmapset *curr = bms_intersect(stats_attnums[i], varattnos);
- /* ignore the result here - we only need the attnums */
- clause_is_mv_compatible(clause, relid, &attnums, types);
+ /* walk through 'previous' stats and check redundancy */
+ for (j = 0; j < i; j++)
+ {
+ /* intersect with current statistics */
+ Bitmapset *prev;
+
+ /* skip stats already identified as redundant */
+ if (redundant[j])
+ continue;
+
+ prev = bms_intersect(stats_attnums[j], varattnos);
+
+ switch (bms_subset_compare(curr, prev))
+ {
+ case BMS_EQUAL:
+ /*
+ * Use the smaller one (hopefully more accurate).
+ * If both have the same size, use the first one.
+ */
+ if (mvstats[i].stakeys->dim1 >= mvstats[j].stakeys->dim1)
+ redundant[i] = TRUE;
+ else
+ redundant[j] = TRUE;
+
+ break;
+
+ case BMS_SUBSET1: /* curr is subset of prev */
+ redundant[i] = TRUE;
+ break;
+
+ case BMS_SUBSET2: /* prev is subset of curr */
+ redundant[j] = TRUE;
+ break;
+
+ case BMS_DIFFERENT:
+ /* do nothing - keep both stats */
+ break;
+ }
+
+ bms_free(prev);
+ }
+
+ bms_free(curr);
}
- /*
- * If there are not at least two attributes referenced by the clause(s),
- * we can throw everything out (as we'll revert to simple stats).
- */
- if (bms_num_members(attnums) <= 1)
+ /* can't reduce all statistics (at least one has to remain) */
+ Assert(nmvstats > 0);
+
+ /* now, let's remove the reduced statistics from the arrays */
+ list_free(stats);
+ stats = NIL;
+
+ for (i = 0; i < nmvstats; i++)
{
- if (attnums != NULL)
- pfree(attnums);
- attnums = NULL;
+ MVStatisticInfo *info;
+
+ pfree(stats_attnums[i]);
+
+ if (redundant[i])
+ continue;
+
+ info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[i], sizeof(MVStatisticInfo));
+
+ stats = lappend(stats, info);
}
- return attnums;
+ pfree(mvstats);
+ pfree(stats_attnums);
+ pfree(redundant);
+
+ return stats;
}
-/*
- * Count the number of attributes in clauses compatible with multivariate stats.
- */
-static int
-count_mv_attnums(List *clauses, Index relid, int type)
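+/*
+ * Convert a list of clauses into a plain array of Node pointers,
+ * setting *nclauses to the number of elements.
+ */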
+static Node**
+make_clauses_array(List *clauses, int *nclauses)
{
- int c;
- Bitmapset *attnums = collect_mv_attnums(clauses, relid, type);
+ int i;
+ ListCell *l;
- c = bms_num_members(attnums);
+ Node** clauses_array;
- bms_free(attnums);
+ *nclauses = list_length(clauses);
+ clauses_array = (Node **)palloc0((*nclauses) * sizeof(Node *));
- return c;
+ i = 0;
+ foreach (l, clauses)
+ clauses_array[i++] = (Node *)lfirst(l);
+
+ *nclauses = i;
+
+ return clauses_array;
}
-/*
- * Count varnos referenced in the clauses, and if there's a single varno then
- * return the index in 'relid'.
- */
-static int
-count_varnos(List *clauses, Index *relid)
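+/*
+ * Build a bitmapset of attnums for each clause. Incompatible clauses
+ * were already filtered out, so hitting one here is an error.
+ */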
+static Bitmapset **
+make_clauses_attnums(PlannerInfo *root, Index relid,
+ int type, Node **clauses, int nclauses)
{
- int cnt;
- Bitmapset *varnos = NULL;
+ int i;
+ Bitmapset **clauses_attnums
+ = (Bitmapset **)palloc0(nclauses * sizeof(Bitmapset *));
- varnos = pull_varnos((Node *) clauses);
- cnt = bms_num_members(varnos);
+ for (i = 0; i < nclauses; i++)
+ {
+ Bitmapset * attnums = NULL;
- /* if there's a single varno in the clauses, remember it */
- if (bms_num_members(varnos) == 1)
- *relid = bms_singleton_member(varnos);
+ if (! clause_is_mv_compatible(clauses[i], relid, &attnums, type))
+ elog(ERROR, "should not get non-mv-compatible clause");
- bms_free(varnos);
+ clauses_attnums[i] = attnums;
+ }
- return cnt;
+ return clauses_attnums;
}
-
+
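+/*
+ * Build a (nmvstats x nclauses) map of flags, where the (i,j) element
+ * says whether clause 'j' references only attributes covered by
+ * statistics 'i' (i.e. whether the statistics can estimate the clause).
+ */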
+static bool*
+make_cover_map(Bitmapset **stats_attnums, int nmvstats,
+ Bitmapset **clauses_attnums, int nclauses)
+{
+ int i, j;
+ bool *cover_map = (bool*)palloc0(nclauses * nmvstats);
+
+ for (i = 0; i < nmvstats; i++)
+ for (j = 0; j < nclauses; j++)
+ cover_map[i * nclauses + j]
+ = bms_is_subset(clauses_attnums[j], stats_attnums[i]);
+
+ return cover_map;
+}
+
/*
- * We're looking for statistics matching at least 2 attributes, referenced in
- * clauses compatible with multivariate statistics. The current selection
- * criteria is very simple - we choose the statistics referencing the most
- * attributes.
- *
- * If there are multiple statistics referencing the same number of columns
- * (from the clauses), the one with less source columns (as listed in the
- * ADD STATISTICS when creating the statistics) wins. Else the first one wins.
- *
- * This is a very simple criteria, and has several weaknesses:
- *
- * (a) does not consider the accuracy of the statistics
- *
- * If there are two histograms built on the same set of columns, but one
- * has 100 buckets and the other one has 1000 buckets (thus likely
- * providing better estimates), this is not currently considered.
- *
- * (b) does not consider the type of statistics
- *
- * If there are three statistics - one containing just a MCV list, another
- * one with just a histogram and a third one with both, we treat them equally.
+ * Chooses the combination of statistics optimal for estimating a particular
+ * clause list.
*
- * (c) does not consider the number of clauses
+ * This only handles a 'preparation' shared by the exhaustive and greedy
+ * implementations (see the previous methods), mostly trying to reduce the size
+ * of the problem (eliminate clauses/statistics that can't really be used in
+ * the solution).
*
- * As explained, only the number of referenced attributes counts, so if
- * there are multiple clauses on a single attribute, this still counts as
- * a single attribute.
+ * It also precomputes bitmaps for attributes covered by clauses and statistics,
+ * so that we don't need to do that over and over in the actual optimizations
+ * (as it's both CPU and memory intensive).
*
- * (d) does not consider type of condition
*
- * Some clauses may work better with some statistics - for example equality
- * clauses probably work better with MCV lists than with histograms. But
- * IS [NOT] NULL conditions may often work better with histograms (thanks
- * to NULL-buckets).
+ * TODO Another way to make the optimization problems smaller might be splitting
+ * the statistics into several disjoint subsets, i.e. if we can split the
+ * graph of statistics (after the elimination) into multiple components
+ * (so that stats in different components share no attributes), we can do
+ * the optimization for each component separately.
*
- * So for example with five WHERE conditions
- *
- * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
- *
- * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be selected
- * as it references the most columns.
- *
- * Once we have selected the multivariate statistics, we split the list of
- * clauses into two parts - conditions that are compatible with the selected
- * stats, and conditions are estimated using simple statistics.
- *
- * From the example above, conditions
- *
- * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
- *
- * will be estimated using the multivariate statistics (a,b,c,d) while the last
- * condition (e = 1) will get estimated using the regular ones.
- *
- * There are various alternative selection criteria (e.g. counting conditions
- * instead of just referenced attributes), but eventually the best option should
- * be to combine multiple statistics. But that's much harder to do correctly.
- *
- * TODO Select multiple statistics and combine them when computing the estimate.
- *
- * TODO This will probably have to consider compatibility of clauses, because
- * 'dependencies' will probably work only with equality clauses.
+ * TODO If we could compute what is a "perfect solution" maybe we could
+ * terminate the search after reaching ~90% of it? Say, if we knew that we
+ * can cover 10 clauses and reuse 8 dependencies, maybe covering 9 clauses
+ * and 7 dependencies would be OK?
*/
-static MVStatisticInfo *
-choose_mv_statistics(List *stats, Bitmapset *attnums)
+static List*
+choose_mv_statistics(PlannerInfo *root, Index relid, List *stats,
+ List *clauses, List *conditions)
{
int i;
- ListCell *lc;
+ mv_solution_t *best = NULL;
+ List *result = NIL;
+
+ int nmvstats;
+ MVStatisticInfo *mvstats;
+
+ /* we only work with MCV lists and histograms here */
+ int type = (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+
+ bool *clause_cover_map = NULL,
+ *condition_cover_map = NULL;
+ int *ruled_out = NULL;
+
+ /* build bitmapsets for all stats and clauses */
+ Bitmapset **stats_attnums;
+ Bitmapset **clauses_attnums;
+ Bitmapset **conditions_attnums;
- MVStatisticInfo *choice = NULL;
+ int nclauses, nconditions;
+ Node ** clauses_array;
+ Node ** conditions_array;
- int current_matches = 1; /* goal #1: maximize */
- int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+ /* copy lists, so that we can free them during elimination easily */
+ clauses = list_copy(clauses);
+ conditions = list_copy(conditions);
+ stats = list_copy(stats);
/*
- * Walk through the statistics (simple array with nmvstats elements) and for
- * each one count the referenced attributes (encoded in the 'attnums' bitmap).
+ * Reduce the optimization problem size as much as possible.
+ *
+ * Eliminate clauses and conditions not covered by any statistics,
+ * or statistics not matching at least two attributes (one of them
+ * has to be in a regular clause).
+ *
+ * It's possible that removing a statistics in one iteration
+ * eliminates a clause in the next one, so we'll repeat this until
+ * an iteration eliminates no clauses/stats at all.
+ *
+ * This can only happen after eliminating a statistics - clauses are
+ * eliminated first, so statistics always reflect that.
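+ *
+ * For example (hypothetical case), eliminating a statistics on (a,b)
+ * may leave a clause on (a,b) covered by no remaining statistics, so
+ * the clause gets eliminated in the next iteration, which in turn may
+ * make yet another statistics useless.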
*/
- foreach (lc, stats)
+ while (true)
{
- MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
-
- /* columns matching this statistics */
- int matches = 0;
+ List *tmp;
- int2vector * attrs = info->stakeys;
- int numattrs = attrs->dim1;
+ Bitmapset *compatible_attnums = NULL;
+ Bitmapset *condition_attnums = NULL;
+ Bitmapset *all_attnums = NULL;
- /* skip dependencies-only stats */
- if (! (info->mcv_built || info->hist_built))
- continue;
+ /*
+ * Clauses
+ *
+ * Walk through clauses and keep only those covered by at least
+ * one of the statistics we still have. We'll also keep info
+ * about attnums in clauses (without conditions) so that we can
+ * ignore stats covering just conditions (which is pointless).
+ */
+ tmp = filter_clauses(root, relid, type,
+ stats, clauses, &compatible_attnums);
- /* count columns covered by the histogram */
- for (i = 0; i < numattrs; i++)
- if (bms_is_member(attrs->values[i], attnums))
- matches++;
+ /* discard the original list */
+ list_free(clauses);
+ clauses = tmp;
/*
- * Use this statistics when it improves the number of matches or
- * when it matches the same number of attributes but is smaller.
+ * Conditions
+ *
+ * Walk through clauses and keep only those covered by at least
+ * one of the statistics we still have. Also, collect bitmap of
+ * attributes so that we can make sure we add at least one new
+ * attribute (by comparing with clauses).
*/
- if ((matches > current_matches) ||
- ((matches == current_matches) && (current_dims > numattrs)))
+ if (conditions != NIL)
{
- choice = info;
- current_matches = matches;
- current_dims = numattrs;
+ tmp = filter_clauses(root, relid, type,
+ stats, conditions, &condition_attnums);
+
+ /* discard the original list */
+ list_free(conditions);
+ conditions = tmp;
}
- }
- return choice;
-}
+ /* get a union of attnums (from conditions and new clauses) */
+ all_attnums = bms_union(compatible_attnums, condition_attnums);
+
+ /*
+ * Statistics
+ *
+ * Walk through statistics and only keep those covering at least
+ * one new attribute (excluding conditions) and at least two attributes
+ * in clauses and conditions combined.
+ */
+ tmp = filter_stats(stats, compatible_attnums, all_attnums);
+ /* if we've not eliminated anything, terminate */
+ if (list_length(stats) == list_length(tmp))
+ break;
-/*
- * This splits the clauses list into two parts - one containing clauses that
- * will be evaluated using the chosen statistics, and the remaining clauses
- * (either non-mvcompatible, or not related to the histogram).
- */
-static List *
-clauselist_mv_split(PlannerInfo *root, Index relid,
- List *clauses, List **mvclauses,
- MVStatisticInfo *mvstats, int types)
-{
- int i;
- ListCell *l;
- List *non_mvclauses = NIL;
+ /* work only with filtered statistics from now */
+ list_free(stats);
+ stats = tmp;
+ }
- /* FIXME is there a better way to get info on int2vector? */
- int2vector * attrs = mvstats->stakeys;
- int numattrs = mvstats->stakeys->dim1;
+ /* only do the optimization if we have clauses/statistics */
+ if ((list_length(stats) == 0) || (list_length(clauses) == 0))
+ return NULL;
- Bitmapset *mvattnums = NULL;
+ /* remove redundant stats (stats covered by another stats) */
+ stats = filter_redundant_stats(stats, clauses, conditions);
- /* build bitmap of attributes, so we can do bms_is_subset later */
- for (i = 0; i < numattrs; i++)
- mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+ /*
+ * TODO We should sort the stats to make the order deterministic,
+ * otherwise we may get different estimates on different
+ * executions - if there are multiple "equally good" solutions,
+ * we'll keep the first solution we see.
+ *
+ * Sorting by OID probably is not the right solution though,
+ * because we'd like it to be somehow reproducible,
+ * irrespective of the order of ADD STATISTICS commands.
+ * So maybe statkeys?
+ */
+ mvstats = make_stats_array(stats, &nmvstats);
+ stats_attnums = make_stats_attnums(mvstats, nmvstats);
- /* erase the list of mv-compatible clauses */
- *mvclauses = NIL;
+ /* collect clauses and a bitmap of attnums for each */
+ clauses_array = make_clauses_array(clauses, &nclauses);
+ clauses_attnums = make_clauses_attnums(root, relid, type,
+ clauses_array, nclauses);
- foreach (l, clauses)
- {
- bool match = false; /* by default not mv-compatible */
- Bitmapset *attnums = NULL;
- Node *clause = (Node *) lfirst(l);
+ /* collect conditions and bitmap of attnums */
+ conditions_array = make_clauses_array(conditions, &nconditions);
+ conditions_attnums = make_clauses_attnums(root, relid, type,
+ conditions_array, nconditions);
- if (clause_is_mv_compatible(clause, relid, &attnums, types))
+ /*
+ * Build bitmaps with info about which clauses/conditions are
+ * covered by each statistics (so that we don't need to call
+ * bms_is_subset over and over again).
+ */
+ clause_cover_map = make_cover_map(stats_attnums, nmvstats,
+ clauses_attnums, nclauses);
+
+ condition_cover_map = make_cover_map(stats_attnums, nmvstats,
+ conditions_attnums, nconditions);
+
+ ruled_out = (int*)palloc0(nmvstats * sizeof(int));
+
+ /* no stats are ruled out by default */
+ for (i = 0; i < nmvstats; i++)
+ ruled_out[i] = -1;
+
+ /* do the optimization itself */
+ if (mvstat_search_type == MVSTAT_SEARCH_EXHAUSTIVE)
+ choose_mv_statistics_exhaustive(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+ else
+ choose_mv_statistics_greedy(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+
+ /* create a list of statistics from the array */
+ if (best != NULL)
+ {
+ for (i = 0; i < best->nstats; i++)
{
- /* are all the attributes part of the selected stats? */
- if (bms_is_subset(attnums, mvattnums))
- match = true;
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[best->stats[i]], sizeof(MVStatisticInfo));
+ result = lappend(result, info);
}
- /*
- * The clause matches the selected stats, so put it to the list of
- * mv-compatible clauses. Otherwise, keep it in the list of 'regular'
- * clauses (that may be selected later).
- */
- if (match)
- *mvclauses = lappend(*mvclauses, clause);
- else
- non_mvclauses = lappend(non_mvclauses, clause);
+ pfree(best);
}
- /*
- * Perform regular estimation using the clauses incompatible with the chosen
- * histogram (or MV stats in general).
- */
- return non_mvclauses;
+ /* cleanup (maybe leave it up to the memory context?) */
+ for (i = 0; i < nmvstats; i++)
+ bms_free(stats_attnums[i]);
+
+ for (i = 0; i < nclauses; i++)
+ bms_free(clauses_attnums[i]);
+
+ for (i = 0; i < nconditions; i++)
+ bms_free(conditions_attnums[i]);
+
+ pfree(stats_attnums);
+ pfree(clauses_attnums);
+ pfree(conditions_attnums);
+ pfree(clauses_array);
+ pfree(conditions_array);
+ pfree(clause_cover_map);
+ pfree(condition_cover_map);
+ pfree(ruled_out);
+ pfree(mvstats);
+
+ list_free(clauses);
+ list_free(conditions);
+ list_free(stats);
+
+ return result;
}
typedef struct
@@ -1474,6 +2686,7 @@ clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums, int type
return true;
}
+
/*
* collect attnums from functional dependencies
*
@@ -2022,6 +3235,24 @@ clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
- * Check that there are stats with at least one of the requested types.
+ * Check whether the statistics matches at least one of the requested types.
*/
static bool
+stats_type_matches(MVStatisticInfo *stat, int type)
+{
+ if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
+ return true;
+
+ if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
+ return true;
+
+ if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
+ return true;
+
+ return false;
+}
+
+/*
+ * Check that there are stats with at least one of the requested types.
+ */
+static bool
has_stats(List *stats, int type)
{
ListCell *s;
@@ -2030,13 +3261,8 @@ has_stats(List *stats, int type)
{
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
- if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
- return true;
-
- if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
- return true;
-
- if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
+ /* terminate if we've found at least one matching statistics */
+ if (stats_type_matches(stat, type))
return true;
}
@@ -2087,22 +3313,26 @@ find_stats(PlannerInfo *root, Index relid)
* as the clauses are processed (and skip items that are 'match').
*/
static Selectivity
-clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats, bool *fullmatch,
- Selectivity *lowsel)
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or,
+ bool *fullmatch, Selectivity *lowsel)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
MCVList mcvlist = NULL;
+
int nmatches = 0;
+ int nconditions = 0;
/* match/mismatch bitmap for each MCV item */
char * matches = NULL;
+ char * condition_matches = NULL;
Assert(clauses != NIL);
- Assert(list_length(clauses) >= 2);
+ Assert(list_length(clauses) >= 1);
/* there's no MCV list built yet */
if (! mvstats->mcv_built)
@@ -2113,32 +3343,85 @@ clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
Assert(mcvlist != NULL);
Assert(mcvlist->nitems > 0);
- /* by default all the MCV items match the clauses fully */
- matches = palloc0(sizeof(char) * mcvlist->nitems);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
-
/* number of matching MCV items */
nmatches = mcvlist->nitems;
+ nconditions = mcvlist->nitems;
+
+ /*
+ * Bitmap of MCV item matches (mismatch, partial, full).
+ *
+ * For AND clauses all items match initially (and we'll eliminate them).
+ * For OR clauses no items match initially (and we'll add them).
+ *
+ * We only need to do the memset for AND clauses (for OR clauses
+ * it's already set correctly by the palloc0).
+ */
+ matches = palloc0(sizeof(char) * nmatches);
+
+ if (! is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+ /* Conditions are treated as an AND clause, so all items match by default. */
+ condition_matches = palloc0(sizeof(char) * nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+
+ /*
+ * build the match bitmap for the conditions (conditions are always
+ * connected by AND)
+ */
+ if (conditions != NIL)
+ nconditions = update_match_bitmap_mcvlist(root, conditions,
+ mvstats->stakeys, mcvlist,
+ nconditions, condition_matches,
+ lowsel, fullmatch, false);
+
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all MCV items, even those
+ * ruled out by the conditions. The final result should be the
+ * same, but skipping the ruled-out items might be faster.
+ */
nmatches = update_match_bitmap_mcvlist(root, clauses,
mvstats->stakeys, mcvlist,
- nmatches, matches,
- lowsel, fullmatch, false);
+ ((is_or) ? 0 : nmatches), matches,
+ lowsel, fullmatch, is_or);
/* sum frequencies for all the matching MCV items */
for (i = 0; i < mcvlist->nitems; i++)
{
- /* used to 'scale' for MCV lists not covering all tuples */
+ /*
+ * Find out what part of the data is covered by the MCV list,
+ * so that we can 'scale' the selectivity properly (e.g. when
+ * only 50% of the sample items got into the MCV, and the rest
+ * is either in a histogram, or not covered by stats).
+ *
+ * TODO This might be handled by keeping a global "frequency"
+ * for the whole list, which might save us a bit of time
+ * spent on accessing the not-matching part of the MCV list.
+ * Although it's likely in a cache, so it's very fast.
+ */
u += mcvlist->items[i]->frequency;
+ /* skip MCV items not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
if (matches[i] != MVSTATS_MATCH_NONE)
s += mcvlist->items[i]->frequency;
+
+ t += mcvlist->items[i]->frequency;
}
pfree(matches);
+ pfree(condition_matches);
pfree(mcvlist);
- return s*u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
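+ /*
+ * (s / t) is the selectivity of the estimated clauses conditional on
+ * the conditions, computed only from the MCV items, while u scales
+ * the result to the fraction of the data the MCV list represents.
+ */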
+ return (s / t) * u;
}
/*
@@ -2369,64 +3652,57 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
}
}
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/* AND/OR clause, with all clauses compatible with the selected MV stat */
int i;
- BoolExpr *orclause = ((BoolExpr*)clause);
- List *orclauses = orclause->args;
+ List *tmp_clauses = ((BoolExpr*)clause)->args;
/* match/mismatch bitmap for each MCV item */
- int or_nmatches = 0;
- char * or_matches = NULL;
+ int tmp_nmatches = 0;
+ char * tmp_matches = NULL;
- Assert(orclauses != NIL);
- Assert(list_length(orclauses) >= 2);
+ Assert(tmp_clauses != NIL);
+ Assert((list_length(tmp_clauses) >= 2) || (not_clause(clause) && (list_length(tmp_clauses)==1)));
/* number of matching MCV items */
- or_nmatches = mcvlist->nitems;
+ tmp_nmatches = (or_clause(clause)) ? 0 : mcvlist->nitems;
/* by default none of the MCV items matches the clauses */
- or_matches = palloc0(sizeof(char) * or_nmatches);
+ tmp_matches = palloc0(sizeof(char) * mcvlist->nitems);
- if (or_clause(clause))
- {
- /* OR clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
- or_nmatches = 0;
- }
- else
- {
- /* AND clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
- }
+ /* AND (and NOT) clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
/* build the match bitmap for the OR-clauses */
- or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ tmp_nmatches = update_match_bitmap_mcvlist(root, tmp_clauses,
stakeys, mcvlist,
- or_nmatches, or_matches,
+ tmp_nmatches, tmp_matches,
lowsel, fullmatch, or_clause(clause));
/* merge the bitmap into the existing one*/
for (i = 0; i < mcvlist->nitems; i++)
{
+ /* if this is a NOT clause, we need to invert the results first */
+ if (not_clause(clause))
+ tmp_matches[i] = (MVSTATS_MATCH_FULL - tmp_matches[i]);
+
/*
* To AND-merge the bitmaps, a MIN() semantics is used.
* For OR-merge, use MAX().
*
* FIXME this does not decrease the number of matches
*/
- UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
}
- pfree(or_matches);
+ pfree(tmp_matches);
}
else
- {
elog(ERROR, "unknown clause type: %d", clause->type);
- }
}
/*
@@ -2484,15 +3760,18 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
* this is not uncommon, but for histograms it's not that clear.
*/
static Selectivity
-clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats)
+clauselist_mv_selectivity_histogram(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
int nmatches = 0;
+ int nconditions = 0;
char *matches = NULL;
+ char *condition_matches = NULL;
MVSerializedHistogram mvhist = NULL;
@@ -2505,25 +3784,55 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
Assert (mvhist != NULL);
Assert (clauses != NIL);
- Assert (list_length(clauses) >= 2);
+ Assert (list_length(clauses) >= 1);
+
+ nmatches = mvhist->nbuckets;
+ nconditions = mvhist->nbuckets;
/*
- * Bitmap of bucket matches (mismatch, partial, full). by default
- * all buckets fully match (and we'll eliminate them).
+ * Bitmap of bucket matches (mismatch, partial, full).
+ *
+ * For AND clauses all buckets match (and we'll eliminate them).
+ * For OR clauses no buckets match (and we'll add them).
+ *
+ * We only need to do the memset for AND clauses (for OR clauses
+ * it's already set correctly by the palloc0).
*/
- matches = palloc0(sizeof(char) * mvhist->nbuckets);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+ matches = palloc0(sizeof(char) * nmatches);
- nmatches = mvhist->nbuckets;
+ if (! is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+
+ /* Conditions are treated as an AND clause, so all buckets match by default. */
+ condition_matches = palloc0(sizeof(char)*nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+
+ /*
+ * build the match bitmap for the conditions (conditions are always
+ * connected by AND)
+ */
+ if (conditions != NIL)
+ update_match_bitmap_histogram(root, conditions,
+ mvstats->stakeys, mvhist,
+ nconditions, condition_matches, false);
- /* build the match bitmap */
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all buckets, even those
+ * ruled out by the conditions. The final result should be
+ * the same, but skipping the ruled-out buckets might be faster.
+ */
update_match_bitmap_histogram(root, clauses,
mvstats->stakeys, mvhist,
- nmatches, matches, false);
+ ((is_or) ? 0 : nmatches), matches,
+ is_or);
/* now, walk through the buckets and sum the selectivities */
for (i = 0; i < mvhist->nbuckets; i++)
{
+ float coeff = 1.0;
+
/*
* Find out what part of the data is covered by the histogram,
* so that we can 'scale' the selectivity properly (e.g. when
@@ -2537,10 +3846,23 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
*/
u += mvhist->buckets[i]->ntuples;
+ /* skip buckets not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+ else if (condition_matches[i] == MVSTATS_MATCH_PARTIAL)
+ coeff = 0.5;
+
+ t += coeff * mvhist->buckets[i]->ntuples;
+
if (matches[i] == MVSTATS_MATCH_FULL)
- s += mvhist->buckets[i]->ntuples;
+ s += coeff * mvhist->buckets[i]->ntuples;
else if (matches[i] == MVSTATS_MATCH_PARTIAL)
- s += 0.5 * mvhist->buckets[i]->ntuples;
+ /*
+ * TODO If both conditions and clauses match partially, this
+ * will use a 0.25 match - not sure if that's the right
+ * solution, but it seems about right.
+ */
+ s += coeff * 0.5 * mvhist->buckets[i]->ntuples;
}
#ifdef DEBUG_MVHIST
@@ -2549,9 +3871,14 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
/* release the allocated bitmap and deserialized histogram */
pfree(matches);
+ pfree(condition_matches);
pfree(mvhist);
- return s * u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
+ return (s / t) * u;
}
/* cached result of bucket boundary comparison for a single dimension */
@@ -2699,7 +4026,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
{
int i;
ListCell * l;
-
+
/*
* Used for caching function calls, only once per deduplicated value.
*
@@ -2742,7 +4069,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
FmgrInfo opproc; /* operator */
fmgr_info(get_opcode(expr->opno), &opproc);
-
+
/* reset the cache (per clause) */
memset(callcache, 0, mvhist->nbuckets);
@@ -2902,64 +4229,57 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
}
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/* AND/OR clause, with all clauses compatible with the selected MV stat */
int i;
- BoolExpr *orclause = ((BoolExpr*)clause);
- List *orclauses = orclause->args;
+ List *tmp_clauses = ((BoolExpr*)clause)->args;
/* match/mismatch bitmap for each bucket */
- int or_nmatches = 0;
- char * or_matches = NULL;
+ int tmp_nmatches = 0;
+ char * tmp_matches = NULL;
- Assert(orclauses != NIL);
- Assert(list_length(orclauses) >= 2);
+ Assert(tmp_clauses != NIL);
+ Assert((list_length(tmp_clauses) >= 2) || (not_clause(clause) && (list_length(tmp_clauses)==1)));
/* number of matching buckets */
- or_nmatches = mvhist->nbuckets;
+ tmp_nmatches = (or_clause(clause)) ? 0 : mvhist->nbuckets;
- /* by default none of the buckets matches the clauses */
- or_matches = palloc0(sizeof(char) * or_nmatches);
+ /* by default none of the buckets matches the clauses (OR clause) */
+ tmp_matches = palloc0(sizeof(char) * mvhist->nbuckets);
- if (or_clause(clause))
- {
- /* OR clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
- or_nmatches = 0;
- }
- else
- {
- /* AND clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
- }
+ /* but AND (and NOT) clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
/* build the match bitmap for the OR-clauses */
- or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ tmp_nmatches = update_match_bitmap_histogram(root, tmp_clauses,
stakeys, mvhist,
- or_nmatches, or_matches, or_clause(clause));
+ tmp_nmatches, tmp_matches, or_clause(clause));
/* merge the bitmap into the existing one*/
for (i = 0; i < mvhist->nbuckets; i++)
{
+ /* if this is a NOT clause, we need to invert the results first */
+ if (not_clause(clause))
+ tmp_matches[i] = (MVSTATS_MATCH_FULL - tmp_matches[i]);
+
/*
* To AND-merge the bitmaps, a MIN() semantics is used.
* For OR-merge, use MAX().
*
* FIXME this does not decrease the number of matches
*/
- UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
}
- pfree(or_matches);
-
+ pfree(tmp_matches);
}
else
elog(ERROR, "unknown clause type: %d", clause->type);
}
- /* free the call cache */
pfree(callcache);
return nmatches;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5fc2f9c..7384cb8 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3520,7 +3520,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/*
* Also get the normal inner-join selectivity of the join clauses.
@@ -3543,7 +3544,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
JOIN_INNER,
- &norm_sjinfo);
+ &norm_sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
if (jointype == JOIN_ANTI)
@@ -3710,7 +3712,7 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
Node *qual = (Node *) lfirst(l);
/* Note that clause_selectivity will be able to cache its result */
- selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo);
+ selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo, NIL);
}
/* Apply it to the input relation sizes */
@@ -3746,7 +3748,8 @@ set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
rel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
rel->rows = clamp_row_est(nrows);
@@ -3783,7 +3786,8 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
allclauses,
rel->relid, /* do not use 0! */
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
/* For safety, make sure result is not more than the base estimate */
if (nrows > rel->rows)
@@ -3921,12 +3925,14 @@ calc_joinrel_size_estimate(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = clauselist_selectivity(root,
pushedquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
list_free(joinquals);
@@ -3938,7 +3944,8 @@ calc_joinrel_size_estimate(PlannerInfo *root,
restrictlist,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = 0.0; /* not used, keep compiler quiet */
}
diff --git a/src/backend/optimizer/util/orclauses.c b/src/backend/optimizer/util/orclauses.c
index ea831f5..6299e75 100644
--- a/src/backend/optimizer/util/orclauses.c
+++ b/src/backend/optimizer/util/orclauses.c
@@ -280,7 +280,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
* saving work later.)
*/
or_selec = clause_selectivity(root, (Node *) or_rinfo,
- 0, JOIN_INNER, NULL);
+ 0, JOIN_INNER, NULL, NIL);
/*
* The clause is only worth adding to the query if it rejects a useful
@@ -342,7 +342,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
/* Compute inner-join size */
orig_selec = clause_selectivity(root, (Node *) join_or_rinfo,
- 0, JOIN_INNER, &sjinfo);
+ 0, JOIN_INNER, &sjinfo, NIL);
/* And hack cached selectivity so join size remains the same */
join_or_rinfo->norm_selec = orig_selec / or_selec;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 46c95b0..7d0a3a1 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1627,13 +1627,15 @@ booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg,
case IS_NOT_FALSE:
selec = (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
case IS_FALSE:
case IS_NOT_TRUE:
selec = 1.0 - (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
default:
elog(ERROR, "unrecognized booltesttype: %d",
@@ -6259,7 +6261,8 @@ genericcostestimate(PlannerInfo *root,
indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/*
* If caller didn't give us an estimate, estimate the number of index
@@ -6579,7 +6582,8 @@ btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
btreeSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
numIndexTuples = btreeSelectivity * index->rel->tuples;
/*
@@ -7330,7 +7334,8 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/* fetch estimated page cost for tablespace containing index */
get_tablespace_page_costs(index->reltablespace,
@@ -7560,7 +7565,7 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexSelectivity =
clauselist_selectivity(root, indexQuals,
path->indexinfo->rel->relid,
- JOIN_INNER, NULL);
+ JOIN_INNER, NULL, NIL);
*indexCorrelation = 1;
/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index ea5a09a..27a8de5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -75,6 +75,7 @@
#include "utils/bytea.h"
#include "utils/guc_tables.h"
#include "utils/memutils.h"
+#include "utils/mvstats.h"
#include "utils/pg_locale.h"
#include "utils/plancache.h"
#include "utils/portal.h"
@@ -393,6 +394,15 @@ static const struct config_enum_entry force_parallel_mode_options[] = {
};
/*
+ * Search algorithm for multivariate stats.
+ */
+static const struct config_enum_entry mvstat_search_options[] = {
+ {"greedy", MVSTAT_SEARCH_GREEDY, false},
+ {"exhaustive", MVSTAT_SEARCH_EXHAUSTIVE, false},
+ {NULL, 0, false}
+};
+
+/*
* Options for enum values stored in other modules
*/
extern const struct config_enum_entry wal_level_options[];
@@ -3707,6 +3717,16 @@ static struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"mvstat_search", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Sets the algorithm used for combining multivariate stats."),
+ NULL
+ },
+ &mvstat_search_type,
+ MVSTAT_SEARCH_GREEDY, mvstat_search_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index 3e4f4d1..d404914 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -90,6 +90,137 @@ even attempting to do the more expensive estimation.
Whenever we find there are no suitable stats, we skip the expensive steps.
+Combining multiple statistics
+-----------------------------
+
+When estimating selectivity of a list of clauses, there may exist no statistics
+covering all of them. If there are multiple statistics, each covering some
+subset of the attributes, the optimizer needs to figure out which of those
+statistics to apply.
+
+When the statistics do not overlap, the solution is trivial - we can simply
+split the conditions into groups by the matching statistics, and then multiply
+the group selectivities. For example assume multivariate statistics on (b,c) and (d,e),
+and a condition like this:
+
+ (a=1) AND (b=2) AND (c=3) AND (d=4) AND (e=5)
+
+Then (a=1) is not covered by any of the statistics, so it will be estimated
+using the regular per-column statistics. The two conditions ((b=2) AND (c=3))
+will be estimated using the (b,c) statistics, and ((d=4) AND (e=5)) using the
+(d,e) statistics. The resulting selectivities are then multiplied together.
+
+Now, what if the statistics overlap? For example assume the same condition as
+above, but let's say we have statistics on (a,b,c) and (a,c,d,e). What then?
+
+As selectivity is just a probability that the condition holds for a random row,
+we can write the selectivity like this:
+
+ P(a=1 & b=2 & c=3 & d=4 & e=5)
+
+and we can rewrite it using conditional probability like this
+
+ P(a=1 & b=2 & c=3) * P(d=4 & e=5 | a=1 & b=2 & c=3)
+
+Notice that the first part already matches the (a,b,c) statistics. If we assume
+that columns that are not referenced by the same statistics are independent, we
+may rewrite the second half like this
+
+ P(d=4 & e=5 | a=1 & b=2 & c=3) = P(d=4 & e=5 | a=1 & c=3)
+
+which corresponds to the statistics on (a,c,d,e).
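+
+For illustration, with made-up numbers: if the (a,b,c) statistics estimate
+P(a=1 & b=2 & c=3) = 0.01, and the (a,c,d,e) statistics estimate the
+conditional probability P(d=4 & e=5 | a=1 & c=3) = 0.5, the combined
+selectivity is
+
+ 0.01 * 0.5 = 0.005
+
+instead of the plain product of the per-clause selectivities.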
+
+If there are multiple statistics defined on a table, it's not difficult to come
+up with examples where there are multiple ways to combine them to cover a list of
+clauses. We need a way to find the best combination of statistics.
+
+This is the purpose of choose_mv_statistics(). It searches through the possible
+combinations of statistics, looking for the combination that
+
+ (a) covers the most clauses of the list
+
+ (b) reuses the maximum number of clauses as conditions
+ (in conditional probabilities)
+
+While criterion (a) seems natural, (b) may seem a bit awkward at first. The
+idea is that conditions are a way of transferring information about dependencies
+between statistics.
+
+There are two alternative implementations of choose_mv_statistics() - greedy
+and exhaustive. Exhaustive actually searches through all possible combinations
+of statistics, and for larger numbers of statistics may get quite expensive
+(as it, unsurprisingly, has exponential cost). Greedy terminates in less than
+K steps (when K is the number of clauses), and in each step chooses the best
+next statistics. I've been unable to come up with an example where those two
+approaches would produce different combinations.
+
+It's possible to choose the algorithm using the mvstat_search_type GUC, with either
+'greedy' or 'exhaustive' values (default is 'greedy').
+
+ SET mvstat_search_type = 'exhaustive';
+
+Note: This is meant mostly for experimentation. I do expect we'll choose one of
+the algorithms and remove the GUC before commit.
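+
+For illustration, the greedy variant works roughly like this (a simplified
+sketch, not the exact choose_mv_statistics() implementation):
+
+ remaining = all compatible clauses
+ conditions = {}
+
+ while (remaining is not empty)
+ {
+ pick the statistics covering the most clauses in 'remaining',
+ using the already covered clauses as conditions
+
+ if (no statistics covers at least one remaining clause)
+ break
+
+ move the newly covered clauses from 'remaining' to 'conditions'
+ }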
+
+
+Limitations of combining statistics
+-----------------------------------
+
+As described in the section 'Combining multiple statistics', the current approach
+is based on transferring information between statistics by means of conditional
+probabilities. This is a relatively cheap and efficient approach, but it is
+based on two assumptions:
+
+ (1) The overlap between the statistics needs to be sufficiently large, i.e.
+ there need to be enough columns shared by the statistics to transfer
+ information about dependencies between the remaining columns.
+
+ (2) The query needs to include sufficient clauses on the shared columns.
+
+Why violating those assumptions is a problem can be illustrated with a simple
+example. Assume a table with three columns (a,b,c) containing exactly the same
+values, and statistics on (a,b) and (b,c):
+
+ CREATE TABLE test AS SELECT i AS a, i AS b, i AS c
+ FROM generate_series(1,1000) s(i);
+
+ CREATE STATISTICS s1 ON test (a,b) WITH (mcv);
+ CREATE STATISTICS s2 ON test (b,c) WITH (mcv);
+
+ ANALYZE test;
+
+First, let's estimate this query:
+
+ SELECT * FROM test WHERE (a < 10) AND (c < 10);
+
+Clearly, there are no conditions on 'b' (which is the only column shared by the
+two statistics), so we'll end up with an estimate based on the assumption of
+independence:
+
+ P(a < 10) * P(c < 10) = 0.01 * 0.01 = 0.0001
+
+That is a significant under-estimate, as the actual selectivity is about 0.01.
+
+But let's estimate another query:
+
+ SELECT * FROM test WHERE (a < 10) AND (b < 500) AND (c < 10);
+
+In this case, the estimate may be computed for example like this:
+
+ P[(a < 10) & (b < 500) & (c < 10)]
+ = P[(a < 10) & (b < 500)] * P[(c < 10) | (a < 10) & (b < 500)]
+ = P[(a < 10) & (b < 500)] * P[(c < 10) | (b < 500)]
+
+The trouble is that P(c < 10 | b < 500) evaluates to 0.02 - there is no
+statistic containing both (a) and (c), so they are assumed independent, and
+the condition on (b) does not transfer a sufficient amount of information
+between the two statistics. The resulting estimate is therefore roughly
+
+ 0.01 * 0.02 = 0.0002
+
+about 50x below the actual selectivity of 0.01.
+
+Currently, the only solution is to build statistics on all three columns, but
+see the 'Combining stats using convolution' section for ideas on how to
+improve this.
+
+
Further (possibly crazy) ideas
------------------------------
@@ -111,3 +242,38 @@ But of course, this may result in expensive estimation (CPU-wise).
So we might add a GUC to choose between a simple (single statistics) and thus
multi-statistic estimation, possibly table-level parameter (ALTER TABLE ...).
+
+
+Combining stats using convolution
+---------------------------------
+
+The current approach for combining statistics is based on conditional
+probabilities, and thus only works when the query includes conditions on the
+overlapping parts of the statistics. There may however be other ways to
+combine statistics, relaxing this requirement.
+
+Let's assume two histograms H1 and H2 - then combining them might work about
+like this:
+
+
+ for (buckets of H1, satisfying local conditions)
+ {
+ for (buckets of H2, overlapping with H1 bucket)
+ {
+ mark H2 bucket as 'valid'
+ }
+ }
+
+ s1 = s2 = 0.0
+ for (buckets of H2 marked as valid)
+ {
+ s1 += frequency
+
+ if (bucket satisfies local conditions)
+ s2 += frequency
+ }
+
+ s = (s2 / s1) /* final selectivity estimate */
+
+However this may quickly get non-trivial, e.g. when combining two statistics
+of different types (histogram vs. MCV).
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 78c7cae..a5ac088 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -191,11 +191,13 @@ extern Selectivity clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
extern Selectivity clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
#endif /* COST_H */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index f05a517..35b2f8e 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -17,6 +17,14 @@
#include "fmgr.h"
#include "commands/vacuum.h"
+typedef enum MVStatSearchType
+{
+ MVSTAT_SEARCH_EXHAUSTIVE, /* exhaustive search */
+ MVSTAT_SEARCH_GREEDY /* greedy search */
+} MVStatSearchType;
+
+extern int mvstat_search_type;
+
/*
* Degree of how much MCV item / histogram bucket matches a clause.
* This is then considered when computing the selectivity.
--
2.1.0
Attachment: 0007-multivariate-ndistinct-coefficients.patch (text/x-patch)
From e42a2efeb060692d0a1ebe23f28c654130b26dcd Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Wed, 23 Dec 2015 02:07:58 +0100
Subject: [PATCH 7/9] multivariate ndistinct coefficients
---
doc/src/sgml/ref/create_statistics.sgml | 9 ++
src/backend/catalog/system_views.sql | 3 +-
src/backend/commands/analyze.c | 2 +-
src/backend/commands/statscmds.c | 11 +-
src/backend/optimizer/path/clausesel.c | 4 +
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/adt/selfuncs.c | 93 +++++++++++++++-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/README.ndistinct | 83 ++++++++++++++
src/backend/utils/mvstats/README.stats | 2 +
src/backend/utils/mvstats/common.c | 23 +++-
src/backend/utils/mvstats/mvdist.c | 171 +++++++++++++++++++++++++++++
src/include/catalog/pg_mv_statistic.h | 26 +++--
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 9 +-
src/test/regress/expected/rules.out | 3 +-
16 files changed, 424 insertions(+), 23 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.ndistinct
create mode 100644 src/backend/utils/mvstats/mvdist.c
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index fd3382e..80360a6 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -168,6 +168,15 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>ndistinct</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables ndistinct coefficients for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect2>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6afdee0..a550141 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -169,7 +169,8 @@ CREATE VIEW pg_mv_stats AS
length(S.stamcv) AS mcvbytes,
pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo,
length(S.stahist) AS histbytes,
- pg_mv_stats_histogram_info(S.stahist) AS histinfo
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo,
+ standcoeff AS ndcoeff
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index cbaa4e1..0f6db77 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -582,7 +582,7 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
}
/* Build multivariate stats (if there are any). */
- build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
+ build_mv_stats(onerel, totalrows, numrows, rows, attr_cnt, vacattrstats);
}
/*
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index b974655..6ea0e13 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -138,7 +138,8 @@ CreateStatistics(CreateStatsStmt *stmt)
/* by default build nothing */
bool build_dependencies = false,
build_mcv = false,
- build_histogram = false;
+ build_histogram = false,
+ build_ndistinct = false;
int32 max_buckets = -1,
max_mcv_items = -1;
@@ -221,6 +222,8 @@ CreateStatistics(CreateStatsStmt *stmt)
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "ndistinct") == 0)
+ build_ndistinct = defGetBoolean(opt);
else if (strcmp(opt->defname, "mcv") == 0)
build_mcv = defGetBoolean(opt);
else if (strcmp(opt->defname, "max_mcv_items") == 0)
@@ -275,10 +278,10 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv || build_histogram))
+ if (! (build_dependencies || build_mcv || build_histogram || build_ndistinct))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram, ndistinct) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -311,6 +314,7 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
+ values[Anum_pg_mv_statistic_ndist_enabled-1] = BoolGetDatum(build_ndistinct);
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
@@ -318,6 +322,7 @@ CreateStatistics(CreateStatsStmt *stmt)
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
nulls[Anum_pg_mv_statistic_stahist -1] = true;
+ nulls[Anum_pg_mv_statistic_standist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index c1b8999..2540da9 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -59,6 +59,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
#define MV_CLAUSE_TYPE_HIST 0x04
+#define MV_CLAUSE_TYPE_NDIST 0x08
static bool clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums,
int type);
@@ -3246,6 +3247,9 @@ stats_type_matches(MVStatisticInfo *stat, int type)
if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
return true;
+ if ((type & MV_CLAUSE_TYPE_NDIST) && stat->ndist_built)
+ return true;
+
return false;
}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d46aed2..bd2c306 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -416,7 +416,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built)
+ if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built || mvstat->ndist_built)
{
info = makeNode(MVStatisticInfo);
@@ -427,11 +427,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
info->deps_enabled = mvstat->deps_enabled;
info->mcv_enabled = mvstat->mcv_enabled;
info->hist_enabled = mvstat->hist_enabled;
+ info->ndist_enabled = mvstat->ndist_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
info->mcv_built = mvstat->mcv_built;
info->hist_built = mvstat->hist_built;
+ info->ndist_built = mvstat->ndist_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 7d0a3a1..a84dd2b 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -132,6 +132,7 @@
#include "utils/fmgroids.h"
#include "utils/index_selfuncs.h"
#include "utils/lsyscache.h"
+#include "utils/mvstats.h"
#include "utils/nabstime.h"
#include "utils/pg_locale.h"
#include "utils/rel.h"
@@ -206,6 +207,7 @@ static Const *string_to_const(const char *str, Oid datatype);
static Const *string_to_bytea_const(const char *str, size_t str_len);
static List *add_predicate_to_quals(IndexOptInfo *index, List *indexQuals);
+static Oid find_ndistinct_coeff(PlannerInfo *root, RelOptInfo *rel, List *varinfos);
/*
* eqsel - Selectivity of "=" for any data types.
@@ -3422,12 +3424,26 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
* don't know by how much. We should never clamp to less than the
* largest ndistinct value for any of the Vars, though, since
* there will surely be at least that many groups.
+ *
+ * However we don't need to do this if we have ndistinct stats on
+ * the columns - in that case we can simply use the coefficient
+ * to get the (probably way more accurate) estimate.
+ *
+ * XXX Probably needs refactoring (don't like mixing the clamp
+ * and the coeff logic like this).
*/
double clamp = rel->tuples;
+ double coeff = 1.0;
if (relvarcount > 1)
{
- clamp *= 0.1;
+ Oid oid = find_ndistinct_coeff(root, rel, varinfos);
+
+ if (oid != InvalidOid)
+ coeff = load_mv_ndistinct(oid);
+ else
+ clamp *= 0.1;
+
if (clamp < relmaxndistinct)
{
clamp = relmaxndistinct;
@@ -3436,6 +3452,13 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
clamp = rel->tuples;
}
}
+
+ /*
+ * Apply the ndistinct coefficient from the multivariate stats (we
+ * must do this before clamping the estimate in any way).
+ */
+ reldistinct /= coeff;
+
if (reldistinct > clamp)
reldistinct = clamp;
@@ -7582,3 +7605,71 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
/* XXX what about pages_per_range? */
}
+
+/*
+ * Find applicable ndistinct statistics and compute the coefficient to
+ * correct the estimate (simply a product of per-column ndistincts).
+ *
+ * Currently we only look for a perfect match, i.e. a single ndistinct
+ * statistics covering exactly the grouped columns.
+ */
+static Oid
+find_ndistinct_coeff(PlannerInfo *root, RelOptInfo *rel, List *varinfos)
+{
+ ListCell *lc;
+ Bitmapset *attnums = NULL;
+ VariableStatData vardata;
+
+ foreach(lc, varinfos)
+ {
+ GroupVarInfo *varinfo = (GroupVarInfo *) lfirst(lc);
+
+ if (varinfo->rel != rel)
+ continue;
+
+ /* FIXME handle general expressions, not only simple Vars */
+
+ /*
+ * examine the variable (or expression) so that we know which
+ * attribute we're dealing with - we need this for matching the
+ * ndistinct coefficient
+ *
+ * FIXME we could probably remember this from estimate_num_groups
+ */
+ examine_variable(root, varinfo->var, 0, &vardata);
+
+ if (HeapTupleIsValid(vardata.statsTuple))
+ {
+ Form_pg_statistic stats
+ = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
+
+ attnums = bms_add_member(attnums, stats->staattnum);
+
+ ReleaseVariableStats(vardata);
+ }
+ }
+
+ /* look for a matching ndistinct statistics */
+ foreach (lc, rel->mvstatlist)
+ {
+ int i;
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ /* skip statistics without ndistinct coefficient built */
+ if (!info->ndist_built)
+ continue;
+
+ /* only exact matches for now (same set of columns) */
+ if (bms_num_members(attnums) != info->stakeys->dim1)
+ continue;
+
+ /* check that all the columns match */
+ for (i = 0; i < info->stakeys->dim1; i++)
+ if (! bms_is_member(info->stakeys->values[i], attnums))
+ break;
+
+ /* some column is not covered by this statistics, try the next one */
+ if (i < info->stakeys->dim1)
+ continue;
+
+ return info->mvoid;
+ }
+
+ return InvalidOid;
+}
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 9dbb3b6..d4b88e9 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o histogram.o mcv.o
+OBJS = common.o dependencies.o histogram.o mcv.o mvdist.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.ndistinct b/src/backend/utils/mvstats/README.ndistinct
new file mode 100644
index 0000000..32d1624
--- /dev/null
+++ b/src/backend/utils/mvstats/README.ndistinct
@@ -0,0 +1,83 @@
+ndistinct coefficients
+======================
+
+Estimating the number of distinct groups in a combination of columns is tricky,
+and the estimation error is often significant. By ndistinct coefficient we
+mean a ratio
+
+ q = ndistinct(a) * ndistinct(b) / ndistinct(a,b)
+
+where 'a' and 'b' are columns, ndistinct(a) is (an estimate of) the number of
+distinct values in column 'a', and ndistinct(a,b) is the same thing for the
+pair of columns.
+
+The meaning of the coefficient may be illustrated by answering the following
+question: Given a combination of columns (a,b), how many distinct values of 'b'
+match a chosen value of 'a' on average?
+
+Let's assume we know ndistinct(a) and ndistinct(a,b). Then the answer to the
+question clearly is
+
+ ndistinct(a,b) / ndistinct(a)
+
+and by using 'q' we may rewrite this as
+
+ ndistinct(b) / q
+
+so 'q' may be considered as a correction factor of the ndistinct estimate given
+a condition on one of the columns.
+
+This may be generalized to a combination of 'n' columns
+
+ [ndistinct(c1) * ... * ndistinct(cn)] / ndistinct(c1, ..., cn)
+
+and the meaning is very similar, except that we need to use conditions on (n-1)
+of the columns.
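+
+For example, with made-up numbers: if ndistinct(a) = 100, ndistinct(b) = 100
+and ndistinct(a,b) = 1000, then
+
+ q = (100 * 100) / 1000 = 10
+
+i.e. on average each value of 'a' co-occurs with ndistinct(b) / q = 10
+distinct values of 'b', not with all 100 of them.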
+
+
+Selectivity estimation
+----------------------
+
+As explained in the previous section, ndistinct coefficients may be used to
+estimate the cardinality of a column, given some a priori knowledge. Let's assume
+we need to estimate selectivity of a condition
+
+ (a=1) AND (b=2)
+
+which we can expand like this
+
+ P(a=1 & b=2) = P(a=1) * P(b=2 | a=1)
+
+Let's also assume that the distributions of the columns are uniform, i.e. that
+
+ P(a=1) = 1/ndistinct(a)
+ P(b=2) = 1/ndistinct(b)
+ P(a=1 & b=2) = 1/ndistinct(a,b)
+
+ P(b=2 | a=1) = ndistinct(a) / ndistinct(a,b)
+
+which may be rewritten like
+
+ P(b=2 | a=1)
+ = ndistinct(a) / ndistinct(a,b)
+ = (1/ndistinct(b)) * [(ndistinct(a) * ndistinct(b)) / ndistinct(a,b)]
+ = (1/ndistinct(b)) * q
+
+and therefore
+
+ P(a=1 & b=2) = (1/ndistinct(a)) * (1/ndistinct(b)) * q
+
+This also illustrates 'q' as a correction coefficient.
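+
+With the made-up numbers used before (ndistinct(a) = ndistinct(b) = 100,
+q = 10), this gives
+
+ P(a=1 & b=2) = (1/100) * (1/100) * 10 = 0.001
+
+ten times more than the 0.0001 produced by the independence assumption.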
+
+It also explains why we store the coefficient and not simply ndistinct(a,b).
+This way we can estimate the individual clauses as usual and then correct
+the estimate by multiplying the result with 'q' - we don't have to mess with
+the ndistinct estimates at all.
+
+Naturally, as the coefficient is derived from ndistinct(a,b), it may also be
+used to estimate GROUP BY clauses on the combination of columns, replacing the
+existing heuristics in estimate_num_groups().
+
+Note: Currently only the GROUP BY estimation is implemented. It's a bit unclear
+how to implement the clause estimation when there are other statistics (esp.
+MCV lists and/or functional dependencies) available.
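+
+A usage example (on a hypothetical table 't', mirroring the CREATE STATISTICS
+syntax used in README.stats):
+
+ CREATE STATISTICS s3 ON t (a,b) WITH (ndistinct);
+ ANALYZE t;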
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index d404914..6d4b09b 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -20,6 +20,8 @@ Currently we only have two kinds of multivariate statistics
(c) multivariate histograms (README.histogram)
+ (d) ndistinct coefficients
+
Compatible clause types
-----------------------
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index ffb76f4..2be980d 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -32,7 +32,8 @@ static List* list_mv_stats(Oid relid);
* and serializes them back into the catalog (as bytea values).
*/
void
-build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+build_mv_stats(Relation onerel, double totalrows,
+ int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats)
{
ListCell *lc;
@@ -53,6 +54,7 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
MVHistogram histogram = NULL;
+ double ndist = -1;
int numrows_filtered = numrows;
VacAttrStats **stats = NULL;
@@ -92,6 +94,9 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (stat->ndist_enabled)
+ ndist = build_mv_ndistinct(totalrows, numrows, rows, attrs, stats);
+
/* build the MCV list */
if (stat->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
@@ -101,7 +106,7 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, mcvlist, histogram, attrs, stats);
+ update_mv_stats(stat->mvoid, deps, mcvlist, histogram, ndist, attrs, stats);
}
}
@@ -183,6 +188,8 @@ list_mv_stats(Oid relid)
info->mcv_built = stats->mcv_built;
info->hist_enabled = stats->hist_enabled;
info->hist_built = stats->hist_built;
+ info->ndist_enabled = stats->ndist_enabled;
+ info->ndist_built = stats->ndist_built;
result = lappend(result, info);
}
@@ -252,7 +259,7 @@ find_mv_attnums(Oid mvoid, Oid *relid)
void
update_mv_stats(Oid mvoid,
MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
- int2vector *attrs, VacAttrStats **stats)
+ double ndistcoeff, int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -292,26 +299,36 @@ update_mv_stats(Oid mvoid,
= PointerGetDatum(data);
}
+ if (ndistcoeff > 1.0)
+ {
+ nulls[Anum_pg_mv_statistic_standist -1] = false;
+ values[Anum_pg_mv_statistic_standist-1] = Float8GetDatum(ndistcoeff);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
replaces[Anum_pg_mv_statistic_stahist-1] = true;
+ replaces[Anum_pg_mv_statistic_standist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
nulls[Anum_pg_mv_statistic_hist_built-1] = false;
+ nulls[Anum_pg_mv_statistic_ndist_built-1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_ndist_built-1] = true;
replaces[Anum_pg_mv_statistic_hist_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
+ values[Anum_pg_mv_statistic_ndist_built-1] = BoolGetDatum(ndistcoeff > 1.0);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
diff --git a/src/backend/utils/mvstats/mvdist.c b/src/backend/utils/mvstats/mvdist.c
new file mode 100644
index 0000000..59b8358
--- /dev/null
+++ b/src/backend/utils/mvstats/mvdist.c
@@ -0,0 +1,171 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvdist.c
+ * POSTGRES multivariate distinct coefficients
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mvdist.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include <math.h>
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+static double estimate_ndistinct(double totalrows, int numrows, int d, int f1);
+
+/*
+ * Compute ndistinct coefficient for the combination of attributes. This
+ * computes the ndistinct estimate using the same estimator used in analyze.c
+ * and then computes the coefficient.
+ */
+double
+build_mv_ndistinct(double totalrows, int numrows, HeapTuple *rows,
+ int2vector *attrs, VacAttrStats **stats)
+{
+ int i, j;
+ int f1, cnt, d;
+ int nmultiple = 0, summultiple = 0;
+ int numattrs = attrs->dim1;
+ MultiSortSupport mss = multi_sort_init(numattrs);
+ double ndistcoeff;
+
+ /*
+ * It's possible to sort the sample rows directly, but this seemed
+ * somewhat simpler / less error prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * numattrs);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ Assert(numattrs >= 2);
+
+ for (i = 0; i < numattrs; i++)
+ {
+ /* prepare the sort function for this dimension */
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* accumulate all the data into the array and sort it */
+ for (j = 0; j < numrows; j++)
+ {
+ items[j].values[i]
+ = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+ }
+ }
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /* count number of distinct combinations */
+
+ f1 = 0;
+ cnt = 1;
+ d = 1;
+ for (i = 1; i < numrows; i++)
+ {
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ {
+ if (cnt == 1)
+ f1 += 1;
+ else
+ {
+ nmultiple += 1;
+ summultiple += cnt;
+ }
+
+ d++;
+ cnt = 0;
+ }
+
+ cnt += 1;
+ }
+
+ if (cnt == 1)
+ f1 += 1;
+ else
+ {
+ nmultiple += 1;
+ summultiple += cnt;
+ }
+
+ ndistcoeff = 1 / estimate_ndistinct(totalrows, numrows, d, f1);
+
+ /*
+ * now count distinct values for each attribute and incrementally
+ * compute ndistinct(a,b) / (ndistinct(a) * ndistinct(b))
+ *
+ * FIXME Probably need to handle cases when one of the ndistinct
+ * estimates is negative, and also check that the combined
+ * ndistinct is greater than any of those partial values.
+ */
+ for (i = 0; i < numattrs; i++)
+ ndistcoeff *= stats[i]->stadistinct;
+
+ return ndistcoeff;
+}
+
+double
+load_mv_ndistinct(Oid mvoid)
+{
+ bool isnull = false;
+ Datum ndist;
+
+ /* Fetch the pg_mv_statistic tuple for the given statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->ndist_enabled && mvstat->ndist_built);
+#endif
+
+ ndist = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_standist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return DatumGetFloat8(ndist);
+}
+
+/* The Duj1 estimator (already used in analyze.c). */
+static double
+estimate_ndistinct(double totalrows, int numrows, int d, int f1)
+{
+ double numer,
+ denom,
+ ndistinct;
+
+ numer = (double) numrows *(double) d;
+
+ denom = (double) (numrows - f1) +
+ (double) f1 * (double) numrows / totalrows;
+
+ ndistinct = numer / denom;
+
+ /* Clamp to sane range in case of roundoff error */
+ if (ndistinct < (double) d)
+ ndistinct = (double) d;
+
+ if (ndistinct > totalrows)
+ ndistinct = totalrows;
+
+ return floor(ndistinct + 0.5);
+}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index a5945af..ee353da 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -39,6 +39,7 @@ CATALOG(pg_mv_statistic,3381)
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
bool hist_enabled; /* build histogram? */
+ bool ndist_enabled; /* build ndist coefficient? */
/* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
@@ -48,6 +49,7 @@ CATALOG(pg_mv_statistic,3381)
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
bool hist_built; /* histogram was built */
+ bool ndist_built; /* ndistinct coeff built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -56,6 +58,7 @@ CATALOG(pg_mv_statistic,3381)
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
bytea stahist; /* MV histogram (serialized) */
+ float8 standcoeff; /* ndistinct coefficient */
#endif
} FormData_pg_mv_statistic;
@@ -71,21 +74,24 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 15
+#define Natts_pg_mv_statistic 18
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
#define Anum_pg_mv_statistic_deps_enabled 4
#define Anum_pg_mv_statistic_mcv_enabled 5
#define Anum_pg_mv_statistic_hist_enabled 6
-#define Anum_pg_mv_statistic_mcv_max_items 7
-#define Anum_pg_mv_statistic_hist_max_buckets 8
-#define Anum_pg_mv_statistic_deps_built 9
-#define Anum_pg_mv_statistic_mcv_built 10
-#define Anum_pg_mv_statistic_hist_built 11
-#define Anum_pg_mv_statistic_stakeys 12
-#define Anum_pg_mv_statistic_stadeps 13
-#define Anum_pg_mv_statistic_stamcv 14
-#define Anum_pg_mv_statistic_stahist 15
+#define Anum_pg_mv_statistic_ndist_enabled 7
+#define Anum_pg_mv_statistic_mcv_max_items 8
+#define Anum_pg_mv_statistic_hist_max_buckets 9
+#define Anum_pg_mv_statistic_deps_built 10
+#define Anum_pg_mv_statistic_mcv_built 11
+#define Anum_pg_mv_statistic_hist_built 12
+#define Anum_pg_mv_statistic_ndist_built 13
+#define Anum_pg_mv_statistic_stakeys 14
+#define Anum_pg_mv_statistic_stadeps 15
+#define Anum_pg_mv_statistic_stamcv 16
+#define Anum_pg_mv_statistic_stahist 17
+#define Anum_pg_mv_statistic_standist 18
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 46bece6..a2fafd2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -621,11 +621,13 @@ typedef struct MVStatisticInfo
bool deps_enabled; /* functional dependencies enabled */
bool mcv_enabled; /* MCV list enabled */
bool hist_enabled; /* histogram enabled */
+ bool ndist_enabled; /* ndistinct coefficient enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
bool mcv_built; /* MCV list built */
bool hist_built; /* histogram built */
+ bool ndist_built; /* ndistinct coefficient built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 35b2f8e..fb2c5d8 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -225,6 +225,7 @@ typedef MVSerializedHistogramData *MVSerializedHistogram;
MVDependencies load_mv_dependencies(Oid mvoid);
MCVList load_mv_mcvlist(Oid mvoid);
MVSerializedHistogram load_mv_histogram(Oid mvoid);
+double load_mv_ndistinct(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
@@ -266,11 +267,17 @@ MVHistogram
build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int numrows_total);
-void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+double
+build_mv_ndistinct(double totalrows, int numrows, HeapTuple *rows,
+ int2vector *attrs, VacAttrStats **stats);
+
+void build_mv_stats(Relation onerel, double totalrows,
+ int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
void update_mv_stats(Oid relid, MVDependencies dependencies,
MCVList mcvlist, MVHistogram histogram,
+ double ndistcoeff,
int2vector *attrs, VacAttrStats **stats);
#ifdef DEBUG_MVHIST
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 1a1a4ca..0ad935e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1377,7 +1377,8 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
length(s.stamcv) AS mcvbytes,
pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo,
length(s.stahist) AS histbytes,
- pg_mv_stats_histogram_info(s.stahist) AS histinfo
+ pg_mv_stats_histogram_info(s.stahist) AS histinfo,
+ s.standcoeff AS ndcoeff
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
--
2.1.0
Attachment: 0008-change-how-we-apply-selectivity-to-number-of-groups-.patch (text/x-patch)
From 16df0859ba9478af4d93fc8fe45f17b4f255e1a8 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Tue, 26 Jan 2016 18:14:33 +0100
Subject: [PATCH 8/9] change how we apply selectivity to number of groups
estimate
Instead of simply multiplying the ndistinct estimate by the selectivity,
use the formula for the expected number of distinct values observed in
'k' rows when there are 'd' distinct values in the bin

    d * (1 - ((d - 1) / d)^k)

This samples 'with replacement', which seems appropriate here, and it
mostly assumes uniform distribution of the distinct values. So if the
distribution is not uniform (e.g. there are very frequent groups) this
may be less accurate than the current algorithm in some cases, giving
over-estimates. But that's probably better than OOM.
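
For illustration, with made-up numbers: for d = 100 distinct values in
the bin, observing k = 10 rows is expected to yield

    100 * (1 - (99/100)^10) ~= 9.6

distinct values, while k = 1000 rows already yields ~100, i.e. the
estimate approaches 'd' as the number of observed rows grows.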
---
src/backend/utils/adt/selfuncs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index a84dd2b..ce3ad19 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3465,7 +3465,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
/*
* Multiply by restriction selectivity.
*/
- reldistinct *= rel->rows / rel->tuples;
+ reldistinct = reldistinct * (1 - powl((reldistinct - 1) / reldistinct, rel->rows));
/*
* Update estimate of total distinct groups.
--
2.1.0
Attachment: 0009-fixup-of-regression-tests-plans-changes-by-group-by-.patch (text/x-patch)
From 60ab2e6675b5d43f5cebccb7fd06c7e7387992f3 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Sun, 28 Feb 2016 21:16:40 +0100
Subject: [PATCH 9/9] fixup of regression tests (plans changes by group by
estimation)
---
src/test/regress/expected/join.out | 20 ++++++++++----------
src/test/regress/expected/subselect.out | 25 +++++++++++--------------
src/test/regress/expected/union.out | 16 ++++++++--------
3 files changed, 29 insertions(+), 32 deletions(-)
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 59d7877..d9dd5ca 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3951,17 +3951,17 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
on d.a = s.id;
QUERY PLAN
---------------------------------------
- Merge Left Join
- Merge Cond: (d.a = s.id)
- -> Sort
- Sort Key: d.a
- -> Seq Scan on d
+ Merge Right Join
+ Merge Cond: (s.id = d.a)
-> Sort
Sort Key: s.id
-> Subquery Scan on s
-> HashAggregate
Group Key: b.id
-> Seq Scan on b
+ -> Sort
+ Sort Key: d.a
+ -> Seq Scan on d
(11 rows)
-- similarly, but keying off a DISTINCT clause
@@ -3970,17 +3970,17 @@ select d.* from d left join (select distinct * from b) s
on d.a = s.id;
QUERY PLAN
---------------------------------------------
- Merge Left Join
- Merge Cond: (d.a = s.id)
- -> Sort
- Sort Key: d.a
- -> Seq Scan on d
+ Merge Right Join
+ Merge Cond: (s.id = d.a)
-> Sort
Sort Key: s.id
-> Subquery Scan on s
-> HashAggregate
Group Key: b.id, b.c_id
-> Seq Scan on b
+ -> Sort
+ Sort Key: d.a
+ -> Seq Scan on d
(11 rows)
-- check join removal works when uniqueness of the join condition is enforced
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index de64ca7..0fc93d9 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -807,27 +807,24 @@ select * from int4_tbl where
explain (verbose, costs off)
select * from int4_tbl o where (f1, f1) in
(select f1, generate_series(1,2) / 10 g from int4_tbl i group by f1);
- QUERY PLAN
-----------------------------------------------------------------------
- Hash Join
+ QUERY PLAN
+----------------------------------------------------------------
+ Hash Semi Join
Output: o.f1
Hash Cond: (o.f1 = "ANY_subquery".f1)
-> Seq Scan on public.int4_tbl o
Output: o.f1
-> Hash
Output: "ANY_subquery".f1, "ANY_subquery".g
- -> HashAggregate
+ -> Subquery Scan on "ANY_subquery"
Output: "ANY_subquery".f1, "ANY_subquery".g
- Group Key: "ANY_subquery".f1, "ANY_subquery".g
- -> Subquery Scan on "ANY_subquery"
- Output: "ANY_subquery".f1, "ANY_subquery".g
- Filter: ("ANY_subquery".f1 = "ANY_subquery".g)
- -> HashAggregate
- Output: i.f1, (generate_series(1, 2) / 10)
- Group Key: i.f1
- -> Seq Scan on public.int4_tbl i
- Output: i.f1
-(18 rows)
+ Filter: ("ANY_subquery".f1 = "ANY_subquery".g)
+ -> HashAggregate
+ Output: i.f1, (generate_series(1, 2) / 10)
+ Group Key: i.f1
+ -> Seq Scan on public.int4_tbl i
+ Output: i.f1
+(15 rows)
select * from int4_tbl o where (f1, f1) in
(select f1, generate_series(1,2) / 10 g from int4_tbl i group by f1);
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 016571b..f2e297e 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -263,16 +263,16 @@ ORDER BY 1;
SELECT q2 FROM int8_tbl INTERSECT SELECT q1 FROM int8_tbl;
q2
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
SELECT q2 FROM int8_tbl INTERSECT ALL SELECT q1 FROM int8_tbl;
q2
------------------
+ 123
4567890123456789
4567890123456789
- 123
(3 rows)
SELECT q2 FROM int8_tbl EXCEPT SELECT q1 FROM int8_tbl ORDER BY 1;
@@ -305,16 +305,16 @@ SELECT q1 FROM int8_tbl EXCEPT SELECT q2 FROM int8_tbl;
SELECT q1 FROM int8_tbl EXCEPT ALL SELECT q2 FROM int8_tbl;
q1
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
SELECT q1 FROM int8_tbl EXCEPT ALL SELECT DISTINCT q2 FROM int8_tbl;
q1
------------------
+ 123
4567890123456789
4567890123456789
- 123
(3 rows)
SELECT q1 FROM int8_tbl EXCEPT ALL SELECT q1 FROM int8_tbl FOR NO KEY UPDATE;
@@ -343,8 +343,8 @@ SELECT f1 FROM float8_tbl EXCEPT SELECT f1 FROM int4_tbl ORDER BY 1;
SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl;
q1
-------------------
- 4567890123456789
123
+ 4567890123456789
456
4567890123456789
123
@@ -355,15 +355,15 @@ SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FR
SELECT q1 FROM int8_tbl INTERSECT (((SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl)));
q1
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
(((SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl))) UNION ALL SELECT q2 FROM int8_tbl;
q1
-------------------
- 4567890123456789
123
+ 4567890123456789
456
4567890123456789
123
@@ -419,8 +419,8 @@ HINT: There is a column named "q2" in table "*SELECT* 2", but it cannot be refe
SELECT q1 FROM int8_tbl EXCEPT (((SELECT q2 FROM int8_tbl ORDER BY q2 LIMIT 1)));
q1
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
--
--
2.1.0
On Tue, Mar 8, 2016 at 12:13 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Hi,
attached is v11 of the patch - this is mostly a cleanup of v10, removing
redundant code, adding missing comments, removing obsolete FIXME/TODOs
and so on. Overall this shaves ~20kB from the patch (not a primary
objective, though).
This has some conflicts with the pathification commit, in the
regression tests.
To avoid that, I applied it to the commit before that, 3fc6e2d7f5b652b417fa6^
Having done that, in my hands it fails its own regression tests.
Diff attached.
It breaks contrib postgres_fdw; I'll look into that when I get a
chance, if no one beats me to it.
postgres_fdw.c: In function 'postgresGetForeignJoinPaths':
postgres_fdw.c:3623: error: too few arguments to function
'clauselist_selectivity'
postgres_fdw.c:3642: error: too few arguments to function
'clauselist_selectivity'
Cheers,
Jeff
Attachments:
regression.diffs (application/octet-stream)
*** /home/jjanes/pgsql/git/src/test/regress/expected/mv_dependencies.out 2016-03-08 18:08:45.275328461 -0800
--- /home/jjanes/pgsql/git/src/test/regress/results/mv_dependencies.out 2016-03-08 18:17:34.914707058 -0800
***************
*** 21,26 ****
--- 21,28 ----
ERROR: unrecognized STATISTICS option "unknown_option"
-- correct command
CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (dependencies);
+ ERROR: duplicate key value violates unique constraint "pg_mv_statistic_name_index"
+ DETAIL: Key (staname, stanamespace)=(s1, 2200) already exists.
-- random data (no functional dependencies)
INSERT INTO functional_dependencies
SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
***************
*** 29,36 ****
FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
deps_enabled | deps_built | pg_mv_stats_dependencies_show
--------------+------------+-------------------------------
! t | f |
! (1 row)
TRUNCATE functional_dependencies;
-- a => b, a => c, b => c
--- 31,37 ----
FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
deps_enabled | deps_built | pg_mv_stats_dependencies_show
--------------+------------+-------------------------------
! (0 rows)
TRUNCATE functional_dependencies;
-- a => b, a => c, b => c
***************
*** 41,48 ****
FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
deps_enabled | deps_built | pg_mv_stats_dependencies_show
--------------+------------+-------------------------------
! t | t | 1 => 2, 1 => 3, 2 => 3
! (1 row)
TRUNCATE functional_dependencies;
-- a => b, a => c
--- 42,48 ----
FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
deps_enabled | deps_built | pg_mv_stats_dependencies_show
--------------+------------+-------------------------------
! (0 rows)
TRUNCATE functional_dependencies;
-- a => b, a => c
***************
*** 53,60 ****
FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
deps_enabled | deps_built | pg_mv_stats_dependencies_show
--------------+------------+-------------------------------
! t | t | 1 => 2, 1 => 3
! (1 row)
TRUNCATE functional_dependencies;
-- check explain (expect bitmap index scan, not plain index scan)
--- 53,59 ----
FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
deps_enabled | deps_built | pg_mv_stats_dependencies_show
--------------+------------+-------------------------------
! (0 rows)
TRUNCATE functional_dependencies;
-- check explain (expect bitmap index scan, not plain index scan)
***************
*** 66,83 ****
FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
deps_enabled | deps_built | pg_mv_stats_dependencies_show
--------------+------------+-------------------------------
! t | t | 1 => 2, 1 => 3, 2 => 3
! (1 row)
EXPLAIN (COSTS off)
SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
! QUERY PLAN
! ---------------------------------------------
! Bitmap Heap Scan on functional_dependencies
! Recheck Cond: ((a = 10) AND (b = 5))
! -> Bitmap Index Scan on fdeps_idx
! Index Cond: ((a = 10) AND (b = 5))
! (4 rows)
DROP TABLE functional_dependencies;
-- varlena type (text)
--- 65,79 ----
FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
deps_enabled | deps_built | pg_mv_stats_dependencies_show
--------------+------------+-------------------------------
! (0 rows)
EXPLAIN (COSTS off)
SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
! QUERY PLAN
! -------------------------------------------------------
! Index Scan using fdeps_idx on functional_dependencies
! Index Cond: ((a = 10) AND (b = 5))
! (2 rows)
DROP TABLE functional_dependencies;
-- varlena type (text)
***************
*** 103,172 ****
INSERT INTO functional_dependencies
SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
ANALYZE functional_dependencies;
! SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
! FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
! deps_enabled | deps_built | pg_mv_stats_dependencies_show
! --------------+------------+-------------------------------
! t | t | 1 => 2, 1 => 3, 2 => 3
! (1 row)
!
! TRUNCATE functional_dependencies;
! -- a => b, a => c
! INSERT INTO functional_dependencies
! SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
! ANALYZE functional_dependencies;
! SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
! FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
! deps_enabled | deps_built | pg_mv_stats_dependencies_show
! --------------+------------+-------------------------------
! t | t | 1 => 2, 1 => 3
! (1 row)
!
! TRUNCATE functional_dependencies;
! -- check explain (expect bitmap index scan, not plain index scan)
! INSERT INTO functional_dependencies
! SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
! CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
! ANALYZE functional_dependencies;
! SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
! FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
! deps_enabled | deps_built | pg_mv_stats_dependencies_show
! --------------+------------+-------------------------------
! t | t | 1 => 2, 1 => 3, 2 => 3
! (1 row)
!
! EXPLAIN (COSTS off)
! SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
! QUERY PLAN
! ------------------------------------------------------------
! Bitmap Heap Scan on functional_dependencies
! Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
! -> Bitmap Index Scan on fdeps_idx
! Index Cond: ((a = '10'::text) AND (b = '5'::text))
! (4 rows)
!
! DROP TABLE functional_dependencies;
! -- NULL values (mix of int and text columns)
! CREATE TABLE functional_dependencies (
! a INT,
! b TEXT,
! c INT,
! d TEXT
! );
! CREATE STATISTICS s3 ON functional_dependencies (a, b, c, d) WITH (dependencies);
! INSERT INTO functional_dependencies
! SELECT
! mod(i, 100),
! (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
! mod(i, 400),
! (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
! FROM generate_series(1,10000) s(i);
! ANALYZE functional_dependencies;
! SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
! FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
! deps_enabled | deps_built | pg_mv_stats_dependencies_show
! --------------+------------+----------------------------------------
! t | t | 2 => 1, 3 => 1, 3 => 2, 4 => 1, 4 => 2
! (1 row)
!
! DROP TABLE functional_dependencies;
--- 99,108 ----
INSERT INTO functional_dependencies
SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
ANALYZE functional_dependencies;
! WARNING: terminating connection because of crash of another server process
! DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
! HINT: In a moment you should be able to reconnect to the database and repeat your command.
! server closed the connection unexpectedly
! This probably means the server terminated abnormally
! before or while processing the request.
! connection to server was lost
======================================================================
*** /home/jjanes/pgsql/git/src/test/regress/expected/mv_mcv.out 2016-03-08 18:08:45.299328161 -0800
--- /home/jjanes/pgsql/git/src/test/regress/results/mv_mcv.out 2016-03-08 18:17:34.643710446 -0800
***************
*** 80,207 ****
EXPLAIN (COSTS off)
SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
! QUERY PLAN
! --------------------------------------------
! Bitmap Heap Scan on mcv_list
! Recheck Cond: ((a = 10) AND (b = 5))
! -> Bitmap Index Scan on mcv_idx
! Index Cond: ((a = 10) AND (b = 5))
! (4 rows)
!
! DROP TABLE mcv_list;
! -- varlena type (text)
! CREATE TABLE mcv_list (
! a TEXT,
! b TEXT,
! c TEXT
! );
! CREATE STATISTICS s2 ON mcv_list (a, b, c) WITH (mcv);
! -- random data
! INSERT INTO mcv_list
! SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
! ANALYZE mcv_list;
! SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
! FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
! mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
! -------------+-----------+--------------------------
! t | f |
! (1 row)
!
! TRUNCATE mcv_list;
! -- a => b, a => c, b => c
! INSERT INTO mcv_list
! SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
! ANALYZE mcv_list;
! SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
! FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
! mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
! -------------+-----------+--------------------------
! t | t | nitems=1000
! (1 row)
!
! TRUNCATE mcv_list;
! -- a => b, a => c
! INSERT INTO mcv_list
! SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
! ANALYZE mcv_list;
! SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
! FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
! mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
! -------------+-----------+--------------------------
! t | t | nitems=1000
! (1 row)
!
! TRUNCATE mcv_list;
! -- check explain (expect bitmap index scan, not plain index scan)
! INSERT INTO mcv_list
! SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
! CREATE INDEX mcv_idx ON mcv_list (a, b);
! ANALYZE mcv_list;
! SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
! FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
! mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
! -------------+-----------+--------------------------
! t | t | nitems=100
! (1 row)
!
! EXPLAIN (COSTS off)
! SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
! QUERY PLAN
! ------------------------------------------------------------
! Bitmap Heap Scan on mcv_list
! Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
! -> Bitmap Index Scan on mcv_idx
! Index Cond: ((a = '10'::text) AND (b = '5'::text))
! (4 rows)
!
! TRUNCATE mcv_list;
! -- check explain (expect bitmap index scan, not plain index scan) with NULLs
! INSERT INTO mcv_list
! SELECT
! (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
! (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
! (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
! FROM generate_series(1,1000000) s(i);
! ANALYZE mcv_list;
! SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
! FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
! mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
! -------------+-----------+--------------------------
! t | t | nitems=100
! (1 row)
!
! EXPLAIN (COSTS off)
! SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
! QUERY PLAN
! ---------------------------------------------------
! Bitmap Heap Scan on mcv_list
! Recheck Cond: ((a IS NULL) AND (b IS NULL))
! -> Bitmap Index Scan on mcv_idx
! Index Cond: ((a IS NULL) AND (b IS NULL))
! (4 rows)
!
! DROP TABLE mcv_list;
! -- NULL values (mix of int and text columns)
! CREATE TABLE mcv_list (
! a INT,
! b TEXT,
! c INT,
! d TEXT
! );
! CREATE STATISTICS s3 ON mcv_list (a, b, c, d) WITH (mcv);
! INSERT INTO mcv_list
! SELECT
! mod(i, 100),
! (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
! mod(i, 400),
! (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
! FROM generate_series(1,10000) s(i);
! ANALYZE mcv_list;
! SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
! FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
! mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
! -------------+-----------+--------------------------
! t | t | nitems=1200
! (1 row)
!
! DROP TABLE mcv_list;
--- 80,86 ----
EXPLAIN (COSTS off)
SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
! server closed the connection unexpectedly
! This probably means the server terminated abnormally
! before or while processing the request.
! connection to server was lost
======================================================================
*** /home/jjanes/pgsql/git/src/test/regress/expected/mv_histogram.out 2016-03-08 18:08:45.373327236 -0800
--- /home/jjanes/pgsql/git/src/test/regress/results/mv_histogram.out 2016-03-08 18:17:34.920706983 -0800
***************
*** 30,35 ****
--- 30,37 ----
ERROR: maximum number of buckets is 16384
-- correct command
CREATE STATISTICS s1 ON mv_histogram (a, b, c) WITH (histogram);
+ ERROR: duplicate key value violates unique constraint "pg_mv_statistic_name_index"
+ DETAIL: Key (staname, stanamespace)=(s1, 2200) already exists.
-- random data (no functional dependencies)
INSERT INTO mv_histogram
SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
***************
*** 38,45 ****
FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
hist_enabled | hist_built
--------------+------------
! t | t
! (1 row)
TRUNCATE mv_histogram;
-- a => b, a => c, b => c
--- 40,46 ----
FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
hist_enabled | hist_built
--------------+------------
! (0 rows)
TRUNCATE mv_histogram;
-- a => b, a => c, b => c
***************
*** 50,57 ****
FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
hist_enabled | hist_built
--------------+------------
! t | t
! (1 row)
TRUNCATE mv_histogram;
-- a => b, a => c
--- 51,57 ----
FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
hist_enabled | hist_built
--------------+------------
! (0 rows)
TRUNCATE mv_histogram;
-- a => b, a => c
***************
*** 62,69 ****
FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
hist_enabled | hist_built
--------------+------------
! t | t
! (1 row)
TRUNCATE mv_histogram;
-- check explain (expect bitmap index scan, not plain index scan)
--- 62,68 ----
FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
hist_enabled | hist_built
--------------+------------
! (0 rows)
TRUNCATE mv_histogram;
-- check explain (expect bitmap index scan, not plain index scan)
***************
*** 75,92 ****
FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
hist_enabled | hist_built
--------------+------------
! t | t
! (1 row)
EXPLAIN (COSTS off)
SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
! QUERY PLAN
! --------------------------------------------
! Bitmap Heap Scan on mv_histogram
! Recheck Cond: ((a = 10) AND (b = 5))
! -> Bitmap Index Scan on hist_idx
! Index Cond: ((a = 10) AND (b = 5))
! (4 rows)
DROP TABLE mv_histogram;
-- varlena type (text)
--- 74,88 ----
FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
hist_enabled | hist_built
--------------+------------
! (0 rows)
EXPLAIN (COSTS off)
SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
! QUERY PLAN
! -------------------------------------------
! Index Scan using hist_idx on mv_histogram
! Index Cond: ((a = 10) AND (b = 5))
! (2 rows)
DROP TABLE mv_histogram;
-- varlena type (text)
***************
*** 96,101 ****
--- 92,99 ----
c TEXT
);
CREATE STATISTICS s2 ON mv_histogram (a, b, c) WITH (histogram);
+ ERROR: duplicate key value violates unique constraint "pg_mv_statistic_name_index"
+ DETAIL: Key (staname, stanamespace)=(s2, 2200) already exists.
-- random data (no functional dependencies)
INSERT INTO mv_histogram
SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
***************
*** 104,111 ****
FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
hist_enabled | hist_built
--------------+------------
! t | t
! (1 row)
TRUNCATE mv_histogram;
-- a => b, a => c, b => c
--- 102,108 ----
FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
hist_enabled | hist_built
--------------+------------
! (0 rows)
TRUNCATE mv_histogram;
-- a => b, a => c, b => c
***************
*** 116,123 ****
FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
hist_enabled | hist_built
--------------+------------
! t | t
! (1 row)
TRUNCATE mv_histogram;
-- a => b, a => c
--- 113,119 ----
FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
hist_enabled | hist_built
--------------+------------
! (0 rows)
TRUNCATE mv_histogram;
-- a => b, a => c
***************
*** 128,207 ****
FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
hist_enabled | hist_built
--------------+------------
! t | t
! (1 row)
TRUNCATE mv_histogram;
-- check explain (expect bitmap index scan, not plain index scan)
INSERT INTO mv_histogram
SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
! CREATE INDEX hist_idx ON mv_histogram (a, b);
! ANALYZE mv_histogram;
! SELECT hist_enabled, hist_built
! FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
! hist_enabled | hist_built
! --------------+------------
! t | t
! (1 row)
!
! EXPLAIN (COSTS off)
! SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
! QUERY PLAN
! ------------------------------------------------------------
! Bitmap Heap Scan on mv_histogram
! Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
! -> Bitmap Index Scan on hist_idx
! Index Cond: ((a = '10'::text) AND (b = '5'::text))
! (4 rows)
!
! TRUNCATE mv_histogram;
! -- check explain (expect bitmap index scan, not plain index scan) with NULLs
! INSERT INTO mv_histogram
! SELECT
! (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
! (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
! (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
! FROM generate_series(1,1000000) s(i);
! ANALYZE mv_histogram;
! SELECT hist_enabled, hist_built
! FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
! hist_enabled | hist_built
! --------------+------------
! t | t
! (1 row)
!
! EXPLAIN (COSTS off)
! SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
! QUERY PLAN
! ---------------------------------------------------
! Bitmap Heap Scan on mv_histogram
! Recheck Cond: ((a IS NULL) AND (b IS NULL))
! -> Bitmap Index Scan on hist_idx
! Index Cond: ((a IS NULL) AND (b IS NULL))
! (4 rows)
!
! DROP TABLE mv_histogram;
! -- NULL values (mix of int and text columns)
! CREATE TABLE mv_histogram (
! a INT,
! b TEXT,
! c INT,
! d TEXT
! );
! CREATE STATISTICS s3 ON mv_histogram (a, b, c, d) WITH (histogram);
! INSERT INTO mv_histogram
! SELECT
! mod(i, 100),
! (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
! mod(i, 400),
! (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
! FROM generate_series(1,10000) s(i);
! ANALYZE mv_histogram;
! SELECT hist_enabled, hist_built
! FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
! hist_enabled | hist_built
! --------------+------------
! t | t
! (1 row)
!
! DROP TABLE mv_histogram;
--- 124,139 ----
FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
hist_enabled | hist_built
--------------+------------
! (0 rows)
TRUNCATE mv_histogram;
-- check explain (expect bitmap index scan, not plain index scan)
INSERT INTO mv_histogram
SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
! WARNING: terminating connection because of crash of another server process
! DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
! HINT: In a moment you should be able to reconnect to the database and repeat your command.
! server closed the connection unexpectedly
! This probably means the server terminated abnormally
! before or while processing the request.
! connection to server was lost
======================================================================
Hi,
thanks for looking at the patch. Sorry for the issues; attached is
version v13, which should fix them (or most of them).
On Tue, 2016-03-08 at 18:24 -0800, Jeff Janes wrote:
On Tue, Mar 8, 2016 at 12:13 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Hi,
attached is v11 of the patch - this is mostly a cleanup of v10, removing
redundant code, adding missing comments, removing obsolete FIXME/TODOs
and so on. Overall this shaves ~20kB from the patch (not a primary
objective, though).
This has some conflicts with the pathification commit, in the
regression tests.
Yeah, there was one join plan difference, due to the ndistinct
estimation patch. Meh. Fixed.
To avoid that, I applied it to the commit before that, 3fc6e2d7f5b652b417fa6^
Rebased to 51c0f63e.
Having done that, in my hands it fails its own regression tests.
Diff attached.
Fixed. This was caused by making names of the statistics unique across
tables, thus the regression tests started to fail when executed through
'make check' (but 'make installcheck' was still fine).
The diff however also includes a segfault, apparently in processing of
functional dependencies somewhere in ANALYZE. Sadly I've been unable to
reproduce any such failure, despite running the tests many times (even
when applied on the same commit). Is there any chance this might be due
to a broken build, or something like that? If not, can you try
reproducing it and investigating a bit (enable core dumps etc.)?
It breaks contrib postgres_fdw; I'll look into that when I get a
chance, if no one beats me to it.
postgres_fdw.c: In function 'postgresGetForeignJoinPaths':
postgres_fdw.c:3623: error: too few arguments to function
'clauselist_selectivity'
postgres_fdw.c:3642: error: too few arguments to function
'clauselist_selectivity'
Yeah, apparently there are two new calls to clauselist_selectivity, so I
had to add NIL as the list of conditions.
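For the archives, the shape of that fix is roughly the following (a
hypothetical sketch, not the actual postgres_fdw hunk - the variable
names are illustrative, and I'm assuming the new List argument is
simply appended last):

    - sel = clauselist_selectivity(root, clauses, 0, jointype, sjinfo);
    + sel = clauselist_selectivity(root, clauses, 0, jointype, sjinfo, NIL);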
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-teach-pull_-varno-varattno-_walker-about-RestrictInf.patch (text/x-patch)
From 5c28e5ca8feb2c2010d98bc69de952355bd6f3a5 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Tue, 28 Apr 2015 19:56:33 +0200
Subject: [PATCH 1/9] teach pull_(varno|varattno)_walker about RestrictInfo
otherwise pull_varnos fails when processing OR clauses
---
src/backend/optimizer/util/var.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index dff52c4..80d01bd 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -197,6 +197,13 @@ pull_varnos_walker(Node *node, pull_varnos_context *context)
context->sublevels_up--;
return result;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo*)node;
+ context->varnos = bms_add_members(context->varnos,
+ rinfo->clause_relids);
+ return false;
+ }
return expression_tree_walker(node, pull_varnos_walker,
(void *) context);
}
@@ -245,6 +252,15 @@ pull_varattnos_walker(Node *node, pull_varattnos_context *context)
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *)node;
+
+ return expression_tree_walker((Node*)rinfo->clause,
+ pull_varattnos_walker,
+ (void*) context);
+ }
+
/* Should not find an unplanned subquery */
Assert(!IsA(node, Query));
--
2.1.0
0002-shared-infrastructure-and-functional-dependencies.patch (text/x-patch)
From 414281cc51fe5a548b334531a1bfa8562375c681 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 19:51:48 +0100
Subject: [PATCH 2/9] shared infrastructure and functional dependencies
Basic infrastructure shared by all kinds of multivariate stats, most
importantly:
- adds a new system catalog (pg_mv_statistic)
- CREATE STATISTICS name ON table (columns) WITH (options)
- DROP STATISTICS name
- implementation of functional dependencies (the simplest type of
multivariate statistics)
- building functional dependencies in ANALYZE
- updates regression tests (new catalog etc.)
This does not include any changes to the optimizer, i.e. it does not
influence the query planning (subject to follow-up patches).
The current implementation requires a valid 'ltopr' for the columns, so
that we can sort the sample rows in various ways, both in this patch
and other kinds of statistics. Maybe this restriction could be relaxed
in the future, requiring just 'eqopr' in case of stats not sorting the
data (e.g. functional dependencies and MCV lists).
Maybe some of the stats (functional dependencies and MCV list with
limited functionality) might be made to work with hashes of the values,
which is sufficient for equality comparisons. But the queries would
require the equality operator anyway, so it's not really a weaker
requirement. The hashes might reduce space requirements, though.
The algorithm detecting the dependencies is rather simple and probably
needs improvements, so that it detects more complicated dependencies;
the math also needs validation.
The name 'functional dependencies' is more correct (than 'association
rules') as it's exactly the name used in relational theory (esp. Normal
Forms) for tracking column-level dependencies.
The multivariate statistics are automatically removed in two situations
(a) after a DROP TABLE (obviously)
(b) after ALTER TABLE ... DROP COLUMN, if the statistics would be
defined on fewer than 2 remaining columns
If there are at least two remaining columns, we keep the
statistics but perform cleanup on the next ANALYZE. The dropped columns
are removed from stakeys, and the new statistics is built on the
smaller set.
We can't do this at DROP COLUMN, because that'd leave us with invalid
statistics, or we'd have to throw it away although we can still use it.
This lazy approach lets us use the statistics although some of the
columns are dead.
This also adds a simple list of statistics to \d in psql.
Statistics are schema objects, created within a schema by using a
qualified name (or using the default schema)
CREATE STATISTICS schema.statistics ON ...
and then dropped by specifying qualified name
DROP STATISTICS schema.statistics
or searching through search_path (just like with other objects).
This also gets rid of the "(opt_)stats_name" definitions in gram.y,
replacing them with just "opt_any_name", although the optional
case is not really handled currently - there's no generated name yet
(so either we should drop it or implement it).
I'm not entirely sure making statistics schema-specific is such a great
idea. Maybe it should be "global", but that does not seem right (e.g.
it makes multi-tenant systems based on schemas more difficult to
manage, because tenants would interact).
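For illustration, a typical life cycle of such statistics, using the
syntax introduced by this patch (table, schema and statistics names
are made up):

    CREATE STATISTICS myschema.stats1 ON mytable (a, b) WITH (dependencies);
    ANALYZE mytable;                  -- builds the functional dependencies
    DROP STATISTICS myschema.stats1;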
---
doc/src/sgml/ref/allfiles.sgml | 2 +
doc/src/sgml/ref/create_statistics.sgml | 174 ++++++++++
doc/src/sgml/ref/drop_statistics.sgml | 90 ++++++
doc/src/sgml/reference.sgml | 2 +
src/backend/catalog/Makefile | 1 +
src/backend/catalog/dependency.c | 11 +-
src/backend/catalog/heap.c | 102 ++++++
src/backend/catalog/namespace.c | 51 +++
src/backend/catalog/objectaddress.c | 22 ++
src/backend/catalog/system_views.sql | 11 +
src/backend/commands/Makefile | 6 +-
src/backend/commands/analyze.c | 21 ++
src/backend/commands/dropcmds.c | 4 +
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/statscmds.c | 331 +++++++++++++++++++
src/backend/commands/tablecmds.c | 8 +-
src/backend/nodes/copyfuncs.c | 16 +
src/backend/nodes/outfuncs.c | 18 ++
src/backend/optimizer/util/plancat.c | 63 ++++
src/backend/parser/gram.y | 34 +-
src/backend/tcop/utility.c | 11 +
src/backend/utils/Makefile | 2 +-
src/backend/utils/cache/relcache.c | 59 ++++
src/backend/utils/cache/syscache.c | 23 ++
src/backend/utils/mvstats/Makefile | 17 +
src/backend/utils/mvstats/README.dependencies | 222 +++++++++++++
src/backend/utils/mvstats/common.c | 356 +++++++++++++++++++++
src/backend/utils/mvstats/common.h | 75 +++++
src/backend/utils/mvstats/dependencies.c | 437 ++++++++++++++++++++++++++
src/bin/psql/describe.c | 44 +++
src/include/catalog/dependency.h | 5 +-
src/include/catalog/heap.h | 1 +
src/include/catalog/indexing.h | 7 +
src/include/catalog/namespace.h | 2 +
src/include/catalog/pg_mv_statistic.h | 73 +++++
src/include/catalog/pg_proc.h | 5 +
src/include/catalog/toasting.h | 1 +
src/include/commands/defrem.h | 4 +
src/include/nodes/nodes.h | 2 +
src/include/nodes/parsenodes.h | 12 +
src/include/nodes/relation.h | 28 ++
src/include/utils/mvstats.h | 70 +++++
src/include/utils/rel.h | 4 +
src/include/utils/relcache.h | 1 +
src/include/utils/syscache.h | 2 +
src/test/regress/expected/rules.out | 9 +
src/test/regress/expected/sanity_check.out | 1 +
47 files changed, 2432 insertions(+), 11 deletions(-)
create mode 100644 doc/src/sgml/ref/create_statistics.sgml
create mode 100644 doc/src/sgml/ref/drop_statistics.sgml
create mode 100644 src/backend/commands/statscmds.c
create mode 100644 src/backend/utils/mvstats/Makefile
create mode 100644 src/backend/utils/mvstats/README.dependencies
create mode 100644 src/backend/utils/mvstats/common.c
create mode 100644 src/backend/utils/mvstats/common.h
create mode 100644 src/backend/utils/mvstats/dependencies.c
create mode 100644 src/include/catalog/pg_mv_statistic.h
create mode 100644 src/include/utils/mvstats.h
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index bf95453..c0f7653 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -76,6 +76,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createSchema SYSTEM "create_schema.sgml">
<!ENTITY createSequence SYSTEM "create_sequence.sgml">
<!ENTITY createServer SYSTEM "create_server.sgml">
+<!ENTITY createStatistics SYSTEM "create_statistics.sgml">
<!ENTITY createTable SYSTEM "create_table.sgml">
<!ENTITY createTableAs SYSTEM "create_table_as.sgml">
<!ENTITY createTableSpace SYSTEM "create_tablespace.sgml">
@@ -119,6 +120,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropSchema SYSTEM "drop_schema.sgml">
<!ENTITY dropSequence SYSTEM "drop_sequence.sgml">
<!ENTITY dropServer SYSTEM "drop_server.sgml">
+<!ENTITY dropStatistics SYSTEM "drop_statistics.sgml">
<!ENTITY dropTable SYSTEM "drop_table.sgml">
<!ENTITY dropTableSpace SYSTEM "drop_tablespace.sgml">
<!ENTITY dropTransform SYSTEM "drop_transform.sgml">
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
new file mode 100644
index 0000000..a86eae3
--- /dev/null
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -0,0 +1,174 @@
+<!--
+doc/src/sgml/ref/create_statistics.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-CREATESTATISTICS">
+ <indexterm zone="sql-createstatistics">
+ <primary>CREATE STATISTICS</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>CREATE STATISTICS</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>CREATE STATISTICS</refname>
+ <refpurpose>define a new statistics</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_name</replaceable> ON <replaceable class="PARAMETER">table_name</replaceable> ( [
+ { <replaceable class="PARAMETER">column_name</replaceable> } ] [, ...])
+[ WITH ( <replaceable class="PARAMETER">statistics_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] )
+</synopsis>
+
+ </refsynopsisdiv>
+
+ <refsect1 id="SQL-CREATESTATISTICS-description">
+ <title>Description</title>
+
+ <para>
+ <command>CREATE STATISTICS</command> will create a new multivariate
+ statistics on the table. The statistics will be created in the
+ current database. The statistics will be owned by the user issuing
+ the command.
+ </para>
+
+ <para>
+ If a schema name is given (for example, <literal>CREATE STATISTICS
+ myschema.mystat ...</>) then the statistics is created in the specified
+ schema. Otherwise it is created in the current schema. The name of
+ the statistics must be distinct from the name of any other statistics
+ in the same schema.
+ </para>
+
+ <para>
+ To be able to create statistics, you must have <literal>USAGE</literal>
+ privilege on the types of all the columns included in the statistics.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>IF NOT EXISTS</></term>
+ <listitem>
+ <para>
+ Do not throw an error if a statistics with the same name already exists.
+ A notice is issued in this case. Note that there is no guarantee that
+ the existing statistics is anything like the one that would have been
+ created.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">statistics_name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the statistics to be created.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">table_name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the table the statistics should
+ be created on.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">column_name</replaceable></term>
+ <listitem>
+ <para>
+ The name of a column to be included in the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>WITH ( <replaceable class="PARAMETER">statistics_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] )</literal></term>
+ <listitem>
+ <para>
+ ...
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <refsect2 id="SQL-CREATESTATISTICS-parameters">
+ <title id="SQL-CREATESTATISTICS-parameters-title">Statistics Parameters</title>
+
+ <indexterm zone="sql-createstatistics-parameters">
+ <primary>statistics parameters</primary>
+ </indexterm>
+
+ <para>
+ The <literal>WITH</> clause can specify <firstterm>statistics parameters</>
+ for statistics. The currently available parameters are listed below.
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>dependencies</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables functional dependencies for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </refsect2>
+ </refsect1>
+
+ <refsect1 id="SQL-CREATESTATISTICS-notes">
+ <title>Notes</title>
+
+ <para>
+ ...
+ </para>
+
+ </refsect1>
+
+
+ <refsect1 id="SQL-CREATESTATISTICS-examples">
+ <title>Examples</title>
+
+ <para>
+ ...
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There's no <command>CREATE STATISTICS</command> command in the SQL standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-dropstatistics"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/ref/drop_statistics.sgml b/doc/src/sgml/ref/drop_statistics.sgml
new file mode 100644
index 0000000..4cc0b70
--- /dev/null
+++ b/doc/src/sgml/ref/drop_statistics.sgml
@@ -0,0 +1,90 @@
+<!--
+doc/src/sgml/ref/drop_statistics.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-DROPSTATISTICS">
+ <indexterm zone="sql-dropstatistics">
+ <primary>DROP STATISTICS</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>DROP STATISTICS</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>DROP STATISTICS</refname>
+ <refpurpose>remove a statistics</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+DROP STATISTICS [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>DROP STATISTICS</command> removes statistics from the database.
+ Only the statistics owner, the schema owner, or a superuser can drop
+ a statistics.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>IF EXISTS</literal></term>
+ <listitem>
+ <para>
+ Do not throw an error if the statistics does not exist. A notice is
+ issued in this case.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the statistics to drop.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </refsect1>
+
+ <refsect1>
+ <title>Examples</title>
+
+ <para>
+ ...
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There's no <command>DROP STATISTICS</command> command in the SQL standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createstatistics"></member>
+ </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 03020df..2b07b2d 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -104,6 +104,7 @@
&createSchema;
&createSequence;
&createServer;
+ &createStatistics;
&createTable;
&createTableAs;
&createTableSpace;
@@ -147,6 +148,7 @@
&dropSchema;
&dropSequence;
&dropServer;
+ &dropStatistics;
&dropTable;
&dropTableSpace;
&dropTSConfig;
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 25130ec..058b8a9 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -32,6 +32,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
+ pg_mv_statistic.h \
pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index c48e37b..8200454 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -40,6 +40,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -160,7 +161,8 @@ static const Oid object_classes[] = {
ExtensionRelationId, /* OCLASS_EXTENSION */
EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
PolicyRelationId, /* OCLASS_POLICY */
- TransformRelationId /* OCLASS_TRANSFORM */
+ TransformRelationId, /* OCLASS_TRANSFORM */
+ MvStatisticRelationId /* OCLASS_STATISTICS */
};
@@ -1272,6 +1274,10 @@ doDeletion(const ObjectAddress *object, int flags)
DropTransformById(object->objectId);
break;
+ case OCLASS_STATISTICS:
+ RemoveStatisticsById(object->objectId);
+ break;
+
default:
elog(ERROR, "unrecognized object class: %u",
object->classId);
@@ -2415,6 +2421,9 @@ getObjectClass(const ObjectAddress *object)
case TransformRelationId:
return OCLASS_TRANSFORM;
+
+ case MvStatisticRelationId:
+ return OCLASS_STATISTICS;
}
/* shouldn't get here */
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 6a4a9d9..e7d9aaa 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -47,6 +47,7 @@
#include "catalog/pg_constraint_fn.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_tablespace.h"
@@ -1613,7 +1614,10 @@ RemoveAttributeById(Oid relid, AttrNumber attnum)
heap_close(attr_rel, RowExclusiveLock);
if (attnum > 0)
+ {
RemoveStatistics(relid, attnum);
+ RemoveMVStatistics(relid, attnum);
+ }
relation_close(rel, NoLock);
}
@@ -1841,6 +1845,11 @@ heap_drop_with_catalog(Oid relid)
RemoveStatistics(relid, 0);
/*
+ * delete multi-variate statistics
+ */
+ RemoveMVStatistics(relid, 0);
+
+ /*
* delete attribute tuples
*/
DeleteAttributeTuples(relid);
@@ -2696,6 +2705,99 @@ RemoveStatistics(Oid relid, AttrNumber attnum)
/*
+ * RemoveMVStatistics --- remove entries in pg_mv_statistic for a rel
+ *
+ * If attnum is zero, remove all entries for rel; else remove only the one(s)
+ * for that column.
+ */
+void
+RemoveMVStatistics(Oid relid, AttrNumber attnum)
+{
+ Relation pgmvstatistic;
+ TupleDesc tupdesc = NULL;
+ SysScanDesc scan;
+ ScanKeyData key;
+ HeapTuple tuple;
+
+ /*
+ * When dropping a column, we'll drop statistics with a single
+ * remaining (undropped) column. To do that, we need the tuple
+ * descriptor.
+ *
+ * We already have the relation locked (as we're running ALTER
+ * TABLE ... DROP COLUMN), so we'll just get the descriptor here.
+ */
+ if (attnum != 0)
+ {
+ Relation rel = relation_open(relid, NoLock);
+
+ /* multivariate stats are supported on tables and matviews */
+ if (rel->rd_rel->relkind == RELKIND_RELATION ||
+ rel->rd_rel->relkind == RELKIND_MATVIEW)
+ tupdesc = RelationGetDescr(rel);
+
+ relation_close(rel, NoLock);
+ }
+
+ if (tupdesc == NULL)
+ return;
+
+ pgmvstatistic = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ ScanKeyInit(&key,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ scan = systable_beginscan(pgmvstatistic,
+ MvStatisticRelidIndexId,
+ true, NULL, 1, &key);
+
+ /* we must loop even when attnum != 0, in case of inherited stats */
+ while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+ {
+ bool delete = true;
+
+ if (attnum != 0)
+ {
+ Datum adatum;
+ bool isnull;
+ int i;
+ int ncolumns = 0;
+ ArrayType *arr;
+ int16 *attnums;
+
+ /* get the columns */
+ adatum = SysCacheGetAttr(MVSTATOID, tuple,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+ attnums = (int16*)ARR_DATA_PTR(arr);
+
+ for (i = 0; i < ARR_DIMS(arr)[0]; i++)
+ {
+ /* count the column unless it has been / is being dropped */
+ if ((! tupdesc->attrs[attnums[i]-1]->attisdropped) &&
+ (attnums[i] != attnum))
+ ncolumns += 1;
+ }
+
+ /* delete if there are less than two attributes */
+ delete = (ncolumns < 2);
+ }
+
+ if (delete)
+ simple_heap_delete(pgmvstatistic, &tuple->t_self);
+ }
+
+ systable_endscan(scan);
+
+ heap_close(pgmvstatistic, RowExclusiveLock);
+}
+
+
+/*
* RelationTruncateIndexes - truncate all indexes associated
* with the heap relation to zero tuples.
*
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 446b2ac..dfd5bef 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -4201,3 +4201,54 @@ pg_is_other_temp_schema(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(isOtherTempNamespace(oid));
}
+
+Oid
+get_statistics_oid(List *names, bool missing_ok)
+{
+ char *schemaname;
+ char *stats_name;
+ Oid namespaceId;
+ Oid stats_oid = InvalidOid;
+ ListCell *l;
+
+ /* deconstruct the name list */
+ DeconstructQualifiedName(names, &schemaname, &stats_name);
+
+ if (schemaname)
+ {
+ /* use exact schema given */
+ namespaceId = LookupExplicitNamespace(schemaname, missing_ok);
+ if (missing_ok && !OidIsValid(namespaceId))
+ stats_oid = InvalidOid;
+ else
+ stats_oid = GetSysCacheOid2(MVSTATNAMENSP,
+ PointerGetDatum(stats_name),
+ ObjectIdGetDatum(namespaceId));
+ }
+ else
+ {
+ /* search for it in search path */
+ recomputeNamespacePath();
+
+ foreach(l, activeSearchPath)
+ {
+ namespaceId = lfirst_oid(l);
+
+ if (namespaceId == myTempNamespace)
+ continue; /* do not look in temp namespace */
+ stats_oid = GetSysCacheOid2(MVSTATNAMENSP,
+ PointerGetDatum(stats_name),
+ ObjectIdGetDatum(namespaceId));
+ if (OidIsValid(stats_oid))
+ break;
+ }
+ }
+
+ if (!OidIsValid(stats_oid) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("statistics \"%s\" does not exist",
+ NameListToString(names))));
+
+ return stats_oid;
+}
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index d2aaa6d..3a6a0b0 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -39,6 +39,7 @@
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
#include "catalog/pg_largeobject_metadata.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_opfamily.h"
@@ -438,9 +439,22 @@ static const ObjectPropertyType ObjectProperty[] =
Anum_pg_type_typacl,
ACL_KIND_TYPE,
true
+ },
+ {
+ MvStatisticRelationId,
+ MvStatisticOidIndexId,
+ MVSTATOID,
+ MVSTATNAMENSP,
+ Anum_pg_mv_statistic_staname,
+ Anum_pg_mv_statistic_stanamespace,
+ InvalidAttrNumber, /* XXX same owner as relation */
+ InvalidAttrNumber, /* no ACL (same as relation) */
+ -1, /* no ACL */
+ true
}
};
+
/*
* This struct maps the string object types as returned by
* getObjectTypeDescription into ObjType enum values. Note that some enum
@@ -913,6 +927,11 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
address = get_object_address_defacl(objname, objargs,
missing_ok);
break;
+ case OBJECT_STATISTICS:
+ address.classId = MvStatisticRelationId;
+ address.objectId = get_statistics_oid(objname, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, in case it thinks elog might return */
@@ -2185,6 +2204,9 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser")));
break;
+ case OBJECT_STATISTICS:
+ /* FIXME do the right owner checks here */
+ break;
default:
elog(ERROR, "unrecognized object type: %d",
(int) objtype);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index abf9a70..b8a264e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -158,6 +158,17 @@ CREATE VIEW pg_indexes AS
LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
+CREATE VIEW pg_mv_stats AS
+ SELECT
+ N.nspname AS schemaname,
+ C.relname AS tablename,
+ S.staname AS staname,
+ S.stakeys AS attnums,
+ length(S.stadeps) as depsbytes,
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
+ LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
+
CREATE VIEW pg_stats WITH (security_barrier) AS
SELECT
nspname AS schemaname,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index b1ac704..5151001 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -18,8 +18,8 @@ OBJS = aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
policy.o portalcmds.o prepare.o proclang.o \
- schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
- variable.o view.o
+ schemacmds.o seclabel.o sequence.o statscmds.o \
+ tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o \
+ user.o vacuum.o vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8a5f07c..8ac9915 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -27,6 +27,7 @@
#include "catalog/indexing.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "commands/dbcommands.h"
#include "commands/tablecmds.h"
@@ -55,7 +56,11 @@
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "utils/mvstats.h"
+#include "access/sysattr.h"
/* Per-index data for ANALYZE */
typedef struct AnlIndexData
@@ -460,6 +465,19 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
* all analyzable columns. We use a lower bound of 100 rows to avoid
* possible overflow in Vitter's algorithm. (Note: that will also be the
* target in the corner case where there are no analyzable columns.)
+ *
+ * FIXME This sample sizing is mostly OK when computing stats for
+ * individual columns, but for multivariate stats (histograms,
+ * MCV lists, ...) it's rather insufficient. For stats on
+ * multiple columns / complex stats we need larger sample
+ * sizes, because we need to build more detailed stats (more
+ * MCV items / histogram buckets) to get good accuracy. Maybe
+ * samples proportional to the table size (say, 0.5% - 1%)
+ * would be more appropriate than a fixed size. Also, this
+ * should be bound to the requested statistics size - e.g. the
+ * number of MCV items or histogram buckets should require
+ * several sample rows per item/bucket (so the sample should
+ * be k*size).
*/
targrows = 100;
for (i = 0; i < attr_cnt; i++)
@@ -562,6 +580,9 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
update_attstats(RelationGetRelid(Irel[ind]), false,
thisdata->attr_cnt, thisdata->vacattrstats);
}
+
+ /* Build multivariate stats (if there are any). */
+ build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
}
/*
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index 522027a..cd65b58 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -292,6 +292,10 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
msg = gettext_noop("schema \"%s\" does not exist, skipping");
name = NameListToString(objname);
break;
+ case OBJECT_STATISTICS:
+ msg = gettext_noop("statistics \"%s\" does not exist, skipping");
+ name = NameListToString(objname);
+ break;
case OBJECT_TSPARSER:
if (!schema_does_not_exist_skipping(objname, &msg, &name))
{
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 9e32f8d..09061bb 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -110,6 +110,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"SCHEMA", true},
{"SEQUENCE", true},
{"SERVER", true},
+ {"STATISTICS", true},
{"TABLE", true},
{"TABLESPACE", false},
{"TRANSFORM", true},
@@ -1106,6 +1107,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_RULE:
case OBJECT_SCHEMA:
case OBJECT_SEQUENCE:
+ case OBJECT_STATISTICS:
case OBJECT_TABCONSTRAINT:
case OBJECT_TABLE:
case OBJECT_TRANSFORM:
@@ -1167,6 +1169,7 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
case OCLASS_POLICY:
+ case OCLASS_STATISTICS:
return true;
}
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
new file mode 100644
index 0000000..84a8b13
--- /dev/null
+++ b/src/backend/commands/statscmds.c
@@ -0,0 +1,331 @@
+/*-------------------------------------------------------------------------
+ *
+ * statscmds.c
+ * Commands for creating and altering multivariate statistics
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/statscmds.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/multixact.h"
+#include "access/reloptions.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "catalog/catalog.h"
+#include "catalog/dependency.h"
+#include "catalog/heap.h"
+#include "catalog/index.h"
+#include "catalog/indexing.h"
+#include "catalog/namespace.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_constraint.h"
+#include "catalog/pg_depend.h"
+#include "catalog/pg_foreign_table.h"
+#include "catalog/pg_inherits.h"
+#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
+#include "catalog/pg_namespace.h"
+#include "catalog/pg_opclass.h"
+#include "catalog/pg_tablespace.h"
+#include "catalog/pg_trigger.h"
+#include "catalog/pg_type.h"
+#include "catalog/pg_type_fn.h"
+#include "catalog/storage.h"
+#include "catalog/toasting.h"
+#include "commands/cluster.h"
+#include "commands/comment.h"
+#include "commands/defrem.h"
+#include "commands/event_trigger.h"
+#include "commands/policy.h"
+#include "commands/sequence.h"
+#include "commands/tablecmds.h"
+#include "commands/tablespace.h"
+#include "commands/trigger.h"
+#include "commands/typecmds.h"
+#include "commands/user.h"
+#include "executor/executor.h"
+#include "foreign/foreign.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/parsenodes.h"
+#include "optimizer/clauses.h"
+#include "optimizer/planner.h"
+#include "parser/parse_clause.h"
+#include "parser/parse_coerce.h"
+#include "parser/parse_collate.h"
+#include "parser/parse_expr.h"
+#include "parser/parse_oper.h"
+#include "parser/parse_relation.h"
+#include "parser/parse_type.h"
+#include "parser/parse_utilcmd.h"
+#include "parser/parser.h"
+#include "pgstat.h"
+#include "rewrite/rewriteDefine.h"
+#include "rewrite/rewriteHandler.h"
+#include "rewrite/rewriteManip.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "storage/lock.h"
+#include "storage/predicate.h"
+#include "storage/smgr.h"
+#include "utils/acl.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/inval.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/relcache.h"
+#include "utils/ruleutils.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+#include "utils/typcache.h"
+#include "utils/mvstats.h"
+
+
+/* used for sorting the attnums in ExecCreateStatistics */
+static int compare_int16(const void *a, const void *b)
+{
+ return memcmp(a, b, sizeof(int16));
+}
+
+/*
+ * Implements CREATE STATISTICS name ON table (columns) WITH (options).
+ *
+ * TODO Check that the types support sort, although maybe we can live
+ * without it (and only build MCV list / association rules).
+ *
+ * TODO This should probably check for duplicate stats (i.e. same
+ * keys, same options). Although maybe it's useful to have
+ * multiple stats on the same columns with different options
+ * (say, detailed MCV-only stats for some queries, a histogram
+ * for others, etc.)
+ */
+ObjectAddress
+CreateStatistics(CreateStatsStmt *stmt)
+{
+ int i, j;
+ ListCell *l;
+ int16 attnums[INDEX_MAX_KEYS];
+ int numcols = 0;
+ ObjectAddress address = InvalidObjectAddress;
+ char *namestr;
+ NameData staname;
+ Oid statoid;
+ Oid namespaceId;
+
+ HeapTuple htup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ int2vector *stakeys;
+ Relation mvstatrel;
+ Relation rel;
+ ObjectAddress parentobject, childobject;
+
+ /* by default build nothing */
+ bool build_dependencies = false;
+
+ Assert(IsA(stmt, CreateStatsStmt));
+
+ /* resolve the pieces of the name (namespace etc.) */
+ namespaceId = QualifiedNameGetCreationNamespace(stmt->defnames, &namestr);
+ namestrcpy(&staname, namestr);
+
+ /*
+ * If if_not_exists was given and the statistics already exists, bail out.
+ */
+ if (stmt->if_not_exists &&
+ SearchSysCacheExists2(MVSTATNAMENSP,
+ PointerGetDatum(&staname),
+ ObjectIdGetDatum(namespaceId)))
+ {
+ ereport(NOTICE,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("statistics \"%s\" already exists, skipping",
+ namestr)));
+ return InvalidObjectAddress;
+ }
+
+ rel = heap_openrv(stmt->relation, AccessExclusiveLock);
+
+ /* transform the column names to attnum values */
+
+ foreach(l, stmt->keys)
+ {
+ char *attname = strVal(lfirst(l));
+ HeapTuple atttuple;
+
+ atttuple = SearchSysCacheAttName(RelationGetRelid(rel), attname);
+
+ if (!HeapTupleIsValid(atttuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" referenced in statistics does not exist",
+ attname)));
+
+ /* more than MVSTATS_MAX_DIMENSIONS columns not allowed */
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("cannot have more than %d keys in a statistics",
+ MVSTATS_MAX_DIMENSIONS)));
+
+ attnums[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->attnum;
+ ReleaseSysCache(atttuple);
+ numcols++;
+ }
+
+ /*
+ * Check the lower bound (at least 2 columns), the upper bound was
+ * already checked in the loop.
+ */
+ if (numcols < 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("multivariate stats require 2 or more columns")));
+
+ /* look for duplicate columns */
+ for (i = 0; i < numcols; i++)
+ for (j = 0; j < numcols; j++)
+ if ((i != j) && (attnums[i] == attnums[j]))
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_COLUMN),
+ errmsg("duplicate column name in statistics definition")));
+
+ /* parse the statistics options */
+ foreach (l, stmt->options)
+ {
+ DefElem *opt = (DefElem*)lfirst(l);
+
+ if (strcmp(opt->defname, "dependencies") == 0)
+ build_dependencies = defGetBoolean(opt);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized STATISTICS option \"%s\"",
+ opt->defname)));
+ }
+
+ /* check that at least some statistics were requested */
+ if (! build_dependencies)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies) was requested")));
+
+ /* sort the attnums and build int2vector */
+ qsort(attnums, numcols, sizeof(int16), compare_int16);
+ stakeys = buildint2vector(attnums, numcols);
+
+ /*
+ * Okay, let's create the pg_mv_statistic entry.
+ */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ /* no stats collected yet, so just the keys */
+ values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
+ values[Anum_pg_mv_statistic_staname -1] = NameGetDatum(&staname);
+ values[Anum_pg_mv_statistic_stanamespace -1] = ObjectIdGetDatum(namespaceId);
+
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+
+ values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+
+ nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* insert the tuple into pg_mv_statistic */
+ mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ htup = heap_form_tuple(mvstatrel->rd_att, values, nulls);
+
+ simple_heap_insert(mvstatrel, htup);
+
+ CatalogUpdateIndexes(mvstatrel, htup);
+
+ statoid = HeapTupleGetOid(htup);
+
+ heap_freetuple(htup);
+
+
+ /*
+ * Store a dependency too, so that statistics are dropped on DROP TABLE
+ */
+ parentobject.classId = RelationRelationId;
+ parentobject.objectId = RelationGetRelid(rel);
+ parentobject.objectSubId = 0;
+ childobject.classId = MvStatisticRelationId;
+ childobject.objectId = statoid;
+ childobject.objectSubId = 0;
+
+ recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
+
+ /*
+ * Also record dependency on the schema (to drop statistics on DROP SCHEMA)
+ */
+ parentobject.classId = NamespaceRelationId;
+ parentobject.objectId = namespaceId;
+ parentobject.objectSubId = 0;
+ childobject.classId = MvStatisticRelationId;
+ childobject.objectId = statoid;
+ childobject.objectSubId = 0;
+
+ recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
+
+
+ heap_close(mvstatrel, RowExclusiveLock);
+
+ /*
+ * Invalidate relcache so that others see the new statistics
+ * (do this before closing the relation).
+ */
+ CacheInvalidateRelcache(rel);
+
+ relation_close(rel, NoLock);
+
+ ObjectAddressSet(address, MvStatisticRelationId, statoid);
+
+ return address;
+}
+
+
+/*
+ * Guts of statistics deletion - removes the pg_mv_statistic row
+ * with the given OID. Called by DROP STATISTICS and when dropping
+ * objects the statistics depend on (table, schema).
+ */
+void
+RemoveStatisticsById(Oid statsOid)
+{
+ Relation relation;
+ HeapTuple tup;
+
+ /*
+ * Delete the pg_mv_statistic tuple.
+ */
+ relation = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ tup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(statsOid));
+ if (!HeapTupleIsValid(tup)) /* should not happen */
+ elog(ERROR, "cache lookup failed for statistics %u", statsOid);
+
+ simple_heap_delete(relation, &tup->t_self);
+
+ ReleaseSysCache(tup);
+
+ heap_close(relation, RowExclusiveLock);
+}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 96dc923..96ab02f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -37,6 +37,7 @@
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_tablespace.h"
@@ -95,7 +96,7 @@
#include "utils/syscache.h"
#include "utils/tqual.h"
#include "utils/typcache.h"
-
+#include "utils/mvstats.h"
/*
* ON COMMIT action list
@@ -143,8 +144,9 @@ static List *on_commits = NIL;
#define AT_PASS_ADD_COL 5 /* ADD COLUMN */
#define AT_PASS_ADD_INDEX 6 /* ADD indexes */
#define AT_PASS_ADD_CONSTR 7 /* ADD constraints, defaults */
-#define AT_PASS_MISC 8 /* other stuff */
-#define AT_NUM_PASSES 9
+#define AT_PASS_ADD_STATS 8 /* ADD statistics */
+#define AT_PASS_MISC 9 /* other stuff */
+#define AT_NUM_PASSES 10
typedef struct AlteredTableInfo
{
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index df7c2fa..fce46cb 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4124,6 +4124,19 @@ _copyAlterPolicyStmt(const AlterPolicyStmt *from)
return newnode;
}
+static CreateStatsStmt *
+_copyCreateStatsStmt(const CreateStatsStmt *from)
+{
+ CreateStatsStmt *newnode = makeNode(CreateStatsStmt);
+
+ COPY_NODE_FIELD(defnames);
+ COPY_NODE_FIELD(relation);
+ COPY_NODE_FIELD(keys);
+ COPY_NODE_FIELD(options);
+ COPY_SCALAR_FIELD(if_not_exists);
+
+ return newnode;
+}
+
/* ****************************************************************
* pg_list.h copy functions
* ****************************************************************
@@ -4999,6 +5012,9 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_CreateStatsStmt:
+ retval = _copyCreateStatsStmt(from);
+ break;
case T_FuncWithArgs:
retval = _copyFuncWithArgs(from);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index eb0fc1e..07206d7 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2153,6 +2153,21 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
}
static void
+_outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
+{
+ WRITE_NODE_TYPE("MVSTATISTICINFO");
+
+ /* NB: this isn't a complete set of fields */
+ WRITE_OID_FIELD(mvoid);
+
+ /* enabled statistics */
+ WRITE_BOOL_FIELD(deps_enabled);
+
+ /* built/available statistics */
+ WRITE_BOOL_FIELD(deps_built);
+}
+
+static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
/*
@@ -3636,6 +3651,9 @@ _outNode(StringInfo str, const void *obj)
case T_PlannerParamItem:
_outPlannerParamItem(str, obj);
break;
+ case T_MVStatisticInfo:
+ _outMVStatisticInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index ad715bb..31939dd 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -28,6 +28,7 @@
#include "catalog/dependency.h"
#include "catalog/heap.h"
#include "catalog/pg_am.h"
+#include "catalog/pg_mv_statistic.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -40,7 +41,9 @@
#include "parser/parsetree.h"
#include "rewrite/rewriteManip.h"
#include "storage/bufmgr.h"
+#include "utils/builtins.h"
#include "utils/lsyscache.h"
+#include "utils/syscache.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
@@ -94,6 +97,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
Relation relation;
bool hasindex;
List *indexinfos = NIL;
+ List *stainfos = NIL;
/*
* We need not lock the relation since it was already locked, either by
@@ -387,6 +391,65 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
rel->indexlist = indexinfos;
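+
+ /*
+ * XXX The always-true condition merely scopes the local variables;
+ * it may eventually become a real condition (e.g. skip the lookup
+ * when the table has no multivariate statistics at all).
+ */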
+ if (true)
+ {
+ List *mvstatoidlist;
+ ListCell *l;
+
+ mvstatoidlist = RelationGetMVStatList(relation);
+
+ foreach(l, mvstatoidlist)
+ {
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ Oid mvoid = lfirst_oid(l);
+ Form_pg_mv_statistic mvstat;
+ MVStatisticInfo *info;
+
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (! HeapTupleIsValid(htup))
+ continue;
+
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ /* unavailable stats are not interesting for the planner */
+ if (mvstat->deps_built)
+ {
+ info = makeNode(MVStatisticInfo);
+
+ info->mvoid = mvoid;
+ info->rel = rel;
+
+ /* enabled statistics */
+ info->deps_enabled = mvstat->deps_enabled;
+
+ /* built/available statistics */
+ info->deps_built = mvstat->deps_built;
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ info->stakeys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+
+ stainfos = lcons(info, stainfos);
+ }
+
+ ReleaseSysCache(htup);
+ }
+
+ list_free(mvstatoidlist);
+ }
+
+ rel->mvstatlist = stainfos;
+
/* Grab foreign-table info using the relcache, while we have it */
if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
{
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b307b48..3be3f02 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -241,7 +241,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
- CreateSchemaStmt CreateSeqStmt CreateStmt CreateTableSpaceStmt
+ CreateSchemaStmt CreateSeqStmt CreateStmt CreateStatsStmt CreateTableSpaceStmt
CreateFdwStmt CreateForeignServerStmt CreateForeignTableStmt
CreateAssertStmt CreateTransformStmt CreateTrigStmt CreateEventTrigStmt
CreateUserStmt CreateUserMappingStmt CreateRoleStmt CreatePolicyStmt
@@ -809,6 +809,7 @@ stmt :
| CreateSchemaStmt
| CreateSeqStmt
| CreateStmt
+ | CreateStatsStmt
| CreateTableSpaceStmt
| CreateTransformStmt
| CreateTrigStmt
@@ -3436,6 +3437,36 @@ OptConsTableSpace: USING INDEX TABLESPACE name { $$ = $4; }
ExistingIndex: USING INDEX index_name { $$ = $3; }
;
+/*****************************************************************************
+ *
+ * QUERY :
+ * CREATE STATISTICS stats_name ON relname (columns) WITH (options)
+ *
+ *****************************************************************************/
+
+
+CreateStatsStmt: CREATE STATISTICS any_name ON qualified_name '(' columnList ')' opt_reloptions
+ {
+ CreateStatsStmt *n = makeNode(CreateStatsStmt);
+ n->defnames = $3;
+ n->relation = $5;
+ n->keys = $7;
+ n->options = $9;
+ n->if_not_exists = false;
+ $$ = (Node *)n;
+ }
+ | CREATE STATISTICS IF_P NOT EXISTS any_name ON qualified_name '(' columnList ')' opt_reloptions
+ {
+ CreateStatsStmt *n = makeNode(CreateStatsStmt);
+ n->defnames = $6;
+ n->relation = $8;
+ n->keys = $10;
+ n->options = $12;
+ n->if_not_exists = true;
+ $$ = (Node *)n;
+ }
+ ;
+
/*****************************************************************************
*
@@ -5621,6 +5652,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH DICTIONARY { $$ = OBJECT_TSDICTIONARY; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
+ | STATISTICS { $$ = OBJECT_STATISTICS; }
;
any_name_list:
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 045f7f0..2ba88e2 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1520,6 +1520,10 @@ ProcessUtilitySlow(Node *parsetree,
address = ExecSecLabelStmt((SecLabelStmt *) parsetree);
break;
+ case T_CreateStatsStmt: /* CREATE STATISTICS */
+ address = CreateStatistics((CreateStatsStmt *) parsetree);
+ break;
+
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(parsetree));
@@ -2160,6 +2164,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_TRANSFORM:
tag = "DROP TRANSFORM";
break;
+ case OBJECT_STATISTICS:
+ tag = "DROP STATISTICS";
+ break;
default:
tag = "???";
}
@@ -2527,6 +2534,10 @@ CreateCommandTag(Node *parsetree)
tag = "EXECUTE";
break;
+ case T_CreateStatsStmt:
+ tag = "CREATE STATISTICS";
+ break;
+
case T_DeallocateStmt:
{
DeallocateStmt *stmt = (DeallocateStmt *) parsetree;
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..eba0352 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,7 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr mvstats resowner sort time
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 130c06d..3bc4c8a 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -47,6 +47,7 @@
#include "catalog/pg_auth_members.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_database.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_proc.h"
@@ -3956,6 +3957,62 @@ RelationGetIndexList(Relation relation)
return result;
}
+
+List *
+RelationGetMVStatList(Relation relation)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result;
+ List *oldlist;
+ MemoryContext oldcxt;
+
+ /* Quick exit if we already computed the list. */
+ if (relation->rd_mvstatvalid != 0)
+ return list_copy(relation->rd_mvstatlist);
+
+ /*
+ * We build the list we intend to return (in the caller's context) while
+ * doing the scan. After successfully completing the scan, we copy that
+ * list into the relcache entry. This avoids cache-context memory leakage
+ * if we get some sort of error partway through.
+ */
+ result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(RelationGetRelid(relation)));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ /* TODO maybe include only already built statistics? */
+ result = insert_ordered_oid(result, HeapTupleGetOid(htup));
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* Now save a copy of the completed list in the relcache entry. */
+ oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+ oldlist = relation->rd_mvstatlist;
+ relation->rd_mvstatlist = list_copy(result);
+
+ relation->rd_mvstatvalid = true;
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Don't leak the old list, if there is one */
+ list_free(oldlist);
+
+ return result;
+}
+
/*
* insert_ordered_oid
* Insert a new Oid into a sorted list of Oids, preserving ordering
@@ -4920,6 +4977,8 @@ load_relcache_init_file(bool shared)
rel->rd_indexattr = NULL;
rel->rd_keyattr = NULL;
rel->rd_idattr = NULL;
+ rel->rd_mvstatvalid = false;
+ rel->rd_mvstatlist = NIL;
rel->rd_createSubid = InvalidSubTransactionId;
rel->rd_newRelfilenodeSubid = InvalidSubTransactionId;
rel->rd_amcache = NULL;
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 65ffe84..3c1bc4b 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -44,6 +44,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_language.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -502,6 +503,28 @@ static const struct cachedesc cacheinfo[] = {
},
4
},
+ {MvStatisticRelationId, /* MVSTATNAMENSP */
+ MvStatisticNameIndexId,
+ 2,
+ {
+ Anum_pg_mv_statistic_staname,
+ Anum_pg_mv_statistic_stanamespace,
+ 0,
+ 0
+ },
+ 4
+ },
+ {MvStatisticRelationId, /* MVSTATOID */
+ MvStatisticOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 4
+ },
{NamespaceRelationId, /* NAMESPACENAME */
NamespaceNameIndexId,
1,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
new file mode 100644
index 0000000..099f1ed
--- /dev/null
+++ b/src/backend/utils/mvstats/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/mvstats
+#
+# IDENTIFICATION
+# src/backend/utils/mvstats/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/mvstats
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = common.o dependencies.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.dependencies b/src/backend/utils/mvstats/README.dependencies
new file mode 100644
index 0000000..1f96fbc
--- /dev/null
+++ b/src/backend/utils/mvstats/README.dependencies
@@ -0,0 +1,222 @@
+Soft functional dependencies
+============================
+
+A type of multivariate statistics used to capture cases when one column (or
+possibly a combination of columns) determines values in another column. We may
+also say that one column implies the other one.
+
+A simple artificial example may be a table with two columns, created like this
+
+ CREATE TABLE t (a INT, b INT)
+ AS SELECT i, i/10 FROM generate_series(1,100000) s(i);
+
+Clearly, once we know the value for column 'a' the value for 'b' is trivially
+determined, as it's simply (a/10). A more practical example may be addresses,
+where (ZIP code -> city name), i.e. once we know the ZIP, we probably know the
+city it belongs to, as ZIP codes are usually assigned to one city. Larger cities
+may have multiple ZIP codes, so the dependency can't be reversed.
+
+Functional dependencies are a concept well described in relational theory,
+particularly in definition of normalization and "normal forms". Wikipedia has a
+nice definition of a functional dependency [1]:
+
+ In a given table, an attribute Y is said to have a functional dependency on
+ a set of attributes X (written X -> Y) if and only if each X value is
+ associated with precisely one Y value. For example, in an "Employee" table
+ that includes the attributes "Employee ID" and "Employee Date of Birth", the
+ functional dependency {Employee ID} -> {Employee Date of Birth} would hold.
+ It follows from the previous two sentences that each {Employee ID} is
+ associated with precisely one {Employee Date of Birth}.
+
+ [1] http://en.wikipedia.org/wiki/Database_normalization
+
+Many datasets might be normalized not to contain such dependencies, but often
+it's not practical for various reasons. In some cases it's actually a conscious
+design choice to model the dataset in denormalized way, either because of
+performance or to make querying easier.
+
+The functional dependencies are called 'soft' because the implementation is
+meant to allow small number of rows contradicting the dependency. Many actual
+data sets contain some sort of errors, either because of data entry mistakes
+(user mistyping the ZIP code) or issues in generating the data (e.g. a ZIP code
+mistakenly assigned to two cities in different states). A strict implementation
+would ignore dependencies on such noisy data, rendering the approach unusable on
+such data sets.
+
+
+Mining dependencies (ANALYZE)
+-----------------------------
+
+The current build algorithm is rather simple - for each pair (a,b) of columns,
+the data are sorted lexicographically (first by 'a', then by 'b'). Then for each
+group (rows with the same 'a' value) we decide whether the group is neutral,
+supporting or contradicting the dependency (a->b).
+
+A group is considered neutral when it's too small - e.g. when there's a single
+row in the group, there can't possibly be multiple values in 'b'. For this
+reason we ignore groups smaller than a threshold (currently 3 rows).
+
+For sufficiently large groups (3 rows or more), we count the number of distinct
+values in 'b'. When there's a single 'b' value, the group is considered to
+support the dependency (a->b), otherwise it's considered to contradict it.
+
+At the end, we compare the number of rows in supporting and contradicting groups,
+and if there are at least 10x as many supporting rows, we consider the
+functional dependency to be valid.
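+
+For illustration, take a tiny artificial sample of (a,b) values, sorted by 'a':
+
+ (1,10) (1,10) (1,10) -- group a=1: 3 rows, single 'b' value => supporting
+ (2,20) (2,21) -- group a=2: two 'b' values => contradicting
+ (3,30) (3,30) -- group a=3: only 2 rows => neutral (too small)
+
+Here 3 rows support the dependency (a->b) and 2 rows contradict it, so the
+10x test fails and the dependency is not accepted.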
+
+
+A negative property of this approach is that the algorithm is a bit
+fragile with respect to the sample - there may be data sets producing quite
+different results for each ANALYZE execution (as even a single row may change
+the outcome of the final 10x test).
+
+It was proposed to make the dependencies "fuzzy" - e.g. track some coefficient
+between [0,1] determining how much the dependency holds. That would however mean
+we have to keep all the dependencies, as eliminating them based on the value of
+the coefficient (e.g. throw away dependencies <= 0.5) would result in exactly
+the same fragility issues. This would also make it more complicated to combine
+dependencies. So this does not seem like a practical approach.
+
+A better approach might be to replace the constants (min_group_size=3 and 10x)
+with values somehow related to the particular data set.
+
+
+Clause reduction (planner/optimizer)
+------------------------------------
+
+Applying the functional dependencies is quite simple - given a list of equality
+clauses, check which clauses are redundant (i.e. implied by some other clause).
+For example, given the clause list
+
+ (a = 1) AND (b = 2) AND (c = 3)
+
+and the dependency (a->b), the list of clauses may be simplified to
+
+ (a = 1) AND (c = 3)
+
+Functional dependencies may only be applied to equality clauses, all other types
+of clauses are ignored. See clauselist_apply_dependencies() for more details.
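+
+As a usage sketch (using the syntax added by this patch; the table and
+statistics names are made up):
+
+ CREATE TABLE t2 (a INT, b INT)
+ AS SELECT i/100, i/1000 FROM generate_series(1,10000) s(i);
+
+ CREATE STATISTICS deps_stats ON t2 (a, b) WITH (dependencies = true);
+ ANALYZE t2;
+
+ -- with the dependency (a->b) detected, the planner may reduce
+ -- (a = 50) AND (b = 5) to just (a = 50)
+ EXPLAIN SELECT * FROM t2 WHERE a = 50 AND b = 5;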
+
+
+Compatibility of clauses
+------------------------
+
+The reduction assumes the clauses really are redundant, and the value in the
+reduced clause (b=2) is the value determined by (a=1). If that's not the case
+and the values are "incompatible", the result will be an over-estimation.
+
+This may happen for example when using conditions on ZIP and city name with
+mismatching values (a ZIP code from a different city), etc. In such a case the result
+set will be empty, but we'll estimate the selectivity using the ZIP condition.
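+
+For example (a hypothetical 'addresses' table, with a mined dependency
+zip -> city):
+
+ -- returns no rows when the ZIP belongs to a different city, yet the
+ -- reduction estimates it as if only the ZIP clause were present
+ SELECT * FROM addresses WHERE zip = '99501' AND city = 'Boston';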
+
+In this case the default estimation, based on the AVIA principle (attribute
+value independence assumption), happens to work
+better, but mostly by chance.
+
+
+Dependencies vs. MCV/histogram
+------------------------------
+
+In some cases the "compatibility" of the conditions might be verified using the
+other types of multivariate stats - MCV lists and histograms.
+
+For MCV lists the verification might be very simple - peek into the list to see
+whether there are any items matching the clause on the 'a' column (e.g. ZIP
+code), and if such an item is found, check that the 'b' column matches the other
+clause. If it does not, the clauses are contradictory. We can't really conclude
+anything when no such item is found, except maybe restricting the selectivity
+using the MCV data (e.g. using min/max selectivity, or something).
+
+With histograms, it might work similarly - we can't check the values directly
+(because histograms use buckets, unlike MCV lists, which store the actual values).
+So we can only observe the buckets matching the clauses - if those buckets have
+very low frequency, it probably means the two clauses are incompatible.
+
+It's unclear what 'low frequency' is, but if one of the clauses is implied
+(automatically true because of the other clause), then
+
+ selectivity[clause(A)] = selectivity[clause(A) & clause(B)]
+
+So we might compute selectivity of the first clause - for example using regular
+statistics. And then check if the selectivity computed from the histogram is
+about the same (or significantly lower).
+
+The problem is that histograms work well only when the data ordering matches the
+natural meaning. For values that serve as labels - like city names or ZIP codes,
+or even generated IDs, histograms really don't work all that well. For example
+sorting cities by name won't match the sorting of ZIP codes, rendering the
+histogram unusable.
+
+So MCVs are probably going to work much better, because they don't really assume
+any sort of ordering. And they're probably more appropriate for label-like data.
+
+A good question however is why even use functional dependencies in such cases
+and not simply use the MCV/histogram instead. One reason is that the functional
+dependencies allow fallback to regular stats, and often produce more accurate
+estimates - especially compared to histograms, which are quite bad at estimating
+equality clauses.
+
+
+Limitations
+-----------
+
+Let's see the main limitations of functional dependencies, especially those
+related to the current implementation.
+
+The current implementation supports only dependencies between two columns, but
+this is merely a simplification of the initial implementation. It's certainly
+useful to mine for dependencies involving multiple columns on the 'left' side,
+i.e. the condition side of the dependency - that is, dependencies like (a,b -> c).
+
+The implementation may/should be smart enough not to mine redundant conditions,
+e.g. (a->b) and (a,c -> b), because the latter is a trivial consequence of the
+former one (if values of 'a' determine 'b', adding another column won't change
+that relationship). The ANALYZE should first analyze 1:1 dependencies, then 2:1
+dependencies (and skip the already identified ones), etc.
+
+For example the dependency
+
+ (city name -> zip code)
+
+is much stronger, i.e. whenever it holds, then
+
+ (city name, state name -> zip code)
+
+holds too. But in case there are cities with the same name in different states,
+then only the latter dependency will be valid.
+
+Of course, there probably are cities with the same name within a single state,
+but hopefully this is a relatively rare occurrence (and thus we'll still detect
+the 'soft' dependency).
+
+Handling multiple columns on the right side of the dependency is not necessary,
+as those dependencies may be simply decomposed into a set of dependencies with
+the same meaning, one for each column on the right side. For example
+
+ (a -> b,c)
+
+is exactly the same as
+
+ (a -> b) & (a -> c)
+
+Of course, storing the first form may be more efficient than storing multiple
+'simple' dependencies separately.
+
+
+TODO Support dependencies with multiple columns on left/right.
+
+TODO Investigate using histogram and MCV list to verify the dependencies.
+
+TODO Investigate statistical testing of the distribution (to decide whether it
+ makes sense to build the histogram/MCV list).
+
+TODO Using a min/max of selectivities would probably make more sense for the
+ associated columns.
+
+TODO Consider eliminating the implied columns from the histogram and MCV lists
+ (but maybe that's not a good idea, because that'd make it impossible to use
+ these stats for non-equality clauses and also it wouldn't be possible to
+ use the stats for verification of the dependencies).
+
+TODO The reduction probably might be extended to also handle IS NULL clauses,
+ assuming we fix the ANALYZE to properly handle NULL values. We however
+ won't be able to reduce IS NOT NULL (unless I'm missing something).
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
new file mode 100644
index 0000000..a755c49
--- /dev/null
+++ b/src/backend/utils/mvstats/common.c
@@ -0,0 +1,356 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.c
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+static List* list_mv_stats(Oid relid);
+
+
+/*
+ * Compute requested multivariate stats, using the rows sampled for the
+ * plain (single-column) stats.
+ *
+ * This fetches a list of stats from pg_mv_statistic, computes the stats
+ * and serializes them back into the catalog (as bytea values).
+ */
+void
+build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats)
+{
+ ListCell *lc;
+ List *mvstats;
+
+ TupleDesc tupdesc = RelationGetDescr(onerel);
+
+ /*
+ * Fetch defined MV groups from pg_mv_statistic, and then compute
+ * the MV statistics (functional dependencies for now).
+ */
+ mvstats = list_mv_stats(RelationGetRelid(onerel));
+
+ foreach (lc, mvstats)
+ {
+ int j;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
+ MVDependencies deps = NULL;
+
+ VacAttrStats **stats = NULL;
+ int numatts = 0;
+
+ /* int2 vector of attnums the stats should be computed on */
+ int2vector * attrs = stat->stakeys;
+
+ /* see how many of the columns are not dropped */
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ numatts += 1;
+
+ /* if there are dropped attributes, build a filtered int2vector */
+ if (numatts != attrs->dim1)
+ {
+ int16 *tmp = palloc0(numatts * sizeof(int16));
+ int attnum = 0;
+
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ tmp[attnum++] = attrs->values[j];
+
+ pfree(attrs);
+ attrs = buildint2vector(tmp, numatts);
+ }
+
+ /* filter only the interesting vacattrstats records */
+ stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /* check allowed number of dimensions */
+ Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Analyze functional dependencies of columns.
+ */
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
+
+ /* store the computed dependencies in the catalog */
+ update_mv_stats(stat->mvoid, deps, attrs);
+ }
+}
+
+/*
+ * Lookup the VacAttrStats info for the selected columns, with indexes
+ * matching the attrs vector (to make it easy to work with when
+ * computing multivariate stats).
+ */
+static VacAttrStats **
+lookup_var_attr_stats(int2vector *attrs, int natts, VacAttrStats **vacattrstats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ VacAttrStats **stats = (VacAttrStats**)palloc0(numattrs * sizeof(VacAttrStats*));
+
+ /* lookup VacAttrStats info for the requested columns (same attnum) */
+ for (i = 0; i < numattrs; i++)
+ {
+ stats[i] = NULL;
+ for (j = 0; j < natts; j++)
+ {
+ if (attrs->values[i] == vacattrstats[j]->tupattnum)
+ {
+ stats[i] = vacattrstats[j];
+ break;
+ }
+ }
+
+ /*
+ * Check that we found the info, that the attnum matches and
+ * that there's the requested 'lt' operator and that the type
+ * is 'passed-by-value'.
+ */
+ Assert(stats[i] != NULL);
+ Assert(stats[i]->tupattnum == attrs->values[i]);
+
+ /* FIXME This is a rather ugly way to check for 'ltopr' (which
+ * is defined for 'scalar' attributes).
+ */
+ Assert(((StdAnalyzeData *)stats[i]->extra_data)->ltopr != InvalidOid);
+ }
+
+ return stats;
+}
+
+/*
+ * Fetch list of MV stats defined on a table, without the actual data
+ * for histograms, MCV lists etc.
+ */
+static List*
+list_mv_stats(Oid relid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ Form_pg_mv_statistic stats = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ info->mvoid = HeapTupleGetOid(htup);
+ info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_built = stats->deps_built;
+
+ result = lappend(result, info);
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO Maybe save the list into the relcache, as RelationGetIndexList
+ * (which served as inspiration for this function) does. */
+
+ return result;
+}
+
+void
+update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+{
+ HeapTuple stup,
+ oldtup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ bool replaces[Natts_pg_mv_statistic];
+
+ Relation sd = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ memset(nulls, 1, Natts_pg_mv_statistic * sizeof(bool));
+ memset(replaces, 0, Natts_pg_mv_statistic * sizeof(bool));
+ memset(values, 0, Natts_pg_mv_statistic * sizeof(Datum));
+
+ /*
+ * Construct a new pg_mv_statistic tuple - replace only the dependencies
+ * value, depending on whether it actually was computed.
+ */
+ if (dependencies != NULL)
+ {
+ nulls[Anum_pg_mv_statistic_stadeps -1] = false;
+ values[Anum_pg_mv_statistic_stadeps - 1]
+ = PointerGetDatum(serialize_mv_dependencies(dependencies));
+ }
+
+ /* always replace the value (either by bytea or NULL) */
+ replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* always change the availability flags */
+ nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_stakeys-1] = false;
+
+ /* use the new attnums, in case we removed some dropped ones */
+ replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_stakeys -1] = true;
+
+ values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
+
+ /* Is there already a pg_mv_statistic tuple for these statistics? */
+ oldtup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ if (HeapTupleIsValid(oldtup))
+ {
+ /* Yes, replace it */
+ stup = heap_modify_tuple(oldtup,
+ RelationGetDescr(sd),
+ values,
+ nulls,
+ replaces);
+ ReleaseSysCache(oldtup);
+ simple_heap_update(sd, &stup->t_self, stup);
+ }
+ else
+ elog(ERROR, "invalid pg_mv_statistic record (oid=%d)", mvoid);
+
+ /* update indexes too */
+ CatalogUpdateIndexes(sd, stup);
+
+ heap_freetuple(stup);
+
+ heap_close(sd, RowExclusiveLock);
+}
+
+/* multi-variate stats comparator */
+
+/*
+ * qsort_arg comparator for sorting Datums (MV stats)
+ *
+ * This does not maintain the tupnoLink array.
+ */
+int
+compare_scalars_simple(const void *a, const void *b, void *arg)
+{
+ Datum da = *(Datum*)a;
+ Datum db = *(Datum*)b;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting data when partitioning a MV bucket
+ */
+int
+compare_scalars_partition(const void *a, const void *b, void *arg)
+{
+ Datum da = ((ScalarItem*)a)->value;
+ Datum db = ((ScalarItem*)b)->value;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/* initialize multi-dimensional sort */
+MultiSortSupport
+multi_sort_init(int ndims)
+{
+ MultiSortSupport mss;
+
+ Assert(ndims >= 2);
+
+ mss = (MultiSortSupport)palloc0(offsetof(MultiSortSupportData, ssup)
+ + sizeof(SortSupportData)*ndims);
+
+ mss->ndims = ndims;
+
+ return mss;
+}
+
+/*
+ * add sort support for dimension 'dim' (index into vacattrstats) to mss,
+ * at the position 'sortdim'
+ */
+void
+multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats)
+{
+ /* first, lookup StdAnalyzeData for the dimension (attribute) */
+ SortSupportData ssup;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)vacattrstats[dim]->extra_data;
+
+ Assert(mss != NULL);
+ Assert(sortdim < mss->ndims);
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup);
+
+ mss->ssup[sortdim] = ssup;
+}
+
+/* compare all the dimensions in the selected order */
+int
+multi_sort_compare(const void *a, const void *b, void *arg)
+{
+ int i;
+ SortItem *ia = (SortItem*)a;
+ SortItem *ib = (SortItem*)b;
+
+ MultiSortSupport mss = (MultiSortSupport)arg;
+
+ for (i = 0; i < mss->ndims; i++)
+ {
+ int compare;
+
+ compare = ApplySortComparator(ia->values[i], ia->isnull[i],
+ ib->values[i], ib->isnull[i],
+ &mss->ssup[i]);
+
+ if (compare != 0)
+ return compare;
+
+ }
+
+ /* equal by default */
+ return 0;
+}
+
+/* compare selected dimension */
+int
+multi_sort_compare_dim(int dim, const SortItem *a, const SortItem *b,
+ MultiSortSupport mss)
+{
+ return ApplySortComparator(a->values[dim], a->isnull[dim],
+ b->values[dim], b->isnull[dim],
+ &mss->ssup[dim]);
+}
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
new file mode 100644
index 0000000..6d5465b
--- /dev/null
+++ b/src/backend/utils/mvstats/common.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.h
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/tuptoaster.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_mv_statistic.h"
+#include "foreign/fdwapi.h"
+#include "postmaster/autovacuum.h"
+#include "storage/lmgr.h"
+#include "utils/datum.h"
+#include "utils/sortsupport.h"
+#include "utils/syscache.h"
+#include "utils/fmgroids.h"
+#include "utils/builtins.h"
+#include "access/sysattr.h"
+
+#include "utils/mvstats.h"
+
+/* FIXME private structure copied from analyze.c */
+
+typedef struct
+{
+ Oid eqopr; /* '=' operator for datatype, if any */
+ Oid eqfunc; /* and associated function */
+ Oid ltopr; /* '<' operator for datatype, if any */
+} StdAnalyzeData;
+
+typedef struct
+{
+ Datum value; /* a data value */
+ int tupno; /* position index for tuple it came from */
+} ScalarItem;
+
+/* multi-sort */
+typedef struct MultiSortSupportData {
+ int ndims; /* number of dimensions supported by the sort */
+ SortSupportData ssup[1]; /* sort support data for each dimension */
+} MultiSortSupportData;
+
+typedef MultiSortSupportData* MultiSortSupport;
+
+typedef struct SortItem {
+ Datum *values;
+ bool *isnull;
+} SortItem;
+
+MultiSortSupport multi_sort_init(int ndims);
+
+void multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats);
+
+int multi_sort_compare(const void *a, const void *b, void *arg);
+
+int multi_sort_compare_dim(int dim, const SortItem *a,
+ const SortItem *b, MultiSortSupport mss);
+
+/* comparators, used when constructing multivariate stats */
+int compare_scalars_simple(const void *a, const void *b, void *arg);
+int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
new file mode 100644
index 0000000..2a064a0
--- /dev/null
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -0,0 +1,437 @@
+/*-------------------------------------------------------------------------
+ *
+ * dependencies.c
+ * POSTGRES multivariate functional dependencies
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/dependencies.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Detect functional dependencies between columns.
+ *
+ * TODO This builds a complete set of dependencies, i.e. including transitive
+ * dependencies - if we identify [A => B] and [B => C], we're likely to
+ * identify [A => C] too. It might be better to keep only the minimal set
+ * of dependencies, i.e. prune all the dependencies that we can recreate
+ * by transitivity.
+ *
+ * There are two conceptual ways to do that:
+ *
+ * (a) generate all the rules, and then prune the rules that may be
+ * recreated by combining other dependencies, or
+ *
+ * (b) perform the 'is a combination of other dependencies' check before
+ * actually doing the work
+ *
+ * The second option has the advantage that we don't really need to perform
+ * the sort/count. It's not sufficient alone, though, because we may
+ * discover the dependencies in the wrong order. For example we may find
+ *
+ * (a -> b), (a -> c) and then (b -> c)
+ *
+ * None of those dependencies is a combination of the already known ones,
+ * yet (a -> c) is a combination of (a -> b) and (b -> c).
+ *
+ *
+ * FIXME Currently we simply replace NULL values with 0 and then handle it as
+ * a regular value, but that groups NULLs together with actual 0 values. That's
+ * clearly incorrect - we need to handle NULL values as a separate value.
+ */
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ /* result */
+ int ndeps = 0;
+ MVDependencies dependencies = NULL;
+ MultiSortSupport mss = multi_sort_init(2); /* 2 dimensions for now */
+
+ /* TODO Maybe this should be somehow related to the number of
+ * distinct values in the two columns we're currently analyzing.
+ * Assuming the distribution is uniform, we can estimate the
+ * average group size and use it as a threshold. Or something
+ * like that. Seems better than a static approach.
+ */
+ int min_group_size = 3;
+
+ /* dimension indexes we'll check for associations [a => b] */
+ int dima, dimb;
+
+ /*
+ * We'll reuse the same array for all the 2-column combinations.
+ *
+ * It's possible to sort the sample rows directly, but this seemed
+ * somewhat simpler / less error prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * 2);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * 2);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * 2];
+ items[i].isnull = &isnull[i * 2];
+ }
+
+ Assert(numattrs >= 2);
+
+ /*
+ * Evaluate all possible combinations of [A => B], using a simple algorithm:
+ *
+ * (a) sort the data by [A,B]
+ * (b) split the data into groups by A (new group whenever a value changes)
+ * (c) count different values in the B column (again, value changes)
+ *
+ * TODO It should be rather simple to merge [A => B] and [A => C] into
+ * [A => B,C]. Just keep A constant, collect all the "implied" columns
+ * and you're done.
+ */
+ for (dima = 0; dima < numattrs; dima++)
+ {
+ /* prepare the sort function for the first dimension */
+ multi_sort_add_dimension(mss, 0, dima, stats);
+
+ for (dimb = 0; dimb < numattrs; dimb++)
+ {
+ SortItem current;
+
+ /* number of groups supporting / contradicting the dependency */
+ int n_supporting = 0;
+ int n_contradicting = 0;
+
+ /* counters valid within a group */
+ int group_size = 0;
+ int n_violations = 0;
+
+ int n_supporting_rows = 0;
+ int n_contradicting_rows = 0;
+
+ /* skip the trivial case of identical columns (A => A) */
+ if (dima == dimb)
+ continue;
+
+ /* prepare the sort function for the second dimension */
+ multi_sort_add_dimension(mss, 1, dimb, stats);
+
+ /* reset the values and isnull flags */
+ memset(values, 0, sizeof(Datum) * numrows * 2);
+ memset(isnull, 0, sizeof(bool) * numrows * 2);
+
+ /* accumulate all the data for both columns into an array and sort it */
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values[0]
+ = heap_getattr(rows[i], attrs->values[dima],
+ stats[dima]->tupDesc, &items[i].isnull[0]);
+
+ items[i].values[1]
+ = heap_getattr(rows[i], attrs->values[dimb],
+ stats[dimb]->tupDesc, &items[i].isnull[1]);
+ }
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Walk through the array, split it into rows according to
+ * the A value, and count distinct values in the other one.
+ * If there's a single B value for the whole group, we count
+ * it as supporting the association, otherwise we count it
+ * as contradicting.
+ *
+ * Furthermore we require a group to have at least a certain
+ * number of rows to be considered useful for supporting the
+ * dependency. A contradicting group, however, always counts.
+ */
+
+ /* start with values from the first row */
+ current = items[0];
+ group_size = 1;
+
+ for (i = 1; i < numrows; i++)
+ {
+ /* end of the group */
+ if (multi_sort_compare_dim(0, &items[i], ¤t, mss) != 0)
+ {
+ /*
+ * If there are no contradicting rows, count it as
+ * supporting (otherwise contradicting), but only if
+ * the group is large enough.
+ *
+ * The requirement of a minimum group size makes it
+ * impossible to identify [unique,unique] cases, but
+ * that's probably a different case. This is more
+ * about [zip => city] associations etc.
+ *
+ * If there are violations, count the group/rows as
+ * a violation.
+ *
+ * It may be neither, if the group is too small (does
+ * not contain at least min_group_size rows).
+ */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations > 0)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /* current values start a new group */
+ n_violations = 0;
+ group_size = 0;
+ }
+ /* mismatch of a B value is contradicting */
+ else if (multi_sort_compare_dim(1, &items[i], ¤t, mss) != 0)
+ {
+ n_violations += 1;
+ }
+
+ current = items[i];
+ group_size += 1;
+ }
+
+ /* handle the last group (just like above) */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /*
+ * See if the number of rows supporting the association is at least
+ * 10x the number of rows violating the hypothetical dependency.
+ *
+ * TODO This is a rather arbitrary limit - I guess it's possible to do
+ * some math to come up with a better rule (e.g. testing a hypothesis
+ * 'this is due to randomness'). We can create a contingency table
+ * from the values and use it for testing. Possibly only when
+ * there are no contradicting rows?
+ *
+ * TODO Also, if (a => b) and (b => a) at the same time, it pretty much
+ * means there's a 1:1 relation (or one is a 'label'), making the
+ * conditions rather redundant. Although it's possible that the
+ * query uses incompatible combination of values.
+ */
+ if (n_supporting_rows > (n_contradicting_rows * 10))
+ {
+ if (dependencies == NULL)
+ {
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+ dependencies->magic = MVSTAT_DEPS_MAGIC;
+ }
+ else
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData, deps)
+ + sizeof(MVDependency) * (dependencies->ndeps + 1));
+
+ /* add the new dependency to the list */
+ dependencies->deps[ndeps] = (MVDependency)palloc0(sizeof(MVDependencyData));
+ dependencies->deps[ndeps]->a = attrs->values[dima];
+ dependencies->deps[ndeps]->b = attrs->values[dimb];
+
+ dependencies->ndeps = (++ndeps);
+ }
+ }
+ }
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+ pfree(stats);
+ pfree(mss);
+
+ return dependencies;
+}
+
+/*
+ * Serialize the dependencies into a bytea, so that they can be stored
+ * in the pg_mv_statistic catalog.
+ *
+ * Currently this only supports simple two-column rules, and stores them
+ * as a sequence of attnum pairs. In the future, this needs to be made
+ * more complex to support multiple columns on both sides of the
+ * implication (using AND on left, OR on right).
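+ *
+ * The resulting layout (as produced by the memcpy calls below) is:
+ *
+ *   varlena header | header (magic, ndeps) | ndeps x (a, b)
+ *
+ * where 'a' and 'b' are the int16 attnums of the implying / implied
+ * columns.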
+ */
+bytea *
+serialize_mv_dependencies(MVDependencies dependencies)
+{
+ int i;
+
+ /* we need to store ndeps, and each dependency needs 2 * int16 */
+ Size len = VARHDRSZ + offsetof(MVDependenciesData, deps)
+ + dependencies->ndeps * (sizeof(int16) * 2);
+
+ bytea * output = (bytea*)palloc0(len);
+
+ char * tmp = VARDATA(output);
+
+ SET_VARSIZE(output, len);
+
+ /* first, store the number of dimensions / items */
+ memcpy(tmp, dependencies, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ /* walk through the dependencies and copy both columns into the bytea */
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ memcpy(tmp, &(dependencies->deps[i]->a), sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(tmp, &(dependencies->deps[i]->b), sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return output;
+}
+
+/*
+ * Reads serialized dependencies into MVDependencies structure.
+ */
+MVDependencies
+deserialize_mv_dependencies(bytea * data)
+{
+ int i;
+ Size expected_size;
+ MVDependencies dependencies;
+ char *tmp;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVDependenciesData,deps))
+ elog(ERROR, "invalid MVDependencies size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVDependenciesData,deps));
+
+ /* read the MVDependencies header */
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(dependencies, tmp, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ if (dependencies->magic != MVSTAT_DEPS_MAGIC)
+ {
+ pfree(dependencies);
+ elog(WARNING, "not a MV Dependencies (magic number mismatch)");
+ return NULL;
+ }
+
+ Assert(dependencies->ndeps > 0);
+
+ /* what bytea size do we expect for those parameters */
+ expected_size = offsetof(MVDependenciesData,deps) +
+ dependencies->ndeps * sizeof(int16) * 2;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid dependencies size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* allocate space for the dependency pointers */
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData,deps)
+ + (dependencies->ndeps * sizeof(MVDependency)));
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ dependencies->deps[i] = (MVDependency)palloc0(sizeof(MVDependencyData));
+
+ memcpy(&(dependencies->deps[i]->a), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(&(dependencies->deps[i]->b), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return dependencies;
+}
+
+/* print some basic info about dependencies (number of dependencies) */
+Datum
+pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ result = palloc0(128);
+ snprintf(result, 128, "dependencies=%d", dependencies->ndeps);
+
+ /* FIXME free the deserialized data (pfree is not enough) */
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* print the dependencies
+ *
+ * TODO Would be nice if this knew the actual column names (instead of
+ * the attnums).
+ *
+ * FIXME This is really ugly and does not really check the lengths and
+ * strcpy/snprintf return values properly. Needs to be fixed.
+ */
+Datum
+pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
+{
+ int i = 0;
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result = NULL;
+ int len = 0;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ MVDependency dependency = dependencies->deps[i];
+ char buffer[128];
+
+ int tmp = snprintf(buffer, 128, "%s%d => %d",
+ ((i == 0) ? "" : ", "), dependency->a, dependency->b);
+
+ if (tmp < 127)
+ {
+ if (result == NULL)
+ result = palloc0(len + tmp + 1);
+ else
+ result = repalloc(result, len + tmp + 1);
+
+ strcpy(result + len, buffer);
+ len += tmp;
+ }
+ }
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
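
One way to fix the FIXME above might be to build the string using StringInfo,
which grows the buffer and tracks lengths automatically. A hypothetical
sketch of the same loop (format_dependencies is a made-up name):

    #include "postgres.h"
    #include "lib/stringinfo.h"
    #include "utils/mvstats.h"

    static char *
    format_dependencies(MVDependencies dependencies)
    {
        StringInfoData str;
        int         i;

        initStringInfo(&str);

        /* same "a => b, c => d" output, without manual repalloc/strcpy */
        for (i = 0; i < dependencies->ndeps; i++)
            appendStringInfo(&str, "%s%d => %d",
                             (i == 0) ? "" : ", ",
                             dependencies->deps[i]->a,
                             dependencies->deps[i]->b);

        return str.data;
    }
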
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index fd8dc91..4f106c3 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2104,6 +2104,50 @@ describeOneTableDetails(const char *schemaname,
PQclear(result);
}
+ /* print any multivariate statistics */
+ if (pset.sversion >= 90500)
+ {
+ printfPQExpBuffer(&buf,
+ "SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
+ " deps_enabled,\n"
+ " deps_built,\n"
+ " (SELECT string_agg(attname::text,', ')\n"
+ " FROM ((SELECT unnest(stakeys) AS attnum) s\n"
+ " JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
+ "FROM pg_mv_statistic stat WHERE starelid = '%s' ORDER BY 1;",
+ oid);
+
+ result = PSQLexec(buf.data);
+ if (!result)
+ goto error_return;
+ else
+ tuples = PQntuples(result);
+
+ if (tuples > 0)
+ {
+ printTableAddFooter(&cont, _("Statistics:"));
+ for (i = 0; i < tuples; i++)
+ {
+ printfPQExpBuffer(&buf, " ");
+
+ /* statistics name (qualified with namespace) */
+ appendPQExpBuffer(&buf, "\"%s.%s\" ",
+ PQgetvalue(result, i, 1),
+ PQgetvalue(result, i, 2));
+
+ /* options */
+ if (!strcmp(PQgetvalue(result, i, 4), "t"))
+ appendPQExpBuffer(&buf, "(dependencies)");
+
+ appendPQExpBuffer(&buf, " ON (%s)",
+ PQgetvalue(result, i, 6));
+
+ printTableAddFooter(&cont, buf.data);
+ }
+ }
+ PQclear(result);
+ }
+
/* print rules */
if (tableinfo.hasrules && tableinfo.relkind != 'm')
{
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 049bf9f..12211fe 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -153,10 +153,11 @@ typedef enum ObjectClass
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
OCLASS_POLICY, /* pg_policy */
- OCLASS_TRANSFORM /* pg_transform */
+ OCLASS_TRANSFORM, /* pg_transform */
+ OCLASS_STATISTICS /* pg_mv_statistics */
} ObjectClass;
-#define LAST_OCLASS OCLASS_TRANSFORM
+#define LAST_OCLASS OCLASS_STATISTICS
/* in dependency.c */
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index b80d8d8..5ae42f7 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -119,6 +119,7 @@ extern void RemoveAttrDefault(Oid relid, AttrNumber attnum,
DropBehavior behavior, bool complain, bool internal);
extern void RemoveAttrDefaultById(Oid attrdefId);
extern void RemoveStatistics(Oid relid, AttrNumber attnum);
+extern void RemoveMVStatistics(Oid relid, AttrNumber attnum);
extern Form_pg_attribute SystemAttributeDefinition(AttrNumber attno,
bool relhasoids);
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index ab2c1a8..a768bb5 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -173,6 +173,13 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
#define LargeObjectMetadataOidIndexId 2996
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_oid_index, 3380, on pg_mv_statistic using btree(oid oid_ops));
+#define MvStatisticOidIndexId 3380
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_name_index, 3997, on pg_mv_statistic using btree(staname name_ops, stanamespace oid_ops));
+#define MvStatisticNameIndexId 3997
+DECLARE_INDEX(pg_mv_statistic_relid_index, 3379, on pg_mv_statistic using btree(starelid oid_ops));
+#define MvStatisticRelidIndexId 3379
+
DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
#define NamespaceNameIndexId 2684
DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index 2ccb3a7..44cf9c6 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -137,6 +137,8 @@ extern Oid get_collation_oid(List *collname, bool missing_ok);
extern Oid get_conversion_oid(List *conname, bool missing_ok);
extern Oid FindDefaultConversionProc(int32 for_encoding, int32 to_encoding);
+extern Oid get_statistics_oid(List *names, bool missing_ok);
+
/* initialization & transaction cleanup code */
extern void InitializeSearchPath(void);
extern void AtEOXact_Namespace(bool isCommit, bool parallel);
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
new file mode 100644
index 0000000..a568a07
--- /dev/null
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -0,0 +1,73 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_mv_statistic.h
+ * definition of the system "multivariate statistic" relation (pg_mv_statistic)
+ * along with the relation's initial contents.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_mv_statistic.h
+ *
+ * NOTES
+ * the genbki.pl script reads this file and generates .bki
+ * information from the DATA() statements.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_MV_STATISTIC_H
+#define PG_MV_STATISTIC_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_mv_statistic definition. cpp turns this into
+ * typedef struct FormData_pg_mv_statistic
+ * ----------------
+ */
+#define MvStatisticRelationId 3381
+
+CATALOG(pg_mv_statistic,3381)
+{
+ /* These fields form the unique key for the entry: */
+ Oid starelid; /* relation containing attributes */
+ NameData staname; /* statistics name */
+ Oid stanamespace; /* OID of namespace containing this statistics */
+
+ /* statistics requested to build */
+ bool deps_enabled; /* analyze dependencies? */
+
+ /* statistics that are available (if requested) */
+ bool deps_built; /* dependencies were built */
+
+ /* variable-length fields start here, but we allow direct access to stakeys */
+ int2vector stakeys; /* array of column keys */
+
+#ifdef CATALOG_VARLEN
+ bytea stadeps; /* dependencies (serialized) */
+#endif
+
+} FormData_pg_mv_statistic;
+
+/* ----------------
+ * Form_pg_mv_statistic corresponds to a pointer to a tuple with
+ * the format of pg_mv_statistic relation.
+ * ----------------
+ */
+typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
+
+/* ----------------
+ * compiler constants for pg_mv_statistic
+ * ----------------
+ */
+#define Natts_pg_mv_statistic 7
+#define Anum_pg_mv_statistic_starelid 1
+#define Anum_pg_mv_statistic_staname 2
+#define Anum_pg_mv_statistic_stanamespace 3
+#define Anum_pg_mv_statistic_deps_enabled 4
+#define Anum_pg_mv_statistic_deps_built 5
+#define Anum_pg_mv_statistic_stakeys 6
+#define Anum_pg_mv_statistic_stadeps 7
+
+#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index cbbb883..eecce40 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2666,6 +2666,11 @@ DESCR("current user privilege on any column by rel name");
DATA(insert OID = 3029 ( has_any_column_privilege PGNSP PGUID 12 10 0 0 0 f f f f t f s s 2 0 16 "26 25" _null_ _null_ _null_ _null_ _null_ has_any_column_privilege_id _null_ _null_ _null_ ));
DESCR("current user privilege on any column by rel oid");
+DATA(insert OID = 3998 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies info");
+DATA(insert OID = 3999 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies show");
+
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
DATA(insert OID = 1929 ( pg_stat_get_tuples_returned PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_tuples_returned _null_ _null_ _null_ ));
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index b7a38ce..a52096b 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -49,6 +49,7 @@ extern void BootstrapToastTable(char *relName,
DECLARE_TOAST(pg_attrdef, 2830, 2831);
DECLARE_TOAST(pg_constraint, 2832, 2833);
DECLARE_TOAST(pg_description, 2834, 2835);
+DECLARE_TOAST(pg_mv_statistic, 3577, 3578);
DECLARE_TOAST(pg_proc, 2836, 2837);
DECLARE_TOAST(pg_rewrite, 2838, 2839);
DECLARE_TOAST(pg_seclabel, 3598, 3599);
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 54f67e9..99a6a62 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -75,6 +75,10 @@ extern ObjectAddress DefineOperator(List *names, List *parameters);
extern void RemoveOperatorById(Oid operOid);
extern ObjectAddress AlterOperator(AlterOperatorStmt *stmt);
+/* commands/statscmds.c */
+extern ObjectAddress CreateStatistics(CreateStatsStmt *stmt);
+extern void RemoveStatisticsById(Oid statsOid);
+
/* commands/aggregatecmds.c */
extern ObjectAddress DefineAggregate(List *name, List *args, bool oldstyle,
List *parameters, const char *queryString);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fad9988..545b62a 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -266,6 +266,7 @@ typedef enum NodeTag
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
+ T_MVStatisticInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
@@ -401,6 +402,7 @@ typedef enum NodeTag
T_CreatePolicyStmt,
T_AlterPolicyStmt,
T_CreateTransformStmt,
+ T_CreateStatsStmt,
/*
* TAGS FOR PARSE TREE NODES (parsenodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2fd0629..e1807fb 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -601,6 +601,17 @@ typedef struct ColumnDef
int location; /* parse location, or -1 if none/unknown */
} ColumnDef;
+typedef struct CreateStatsStmt
+{
+ NodeTag type;
+ List *defnames; /* qualified name (list of Value strings) */
+ RangeVar *relation; /* relation to build statistics on */
+ List *keys; /* String nodes naming referenced column(s) */
+ List *options; /* list of DefElem nodes */
+ bool if_not_exists; /* just do nothing if statistics already exists? */
+} CreateStatsStmt;
+
+
/*
* TableLikeClause - CREATE TABLE ( ... LIKE ... ) clause
*/
@@ -1410,6 +1421,7 @@ typedef enum ObjectType
OBJECT_RULE,
OBJECT_SCHEMA,
OBJECT_SEQUENCE,
+ OBJECT_STATISTICS,
OBJECT_TABCONSTRAINT,
OBJECT_TABLE,
OBJECT_TABLESPACE,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 641728b..e10dcf1 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -539,6 +539,7 @@ typedef struct RelOptInfo
List *lateral_vars; /* LATERAL Vars and PHVs referenced by rel */
Relids lateral_referencers; /* rels that reference me laterally */
List *indexlist; /* list of IndexOptInfo */
+ List *mvstatlist; /* list of MVStatisticInfo */
BlockNumber pages; /* size estimates derived from pg_class */
double tuples;
double allvisfrac;
@@ -634,6 +635,33 @@ typedef struct IndexOptInfo
void (*amcostestimate) (); /* AM's cost estimator */
} IndexOptInfo;
+/*
+ * MVStatisticInfo
+ * Information about multivariate stats for planning/optimization
+ *
+ * This contains information about which columns are covered by the
+ * statistics (stakeys), which options were requested while adding the
+ * statistics (*_enabled), and which kinds of statistics were actually
+ * built and are available for the optimizer (*_built).
+ */
+typedef struct MVStatisticInfo
+{
+ NodeTag type;
+
+ Oid mvoid; /* OID of the statistics row */
+ RelOptInfo *rel; /* back-link to the owning relation */
+
+ /* enabled statistics */
+ bool deps_enabled; /* functional dependencies enabled */
+
+ /* built/available statistics */
+ bool deps_built; /* functional dependencies built */
+
+ /* columns in the statistics (attnums) */
+ int2vector *stakeys; /* attnums of the columns covered */
+
+} MVStatisticInfo;
+
/*
* EquivalenceClasses
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
new file mode 100644
index 0000000..7ebd961
--- /dev/null
+++ b/src/include/utils/mvstats.h
@@ -0,0 +1,70 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvstats.h
+ * Multivariate statistics and selectivity estimation functions.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/mvstats.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MVSTATS_H
+#define MVSTATS_H
+
+#include "fmgr.h"
+#include "commands/vacuum.h"
+
+
+#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
+
+/* An associative rule, tracking [a => b] dependency.
+ *
+ * TODO Make this work with multiple columns on both sides.
+ */
+typedef struct MVDependencyData {
+ int16 a;
+ int16 b;
+} MVDependencyData;
+
+typedef MVDependencyData* MVDependency;
+
+typedef struct MVDependenciesData {
+ uint32 magic; /* magic constant marker */
+ int32 ndeps; /* number of dependencies */
+ MVDependency deps[1]; /* XXX why not a pointer? */
+} MVDependenciesData;
+
+typedef MVDependenciesData* MVDependencies;
+
+#define MVSTAT_DEPS_MAGIC 0xB4549A2C /* marks serialized bytea */
+#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
+
+/*
+ * TODO Maybe fetching the histogram/MCV list separately is inefficient?
+ * Consider adding a single `fetch_stats` method, fetching all
+ * stats specified using flags (or something like that).
+ */
+
+bytea * serialize_mv_dependencies(MVDependencies dependencies);
+
+/* deserialization of stats (serialization is private to analyze) */
+MVDependencies deserialize_mv_dependencies(bytea * data);
+
+/* FIXME this probably belongs somewhere else (not to operations stats) */
+extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats);
+
+void update_mv_stats(Oid relid, MVDependencies dependencies, int2vector *attrs);
+
+#endif
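
For illustration, a hypothetical sketch of building this structure for a
single [a => b] dependency (assumes postgres.h and a backend memory context;
make_single_dependency is a made-up name):

    static MVDependencies
    make_single_dependency(int16 a, int16 b)
    {
        /* header (magic, ndeps) plus room for one pointer in deps[] */
        MVDependencies deps = (MVDependencies)
            palloc0(offsetof(MVDependenciesData, deps) + sizeof(MVDependency));

        deps->magic = MVSTAT_DEPS_MAGIC;
        deps->ndeps = 1;

        deps->deps[0] = (MVDependency) palloc0(sizeof(MVDependencyData));
        deps->deps[0]->a = a;   /* determining column (attnum) */
        deps->deps[0]->b = b;   /* implied column (attnum) */

        return deps;
    }
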
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index f2bebf2..8771f9c 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -61,6 +61,7 @@ typedef struct RelationData
bool rd_isvalid; /* relcache entry is valid */
char rd_indexvalid; /* state of rd_indexlist: 0 = not valid, 1 =
* valid, 2 = temporarily forced */
+ bool rd_mvstatvalid; /* state of rd_mvstatlist: true/false */
/*
* rd_createSubid is the ID of the highest subtransaction the rel has
@@ -93,6 +94,9 @@ typedef struct RelationData
List *rd_indexlist; /* list of OIDs of indexes on relation */
Oid rd_oidindex; /* OID of unique index on OID, if any */
Oid rd_replidindex; /* OID of replica identity index, if any */
+
+ /* data managed by RelationGetMVStatList: */
+ List *rd_mvstatlist; /* list of OIDs of multivariate stats */
/* data managed by RelationGetIndexAttrBitmap: */
Bitmapset *rd_indexattr; /* identifies columns used in indexes */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 1b48304..9f03c8d 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -38,6 +38,7 @@ extern void RelationClose(Relation relation);
* Routines to compute/retrieve additional cached information
*/
extern List *RelationGetIndexList(Relation relation);
+extern List *RelationGetMVStatList(Relation relation);
extern Oid RelationGetOidIndex(Relation relation);
extern Oid RelationGetReplicaIndex(Relation relation);
extern List *RelationGetIndexExpressions(Relation relation);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 256615b..0e0658d 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -66,6 +66,8 @@ enum SysCacheIdentifier
INDEXRELID,
LANGNAME,
LANGOID,
+ MVSTATNAMENSP,
+ MVSTATOID,
NAMESPACENAME,
NAMESPACEOID,
OPERNAMENSP,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 81bc5c9..84b4425 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1368,6 +1368,15 @@ pg_matviews| SELECT n.nspname AS schemaname,
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
WHERE (c.relkind = 'm'::"char");
+pg_mv_stats| SELECT n.nspname AS schemaname,
+ c.relname AS tablename,
+ s.staname,
+ s.stakeys AS attnums,
+ length(s.stadeps) AS depsbytes,
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ FROM ((pg_mv_statistic s
+ JOIN pg_class c ON ((c.oid = s.starelid)))
+ LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
pg_policies| SELECT n.nspname AS schemaname,
c.relname AS tablename,
pol.polname AS policyname,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index eb0bc88..92a0d8a 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -113,6 +113,7 @@ pg_inherits|t
pg_language|t
pg_largeobject|t
pg_largeobject_metadata|t
+pg_mv_statistic|t
pg_namespace|t
pg_opclass|t
pg_operator|t
--
2.1.0
Attachment: 0003-clause-reduction-using-functional-dependencies.patch
From 6f359af9ce78fd21bde74b76e45508364da992b2 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 19:42:18 +0200
Subject: [PATCH 3/9] clause reduction using functional dependencies
During planning, use functional dependencies to decide which clauses to
skip during cardinality estimation. Initial and rather simplistic
implementation.
This only works with regular WHERE clauses, not with join clauses.
Note: The clause_is_mv_compatible() needs to identify the relation (so
that we can fetch the list of multivariate stats by OID).
planner_rt_fetch() seems like the appropriate way to get the relation
OID, but apparently it only works with simple vars. Maybe
examine_variable() would make this work with more complex vars too?
Includes regression tests analyzing functional dependencies (part of
ANALYZE) on several datasets (no dependencies, no transitive
dependencies, ...).
Checks that a query with conditions on two columns, where one (B) is
functionally dependent on the other one (A), correctly ignores the
clause on (B) and chooses bitmap index scan instead of plain index scan
(which is what happens otherwise, thanks to the assumption of
independence).
Note: Functional dependencies only work with equality clauses, no
inequalities etc.
---
src/backend/optimizer/path/clausesel.c | 891 +++++++++++++++++++++++++-
src/backend/utils/mvstats/README.stats | 36 ++
src/backend/utils/mvstats/common.c | 5 +-
src/backend/utils/mvstats/dependencies.c | 24 +
src/include/utils/mvstats.h | 16 +-
src/test/regress/expected/mv_dependencies.out | 172 +++++
src/test/regress/parallel_schedule | 3 +
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_dependencies.sql | 150 +++++
9 files changed, 1293 insertions(+), 5 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.stats
create mode 100644 src/test/regress/expected/mv_dependencies.out
create mode 100644 src/test/regress/sql/mv_dependencies.sql
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 02660c2..80708fe 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -14,14 +14,19 @@
*/
#include "postgres.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/plancat.h"
+#include "optimizer/var.h"
#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
+#include "utils/mvstats.h"
#include "utils/selfuncs.h"
+#include "utils/typcache.h"
/*
@@ -41,6 +46,23 @@ typedef struct RangeQueryClause
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
+#define MV_CLAUSE_TYPE_FDEP 0x01
+
+static bool clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum);
+
+static Bitmapset *collect_mv_attnums(List *clauses, Index relid);
+
+static int count_mv_attnums(List *clauses, Index relid);
+
+static int count_varnos(List *clauses, Index *relid);
+
+static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Index relid, List *stats);
+
+static bool has_stats(List *stats, int type);
+
+static List * find_stats(PlannerInfo *root, Index relid);
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
@@ -60,7 +82,19 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
* subclauses. However, that's only right if the subclauses have independent
* probabilities, and in reality they are often NOT independent. So,
* we want to be smarter where we can.
-
+ *
+ * The first thing we try to do is apply multivariate statistics, in a way
+ * that minimizes the overhead when there are no multivariate stats on the
+ * relation. Thus we do several simple (and inexpensive) checks first, to
+ * verify that suitable multivariate statistics exist.
+ *
+ * If we find suitable multivariate statistics, we try to apply them.
+ * Currently we only have (soft) functional dependencies, so we try to reduce
+ * the list of clauses.
+ *
+ * Then we remove the clauses estimated using multivariate stats, and process
+ * the rest of the clauses using the regular per-column stats.
+ *
* Currently, the only extra smarts we have is to recognize "range queries",
* such as "x > 34 AND x < 42". Clauses are recognized as possible range
* query components if they are restriction opclauses whose operators have
@@ -99,6 +133,22 @@ clauselist_selectivity(PlannerInfo *root,
RangeQueryClause *rqlist = NULL;
ListCell *l;
+ /* processing mv stats */
+ Oid relid = InvalidOid;
+
+ /* list of multivariate stats on the relation */
+ List *stats = NIL;
+
+ /*
+ * To fetch the statistics, we first need to determine the rel. Currently
+ * we only support estimates of simple restrictions with all Vars
+ * referencing a single baserel. However set_baserel_size_estimates() sets
+ * varRelid=0, so we have to actually inspect the clauses using pull_varnos
+ * and see if there's just a single varno referenced.
+ */
+ if ((count_varnos(clauses, &relid) == 1) && ((varRelid == 0) || (varRelid == relid)))
+ stats = find_stats(root, relid);
+
/*
* If there's exactly one clause, then no use in trying to match up pairs,
* so just go directly to clause_selectivity().
@@ -108,6 +158,24 @@ clauselist_selectivity(PlannerInfo *root,
varRelid, jointype, sjinfo);
/*
+ * Apply functional dependencies, but first check that there are some stats
+ * with functional dependencies built (by simply walking the stats list),
+ * and that there are two or more attributes referenced by clauses that
+ * may be reduced using functional dependencies.
+ *
+ * We would find that anyway when trying to actually apply the functional
+ * dependencies, but let's do the cheap checks first.
+ *
+ * After applying the functional dependencies we get the remaining clauses
+ * that need to be estimated by other types of stats (MCV, histograms etc).
+ */
+ if (has_stats(stats, MV_CLAUSE_TYPE_FDEP) &&
+ (count_mv_attnums(clauses, relid) >= 2))
+ {
+ clauses = clauselist_apply_dependencies(root, clauses, relid, stats);
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -763,3 +831,824 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Pull varattnos from the clauses, similarly to pull_varattnos() but:
+ *
+ * (a) only get attributes for a particular relation (relid)
+ * (b) ignore system attributes (we can't build stats on them anyway)
+ *
+ * This makes it possible to directly compare the result with attnum
+ * values from pg_attribute etc.
+ */
+static Bitmapset *
+get_varattnos(Node * node, Index relid)
+{
+ int k;
+ Bitmapset *varattnos = NULL;
+ Bitmapset *result = NULL;
+
+ /* get the varattnos */
+ pull_varattnos(node, relid, &varattnos);
+
+ k = -1;
+ while ((k = bms_next_member(varattnos, k)) >= 0)
+ {
+ if (k + FirstLowInvalidHeapAttributeNumber > 0)
+ result
+ = bms_add_member(result,
+ k + FirstLowInvalidHeapAttributeNumber);
+ }
+
+ bms_free(varattnos);
+
+ return result;
+}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ */
+static Bitmapset *
+collect_mv_attnums(List *clauses, Index relid)
+{
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(l);
+
+ /* ignore the result for now - we only need the info */
+ if (clause_is_mv_compatible(clause, relid, &attnum))
+ attnums = bms_add_member(attnums, attnum);
+ }
+
+ /*
+ * If there are not at least two attributes referenced by the clause(s),
+ * we can throw everything out (as we'll revert to simple stats).
+ */
+ if (bms_num_members(attnums) <= 1)
+ {
+ if (attnums != NULL)
+ pfree(attnums);
+ attnums = NULL;
+ }
+
+ return attnums;
+}
+
+/*
+ * Count the number of attributes in clauses compatible with multivariate stats.
+ */
+static int
+count_mv_attnums(List *clauses, Index relid)
+{
+ int c;
+ Bitmapset *attnums = collect_mv_attnums(clauses, relid);
+
+ c = bms_num_members(attnums);
+
+ bms_free(attnums);
+
+ return c;
+}
+
+/*
+ * Count varnos referenced in the clauses, and if there's a single varno then
+ * return the index in 'relid'.
+ */
+static int
+count_varnos(List *clauses, Index *relid)
+{
+ int cnt;
+ Bitmapset *varnos = NULL;
+
+ varnos = pull_varnos((Node *) clauses);
+ cnt = bms_num_members(varnos);
+
+ /* if there's a single varno in the clauses, remember it */
+ if (bms_num_members(varnos) == 1)
+ *relid = bms_singleton_member(varnos);
+
+ bms_free(varnos);
+
+ return cnt;
+}
+
+typedef struct
+{
+ Index varno; /* relid we're interested in */
+ Bitmapset *varattnos; /* attnums referenced by the clauses */
+} mv_compatible_context;
+
+/*
+ * Recursive walker that checks compatibility of the clause with multivariate
+ * statistics, and collects attnums from the Vars.
+ *
+ * XXX The original idea was to combine this with expression_tree_walker, but
+ * I've been unable to make that work - it seems it does not quite allow
+ * checking the structure. Hence the explicit calls to the walker.
+ */
+static bool
+mv_compatible_walker(Node *node, mv_compatible_context *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+
+ /* Pseudoconstants are not really interesting here. */
+ if (rinfo->pseudoconstant)
+ return true;
+
+ /* clauses referencing multiple varnos are incompatible */
+ if (bms_membership(rinfo->clause_relids) != BMS_SINGLETON)
+ return true;
+
+ /* check the clause inside the RestrictInfo */
+ return mv_compatible_walker((Node*)rinfo->clause, (void *) context);
+ }
+
+ if (IsA(node, Var))
+ {
+ Var * var = (Var*)node;
+
+ /*
+ * Also, the variable needs to reference the right relid (this might be
+ * unnecessary given the other checks, but let's be sure).
+ */
+ if (var->varno != context->varno)
+ return true;
+
+ /* Also skip system attributes (we don't allow stats on those). */
+ if (! AttrNumberIsForUserDefinedAttr(var->varattno))
+ return true;
+
+ /* Seems fine, so let's remember the attnum. */
+ context->varattnos = bms_add_member(context->varattnos, var->varattno);
+
+ return false;
+ }
+
+ /*
+ * And finally the operator expressions - we only allow simple expressions
+ * with two arguments, where one is a Var and the other is a constant, and
+ * it's a simple comparison (which we detect using estimator function).
+ */
+ if (is_opclause(node))
+ {
+ OpExpr *expr = (OpExpr *) node;
+ Var *var;
+ bool varonleft = true;
+ bool ok;
+
+ /*
+ * Only expressions with two arguments are considered compatible.
+ *
+ * XXX Possibly unnecessary (can OpExpr have different arg count?).
+ */
+ if (list_length(expr->args) != 2)
+ return true;
+
+ /* see if it actually has the right shape (one Var, one pseudo-constant) */
+ ok = (NumRelids((Node*)expr) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ /* unsupported structure (two variables or so) */
+ if (! ok)
+ return true;
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the clause.
+ * Otherwise note the relid and attnum for the variable. This uses the
+ * function for estimating selectivity, not the operator directly (a bit
+ * awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_EQSEL:
+
+ /* equality conditions are compatible with all statistics */
+ break;
+
+ default:
+
+ /* unknown estimator */
+ return true;
+ }
+
+ var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+
+ return mv_compatible_walker((Node *) var, context);
+ }
+
+ /* Node not explicitly supported, so terminate */
+ return true;
+}
+
+/*
+ * Determines whether the clause is compatible with multivariate stats,
+ * and if it is, returns some additional information - varno (index
+ * into simple_rte_array) and a bitmap of attributes. This is then
+ * used to fetch related multivariate statistics.
+ *
+ * At this moment we only support basic conditions of the form
+ *
+ * variable OP constant
+ *
+ * where OP is one of [=,<,<=,>=,>] (which is however determined by
+ * looking at the associated function for estimating selectivity, just
+ * like with the single-dimensional case).
+ *
+ * TODO Support 'OR clauses' - shouldn't be all that difficult to
+ * evaluate them using multivariate stats.
+ */
+static bool
+clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum)
+{
+ mv_compatible_context context;
+
+ context.varno = relid;
+ context.varattnos = NULL; /* no attnums */
+
+ if (mv_compatible_walker(clause, (void *) &context))
+ return false;
+
+ /* remember the newly collected attnums */
+ *attnum = bms_singleton_member(context.varattnos);
+
+ return true;
+}
+
+/*
+ * collect attnums from functional dependencies
+ *
+ * Walk through all statistics on the relation, and collect attnums covered
+ * by those with functional dependencies. We only look at columns specified
+ * when creating the statistics, not at columns actually referenced by the
+ * dependencies (which may only be a subset of the attributes).
+ */
+static Bitmapset*
+fdeps_collect_attnums(List *stats)
+{
+ ListCell *lc;
+ Bitmapset *attnums = NULL;
+
+ foreach (lc, stats)
+ {
+ int j;
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ int2vector *stakeys = info->stakeys;
+
+ /* skip stats without functional dependencies built */
+ if (! info->deps_built)
+ continue;
+
+ for (j = 0; j < stakeys->dim1; j++)
+ attnums = bms_add_member(attnums, stakeys->values[j]);
+ }
+
+ return attnums;
+}
+
+/* transforms bitmapset into an array (index => value) */
+static int*
+make_idx_to_attnum_mapping(Bitmapset *attnums)
+{
+ int attidx = 0;
+ int attnum;
+
+ int *mapping = (int*)palloc0(bms_num_members(attnums) * sizeof(int));
+
+ attnum = -1;
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ mapping[attidx++] = attnum;
+
+ Assert(attidx == bms_num_members(attnums));
+
+ return mapping;
+}
+
+/* transforms bitmapset into an array (value => index) */
+static int*
+make_attnum_to_idx_mapping(Bitmapset *attnums)
+{
+ int attidx = 0;
+ int attnum;
+ int maxattnum = -1;
+ int *mapping;
+
+ attnum = -1;
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ maxattnum = attnum;
+
+ mapping = (int*)palloc0((maxattnum+1) * sizeof(int));
+
+ attnum = -1;
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ mapping[attnum] = attidx++;
+
+ Assert(attidx == bms_num_members(attnums));
+
+ return mapping;
+}
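
A small worked example of the two mappings (a standalone sketch, using plain
arrays instead of a Bitmapset): for attnums {3, 5, 8} the forward mapping is
{3, 5, 8} and the reverse array is sized by the maximum attnum:

    #include <assert.h>

    int
    main(void)
    {
        int     idx_to_attnum[] = {3, 5, 8};    /* index  => attnum */
        int     attnum_to_idx[8 + 1] = {0};     /* attnum => index */
        int     i;

        for (i = 0; i < 3; i++)
            attnum_to_idx[idx_to_attnum[i]] = i;

        assert(attnum_to_idx[5] == 1);
        assert(idx_to_attnum[attnum_to_idx[8]] == 8);
        return 0;
    }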
+
+/* build adjacency matrix for the dependencies */
+static bool*
+build_adjacency_matrix(List *stats, Bitmapset *attnums,
+ int *idx_to_attnum, int *attnum_to_idx)
+{
+ ListCell *lc;
+ int natts = bms_num_members(attnums);
+ bool *matrix = (bool*)palloc0(natts * natts * sizeof(bool));
+
+ foreach (lc, stats)
+ {
+ int j;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
+ MVDependencies dependencies = NULL;
+
+ /* skip stats without functional dependencies built */
+ if (! stat->deps_built)
+ continue;
+
+ /* fetch and deserialize dependencies */
+ dependencies = load_mv_dependencies(stat->mvoid);
+ if (dependencies == NULL)
+ {
+ elog(WARNING, "failed to deserialize func deps %d", stat->mvoid);
+ continue;
+ }
+
+ /* set matrix[a,b] to 'true' if 'a=>b' */
+ for (j = 0; j < dependencies->ndeps; j++)
+ {
+ int aidx = attnum_to_idx[dependencies->deps[j]->a];
+ int bidx = attnum_to_idx[dependencies->deps[j]->b];
+
+ /* a => b */
+ matrix[aidx * natts + bidx] = true;
+ }
+ }
+
+ return matrix;
+}
+
+/*
+ * multiply the adjacency matrix
+ *
+ * By multiplying the adjacency matrix, we derive dependencies implied by those
+ * stored in the catalog (but possibly in several separate rows). We need to
+ * repeat the multiplication until no new dependencies are discovered. The
+ * maximum number of multiplications is equal to the number of attributes.
+ *
+ * This is based on modeling the functional dependencies as edges in a directed
+ * graph with attributes as vertices.
+ */
+static void
+multiply_adjacency_matrix(bool *matrix, int natts)
+{
+ int i;
+
+ /* repeat the multiplication up to natts-times */
+ for (i = 0; i < natts; i++)
+ {
+ bool changed = false; /* no changes in this round */
+ int k, l, m;
+
+ /* k => l */
+ for (k = 0; k < natts; k++)
+ {
+ for (l = 0; l < natts; l++)
+ {
+ /* skip already known dependencies */
+ if (matrix[k * natts + l])
+ continue;
+
+ /*
+ * compute (k,l) in the multiplied matrix
+ *
+ * We don't really care about the exact value, just true/false,
+ * so terminate the loop once we get a hit. Also, this makes it
+ * safe to modify the matrix in-place.
+ */
+ for (m = 0; m < natts; m++)
+ {
+ if (matrix[k * natts + m] && matrix[m * natts + l])
+ {
+ matrix[k * natts + l] = true;
+ changed = true;
+ break;
+ }
+ }
+ }
+ }
+
+ /* no transitive dependency added in this round, so terminate */
+ if (! changed)
+ break;
+ }
+}
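
To see what the multiplication derives, consider a standalone example (not
part of the patch) with three attributes and stored dependencies a => b and
b => c; the closure adds the transitive a => c:

    #include <stdbool.h>
    #include <stdio.h>

    int
    main(void)
    {
        bool    m[3][3] = {{false}};
        int     round, k, l, i;

        m[0][1] = true;     /* a => b */
        m[1][2] = true;     /* b => c */

        /* the same boolean "multiplication", up to natts rounds */
        for (round = 0; round < 3; round++)
            for (k = 0; k < 3; k++)
                for (l = 0; l < 3; l++)
                    for (i = 0; i < 3; i++)
                        if (m[k][i] && m[i][l])
                            m[k][l] = true;

        printf("a => c: %s\n", m[0][2] ? "yes" : "no");     /* yes */
        return 0;
    }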
+
+/*
+ * Reduce clauses using functional dependencies
+ *
+ * Walk through clauses and eliminate the redundant ones (implied by other
+ * clauses). This is done by first deriving a transitive closure of all the
+ * functional dependencies (by multiplying the adjacency matrix).
+ */
+static List*
+fdeps_reduce_clauses(List *clauses, Bitmapset *attnums, bool *matrix,
+ int *idx_to_attnum, int *attnum_to_idx, Index relid)
+{
+ int i;
+ ListCell *lc;
+ List *reduced_clauses = NIL;
+
+ int nmvclauses; /* size of the arrays */
+ bool *reduced;
+ AttrNumber *mvattnums;
+ Node **mvclauses;
+
+ int natts = bms_num_members(attnums);
+
+ /*
+ * Preallocate space for all clauses (the list only contains
+ * compatible clauses at this point). This makes it somewhat easier
+ * to access the stats / attnums randomly.
+ *
+ * XXX This assumes each clause references exactly one Var, so the
+ * arrays are sized accordingly - for functional dependencies
+ * this is safe, because it only works with Var=Const.
+ */
+ mvclauses = (Node**)palloc0(list_length(clauses) * sizeof(Node*));
+ mvattnums = (AttrNumber*)palloc0(list_length(clauses) * sizeof(AttrNumber));
+ reduced = (bool*)palloc0(list_length(clauses) * sizeof(bool));
+
+ /* fill the arrays */
+ nmvclauses = 0;
+ foreach (lc, clauses)
+ {
+ Node *clause = (Node *) lfirst(lc);
+ Bitmapset *clause_attnums = get_varattnos(clause, relid);
+
+ mvclauses[nmvclauses] = clause;
+ mvattnums[nmvclauses] = bms_singleton_member(clause_attnums);
+ nmvclauses++;
+ }
+
+ Assert(nmvclauses == list_length(clauses));
+
+ /* now try to reduce the clauses (using the dependencies) */
+ for (i = 0; i < nmvclauses; i++)
+ {
+ int j;
+
+ /* not covered by dependencies */
+ if (! bms_is_member(mvattnums[i], attnums))
+ continue;
+
+ /* this clause was already reduced, so let's skip it */
+ if (reduced[i])
+ continue;
+
+ /* walk the potentially 'implied' clauses */
+ for (j = 0; j < nmvclauses; j++)
+ {
+ int aidx, bidx;
+
+ /* not covered by dependencies */
+ if (! bms_is_member(mvattnums[j], attnums))
+ continue;
+
+ aidx = attnum_to_idx[mvattnums[i]];
+ bidx = attnum_to_idx[mvattnums[j]];
+
+ /* can't reduce the clause by itself, or if already reduced */
+ if ((i == j) || reduced[j])
+ continue;
+
+ /* mark the clause as reduced (if aidx => bidx) */
+ reduced[j] = matrix[aidx * natts + bidx];
+ }
+ }
+
+ /* now walk through the clauses, and keep only those not reduced */
+ for (i = 0; i < nmvclauses; i++)
+ if (! reduced[i])
+ reduced_clauses = lappend(reduced_clauses, mvclauses[i]);
+
+ pfree(reduced);
+ pfree(mvclauses);
+ pfree(mvattnums);
+
+ return reduced_clauses;
+}
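
Continuing the closure example from multiply_adjacency_matrix(): with the
closed matrix for a => b, b => c (and the derived a => c) and one equality
clause per attribute, only the clause on a survives. A standalone sketch of
the elimination loop, with clause i referencing attribute i:

    #include <stdbool.h>
    #include <stdio.h>

    int
    main(void)
    {
        /* transitive closure of a => b, b => c */
        bool    matrix[3][3] = {
            {false, true,  true },
            {false, false, true },
            {false, false, false}
        };
        bool    reduced[3] = {false, false, false};
        int     i, j;

        for (i = 0; i < 3; i++)
        {
            if (reduced[i])     /* same skip rules as above */
                continue;
            for (j = 0; j < 3; j++)
                if (i != j && !reduced[j])
                    reduced[j] = matrix[i][j];
        }

        for (i = 0; i < 3; i++)
            printf("clause %d: %s\n", i,
                   reduced[i] ? "reduced" : "kept");    /* only 0 kept */
        return 0;
    }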
+
+/*
+ * filter clauses that are interesting for the reduction step
+ *
+ * Functional dependencies can only work with equality clauses with attributes
+ * covered by at least one of the statistics, so we walk through the clauses
+ * and copy the uninteresting ones directly to the result (reduced) clauses.
+ *
+ * That includes clauses that:
+ * (a) are not mv-compatible
+ * (b) reference more than a single attnum
+ * (c) use an attnum not covered by functional dependencies
+ *
+ * The clauses interesting for the reduction step are copied to deps_clauses.
+ *
+ * root - planner root
+ * clauses - list of clauses (input)
+ * deps_attnums - attributes covered by dependencies
+ * reduced_clauses - resulting clauses (not subject to reduction step)
+ * deps_clauses - clauses to be processed by reduction
+ * relid - relid of the baserel
+ *
+ * The return value is a bitmap of attnums referenced by deps_clauses.
+ */
+static Bitmapset *
+fdeps_filter_clauses(PlannerInfo *root,
+ List *clauses, Bitmapset *deps_attnums,
+ List **reduced_clauses, List **deps_clauses,
+ Index relid)
+{
+ ListCell *lc;
+ Bitmapset *clause_attnums = NULL;
+
+ foreach (lc, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(lc);
+
+ if (! clause_is_mv_compatible(clause, relid, &attnum))
+
+ /* clause incompatible with functional dependencies */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else if (! bms_is_member(attnum, deps_attnums))
+
+ /* clause not covered by the dependencies */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else
+ {
+ *deps_clauses = lappend(*deps_clauses, clause);
+ clause_attnums = bms_add_member(clause_attnums, attnum);
+ }
+ }
+
+ return clause_attnums;
+}
+
+/*
+ * reduce list of equality clauses using soft functional dependencies
+ *
+ * We simply walk through list of functional dependencies, and for each one we
+ * check whether the dependency 'matches' the clauses, i.e. if there's a clause
+ * matching the condition. If yes, we attempt to remove all clauses matching
+ * the implied part of the dependency from the list.
+ *
+ * This only reduces equality clauses, and ignores all the other types. We might
+ * extend it to handle IS NULL clauses in the future.
+ *
+ * We also assume the equality clauses are 'compatible'. For example we can't
+ * identify when the clauses use a mismatching zip code and city name. In such
+ * case the usual approach (product of selectivities) would produce a better
+ * estimate, although mostly by chance.
+ *
+ * The implementation needs to be careful about cyclic dependencies, e.g. when
+ *
+ * (a -> b) and (b -> a)
+ *
+ * at the same time, which means there's a 1:1 relationship between the columns.
+ * In this case we must not reduce clauses on both attributes at the same time.
+ *
+ * TODO Currently we only apply functional dependencies at the same level, but
+ * maybe we could transfer the clauses from upper levels to the subtrees?
+ * For example let's say we have (a->b) dependency, and condition
+ *
+ * (a=1) AND (b=2 OR c=3)
+ *
+ * Currently, we won't be able to perform any reduction, because we'll
+ * consider (a=1) and (b=2 OR c=3) independently. But maybe we could pass
+ * (a=1) into the other expression, and only check it against conditions
+ * of the functional dependencies?
+ *
+ * In this case we'd end up with
+ *
+ * (a=1)
+ *
+ * as we'd consider (b=2) implied thanks to the rule, rendering the whole
+ * OR clause valid.
+ */
+static List *
+clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Index relid, List *stats)
+{
+ List *reduced_clauses = NIL;
+
+ /*
+ * matrix of (natts x natts), 1 means x=>y
+ *
+ * This serves two purposes - first, it merges dependencies from all
+ * the statistics, second it makes generating all the transitive
+ * dependencies easier.
+ *
+ * We need to build this only for attributes from the dependencies,
+ * not for all attributes in the table.
+ *
+ * We can't do that only for attributes from the clauses, because we
+ * want to build transitive dependencies (including those going
+ * through attributes not listed in the stats).
+ *
+ * This only works for A=>B dependencies, not sure how to do that
+ * for complex dependencies.
+ */
+ bool *deps_matrix;
+ int deps_natts; /* size of the matrix */
+
+ /* mapping attnum <=> matrix index */
+ int *deps_idx_to_attnum;
+ int *deps_attnum_to_idx;
+
+ /* attnums in dependencies and clauses (and intersection) */
+ List *deps_clauses = NIL;
+ Bitmapset *deps_attnums = NULL;
+ Bitmapset *clause_attnums = NULL;
+ Bitmapset *intersect_attnums = NULL;
+
+ /*
+ * Is there at least one statistics with functional dependencies?
+ * If not, return the original clauses right away.
+ *
+ * XXX Isn't this pointless, thanks to exactly the same check in
+ * clauselist_selectivity()? Can we trigger the condition here?
+ */
+ if (! has_stats(stats, MV_CLAUSE_TYPE_FDEP))
+ return clauses;
+
+ /*
+ * Build the dependency matrix, i.e. attribute adjacency matrix,
+ * where 1 means (a=>b). Once we have the adjacency matrix, we'll
+ * multiply it by itself, to get transitive dependencies.
+ *
+ * Note: This is pretty much transitive closure from graph theory.
+ *
+ * First, let's see what attributes are covered by functional
+ * dependencies (sides of the adjacency matrix), and also a maximum
+ * attribute (needed to size the attnum => index mapping).
+ */
+ deps_attnums = fdeps_collect_attnums(stats);
+
+ /*
+ * Walk through the clauses - clauses that are (one of)
+ *
+ * (a) not mv-compatible
+ * (b) are using more than a single attnum
+ * (c) using attnum not covered by functional depencencies
+ *
+ * may be copied directly to the result. The interesting clauses are
+ * kept in 'deps_clauses' and will be processed later.
+ */
+ clause_attnums = fdeps_filter_clauses(root, clauses, deps_attnums,
+ &reduced_clauses, &deps_clauses, relid);
+
+ /*
+ * we need at least two clauses referencing two different attributes
+ * referencing to do the reduction
+ */
+ if ((list_length(deps_clauses) < 2) || (bms_num_members(clause_attnums) < 2))
+ {
+ bms_free(clause_attnums);
+ list_free(reduced_clauses);
+ list_free(deps_clauses);
+
+ return clauses;
+ }
+
+
+ /*
+ * We need at least two matching attributes in the clauses and
+ * dependencies, otherwise we can't really reduce anything.
+ */
+ intersect_attnums = bms_intersect(clause_attnums, deps_attnums);
+ if (bms_num_members(intersect_attnums) < 2)
+ {
+ bms_free(clause_attnums);
+ bms_free(deps_attnums);
+ bms_free(intersect_attnums);
+
+ list_free(deps_clauses);
+ list_free(reduced_clauses);
+
+ return clauses;
+ }
+
+ /*
+ * Build mapping between matrix indexes and attnums, and then the
+ * adjacency matrix itself.
+ */
+ deps_idx_to_attnum = make_idx_to_attnum_mapping(deps_attnums);
+ deps_attnum_to_idx = make_attnum_to_idx_mapping(deps_attnums);
+
+ /* build the adjacency matrix */
+ deps_matrix = build_adjacency_matrix(stats, deps_attnums,
+ deps_idx_to_attnum,
+ deps_attnum_to_idx);
+
+ deps_natts = bms_num_members(deps_attnums);
+
+ /*
+ * Multiply the matrix N-times (N = size of the matrix), so that we
+ * get all the transitive dependencies. That makes the next step
+ * much easier and faster.
+ *
+ * This is essentially an adjacency matrix from graph theory, and
+ * by multiplying it we get transitive edges. We don't really care
+ * about the exact number (number of paths between vertices) though,
+ * so we can do the multiplication in-place (we don't care whether
+ * we found the dependency in this round or in the previous one).
+ *
+ * Track how many new dependencies were added, and stop when 0, but
+ * we can't multiply more than N-times (longest path in the graph).
+ */
+ multiply_adjacency_matrix(deps_matrix, deps_natts);
+
+ /*
+ * Walk through the clauses, and see which other clauses we may
+ * reduce. The matrix contains all transitive dependencies, which
+ * makes this very fast.
+ *
+ * We have to be careful not to reduce the clause using itself, or
+ * reducing all clauses forming a cycle (so we have to skip already
+ * eliminated clauses).
+ *
+ * I'm not sure whether this guarantees finding the best solution,
+ * i.e. reducing the most clauses, but it probably does (thanks to
+ * having all the transitive dependencies).
+ */
+ deps_clauses = fdeps_reduce_clauses(deps_clauses,
+ deps_attnums, deps_matrix,
+ deps_idx_to_attnum,
+ deps_attnum_to_idx, relid);
+
+ /* join the two lists of clauses */
+ reduced_clauses = list_union(reduced_clauses, deps_clauses);
+
+ pfree(deps_matrix);
+ pfree(deps_idx_to_attnum);
+ pfree(deps_attnum_to_idx);
+
+ bms_free(deps_attnums);
+ bms_free(clause_attnums);
+ bms_free(intersect_attnums);
+
+ return reduced_clauses;
+}
+
+/*
+ * Check that there are stats with at least one of the requested types.
+ */
+static bool
+has_stats(List *stats, int type)
+{
+ ListCell *s;
+
+ foreach (s, stats)
+ {
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Look up stats for a given baserel.
+ */
+static List *
+find_stats(PlannerInfo *root, Index relid)
+{
+ Assert(root->simple_rel_array[relid] != NULL);
+
+ return root->simple_rel_array[relid]->mvstatlist;
+}
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
new file mode 100644
index 0000000..a38ea7b
--- /dev/null
+++ b/src/backend/utils/mvstats/README.stats
@@ -0,0 +1,36 @@
+Multivariate statistics
+=======================
+
+When estimating various quantities (e.g. condition selectivities) the default
+approach relies on the assumption of independence. In practice that's often
+not true, resulting in estimation errors.
+
+Multivariate stats track different types of dependencies between the columns,
+hopefully improving the estimates.
+
+Currently we only have one kind of multivariate statistics - soft functional
+dependencies, and we use it to improve estimates of equality clauses. See
+README.dependencies for details.
+
+
+Selectivity estimation
+----------------------
+
+When estimating selectivity, we aim to achieve several things:
+
+ (a) maximize the estimate accuracy
+
+ (b) minimize the overhead, especially when no suitable multivariate stats
+ exist (so if you are not using multivariate stats, there's no overhead)
+
+Thus clauselist_selectivity() performs several inexpensive checks first, before
+even attempting the more expensive estimation.
+
+ (1) check if there are multivariate stats on the relation
+
+ (2) check there are at least two attributes referenced by clauses compatible
+ with multivariate statistics (equality clauses for func. dependencies)
+
+ (3) perform reduction of equality clauses using func. dependencies
+
+ (4) estimate the reduced list of clauses using regular statistics
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index a755c49..bd200bc 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -84,7 +84,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
/*
* Analyze functional dependencies of columns.
*/
- deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (stat->deps_enabled)
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
/* store the histogram / MCV list in the catalog */
update_mv_stats(stat->mvoid, deps, attrs);
@@ -163,6 +164,7 @@ list_mv_stats(Oid relid)
info->mvoid = HeapTupleGetOid(htup);
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
result = lappend(result, info);
@@ -274,6 +276,7 @@ compare_scalars_partition(const void *a, const void *b, void *arg)
return ApplySortComparator(da, false, db, false, ssup);
}
+
/* initialize multi-dimensional sort */
MultiSortSupport
multi_sort_init(int ndims)
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
index 2a064a0..c80ba33 100644
--- a/src/backend/utils/mvstats/dependencies.c
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -435,3 +435,27 @@ pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(result));
}
+
+MVDependencies
+load_mv_dependencies(Oid mvoid)
+{
+ bool isnull = false;
+ Datum deps;
+
+ /* fetch the pg_mv_statistic tuple for this statistics OID */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->deps_enabled && mvstat->deps_built);
+#endif
+
+ deps = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stadeps, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_dependencies(DatumGetByteaP(deps));
+}
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 7ebd961..cc43a79 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -17,12 +17,20 @@
#include "fmgr.h"
#include "commands/vacuum.h"
+/*
+ * Degree of how much MCV item / histogram bucket matches a clause.
+ * This is then considered when computing the selectivity.
+ */
+#define MVSTATS_MATCH_NONE 0 /* no match at all */
+#define MVSTATS_MATCH_PARTIAL 1 /* partial match */
+#define MVSTATS_MATCH_FULL 2 /* full match */
#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
-/* An associative rule, tracking [a => b] dependency.
- *
- * TODO Make this work with multiple columns on both sides.
+
+/*
+ * Functional dependencies, tracking column-level relationships (values
+ * in one column determine values in another one).
*/
typedef struct MVDependencyData {
int16 a;
@@ -48,6 +56,8 @@ typedef MVDependenciesData* MVDependencies;
* stats specified using flags (or something like that).
*/
+MVDependencies load_mv_dependencies(Oid mvoid);
+
bytea * serialize_mv_dependencies(MVDependencies dependencies);
/* deserialization of stats (serialization is private to analyze) */
diff --git a/src/test/regress/expected/mv_dependencies.out b/src/test/regress/expected/mv_dependencies.out
new file mode 100644
index 0000000..e759997
--- /dev/null
+++ b/src/test/regress/expected/mv_dependencies.out
@@ -0,0 +1,172 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s1 ON functional_dependencies (unknown_column) WITH (dependencies);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s1 ON functional_dependencies (a) WITH (dependencies);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a,a) WITH (dependencies);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a, a, b) WITH (dependencies);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- correct command
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (dependencies);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+ QUERY PLAN
+---------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE functional_dependencies;
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s2 ON functional_dependencies (a, b, c) WITH (dependencies);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+DROP TABLE functional_dependencies;
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s3 ON functional_dependencies (a, b, c, d) WITH (dependencies);
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+----------------------------------------
+ t | t | 2 => 1, 3 => 1, 3 => 2, 4 => 1, 4 => 2
+(1 row)
+
+DROP TABLE functional_dependencies;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index bec0316..4f2ffb8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -110,3 +110,6 @@ test: event_trigger
# run stats by itself because its delay may be insufficient under heavy load
test: stats
+
+# run tests of multivariate stats
+test: mv_dependencies
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 7e9b319..097a04f 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -162,3 +162,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: mv_dependencies
diff --git a/src/test/regress/sql/mv_dependencies.sql b/src/test/regress/sql/mv_dependencies.sql
new file mode 100644
index 0000000..48dea4d
--- /dev/null
+++ b/src/test/regress/sql/mv_dependencies.sql
@@ -0,0 +1,150 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s1 ON functional_dependencies (unknown_column) WITH (dependencies);
+
+-- single column
+CREATE STATISTICS s1 ON functional_dependencies (a) WITH (dependencies);
+
+-- single column, duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a,a) WITH (dependencies);
+
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a, a, b) WITH (dependencies);
+
+-- unknown option
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (unknown_option);
+
+-- correct command
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (dependencies);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+
+DROP TABLE functional_dependencies;
+
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s2 ON functional_dependencies (a, b, c) WITH (dependencies);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+
+DROP TABLE functional_dependencies;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s3 ON functional_dependencies (a, b, c, d) WITH (dependencies);
+
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+DROP TABLE functional_dependencies;
--
2.1.0
Attachment: 0004-multivariate-MCV-lists.patch (text/x-patch)
From eea437d2d84469974efc8fbf2fddd926acbbd426 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 16:52:15 +0200
Subject: [PATCH 4/9] multivariate MCV lists
- extends the pg_mv_statistic catalog (add 'mcv' fields)
- building the MCV lists during ANALYZE
- simple estimation while planning the queries
Includes regression tests, mostly equal to regression tests for
functional dependencies.
---
doc/src/sgml/ref/create_statistics.sgml | 18 +
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/statscmds.c | 45 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 829 ++++++++++++++++++++++-
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/README.mcv | 137 ++++
src/backend/utils/mvstats/README.stats | 89 ++-
src/backend/utils/mvstats/common.c | 104 ++-
src/backend/utils/mvstats/common.h | 11 +-
src/backend/utils/mvstats/mcv.c | 1094 +++++++++++++++++++++++++++++++
src/bin/psql/describe.c | 25 +-
src/include/catalog/pg_mv_statistic.h | 18 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 69 +-
src/test/regress/expected/mv_mcv.out | 207 ++++++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_mcv.sql | 178 +++++
22 files changed, 2776 insertions(+), 73 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.mcv
create mode 100644 src/backend/utils/mvstats/mcv.c
create mode 100644 src/test/regress/expected/mv_mcv.out
create mode 100644 src/test/regress/sql/mv_mcv.sql
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index a86eae3..193e4b0 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -132,6 +132,24 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>max_mcv_items</> (<type>integer</>)</term>
+ <listitem>
+ <para>
+ Maximum number of MCV list items.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>mcv</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables MCV list for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect2>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b8a264e..2d570ee 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -165,7 +165,9 @@ CREATE VIEW pg_mv_stats AS
S.staname AS staname,
S.stakeys AS attnums,
length(S.stadeps) as depsbytes,
- pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
+ length(S.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 84a8b13..90bfaed 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -136,7 +136,13 @@ CreateStatistics(CreateStatsStmt *stmt)
ObjectAddress parentobject, childobject;
/* by default build nothing */
- bool build_dependencies = false;
+ bool build_dependencies = false,
+ build_mcv = false;
+
+ int32 max_mcv_items = -1;
+
+	/* set when an option requires 'mcv' to be enabled */
+ bool require_mcv = false;
Assert(IsA(stmt, CreateStatsStmt));
@@ -212,6 +218,29 @@ CreateStatistics(CreateStatsStmt *stmt)
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "mcv") == 0)
+ build_mcv = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_mcv_items") == 0)
+ {
+ max_mcv_items = defGetInt32(opt);
+
+ /* this option requires 'mcv' to be enabled */
+ require_mcv = true;
+
+ /* sanity check */
+ if (max_mcv_items < MVSTAT_MCVLIST_MIN_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items must be at least %d",
+ MVSTAT_MCVLIST_MIN_ITEMS)));
+
+ else if (max_mcv_items > MVSTAT_MCVLIST_MAX_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items is %d",
+ MVSTAT_MCVLIST_MAX_ITEMS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -220,10 +249,16 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! build_dependencies)
+ if (! (build_dependencies || build_mcv))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies, mcv) was requested")));
+
+ /* now do some checking of the options */
+ if (require_mcv && (! build_mcv))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies) was requested")));
+ errmsg("option 'mcv' is required by other options(s)")));
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
@@ -243,8 +278,12 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+ values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+
+ values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+ nulls[Anum_pg_mv_statistic_stamcv -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 07206d7..333e24b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2162,9 +2162,11 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
+ WRITE_BOOL_FIELD(mcv_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
+ WRITE_BOOL_FIELD(mcv_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 80708fe..977f88e 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -15,6 +15,7 @@
#include "postgres.h"
#include "access/sysattr.h"
+#include "catalog/pg_collation.h"
#include "catalog/pg_operator.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
@@ -47,23 +48,51 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
#define MV_CLAUSE_TYPE_FDEP 0x01
+#define MV_CLAUSE_TYPE_MCV 0x02
-static bool clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum);
+static bool clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums,
+ int type);
-static Bitmapset *collect_mv_attnums(List *clauses, Index relid);
+static Bitmapset *collect_mv_attnums(List *clauses, Index relid, int type);
-static int count_mv_attnums(List *clauses, Index relid);
+static int count_mv_attnums(List *clauses, Index relid, int type);
static int count_varnos(List *clauses, Index *relid);
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Index relid, List *stats);
+static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
+
+static List *clauselist_mv_split(PlannerInfo *root, Index relid,
+ List *clauses, List **mvclauses,
+ MVStatisticInfo *mvstats, int types);
+
+static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
+
+static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats,
+ bool *fullmatch, Selectivity *lowsel);
+
+static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, Index relid);
+/* used for merging bitmaps - AND (min), OR (max) */
+#define MAX(x, y) (((x) > (y)) ? (x) : (y))
+#define MIN(x, y) (((x) < (y)) ? (x) : (y))
+
+#define UPDATE_RESULT(m,r,isor) \
+ (m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -89,11 +118,13 @@ static List * find_stats(PlannerInfo *root, Index relid);
* to verify that suitable multivariate statistics exist.
*
* If we identify such multivariate statistics apply, we try to apply them.
- * Currently we only have (soft) functional dependencies, so we try to reduce
- * the list of clauses.
*
- * Then we remove the clauses estimated using multivariate stats, and process
- * the rest of the clauses using the regular per-column stats.
+ * First we try to reduce the list of clauses by applying (soft) functional
+ * dependencies, and then we try to estimate the selectivity of the reduced
+ * list of clauses using the multivariate MCV list.
+ *
+ * Finally we remove the portion of clauses estimated using multivariate stats,
+ * and process the rest of the clauses using the regular per-column stats.
*
* Currently, the only extra smarts we have is to recognize "range queries",
* such as "x > 34 AND x < 42". Clauses are recognized as possible range
@@ -170,12 +201,46 @@ clauselist_selectivity(PlannerInfo *root,
* that need to be estimated by other types of stats (MCV, histograms etc).
*/
if (has_stats(stats, MV_CLAUSE_TYPE_FDEP) &&
- (count_mv_attnums(clauses, relid) >= 2))
+ (count_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_FDEP) >= 2))
{
clauses = clauselist_apply_dependencies(root, clauses, relid, stats);
}
/*
+	 * Check that there are statistics with a MCV list or a histogram, and
+	 * count the attributes covered by these types of statistics.
+ *
+ * If there are no such stats or not enough attributes, don't waste time
+ * with the multivariate code and simply skip to estimation using the
+ * regular per-column stats.
+ */
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV) &&
+ (count_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV) >= 2))
+ {
+ /* collect attributes from the compatible conditions */
+ Bitmapset *mvattnums = collect_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV);
+
+ /* and search for the statistic covering the most attributes */
+ MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+
+		if (mvstat != NULL)	/* we have matching stats */
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ /* split the clauselist into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, relid, clauses, &mvclauses,
+ mvstat, MV_CLAUSE_TYPE_MCV);
+
+			/* we've chosen the statistics to match the clauses */
+ Assert(mvclauses != NIL);
+
+ /* compute the multivariate stats */
+ s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ }
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -832,6 +897,69 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * estimate selectivity of clauses using multivariate statistic
+ *
+ * Perform estimation of the clauses using a MCV list.
+ *
+ * This assumes all the clauses are compatible with the selected statistics
+ * (e.g. only reference columns covered by the statistics, use supported
+ * operator, etc.).
+ *
+ * TODO We may support some additional conditions, most importantly those
+ * matching multiple columns (e.g. "a = b" or "a < b").
+ *
+ * TODO Clamp the selectivity by the min of the per-clause selectivities (i.e.
+ *      the selectivity of the most restrictive clause), because that's the
+ *      maximum we can ever get from an ANDed list of clauses. This should help
+ *      prevent issues with hitting too many buckets in low-precision histograms.
+ *
+ * TODO We may remember the lowest frequency in the MCV list, and then later use
+ *      it as an upper boundary for the selectivity (had there been a more
+ *      frequent item, it'd be in the MCV list). This might improve cases with
+ *      low-detail histograms.
+ *
+ * TODO We may also derive some additional boundaries for the selectivity from
+ * the MCV list, because
+ *
+ * (a) if we have a "full equality condition" (one equality condition on
+ * each column of the statistic) and we found a match in the MCV list,
+ * then this is the final selectivity (and pretty accurate),
+ *
+ * (b) if we have a "full equality condition" and we haven't found a match
+ * in the MCV list, then the selectivity is below the lowest frequency
+ * found in the MCV list,
+ *
+ * TODO When applying the clauses to the histogram/MCV list, we can do
+ * that from the most selective clauses first, because that'll
+ * eliminate the buckets/items sooner (so we'll be able to skip
+ * them without inspection, which is more expensive). But this
+ * requires really knowing the per-clause selectivities in advance,
+ * and that's not what we do now.
+ */
+static Selectivity
+clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+{
+ bool fullmatch = false;
+
+ /*
+ * Lowest frequency in the MCV list (may be used as an upper bound
+ * for full equality conditions that did not match any MCV item).
+ */
+ Selectivity mcv_low = 0.0;
+
+ /* TODO Evaluate simple 1D selectivities, use the smallest one as
+ * an upper bound, product as lower bound, and sort the
+ * clauses in ascending order by selectivity (to optimize the
+ * MCV/histogram evaluation).
+ */
+
+ /* Evaluate the MCV selectivity */
+ return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ &fullmatch, &mcv_low);
+}
+
/*
* Pull varattnos from the clauses, similarly to pull_varattnos() but:
*
@@ -869,28 +997,26 @@ get_varattnos(Node * node, Index relid)
* Collect attributes from mv-compatible clauses.
*/
static Bitmapset *
-collect_mv_attnums(List *clauses, Index relid)
+collect_mv_attnums(List *clauses, Index relid, int types)
{
Bitmapset *attnums = NULL;
ListCell *l;
/*
- * Walk through the clauses and identify the ones we can estimate
- * using multivariate stats, and remember the relid/columns. We'll
- * then cross-check if we have suitable stats, and only if needed
- * we'll split the clauses into multivariate and regular lists.
+ * Walk through the clauses and identify the ones we can estimate using
+ * multivariate stats, and remember the relid/columns. We'll then
+ * cross-check if we have suitable stats, and only if needed we'll split
+ * the clauses into multivariate and regular lists.
*
- * For now we're only interested in RestrictInfo nodes with nested
- * OpExpr, using either a range or equality.
+ * For now we're only interested in RestrictInfo nodes with nested OpExpr,
+ * using either a range or equality.
*/
foreach (l, clauses)
{
- AttrNumber attnum;
Node *clause = (Node *) lfirst(l);
- /* ignore the result for now - we only need the info */
- if (clause_is_mv_compatible(clause, relid, &attnum))
- attnums = bms_add_member(attnums, attnum);
+ /* ignore the result here - we only need the attnums */
+ clause_is_mv_compatible(clause, relid, &attnums, types);
}
/*
@@ -911,10 +1037,10 @@ collect_mv_attnums(List *clauses, Index relid)
* Count the number of attributes in clauses compatible with multivariate stats.
*/
static int
-count_mv_attnums(List *clauses, Index relid)
+count_mv_attnums(List *clauses, Index relid, int type)
{
int c;
- Bitmapset *attnums = collect_mv_attnums(clauses, relid);
+ Bitmapset *attnums = collect_mv_attnums(clauses, relid, type);
c = bms_num_members(attnums);
@@ -944,9 +1070,183 @@ count_varnos(List *clauses, Index *relid)
return cnt;
}
+
+/*
+ * We're looking for statistics matching at least 2 attributes, referenced in
+ * clauses compatible with multivariate statistics. The current selection
+ * criterion is very simple - we choose the statistics referencing the most
+ * attributes.
+ *
+ * If there are multiple statistics referencing the same number of columns
+ * (from the clauses), the one with fewer source columns (as listed in ADD
+ * STATISTICS when creating the statistics) wins. Otherwise the first one wins.
+ *
+ * This is a very simple criterion, and it has several weaknesses:
+ *
+ * (a) does not consider the accuracy of the statistics
+ *
+ * If there are two histograms built on the same set of columns, but one
+ * has 100 buckets and the other one has 1000 buckets (thus likely
+ * providing better estimates), this is not currently considered.
+ *
+ * (b) does not consider the type of statistics
+ *
+ * If there are three statistics - one containing just a MCV list, another
+ * one with just a histogram and a third one with both, we treat them equally.
+ *
+ * (c) does not consider the number of clauses
+ *
+ * As explained, only the number of referenced attributes counts, so if
+ * there are multiple clauses on a single attribute, this still counts as
+ * a single attribute.
+ *
+ * (d) does not consider type of condition
+ *
+ * Some clauses may work better with some statistics - for example equality
+ * clauses probably work better with MCV lists than with histograms. But
+ * IS [NOT] NULL conditions may often work better with histograms (thanks
+ * to NULL-buckets).
+ *
+ * So for example with five WHERE conditions
+ *
+ * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
+ *
+ * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be selected
+ * as it references the most columns.
+ *
+ * Once we have selected the multivariate statistics, we split the list of
+ * clauses into two parts - conditions that are compatible with the selected
+ * stats, and conditions that will be estimated using simple statistics.
+ *
+ * From the example above, conditions
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
+ *
+ * will be estimated using the multivariate statistics (a,b,c,d) while the last
+ * condition (e = 1) will get estimated using the regular ones.
+ *
+ * There are various alternative selection criteria (e.g. counting conditions
+ * instead of just referenced attributes), but eventually the best option should
+ * be to combine multiple statistics. But that's much harder to do correctly.
+ *
+ * TODO Select multiple statistics and combine them when computing the estimate.
+ *
+ * TODO This will probably have to consider compatibility of clauses, because
+ * 'dependencies' will probably work only with equality clauses.
+ */
+static MVStatisticInfo *
+choose_mv_statistics(List *stats, Bitmapset *attnums)
+{
+ int i;
+ ListCell *lc;
+
+ MVStatisticInfo *choice = NULL;
+
+ int current_matches = 1; /* goal #1: maximize */
+ int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+
+ /*
+ * Walk through the statistics (simple array with nmvstats elements) and for
+ * each one count the referenced attributes (encoded in the 'attnums' bitmap).
+ */
+ foreach (lc, stats)
+ {
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ /* columns matching this statistics */
+ int matches = 0;
+
+ int2vector * attrs = info->stakeys;
+ int numattrs = attrs->dim1;
+
+ /* skip dependencies-only stats */
+ if (! info->mcv_built)
+ continue;
+
+		/* count columns covered by the statistics */
+ for (i = 0; i < numattrs; i++)
+ if (bms_is_member(attrs->values[i], attnums))
+ matches++;
+
+ /*
+ * Use this statistics when it improves the number of matches or
+ * when it matches the same number of attributes but is smaller.
+ */
+ if ((matches > current_matches) ||
+ ((matches == current_matches) && (current_dims > numattrs)))
+ {
+ choice = info;
+ current_matches = matches;
+ current_dims = numattrs;
+ }
+ }
+
+ return choice;
+}
+
+
+/*
+ * This splits the clauses list into two parts - one containing clauses that
+ * will be evaluated using the chosen statistics, and the remaining clauses
+ * (either not mv-compatible, or not covered by the chosen statistics).
+ */
+static List *
+clauselist_mv_split(PlannerInfo *root, Index relid,
+ List *clauses, List **mvclauses,
+ MVStatisticInfo *mvstats, int types)
+{
+ int i;
+ ListCell *l;
+ List *non_mvclauses = NIL;
+
+ /* FIXME is there a better way to get info on int2vector? */
+ int2vector * attrs = mvstats->stakeys;
+ int numattrs = mvstats->stakeys->dim1;
+
+ Bitmapset *mvattnums = NULL;
+
+ /* build bitmap of attributes, so we can do bms_is_subset later */
+ for (i = 0; i < numattrs; i++)
+ mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+
+ /* erase the list of mv-compatible clauses */
+ *mvclauses = NIL;
+
+ foreach (l, clauses)
+ {
+ bool match = false; /* by default not mv-compatible */
+ Bitmapset *attnums = NULL;
+ Node *clause = (Node *) lfirst(l);
+
+ if (clause_is_mv_compatible(clause, relid, &attnums, types))
+ {
+ /* are all the attributes part of the selected stats? */
+ if (bms_is_subset(attnums, mvattnums))
+ match = true;
+ }
+
+ /*
+ * The clause matches the selected stats, so put it to the list of
+ * mv-compatible clauses. Otherwise, keep it in the list of 'regular'
+ * clauses (that may be selected later).
+ */
+ if (match)
+ *mvclauses = lappend(*mvclauses, clause);
+ else
+ non_mvclauses = lappend(non_mvclauses, clause);
+ }
+
+	/*
+	 * The remaining clauses will be estimated using the regular per-column
+	 * statistics.
+	 */
+ return non_mvclauses;
+
+}
typedef struct
{
+	int types;			/* types of statistics to consider */
Index varno; /* relid we're interested in */
Bitmapset *varattnos; /* attnums referenced by the clauses */
} mv_compatible_context;
@@ -964,23 +1264,66 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
{
if (node == NULL)
return false;
-
+
if (IsA(node, RestrictInfo))
{
RestrictInfo *rinfo = (RestrictInfo *) node;
-
+
/* Pseudoconstants are not really interesting here. */
if (rinfo->pseudoconstant)
return true;
-
+
/* clauses referencing multiple varnos are incompatible */
if (bms_membership(rinfo->clause_relids) != BMS_SINGLETON)
return true;
-
+
/* check the clause inside the RestrictInfo */
return mv_compatible_walker((Node*)rinfo->clause, (void *) context);
}
+ if (or_clause(node) || and_clause(node) || not_clause(node))
+ {
+ /*
+ * AND/OR/NOT-clauses are supported if all sub-clauses are supported
+ *
+ * TODO We might support mixed case, where some of the clauses are
+ * supported and some are not, and treat all supported subclauses
+	 *		as a single clause, compute its selectivity using mv stats,
+ * and compute the total selectivity using the current algorithm.
+ *
+ * TODO For RestrictInfo above an OR-clause, we might use the orclause
+ * with nested RestrictInfo - we won't have to call pull_varnos()
+ * for each clause, saving time.
+ *
+ * TODO Perhaps this needs a bit more thought for functional
+ * dependencies? Those don't quite work for NOT cases.
+ */
+ BoolExpr *expr = (BoolExpr *) node;
+ ListCell *lc;
+
+ foreach (lc, expr->args)
+ {
+ if (mv_compatible_walker((Node *) lfirst(lc), context))
+ return true;
+ }
+
+ return false;
+ }
+
+ if (IsA(node, NullTest))
+ {
+ NullTest* nt = (NullTest*)node;
+
+ /*
+ * Only simple (Var IS NULL) expressions supported for now. Maybe we could
+ * use examine_variable to fix this?
+ */
+ if (! IsA(nt->arg, Var))
+ return true;
+
+ return mv_compatible_walker((Node*)(nt->arg), context);
+ }
+
if (IsA(node, Var))
{
Var * var = (Var*)node;
@@ -1031,7 +1374,7 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
/* unsupported structure (two variables or so) */
if (! ok)
return true;
-
+
/*
* If it's not a "<" or ">" or "=" operator, just ignore the clause.
* Otherwise note the relid and attnum for the variable. This uses the
@@ -1041,10 +1384,18 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
switch (get_oprrest(expr->opno))
{
case F_EQSEL:
-
/* equality conditions are compatible with all statistics */
break;
+ case F_SCALARLTSEL:
+ case F_SCALARGTSEL:
+
+ /* not compatible with functional dependencies */
+ if (! (context->types & MV_CLAUSE_TYPE_MCV))
+ return true; /* terminate */
+
+ break;
+
default:
/* unknown estimator */
@@ -1055,11 +1406,11 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
return mv_compatible_walker((Node *) var, context);
}
-
+
/* Node not explicitly supported, so terminate */
return true;
}
-
+
/*
* Determines whether the clause is compatible with multivariate stats,
* and if it is, returns some additional information - varno (index
@@ -1078,10 +1429,11 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
* evaluate them using multivariate stats.
*/
static bool
-clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum)
+clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums, int types)
{
mv_compatible_context context;
+ context.types = types;
context.varno = relid;
context.varattnos = NULL; /* no attnums */
@@ -1089,7 +1441,7 @@ clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum)
return false;
/* remember the newly collected attnums */
- *attnum = bms_singleton_member(context.varattnos);
+ *attnums = bms_add_members(*attnums, context.varattnos);
return true;
}
@@ -1394,24 +1746,39 @@ fdeps_filter_clauses(PlannerInfo *root,
foreach (lc, clauses)
{
- AttrNumber attnum;
+ Bitmapset *attnums = NULL;
Node *clause = (Node *) lfirst(lc);
- if (! clause_is_mv_compatible(clause, relid, &attnum))
+ if (! clause_is_mv_compatible(clause, relid, &attnums,
+ MV_CLAUSE_TYPE_FDEP))
/* clause incompatible with functional dependencies */
*reduced_clauses = lappend(*reduced_clauses, clause);
- else if (! bms_is_member(attnum, deps_attnums))
+ else if (bms_num_members(attnums) > 1)
+
+ /*
+			 * clause referencing multiple attributes (strange - shouldn't
+			 * this be handled by clause_is_mv_compatible directly?)
+ */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else if (! bms_is_member(bms_singleton_member(attnums), deps_attnums))
/* clause not covered by the dependencies */
*reduced_clauses = lappend(*reduced_clauses, clause);
else
{
+ /* ok, clause compatible with existing dependencies */
+ Assert(bms_num_members(attnums) == 1);
+
*deps_clauses = lappend(*deps_clauses, clause);
- clause_attnums = bms_add_member(clause_attnums, attnum);
+ clause_attnums = bms_add_member(clause_attnums,
+ bms_singleton_member(attnums));
}
+
+ bms_free(attnums);
}
return clause_attnums;
@@ -1637,6 +2004,9 @@ has_stats(List *stats, int type)
if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
return true;
+
+ if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
+ return true;
}
return false;
@@ -1652,3 +2022,392 @@ find_stats(PlannerInfo *root, Index relid)
return root->simple_rel_array[relid]->mvstatlist;
}
+
+/*
+ * Estimate selectivity of clauses using a MCV list.
+ *
+ * If there's no MCV list for the stats, the function returns 0.0.
+ *
+ * While computing the estimate, the function checks whether all the
+ * columns were matched with an equality condition. If that's the case,
+ * we can skip processing the histogram, as there can be no rows in
+ * it with the same values - all the rows matching the condition are
+ * represented by the MCV item. This can only happen with equality
+ * on all the attributes.
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all items as 'match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the items
+ * 4) skip items that are already 'no match'
+ * 5) check clause for items that still match
+ * 6) sum frequencies for items to get selectivity
+ *
+ * The function also returns the frequency of the least frequent item
+ * on the MCV list, which may be useful for clamping estimate from the
+ * histogram (all items not present in the MCV list are less frequent).
+ * This however seems useful only for cases with conditions on all
+ * attributes.
+ *
+ * TODO This only handles AND-ed clauses, but it might work for OR-ed
+ * lists too - it just needs to reverse the logic a bit. I.e. start
+ * with 'no match' for all items, and mark the items as a match
+ * as the clauses are processed (and skip items that are 'match').
+ */
+static Selectivity
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats, bool *fullmatch,
+ Selectivity *lowsel)
+{
+ int i;
+ Selectivity s = 0.0;
+ Selectivity u = 0.0;
+
+ MCVList mcvlist = NULL;
+ int nmatches = 0;
+
+ /* match/mismatch bitmap for each MCV item */
+ char * matches = NULL;
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 2);
+
+ /* there's no MCV list built yet */
+ if (! mvstats->mcv_built)
+ return 0.0;
+
+ mcvlist = load_mv_mcvlist(mvstats->mvoid);
+
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* by default all the MCV items match the clauses fully */
+ matches = palloc0(sizeof(char) * mcvlist->nitems);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
+
+ /* number of matching MCV items */
+ nmatches = mcvlist->nitems;
+
+ nmatches = update_match_bitmap_mcvlist(root, clauses,
+ mvstats->stakeys, mcvlist,
+ nmatches, matches,
+ lowsel, fullmatch, false);
+
+ /* sum frequencies for all the matching MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* used to 'scale' for MCV lists not covering all tuples */
+ u += mcvlist->items[i]->frequency;
+
+ if (matches[i] != MVSTATS_MATCH_NONE)
+ s += mcvlist->items[i]->frequency;
+ }
+
+ pfree(matches);
+ pfree(mcvlist);
+
+ return s*u;
+}
+
+/*
+ * Evaluate clauses using the MCV list, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * TODO This works with 'bitmap' where each bit is represented as a char,
+ * which is slightly wasteful. Instead, we could use a regular
+ * bitmap, reducing the size to ~1/8. Another thing is merging the
+ * bitmaps using & and |, which might be faster than min/max.
+ */
+static int
+update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ Bitmapset *eqmatches = NULL; /* attributes with equality matches */
+
+ /* The bitmap may be partially built. */
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mcvlist->nitems);
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+	/* no more matches possible (AND), or all items already matching (OR) */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ return nmatches;
+
+ /*
+ * find the lowest frequency in the MCV list
+ *
+ * We need to do that here, because we do various tricks in the following
+ * code - skipping items already ruled out, etc.
+ *
+ * XXX A loop is necessary because the MCV list is not sorted by frequency.
+ */
+ *lowsel = 1.0;
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = mcvlist->items[i];
+
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+ }
+
+ /*
+ * Loop through the list of clauses, and for each of them evaluate
+ * all the MCV items not yet eliminated by the preceding clauses.
+ */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* if there are no remaining matches possible, we can stop */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /* it's either OpClause, or NullTest */
+ if (is_opclause(clause))
+ {
+ OpExpr *expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+ FmgrInfo opproc;
+
+ /* get procedure computing operator selectivity */
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+
+ FmgrInfo gtproc;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_GT_OPR);
+
+				/* FIXME properly match the attribute to the dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+				fmgr_info(get_opcode(typecache->gt_opr), &gtproc);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ bool mismatch = false;
+ MCVItem item = mcvlist->items[i];
+
+ /*
+ * If there are no more matches (AND) or no remaining unmatched
+ * items (OR), we can stop processing this clause.
+ */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /*
+ * For AND-lists, we can also mark NULL items as 'no match' (and
+ * then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && item->isnull[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /* skip MCV items that were already ruled out */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ switch (oprrest)
+ {
+ case F_EQSEL:
+ /*
+ * We don't care about isgt in equality, because it does not
+ * matter whether it's (var = const) or (const = var).
+ */
+ mismatch = ! DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ if (! mismatch)
+ eqmatches = bms_add_member(eqmatches, idx);
+
+ break;
+
+ case F_SCALARLTSEL: /* column < constant */
+ case F_SCALARGTSEL: /* column > constant */
+
+					/*
+					 * Check whether the constant is below the MCV item's
+					 * value (in that case the item cannot match this clause).
+					 */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ /* invert the result if isgt=true */
+ mismatch = (isgt) ? (! mismatch) : mismatch;
+ break;
+ }
+
+ /* XXX The conditions on matches[i] are not needed, as we
+ * skip MCV items that can't become true/false, depending
+ * on the current flag. See beginning of the loop over
+ * MCV items.
+ */
+
+ if ((is_or) && (matches[i] == MVSTATS_MATCH_NONE) && (! mismatch))
+ {
+ /* OR - was MATCH_NONE, but will be MATCH_FULL */
+ matches[i] = MVSTATS_MATCH_FULL;
+ ++nmatches;
+ continue;
+ }
+ else if ((! is_or) && (matches[i] == MVSTATS_MATCH_FULL) && mismatch)
+ {
+					/* AND - was MATCH_FULL, but will be MATCH_NONE */
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ continue;
+ }
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+			/* FIXME properly match the attribute to the dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = mcvlist->items[i];
+
+ /* if there are no more matches, we can stop processing this clause */
+ if (nmatches == 0)
+ break;
+
+ /* skip MCV items that were already ruled out */
+ if (matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /* if the clause mismatches the MCV item, set it as MATCH_NONE */
+ if (((expr->nulltesttype == IS_NULL) && (! item->isnull[idx])) ||
+ ((expr->nulltesttype == IS_NOT_NULL) && (item->isnull[idx])))
+ {
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ }
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each MCV item */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching MCV items */
+ or_nmatches = mcvlist->nitems;
+
+			/* match bitmap, initialized below depending on AND/OR semantics */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+				/* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the OR-clauses */
+ or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ stakeys, mcvlist,
+ or_nmatches, or_matches,
+ lowsel, fullmatch, or_clause(clause));
+
+			/* merge the bitmap into the existing one */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ {
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+ }
+
+ /*
+	 * If all the columns were matched by equality, it's a full match.
+	 * In this case there can be at most a single matching MCV item
+	 * (two such items would have to have exactly the same values).
+ */
+ *fullmatch = (bms_num_members(eqmatches) == mcvlist->ndimensions);
+
+ /* free the allocated pieces */
+ if (eqmatches)
+ pfree(eqmatches);
+
+ return nmatches;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 31939dd..d807dc7 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -416,7 +416,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built)
+ if (mvstat->deps_built || mvstat->mcv_built)
{
info = makeNode(MVStatisticInfo);
@@ -425,9 +425,11 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
+ info->mcv_enabled = mvstat->mcv_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
+ info->mcv_built = mvstat->mcv_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 099f1ed..f9bf10c 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o
+OBJS = common.o dependencies.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.mcv b/src/backend/utils/mvstats/README.mcv
new file mode 100644
index 0000000..e93cfe4
--- /dev/null
+++ b/src/backend/utils/mvstats/README.mcv
@@ -0,0 +1,137 @@
+MCV lists
+=========
+
+Multivariate MCV (most-common values) lists are a straightforward extension of
+the regular per-column MCV lists, tracking the most frequent combinations of
+values for a group of attributes.
+
+This works particularly well for columns with a small number of distinct values,
+as the list may include all the combinations and approximate the distribution
+very accurately.
+
+For columns with a large number of distinct values (e.g. those with continuous
+domains), the list will only track the most frequent combinations. If the
+distribution is mostly uniform (all combinations about equally frequent), the
+MCV list will be empty.
+
+Estimates of some clauses (e.g. equality) based on MCV lists are more accurate
+than when using histograms.
+
+Also, MCV lists don't necessarily require sorting of the values (the fact that
+we use sorting when building them is an implementation detail), but even more
+importantly the ordering is not built into the approximation (while histograms
+are built on ordering). So MCV lists work well even for attributes where the
+ordering of the data type is disconnected from the meaning of the data. For
+example we know how to sort strings, but it's unlikely to make much sense for
+city names (or other label-like attributes).
+
+
+Selectivity estimation
+----------------------
+
+The estimation, implemented in clauselist_mv_selectivity_mcvlist(), is quite
+simple in principle - we need to identify MCV items matching all the clauses
+and sum frequencies of all those items.
+
+Currently MCV lists support estimation of the following clause types:
+
+ (a) equality clauses WHERE (a = 1) AND (b = 2)
+ (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ (d) OR clauses WHERE (a < 1) OR (b >= 2)
+
+It's possible to add support for additional clauses, for example:
+
+ (e) multi-var clauses WHERE (a > b)
+
+and possibly others. These are tasks for the future, not yet implemented.
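+
+For example, all of these clause types might be estimated from a single MCV
+list on (a, b) - a sketch, with a hypothetical table t:
+
+    SELECT * FROM t WHERE (a = 1) AND (b < 10);
+
+    SELECT * FROM t WHERE (a IS NULL) OR (b >= 2);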
+
+
+Estimating equality clauses
+---------------------------
+
+When computing selectivity estimate for equality clauses
+
+ (a = 1) AND (b = 2)
+
+we can do this estimate pretty exactly assuming that two conditions are met:
+
+ (1) there's an equality condition on all attributes of the statistic
+
+ (2) we find a matching item in the MCV list
+
+In this case we know the MCV item represents all tuples matching the clauses,
+and the selectivity estimate is complete (i.e. we don't need to perform
+estimation using the histogram). This is what we call 'full match'.
+
+When only (1) holds, but there's no matching MCV item, we don't know whether
+there are no such rows, or whether they are just not very frequent. We can
+however use the frequency of the least frequent MCV item as an upper bound for
+the selectivity.
+
+For a combination of equality conditions (not full-match case) we can clamp the
+selectivity by the minimum of selectivities for each condition. For example if
+we know the number of distinct values for each column, we can use 1/ndistinct
+as a per-column estimate. Or rather 1/ndistinct + selectivity derived from the
+MCV list.
+
+We should also probably only use the 'residual ndistinct' by excluding the items
+included in the MCV list (and also the residual frequency):
+
+ f = (1.0 - sum(MCV frequencies)) / (ndistinct - ndistinct(MCV list))
+
+but it's worth pointing out the ndistinct values are multi-variate for the
+columns referenced by the equality conditions.
+
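+For illustration, with made-up numbers: if the MCV list covers 80% of the rows
+(sum of frequencies 0.8) and 100 out of 1100 distinct combinations, then
+
+    f = (1.0 - 0.8) / (1100 - 100) = 0.0002
+
+i.e. each combination not on the list would be estimated as 0.02% of the table.
+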
+Note: Only the "full match" limit is currently implemented.
+
+
+Hashed MCV (not yet implemented)
+--------------------------------
+
+Regular MCV lists have to include actual values for each item, so if those items
+are large the list may be quite large. This is especially true for multi-variate
+MCV lists, although the current implementation partially mitigates this by
+de-duplicating the values before storing them on disk.
+
+It's possible to only store hashes (32-bit values) instead of the actual values,
+significantly reducing the space requirements. Obviously, this would only make
+the MCV lists useful for estimating equality conditions (assuming the 32-bit
+hashes make the collisions rare enough).
+
+This might also complicate matching the columns to available stats.
+
+
+TODO Consider implementing hashed MCV list, storing just 32-bit hashes instead
+ of the actual values. This type of MCV list will be useful only for
+ estimating equality clauses, and will reduce space requirements for large
+ varlena types (in such cases we usually only want equality anyway).
+
+TODO Currently there's no logic to consider building only a MCV list (and not
+ building the histogram at all), except for doing this decision manually in
+ ADD STATISTICS.
+
+
+Inspecting the MCV list
+-----------------------
+
+Inspecting the regular (per-attribute) MCV lists is trivial, as it's enough
+to select the columns from pg_stats - the data is encoded as anyarrays, so we
+simply get the text representation of the arrays.
+
+With multivariate MCV lists it's not that simple due to the possible mix of
+data types. It might be possible to produce a similar array-like representation,
+but that'd unnecessarily complicate further processing and analysis of the MCV
+list. Instead, there's an SRF function providing the values, frequencies etc.
+
+ SELECT * FROM pg_mv_mcv_items();
+
+It has a single input parameter:
+
+ oid - OID of the MCV list (pg_mv_statistic.staoid)
+
+and produces a table with these columns:
+
+ - item ID (0...nitems-1)
+ - values (string array)
+ - nulls only (boolean array)
+ - frequency (double precision)
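+
+For example (a sketch - this assumes the statistics OID is available as the
+oid column of the pg_mv_statistic catalog):
+
+    SELECT * FROM pg_mv_mcv_items(
+        (SELECT oid FROM pg_mv_statistic
+          WHERE starelid = 'some_table'::regclass));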
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index a38ea7b..5c5c59a 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -8,9 +8,50 @@ not true, resulting in estimation errors.
Multivariate stats track different types of dependencies between the columns,
hopefully improving the estimates.
-Currently we only have one kind of multivariate statistics - soft functional
-dependencies, and we use it to improve estimates of equality clauses. See
-README.dependencies for details.
+
+Types of statistics
+-------------------
+
+Currently we only have two kinds of multivariate statistics
+
+ (a) soft functional dependencies (README.dependencies)
+
+ (b) MCV lists (README.mcv)
+
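+Both kinds may be requested in a single definition - a sketch, with
+hypothetical table/statistics names (the MCV list size may be tuned using the
+max_mcv_items option):
+
+    CREATE STATISTICS s ON t (a, b) WITH (dependencies, mcv);
+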
+
+Compatible clause types
+-----------------------
+
+Each type of statistics may be used to estimate some subset of clause types.
+
+ (a) functional dependencies - equality clauses (AND), possibly IS NULL
+
+ (b) MCV list - equality and inequality clauses, IS [NOT] NULL, AND/OR
+
+Currently only simple operator clauses (Var op Const) are supported, but it's
+possible to support more complex clause types, e.g. (Var op Var).
+
+
+Complex clauses
+---------------
+
+We also support estimating more complex clauses - essentially AND/OR clauses
+with (Var op Const) as leaves, as long as all the referenced attributes are
+covered by a single statistics.
+
+For example this condition
+
+ (a=1) AND ((b=2) OR ((c=3) AND (d=4)))
+
+may be estimated using statistics on (a,b,c,d). If we only have statistics on
+(b,c,d) we may estimate the second part, and estimate (a=1) using simple stats.
+
+If we only have statistics on (a,b,c) we can't apply it at all at this point,
+but it's worth pointing out clauselist_selectivity() works recursively and when
+handling the second part (the OR-clause), we'll be able to apply the statistics.
+
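+A sketch of defining statistics suitable for the example above (hypothetical
+table/statistics names):
+
+    CREATE STATISTICS s ON t (a, b, c, d) WITH (mcv);
+    ANALYZE t;
+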
+Note: The multi-statistics estimation patch also makes it possible to pass some
+clauses as 'conditions' into the deeper parts of the expression tree.
Selectivity estimation
@@ -23,14 +64,48 @@ When estimating selectivity, we aim to achieve several things:
(b) minimize the overhead, especially when no suitable multivariate stats
exist (so if you are not using multivariate stats, there's no overhead)
-This clauselist_selectivity() performs several inexpensive checks first, before
+Thus clauselist_selectivity() performs several inexpensive checks first, before
even attempting to do the more expensive estimation.
(1) check if there are multivariate stats on the relation
- (2) check there are at least two attributes referenced by clauses compatible
- with multivariate statistics (equality clauses for func. dependencies)
+ (2) check that there are functional dependencies on the table, and that
+ there are at least two attributes referenced by compatible clauses
+ (equality clauses for func. dependencies)
(3) perform reduction of equality clauses using func. dependencies
- (4) estimate the reduced list of clauses using regular statistics
+ (4) check that there are multivariate MCV lists on the table, and that
+ there are at least two attributes referenced by compatible clauses
+ (equalities, inequalities, etc.)
+
+ (5) find the best multivariate statistics (matching the most conditions)
+ and use it to compute the estimate
+
+ (6) estimate the remaining clauses (not estimated using multivariate stats)
+ using the regular per-column statistics
+
+Whenever we find there are no suitable stats, we skip the expensive steps.
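+
+A minimal pseudocode sketch of that control flow (illustrative only - the
+helper names here are made up, not actual functions from the patch):
+
+    clauselist_selectivity(clauses)
+    {
+        if (!has_mv_stats(rel))                        /* step (1) */
+            return estimate_per_column(clauses);
+
+        clauses = reduce_with_dependencies(clauses);   /* steps (2), (3) */
+
+        stats = choose_best_mcv_stats(rel, clauses);   /* steps (4), (5) */
+        if (stats == NULL)
+            return estimate_per_column(clauses);
+
+        s = estimate_with_mcv(stats, clauses, &remaining);
+        return s * estimate_per_column(remaining);     /* step (6) */
+    }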
+
+
+Further (possibly crazy) ideas
+------------------------------
+
+Currently the clauses are only estimated using a single statistics, even if
+there are multiple candidate statistics - for example assume we have statistics
+on (a,b,c) and (b,c,d), and estimate conditions
+
+ (b = 1) AND (c = 2)
+
+Then both statistics may be used, but we only use one of them. Maybe we could
+compute estimates using all the candidate stats, and somehow aggregate them
+into the final estimate, e.g. by using the average or median.
+
+Some stats may give better estimates than others, but it's very difficult to say
+in advance which stats are the best (it depends on the number of buckets, number
+of additional columns not referenced in the clauses, type of condition etc.).
+
+But of course, this may result in expensive estimation (CPU-wise).
+
+So we might add a GUC to choose between the simple (single statistics) and
+the multi-statistic estimation, possibly with a table-level parameter
+(ALTER TABLE ...).
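+
+A purely illustrative sketch of the median variant (no such code exists in
+the patch; compare_selectivity is a made-up comparator):
+
+    /* 'estimates' holds selectivities computed from each candidate stats */
+    qsort(estimates, nestimates, sizeof(Selectivity), compare_selectivity);
+    result = estimates[nestimates / 2];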
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index bd200bc..d1da714 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -16,12 +16,14 @@
#include "common.h"
+#include "utils/array.h"
+
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
- int natts, VacAttrStats **vacattrstats);
+ int natts,
+ VacAttrStats **vacattrstats);
static List* list_mv_stats(Oid relid);
-
/*
* Compute requested multivariate stats, using the rows sampled for the
* plain (single-column) stats.
@@ -49,6 +51,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int j;
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
+ MCVList mcvlist = NULL;
+ int numrows_filtered = 0;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -87,8 +91,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ /* build the MCV list */
+ if (stat->mcv_enabled)
+ mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, attrs);
+ update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
}
}
@@ -166,6 +174,8 @@ list_mv_stats(Oid relid)
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
+ info->mcv_enabled = stats->mcv_enabled;
+ info->mcv_built = stats->mcv_built;
result = lappend(result, info);
}
@@ -180,8 +190,56 @@ list_mv_stats(Oid relid)
return result;
}
+
+/*
+ * Find attnums of MV stats using the mvoid.
+ */
+int2vector*
+find_mv_attnums(Oid mvoid, Oid *relid)
+{
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ HeapTuple htup;
+ int2vector *keys;
+
+ /* Fetch the pg_mv_statistic tuple for the given OID. */
+ htup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+ /* starelid */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_starelid, &isnull);
+ Assert(!isnull);
+
+ *relid = DatumGetObjectId(adatum);
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ keys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+ ReleaseSysCache(htup);
+
+ /* TODO Maybe save the list into relcache, as in RelationGetIndexList
+ * (which served as inspiration for this function)? */
+
+ return keys;
+}
+
+
void
-update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+update_mv_stats(Oid mvoid,
+ MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -206,18 +264,29 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
= PointerGetDatum(serialize_mv_dependencies(dependencies));
}
+ if (mcvlist != NULL)
+ {
+ bytea * data = serialize_mv_mcvlist(mcvlist, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stamcv -1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+ replaces[Anum_pg_mv_statistic_stamcv -1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
@@ -246,6 +315,21 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
heap_close(sd, RowExclusiveLock);
}
+
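+/*
+ * mv_get_index
+ *		Returns the index of a column (dimension) within the stats.
+ *
+ * This assumes the stakeys vector is sorted, so the index is simply
+ * the number of keys smaller than varattno.
+ */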
+int
+mv_get_index(AttrNumber varattno, int2vector * stakeys)
+{
+ int i, idx = 0;
+ for (i = 0; i < stakeys->dim1; i++)
+ {
+ if (stakeys->values[i] < varattno)
+ idx += 1;
+ else
+ break;
+ }
+ return idx;
+}
+
/* multi-variate stats comparator */
/*
@@ -256,11 +340,15 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
int
compare_scalars_simple(const void *a, const void *b, void *arg)
{
- Datum da = *(Datum*)a;
- Datum db = *(Datum*)b;
- SortSupport ssup= (SortSupport) arg;
+ return compare_datums_simple(*(Datum*)a,
+ *(Datum*)b,
+ (SortSupport)arg);
+}
- return ApplySortComparator(da, false, db, false, ssup);
+int
+compare_datums_simple(Datum a, Datum b, SortSupport ssup)
+{
+ return ApplySortComparator(a, false, b, false, ssup);
}
/*
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
index 6d5465b..f4309f7 100644
--- a/src/backend/utils/mvstats/common.h
+++ b/src/backend/utils/mvstats/common.h
@@ -46,7 +46,15 @@ typedef struct
Datum value; /* a data value */
int tupno; /* position index for tuple it came from */
} ScalarItem;
-
+
+/* (de)serialization info */
+typedef struct DimensionInfo {
+ int nvalues; /* number of deduplicated values */
+ int nbytes; /* number of bytes (serialized) */
+ int typlen; /* pg_type.typlen */
+ bool typbyval; /* pg_type.typbyval */
+} DimensionInfo;
+
/* multi-sort */
typedef struct MultiSortSupportData {
int ndims; /* number of dimensions supported by the */
@@ -71,5 +79,6 @@ int multi_sort_compare_dim(int dim, const SortItem *a,
const SortItem *b, MultiSortSupport mss);
/* comparators, used when constructing multivariate stats */
+int compare_datums_simple(Datum a, Datum b, SortSupport ssup);
int compare_scalars_simple(const void *a, const void *b, void *arg);
int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/mcv.c b/src/backend/utils/mvstats/mcv.c
new file mode 100644
index 0000000..551c934
--- /dev/null
+++ b/src/backend/utils/mvstats/mcv.c
@@ -0,0 +1,1094 @@
+/*-------------------------------------------------------------------------
+ *
+ * mcv.c
+ * POSTGRES multivariate MCV lists
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mcv.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+
+/*
+ * Each serialized item needs to store (in this order):
+ *
+ * - indexes (ndim * sizeof(uint16))
+ * - null flags (ndim * sizeof(bool))
+ * - frequency (sizeof(double))
+ *
+ * So in total:
+ *
+ * ndim * (sizeof(uint16) + sizeof(bool)) + sizeof(double)
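+ *
+ * e.g. for ndim = 3 that is 3 * (2 + 1) + 8 = 17 bytes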
+ */
+#define ITEM_SIZE(ndims) \
+ (ndims * (sizeof(uint16) + sizeof(bool)) + sizeof(double))
+
+/* pointers into a flat serialized item of ITEM_SIZE(n) bytes */
+#define ITEM_INDEXES(item) ((uint16*)item)
+#define ITEM_NULLS(item,ndims) ((bool*)(ITEM_INDEXES(item) + ndims))
+#define ITEM_FREQUENCY(item,ndims) ((double*)(ITEM_NULLS(item,ndims) + ndims))
+
+/*
+ * Builds MCV list from sample rows, and removes rows represented by
+ * the MCV list from the sample (the number of remaining sample rows is
+ * returned by the numrows_filtered parameter).
+ *
+ * The method is quite simple - in short it does about these steps:
+ *
+ * (1) sort the data (default collation, '<' for the data type)
+ *
+ * (2) count distinct groups, decide how many to keep
+ *
+ * (3) build the MCV list using the threshold determined in (2)
+ *
+ * (4) remove rows represented by the MCV from the sample
+ *
+ * For more details, see the comments in the code.
+ *
+ * FIXME Use max_mcv_items from ALTER TABLE ADD STATISTICS command.
+ *
+ * FIXME Single-dimensional MCV is sorted by frequency (descending). We
+ * should do that too, because when walking through the list we
+ * want to check the most frequent items first.
+ *
+ * TODO We're using Datum (8B) even for narrower data types (e.g. int4
+ * or float4). Maybe we could save some space here, but the bytea
+ * compression should handle it just fine.
+ *
+ * TODO This probably should not use the ndistinct (as computed from
+ * the sample) directly, but rather an estimate of the number of
+ * distinct values in the whole table, no?
+ */
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ int ndistinct = 0;
+ int mcv_threshold = 0;
+ int count = 0;
+ int nitems = 0;
+
+ MCVList mcvlist = NULL;
+
+ /* Sort by multiple columns (using array of SortSupport) */
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * Preallocate space for all the items as a single chunk, and point
+ * the items to the appropriate parts of the array.
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * numattrs);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * numattrs);
+
+ /* keep all the rows by default (as if there was no MCV list) */
+ *numrows_filtered = numrows;
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* load the values/null flags from sample rows */
+ for (j = 0; j < numrows; j++)
+ for (i = 0; i < numattrs; i++)
+ items[j].values[i] = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+
+ /* prepare the sort functions for all the attributes */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* do the sort, using the multi-sort */
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Count the number of distinct groups - just walk through the
+ * sorted list and count the number of key changes. We use this to
+ * determine the threshold (125% of the average frequency).
+ */
+ ndistinct = 1;
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ ndistinct += 1;
+
+ /*
+ * Determine how many groups actually exceed the threshold, and then
+ * walk the array again and collect them into an array. We'll always
+ * require at least 4 rows per group.
+ *
+ * But if we can fit all the distinct values in the MCV list (i.e.
+ * if there are fewer distinct groups than MVSTAT_MCVLIST_MAX_ITEMS),
+ * we'll require only 2 rows per group.
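+ *
+ * For example, with 30000 sample rows and 100 distinct groups the
+ * average group has 300 rows, so the threshold works out to
+ * 1.25 * 300 = 375 rows.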
+ *
+ * TODO For now the threshold is the same as in the single-column
+ * case (average + 25%), but maybe that's worth revisiting
+ * for the multivariate case.
+ *
+ * TODO We can do this only if we believe we got all the distinct
+ * values of the table.
+ *
+ * FIXME This should really reference mcv_max_items (from catalog)
+ * instead of the constant MVSTAT_MCVLIST_MAX_ITEMS.
+ */
+ mcv_threshold = 1.25 * numrows / ndistinct;
+ mcv_threshold = (mcv_threshold < 4) ? 4 : mcv_threshold;
+
+ if (ndistinct <= MVSTAT_MCVLIST_MAX_ITEMS)
+ mcv_threshold = 2;
+
+ /*
+ * Walk through the sorted data again, and see how many groups
+ * reach the mcv_threshold (and become an item in the MCV list).
+ */
+ count = 1;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or new group, so check if we exceed mcv_threshold */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* group hits the threshold, count the group as MCV item */
+ if (count >= mcv_threshold)
+ nitems += 1;
+
+ count = 1;
+ }
+ else /* within group, so increase the number of items */
+ count += 1;
+ }
+
+ /* we know the number of MCV list items, so let's build the list */
+ if (nitems > 0)
+ {
+ /* allocate the MCV list structure, set parameters we know */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ mcvlist->magic = MVSTAT_MCV_MAGIC;
+ mcvlist->type = MVSTAT_MCV_TYPE_BASIC;
+ mcvlist->ndimensions = numattrs;
+ mcvlist->nitems = nitems;
+
+ /*
+ * Preallocate Datum/isnull arrays (not as a single chunk, as
+ * we'll pass this outside this method and thus it needs to be
+ * easy to pfree() the data - we wouldn't know where the
+ * arrays start).
+ *
+ * TODO Maybe the reasoning that we can't allocate a single
+ * piece because we're passing it out is bogus? Who'd
+ * free a single item of the MCV list, anyway?
+ *
+ * TODO Maybe with a proper encoding (stuffing all the values
+ * into a list-level array) this will no longer be true?
+ */
+ mcvlist->items = (MCVItem*)palloc0(sizeof(MCVItem)*nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ mcvlist->items[i] = (MCVItem)palloc0(sizeof(MCVItemData));
+ mcvlist->items[i]->values = (Datum*)palloc0(sizeof(Datum)*numattrs);
+ mcvlist->items[i]->isnull = (bool*)palloc0(sizeof(bool)*numattrs);
+ }
+
+ /*
+ * Repeat the same loop as above, but this time copy the data
+ * into the MCV list (for items exceeding the threshold).
+ *
+ * TODO Maybe we could simply remember indexes of the last item
+ * in each group (from the previous loop)?
+ */
+ count = 1;
+ nitems = 0;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or a new group */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* count the MCV item if exceeding the threshold (and copy into the array) */
+ if (count >= mcv_threshold)
+ {
+ /* just pointer to the proper place in the list */
+ MCVItem item = mcvlist->items[nitems];
+
+ /* copy values from the _previous_ group (its last item) */
+ memcpy(item->values, items[(i-1)].values, sizeof(Datum) * numattrs);
+ memcpy(item->isnull, items[(i-1)].isnull, sizeof(bool) * numattrs);
+
+ /* and finally the group frequency */
+ item->frequency = (double)count / numrows;
+
+ /* next item */
+ nitems += 1;
+ }
+
+ count = 1;
+ }
+ else /* same group, just increase the number of items */
+ count += 1;
+ }
+
+ /* make sure the loops are consistent */
+ Assert(nitems == mcvlist->nitems);
+
+ /*
+ * Remove the rows matching the MCV list (i.e. keep only rows
+ * that are not represented by the MCV list).
+ *
+ * FIXME This implementation is rather naive, effectively O(N^2).
+ * As the MCV list grows, the check will take longer and
+ * longer. And as the number of sampled rows increases (by
+ * increasing statistics target), it will take longer and
+ * longer. One option is to sort the MCV items first and
+ * then perform a binary search.
+ *
+ * A better option would be keeping the ID of the row in
+ * the sort item, and then just walk through the items and
+ * mark rows to remove (in a bitmap of the same size).
+ * There's not space for that in SortItem at this moment,
+ * but it's trivial to add 'private' pointer, or just
+ * using another structure with extra field (starting with
+ * SortItem, so that the comparators etc. still work).
+ *
+ * Another option is to use the sorted array of items
+ * (because that's how we sorted the source data), and
+ * simply do a bsearch() into it. If we find a matching
+ * item, the row belongs to the MCV list.
+ */
+ if (nitems == ndistinct) /* all rows are covered by MCV items */
+ *numrows_filtered = 0;
+ else /* (nitems < ndistinct) && (nitems > 0) */
+ {
+ int nfiltered = 0;
+ HeapTuple *rows_filtered = (HeapTuple*)palloc0(sizeof(HeapTuple) * numrows);
+
+ /* used for the searches */
+ SortItem item, mcvitem;
+
+ item.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ item.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /*
+ * FIXME we don't need to allocate this, we can reference
+ * the MCV item directly ...
+ */
+ mcvitem.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ mcvitem.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* walk through the tuples, compare the values to MCV items */
+ for (i = 0; i < numrows; i++)
+ {
+ bool match = false;
+
+ /* collect the key values from the row */
+ for (j = 0; j < numattrs; j++)
+ item.values[j] = heap_getattr(rows[i], attrs->values[j],
+ stats[j]->tupDesc, &item.isnull[j]);
+
+ /* scan through the MCV list for matches */
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ /*
+ * TODO Create a SortItem/MCVItem comparator so that
+ * we don't need to do memcpy() like crazy.
+ */
+ memcpy(mcvitem.values, mcvlist->items[j]->values,
+ numattrs * sizeof(Datum));
+ memcpy(mcvitem.isnull, mcvlist->items[j]->isnull,
+ numattrs * sizeof(bool));
+
+ if (multi_sort_compare(&item, &mcvitem, mss) == 0)
+ {
+ match = true;
+ break;
+ }
+ }
+
+ /* if no match in the MCV list, copy the row into the filtered ones */
+ if (! match)
+ memcpy(&rows_filtered[nfiltered++], &rows[i], sizeof(HeapTuple));
+ }
+
+ /* replace the rows and remember how many rows we kept */
+ memcpy(rows, rows_filtered, sizeof(HeapTuple) * nfiltered);
+ *numrows_filtered = nfiltered;
+
+ /* free all the data used here */
+ pfree(rows_filtered);
+ pfree(item.values);
+ pfree(item.isnull);
+ pfree(mcvitem.values);
+ pfree(mcvitem.isnull);
+ }
+ }
+
+ pfree(values);
+ pfree(items);
+ pfree(isnull);
+
+ return mcvlist;
+}
+
+
+/* fetch the MCV list (as a bytea) from the pg_mv_statistic catalog */
+MCVList
+load_mv_mcvlist(Oid mvoid)
+{
+ bool isnull = false;
+ Datum mcvlist;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* Fetch the pg_mv_statistic tuple for the given OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->mcv_enabled && mvstat->mcv_built);
+#endif
+
+ mcvlist = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stamcv, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_mcvlist(DatumGetByteaP(mcvlist));
+}
+
+/* print some basic info about the MCV list
+ *
+ * TODO Add info about what part of the table this covers.
+ */
+Datum
+pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MCVList mcvlist = deserialize_mv_mcvlist(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nitems=%d", mcvlist->nitems);
+
+ pfree(mcvlist);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* used to pass context into bsearch() */
+static SortSupport ssup_private = NULL;
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Serialize MCV list into a bytea value. The basic algorithm is simple:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ * (a) collect all (non-NULL) attribute values from all MCV items
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all MCV list items
+ * (a) replace values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, because we're mixing
+ * different datatypes, and we don't know what equality means for them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ * We'll use uint16 values for the indexes in step (3), as we don't
+ * allow more than 8k MCV items (see the max_mcv_items option). We might
+ * increase this to 65k and still fit into uint16.
+ *
+ * We don't really expect compression as high as with histograms,
+ * because we're not doing any bucket splits etc. (which are the source
+ * of high redundancy there), but we need to do this anyway as we need
+ * to serialize varlena values etc. We might invent another way to
+ * serialize MCV lists, but let's keep it consistent.
+ *
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
+bytea *
+serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i, j;
+ int ndims = mcvlist->ndimensions;
+ int itemsize = ITEM_SIZE(ndims);
+
+ Size total_length = 0;
+
+ char *item = palloc0(itemsize);
+
+ /* serialized items (indexes into arrays, etc.) */
+ bytea *output;
+ char *data = NULL;
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /* allocate space for all values, including NULLs (won't use them) */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * mcvlist->nitems);
+
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ if (! mcvlist->items[j]->isnull[i]) /* skip NULL values */
+ {
+ values[i][counts[i]] = mcvlist->items[j]->values[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* do not exceed UINT16_MAX */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typbyval || (info[i].typlen > 0))
+ /* passed by value, or by reference with a fixed length */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so simply strlen */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j]));
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * MCV list, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nitems (4B)
+ * - info (ndim * sizeof(DimensionInfo))
+ * - arrays of values for each dimension
+ * - serialized items (nitems * itemsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data.
+ */
+ total_length = (sizeof(int32) + offsetof(MCVListData, items)
+ + ndims * sizeof(DimensionInfo)
+ + mcvlist->nitems * itemsize);
+
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 1MB */
+ if (total_length > 1024 * 1024)
+ elog(ERROR, "serialized MCV exceeds 1MB (%ld)", total_length);
+
+ /* allocate space for the serialized MCV list, set header fields */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the position where we write */
+ data = VARDATA(output);
+
+ memcpy(data, mcvlist, offsetof(MCVListData, items));
+ data += offsetof(MCVListData, items);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - itemsize);
+
+ /* reset the values for each item */
+ memset(item, 0, itemsize);
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! mcvlist->items[i]->isnull[j])
+ {
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ v = (Datum*)bsearch(&mcvlist->items[i]->values[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ ITEM_INDEXES(item)[j] = (v - values[j]);
+
+ /* check the index is within expected bounds */
+ Assert(ITEM_INDEXES(item)[j] >= 0);
+ Assert(ITEM_INDEXES(item)[j] < info[j].nvalues);
+ }
+ }
+
+ /* copy NULL and frequency flags into the item */
+ memcpy(ITEM_NULLS(item, ndims),
+ mcvlist->items[i]->isnull, sizeof(bool) * ndims);
+ memcpy(ITEM_FREQUENCY(item, ndims),
+ &mcvlist->items[i]->frequency, sizeof(double));
+
+ /* copy the item into the array */
+ memcpy(data, item, itemsize);
+
+ data += itemsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ return output;
+}
+
+/*
+ * Inverse to serialize_mv_mcvlist() - see the comment there.
+ *
+ * We'll do full deserialization, because we don't really expect a high
+ * level of duplication among the values, so the caching may not be as
+ * efficient as with histograms.
+ */
+MCVList
+deserialize_mv_mcvlist(bytea * data)
+{
+ int i, j;
+ Size expected_size;
+ MCVList mcvlist;
+ char *tmp;
+
+ int ndims, nitems, itemsize;
+ DimensionInfo *info = NULL;
+
+ uint16 *indexes = NULL;
+ Datum **values = NULL;
+
+ /* local allocation buffer (used only for deserialization) */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ /* buffer used for the result */
+ int rbufflen;
+ char *rbuff;
+ char *rptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MCVListData,items))
+ elog(ERROR, "invalid MCV Size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MCVListData,items));
+
+ /* read the MCV list header */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(mcvlist, tmp, offsetof(MCVListData,items));
+ tmp += offsetof(MCVListData,items);
+
+ if (mcvlist->magic != MVSTAT_MCV_MAGIC)
+ elog(ERROR, "invalid MCV magic %d (expected %dd)",
+ mcvlist->magic, MVSTAT_MCV_MAGIC);
+
+ if (mcvlist->type != MVSTAT_MCV_TYPE_BASIC)
+ elog(ERROR, "invalid MCV type %d (expected %dd)",
+ mcvlist->type, MVSTAT_MCV_TYPE_BASIC);
+
+ nitems = mcvlist->nitems;
+ ndims = mcvlist->ndimensions;
+ itemsize = ITEM_SIZE(ndims);
+
+ Assert(nitems > 0);
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Compute the size we expect with those parameters (it's incomplete
+ * at this point, as we have yet to add the sizes of the value arrays
+ * from the DimensionInfo records).
+ */
+ expected_size = offsetof(MCVListData,items) +
+ ndims * sizeof(DimensionInfo) +
+ (nitems * itemsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /*
+ * We'll allocate one large chunk of memory for the intermediate
+ * data, needed only for deserializing the MCV list, and we'll use
+ * a local dense allocation to minimize the palloc overhead.
+ *
+ * Let's see how much space we'll actually need, and also include
+ * space for the array with pointers.
+ */
+ bufflen = sizeof(Datum*) * ndims; /* space for pointers */
+
+ for (i = 0; i < ndims; i++)
+ /* for full-size byval types, we reuse the serialized value */
+ if (! (info[i].typbyval && info[i].typlen == sizeof(Datum)))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+
+ buff = palloc0(bufflen);
+ ptr = buff;
+
+ values = (Datum**)buff;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mv_mcvlist(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. This needs to
+ * copy the pieces.
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum - simply reuse the array */
+ if (info[i].typlen == sizeof(Datum))
+ {
+ values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value into the array (it may be narrower than Datum) */
+ memcpy(&values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ }
+ else
+ {
+ /* all the varlena data need a chunk from the buffer */
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ /* passed by reference, but fixed length (name, tid, ...) */
+ if (info[i].typlen > 0)
+ {
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ /* we should exhaust the buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ /* allocate space for the MCV items in a single piece */
+ rbufflen = (sizeof(MCVItem) + sizeof(MCVItemData) +
+ sizeof(Datum)*ndims + sizeof(bool)*ndims) * nitems;
+
+ rbuff = palloc(rbufflen);
+ rptr = rbuff;
+
+ mcvlist->items = (MCVItem*)rbuff;
+ rptr += (sizeof(MCVItem) * nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ MCVItem item = (MCVItem)rptr;
+ rptr += (sizeof(MCVItemData));
+
+ item->values = (Datum*)rptr;
+ rptr += (sizeof(Datum)*ndims);
+
+ item->isnull = (bool*)rptr;
+ rptr += (sizeof(bool) *ndims);
+
+ /* just point to the right place */
+ indexes = ITEM_INDEXES(tmp);
+
+ memcpy(item->isnull, ITEM_NULLS(tmp, ndims), sizeof(bool) * ndims);
+ memcpy(&item->frequency, ITEM_FREQUENCY(tmp, ndims), sizeof(double));
+
+#ifdef USE_ASSERT_CHECKING
+ for (j = 0; j < ndims; j++)
+ Assert(indexes[j] <= UINT16_MAX);
+#endif
+
+ /* translate the values */
+ for (j = 0; j < ndims; j++)
+ if (! item->isnull[j])
+ item->values[j] = values[j][indexes[j]];
+
+ mcvlist->items[i] = item;
+
+ tmp += ITEM_SIZE(ndims);
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+ }
+
+ /* check that we processed all the data */
+ Assert(tmp == (char*)data + VARSIZE_ANY(data));
+
+ /* release the temporary buffer */
+ pfree(buff);
+
+ return mcvlist;
+}
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
+
+/*
+ * SRF with details about the items of an MCV list:
+ *
+ * - item ID (0...nitems-1)
+ * - values (string array)
+ * - nulls only (boolean array)
+ * - frequency (double precision)
+ *
+ * The input is the OID of the statistics, and no rows are returned
+ * if the statistics contains no MCV list.
+ */
+PG_FUNCTION_INFO_V1(pg_mv_mcv_items);
+
+Datum
+pg_mv_mcv_items(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MCVList mcvlist;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ mcvlist = load_mv_mcvlist(PG_GETARG_OID(0));
+
+ funcctx->user_fctx = mcvlist;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = mcvlist->nitems;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples
+ * from raw C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MCVList mcvlist;
+ MCVItem item;
+
+ mcvlist = (MCVList)funcctx->user_fctx;
+
+ Assert(call_cntr < mcvlist->nitems);
+
+ item = mcvlist->items[call_cntr];
+
+ stakeys = find_mv_attnums(PG_GETARG_OID(0), &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple.
+ * This should be an array of C strings which will
+ * be processed later by the type input functions.
+ */
+ values = (char **) palloc(4 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+
+ /* frequency */
+ values[3] = (char *) palloc(64 * sizeof(char));
+
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * mcvlist->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * mcvlist->ndimensions);
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* item ID */
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ Datum val, valout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == mcvlist->ndimensions-1)
+ format = "%s, %s}";
+
+ val = item->values[i];
+ valout = FunctionCall1(&fmgrinfo[i], val);
+
+ snprintf(buff, 1024, format, values[1], DatumGetPointer(valout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], item->isnull[i] ? "t" : "f");
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[3], 64, "%f", item->frequency); /* frequency */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 4f106c3..6339631 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2109,8 +2109,9 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
- " deps_enabled,\n"
- " deps_built,\n"
+ " deps_enabled, mcv_enabled,\n"
+ " deps_built, mcv_built,\n"
+ " mcv_max_items,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
@@ -2128,6 +2129,8 @@ describeOneTableDetails(const char *schemaname,
printTableAddFooter(&cont, _("Statistics:"));
for (i = 0; i < tuples; i++)
{
+ bool first = true;
+
printfPQExpBuffer(&buf, " ");
/* statistics name (qualified with namespace) */
@@ -2137,10 +2140,22 @@ describeOneTableDetails(const char *schemaname,
/* options */
if (!strcmp(PQgetvalue(result, i, 4), "t"))
- appendPQExpBuffer(&buf, "(dependencies)");
+ {
+ appendPQExpBuffer(&buf, "(dependencies");
+ first = false;
+ }
+
+ if (!strcmp(PQgetvalue(result, i, 5), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", mcv");
+ else
+ appendPQExpBuffer(&buf, "(mcv");
+ first = false;
+ }
- appendPQExpBuffer(&buf, " ON (%s)",
- PQgetvalue(result, i, 6));
+ appendPQExpBuffer(&buf, ") ON (%s)",
+ PQgetvalue(result, i, 9));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index a568a07..fd7107d 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -37,15 +37,21 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
+ bool mcv_enabled; /* build MCV list? */
+
+ /* MCV size */
+ int32 mcv_max_items; /* max MCV items */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
+ bool mcv_built; /* MCV list was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
+ bytea stamcv; /* MCV list (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -61,13 +67,17 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 7
+#define Natts_pg_mv_statistic 11
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
#define Anum_pg_mv_statistic_deps_enabled 4
-#define Anum_pg_mv_statistic_deps_built 5
-#define Anum_pg_mv_statistic_stakeys 6
-#define Anum_pg_mv_statistic_stadeps 7
+#define Anum_pg_mv_statistic_mcv_enabled 5
+#define Anum_pg_mv_statistic_mcv_max_items 6
+#define Anum_pg_mv_statistic_deps_built 7
+#define Anum_pg_mv_statistic_mcv_built 8
+#define Anum_pg_mv_statistic_stakeys 9
+#define Anum_pg_mv_statistic_stadeps 10
+#define Anum_pg_mv_statistic_stamcv 11
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index eecce40..b16eebc 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2670,6 +2670,10 @@ DATA(insert OID = 3998 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0
DESCR("multivariate stats: functional dependencies info");
DATA(insert OID = 3999 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
DESCR("multivariate stats: functional dependencies show");
+DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_mcvlist_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: MCV list info");
+DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
+DESCR("details about MCV list items");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index e10dcf1..2bcd582 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -653,9 +653,11 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
+ bool mcv_enabled; /* MCV list enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
+ bool mcv_built; /* MCV list built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index cc43a79..4535db7 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -51,30 +51,89 @@ typedef MVDependenciesData* MVDependencies;
#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
/*
+ * Multivariate MCV (most-common value) lists
+ *
+ * A straightforward extension of MCV items - i.e. a list (array) of
+ * combinations of attribute values, together with a frequency and
+ * null flags.
+ */
+typedef struct MCVItemData {
+ double frequency; /* frequency of this combination */
+ bool *isnull; /* flags of NULL values (up to 32 columns) */
+ Datum *values; /* variable-length (ndimensions) */
+} MCVItemData;
+
+typedef MCVItemData *MCVItem;
+
+/* multivariate MCV list - essentially an array of MCV items */
+typedef struct MCVListData {
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of MCV list (BASIC) */
+ uint32 ndimensions; /* number of dimensions */
+ uint32 nitems; /* number of MCV items in the array */
+ MCVItem *items; /* array of MCV items */
+} MCVListData;
+
+typedef MCVListData *MCVList;
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_MCV_MAGIC 0xE1A651C2 /* marks serialized bytea */
+#define MVSTAT_MCV_TYPE_BASIC 1 /* basic MCV list type */
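+
+/*
+ * Example (sketch) of consuming a deserialized MCV list, with mvoid
+ * being the OID of a pg_mv_statistic entry with mcv_built = true
+ * (process() stands in for whatever the caller does with an item):
+ *
+ *     MCVList mcvlist = load_mv_mcvlist(mvoid);
+ *
+ *     for (i = 0; i < mcvlist->nitems; i++)
+ *         process(mcvlist->items[i]->values,
+ *                 mcvlist->items[i]->isnull,
+ *                 mcvlist->items[i]->frequency);
+ */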
+
+/*
+ * Limits used for mcv_max_items option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_MCVLIST_MIN_ITEMS, and we cannot
+ * have more than MVSTAT_MCVLIST_MAX_ITEMS items.
+ *
+ * This is just a boundary for the 'max' threshold - the actual list
+ * may of course contain fewer items than MVSTAT_MCVLIST_MIN_ITEMS.
+ */
+#define MVSTAT_MCVLIST_MIN_ITEMS 128 /* min items in MCV list */
+#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
*/
MVDependencies load_mv_dependencies(Oid mvoid);
+MCVList load_mv_mcvlist(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
+bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
+MCVList deserialize_mv_mcvlist(bytea * data);
+
+/*
+ * Returns index of the attribute number within the vector (i.e. a
+ * dimension within the stats).
+ */
+int mv_get_index(AttrNumber varattno, int2vector * stakeys);
+
+int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
/* FIXME this probably belongs somewhere else (not to operations stats) */
extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_mcv_items(PG_FUNCTION_ARGS);
MVDependencies
-build_mv_dependencies(int numrows, HeapTuple *rows,
- int2vector *attrs,
- VacAttrStats **stats);
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats);
+
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered);
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
- int natts, VacAttrStats **vacattrstats);
+ int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, int2vector *attrs);
+void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats);
#endif
diff --git a/src/test/regress/expected/mv_mcv.out b/src/test/regress/expected/mv_mcv.out
new file mode 100644
index 0000000..075320b
--- /dev/null
+++ b/src/test/regress/expected/mv_mcv.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s4 ON mcv_list (unknown_column) WITH (mcv);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s4 ON mcv_list (a) WITH (mcv);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s4 ON mcv_list (a, a) WITH (mcv);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s4 ON mcv_list (a, a, b) WITH (mcv);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- max_mcv_items requires the mcv option
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (dependencies, max_mcv_items=200);
+ERROR: option 'mcv' is required by other options(s)
+-- invalid max_mcv_items value / too low
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10);
+ERROR: max number of MCV items must be at least 128
+-- invalid max_mcv_items value / too high
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10000);
+ERROR: max number of MCV items is 8192
+-- correct command
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s5 ON mcv_list (a, b, c) WITH (mcv);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s6 ON mcv_list (a, b, c, d) WITH (mcv);
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1200
+(1 row)
+
+DROP TABLE mcv_list;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 84b4425..66071d8 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1373,7 +1373,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
s.staname,
s.stakeys AS attnums,
length(s.stadeps) AS depsbytes,
- pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
+ length(s.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 4f2ffb8..85d94f1 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,4 +112,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies
+test: mv_dependencies mv_mcv
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 097a04f..6584d73 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -163,3 +163,4 @@ test: xml
test: event_trigger
test: stats
test: mv_dependencies
+test: mv_mcv
diff --git a/src/test/regress/sql/mv_mcv.sql b/src/test/regress/sql/mv_mcv.sql
new file mode 100644
index 0000000..b31d32d
--- /dev/null
+++ b/src/test/regress/sql/mv_mcv.sql
@@ -0,0 +1,178 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s4 ON mcv_list (unknown_column) WITH (mcv);
+
+-- single column
+CREATE STATISTICS s4 ON mcv_list (a) WITH (mcv);
+
+-- single column, duplicated
+CREATE STATISTICS s4 ON mcv_list (a, a) WITH (mcv);
+
+-- two columns, one duplicated
+CREATE STATISTICS s4 ON mcv_list (a, a, b) WITH (mcv);
+
+-- unknown option
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (unknown_option);
+
+-- missing MCV statistics
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (dependencies, max_mcv_items=200);
+
+-- invalid max_mcv_items value / too low
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10);
+
+-- invalid max_mcv_items value / too high
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10000);
+
+-- correct command
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+
+DROP TABLE mcv_list;
+
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s5 ON mcv_list (a, b, c) WITH (mcv);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mcv_list;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s6 ON mcv_list (a, b, c, d) WITH (mcv);
+
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+DROP TABLE mcv_list;
--
2.1.0
Attachment: 0005-multivariate-histograms.patch (text/x-patch)
From 355eb43e91c636e601c0581e6838b67d635a5981 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 20:18:24 +0100
Subject: [PATCH 5/9] multivariate histograms
- extends the pg_mv_statistic catalog (add 'hist' fields)
- building the histograms during ANALYZE
- simple estimation while planning the queries
Includes regression tests mostly equal to those for functional
dependencies / MCV lists.
---
doc/src/sgml/ref/create_statistics.sgml | 18 +
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/statscmds.c | 44 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 571 +++++++-
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/README.histogram | 287 ++++
src/backend/utils/mvstats/README.stats | 2 +
src/backend/utils/mvstats/common.c | 37 +-
src/backend/utils/mvstats/histogram.c | 2032 ++++++++++++++++++++++++++++
src/bin/psql/describe.c | 17 +-
src/include/catalog/pg_mv_statistic.h | 24 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 136 +-
src/test/regress/expected/mv_histogram.out | 207 +++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_histogram.sql | 176 +++
21 files changed, 3538 insertions(+), 38 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.histogram
create mode 100644 src/backend/utils/mvstats/histogram.c
create mode 100644 src/test/regress/expected/mv_histogram.out
create mode 100644 src/test/regress/sql/mv_histogram.sql
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index 193e4b0..fd3382e 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -133,6 +133,24 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</varlistentry>
<varlistentry>
+ <term><literal>histogram</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables a histogram for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>max_buckets</> (<type>integer</>)</term>
+ <listitem>
+ <para>
+ Maximum number of histogram buckets.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>max_mcv_items</> (<type>integer</>)</term>
<listitem>
<para>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2d570ee..6afdee0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -167,7 +167,9 @@ CREATE VIEW pg_mv_stats AS
length(S.stadeps) as depsbytes,
pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
length(S.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo,
+ length(S.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 90bfaed..b974655 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -137,12 +137,15 @@ CreateStatistics(CreateStatsStmt *stmt)
/* by default build nothing */
bool build_dependencies = false,
- build_mcv = false;
+ build_mcv = false,
+ build_histogram = false;
- int32 max_mcv_items = -1;
+ int32 max_buckets = -1,
+ max_mcv_items = -1;
/* options required because of other options */
- bool require_mcv = false;
+ bool require_mcv = false,
+ require_histogram = false;
Assert(IsA(stmt, CreateStatsStmt));
@@ -241,6 +244,29 @@ CreateStatistics(CreateStatsStmt *stmt)
MVSTAT_MCVLIST_MAX_ITEMS)));
}
+ else if (strcmp(opt->defname, "histogram") == 0)
+ build_histogram = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_buckets") == 0)
+ {
+ max_buckets = defGetInt32(opt);
+
+ /* this option requires 'histogram' to be enabled */
+ require_histogram = true;
+
+ /* sanity check */
+ if (max_buckets < MVSTAT_HIST_MIN_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is %d",
+ MVSTAT_HIST_MIN_BUCKETS)));
+
+ else if (max_buckets > MVSTAT_HIST_MAX_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("maximum number of buckets is %d",
+ MVSTAT_HIST_MAX_BUCKETS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -249,10 +275,10 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv))
+ if (! (build_dependencies || build_mcv || build_histogram))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -260,6 +286,11 @@ CreateStatistics(CreateStatsStmt *stmt)
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("option 'mcv' is required by other options(s)")));
+ if (require_histogram && (! build_histogram))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option 'histogram' is required by other options(s)")));
+
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
stakeys = buildint2vector(attnums, numcols);
@@ -279,11 +310,14 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+ values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
+ values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
+ nulls[Anum_pg_mv_statistic_stahist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 333e24b..9172f21 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2163,10 +2163,12 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
WRITE_BOOL_FIELD(mcv_enabled);
+ WRITE_BOOL_FIELD(hist_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
WRITE_BOOL_FIELD(mcv_built);
+ WRITE_BOOL_FIELD(hist_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 977f88e..0de2418 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -49,6 +49,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
+#define MV_CLAUSE_TYPE_HIST 0x04
static bool clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums,
int type);
@@ -74,6 +75,8 @@ static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
List *clauses, MVStatisticInfo *mvstats,
bool *fullmatch, Selectivity *lowsel);
+static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -81,6 +84,12 @@ static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
Selectivity *lowsel, bool *fullmatch,
bool is_or);
+static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, Index relid);
@@ -93,6 +102,7 @@ static List * find_stats(PlannerInfo *root, Index relid);
#define UPDATE_RESULT(m,r,isor) \
(m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -121,7 +131,7 @@ static List * find_stats(PlannerInfo *root, Index relid);
*
* First we try to reduce the list of clauses by applying (soft) functional
* dependencies, and then we try to estimate the selectivity of the reduced
- * list of clauses using the multivariate MCV list.
+ * list of clauses using the multivariate MCV list and histograms.
*
* Finally we remove the portion of clauses estimated using multivariate stats,
* and process the rest of the clauses using the regular per-column stats.
@@ -214,11 +224,13 @@ clauselist_selectivity(PlannerInfo *root,
* with the multivariate code and simply skip to estimation using the
* regular per-column stats.
*/
- if (has_stats(stats, MV_CLAUSE_TYPE_MCV) &&
- (count_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV) >= 2))
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) &&
+ (count_mv_attnums(clauses, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2))
{
/* collect attributes from the compatible conditions */
- Bitmapset *mvattnums = collect_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV);
+ Bitmapset *mvattnums = collect_mv_attnums(clauses, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
/* and search for the statistic covering the most attributes */
MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
@@ -230,7 +242,7 @@ clauselist_selectivity(PlannerInfo *root,
/* split the clauselist into regular and mv-clauses */
clauses = clauselist_mv_split(root, relid, clauses, &mvclauses,
- mvstat, MV_CLAUSE_TYPE_MCV);
+ mvstat, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
/* we've chosen the histogram to match the clauses */
Assert(mvclauses != NIL);
@@ -942,6 +954,7 @@ static Selectivity
clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
{
bool fullmatch = false;
+ Selectivity s1 = 0.0, s2 = 0.0;
/*
* Lowest frequency in the MCV list (may be used as an upper bound
@@ -955,9 +968,24 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
* MCV/histogram evaluation).
*/
- /* Evaluate the MCV selectivity */
- return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ /* Evaluate the MCV first. */
+ s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
&fullmatch, &mcv_low);
+
+ /*
+ * If we got a full equality match on the MCV list, we're done (and
+ * the estimate is pretty good).
+ */
+ if (fullmatch && (s1 > 0.0))
+ return s1;
+
+ /* TODO if (fullmatch) without matching MCV item, use the mcv_low
+ * selectivity as upper bound */
+
+ s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+
+ /* TODO clamp to <= 1.0 (or more strictly, when possible) */
+ return s1 + s2;
}
/*
@@ -1160,7 +1188,7 @@ choose_mv_statistics(List *stats, Bitmapset *attnums)
int numattrs = attrs->dim1;
/* skip dependencies-only stats */
- if (! info->mcv_built)
+ if (! (info->mcv_built || info->hist_built))
continue;
/* count columns covered by the histogram */
@@ -1391,7 +1419,7 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
case F_SCALARGTSEL:
/* not compatible with functional dependencies */
- if (! (context->types & MV_CLAUSE_TYPE_MCV))
+ if (! (context->types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST)))
return true; /* terminate */
break;
@@ -2007,6 +2035,9 @@ has_stats(List *stats, int type)
if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
return true;
+
+ if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
+ return true;
}
return false;
@@ -2411,3 +2442,525 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
return nmatches;
}
+
+/*
+ * Estimate selectivity of clauses using a histogram.
+ *
+ * If there's no histogram for the stats, the function returns 0.0.
+ *
+ * The general idea of this method is similar to how MCV lists are
+ * processed, except that this introduces the concept of a partial
+ * match (MCV only works with full match / mismatch).
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all buckets as 'full match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the buckets
+ * 4) skip buckets that are already 'no match'
+ * 5) check clause for buckets that still match (at least partially)
+ * 6) sum frequencies for buckets to get selectivity
+ *
+ * Unlike MCV lists, histograms have a concept of a partial match. In
+ * that case we use 1/2 the bucket, to minimize the average error. The
+ * MV histograms are usually less detailed than the per-column ones,
+ * meaning the sum is often quite high (thanks to combining a lot of
+ * "partially hit" buckets).
+ *
+ * Maybe we could use per-bucket information with number of distinct
+ * values it contains (for each dimension), and then use that to correct
+ * the estimate (so with 10 distinct values, we'd use 1/10 of the bucket
+ * frequency). We might also scale the value depending on the actual
+ * ndistinct estimate (not just the values observed in the sample).
+ *
+ * Another option would be to multiply the selectivities, i.e. if we get
+ * 'partial match' for a bucket for multiple conditions, we might use
+ * 0.5^k (where k is the number of conditions), instead of 0.5. This
+ * probably does not minimize the average error, though.
+ *
+ * TODO This might use a similar shortcut to MCV lists - count buckets
+ * marked as partial/full match, and terminate once this drops to 0.
+ * Not sure if it's really worth it - for MCV lists a situation like
+ * this is not uncommon, but for histograms it's not that clear.
+ */
+static Selectivity
+clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats)
+{
+ int i;
+ Selectivity s = 0.0;
+ Selectivity u = 0.0;
+
+ int nmatches = 0;
+ char *matches = NULL;
+
+ MVSerializedHistogram mvhist = NULL;
+
+ /* there's no histogram */
+ if (! mvstats->hist_built)
+ return 0.0;
+
+ /* load the histogram from the catalog (hist_built was verified above) */
+ mvhist = load_mv_histogram(mvstats->mvoid);
+
+ Assert (mvhist != NULL);
+ Assert (clauses != NIL);
+ Assert (list_length(clauses) >= 2);
+
+ /*
+	 * Bitmap of bucket matches (mismatch, partial, full). By default
+	 * all buckets fully match, and we gradually eliminate them.
+ */
+ matches = palloc0(sizeof(char) * mvhist->nbuckets);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+
+ nmatches = mvhist->nbuckets;
+
+ /* build the match bitmap */
+ update_match_bitmap_histogram(root, clauses,
+ mvstats->stakeys, mvhist,
+ nmatches, matches, false);
+
+ /* now, walk through the buckets and sum the selectivities */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * Find out what part of the data is covered by the histogram,
+ * so that we can 'scale' the selectivity properly (e.g. when
+ * only 50% of the sample got into the histogram, and the rest
+ * is in a MCV list).
+ *
+ * TODO This might be handled by keeping a global "frequency"
+ * for the whole histogram, which might save us some time
+ * spent accessing the not-matching part of the histogram.
+ * Although it's likely in a cache, so it's very fast.
+ */
+ u += mvhist->buckets[i]->ntuples;
+
+ if (matches[i] == MVSTATS_MATCH_FULL)
+ s += mvhist->buckets[i]->ntuples;
+ else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+ s += 0.5 * mvhist->buckets[i]->ntuples;
+ }
+
+#ifdef DEBUG_MVHIST
+ debug_histogram_matches(mvhist, matches);
+#endif
+
+ /* release the allocated bitmap and deserialized histogram */
+ pfree(matches);
+ pfree(mvhist);
+
+ return s * u;
+}
+
+/* cached result of bucket boundary comparison for a single dimension */
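+/* (bit 0 set = comparison already evaluated, bit 1 = the cached result) */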
+
+#define HIST_CACHE_NOT_FOUND 0x00
+#define HIST_CACHE_FALSE 0x01
+#define HIST_CACHE_TRUE 0x03
+#define HIST_CACHE_MASK 0x02
+
+static char
+bucket_contains_value(FmgrInfo ltproc, Datum constvalue,
+ Datum min_value, Datum max_value,
+ int min_index, int max_index,
+ bool min_include, bool max_include,
+ char * callcache)
+{
+ bool a, b;
+
+ char min_cached = callcache[min_index];
+ char max_cached = callcache[max_index];
+
+ /*
+	 * First some quick checks on equality - if the constant equals either
+	 * boundary, we have a partial match (so no need to call the comparator).
+ */
+ if (((min_value == constvalue) && (min_include)) ||
+ ((max_value == constvalue) && (max_include)))
+ return MVSTATS_MATCH_PARTIAL;
+
+ /* Keep the values 0/1 because of the XOR at the end. */
+ a = ((min_cached & HIST_CACHE_MASK) >> 1);
+ b = ((max_cached & HIST_CACHE_MASK) >> 1);
+
+ /*
+	 * If the result for the bucket lower bound is not in the cache, evaluate
+	 * the comparison function and store the result in the cache.
+ */
+ if (! min_cached)
+ {
+ a = DatumGetBool(FunctionCall2Coll(&ltproc,
+ DEFAULT_COLLATION_OID,
+ constvalue, min_value));
+ /* remember the result */
+ callcache[min_index] = (a) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ /* And do the same for the upper bound. */
+ if (! max_cached)
+ {
+ b = DatumGetBool(FunctionCall2Coll(&ltproc,
+ DEFAULT_COLLATION_OID,
+ constvalue, max_value));
+ /* remember the result */
+ callcache[max_index] = (b) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ return (a ^ b) ? MVSTATS_MATCH_PARTIAL : MVSTATS_MATCH_NONE;
+}
+
+static char
+bucket_is_smaller_than_value(FmgrInfo opproc, Datum constvalue,
+ Datum min_value, Datum max_value,
+ int min_index, int max_index,
+ bool min_include, bool max_include,
+ char * callcache, bool isgt)
+{
+ char min_cached = callcache[min_index];
+ char max_cached = callcache[max_index];
+
+ /* Keep the values 0/1 because of the XOR at the end. */
+ bool a = ((min_cached & HIST_CACHE_MASK) >> 1);
+ bool b = ((max_cached & HIST_CACHE_MASK) >> 1);
+
+ if (! min_cached)
+ {
+ a = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ min_value,
+ constvalue));
+ /* remember the result */
+ callcache[min_index] = (a) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ if (! max_cached)
+ {
+ b = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ max_value,
+ constvalue));
+ /* remember the result */
+ callcache[max_index] = (b) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ /*
+ * Now, we need to combine both results into the final answer, and we need
+ * to be careful about the 'isgt' variable which kinda inverts the meaning.
+ *
+ * First, we handle the case when each boundary returns different results.
+ * In that case the outcome can only be 'partial' match.
+ */
+ if (a != b)
+ return MVSTATS_MATCH_PARTIAL;
+
+ /*
+ * When the results are the same, then it depends on the 'isgt' value. There
+ * are four options:
+ *
+ * isgt=false a=b=true => full match
+ * isgt=false a=b=false => empty
+ * isgt=true a=b=true => empty
+ * isgt=true a=b=false => full match
+ *
+ * We'll cheat a bit, because we know that (a=b) so we'll use just one of them.
+ */
+ if (isgt)
+ return (!a) ? MVSTATS_MATCH_FULL : MVSTATS_MATCH_NONE;
+ else
+ return ( a) ? MVSTATS_MATCH_FULL : MVSTATS_MATCH_NONE;
+}
+
+/*
+ * Evaluate clauses using the histogram, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * Note: This is not a simple bitmap in the sense that there are more
+ * than two possible values for each item - no match, partial
+ * match and full match. So we need 2 bits per item.
+ *
+ * TODO This works with 'bitmap' where each item is represented as a
+ * char, which is slightly wasteful. Instead, we could use a bitmap
+ * with 2 bits per item, reducing the size to ~1/4. By using values
+ * 0, 1 and 3 (instead of 0, 1 and 2), the operations (merging etc.)
+ * might be performed just like for simple bitmap by using & and |,
+ * which might be faster than min/max.
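+ *
+ * (The 0/1/3 encoding would work because the values are nested
+ * bitmasks: the AND-merge (MIN) becomes bitwise '&', e.g.
+ * (01 & 11) = 01, and the OR-merge (MAX) becomes bitwise '|',
+ * e.g. (01 | 11) = 11.)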
+ */
+static int
+update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ /*
+	 * Used for caching function calls, so that each deduplicated value
+	 * is evaluated only once per clause.
+	 *
+	 * We know there may be up to (2 * nbuckets) values per dimension, so
+	 * we allocate that much, once for all clauses, to minimize overhead.
+	 *
+	 * Also, we only need two bits per value, but this allocates a byte
+	 * per value. Might be worth optimizing.
+ *
+ * 0x00 - not yet called
+ * 0x01 - called, result is 'false'
+ * 0x03 - called, result is 'true'
+ */
+ char *callcache = palloc(2 * mvhist->nbuckets);
+
+ Assert(mvhist != NULL);
+ Assert(mvhist->nbuckets > 0);
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mvhist->nbuckets);
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+
+ /* loop through the clauses and do the estimation */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* it's either OpClause, or NullTest */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ FmgrInfo opproc; /* operator */
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ /* reset the cache (per clause) */
+ memset(callcache, 0, 2 * mvhist->nbuckets);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+ FmgrInfo ltproc;
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_LT_OPR);
+
+ /* lookup dimension for the attribute */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), &ltproc);
+
+ /*
+ * Check this for all buckets that still have "true" in the bitmap
+ *
+ * We already know the clauses use suitable operators (because that's
+ * how we filtered them).
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ char res = MVSTATS_MATCH_NONE;
+
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /* histogram boundaries */
+ Datum minval, maxval;
+ bool mininclude, maxinclude;
+ int minidx, maxidx;
+
+ /*
+ * For AND-lists, we can also mark NULL buckets as 'no match'
+ * (and then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && bucket->nullsonly[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match).
+ * We can't really do anything about the MATCH_PARTIAL buckets.
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* lookup the values and cache of function calls */
+ minidx = bucket->min[idx];
+ maxidx = bucket->max[idx];
+
+ minval = mvhist->values[idx][bucket->min[idx]];
+ maxval = mvhist->values[idx][bucket->max[idx]];
+
+ mininclude = bucket->min_inclusive[idx];
+ maxinclude = bucket->max_inclusive[idx];
+
+ /*
+ * TODO Maybe it's possible to add here a similar optimization
+ * as for the MCV lists:
+ *
+ * (nmatches == 0) && AND-list => all eliminated (FALSE)
+ * (nmatches == N) && OR-list => all eliminated (TRUE)
+ *
+ * But it's more complex because of the partial matches.
+ */
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ *
+ * TODO I'm really unsure the handling of 'isgt' flag (that is, clauses
+ * with reverse order of variable/constant) is correct. I wouldn't
+ * be surprised if there was some mixup. Using the lt/gt operators
+ * instead of messing with the opproc could make it simpler.
+ * It would however be using a different operator than the query,
+ * although it's not any shadier than using the selectivity function
+ * as is done currently.
+ */
+ switch (oprrest)
+ {
+ case F_SCALARLTSEL: /* Var < Const */
+ case F_SCALARGTSEL: /* Var > Const */
+
+ res = bucket_is_smaller_than_value(opproc, cst->constvalue,
+ minval, maxval,
+ minidx, maxidx,
+ mininclude, maxinclude,
+ callcache, isgt);
+ break;
+
+ case F_EQSEL:
+
+ /*
+ * We only check whether the value is within the bucket, using the
+ * lt operator, and we also check for equality with the boundaries.
+ */
+
+ res = bucket_contains_value(ltproc, cst->constvalue,
+ minval, maxval,
+ minidx, maxidx,
+ mininclude, maxinclude,
+ callcache);
+ break;
+ }
+
+ UPDATE_RESULT(matches[i], res, is_or);
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the buckets and evaluate the current clause, skipping
+ * buckets that were already ruled out.
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match)
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* if the clause mismatches the bucket, set it as MATCH_NONE */
+ if ((expr->nulltesttype == IS_NULL)
+ && (! bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+
+ else if ((expr->nulltesttype == IS_NOT_NULL) &&
+ (bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each bucket */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching buckets */
+ or_nmatches = mvhist->nbuckets;
+
+ /* allocate the match bitmap (the defaults are set just below) */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the OR-clauses */
+ or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ stakeys, mvhist,
+ or_nmatches, or_matches, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+
+ /* free the call cache */
+ pfree(callcache);
+
+ return nmatches;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d807dc7..40145e7 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -416,7 +416,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built || mvstat->mcv_built)
+ if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built)
{
info = makeNode(MVStatisticInfo);
@@ -426,10 +426,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
info->mcv_enabled = mvstat->mcv_enabled;
+ info->hist_enabled = mvstat->hist_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
info->mcv_built = mvstat->mcv_built;
+ info->hist_built = mvstat->hist_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index f9bf10c..9dbb3b6 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o mcv.o
+OBJS = common.o dependencies.o histogram.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.histogram b/src/backend/utils/mvstats/README.histogram
new file mode 100644
index 0000000..8234d2c
--- /dev/null
+++ b/src/backend/utils/mvstats/README.histogram
@@ -0,0 +1,287 @@
+Multivariate histograms
+=======================
+
+Histograms on individual attributes consist of buckets represented by ranges,
+covering the domain of the attribute. That is, each bucket is a [min,max]
+interval, and contains all values in this range. The histogram is built in such
+a way that all buckets have about the same frequency.
+
+Multivariate histograms are an extension into n-dimensional space - the buckets
+are n-dimensional intervals (i.e. n-dimensional rectangles), covering the domain
+of the combination of attributes. That is, each bucket has a vector of lower
+and upper boundaries, denoted min[i] and max[i] (where i = 1..n).
+
+In addition to the boundaries, each bucket tracks additional info:
+
+ * frequency (fraction of tuples in the bucket)
+ * whether the boundaries are inclusive or exclusive
+ * whether the dimension contains only NULL values
+ * number of distinct values in each dimension (for building only)
+
+It's possible that in the future we'll have multiple histogram types, with different
+features. We do however expect all the types to share the same representation
+(buckets as ranges) and only differ in how we build them.
+
+The current implementation builds non-overlapping buckets, but that may not be
+true for other histogram types, so the code should not rely on this assumption. There
+are interesting types of histograms (or algorithms) with overlapping buckets.
+
+When used on low-cardinality data, histograms usually perform considerably worse
+than MCV lists (which are a good fit for this kind of data). This is especially
+true for label-like values, where the ordering of the values is mostly unrelated
+to the meaning of the data, while proper ordering is crucial for histograms.
+
+On high-cardinality data the histograms are usually a better choice, because MCV
+lists can't represent the distribution accurately enough.
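+
+For example (hypothetical table and column names, using the syntax from this
+patch series), one might combine both statistics types on the same table:
+
+  CREATE STATISTICS s1 ON t (label_a, label_b) WITH (mcv);
+  CREATE STATISTICS s2 ON t (x, y) WITH (histogram, max_buckets = 1000);
+  ANALYZE t;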
+
+
+Selectivity estimation
+----------------------
+
+The estimation is implemented in clauselist_mv_selectivity_histogram(), and
+works very similarly to clauselist_mv_selectivity_mcvlist().
+
+The main difference is that while MCV lists support exact matches, histograms
+often result in approximate matches - e.g. with equality we can only say
+whether the constant falls within the bucket, but not whether it is actually
+present or what fraction of the bucket it matches. In this case we rely on
+some defaults just like in the per-column histograms.
+
+The current implementation uses histograms to estimate these types of clauses
+(think of WHERE conditions):
+
+ (a) equality clauses WHERE (a = 1) AND (b = 2)
+ (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ (d) OR-clauses WHERE (a = 1) OR (b = 2)
+
+Similarly to MCV lists, it's possible to add support for additional types of
+clauses, for example:
+
+ (e) multi-var clauses WHERE (a > b)
+
+and so on. These are tasks for the future, not yet implemented.
+
+
+When evaluating a clause on a bucket, we may get one of three results:
+
+ (a) FULL_MATCH - The bucket definitely matches the clause.
+
+ (b) PARTIAL_MATCH - The bucket matches the clause, but not necessarily all
+ the tuples it represents.
+
+ (c) NO_MATCH - The bucket definitely does not match the clause.
+
+This may be illustrated using a range [1, 5], which is essentially a 1-D bucket.
+With clause
+
+ WHERE (a < 10) => FULL_MATCH (all range values are below
+ 10, so the whole bucket matches)
+
+ WHERE (a < 3) => PARTIAL_MATCH (there may be values matching
+ the clause, but we don't know how many)
+
+ WHERE (a < 0) => NO_MATCH (the whole range is above 0, so
+ no values from the bucket can match)
+
+Some clauses may produce only some of those results - for example equality
+clauses may never produce FULL_MATCH as we always hit only part of the bucket
+(we can't match both boundaries at the same time). This results in less accurate
+estimates compared to MCV lists, where we can hit an MCV item exactly (there's
+no PARTIAL match in MCV).
+
+There are also clauses that may not produce any PARTIAL_MATCH results. A nice
+example of that is 'IS [NOT] NULL' clause, which either matches the bucket
+completely (FULL_MATCH) or not at all (NO_MATCH), thanks to how the NULL-buckets
+are constructed.
+
+Computing the total selectivity estimate is trivial - simply sum selectivities
+from all the FULL_MATCH and PARTIAL_MATCH buckets (but for buckets marked with
+PARTIAL_MATCH, multiply the frequency by 0.5 to minimize the average error).
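+
+For example, two FULL_MATCH buckets with frequency 0.01 each, plus four
+PARTIAL_MATCH buckets with frequency 0.02 each, yield
+
+  (2 * 0.01) + 0.5 * (4 * 0.02) = 0.06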
+
+
+Building a histogram
+---------------------
+
+The general algorithm for building a histogram is quite simple:
+
+ (a) create an initial bucket (containing all sample rows)
+
+ (b) create NULL buckets (by splitting the initial bucket)
+
+ (c) repeat
+
+ (1) choose bucket to split next
+
+ (2) terminate if no bucket that might be split is found, or if we've
+ reached the maximum number of buckets (16384)
+
+ (3) choose dimension to partition the bucket by
+
+ (4) partition the bucket by the selected dimension
+
+The main complexity is hidden in steps (c.1) and (c.3), i.e. how we choose the
+bucket and dimension for the split.
+
+Similarly to one-dimensional histograms, we want to produce buckets with roughly
+the same frequency. We also need to produce "regular" buckets, because buckets
+with one "side" much longer than the others are very likely to match a lot of
+conditions (which increases error, even if the bucket frequency is very low).
+
+To achieve this, we choose the largest bucket (containing the most sample rows),
+but we only choose buckets that can actually be split (have at least 3 different
+combinations of values).
+
+Then we choose the "longest" dimension of the bucket, using the number of
+distinct values in the sample as the measure of length.
+
+For details see functions select_bucket_to_partition() and partition_bucket().
+
+The current limit on number of buckets (16384) is mostly arbitrary, but chosen
+so that it guarantees we don't exceed the number of distinct values indexable by
+uint16 in any of the dimensions. In practice we could handle more buckets as we
+index each dimension separately and the splits should use the dimensions evenly.
+
+Also, histograms this large (with 16k values in multiple dimensions) would be
+quite expensive to build and process, so the 16k limit is rather reasonable.
+
+The actual number of buckets is also related to statistics target, because we
+require MIN_BUCKET_ROWS (10) tuples per bucket before a split, so we can't have
+more than (2 * 300 * target / 10) buckets. For the default target (100) this
+evaluates to ~6k.
+
+
+NULL handling (create_null_buckets)
+-----------------------------------
+
+When building histograms on a single attribute, we first filter out NULL values.
+In the multivariate case, we can't really do that because the rows may contain
+a mix of NULL and non-NULL values in different columns (so we can't simply
+filter all of them out).
+
+For this reason, the histograms are built so that in each bucket, each
+dimension contains either only NULL or only non-NULL values. Building the NULL-buckets
+happens as the first step in the build, by the create_null_buckets() function.
+The number of NULL buckets, as produced by this function, has a clear upper
+boundary (2^N) where N is the number of dimensions (attributes the histogram is
+built on). Or rather 2^K where K is the number of attributes that are not marked
+as not-NULL.
+
+The buckets with NULL dimensions are then subject to the same build algorithm
+(i.e. may be split into smaller buckets) just like any other bucket, but may
+only be split by non-NULL dimension.
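+
+For example, with two columns (a,b), both containing NULL values, the initial
+bucket may be split into up to 2^2 = 4 NULL-buckets:
+
+  (a non-NULL, b non-NULL)
+  (a non-NULL, b NULL)
+  (a NULL, b non-NULL)
+  (a NULL, b NULL)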
+
+
+Serialization
+-------------
+
+To store the histogram in the pg_mv_statistic catalog, it is serialized into a
+more efficient form. We also use this representation for estimation, i.e. we don't
+fully deserialize the histogram.
+
+For example the boundary values are deduplicated to minimize the required space.
+How much redundancy is there, actually? Let's assume there are no NULL values,
+so we start with a single bucket - in that case we have 2*N boundaries. Each
+time we split a bucket we introduce one new value (in the "middle" of one of
+the dimensions), and keep boundaries for all the other dimensions. So after K
+splits, we have up to
+
+ 2*N + K
+
+unique boundary values (we may have fewer values, if the same value is used for
+several splits). But after K splits we do have (K+1) buckets, so
+
+ (K+1) * 2 * N
+
+boundary values. Using e.g. N=4 and K=999, we arrive at these numbers:
+
+ 2*N + K = 1007
+ (K+1) * 2 * N = 8000
+
+which means a lot of redundancy. It's somewhat counter-intuitive that the number
+of distinct values does not really depend on the number of dimensions (except
+for the initial bucket, but that's negligible compared to the total).
+
+By deduplicating the values and replacing them with 16-bit indexes (uint16), we
+reduce the required space to
+
+ 1007 * 8 + 8000 * 2 ~= 24kB
+
+which is significantly less than 64kB required for the 'raw' histogram (assuming
+the values are 8B).
+
+While the bytea compression (pglz) might achieve the same reduction of space,
+the deduplicated representation is used to optimize the estimation by caching
+results of function calls for already visited values. This significantly
+reduces the number of calls to (often quite expensive) operators.
+
+Note: Of course, this reasoning only holds for histograms built by the algorithm
+that simply splits the buckets in half. Other histograms types (e.g. containing
+overlapping buckets) may behave differently and require different serialization.
+
+Serialized histograms are marked with a 'magic' constant, to make it easier to
+check that a bytea value really is a serialized histogram.
+
+
+varlena compression
+-------------------
+
+This serialization may however defeat the automatic varlena compression, because
+the array of unique values is placed at the beginning of the serialized form.
+That is exactly the chunk used by pglz to check whether the data is compressible,
+and it will probably decide it's not very compressible. This is similar to the
+issue we initially had with JSONB.
+
+Maybe storing buckets first would make it work, as the buckets may be better
+compressible.
+
+On the other hand the serialization is actually a context-aware compression,
+usually compressing to ~30% (or even less, with large data types). So the lack
+of additional pglz compression may be acceptable.
+
+
+Deserialization
+---------------
+
+The deserialization is not a perfect inverse of the serialization, as we keep
+the deduplicated arrays. This reduces the amount of memory and also allows
+optimizations during estimation (e.g. we can cache results for the distinct
+values, saving expensive function calls).
+
+
+Inspecting the histogram
+------------------------
+
+Inspecting the regular (per-attribute) histograms is trivial, as it's enough
+to select the columns from pg_stats - the data is encoded as anyarray, so we
+simply get the text representation of the array.
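+
+For example (hypothetical table and column names):
+
+  SELECT histogram_bounds FROM pg_stats
+   WHERE tablename = 't' AND attname = 'a';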
+
+With multivariate histograms it's not that simple due to the possible mix of
+data types in the histogram. It might be possible to produce a similar array-like
+text representation, but that would unnecessarily complicate further processing
+and analysis of the histogram. Instead, there's an SRF that allows
+access to lower/upper boundaries, frequencies etc.
+
+ SELECT * FROM pg_mv_histogram_buckets();
+
+It has two input parameters:
+
+ oid - OID of the histogram (pg_mv_statistic.staoid)
+ otype - type of output
+
+and produces a table with these columns:
+
+ - bucket ID (0...nbuckets-1)
+ - lower bucket boundaries (string array)
+ - upper bucket boundaries (string array)
+ - nulls only dimensions (boolean array)
+ - lower boundary inclusive (boolean array)
+ - upper boundary inclusive (boolean array)
+ - frequency (double precision)
+
+The 'otype' accepts three values, determining what will be returned in the
+lower/upper boundary arrays:
+
+ - 0 - values stored in the histogram, encoded as text
+ - 1 - indexes into the deduplicated arrays
+ - 2 - indexes into the deduplicated arrays, scaled to [0,1]
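+
+For example, to inspect a histogram on a hypothetical table 't' (with the
+statistics OID looked up in pg_mv_statistic as described above, and otype = 0,
+i.e. boundaries encoded as text):
+
+  SELECT * FROM pg_mv_histogram_buckets(
+    (SELECT staoid FROM pg_mv_statistic
+      WHERE starelid = 't'::regclass), 0);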
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index 5c5c59a..3e4f4d1 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -18,6 +18,8 @@ Currently we only have two kinds of multivariate statistics
(b) MCV lists (README.mcv)
+ (c) multivariate histograms (README.histogram)
+
Compatible clause types
-----------------------
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index d1da714..ffb76f4 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -13,11 +13,11 @@
*
*-------------------------------------------------------------------------
*/
+#include "postgres.h"
+#include "utils/array.h"
#include "common.h"
-#include "utils/array.h"
-
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
int natts,
VacAttrStats **vacattrstats);
@@ -52,7 +52,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
- int numrows_filtered = 0;
+ MVHistogram histogram = NULL;
+ int numrows_filtered = numrows;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -95,8 +96,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+ /* build a multivariate histogram on the columns */
+ if ((numrows_filtered > 0) && (stat->hist_enabled))
+ histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
+ update_mv_stats(stat->mvoid, deps, mcvlist, histogram, attrs, stats);
}
}
@@ -176,6 +181,8 @@ list_mv_stats(Oid relid)
info->deps_built = stats->deps_built;
info->mcv_enabled = stats->mcv_enabled;
info->mcv_built = stats->mcv_built;
+ info->hist_enabled = stats->hist_enabled;
+ info->hist_built = stats->hist_built;
result = lappend(result, info);
}
@@ -190,7 +197,6 @@ list_mv_stats(Oid relid)
return result;
}
-
/*
* Find attnims of MV stats using the mvoid.
*/
@@ -236,9 +242,16 @@ find_mv_attnums(Oid mvoid, Oid *relid)
}
+/*
+ * FIXME This adds statistics, but we need to drop statistics when the
+ * table is dropped. Not sure what to do when a column is dropped.
+ * Either we can (a) remove all stats on that column, (b) remove
+ * the column from defined stats and force rebuild, (c) remove the
+ * column on next ANALYZE. Or maybe something else?
+ */
void
update_mv_stats(Oid mvoid,
- MVDependencies dependencies, MCVList mcvlist,
+ MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
@@ -271,22 +284,34 @@ update_mv_stats(Oid mvoid,
values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
}
+ if (histogram != NULL)
+ {
+ bytea * data = serialize_mv_histogram(histogram, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stahist-1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stahist - 1]
+ = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
+ replaces[Anum_pg_mv_statistic_stahist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
+ nulls[Anum_pg_mv_statistic_hist_built-1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_hist_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
+ values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
diff --git a/src/backend/utils/mvstats/histogram.c b/src/backend/utils/mvstats/histogram.c
new file mode 100644
index 0000000..9e5620a
--- /dev/null
+++ b/src/backend/utils/mvstats/histogram.c
@@ -0,0 +1,2032 @@
+/*-------------------------------------------------------------------------
+ *
+ * histogram.c
+ * POSTGRES multivariate histograms
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/histogram.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+#include <math.h>
+
+
+static MVBucket create_initial_mv_bucket(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+static MVBucket select_bucket_to_partition(int nbuckets, MVBucket * buckets);
+
+static MVBucket partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues);
+
+static MVBucket copy_mv_bucket(MVBucket bucket, uint32 ndimensions);
+
+static void update_bucket_ndistinct(MVBucket bucket, int2vector *attrs,
+ VacAttrStats ** stats);
+
+static void update_dimension_ndistinct(MVBucket bucket, int dimension,
+ int2vector *attrs,
+ VacAttrStats ** stats,
+ bool update_boundaries);
+
+static void create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats);
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Each serialized bucket needs to store (in this order):
+ *
+ * - number of tuples (float)
+ * - min inclusive flags (ndim * sizeof(bool))
+ * - max inclusive flags (ndim * sizeof(bool))
+ * - null dimension flags (ndim * sizeof(bool))
+ * - min boundary indexes (2 * ndim * sizeof(uint16))
+ * - max boundary indexes (2 * ndim * sizeof(uint16))
+ *
+ * So in total:
+ *
+ * ndim * (4 * sizeof(uint16) + 3 * sizeof(bool)) +
+ * sizeof(float)
+ */
+#define BUCKET_SIZE(ndims) \
+ (ndims * (4 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float))
+
+/* pointers into a flat serialized bucket of BUCKET_SIZE(n) bytes */
+#define BUCKET_NTUPLES(b) ((float*)b)
+#define BUCKET_MIN_INCL(b,n) ((bool*)(b + sizeof(float)))
+#define BUCKET_MAX_INCL(b,n) (BUCKET_MIN_INCL(b,n) + n)
+#define BUCKET_NULLS_ONLY(b,n) (BUCKET_MAX_INCL(b,n) + n)
+#define BUCKET_MIN_INDEXES(b,n) ((uint16*)(BUCKET_NULLS_ONLY(b,n) + n))
+#define BUCKET_MAX_INDEXES(b,n) ((BUCKET_MIN_INDEXES(b,n) + n))
+
+/* can't split bucket with less than 10 rows */
+#define MIN_BUCKET_ROWS 10
+
+/*
+ * Data used while building the histogram.
+ */
+typedef struct HistogramBuildData {
+
+ float ndistinct; /* frequency of distinct values */
+
+ HeapTuple *rows; /* array of sample rows */
+ uint32 numrows; /* number of sample rows (array size) */
+
+ /*
+ * Number of distinct values in each dimension. This is used when
+ * building the histogram (and is not serialized/deserialized).
+ */
+ uint32 *ndistincts;
+
+} HistogramBuildData;
+
+typedef HistogramBuildData *HistogramBuild;
+
+/*
+ * Building a multivariate algorithm. In short it first creates a single
+ * bucket containing all the rows, and then repeatedly split is by first
+ * searching for the bucket / dimension most in need of a split.
+ *
+ * The current criteria is rather simple, chosen so that the algorithm
+ * produces buckets with about equal frequency and regular size.
+ *
+ * See the discussion at select_bucket_to_partition and partition_bucket
+ * for more details about the algorithm.
+ *
+ * The current algorithm works like this:
+ *
+ * build NULL-buckets (create_null_buckets)
+ *
+ * while [not reaching maximum number of buckets]
+ *
+ * choose bucket to partition (largest bucket)
+ * if no bucket to partition
+ * terminate the algorithm
+ *
+ * choose bucket dimension to partition (largest dimension)
+ * split the bucket into two buckets
+ */
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ int *ndistvalues;
+ Datum **distvalues;
+
+ MVHistogram histogram = (MVHistogram)palloc0(sizeof(MVHistogramData));
+
+ HeapTuple * rows_copy = (HeapTuple*)palloc0(numrows * sizeof(HeapTuple));
+ memcpy(rows_copy, rows, sizeof(HeapTuple) * numrows);
+
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ histogram->ndimensions = numattrs;
+
+ histogram->magic = MVSTAT_HIST_MAGIC;
+ histogram->type = MVSTAT_HIST_TYPE_BASIC;
+ histogram->nbuckets = 1;
+
+ /* create max buckets (better than repalloc for short-lived objects) */
+ histogram->buckets
+ = (MVBucket*)palloc0(MVSTAT_HIST_MAX_BUCKETS * sizeof(MVBucket));
+
+ /* create the initial bucket, covering the whole sample set */
+ histogram->buckets[0]
+ = create_initial_mv_bucket(numrows, rows_copy, attrs, stats);
+
+ /*
+ * Collect info on distinct values in each dimension (used later
+ * to select dimension to partition).
+ */
+ ndistvalues = (int*)palloc0(sizeof(int) * numattrs);
+ distvalues = (Datum**)palloc0(sizeof(Datum*) * numattrs);
+
+ for (i = 0; i < numattrs; i++)
+ {
+ int j;
+ int nvals;
+ Datum *tmp;
+
+ SortSupportData ssup;
+ StdAnalyzeData *mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ nvals = 0;
+ tmp = (Datum*)palloc0(sizeof(Datum) * numrows);
+
+ for (j = 0; j < numrows; j++)
+ {
+ bool isnull;
+
+ /* fetch the value of the i-th attribute from this sample row */
+ Datum value = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &isnull);
+
+ if (isnull)
+ continue;
+
+ tmp[nvals++] = value;
+ }
+
+ /* do the sort and stuff only if there are non-NULL values */
+ if (nvals > 0)
+ {
+ /* sort the array of values */
+ qsort_arg((void *) tmp, nvals, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /* count distinct values */
+ ndistvalues[i] = 1;
+ for (j = 1; j < nvals; j++)
+ if (compare_scalars_simple(&tmp[j], &tmp[j-1], &ssup) != 0)
+ ndistvalues[i] += 1;
+
+ /* allocate exactly the space needed for the distinct values (counted above) */
+ distvalues[i] = (Datum*)palloc0(sizeof(Datum) * ndistvalues[i]);
+
+ /* now collect distinct values into the array */
+ distvalues[i][0] = tmp[0];
+ ndistvalues[i] = 1;
+
+ for (j = 1; j < nvals; j++)
+ {
+ if (compare_scalars_simple(&tmp[j], &tmp[j-1], &ssup) != 0)
+ {
+ distvalues[i][ndistvalues[i]] = tmp[j];
+ ndistvalues[i] += 1;
+ }
+ }
+ }
+
+ pfree(tmp);
+ }
+
+ /*
+ * The initial bucket may contain NULL values, so we have to create
+ * buckets with NULL-only dimensions.
+ *
+ * FIXME We may need up to 2^ndims buckets - check that there are
+ * enough buckets (MVSTAT_HIST_MAX_BUCKETS >= 2^ndims).
+ */
+ create_null_buckets(histogram, 0, attrs, stats);
+
+ while (histogram->nbuckets < MVSTAT_HIST_MAX_BUCKETS)
+ {
+ MVBucket bucket = select_bucket_to_partition(histogram->nbuckets,
+ histogram->buckets);
+
+ /* no more buckets to partition */
+ if (bucket == NULL)
+ break;
+
+ histogram->buckets[histogram->nbuckets]
+ = partition_bucket(bucket, attrs, stats,
+ ndistvalues, distvalues);
+
+ histogram->nbuckets += 1;
+ }
+
+ /* finalize the frequencies etc. */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ HistogramBuild build_data
+ = ((HistogramBuild)histogram->buckets[i]->build_data);
+
+ /*
+ * The frequency has to be computed from the whole sample, in
+ * case some of the rows were used for MCV (and thus are missing
+ * from the histogram).
+ */
+ histogram->buckets[i]->ntuples
+ = (build_data->numrows * 1.0) / numrows_total;
+ }
+
+ return histogram;
+}
+
+/* fetch the histogram (as a bytea) from the pg_mv_statistic catalog */
+MVSerializedHistogram
+load_mv_histogram(Oid mvoid)
+{
+ bool isnull = false;
+ Datum histogram;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* Fetch the pg_mv_statistic tuple for the given statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->hist_enabled && mvstat->hist_built);
+#endif
+
+ histogram = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stahist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_histogram(DatumGetByteaP(histogram));
+}
+
+/* print some basic info about the histogram */
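+/*
+ * A usage sketch, reading the serialized histogram straight from the
+ * catalog (this assumes the statistics actually have a histogram built):
+ *
+ *   SELECT staname, pg_mv_stats_histogram_info(stahist)
+ *     FROM pg_mv_statistic;
+ */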
+Datum
+pg_mv_stats_histogram_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVSerializedHistogram hist = deserialize_mv_histogram(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nbuckets=%d", hist->nbuckets);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+
+/* used to pass context into bsearch() */
+static SortSupport ssup_private = NULL;
+
+/*
+ * Serialize the MV histogram into a bytea value. The basic algorithm is quite
+ * simple, and mostly mimics the MCV serialization:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ *
+ * (a) collect all (non-NULL) attribute values from all buckets
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all buckets
+ *
+ * (a) replace min/max values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, as we're mixing different
+ * datatypes, and we need to use the right operators to compare/sort them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
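+ *
+ * For example (hypothetical values): if the deduplicated array for some
+ * dimension ends up as {1, 5, 9}, a bucket with min=5 and max=9 in that
+ * dimension stores just the uint16 indexes 1 and 2.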
+ *
+ *
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
+bytea *
+serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i = 0, j = 0;
+ Size total_length = 0;
+
+ bytea *output = NULL;
+ char *data = NULL;
+
+ int nbuckets = histogram->nbuckets;
+ int ndims = histogram->ndimensions;
+
+ /* allocated for serialized bucket data */
+ int bucketsize = BUCKET_SIZE(ndims);
+ char *bucket = palloc0(bucketsize);
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension separately */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /*
+ * Allocate space for all min/max values, including NULLs
+ * (we won't use them, but we don't know how many there are),
+ * and then collect all non-NULL values.
+ */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * nbuckets * 2);
+
+ for (j = 0; j < histogram->nbuckets; j++)
+ {
+ /* skip buckets where this dimension is NULL-only */
+ if (! histogram->buckets[j]->nullsonly[i])
+ {
+ values[i][counts[i]] = histogram->buckets[j]->min[i];
+ counts[i] += 1;
+
+ values[i][counts[i]] = histogram->buckets[j]->max[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* make sure we fit into uint16 */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typlen > 0)
+ /* byval or byref, but with fixed length (name, tid, ...) */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring - strlen, plus 1B for the '\0' terminator */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j])) + 1;
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * histogram, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nbuckets (4B)
+ * - info (ndim * sizeof(DimensionInfo))
+ * - arrays of values for each dimension
+ * - serialized buckets (nbuckets * bucketsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data (and buckets).
+ */
+ total_length = (sizeof(int32) + offsetof(MVHistogramData, buckets)
+ + ndims * sizeof(DimensionInfo)
+ + nbuckets * bucketsize);
+
+ /* account for the deduplicated data */
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 10MB */
+ if (total_length > (10 * 1024 * 1024))
+ elog(ERROR, "serialized histogram exceeds 10MB (%ld > %d)",
+ total_length, (10 * 1024 * 1024));
+
+ /* allocate space for the serialized histogram list, set header */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write data */
+ data = VARDATA(output);
+
+ memcpy(data, histogram, offsetof(MVHistogramData, buckets));
+ data += offsetof(MVHistogramData, buckets);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typlen > 0)
+ {
+ /* passed by value or by reference, but fixed length */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the histogram buckets */
+ for (i = 0; i < nbuckets; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - bucketsize);
+
+ /* reset the values for each item */
+ memset(bucket, 0, bucketsize);
+
+ *BUCKET_NTUPLES(bucket) = histogram->buckets[i]->ntuples;
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! histogram->buckets[i]->nullsonly[j])
+ {
+ uint16 idx;
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ /* min boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->min[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MIN_INDEXES(bucket, ndims)[j] = idx;
+
+ /* max boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->max[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MAX_INDEXES(bucket, ndims)[j] = idx;
+ }
+ }
+
+ /* copy flags (nulls, min/max inclusive) */
+ memcpy(BUCKET_NULLS_ONLY(bucket, ndims),
+ histogram->buckets[i]->nullsonly, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MIN_INCL(bucket, ndims),
+ histogram->buckets[i]->min_inclusive, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MAX_INCL(bucket, ndims),
+ histogram->buckets[i]->max_inclusive, sizeof(bool) * ndims);
+
+ /* copy the item into the array */
+ memcpy(data, bucket, bucketsize);
+
+ data += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ /* FIXME free the values/counts arrays here */
+
+ return output;
+}
+
+/*
+ * Returns histogram in a partially-serialized form (keeps the boundary
+ * values deduplicated, so that it's possible to optimize the estimation
+ * part by caching function call results between buckets etc.).
+ */
+MVSerializedHistogram
+deserialize_mv_histogram(bytea * data)
+{
+ int i = 0, j = 0;
+
+ Size expected_size;
+ char *tmp = NULL;
+
+ MVSerializedHistogram histogram;
+ DimensionInfo *info;
+
+ int nbuckets;
+ int ndims;
+ int bucketsize;
+
+ /* temporary deserialization buffer */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVSerializedHistogramData,buckets))
+ elog(ERROR, "invalid histogram size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVSerializedHistogramData,buckets));
+
+ /* read the histogram header */
+ histogram
+ = (MVSerializedHistogram)palloc(sizeof(MVSerializedHistogramData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(histogram, tmp, offsetof(MVSerializedHistogramData, buckets));
+ tmp += offsetof(MVSerializedHistogramData, buckets);
+
+ if (histogram->magic != MVSTAT_HIST_MAGIC)
+ elog(ERROR, "invalid histogram magic %d (expected %dd)",
+ histogram->magic, MVSTAT_HIST_MAGIC);
+
+ if (histogram->type != MVSTAT_HIST_TYPE_BASIC)
+ elog(ERROR, "invalid histogram type %d (expected %dd)",
+ histogram->type, MVSTAT_HIST_TYPE_BASIC);
+
+ nbuckets = histogram->nbuckets;
+ ndims = histogram->ndimensions;
+ bucketsize = BUCKET_SIZE(ndims);
+
+ Assert((nbuckets > 0) && (nbuckets <= MVSTAT_HIST_MAX_BUCKETS));
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Compute the expected size. This is still incomplete at this
+ * point, as we have yet to add the sizes of the value arrays
+ * (from the DimensionInfo records).
+ */
+ expected_size = offsetof(MVSerializedHistogramData,buckets) +
+ ndims * sizeof(DimensionInfo) +
+ (nbuckets * bucketsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - the basic structure is not corrupted */
+
+ /* now let's allocate a single buffer for all the values and counts */
+
+ bufflen = (sizeof(int) + sizeof(Datum*)) * ndims;
+ for (i = 0; i < ndims; i++)
+ {
+ /* don't allocate space for byval types with size matching a Datum */
+ if (! (info[i].typbyval && (info[i].typlen == sizeof(Datum))))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+ }
+
+ /* also, include space for the result, tracking the buckets */
+ bufflen += nbuckets * (
+ sizeof(MVSerializedBucket) + /* bucket pointer */
+ sizeof(MVSerializedBucketData)); /* bucket data */
+
+ buff = palloc0(bufflen);
+ ptr = buff;
+
+ histogram->nvalues = (int*)ptr;
+ ptr += (sizeof(int) * ndims);
+
+ histogram->values = (Datum**)ptr;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mcv_list(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. This needs to
+ * copy the pieces.
+ *
+ * TODO same as in MCV deserialization / consider moving to common.c
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ histogram->nvalues[i] = info[i].nvalues;
+
+ if (info[i].typbyval && info[i].typlen == sizeof(Datum))
+ {
+ /* passed by value / Datum - simply reuse the array */
+ histogram->values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ /* all the varlena data need a chunk from the buffer */
+ histogram->values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ if (info[i].typbyval)
+ {
+ /* passed by value, but smaller than a Datum */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value into the array */
+ memcpy(&histogram->values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ histogram->buckets = (MVSerializedBucket*)ptr;
+ ptr += (sizeof(MVSerializedBucket) * nbuckets);
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ MVSerializedBucket bucket = (MVSerializedBucket)ptr;
+ ptr += sizeof(MVSerializedBucketData);
+
+ bucket->ntuples = *BUCKET_NTUPLES(tmp);
+ bucket->nullsonly = BUCKET_NULLS_ONLY(tmp, ndims);
+ bucket->min_inclusive = BUCKET_MIN_INCL(tmp, ndims);
+ bucket->max_inclusive = BUCKET_MAX_INCL(tmp, ndims);
+
+ bucket->min = BUCKET_MIN_INDEXES(tmp, ndims);
+ bucket->max = BUCKET_MAX_INDEXES(tmp, ndims);
+
+ histogram->buckets[i] = bucket;
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+
+ tmp += bucketsize;
+ }
+
+ /* at this point we expect to match the expected_size exactly */
+ Assert((tmp - VARDATA(data)) == expected_size);
+
+ /* we should exhaust the output buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ return histogram;
+}
+
+/*
+ * Build the initial bucket, which will be then split into smaller ones.
+ */
+static MVBucket
+create_initial_mv_bucket(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+ HistogramBuild data = NULL;
+
+ /* TODO allocate bucket as a single piece, including all the fields. */
+ MVBucket bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ Assert(numrows > 0);
+ Assert(rows != NULL);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /* allocate the per-dimension arrays */
+
+ /* flags for null-only dimensions */
+ bucket->nullsonly = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ bucket->min_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+ bucket->max_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* lower/upper boundaries */
+ bucket->min = (Datum*)palloc0(numattrs * sizeof(Datum));
+ bucket->max = (Datum*)palloc0(numattrs * sizeof(Datum));
+
+ /* build-data */
+ data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /* number of distinct values (per dimension) */
+ data->ndistincts = (uint32*)palloc0(numattrs * sizeof(uint32));
+
+ /* all the sample rows fall into the initial bucket */
+ data->numrows = numrows;
+ data->rows = rows;
+
+ bucket->build_data = data;
+
+ /*
+ * Update the number of distinct combinations in the bucket (which
+ * we use when selecting the bucket to partition), and then the
+ * number of distinct values for each dimension (which we use when
+ * choosing which dimension to split).
+ */
+ update_bucket_ndistinct(bucket, attrs, stats);
+
+ /* Update ndistinct (and also set min/max) for all dimensions. */
+ for (i = 0; i < numattrs; i++)
+ update_dimension_ndistinct(bucket, i, attrs, stats, true);
+
+ return bucket;
+}
+
+/*
+ * Choose the bucket to partition next.
+ *
+ * The current criterion is rather simple, chosen so that the algorithm
+ * produces buckets with about equal frequency and regular size. We
+ * select the eligible bucket with the most sampled rows, and then
+ * split it by the longest dimension.
+ *
+ * The distinct values are mapped uniformly to the [0,1] interval, and
+ * this is used to compute the length of each value range.
+ *
+ * NOTE: This is not the same array used for deduplication, as this
+ * contains values for all the tuples from the sample, not just
+ * the boundary values.
+ *
+ * Returns either pointer to the bucket selected to be partitioned,
+ * or NULL if there are no buckets that may be split (i.e. all buckets
+ * contain a single distinct value).
+ *
+ * TODO Consider other partitioning criteria (v-optimal, maxdiff etc.).
+ * For example use the "bucket volume" (product of dimension
+ * lengths) to select the bucket.
+ *
+ * We need buckets containing about the same number of tuples (so
+ * about the same frequency), as that limits the error when we
+ * match the bucket partially (in that case use 1/2 the bucket).
+ *
+ * We also need buckets with "regular" size, i.e. not "narrow" in
+ * some dimensions and "wide" in the others, because that makes
+ * partial matches more likely and increases the estimation error,
+ * especially when the clauses match many buckets partially. This
+ * is especially serious for OR-clauses, because in that case any
+ * of them may add the bucket as a (partial) match. With AND-clauses
+ * all the clauses have to match the bucket, which makes this issue
+ * somewhat less pressing.
+ *
+ * For example this table:
+ *
+ * CREATE TABLE t AS SELECT i AS a, i AS b
+ * FROM generate_series(1,1000000) s(i);
+ * ALTER TABLE t ADD STATISTICS (histogram) ON (a,b);
+ * ANALYZE t;
+ *
+ * It's a very specific (and perhaps artificial) example, because
+ * every bucket always has exactly the same number of distinct
+ * values in all dimensions, which makes the partitioning tricky.
+ *
+ * Then:
+ *
+ * SELECT * FROM t WHERE a < 10 AND b < 10;
+ *
+ * is estimated to return ~120 rows, while in reality it returns 9.
+ *
+ * QUERY PLAN
+ * ----------------------------------------------------------------
+ * Seq Scan on t (cost=0.00..19425.00 rows=117 width=8)
+ * (actual time=0.185..270.774 rows=9 loops=1)
+ * Filter: ((a < 10) AND (b < 10))
+ * Rows Removed by Filter: 999991
+ *
+ * while the query using OR clauses is estimated like this:
+ *
+ * QUERY PLAN
+ * ----------------------------------------------------------------
+ * Seq Scan on t (cost=0.00..19425.00 rows=8100 width=8)
+ * (actual time=0.118..189.919 rows=9 loops=1)
+ * Filter: ((a < 10) OR (b < 10))
+ * Rows Removed by Filter: 999991
+ *
+ * which is clearly much worse. This happens because the histogram
+ * contains buckets like this:
+ *
+ * bucket 592 [3 30310] [30134 30593] => [0.000233]
+ *
+ * i.e. the length of "a" dimension is (30310-3)=30307, while the
+ * length of "b" is (30593-30134)=459. So the "b" dimension is much
+ * narrower than "a". Of course, there are buckets where "b" is the
+ * wider dimension.
+ *
+ * This is partially mitigated by selecting the "longest" dimension
+ * in partition_bucket() but that only happens after we already
+ * selected the bucket. So if we never select the bucket, we can't
+ * really fix it there.
+ *
+ * The other reason why this particular example behaves so poorly
+ * is due to the way we split the partition in partition_bucket().
+ * Currently we attempt to divide the bucket into two parts with
+ * the same number of sampled tuples (frequency), but that does not
+ * work well when all the tuples are squashed on one end of the
+ * bucket (e.g. exactly at the diagonal, as a=b). In that case we
+ * split the bucket into a tiny bucket on the diagonal, and a huge
+ * remaining part of the bucket, which is still going to be narrow
+ * and we're unlikely to fix that.
+ *
+ * So perhaps we need two partitioning strategies - one aiming to
+ * split buckets with high frequency (number of sampled rows), the
+ * other aiming to split "large" buckets. And alternating between
+ * them, somehow.
+ *
+ * TODO Allowing the bucket to degenerate to a single combination of
+ * values makes it a rather strange MCV list. Maybe we should use
+ * a higher lower boundary, or make the selection criteria more
+ * complex (e.g. consider the number of rows in the bucket, etc.).
+ *
+ * That however is different from buckets 'degenerated' only for
+ * some dimensions (e.g. half of them), which is perfectly
+ * appropriate for statistics on a combination of low and high
+ * cardinality columns.
+ *
+ * TODO Consider using a similar lower boundary for the row count as
+ * for simple histograms, i.e. 300 tuples per bucket.
+ */
+static MVBucket
+select_bucket_to_partition(int nbuckets, MVBucket * buckets)
+{
+ int i;
+ int numrows = 0;
+ MVBucket bucket = NULL;
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ HistogramBuild data = (HistogramBuild)buckets[i]->build_data;
+ /* select the eligible bucket with the most sampled rows */
+ if ((data->ndistinct > 2) &&
+ (data->numrows > numrows) &&
+ (data->numrows >= MIN_BUCKET_ROWS))
+ {
+ bucket = buckets[i];
+ numrows = data->numrows;
+ }
+ }
+
+ /* may be NULL if there are no buckets eligible for partitioning */
+ return bucket;
+}
+
+/*
+ * A simple bucket partitioning implementation - we choose the longest
+ * bucket dimension, measured using the array of distinct values built
+ * at the very beginning of the build.
+ *
+ * We map all the distinct values to a [0,1] interval, uniformly
+ * distributed, and then use this to measure length. It's essentially
+ * a number of distinct values within the range, normalized to [0,1].
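+ *
+ * For example (hypothetical counts): if a dimension has 100 distinct
+ * values in the sample, and the bucket boundaries are the 20th and the
+ * 70th of those values, the normalized length is (70 - 20) / 100 = 0.5.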
+ *
+ * Then we choose a 'middle' value splitting the bucket into two parts
+ * with roughly the same frequency.
+ *
+ * This splits the bucket by tweaking the existing one, and returning
+ * the new bucket (essentially shrinking the existing one in-place and
+ * returning the other "half" as a new bucket). The caller is responsible
+ * for adding the new bucket into the list of buckets.
+ *
+ * There are multiple histogram options, centered around the partitioning
+ * criteria, specifying both how to choose a bucket and the dimension
+ * most in need of a split. For a nice summary and general overview, see
+ * "rK-Hist : an R-Tree based histogram for multi-dimensional selectivity
+ * estimation" thesis by J. A. Lopez, Concordia University, p.34-37 (and
+ * possibly p. 32-34 for explanation of the terms).
+ *
+ * TODO It requires care to prevent splitting only one dimension and not
+ * splitting another one at all (which might happen easily in case
+ * of strongly dependent columns - e.g. y=x). The current algorithm
+ * minimizes this, but may still happen for perfectly dependent
+ * examples (when all the dimensions have equal length, the first
+ * one will be selected).
+ *
+ * TODO Should probably consider statistics target for the columns (e.g.
+ * to split dimensions with higher statistics target more frequently).
+ */
+static MVBucket
+partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues)
+{
+ int i;
+ int dimension;
+ int numattrs = attrs->dim1;
+
+ Datum split_value;
+ MVBucket new_bucket;
+ HistogramBuild new_data;
+
+ /* needed for sort, when looking for the split value */
+ bool isNull;
+ int nvalues = 0;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ StdAnalyzeData * mystats = NULL;
+ ScalarItem * values = (ScalarItem*)palloc0(data->numrows * sizeof(ScalarItem));
+ SortSupportData ssup;
+
+ /* looking for the split value */
+ // int ndistinct = 1; /* number of distinct values below current value */
+ int nrows = 1; /* number of rows below current value */
+ double delta;
+
+ /* needed when splitting the values */
+ HeapTuple * oldrows = data->rows;
+ int oldnrows = data->numrows;
+
+ /*
+ * We can't split buckets with a single distinct value (this also
+ * disqualifies NULL-only dimensions). Also, there have to be multiple
+ * sample rows (otherwise there could not be multiple distinct values).
+ */
+ Assert(data->ndistinct > 1);
+ Assert(data->numrows > 1);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Look for the next dimension to split.
+ */
+ delta = 0.0;
+ dimension = -1;
+
+ for (i = 0; i < numattrs; i++)
+ {
+ Datum *a, *b;
+
+ mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ /* can't split NULL-only dimension */
+ if (bucket->nullsonly[i])
+ continue;
+
+ /* can't split dimension with a single ndistinct value */
+ if (data->ndistincts[i] <= 1)
+ continue;
+
+ /* sort support for the bsearch_comparator */
+ ssup_private = &ssup;
+
+ /* search for min boundary in the distinct list */
+ a = (Datum*)bsearch(&bucket->min[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), bsearch_comparator);
+
+ b = (Datum*)bsearch(&bucket->max[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), bsearch_comparator);
+
+ /* if this dimension is 'longer' than the best so far, partition by it */
+ if (((b-a)*1.0 / ndistvalues[i]) > delta)
+ {
+ delta = ((b-a)*1.0 / ndistvalues[i]);
+ dimension = i;
+ }
+ }
+
+ /*
+ * If we haven't found a dimension here, we've done something
+ * wrong in select_bucket_to_partition.
+ */
+ Assert(dimension != -1);
+
+ /*
+ * Walk through the selected dimension, collect and sort the values
+ * and then choose the value to use as the new boundary.
+ */
+ mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (i = 0; i < data->numrows; i++)
+ {
+ /* remember the index of the sample row, to make the partitioning simpler */
+ values[nvalues].value = heap_getattr(data->rows[i], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+ values[nvalues].tupno = i;
+
+ /* no NULL values allowed here (we don't do splits by null-only dimensions) */
+ Assert(!isNull);
+
+ nvalues++;
+ }
+
+ /* sort the array of values */
+ qsort_arg((void *) values, nvalues, sizeof(ScalarItem),
+ compare_scalars_partition, (void *) &ssup);
+
+ /*
+ * We know there are data->ndistincts[dimension] distinct values
+ * in this dimension, and we want to split this into half, so walk
+ * through the array and stop once we see (ndistinct/2) values.
+ *
+ * We always choose the "next" value, i.e. (n/2+1)-th distinct value,
+ * and use it as an exclusive upper boundary (and inclusive lower
+ * boundary).
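+ *
+ * For example, with (hypothetical) sampled values {1,1,2,2,3,3} the
+ * loop below picks 2 as the split value, keeping the two rows with
+ * value 1 in the original bucket (exclusive upper bound 2) and moving
+ * the remaining four rows into the new bucket.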
+ *
+ * TODO Maybe we should use "average" of the two middle distinct
+ * values (at least for even distinct counts), but that would
+ * require being able to do an average (which does not work
+ * for non-arithmetic types).
+ *
+ * TODO Another option is to look for a split that'd give about
+ * 50% tuples (not distinct values) in each partition. That
+ * might work better when there are a few very frequent
+ * values, and many rare ones.
+ */
+ delta = fabs(data->numrows);
+ split_value = values[0].value;
+
+ for (i = 1; i < data->numrows; i++)
+ {
+ if (values[i].value != values[i-1].value)
+ {
+ /* are we closer to splitting the bucket in half? */
+ if (fabs(i - data->numrows/2.0) < delta)
+ {
+ /* let's assume we'll use this value for the split */
+ split_value = values[i].value;
+ delta = fabs(i - data->numrows/2.0);
+ nrows = i;
+ }
+ }
+ }
+
+ Assert(nrows > 0);
+ Assert(nrows < data->numrows);
+
+ /* create the new bucket as an (incomplete) copy of the one being partitioned */
+ new_bucket = copy_mv_bucket(bucket, numattrs);
+ new_data = (HistogramBuild)new_bucket->build_data;
+
+ /*
+ * Do the actual split of the chosen dimension, using the split value as the
+ * upper bound for the existing bucket, and lower bound for the new one.
+ */
+ bucket->max[dimension] = split_value;
+ new_bucket->min[dimension] = split_value;
+
+ bucket->max_inclusive[dimension] = false;
+ new_bucket->min_inclusive[dimension] = true;
+
+ /*
+ * Redistribute the sample tuples using the 'ScalarItem->tupno'
+ * index. We know 'nrows' rows should remain in the original
+ * bucket and the rest goes to the new one.
+ */
+
+ data->rows = (HeapTuple*)palloc0(nrows * sizeof(HeapTuple));
+ new_data->rows = (HeapTuple*)palloc0((oldnrows - nrows) * sizeof(HeapTuple));
+
+ data->numrows = nrows;
+ new_data->numrows = (oldnrows - nrows);
+
+ /*
+ * The first nrows should go to the first bucket, the rest should
+ * go to the new one. Use the tupno field to get the actual HeapTuple
+ * row from the original array of sample rows.
+ */
+ for (i = 0; i < nrows; i++)
+ memcpy(&data->rows[i], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ for (i = nrows; i < oldnrows; i++)
+ memcpy(&new_data->rows[i-nrows], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(new_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each partition.
+ */
+ for (i = 0; i < numattrs; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(new_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+ pfree(values);
+
+ return new_bucket;
+}
+
+/*
+ * Copy a histogram bucket. The copy does not include the build-time
+ * data, i.e. sampled rows etc.
+ */
+static MVBucket
+copy_mv_bucket(MVBucket bucket, uint32 ndimensions)
+{
+ /* TODO allocate as a single piece (including all the fields) */
+ MVBucket new_bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+ HistogramBuild data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /*
+ * Copy only the attributes that will stay the same after the split;
+ * we'll recompute the rest after the split.
+ */
+
+ /* allocate the per-dimension arrays */
+ new_bucket->nullsonly = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ new_bucket->min_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+ new_bucket->max_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* lower/upper boundaries */
+ new_bucket->min = (Datum*)palloc0(ndimensions * sizeof(Datum));
+ new_bucket->max = (Datum*)palloc0(ndimensions * sizeof(Datum));
+
+ /* copy data */
+ memcpy(new_bucket->nullsonly, bucket->nullsonly, ndimensions * sizeof(bool));
+
+ memcpy(new_bucket->min_inclusive, bucket->min_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->min, bucket->min, ndimensions*sizeof(Datum));
+
+ memcpy(new_bucket->max_inclusive, bucket->max_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->max, bucket->max, ndimensions*sizeof(Datum));
+
+ /* allocate and copy the interesting part of the build data */
+ data->ndistincts = (uint32*)palloc0(ndimensions * sizeof(uint32));
+
+ new_bucket->build_data = data;
+
+ return new_bucket;
+}
+
+/*
+ * Counts the number of distinct combinations of values in the bucket.
+ * The values are collected into an array of sort items and sorted using
+ * the multi-column sort support, so the distinct combinations can be
+ * counted by comparing neighboring items of the sorted array.
+ *
+ * TODO This might evaluate and store the distinct counts for all
+ * possible attribute combinations. The assumption is this might be
+ * useful for estimating things like GROUP BY cardinalities (e.g.
+ * in cases when some buckets contain a lot of low-frequency
+ * combinations, and other buckets contain few high-frequency ones).
+ *
+ * But it's unclear whether it's worth the price. Computing this
+ * is actually quite cheap, because it may be evaluated at the very
+ * end, when the buckets are rather small (so sorting it in 2^N ways
+ * is not a big deal). Assuming the partitioning algorithm does not
+ * use these values to do the decisions, of course (the current
+ * algorithm does not).
+ *
+ * The overhead with storing, fetching and parsing the data is more
+ * concerning - adding 2^N values per bucket (even if it's just
+ * a 1B or 2B value) would significantly bloat the histogram, and
+ * thus the impact on optimizer. Which is not really desirable.
+ *
+ * TODO This only updates the ndistinct for the sample (or bucket), but
+ * we eventually need an estimate of the total number of distinct
+ * values in the dataset. It's possible to either use the current
+ * 1D approach (i.e., if it's more than 10% of the sample, assume
+ * it's proportional to the number of rows). Or it's possible to
+ * implement the estimator suggested in the article, supposedly
+ * giving 'optimal' estimates (w.r.t. probability of error).
+ */
+static void
+update_bucket_ndistinct(MVBucket bucket, int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ int numrows = data->numrows;
+
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * We could collect this while walking through all the attributes
+ * above (this way we have to call heap_getattr twice).
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(numrows * sizeof(Datum) * numattrs);
+ bool *isnull = (bool*)palloc0(numrows * sizeof(bool) * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* prepare the sort functions for all the dimensions */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* collect the values */
+ for (i = 0; i < numrows; i++)
+ for (j = 0; j < numattrs; j++)
+ items[i].values[j]
+ = heap_getattr(data->rows[i], attrs->values[j],
+ stats[j]->tupDesc, &items[i].isnull[j]);
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ data->ndistinct = 1;
+
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ data->ndistinct += 1;
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+}
+
+/*
+ * Count distinct values per bucket dimension.
+ */
+static void
+update_dimension_ndistinct(MVBucket bucket, int dimension, int2vector *attrs,
+ VacAttrStats ** stats, bool update_boundaries)
+{
+ int j;
+ int nvalues = 0;
+ bool isNull;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ Datum * values = (Datum*)palloc0(data->numrows * sizeof(Datum));
+ SortSupportData ssup;
+
+ StdAnalyzeData * mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* we may already know this is a NULL-only dimension */
+ if (bucket->nullsonly[dimension])
+ data->ndistincts[dimension] = 1;
+
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (j = 0; j < data->numrows; j++)
+ {
+ values[nvalues] = heap_getattr(data->rows[j], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+
+ /* ignore NULL values */
+ if (! isNull)
+ nvalues++;
+ }
+
+ /* there's always at least 1 distinct value (may be NULL) */
+ data->ndistincts[dimension] = 1;
+
+ /*
+ * If there are only NULL values in the column, mark the dimension
+ * as NULL-only and bail out.
+ */
+ if (nvalues == 0)
+ {
+ pfree(values);
+ bucket->nullsonly[dimension] = true;
+ return;
+ }
+
+ /* sort the array (pass-by-value datums) */
+ qsort_arg((void *) values, nvalues, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /*
+ * Update min/max boundaries to the smallest bounding box. Generally, this
+ * needs to be done only when constructing the initial bucket.
+ */
+ if (update_boundaries)
+ {
+ /* store the min/max values */
+ bucket->min[dimension] = values[0];
+ bucket->min_inclusive[dimension] = true;
+
+ bucket->max[dimension] = values[nvalues-1];
+ bucket->max_inclusive[dimension] = true;
+ }
+
+ /*
+ * Walk through the array and count distinct values by comparing
+ * succeeding values.
+ *
+ * FIXME This only works for pass-by-value types (i.e. not VARCHARs
+ * etc.). Although thanks to the deduplication it might work
+ * even for those types (equal values will get the same item
+ * in the deduplicated array).
+ */
+ for (j = 1; j < nvalues; j++)
+ {
+ if (values[j] != values[j-1])
+ data->ndistincts[dimension] += 1;
+ }
+
+ pfree(values);
+}
+
+/*
+ * A properly built histogram must not contain buckets mixing NULL and
+ * non-NULL values in a single dimension. Each dimension may either be
+ * marked as 'nulls only', and thus containing only NULL values, or
+ * it must not contain any NULL values.
+ *
+ * Therefore, if the sample contains NULL values in any of the columns,
+ * it's necessary to build those NULL-buckets. This is done in an
+ * iterative way using this algorithm, operating on a single bucket:
+ *
+ * (1) Check that all dimensions are well-formed (not mixing NULL
+ * and non-NULL values).
+ *
+ * (2) If all dimensions are well-formed, terminate.
+ *
+ * (3) If the dimension contains only NULL values, but is not
+ * marked as NULL-only, mark it as NULL-only and run the
+ * algorithm again (on this bucket).
+ *
+ * (4) If the dimension mixes NULL and non-NULL values, split the
+ * bucket into two parts - one with NULL values, one with
+ * non-NULL values (replacing the current one). Then run
+ * the algorithm on both buckets.
+ *
+ * This is executed in a recursive manner, but the number of executions
+ * should be quite low - limited by the number of NULL-buckets. Also,
+ * in each branch the number of nested calls is limited by the number
+ * of dimensions (attributes) of the histogram.
+ *
+ * At the end, there should be buckets with no mixed dimensions. The
+ * number of buckets produced by this algorithm is rather limited - with
+ * N dimensions, there may be only 2^N such buckets (each dimension may
+ * be either NULL or non-NULL). So with 8 dimensions (current value of
+ * MVSTATS_MAX_DIMENSIONS) there may be only 256 such buckets.
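+ *
+ * For example, with two dimensions (a,b) where both columns contain
+ * NULL values, this may produce up to four buckets - both dimensions
+ * non-NULL, only (a) NULL-only, only (b) NULL-only, and both
+ * dimensions NULL-only.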
+ *
+ * After this, a 'regular' bucket-split algorithm shall run, further
+ * optimizing the histogram.
+ */
+static void
+create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int null_dim = -1;
+ int null_count = 0;
+ bool null_found = false;
+ MVBucket bucket, null_bucket;
+ int null_idx, curr_idx;
+ HistogramBuild data, null_data;
+
+ /* remember original values from the bucket */
+ int numrows;
+ HeapTuple *oldrows = NULL;
+
+ Assert(bucket_idx < histogram->nbuckets);
+ Assert(histogram->ndimensions == attrs->dim1);
+
+ bucket = histogram->buckets[bucket_idx];
+ data = (HistogramBuild)bucket->build_data;
+
+ numrows = data->numrows;
+ oldrows = data->rows;
+
+ /*
+ * Walk through all rows / dimensions, and stop once we find NULL
+ * in a dimension not yet marked as NULL-only.
+ */
+ for (i = 0; i < data->numrows; i++)
+ {
+ /*
+ * FIXME We don't need to start from the first attribute
+ * here - we can start from the last known dimension.
+ */
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ /* Is this a NULL-only dimension? If yes, skip. */
+ if (bucket->nullsonly[j])
+ continue;
+
+ /* found a NULL in that dimension? */
+ if (heap_attisnull(data->rows[i], attrs->values[j]))
+ {
+ null_found = true;
+ null_dim = j;
+ break;
+ }
+ }
+
+ /* terminate if we found attribute with NULL values */
+ if (null_found)
+ break;
+ }
+
+ /* no regular dimension contains NULL values => we're done */
+ if (! null_found)
+ return;
+
+ /* walk through the rows again, count NULL values in 'null_dim' */
+ for (i = 0; i < data->numrows; i++)
+ {
+ if (heap_attisnull(data->rows[i], attrs->values[null_dim]))
+ null_count += 1;
+ }
+
+ Assert(null_count <= data->numrows);
+
+ /*
+ * If (null_count == numrows) the dimension already is NULL-only,
+ * but is not yet marked like that. It's enough to mark it and
+ * repeat the process recursively (until we run out of dimensions).
+ */
+ if (null_count == data->numrows)
+ {
+ bucket->nullsonly[null_dim] = true;
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+ return;
+ }
+
+ /*
+ * We have to split the bucket into two - one with NULL values in
+ * the dimension, one with non-NULL values. We don't need to sort
+ * the data or anything, but otherwise it's similar to what's done
+ * in partition_bucket().
+ */
+
+ /* create bucket with NULL-only dimension 'dim' */
+ null_bucket = copy_mv_bucket(bucket, histogram->ndimensions);
+ null_data = (HistogramBuild)null_bucket->build_data;
+
+ /* remember the current array info */
+ oldrows = data->rows;
+ numrows = data->numrows;
+
+ /* we'll keep non-NULL values in the current bucket */
+ data->numrows = (numrows - null_count);
+ data->rows
+ = (HeapTuple*)palloc0(data->numrows * sizeof(HeapTuple));
+
+ /* and the NULL values will go to the new one */
+ null_data->numrows = null_count;
+ null_data->rows
+ = (HeapTuple*)palloc0(null_data->numrows * sizeof(HeapTuple));
+
+ /* mark the dimension as NULL-only (in the new bucket) */
+ null_bucket->nullsonly[null_dim] = true;
+
+ /* walk through the sample rows and distribute them accordingly */
+ null_idx = 0;
+ curr_idx = 0;
+ for (i = 0; i < numrows; i++)
+ {
+ if (heap_attisnull(oldrows[i], attrs->values[null_dim]))
+ /* NULL => copy to the new bucket */
+ memcpy(&null_data->rows[null_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ else
+ memcpy(&data->rows[curr_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ }
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(null_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each
+ * bucket (NULL is not a value, so 0, and the other bucket got
+ * all the ndistinct values).
+ */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(null_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+
+ /* add the NULL bucket to the histogram */
+ histogram->buckets[histogram->nbuckets++] = null_bucket;
+
+ /*
+ * And now run the function recursively on both buckets (the new
+ * one first, because the call may change number of buckets, and
+ * it's used as an index).
+ */
+ create_null_buckets(histogram, (histogram->nbuckets-1), attrs, stats);
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+}
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
+
+/*
+ * SRF with details about buckets of a histogram:
+ *
+ * - bucket ID (0...nbuckets-1)
+ * - min values (string array)
+ * - max values (string array)
+ * - nulls only (boolean array)
+ * - min inclusive flags (boolean array)
+ * - max inclusive flags (boolean array)
+ * - frequency (double precision)
+ * - density (double precision)
+ * - bucket size (double precision)
+ *
+ * The input is the OID of the statistics, and there are no rows
+ * returned if the statistics contains no histogram (or if there's no
+ * statistics for the OID).
+ *
+ * The second parameter (type) determines what values will be returned
+ * in the (minvals,maxvals). There are three possible values:
+ *
+ * 0 (actual values)
+ * -----------------
+ * - prints actual values
+ * - using the output function of the data type (as string)
+ * - handy for investigating the histogram
+ *
+ * 1 (distinct index)
+ * ------------------
+ * - prints index of the distinct value (into the serialized array)
+ * - makes it easier to spot neighbor buckets, etc.
+ * - handy for plotting the histogram
+ *
+ * 2 (normalized distinct index)
+ * -----------------------------
+ * - prints index of the distinct value, but normalized into [0,1]
+ * - similar to 1, but shows how 'long' the bucket range is
+ * - handy for plotting the histogram
+ *
+ * When plotting the histogram, be careful as the (1) and (2) options
+ * skew the lengths by distributing the distinct values uniformly. For
+ * data types without a clear meaning of 'distance' (e.g. strings) that
+ * is not a big deal, but for numbers it may be confusing.
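+ *
+ * A usage sketch (the statistics OID is simply looked up here, any
+ * other way of obtaining it works just as well):
+ *
+ *   SELECT * FROM pg_mv_histogram_buckets(
+ *     (SELECT oid FROM pg_mv_statistic LIMIT 1), 0);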
+ */
+PG_FUNCTION_INFO_V1(pg_mv_histogram_buckets);
+
+Datum
+pg_mv_histogram_buckets(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ Oid mvoid = PG_GETARG_OID(0);
+ int otype = PG_GETARG_INT32(1);
+
+ if ((otype < 0) || (otype > 2))
+ elog(ERROR, "invalid output type specified");
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MVSerializedHistogram histogram;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ histogram = load_mv_histogram(mvoid);
+
+ funcctx->user_fctx = histogram;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = histogram->nbuckets;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples
+ * from raw C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+ double bucket_size = 1.0;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MVSerializedHistogram histogram;
+ MVSerializedBucket bucket;
+
+ histogram = (MVSerializedHistogram)funcctx->user_fctx;
+
+ Assert(call_cntr < histogram->nbuckets);
+
+ bucket = histogram->buckets[call_cntr];
+
+ stakeys = find_mv_attnums(mvoid, &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple.
+ * This should be an array of C strings which will
+ * be processed later by the type input functions.
+ */
+ values = (char **) palloc(9 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+ values[3] = (char *) palloc0(1024 * sizeof(char));
+ values[4] = (char *) palloc0(1024 * sizeof(char));
+ values[5] = (char *) palloc0(1024 * sizeof(char));
+
+ values[6] = (char *) palloc(64 * sizeof(char));
+ values[7] = (char *) palloc(64 * sizeof(char));
+ values[8] = (char *) palloc(64 * sizeof(char));
+
+ /* we need to do this only when printing the actual values */
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * histogram->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * histogram->ndimensions);
+
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* bucket ID */
+
+ /*
+ * Print the boundary values in the requested format - the actual
+ * values (otype 0), indexes into the deduplicated arrays (otype 1),
+ * or the indexes normalized into [0,1] (otype 2). The deduplicated
+ * values are sorted, so the indexes are quite useful too.
+ */
+
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bucket_size *= (bucket->max[i] - bucket->min[i]) * 1.0
+ / (histogram->nvalues[i]-1);
+
+ /* print the actual values, i.e. use output function etc. */
+ if (otype == 0)
+ {
+ Datum minval, maxval;
+ Datum minout, maxout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %s}";
+
+ minval = histogram->values[i][bucket->min[i]];
+ minout = FunctionCall1(&fmgrinfo[i], minval);
+
+ maxval = histogram->values[i][bucket->max[i]];
+ maxout = FunctionCall1(&fmgrinfo[i], maxval);
+
+ snprintf(buff, 1024, format, values[1], DatumGetPointer(minout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], DatumGetPointer(maxout));
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+ else if (otype == 1)
+ {
+ format = "%s, %d";
+ if (i == 0)
+ format = "{%s%d";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %d}";
+
+ snprintf(buff, 1024, format, values[1], bucket->min[i]);
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], bucket->max[i]);
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+ else
+ {
+ format = "%s, %f";
+ if (i == 0)
+ format = "{%s%f";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %f}";
+
+ snprintf(buff, 1024, format, values[1],
+ bucket->min[i] * 1.0 / (histogram->nvalues[i]-1));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2],
+ bucket->max[i] * 1.0 / (histogram->nvalues[i]-1));
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %s}";
+
+ snprintf(buff, 1024, format, values[3], bucket->nullsonly[i] ? "t" : "f");
+ strncpy(values[3], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[4], bucket->min_inclusive[i] ? "t" : "f");
+ strncpy(values[4], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[5], bucket->max_inclusive[i] ? "t" : "f");
+ strncpy(values[5], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[6], 64, "%f", bucket->ntuples); /* frequency */
+ snprintf(values[7], 64, "%f", bucket->ntuples / bucket_size); /* density */
+ snprintf(values[8], 64, "%f", bucket_size); /* bucket_size */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+ pfree(values[4]);
+ pfree(values[5]);
+ pfree(values[6]);
+ pfree(values[7]);
+ pfree(values[8]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
+
+#ifdef DEBUG_MVHIST
+/*
+ * prints debugging info about matched histogram buckets (full/partial)
+ *
+ * XXX Currently works only for INT data type.
+ */
+void
+debug_histogram_matches(MVSerializedHistogram mvhist, char *matches)
+{
+ int i, j;
+
+ float ffull = 0, fpartial = 0;
+ int nfull = 0, npartial = 0;
+
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ char ranges[1024];
+
+ if (! matches[i])
+ continue;
+
+ /* increment the counters */
+ nfull += (matches[i] == MVSTATS_MATCH_FULL) ? 1 : 0;
+ npartial += (matches[i] == MVSTATS_MATCH_PARTIAL) ? 1 : 0;
+
+ /* and also update the frequencies */
+ ffull += (matches[i] == MVSTATS_MATCH_FULL) ? bucket->ntuples : 0;
+ fpartial += (matches[i] == MVSTATS_MATCH_PARTIAL) ? bucket->ntuples : 0;
+
+ memset(ranges, 0, sizeof(ranges));
+
+ /* build ranges for all the dimensions */
+ for (j = 0; j < mvhist->ndimensions; j++)
+ {
+ /* append to the buffer (sprintf with overlapping buffers is undefined) */
+ snprintf(ranges + strlen(ranges), sizeof(ranges) - strlen(ranges),
+ " [%d %d]",
+ DatumGetInt32(mvhist->values[j][bucket->min[j]]),
+ DatumGetInt32(mvhist->values[j][bucket->max[j]]));
+ }
+
+ elog(WARNING, "bucket %d %s => %d [%f]", i, ranges, matches[i], bucket->ntuples);
+ }
+
+ elog(WARNING, "full=%f partial=%f (%f)", ffull, fpartial, (ffull + 0.5 * fpartial));
+}
+#endif
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 6339631..3543239 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2109,9 +2109,9 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
- " deps_enabled, mcv_enabled,\n"
- " deps_built, mcv_built,\n"
- " mcv_max_items,\n"
+ " deps_enabled, mcv_enabled, hist_enabled,\n"
+ " deps_built, mcv_built, hist_built,\n"
+ " mcv_max_items, hist_max_buckets,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
@@ -2154,8 +2154,17 @@ describeOneTableDetails(const char *schemaname,
first = false;
}
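+ /* column 6 is hist_enabled - append 'histogram' to the list if set */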
+ if (!strcmp(PQgetvalue(result, i, 6), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", histogram");
+ else
+ appendPQExpBuffer(&buf, "(histogram");
+ first = false;
+ }
+
appendPQExpBuffer(&buf, ") ON (%s)",
- PQgetvalue(result, i, 9));
+ PQgetvalue(result, i, 12));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index fd7107d..a5945af 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -38,13 +38,16 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
+ bool hist_enabled; /* build histogram? */
- /* MCV size */
+ /* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
+ int32 hist_max_buckets; /* max histogram buckets */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
+ bool hist_built; /* histogram was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -52,6 +55,7 @@ CATALOG(pg_mv_statistic,3381)
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
+ bytea stahist; /* MV histogram (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -67,17 +71,21 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 11
+#define Natts_pg_mv_statistic 15
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
#define Anum_pg_mv_statistic_deps_enabled 4
#define Anum_pg_mv_statistic_mcv_enabled 5
-#define Anum_pg_mv_statistic_mcv_max_items 6
-#define Anum_pg_mv_statistic_deps_built 7
-#define Anum_pg_mv_statistic_mcv_built 8
-#define Anum_pg_mv_statistic_stakeys 9
-#define Anum_pg_mv_statistic_stadeps 10
-#define Anum_pg_mv_statistic_stamcv 11
+#define Anum_pg_mv_statistic_hist_enabled 6
+#define Anum_pg_mv_statistic_mcv_max_items 7
+#define Anum_pg_mv_statistic_hist_max_buckets 8
+#define Anum_pg_mv_statistic_deps_built 9
+#define Anum_pg_mv_statistic_mcv_built 10
+#define Anum_pg_mv_statistic_hist_built 11
+#define Anum_pg_mv_statistic_stakeys 12
+#define Anum_pg_mv_statistic_stadeps 13
+#define Anum_pg_mv_statistic_stamcv 14
+#define Anum_pg_mv_statistic_stahist 15
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index b16eebc..19a490a 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2674,6 +2674,10 @@ DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f
DESCR("multi-variate statistics: MCV list info");
DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
DESCR("details about MCV list items");
+DATA(insert OID = 3375 ( pg_mv_stats_histogram_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_histogram_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: histogram info");
+DATA(insert OID = 3374 ( pg_mv_histogram_buckets PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 2 0 2249 "26 23" "{26,23,23,1009,1009,1000,1000,1000,701,701,701}" "{i,i,o,o,o,o,o,o,o,o,o}" "{oid,otype,index,minvals,maxvals,nullsonly,mininclusive,maxinclusive,frequency,density,bucket_size}" _null_ _null_ pg_mv_histogram_buckets _null_ _null_ _null_ ));
+DESCR("details about histogram buckets");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 2bcd582..8c50bfb 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -654,10 +654,12 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
bool mcv_enabled; /* MCV list enabled */
+ bool hist_enabled; /* histogram enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
bool mcv_built; /* MCV list built */
+ bool hist_built; /* histogram built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 4535db7..f05a517 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -92,6 +92,123 @@ typedef MCVListData *MCVList;
#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
/*
+ * Multivariate histograms
+ */
+typedef struct MVBucketData {
+
+ /* Frequencies of this bucket. */
+ float ntuples; /* frequency of tuples in this bucket */
+
+ /*
+ * Information about dimensions being NULL-only. Not yet used.
+ */
+ bool *nullsonly;
+
+ /* lower boundaries - values and information about the inequalities */
+ Datum *min;
+ bool *min_inclusive;
+
+ /* upper boundaries - values and information about the inequalities */
+ Datum *max;
+ bool *max_inclusive;
+
+ /* used when building the histogram (not serialized/deserialized) */
+ void *build_data;
+
+} MVBucketData;
+
+typedef MVBucketData *MVBucket;
+
+
+typedef struct MVHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ MVBucket *buckets; /* array of buckets */
+
+} MVHistogramData;
+
+typedef MVHistogramData *MVHistogram;
+
+/*
+ * Histogram in a partially serialized form, with deduplicated boundary
+ * values. The distinct boundary values are kept in per-dimension arrays
+ * ('nvalues' and 'values' in MVSerializedHistogramData), and the buckets
+ * reference them using uint16 indexes instead of full Datum values,
+ * which makes the buckets considerably smaller.
+ */
+
+typedef struct MVSerializedBucketData {
+
+ /* Frequencies of this bucket. */
+ float ntuples; /* frequency of tuples in this bucket */
+
+ /*
+ * Information about dimensions being NULL-only. Not yet used.
+ */
+ bool *nullsonly;
+
+ /* indexes of lower boundaries - values and information about the
+ * inequalities (exclusive vs. inclusive) */
+ uint16 *min;
+ bool *min_inclusive;
+
+ /* indexes of upper boundaries - values and information about the
+ * inequalities (exclusive vs. inclusive) */
+ uint16 *max;
+ bool *max_inclusive;
+
+} MVSerializedBucketData;
+
+typedef MVSerializedBucketData *MVSerializedBucket;
+
+typedef struct MVSerializedHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ /*
+ * keep this the same as in MVHistogramData, because deserialization
+ * relies on the field being at the same offset
+ */
+ MVSerializedBucket *buckets; /* array of buckets */
+
+ /*
+ * serialized boundary values, one array per dimension, deduplicated
+ * (the min/max indexes point into these arrays)
+ */
+ int *nvalues;
+ Datum **values;
+
+} MVSerializedHistogramData;
+
+typedef MVSerializedHistogramData *MVSerializedHistogram;
+
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_HIST_MAGIC 0x7F8C5670 /* marks serialized bytea */
+#define MVSTAT_HIST_TYPE_BASIC 1 /* basic histogram type */
+
+/*
+ * Limits used for max_buckets option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_HIST_MIN_BUCKETS, and we cannot
+ * have more than MVSTAT_HIST_MAX_BUCKETS buckets.
+ *
+ * This is just a boundary for the 'max' threshold - the actual
+ * histogram may use fewer buckets than MVSTAT_HIST_MAX_BUCKETS.
+ *
+ * TODO The MVSTAT_HIST_MIN_BUCKETS should be related to the number of
+ * attributes (MVSTATS_MAX_DIMENSIONS) because of NULL-buckets.
+ * There should be at least 2^N buckets, otherwise we may be unable
+ * to build the NULL buckets.
+ */
+#define MVSTAT_HIST_MIN_BUCKETS 128 /* min number of buckets */
+#define MVSTAT_HIST_MAX_BUCKETS 16384 /* max number of buckets */
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
@@ -99,20 +216,25 @@ typedef MCVListData *MCVList;
MVDependencies load_mv_dependencies(Oid mvoid);
MCVList load_mv_mcvlist(Oid mvoid);
+MVSerializedHistogram load_mv_histogram(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
VacAttrStats **stats);
+bytea * serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
MCVList deserialize_mv_mcvlist(bytea * data);
+MVSerializedHistogram deserialize_mv_histogram(bytea * data);
/*
* Returns index of the attribute number within the vector (i.e. a
* dimension within the stats).
*/
int mv_get_index(AttrNumber varattno, int2vector * stakeys);
int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
@@ -121,6 +243,8 @@ extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_mcvlist_items(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_histogram_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_histogram_buckets(PG_FUNCTION_ARGS);
MVDependencies
build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
@@ -130,10 +254,20 @@ MCVList
build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int *numrows_filtered);
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total);
+
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+void update_mv_stats(Oid relid, MVDependencies dependencies,
+ MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats);
+#ifdef DEBUG_MVHIST
+extern void debug_histogram_matches(MVSerializedHistogram mvhist, char *matches);
+#endif
+
+
#endif
diff --git a/src/test/regress/expected/mv_histogram.out b/src/test/regress/expected/mv_histogram.out
new file mode 100644
index 0000000..e830816
--- /dev/null
+++ b/src/test/regress/expected/mv_histogram.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s7 ON mv_histogram (unknown_column) WITH (histogram);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s7 ON mv_histogram (a) WITH (histogram);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s7 ON mv_histogram (a, a) WITH (histogram);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s7 ON mv_histogram (a, a, b) WITH (histogram);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing histogram statistics
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (dependencies, max_buckets=200);
+ERROR: option 'histogram' is required by other option(s)
+-- invalid max_buckets value / too low
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=10);
+ERROR: minimum number of buckets is 128
+-- invalid max_buckets value / too high
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=100000);
+ERROR: maximum number of buckets is 16384
+-- correct command
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (histogram);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s8 ON mv_histogram (a, b, c) WITH (histogram);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s9 ON mv_histogram (a, b, c, d) WITH (histogram);
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+DROP TABLE mv_histogram;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 66071d8..1a1a4ca 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1375,7 +1375,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
length(s.stadeps) AS depsbytes,
pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
length(s.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo,
+ length(s.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(s.stahist) AS histinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 85d94f1..a885235 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,4 +112,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies mv_mcv
+test: mv_dependencies mv_mcv mv_histogram
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 6584d73..2efdcd7 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -164,3 +164,4 @@ test: event_trigger
test: stats
test: mv_dependencies
test: mv_mcv
+test: mv_histogram
diff --git a/src/test/regress/sql/mv_histogram.sql b/src/test/regress/sql/mv_histogram.sql
new file mode 100644
index 0000000..27c2510
--- /dev/null
+++ b/src/test/regress/sql/mv_histogram.sql
@@ -0,0 +1,176 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s7 ON mv_histogram (unknown_column) WITH (histogram);
+
+-- single column
+CREATE STATISTICS s7 ON mv_histogram (a) WITH (histogram);
+
+-- single column, duplicated
+CREATE STATISTICS s7 ON mv_histogram (a, a) WITH (histogram);
+
+-- two columns, one duplicated
+CREATE STATISTICS s7 ON mv_histogram (a, a, b) WITH (histogram);
+
+-- unknown option
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (unknown_option);
+
+-- missing histogram statistics
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (dependencies, max_buckets=200);
+
+-- invalid max_buckets value / too low
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=10);
+
+-- invalid max_buckets value / too high
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=100000);
+
+-- correct command
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (histogram);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+
+DROP TABLE mv_histogram;
+
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s8 ON mv_histogram (a, b, c) WITH (histogram);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mv_histogram;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s9 ON mv_histogram (a, b, c, d) WITH (histogram);
+
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+DROP TABLE mv_histogram;
--
2.1.0
[Attachment: 0006-multi-statistics-estimation.patch]
From 04b77a1750694b49ee6f3db9400980b20ae307cd Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Fri, 6 Feb 2015 01:42:38 +0100
Subject: [PATCH 6/9] multi-statistics estimation
The general idea is that a probability (which is what selectivity is)
can be split into a product of conditional probabilities like this:
P(A & B & C) = P(A & B) * P(C|A & B)
If we assume that C and B are conditionally independent given A, the
last term may be simplified like this
P(A & B & C) = P(A & B) * P(C|A)
so we only need probabilities on [A,B] and [A,C] to compute the original
probability.
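A small worked example: suppose each of A, B, C individually has
selectivity 0.1, but B and C hold exactly when A does. Then
P(A & B & C) = P(A & B) * P(C|A) = 0.1 * 1.0 = 0.1
while the independence assumption would estimate 0.1^3 = 0.001, i.e.
a 100x underestimate.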
The implementation works in the other direction, though. We know what
probability P(A & B & C) we need to compute, and also what statistics
are available.
So we search for a combination of statistics, covering the clauses in
an optimal way (most clauses covered, most dependencies exploited).
There are two possible approaches - exhaustive and greedy. The
exhaustive one walks through all permutations of stats (a backtracking
search), so it's guaranteed to find the optimal solution, but it
soon gets very slow as it's roughly O(N!). Dynamic programming might
improve that a bit, but it's still far too expensive for large numbers
of statistics (on a single table).
The greedy algorithm is very simple - at every step it picks the
statistics that looks best locally. That may not guarantee the best
solution globally (but maybe it does?), but it only needs N steps to
find a solution, so it's very fast (processing the selected stats is
usually way more expensive).
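To make the greedy choice concrete: with clauses on columns (a,b,c,d)
and statistics S1 on (a,b), S2 on (b,c) and S3 on (c,d), the search
might pick S1 first (covering the clauses on a,b), then S2 (estimating
the clause on c with b as a condition), then S3 (estimating d with c
as a condition). Each statistics is considered at most once, hence the
N steps. (This is only an illustration - the actual choice depends on
the gain metric.)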
There's a GUC for selecting the search algorithm
mvstat_search = {'greedy', 'exhaustive'}
The default value is 'greedy' as that's much safer (with respect to
runtime). See choose_mv_statistics().
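For example, to compare the two search strategies on a query (with "t"
standing in for any table that has multivariate statistics defined):

SET mvstat_search = 'exhaustive';
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;

SET mvstat_search = 'greedy';
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;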
Once we have found a sequence of statistics, we apply them to the
clauses using the conditional probabilities. We process the selected
stats one by one, and for each we select the estimated clauses and
conditions. See clauselist_selectivity() for more details.
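For example, with statistics S1 on (a,b), S2 on (b,c) and clauses
(a=1), (b=2), (c=3), the sequence [S1, S2] is processed like this:
S1 estimates P(a=1 & b=2), both clauses then become local conditions,
and S2 estimates P(c=3 | b=2) - the clause (a=1) is not covered by S2
and thus is not usable as a condition there. The final selectivity is
the product of the two estimates.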
Limitations
-----------
It's still true that each clause at a given level has to be covered by
a single MV statistics. So with this query
WHERE (clause1) AND (clause2) AND (clause3 OR clause4)
each parenthesized clause has to be covered by a single multivariate
statistics.
Clauses not covered by a single statistics at this level will be passed
to clause_selectivity() but this will treat them as a collection of
simpler clauses (connected by AND or OR), and the clauses from the
previous level will be used as conditions.
So using the same example, the last clause will be passed to
clause_selectivity() with 'clause1' and 'clause2' as conditions, and it
will be processed using multivariate stats if possible.
The other limitation is that all the expressions have to be
mv-compatible, i.e. there can't be a mix of compatible and incompatible
expressions. If this is violated, the clause may be passed to the next
level (just like a list of clauses not covered by a single statistics),
which splits it into clauses handled by multivariate stats and clauses
handled by regular statistics.
rework clauselist_selectivity_or to handle OR-clauses correctly
We might invent a completely new set of functions here, resembling
clauselist_selectivity but adapting the ideas to OR-clauses.
But luckily we know that each OR-clause
(a OR b OR c)
may be rewritten as an equivalent AND-clause using negation:
NOT ((NOT a) AND (NOT b) AND (NOT c))
And that's something we can pass to clauselist_selectivity.
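Note that the NOT node doesn't even need to be constructed - if s is
the selectivity of ((NOT a) AND (NOT b) AND (NOT c)) as computed by
clauselist_selectivity, the selectivity of the OR-clause is simply
(1.0 - s). With independent clauses this reduces to the familiar
1 - (1-s_a)*(1-s_b)*(1-s_c).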
---
contrib/file_fdw/file_fdw.c | 3 +-
contrib/postgres_fdw/postgres_fdw.c | 11 +-
src/backend/optimizer/path/clausesel.c | 1990 ++++++++++++++++++++++++++------
src/backend/optimizer/path/costsize.c | 23 +-
src/backend/optimizer/util/orclauses.c | 4 +-
src/backend/utils/adt/selfuncs.c | 17 +-
src/backend/utils/misc/guc.c | 20 +
src/backend/utils/mvstats/README.stats | 166 +++
src/include/optimizer/cost.h | 6 +-
src/include/utils/mvstats.h | 8 +
10 files changed, 1890 insertions(+), 358 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index dc035d7..8f11b7a 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -969,7 +969,8 @@ estimate_size(PlannerInfo *root, RelOptInfo *baserel,
baserel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 76d0e15..e78f140 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -498,7 +498,8 @@ postgresGetForeignRelSize(PlannerInfo *root,
fpinfo->local_conds,
baserel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
@@ -2149,7 +2150,8 @@ estimate_path_cost_size(PlannerInfo *root,
local_param_join_conds,
foreignrel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
local_sel *= fpinfo->local_conds_sel;
rows = clamp_row_est(rows * local_sel);
@@ -3618,7 +3620,8 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
fpinfo->local_conds,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
/*
@@ -3637,7 +3640,7 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
*/
fpinfo->joinclause_sel = clauselist_selectivity(root, fpinfo->joinclauses,
0, fpinfo->jointype,
- extra->sjinfo);
+ extra->sjinfo, NIL);
}
fpinfo->server = GetForeignServer(joinrel->serverid);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 0de2418..c1b8999 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -29,6 +29,8 @@
#include "utils/selfuncs.h"
#include "utils/typcache.h"
+#include "miscadmin.h"
+
/*
* Data structure for accumulating info about possible range-query
@@ -44,6 +46,13 @@ typedef struct RangeQueryClause
Selectivity hibound; /* Selectivity of a var < something clause */
} RangeQueryClause;
+static Selectivity clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
+
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
@@ -60,23 +69,25 @@ static int count_mv_attnums(List *clauses, Index relid, int type);
static int count_varnos(List *clauses, Index *relid);
+static List *clauses_matching_statistic(List **clauses, MVStatisticInfo *statistic,
+ Index relid, int types, bool remove);
+
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Index relid, List *stats);
-static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
-
-static List *clauselist_mv_split(PlannerInfo *root, Index relid,
- List *clauses, List **mvclauses,
- MVStatisticInfo *mvstats, int types);
-
static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats, List *clauses,
+ List *conditions, bool is_or);
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats,
- bool *fullmatch, Selectivity *lowsel);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or, bool *fullmatch,
+ Selectivity *lowsel);
static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -90,10 +101,33 @@ static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
int nmatches, char * matches,
bool is_or);
+/*
+ * Describes a combination of multiple statistics to cover attributes
+ * referenced by the clauses. The array 'stats' (with nstats elements)
+ * lists attributes (in the order as they are applied), and number of
+ * clause attributes covered by this solution.
+ *
+ * choose_mv_statistics_exhaustive() uses this to track both the current
+ * and the best solutions, while walking through the state of possible
+ * combination.
+ */
+typedef struct mv_solution_t {
+ int nclauses; /* number of clauses covered */
+ int nconditions; /* number of conditions covered */
+ int nstats; /* number of stats applied */
+ int *stats; /* stats (in the apply order) */
+} mv_solution_t;
+
+static List *choose_mv_statistics(PlannerInfo *root, Index relid,
+ List *mvstats, List *clauses, List *conditions);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, Index relid);
+static bool stats_type_matches(MVStatisticInfo *stat, int type);
+
+int mvstat_search_type = MVSTAT_SEARCH_GREEDY;
/* used for merging bitmaps - AND (min), OR (max) */
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
@@ -168,14 +202,15 @@ clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 1.0;
RangeQueryClause *rqlist = NULL;
ListCell *l;
/* processing mv stats */
- Oid relid = InvalidOid;
+ Index relid = InvalidOid;
/* list of multivariate stats on the relation */
List *stats = NIL;
@@ -191,12 +226,13 @@ clauselist_selectivity(PlannerInfo *root,
stats = find_stats(root, relid);
/*
- * If there's exactly one clause, then no use in trying to match up pairs,
- * so just go directly to clause_selectivity().
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, or matching multivariate statistics, so just go directly
+ * to clause_selectivity().
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
- varRelid, jointype, sjinfo);
+ varRelid, jointype, sjinfo, conditions);
/*
* Apply functional dependencies, but first check that there are some stats
@@ -228,31 +264,96 @@ clauselist_selectivity(PlannerInfo *root,
(count_mv_attnums(clauses, relid,
MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2))
{
- /* collect attributes from the compatible conditions */
- Bitmapset *mvattnums = collect_mv_attnums(clauses, relid,
- MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+ ListCell *s;
+
+ /*
+ * Copy the conditions we got from the upper part of the expression tree
+ * so that we can add local conditions to it (we need to keep the
+ * original list intact, for sibling expressions - other expressions
+ * at the same level).
+ */
+ List *conditions_local = list_copy(conditions);
- /* and search for the statistic covering the most attributes */
- MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+ /* find the best combination of statistics */
+ List *solution = choose_mv_statistics(root, relid, stats,
+ clauses, conditions);
- if (mvstat != NULL) /* we have a matching stats */
+ /*
+ * We have a good solution, which is merely a list of statistics that
+ * we need to apply. We'll apply the statistics one by one (in the order
+ * they appear in the list), and for each statistic we'll
+ *
+ * (1) find clauses compatible with the statistic (and remove them
+ * from the list)
+ *
+ * (2) find local conditions compatible with the statistic
+ *
+ * (3) do the estimation P(clauses | conditions)
+ *
+ * (4) append the estimated clauses to local conditions
+ *
+ * so the set of local conditions grows continuously.
+ */
+ foreach (s, solution)
{
- /* clauses compatible with multi-variate stats */
- List *mvclauses = NIL;
+ MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
- /* split the clauselist into regular and mv-clauses */
- clauses = clauselist_mv_split(root, relid, clauses, &mvclauses,
- mvstat, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+ /* clauses compatible with the statistic we're applying right now */
+ List *stat_clauses = NIL;
+ List *stat_conditions = NIL;
- /* we've chosen the histogram to match the clauses */
- Assert(mvclauses != NIL);
+ /*
+ * Find clauses and conditions matching the statistic - the clauses
+ * need to be removed from the list, while conditions should remain
+ * there (so that we can apply them repeatedly).
+ */
+ stat_clauses
+ = clauses_matching_statistic(&clauses, mvstat, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST,
+ true);
+
+ stat_conditions
+ = clauses_matching_statistic(&conditions_local, mvstat, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST,
+ false);
+
+ /*
+ * If we got no clauses to estimate, we've done something wrong,
+ * either during the optimization, while detecting compatible clauses, or
+ * somewhere else.
+ *
+ * Also, we need at least two attributes in clauses and conditions.
+ */
+ Assert(stat_clauses != NIL);
+ Assert(count_mv_attnums(list_union(stat_clauses, stat_conditions),
+ relid, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2);
/* compute the multivariate stats */
- s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ s1 *= clauselist_mv_selectivity(root, mvstat,
+ stat_clauses, stat_conditions,
+ false); /* AND */
+
+ /*
+ * Add the new clauses to the local conditions, so that we can use
+ * them for the subsequent statistics. We only add the clauses,
+ * because the conditions are already there (or should be).
+ */
+ conditions_local = list_concat(conditions_local, stat_clauses);
}
+
+ /* from now on, work only with the 'local' list of conditions */
+ conditions = conditions_local;
}
/*
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, so just go directly to clause_selectivity().
+ */
+ if (list_length(clauses) == 1)
+ return s1 * clause_selectivity(root, (Node *) linitial(clauses),
+ varRelid, jointype, sjinfo, conditions);
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -264,7 +365,8 @@ clauselist_selectivity(PlannerInfo *root,
Selectivity s2;
/* Always compute the selectivity using clause_selectivity */
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo);
+ s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo,
+ conditions);
/*
* Check for being passed a RestrictInfo.
@@ -423,6 +525,55 @@ clauselist_selectivity(PlannerInfo *root,
}
/*
+ * Similar to clauselist_selectivity(), but for OR-clauses. We can't simply use
+ * the same multi-statistic estimation logic as for AND-clauses, at least not
+ * directly, because there are a few key differences:
+ *
+ * - functional dependencies don't really apply to OR-clauses
+ *
+ * - clauselist_selectivity() is based on decomposing the selectivity into
+ * a sequence of conditional probabilities (selectivities), but that can
+ * be done only for AND-clauses
+ *
+ * We might invent a similar infrastructure for optimizing OR-clauses, doing
+ * something similar to what clause_selectivity does for AND-clauses, but
+ * luckily we know that each disjunction (aka OR-clause)
+ *
+ * (a OR b OR c)
+ *
+ * may be rewritten as an equivalent conjunction (aka AND-clause)
+ * by using negation:
+ *
+ * NOT ((NOT a) AND (NOT b) AND (NOT c))
+ *
+ * And that's something we can pass to clauselist_selectivity and let it do
+ * all the heavy lifting.
+ */
+static Selectivity
+clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
+{
+ List *args = NIL;
+ ListCell *l;
+ Expr *expr;
+
+ /* build arguments for the AND-clause by negating args of the OR-clause */
+ foreach (l, clauses)
+ args = lappend(args, makeBoolExpr(NOT_EXPR, list_make1(lfirst(l)), -1));
+
+ /* and then build the equivalent AND-clause on the negated args */
+ expr = makeBoolExpr(AND_EXPR, args, -1);
+
+ /* instead of constructing NOT expression, just do (1.0 - s) */
+ return 1.0 - clauselist_selectivity(root, list_make1(expr), varRelid,
+ jointype, sjinfo, conditions);
+}
+
+/*
* addRangeClause --- add a new range clause for clauselist_selectivity
*
* Here is where we try to match up pairs of range-query clauses
@@ -629,7 +780,8 @@ clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 0.5; /* default for any unhandled clause type */
RestrictInfo *rinfo = NULL;
@@ -749,7 +901,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) get_notclausearg((Expr *) clause),
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (and_clause(clause))
{
@@ -758,29 +911,18 @@ clause_selectivity(PlannerInfo *root,
((BoolExpr *) clause)->args,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (or_clause(clause))
{
- /*
- * Selectivities for an OR clause are computed as s1+s2 - s1*s2 to
- * account for the probable overlap of selected tuple sets.
- *
- * XXX is this too conservative?
- */
- ListCell *arg;
-
- s1 = 0.0;
- foreach(arg, ((BoolExpr *) clause)->args)
- {
- Selectivity s2 = clause_selectivity(root,
- (Node *) lfirst(arg),
- varRelid,
- jointype,
- sjinfo);
-
- s1 = s1 + s2 - s1 * s2;
- }
+ /* just call to clauselist_selectivity_or() */
+ s1 = clauselist_selectivity_or(root,
+ ((BoolExpr *) clause)->args,
+ varRelid,
+ jointype,
+ sjinfo,
+ conditions);
}
else if (is_opclause(clause) || IsA(clause, DistinctExpr))
{
@@ -870,7 +1012,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((RelabelType *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (IsA(clause, CoerceToDomain))
{
@@ -879,7 +1022,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((CoerceToDomain *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else
{
@@ -943,15 +1087,16 @@ clause_selectivity(PlannerInfo *root,
* in the MCV list, then the selectivity is below the lowest frequency
* found in the MCV list,
*
- * TODO When applying the clauses to the histogram/MCV list, we can do
- * that from the most selective clauses first, because that'll
- * eliminate the buckets/items sooner (so we'll be able to skip
- * them without inspection, which is more expensive). But this
- * requires really knowing the per-clause selectivities in advance,
- * and that's not what we do now.
+ * TODO When applying the clauses to the histogram/MCV list, we can do that from
+ * the most selective clauses first, because that'll eliminate the
+ * buckets/items sooner (so we'll be able to skip them without inspection,
+ * which is more expensive). But this requires really knowing the
+ * per-clause selectivities in advance, and that's not what we do now.
+ *
*/
static Selectivity
-clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+clauselist_mv_selectivity(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
bool fullmatch = false;
Selectivity s1 = 0.0, s2 = 0.0;
@@ -969,7 +1114,8 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
*/
/* Evaluate the MCV first. */
- s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ s1 = clauselist_mv_selectivity_mcvlist(root, mvstats,
+ clauses, conditions, is_or,
&fullmatch, &mcv_low);
/*
@@ -982,7 +1128,8 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
/* TODO if (fullmatch) without matching MCV item, use the mcv_low
* selectivity as upper bound */
- s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+ s2 = clauselist_mv_selectivity_histogram(root, mvstats,
+ clauses, conditions, is_or);
/* TODO clamp to <= 1.0 (or more strictly, when possible) */
return s1 + s2;
@@ -1016,260 +1163,1325 @@ get_varattnos(Node * node, Index relid)
k + FirstLowInvalidHeapAttributeNumber);
}
- bms_free(varattnos);
+ bms_free(varattnos);
+
+ return result;
+}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ */
+static Bitmapset *
+collect_mv_attnums(List *clauses, Index relid, int types)
+{
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ Node *clause = (Node *) lfirst(l);
+
+ /* ignore the result here - we only need the attnums */
+ clause_is_mv_compatible(clause, relid, &attnums, types);
+ }
+
+ /*
+ * If there are not at least two attributes referenced by the clause(s),
+ * we can throw everything out (as we'll revert to simple stats).
+ */
+ if (bms_num_members(attnums) <= 1)
+ {
+ bms_free(attnums);
+ attnums = NULL;
+ }
+
+ return attnums;
+}
+
+/*
+ * Count the number of attributes in clauses compatible with multivariate stats.
+ */
+static int
+count_mv_attnums(List *clauses, Index relid, int type)
+{
+ int c;
+ Bitmapset *attnums = collect_mv_attnums(clauses, relid, type);
+
+ c = bms_num_members(attnums);
+
+ bms_free(attnums);
+
+ return c;
+}
+
+/*
+ * Count varnos referenced in the clauses, and if there's a single varno then
+ * return the index in 'relid'.
+ */
+static int
+count_varnos(List *clauses, Index *relid)
+{
+ int cnt;
+ Bitmapset *varnos = NULL;
+
+ varnos = pull_varnos((Node *) clauses);
+ cnt = bms_num_members(varnos);
+
+ /* if there's a single varno in the clauses, remember it */
+ if (bms_num_members(varnos) == 1)
+ *relid = bms_singleton_member(varnos);
+
+ bms_free(varnos);
+
+ return cnt;
+}
+
+static List *
+clauses_matching_statistic(List **clauses, MVStatisticInfo *statistic,
+ Index relid, int types, bool remove)
+{
+ int i;
+ Bitmapset *stat_attnums = NULL;
+ List *matching_clauses = NIL;
+ ListCell *lc;
+
+ /* build attnum bitmapset for this statistics */
+ for (i = 0; i < statistic->stakeys->dim1; i++)
+ stat_attnums = bms_add_member(stat_attnums,
+ statistic->stakeys->values[i]);
+
+ /*
+ * We can't use foreach here, because we may need to remove some of the
+ * clauses if (remove=true).
+ */
+ lc = list_head(*clauses);
+ while (lc)
+ {
+ Node *clause = (Node*)lfirst(lc);
+ Bitmapset *attnums = NULL;
+
+ /* must advance lc before list_delete possibly pfree's it */
+ lc = lnext(lc);
+
+ /*
+ * skip clauses that are not compatible with stats (just leave them
+ * in the original list)
+ *
+ * XXX Perhaps this should check what stats are actually available in
+ * the statistics (not a big deal now, because MCV and histograms
+ * handle the same types of conditions).
+ */
+ if (! clause_is_mv_compatible(clause, relid, &attnums, types))
+ {
+ bms_free(attnums);
+ continue;
+ }
+
+ /* if the clause is covered by the statistic, add it to the list */
+ if (bms_is_subset(attnums, stat_attnums))
+ {
+ matching_clauses = lappend(matching_clauses, clause);
+
+ /* if remove=true, remove the matching item from the main list */
+ if (remove)
+ *clauses = list_delete_ptr(*clauses, clause);
+ }
+
+ bms_free(attnums);
+ }
+
+ bms_free(stat_attnums);
+
+ return matching_clauses;
+}
+
+/*
+ * Selects the best combination of multivariate statistics, in an exhaustive
+ * way, where 'best' means:
+ *
+ * (a) covering the most attributes (referenced by clauses)
+ * (b) using the least number of multivariate stats
+ * (c) using the most conditions to exploit dependency
+ *
+ * Don't call this directly but through choose_mv_statistics(), which does some
+ * additional tricks to minimize the runtime.
+ *
+ *
+ * Algorithm
+ * ---------
+ * The algorithm is a recursive implementation of backtracking, with maximum
+ * depth equal to the number of multi-variate statistics available on the table.
+ * It actually explores all valid combinations of stats.
+ *
+ * Whenever it considers adding the next statistics, the clauses it matches are
+ * divided into 'conditions' (clauses already matched by at least one previous
+ * statistics) and clauses that are estimated.
+ *
+ * Then several checks are performed:
+ *
+ * (a) The statistics covers at least 2 columns, referenced in the estimated
+ * clauses (otherwise multi-variate stats are useless).
+ *
+ * (b) The statistics covers at least 1 new column, i.e. a column not referenced
+ * by the already used stats (and the new column has to be referenced by
+ * the clauses, of course). Otherwise the statistics would not add any new
+ * information.
+ *
+ * There are some other sanity checks (e.g. stats must not be used twice etc.).
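+ *
+ * For example, with three statistics the backtracking explores (at most)
+ * the sequences [0], [0,1], [0,1,2], [0,2], [0,2,1], [1], ..., pruning
+ * branches that fail the checks above and remembering the best solution
+ * found so far.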
+ *
+ *
+ * Weaknesses
+ * ----------
+ * The current implementation uses a rather simple optimality criterion, so it
+ * may not make the best choice when
+ *
+ * (a) There may be multiple solutions with the same number of covered
+ * attributes and number of statistics (e.g. the same solution but with
+ * statistics in a different order). It's unclear which solution is the best
+ * one - in a sense all of them are equal.
+ *
+ * TODO It might be possible to compute estimate for each of those solutions,
+ * and then combine them to get the final estimate (e.g. by using average
+ * or median).
+ *
+ * (b) Does not consider that some types of stats are a better match for some
+ * types of clauses (e.g. MCV list is generally a better match for equality
+ * conditions than a histogram).
+ *
+ * But maybe this is pointless - generally, each column is either a label
+ * (it's not important whether because of the data type or how it's used),
+ * or a value with ordering that makes sense. So either a MCV list is more
+ * appropriate (labels) or a histogram (values with orderings).
+ *
+ * Not sure what to do with statistics on columns mixing both types of data
+ * (some columns would work best with MCVs, some with histograms). Maybe we
+ * could invent a new type of statistics combining MCV list and histogram
+ * (keeping a small histogram for each MCV item, and a separate histogram
+ * for values not on the MCV list).
+ *
+ * TODO The algorithm should probably count number of Vars (not just attnums)
+ * when computing the 'score' of each solution. Computing the ratio of
+ * (num of all vars) / (num of condition vars) as a measure of how well
+ * the solution uses conditions might be useful.
+ */
+static void
+choose_mv_statistics_exhaustive(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ /* this may run for a long time, so let's make it interruptible */
+ CHECK_FOR_INTERRUPTS();
+
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ /*
+ * Now try to apply each statistics, matching at least two attributes,
+ * unless it's already used in one of the previous steps.
+ */
+ for (i = 0; i < nmvstats; i++)
+ {
+ int c;
+
+ int ncovered_clauses = 0; /* number of covered clauses */
+ int ncovered_conditions = 0; /* number of covered conditions */
+ int nattnums = 0; /* number of covered attributes */
+
+ Bitmapset *all_attnums = NULL;
+ Bitmapset *new_attnums = NULL;
+
+ /* skip statistics that were already used or eliminated */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /*
+ * See if we have clauses covered by this statistics, but not
+ * yet covered by any of the preceding ones.
+ */
+ for (c = 0; c < nclauses; c++)
+ {
+ bool covered = false;
+ Bitmapset *clause_attnums = clauses_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this clause is not covered by this stats, we can't
+ * use the stats to estimate that at all.
+ */
+ if (! cover_map[i * nclauses + c])
+ continue;
+
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add the
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
+
+ /* let's see if it's covered by any of the previous stats */
+ for (j = 0; j < step; j++)
+ {
+ /* already covered by the previous stats */
+ if (cover_map[current->stats[j] * nclauses + c])
+ covered = true;
+
+ if (covered)
+ break;
+ }
+
+ /* if already covered, continue with the next clause */
+ if (covered)
+ {
+ ncovered_conditions += 1;
+ continue;
+ }
+
+ /*
+ * OK, this clause is covered by this statistics (and not by
+ * any of the previous ones)
+ */
+ ncovered_clauses += 1;
+
+ /* add the attnums into attnums from 'new clauses' */
+ // new_attnums = bms_union(new_attnums, clause_attnums);
+ }
+
+ /* can't have more new clauses than original clauses */
+ Assert(nclauses >= ncovered_clauses);
+ Assert(ncovered_clauses >= 0); /* mostly paranoia */
+
+ nattnums = bms_num_members(all_attnums);
+
+ /* free all the bitmapsets - we don't need them anymore */
+ bms_free(all_attnums);
+ bms_free(new_attnums);
+
+ all_attnums = NULL;
+ new_attnums = NULL;
+
+ /*
+ * See which of the conditions are covered by this statistics (these
+ * don't need estimating, but may be exploited as dependencies).
+ */
+ for (c = 0; c < nconditions; c++)
+ {
+ Bitmapset *clause_attnums = conditions_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this clause is not covered by this stats, we can't
+ * use the stats to estimate that at all.
+ */
+ if (! condition_map[i * nconditions + c])
+ continue;
+
+ /* count this as a condition */
+ ncovered_conditions += 1;
+
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add the
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
+ }
+
+ /*
+ * Let's mark the statistics as 'ruled out' - either we'll use
+ * it (and proceed to the next step), or it's incompatible.
+ */
+ ruled_out[i] = step;
+
+ /*
+ * There are no clauses usable with this statistics (i.e. none that
+ * aren't already covered by some of the previous stats).
+ *
+ * Similarly, if the usable clauses reference only a single
+ * attribute, the statistics is useless here.
+ */
+ if ((ncovered_clauses == 0) || (nattnums < 2))
+ continue;
+
+ /*
+ * TODO Not sure if it's possible to add a clause referencing
+ * only attributes already covered by previous stats?
+ * Introducing only some new dependency, not a new
+ * attribute. Couldn't come up with an example, though.
+ * Might be worth adding some assert.
+ */
+
+ /*
+ * got a suitable statistics - let's update the current solution,
+ * maybe use it as the best solution
+ */
+ current->nclauses += ncovered_clauses;
+ current->nconditions += ncovered_conditions;
+ current->nstats += 1;
+ current->stats[step] = i;
+
+ /*
+ * We can never cover more clauses, or use more stats than we
+ * actually have at the beginning.
+ */
+ Assert(nclauses >= current->nclauses);
+ Assert(nmvstats >= current->nstats);
+ Assert(step < nmvstats);
+
+ if (*best == NULL)
+ {
+ *best = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ (*best)->nstats = 0;
+ (*best)->nclauses = 0;
+ (*best)->nconditions = 0;
+ }
+
+ /* see if it's better than the current 'best' solution */
+ if ((current->nclauses > (*best)->nclauses) ||
+ ((current->nclauses == (*best)->nclauses) &&
+ ((current->nstats > (*best)->nstats))))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+
+ /*
+ * Recurse into the next step, but only if there are more statistics
+ * left to add (otherwise the solution can't be extended).
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_exhaustive(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= ncovered_clauses;
+ current->nconditions -= ncovered_conditions;
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[i] = -1;
+
+ Assert(current->nclauses >= 0);
+ Assert(current->nstats >= 0);
+ }
+
+ /* make the statistics ruled out in this step usable again */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+}
+
+/*
+ * Greedy search for a multivariate solution - a sequence of statistics covering
+ * the clauses. This chooses the "best" statistics at each step, so the
+ * resulting solution may not be the best solution globally, but this produces
+ * the solution in only N steps (where N is the number of statistics), while
+ * the exhaustive approach may have to walk through ~N! combinations (although
+ * some of those are terminated early).
+ *
+ * See the comments at choose_mv_statistics_exhaustive() as this does the same
+ * thing (but in a different way).
+ *
+ * Don't call this directly, but through choose_mv_statistics().
+ *
+ * TODO There are probably other metrics we might use - e.g. using number of
+ * columns (num_cond_columns / num_cov_columns), which might work better
+ * with a mix of simple and complex clauses.
+ *
+ * TODO Also the choice at the very first step should be handled in a special
+ * way, because there will be 0 conditions at that moment, so there needs
+ * to be some other criteria - e.g. using the simplest (or most complex?)
+ * clause might be a good idea.
+ *
+ * TODO We might also select multiple stats using different criteria, and branch
+ * the search. This is however tricky, because if we choose k statistics at
+ * each step, we get k^N branches to walk through (with N steps). That's
+ * not really good with a large number of stats (though still better
+ * than an exhaustive search).
+ */
+static void
+choose_mv_statistics_greedy(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
+ int best_stat = -1;
+ double gain, max_gain = -1.0;
+
+ /*
+ * Bitmap tracking which clauses are already covered (by the previous
+ * statistics) and may thus serve only as a condition in this step.
+ */
+ bool *covered_clauses = (bool*)palloc0(nclauses);
+
+ /*
+ * Number of clauses and columns covered by each statistics - this
+ * includes both conditions and clauses covered by the statistics for
+ * the first time. The number of columns may count some columns
+ * repeatedly - if a column is shared by multiple clauses, it will
+ * be counted once for each clause (covered by the statistics).
+ * So with two clauses [(a=1 OR b=2),(a<2 OR c>1)] the column "a"
+ * will be counted twice (if both clauses are covered).
+ *
+ * The values for ruled-out statistics (those that can't be applied)
+ * not computed, because that'd be pointless.
+ */
+ int *num_cov_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cov_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Same as above, but this only includes clauses that are already
+ * covered by the previous stats (and the current one).
+ */
+ int *num_cond_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cond_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Number of attributes for each clause.
+ *
+ * TODO Might be computed in choose_mv_statistics() and then passed
+ * here, but then the function would not have the same signature
+ * as _exhaustive().
+ */
+ int *attnum_counts = (int*)palloc0(sizeof(int) * nclauses);
+ int *attnum_cond_counts = (int*)palloc0(sizeof(int) * nconditions);
+
+ CHECK_FOR_INTERRUPTS();
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ /* compute attributes (columns) for each clause */
+ for (i = 0; i < nclauses; i++)
+ attnum_counts[i] = bms_num_members(clauses_attnums[i]);
+
+ /* compute attributes (columns) for each condition */
+ for (i = 0; i < nconditions; i++)
+ attnum_cond_counts[i] = bms_num_members(conditions_attnums[i]);
+
+ /* see which clauses are already covered at this point (by previous stats) */
+ for (i = 0; i < step; i++)
+ for (j = 0; j < nclauses; j++)
+ covered_clauses[j] |= (cover_map[current->stats[i] * nclauses + j]);
+
+ /* which remaining statistics covers most clauses / uses most conditions? */
+ for (i = 0; i < nmvstats; i++)
+ {
+ Bitmapset *attnums_covered = NULL;
+ Bitmapset *attnums_conditions = NULL;
+
+ /* skip stats that are already ruled out (either used or inapplicable) */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* count covered clauses and conditions (for the statistics) */
+ for (j = 0; j < nclauses; j++)
+ {
+ if (cover_map[i * nclauses + j])
+ {
+ Bitmapset *attnums_new
+ = bms_union(attnums_covered, clauses_attnums[j]);
+
+ /* get rid of the old bitmap and keep the unified result */
+ bms_free(attnums_covered);
+ attnums_covered = attnums_new;
+
+ num_cov_clauses[i] += 1;
+ num_cov_columns[i] += attnum_counts[j];
+
+ /* is the clause already covered (i.e. a condition)? */
+ if (covered_clauses[j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_counts[j];
+ attnums_new = bms_union(attnums_conditions,
+ clauses_attnums[j]);
+
+ bms_free(attnums_conditions);
+ attnums_conditions = attnums_new;
+ }
+ }
+ }
+
+ /* if all covered clauses are covered by prev stats (thus conditions) */
+ if (num_cov_clauses[i] == num_cond_clauses[i])
+ ruled_out[i] = step;
+
+ /* same if there are no new attributes */
+ else if (bms_num_members(attnums_conditions) == bms_num_members(attnums_covered))
+ ruled_out[i] = step;
+
+ bms_free(attnums_covered);
+ bms_free(attnums_conditions);
+
+ /* if the statistics is inapplicable, try the next one */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* now let's walk through conditions and count the covered */
+ for (j = 0; j < nconditions; j++)
+ {
+ if (condition_map[i * nconditions + j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_cond_counts[j];
+ }
+ }
+
+ /* otherwise see if this improves the interesting metrics */
+ gain = num_cond_columns[i] / (double)num_cov_columns[i];
+
+ if (gain > max_gain)
+ {
+ max_gain = gain;
+ best_stat = i;
+ }
+ }
+
+ /*
+ * Have we found a suitable statistics? Add it to the solution and
+ * try next step.
+ */
+ if (best_stat != -1)
+ {
+ /* mark the statistics, so that we skip it in next steps */
+ ruled_out[best_stat] = step;
+
+ /* allocate current solution if necessary */
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ current->nclauses += num_cov_clauses[best_stat];
+ current->nconditions += num_cond_clauses[best_stat];
+ current->stats[step] = best_stat;
+ current->nstats++;
+
+ if (*best == NULL)
+ {
+ (*best) = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ else
+ {
+ /* see if this is a better solution */
+ double current_gain = (double)current->nconditions / current->nclauses;
+ double best_gain = (double)(*best)->nconditions / (*best)->nclauses;
+
+ if ((current_gain > best_gain) ||
+ ((current_gain == best_gain) && (current->nstats < (*best)->nstats)))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ }
+
+ /*
+ * The recursion only makes sense if there are still statistics
+ * left to add (otherwise extending the solution is not possible).
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_greedy(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= num_cov_clauses[best_stat];
+ current->nconditions -= num_cond_clauses[best_stat];
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[best_stat] = -1;
+ }
+
+ /* reset all statistics eliminated in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+ /* free everything allocated in this step */
+ pfree(covered_clauses);
+ pfree(attnum_counts);
+ pfree(attnum_cond_counts);
+ pfree(num_cov_clauses);
+ pfree(num_cov_columns);
+ pfree(num_cond_clauses);
+ pfree(num_cond_columns);
+}
+
+/*
+ * Remove clauses not covered by any of the available statistics
+ *
+ * This helps us to reduce the amount of work done in choose_mv_statistics()
+ * by not having to deal with clauses that can't possibly be useful.
+ */
+static List *
+filter_clauses(PlannerInfo *root, Index relid, int type,
+ List *stats, List *clauses, Bitmapset **attnums)
+{
+ ListCell *c;
+ ListCell *s;
+
+ /* results (list of compatible clauses, attnums) */
+ List *rclauses = NIL;
+
+ foreach (c, clauses)
+ {
+ Node *clause = (Node*)lfirst(c);
+ Bitmapset *clause_attnums = NULL;
+
+ /*
+ * Thanks to the previous checks, we should not run into
+ * clauses that are incompatible with multivariate stats here. We also
+ * need to collect the attnums for the clause.
+ *
+ * XXX Maybe turn this into an assert?
+ */
+ if (! clause_is_mv_compatible(clause, relid, &clause_attnums, type))
+ elog(ERROR, "should not get non-mv-compatible cluase");
+
+ /* Is there a multivariate statistics covering the clause? */
+ foreach (s, stats)
+ {
+ int k, matches = 0;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ /* skip statistics not matching the required type */
+ if (! stats_type_matches(stat, type))
+ continue;
+
+ /*
+ * see if all clause attributes are covered by the statistic
+ *
+ * We'll do that in the opposite direction, i.e. we'll see how many
+ * attributes of the statistic are referenced in the clause, and then
+ * compare the counts.
+ */
+ for (k = 0; k < stat->stakeys->dim1; k++)
+ if (bms_is_member(stat->stakeys->values[k], clause_attnums))
+ matches += 1;
+
+ /*
+ * If the number of matches is equal to attributes referenced by the
+ * clause, then the clause is covered by the statistic.
+ */
+ if (bms_num_members(clause_attnums) == matches)
+ {
+ *attnums = bms_union(*attnums, clause_attnums);
+ rclauses = lappend(rclauses, clause);
+ break;
+ }
+ }
+
+ bms_free(clause_attnums);
+ }
+
+ /* we can't have more compatible conditions than source conditions */
+ Assert(list_length(clauses) >= list_length(rclauses));
+
+ return rclauses;
+}
+
+/*
+ * Remove statistics not covering any new clauses
+ *
+ * Statistics not covering any new clauses (conditions don't count) are not
+ * really useful, so let's ignore them. Also, we need the statistics to
+ * reference at least two different attributes (both in conditions and clauses
+ * combined), and at least one of them in the clauses alone.
+ *
+ * This check might be made more strict by checking against individual clauses,
+ * because by using the bitmapsets of all attnums we may actually use attnums
+ * from clauses that are not covered by the statistics. For example, we may
+ * have a condition
+ *
+ * (a=1 AND b=2)
+ *
+ * and a new clause
+ *
+ * (c=1 AND d=1)
+ *
+ * With only bitmapsets, statistics on [b,c] will pass through this (assuming
+ * there are some statistics covering both clauses).
+ *
+ * Parameters:
+ *
+ * stats - list of statistics to filter
+ * new_attnums - attnums referenced in new clauses
+ * all_attnums - attnums referenced by conditions and new clauses combined
+ *
+ * Returns filtered list of statistics.
+ *
+ * TODO Do the more strict check, i.e. walk through individual clauses and
+ * conditions and only use those covered by the statistics.
+ */
+static List *
+filter_stats(List *stats, Bitmapset *new_attnums, Bitmapset *all_attnums)
+{
+ ListCell *s;
+ List *stats_filtered = NIL;
+
+ foreach (s, stats)
+ {
+ int k;
+ int matches_new = 0,
+ matches_all = 0;
+
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ /* see how many attributes the statistics covers */
+ for (k = 0; k < stat->stakeys->dim1; k++)
+ {
+ /* attributes from new clauses */
+ if (bms_is_member(stat->stakeys->values[k], new_attnums))
+ matches_new += 1;
+
+ /* attributes from conditions and new clauses combined */
+ if (bms_is_member(stat->stakeys->values[k], all_attnums))
+ matches_all += 1;
+ }
+
+ /* check we have enough attributes for this statistics */
+ if ((matches_new >= 1) && (matches_all >= 2))
+ stats_filtered = lappend(stats_filtered, stat);
+ }
+
+ /* we can't have more useful stats than we had originally */
+ Assert(list_length(stats) >= list_length(stats_filtered));
+
+ return stats_filtered;
+}
+
+static MVStatisticInfo *
+make_stats_array(List *stats, int *nmvstats)
+{
+ int i;
+ ListCell *l;
+
+ MVStatisticInfo *mvstats = NULL;
+ *nmvstats = list_length(stats);
+
+ mvstats
+ = (MVStatisticInfo*)palloc0((*nmvstats) * sizeof(MVStatisticInfo));
+
+ i = 0;
+ foreach (l, stats)
+ {
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(l);
+ memcpy(&mvstats[i++], stat, sizeof(MVStatisticInfo));
+ }
+
+ return mvstats;
+}
+
+static Bitmapset **
+make_stats_attnums(MVStatisticInfo *mvstats, int nmvstats)
+{
+ int i, j;
+ Bitmapset **stats_attnums = NULL;
+
+ Assert(nmvstats > 0);
- return result;
+ /* build bitmaps of attnums for the stats (easier to compare) */
+ stats_attnums = (Bitmapset **)palloc0(nmvstats * sizeof(Bitmapset*));
+
+ for (i = 0; i < nmvstats; i++)
+ for (j = 0; j < mvstats[i].stakeys->dim1; j++)
+ stats_attnums[i]
+ = bms_add_member(stats_attnums[i],
+ mvstats[i].stakeys->values[j]);
+
+ return stats_attnums;
}
+
/*
- * Collect attributes from mv-compatible clauses.
+ * Remove redundant statistics
+ *
+ * If there are multiple statistics covering the same set of columns (counting
+ * only those referenced by clauses and conditions), we can apply one of those
+ * anyway and further reduce the size of the optimization problem.
+ *
+ * Thus when redundant stats are detected, we keep the smaller one (the one with
+ * fewer columns), based on the assumption that it's more accurate and also
+ * faster to process. That may be untrue for two reasons - first, the accuracy
+ * really depends on number of buckets/MCV items, not the number of columns.
+ * Second, some types of statistics may work better for certain types of clauses
+ * (e.g. MCV lists for equality conditions) etc.
*/
-static Bitmapset *
-collect_mv_attnums(List *clauses, Index relid, int types)
+static List*
+filter_redundant_stats(List *stats, List *clauses, List *conditions)
{
- Bitmapset *attnums = NULL;
- ListCell *l;
+ int i, j, nmvstats;
+
+ MVStatisticInfo *mvstats;
+ bool *redundant;
+ Bitmapset **stats_attnums;
+ Bitmapset *varattnos;
+ Index relid;
+
+ Assert(list_length(stats) > 0);
+ Assert(list_length(clauses) > 0);
/*
- * Walk through the clauses and identify the ones we can estimate using
- * multivariate stats, and remember the relid/columns. We'll then
- * cross-check if we have suitable stats, and only if needed we'll split
- * the clauses into multivariate and regular lists.
+ * We'll convert the list of statistics into an array now, because
+ * the reduction of redundant statistics is easier to do that way
+ * (we can mark previous stats as redundant, etc.).
+ */
+ mvstats = make_stats_array(stats, &nmvstats);
+ stats_attnums = make_stats_attnums(mvstats, nmvstats);
+
+ /* by default, none of the stats is redundant (so palloc0) */
+ redundant = palloc0(nmvstats * sizeof(bool));
+
+ /*
+ * We only expect a single relid here, and also we should get the
+ * same relid from clauses and conditions (but we get it from
+ * clauses, because those are certainly non-empty).
+ */
+ relid = bms_singleton_member(pull_varnos((Node*)clauses));
+
+ /*
+ * Get the varattnos from both conditions and clauses.
+ *
+ * This skips system attributes, although that should be impossible
+ * thanks to previous filtering out of incompatible clauses.
*
- * For now we're only interested in RestrictInfo nodes with nested OpExpr,
- * using either a range or equality.
+ * XXX Is that really true?
*/
- foreach (l, clauses)
+ varattnos = bms_union(get_varattnos((Node*)clauses, relid),
+ get_varattnos((Node*)conditions, relid));
+
+ for (i = 1; i < nmvstats; i++)
{
- Node *clause = (Node *) lfirst(l);
+ /* intersect with current statistics */
+ Bitmapset *curr = bms_intersect(stats_attnums[i], varattnos);
- /* ignore the result here - we only need the attnums */
- clause_is_mv_compatible(clause, relid, &attnums, types);
+ /* walk through 'previous' stats and check redundancy */
+ for (j = 0; j < i; j++)
+ {
+ /* intersect with current statistics */
+ Bitmapset *prev;
+
+ /* skip stats already identified as redundant */
+ if (redundant[j])
+ continue;
+
+ prev = bms_intersect(stats_attnums[j], varattnos);
+
+ switch (bms_subset_compare(curr, prev))
+ {
+ case BMS_EQUAL:
+ /*
+ * Use the smaller one (hopefully more accurate).
+ * If both have the same size, use the first one.
+ */
+ if (mvstats[i].stakeys->dim1 >= mvstats[j].stakeys->dim1)
+ redundant[i] = TRUE;
+ else
+ redundant[j] = TRUE;
+
+ break;
+
+ case BMS_SUBSET1: /* curr is subset of prev */
+ redundant[i] = TRUE;
+ break;
+
+ case BMS_SUBSET2: /* prev is subset of curr */
+ redundant[j] = TRUE;
+ break;
+
+ case BMS_DIFFERENT:
+ /* do nothing - keep both stats */
+ break;
+ }
+
+ bms_free(prev);
+ }
+
+ bms_free(curr);
}
- /*
- * If there are not at least two attributes referenced by the clause(s),
- * we can throw everything out (as we'll revert to simple stats).
- */
- if (bms_num_members(attnums) <= 1)
+ /* can't reduce all statistics (at least one has to remain) */
+ Assert(nmvstats > 0);
+
+ /* now, let's remove the reduced statistics from the arrays */
+ list_free(stats);
+ stats = NIL;
+
+ for (i = 0; i < nmvstats; i++)
{
- if (attnums != NULL)
- pfree(attnums);
- attnums = NULL;
+ MVStatisticInfo *info;
+
+ pfree(stats_attnums[i]);
+
+ if (redundant[i])
+ continue;
+
+ info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[i], sizeof(MVStatisticInfo));
+
+ stats = lappend(stats, info);
}
- return attnums;
+ pfree(mvstats);
+ pfree(stats_attnums);
+ pfree(redundant);
+
+ return stats;
}
-/*
- * Count the number of attributes in clauses compatible with multivariate stats.
- */
-static int
-count_mv_attnums(List *clauses, Index relid, int type)
+static Node**
+make_clauses_array(List *clauses, int *nclauses)
{
- int c;
- Bitmapset *attnums = collect_mv_attnums(clauses, relid, type);
+ int i;
+ ListCell *l;
- c = bms_num_members(attnums);
+ Node** clauses_array;
- bms_free(attnums);
+ *nclauses = list_length(clauses);
+ clauses_array = (Node **)palloc0((*nclauses) * sizeof(Node *));
- return c;
+ i = 0;
+ foreach (l, clauses)
+ clauses_array[i++] = (Node *)lfirst(l);
+
+ *nclauses = i;
+
+ return clauses_array;
}
-/*
- * Count varnos referenced in the clauses, and if there's a single varno then
- * return the index in 'relid'.
- */
-static int
-count_varnos(List *clauses, Index *relid)
+static Bitmapset **
+make_clauses_attnums(PlannerInfo *root, Index relid,
+ int type, Node **clauses, int nclauses)
{
- int cnt;
- Bitmapset *varnos = NULL;
+ int i;
+ Bitmapset **clauses_attnums
+ = (Bitmapset **)palloc0(nclauses * sizeof(Bitmapset *));
- varnos = pull_varnos((Node *) clauses);
- cnt = bms_num_members(varnos);
+ for (i = 0; i < nclauses; i++)
+ {
+ Bitmapset * attnums = NULL;
- /* if there's a single varno in the clauses, remember it */
- if (bms_num_members(varnos) == 1)
- *relid = bms_singleton_member(varnos);
+ if (! clause_is_mv_compatible(clauses[i], relid, &attnums, type))
+ elog(ERROR, "should not get non-mv-compatible clause");
- bms_free(varnos);
+ clauses_attnums[i] = attnums;
+ }
- return cnt;
+ return clauses_attnums;
}
-
+
+static bool*
+make_cover_map(Bitmapset **stats_attnums, int nmvstats,
+ Bitmapset **clauses_attnums, int nclauses)
+{
+ int i, j;
+ bool *cover_map = (bool*)palloc0(nclauses * nmvstats);
+
+ for (i = 0; i < nmvstats; i++)
+ for (j = 0; j < nclauses; j++)
+ cover_map[i * nclauses + j]
+ = bms_is_subset(clauses_attnums[j], stats_attnums[i]);
+
+ return cover_map;
+}
+
/*
- * We're looking for statistics matching at least 2 attributes, referenced in
- * clauses compatible with multivariate statistics. The current selection
- * criteria is very simple - we choose the statistics referencing the most
- * attributes.
- *
- * If there are multiple statistics referencing the same number of columns
- * (from the clauses), the one with less source columns (as listed in the
- * ADD STATISTICS when creating the statistics) wins. Else the first one wins.
- *
- * This is a very simple criteria, and has several weaknesses:
- *
- * (a) does not consider the accuracy of the statistics
- *
- * If there are two histograms built on the same set of columns, but one
- * has 100 buckets and the other one has 1000 buckets (thus likely
- * providing better estimates), this is not currently considered.
- *
- * (b) does not consider the type of statistics
- *
- * If there are three statistics - one containing just a MCV list, another
- * one with just a histogram and a third one with both, we treat them equally.
+ * Chooses the combination of statistics optimal for estimating a particular
+ * clause list.
*
- * (c) does not consider the number of clauses
+ * This only handles the 'preparation' part shared by the exhaustive and greedy
+ * implementations (see the previous methods), mostly trying to reduce the size
+ * of the problem (eliminating clauses/statistics that can't really be used in
+ * the solution).
*
- * As explained, only the number of referenced attributes counts, so if
- * there are multiple clauses on a single attribute, this still counts as
- * a single attribute.
+ * It also precomputes bitmaps for attributes covered by clauses and statistics,
+ * so that we don't need to do that over and over in the actual optimizations
+ * (as it's both CPU and memory intensive).
*
- * (d) does not consider type of condition
*
- * Some clauses may work better with some statistics - for example equality
- * clauses probably work better with MCV lists than with histograms. But
- * IS [NOT] NULL conditions may often work better with histograms (thanks
- * to NULL-buckets).
+ * TODO Another way to make the optimization problems smaller might be splitting
+ * the statistics into several disjoint subsets, i.e. if we can split the
+ * graph of statistics (after the elimination) into multiple components
+ * (so that stats in different components share no attributes), we can do
+ * the optimization for each component separately.
*
- * So for example with five WHERE conditions
- *
- * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
- *
- * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be selected
- * as it references the most columns.
- *
- * Once we have selected the multivariate statistics, we split the list of
- * clauses into two parts - conditions that are compatible with the selected
- * stats, and conditions are estimated using simple statistics.
- *
- * From the example above, conditions
- *
- * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
- *
- * will be estimated using the multivariate statistics (a,b,c,d) while the last
- * condition (e = 1) will get estimated using the regular ones.
- *
- * There are various alternative selection criteria (e.g. counting conditions
- * instead of just referenced attributes), but eventually the best option should
- * be to combine multiple statistics. But that's much harder to do correctly.
- *
- * TODO Select multiple statistics and combine them when computing the estimate.
- *
- * TODO This will probably have to consider compatibility of clauses, because
- * 'dependencies' will probably work only with equality clauses.
+ * TODO If we could compute what a "perfect solution" is, maybe we could
+ * terminate the search after reaching ~90% of it? Say, if we knew that we
+ * can cover 10 clauses and reuse 8 dependencies, maybe covering 9 clauses
+ * and 7 dependencies would be OK?
*/
-static MVStatisticInfo *
-choose_mv_statistics(List *stats, Bitmapset *attnums)
+static List*
+choose_mv_statistics(PlannerInfo *root, Index relid, List *stats,
+ List *clauses, List *conditions)
{
int i;
- ListCell *lc;
+ mv_solution_t *best = NULL;
+ List *result = NIL;
+
+ int nmvstats;
+ MVStatisticInfo *mvstats;
+
+ /* we only work with MCV lists and histograms here */
+ int type = (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+
+ bool *clause_cover_map = NULL,
+ *condition_cover_map = NULL;
+ int *ruled_out = NULL;
+
+ /* build bitmapsets for all stats and clauses */
+ Bitmapset **stats_attnums;
+ Bitmapset **clauses_attnums;
+ Bitmapset **conditions_attnums;
- MVStatisticInfo *choice = NULL;
+ int nclauses, nconditions;
+ Node ** clauses_array;
+ Node ** conditions_array;
- int current_matches = 1; /* goal #1: maximize */
- int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+ /* copy lists, so that we can free them during elimination easily */
+ clauses = list_copy(clauses);
+ conditions = list_copy(conditions);
+ stats = list_copy(stats);
/*
- * Walk through the statistics (simple array with nmvstats elements) and for
- * each one count the referenced attributes (encoded in the 'attnums' bitmap).
+ * Reduce the optimization problem size as much as possible.
+ *
+ * Eliminate clauses and conditions not covered by any statistics,
+ * or statistics not matching at least two attributes (one of them
+ * has to be in a regular clause).
+ *
+ * It's possible that removing a statistics in one iteration
+ * eliminates a clause in the next one, so we repeat this until
+ * an iteration eliminates no clauses/stats.
+ *
+ * This can only happen after eliminating a statistics - clauses are
+ * eliminated first, so statistics always reflect that.
*/
- foreach (lc, stats)
+ while (true)
{
- MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
-
- /* columns matching this statistics */
- int matches = 0;
+ List *tmp;
- int2vector * attrs = info->stakeys;
- int numattrs = attrs->dim1;
+ Bitmapset *compatible_attnums = NULL;
+ Bitmapset *condition_attnums = NULL;
+ Bitmapset *all_attnums = NULL;
- /* skip dependencies-only stats */
- if (! (info->mcv_built || info->hist_built))
- continue;
+ /*
+ * Clauses
+ *
+ * Walk through clauses and keep only those covered by at least
+ * one of the statistics we still have. We'll also keep info
+ * about attnums in clauses (without conditions) so that we can
+ * ignore stats covering just conditions (which is pointless).
+ */
+ tmp = filter_clauses(root, relid, type,
+ stats, clauses, &compatible_attnums);
- /* count columns covered by the histogram */
- for (i = 0; i < numattrs; i++)
- if (bms_is_member(attrs->values[i], attnums))
- matches++;
+ /* discard the original list */
+ list_free(clauses);
+ clauses = tmp;
/*
- * Use this statistics when it improves the number of matches or
- * when it matches the same number of attributes but is smaller.
+ * Conditions
+ *
+ * Walk through the conditions and keep only those covered by at
+ * least one of the statistics we still have. Also, collect a bitmap
+ * of attributes so that we can make sure we add at least one new
+ * attribute (by comparing with clauses).
*/
- if ((matches > current_matches) ||
- ((matches == current_matches) && (current_dims > numattrs)))
+ if (conditions != NIL)
{
- choice = info;
- current_matches = matches;
- current_dims = numattrs;
+ tmp = filter_clauses(root, relid, type,
+ stats, conditions, &condition_attnums);
+
+ /* discard the original list */
+ list_free(conditions);
+ conditions = tmp;
}
- }
- return choice;
-}
+ /* get a union of attnums (from conditions and new clauses) */
+ all_attnums = bms_union(compatible_attnums, condition_attnums);
+
+ /*
+ * Statistics
+ *
+ * Walk through the statistics and only keep those covering at least
+ * one new attribute (excluding conditions) and at least two
+ * attributes in clauses and conditions combined.
+ */
+ tmp = filter_stats(stats, compatible_attnums, all_attnums);
+ /* if we've not eliminated anything, terminate */
+ if (list_length(stats) == list_length(tmp))
+ break;
-/*
- * This splits the clauses list into two parts - one containing clauses that
- * will be evaluated using the chosen statistics, and the remaining clauses
- * (either non-mvcompatible, or not related to the histogram).
- */
-static List *
-clauselist_mv_split(PlannerInfo *root, Index relid,
- List *clauses, List **mvclauses,
- MVStatisticInfo *mvstats, int types)
-{
- int i;
- ListCell *l;
- List *non_mvclauses = NIL;
+ /* work only with the filtered statistics from now on */
+ list_free(stats);
+ stats = tmp;
+ }
- /* FIXME is there a better way to get info on int2vector? */
- int2vector * attrs = mvstats->stakeys;
- int numattrs = mvstats->stakeys->dim1;
+ /* only do the optimization if we have clauses/statistics */
+ if ((list_length(stats) == 0) || (list_length(clauses) == 0))
+ return NIL;
- Bitmapset *mvattnums = NULL;
+ /* remove redundant stats (stats covered by another stats) */
+ stats = filter_redundant_stats(stats, clauses, conditions);
- /* build bitmap of attributes, so we can do bms_is_subset later */
- for (i = 0; i < numattrs; i++)
- mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+ /*
+ * TODO We should sort the stats to make the order deterministic,
+ * otherwise we may get different estimates on different
+ * executions - if there are multiple "equally good" solutions,
+ * we'll keep the first solution we see.
+ *
+ * Sorting by OID probably is not the right solution though,
+ * because we'd like it to be somehow reproducible,
+ * irrespective of the order of ADD STATISTICS commands.
+ * So maybe statkeys?
+ */
+ mvstats = make_stats_array(stats, &nmvstats);
+ stats_attnums = make_stats_attnums(mvstats, nmvstats);
- /* erase the list of mv-compatible clauses */
- *mvclauses = NIL;
+ /* collect clauses and bitmaps of attnums */
+ clauses_array = make_clauses_array(clauses, &nclauses);
+ clauses_attnums = make_clauses_attnums(root, relid, type,
+ clauses_array, nclauses);
- foreach (l, clauses)
- {
- bool match = false; /* by default not mv-compatible */
- Bitmapset *attnums = NULL;
- Node *clause = (Node *) lfirst(l);
+ /* collect conditions and bitmap of attnums */
+ conditions_array = make_clauses_array(conditions, &nconditions);
+ conditions_attnums = make_clauses_attnums(root, relid, type,
+ conditions_array, nconditions);
- if (clause_is_mv_compatible(clause, relid, &attnums, types))
+ /*
+ * Build bitmaps with info about which clauses/conditions are
+ * covered by each statistics (so that we don't need to call the
+ * bms_is_subset over and over again).
+ */
+ clause_cover_map = make_cover_map(stats_attnums, nmvstats,
+ clauses_attnums, nclauses);
+
+ condition_cover_map = make_cover_map(stats_attnums, nmvstats,
+ conditions_attnums, nconditions);
+
+ ruled_out = (int*)palloc0(nmvstats * sizeof(int));
+
+ /* no stats are ruled out by default */
+ for (i = 0; i < nmvstats; i++)
+ ruled_out[i] = -1;
+
+ /* do the optimization itself */
+ if (mvstat_search_type == MVSTAT_SEARCH_EXHAUSTIVE)
+ choose_mv_statistics_exhaustive(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+ else
+ choose_mv_statistics_greedy(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+
+ /* create a list of statistics from the array */
+ if (best != NULL)
+ {
+ for (i = 0; i < best->nstats; i++)
{
- /* are all the attributes part of the selected stats? */
- if (bms_is_subset(attnums, mvattnums))
- match = true;
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[best->stats[i]], sizeof(MVStatisticInfo));
+ result = lappend(result, info);
}
- /*
- * The clause matches the selected stats, so put it to the list of
- * mv-compatible clauses. Otherwise, keep it in the list of 'regular'
- * clauses (that may be selected later).
- */
- if (match)
- *mvclauses = lappend(*mvclauses, clause);
- else
- non_mvclauses = lappend(non_mvclauses, clause);
+ pfree(best);
}
- /*
- * Perform regular estimation using the clauses incompatible with the chosen
- * histogram (or MV stats in general).
- */
- return non_mvclauses;
+ /* cleanup (maybe leave it up to the memory context?) */
+ for (i = 0; i < nmvstats; i++)
+ bms_free(stats_attnums[i]);
+
+ for (i = 0; i < nclauses; i++)
+ bms_free(clauses_attnums[i]);
+
+ for (i = 0; i < nconditions; i++)
+ bms_free(conditions_attnums[i]);
+
+ pfree(stats_attnums);
+ pfree(clauses_attnums);
+ pfree(conditions_attnums);
+ pfree(clauses_array);
+ pfree(conditions_array);
+ pfree(clause_cover_map);
+ pfree(condition_cover_map);
+ pfree(ruled_out);
+ pfree(mvstats);
+
+ list_free(clauses);
+ list_free(conditions);
+ list_free(stats);
+
+ return result;
}
typedef struct
@@ -1474,6 +2686,7 @@ clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums, int type
return true;
}
+
/*
* collect attnums from functional dependencies
*
@@ -2022,6 +3235,24 @@ clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
* Check that there are stats with at least one of the requested types.
*/
static bool
+stats_type_matches(MVStatisticInfo *stat, int type)
+{
+ if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
+ return true;
+
+ if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
+ return true;
+
+ if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
+ return true;
+
+ return false;
+}
+
+/*
+ * Check that there are stats with at least one of the requested types.
+ */
+static bool
has_stats(List *stats, int type)
{
ListCell *s;
@@ -2030,13 +3261,8 @@ has_stats(List *stats, int type)
{
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
- if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
- return true;
-
- if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
- return true;
-
- if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
+ /* terminate if we've found at least one matching statistics */
+ if (stats_type_matches(stat, type))
return true;
}
@@ -2087,22 +3313,26 @@ find_stats(PlannerInfo *root, Index relid)
* as the clauses are processed (and skip items that are 'match').
*/
static Selectivity
-clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats, bool *fullmatch,
- Selectivity *lowsel)
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or,
+ bool *fullmatch, Selectivity *lowsel)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
MCVList mcvlist = NULL;
+
int nmatches = 0;
+ int nconditions = 0;
/* match/mismatch bitmap for each MCV item */
char * matches = NULL;
+ char * condition_matches = NULL;
Assert(clauses != NIL);
- Assert(list_length(clauses) >= 2);
+ Assert(list_length(clauses) >= 1);
/* there's no MCV list built yet */
if (! mvstats->mcv_built)
@@ -2113,32 +3343,85 @@ clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
Assert(mcvlist != NULL);
Assert(mcvlist->nitems > 0);
- /* by default all the MCV items match the clauses fully */
- matches = palloc0(sizeof(char) * mcvlist->nitems);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
-
/* number of matching MCV items */
nmatches = mcvlist->nitems;
+ nconditions = mcvlist->nitems;
+
+ /*
+ * Bitmap of bucket matches (mismatch, partial, full).
+ *
+ * For AND clauses all buckets match (and we'll eliminate them).
+ * For OR clauses no buckets match (and we'll add them).
+ *
+ * We only need to do the memset for AND clauses (for OR clauses
+ * it's already set correctly by the palloc0).
+ */
+ matches = palloc0(sizeof(char) * nmatches);
+
+ if (! is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+ /* Conditions are treated as AND clause, so match by default. */
+ condition_matches = palloc0(sizeof(char) * nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+
+ /*
+ * build the match bitmap for the conditions (conditions are always
+ * connected by AND)
+ */
+ if (conditions != NIL)
+ nconditions = update_match_bitmap_mcvlist(root, conditions,
+ mvstats->stakeys, mcvlist,
+ nconditions, condition_matches,
+ lowsel, fullmatch, false);
+
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all MCV items, even those
+ * ruled out by the conditions. The final result should be the
+ * same, but skipping them might be faster.
+ */
nmatches = update_match_bitmap_mcvlist(root, clauses,
mvstats->stakeys, mcvlist,
- nmatches, matches,
- lowsel, fullmatch, false);
+ ((is_or) ? 0 : nmatches), matches,
+ lowsel, fullmatch, is_or);
/* sum frequencies for all the matching MCV items */
for (i = 0; i < mcvlist->nitems; i++)
{
- /* used to 'scale' for MCV lists not covering all tuples */
+ /*
+ * Find out what part of the data is covered by the MCV list,
+ * so that we can 'scale' the selectivity properly (e.g. when
+ * only 50% of the sample items got into the MCV, and the rest
+ * is either in a histogram, or not covered by stats).
+ *
+ * TODO This might be handled by keeping a global "frequency"
+ * for the whole list, which might save us a bit of time
+ * spent on accessing the not-matching part of the MCV list.
+ * Although it's likely in a cache, so it's very fast.
+ */
u += mcvlist->items[i]->frequency;
+ /* skip MCV items not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
if (matches[i] != MVSTATS_MATCH_NONE)
s += mcvlist->items[i]->frequency;
+
+ t += mcvlist->items[i]->frequency;
}
pfree(matches);
+ pfree(condition_matches);
pfree(mcvlist);
- return s*u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
+ return (s / t) * u;
}
/*
@@ -2369,64 +3652,57 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
}
}
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/* AND/OR clause, with all clauses compatible with the selected MV stat */
int i;
- BoolExpr *orclause = ((BoolExpr*)clause);
- List *orclauses = orclause->args;
+ List *tmp_clauses = ((BoolExpr*)clause)->args;
/* match/mismatch bitmap for each MCV item */
- int or_nmatches = 0;
- char * or_matches = NULL;
+ int tmp_nmatches = 0;
+ char * tmp_matches = NULL;
- Assert(orclauses != NIL);
- Assert(list_length(orclauses) >= 2);
+ Assert(tmp_clauses != NIL);
+ Assert((list_length(tmp_clauses) >= 2) || (not_clause(clause) && (list_length(tmp_clauses)==1)));
/* number of matching MCV items */
- or_nmatches = mcvlist->nitems;
+ tmp_nmatches = (or_clause(clause)) ? 0 : mcvlist->nitems;
/* by default none of the MCV items matches the clauses */
- or_matches = palloc0(sizeof(char) * or_nmatches);
+ tmp_matches = palloc0(sizeof(char) * mcvlist->nitems);
- if (or_clause(clause))
- {
- /* OR clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
- or_nmatches = 0;
- }
- else
- {
- /* AND clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
- }
+ /* AND (and NOT) clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
/* build the match bitmap for the OR-clauses */
- or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ tmp_nmatches = update_match_bitmap_mcvlist(root, tmp_clauses,
stakeys, mcvlist,
- or_nmatches, or_matches,
+ tmp_nmatches, tmp_matches,
lowsel, fullmatch, or_clause(clause));
/* merge the bitmap into the existing one*/
for (i = 0; i < mcvlist->nitems; i++)
{
+ /* if this is a NOT clause, we need to invert the results first */
+ if (not_clause(clause))
+ tmp_matches[i] = (MVSTATS_MATCH_FULL - tmp_matches[i]);
+
/*
* To AND-merge the bitmaps, a MIN() semantics is used.
* For OR-merge, use MAX().
*
* FIXME this does not decrease the number of matches
*/
- UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
}
- pfree(or_matches);
+ pfree(tmp_matches);
}
else
- {
elog(ERROR, "unknown clause type: %d", clause->type);
- }
}
/*
@@ -2484,15 +3760,18 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
* this is not uncommon, but for histograms it's not that clear.
*/
static Selectivity
-clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats)
+clauselist_mv_selectivity_histogram(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
int nmatches = 0;
+ int nconditions = 0;
char *matches = NULL;
+ char *condition_matches = NULL;
MVSerializedHistogram mvhist = NULL;
@@ -2505,25 +3784,55 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
Assert (mvhist != NULL);
Assert (clauses != NIL);
- Assert (list_length(clauses) >= 2);
+ Assert (list_length(clauses) >= 1);
+
+ nmatches = mvhist->nbuckets;
+ nconditions = mvhist->nbuckets;
/*
- * Bitmap of bucket matches (mismatch, partial, full). by default
- * all buckets fully match (and we'll eliminate them).
+ * Bitmap of bucket matches (mismatch, partial, full).
+ *
+ * For AND clauses all buckets match (and we'll eliminate them).
+ * For OR clauses no buckets match (and we'll add them).
+ *
+ * We only need to do the memset for AND clauses (for OR clauses
+ * it's already set correctly by the palloc0).
*/
- matches = palloc0(sizeof(char) * mvhist->nbuckets);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+ matches = palloc0(sizeof(char) * nmatches);
- nmatches = mvhist->nbuckets;
+ if (! is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+
+ /* Conditions are treated as AND clause, so match by default. */
+ condition_matches = palloc0(sizeof(char)*nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+
+ /*
+ * build the match bitmap for the conditions (conditions are always
+ * connected by AND)
+ */
+ if (conditions != NIL)
+ update_match_bitmap_histogram(root, conditions,
+ mvstats->stakeys, mvhist,
+ nconditions, condition_matches, false);
- /* build the match bitmap */
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all buckets, even those
+ * ruled out by the conditions. The final result should be
+ * the same, but skipping them might be faster.
+ */
update_match_bitmap_histogram(root, clauses,
mvstats->stakeys, mvhist,
- nmatches, matches, false);
+ ((is_or) ? 0 : nmatches), matches,
+ is_or);
/* now, walk through the buckets and sum the selectivities */
for (i = 0; i < mvhist->nbuckets; i++)
{
+ float coeff = 1.0;
+
/*
* Find out what part of the data is covered by the histogram,
* so that we can 'scale' the selectivity properly (e.g. when
@@ -2537,10 +3846,23 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
*/
u += mvhist->buckets[i]->ntuples;
+ /* skip buckets not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+ else if (condition_matches[i] == MVSTATS_MATCH_PARTIAL)
+ coeff = 0.5;
+
+ t += coeff * mvhist->buckets[i]->ntuples;
+
if (matches[i] == MVSTATS_MATCH_FULL)
- s += mvhist->buckets[i]->ntuples;
+ s += coeff * mvhist->buckets[i]->ntuples;
else if (matches[i] == MVSTATS_MATCH_PARTIAL)
- s += 0.5 * mvhist->buckets[i]->ntuples;
+ /*
+ * TODO If both conditions and clauses match partially, this
+ * will use a 0.25 match - not sure if that's the right
+ * solution, but it seems about right.
+ */
+ s += coeff * 0.5 * mvhist->buckets[i]->ntuples;
}
#ifdef DEBUG_MVHIST
@@ -2549,9 +3871,14 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
/* release the allocated bitmap and deserialized histogram */
pfree(matches);
+ pfree(condition_matches);
pfree(mvhist);
- return s * u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
+ return (s / t) * u;
}
/* cached result of bucket boundary comparison for a single dimension */
@@ -2699,7 +4026,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
{
int i;
ListCell * l;
-
+
/*
* Used for caching function calls, only once per deduplicated value.
*
@@ -2742,7 +4069,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
FmgrInfo opproc; /* operator */
fmgr_info(get_opcode(expr->opno), &opproc);
-
+
/* reset the cache (per clause) */
memset(callcache, 0, mvhist->nbuckets);
@@ -2902,64 +4229,57 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
}
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/* AND/OR clause, with all clauses compatible with the selected MV stat */
int i;
- BoolExpr *orclause = ((BoolExpr*)clause);
- List *orclauses = orclause->args;
+ List *tmp_clauses = ((BoolExpr*)clause)->args;
/* match/mismatch bitmap for each bucket */
- int or_nmatches = 0;
- char * or_matches = NULL;
+ int tmp_nmatches = 0;
+ char * tmp_matches = NULL;
- Assert(orclauses != NIL);
- Assert(list_length(orclauses) >= 2);
+ Assert(tmp_clauses != NIL);
+ Assert((list_length(tmp_clauses) >= 2) || (not_clause(clause) && (list_length(tmp_clauses)==1)));
/* number of matching buckets */
- or_nmatches = mvhist->nbuckets;
+ tmp_nmatches = (or_clause(clause)) ? 0 : mvhist->nbuckets;
- /* by default none of the buckets matches the clauses */
- or_matches = palloc0(sizeof(char) * or_nmatches);
+ /* by default none of the buckets matches the clauses (OR clause) */
+ tmp_matches = palloc0(sizeof(char) * mvhist->nbuckets);
- if (or_clause(clause))
- {
- /* OR clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
- or_nmatches = 0;
- }
- else
- {
- /* AND clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
- }
+ /* but AND (and NOT) clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
/* build the match bitmap for the OR-clauses */
- or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ tmp_nmatches = update_match_bitmap_histogram(root, tmp_clauses,
stakeys, mvhist,
- or_nmatches, or_matches, or_clause(clause));
+ tmp_nmatches, tmp_matches, or_clause(clause));
/* merge the bitmap into the existing one*/
for (i = 0; i < mvhist->nbuckets; i++)
{
+ /* if this is a NOT clause, we need to invert the results first */
+ if (not_clause(clause))
+ tmp_matches[i] = (MVSTATS_MATCH_FULL - tmp_matches[i]);
+
/*
* To AND-merge the bitmaps, a MIN() semantics is used.
* For OR-merge, use MAX().
*
* FIXME this does not decrease the number of matches
*/
- UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
}
- pfree(or_matches);
-
+ pfree(tmp_matches);
}
else
elog(ERROR, "unknown clause type: %d", clause->type);
}
- /* free the call cache */
pfree(callcache);
return nmatches;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5350329..57214e0 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3518,7 +3518,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/*
* Also get the normal inner-join selectivity of the join clauses.
@@ -3541,7 +3542,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
JOIN_INNER,
- &norm_sjinfo);
+ &norm_sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
if (jointype == JOIN_ANTI)
@@ -3708,7 +3710,7 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
Node *qual = (Node *) lfirst(l);
/* Note that clause_selectivity will be able to cache its result */
- selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo);
+ selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo, NIL);
}
/* Apply it to the input relation sizes */
@@ -3744,7 +3746,8 @@ set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
rel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
rel->rows = clamp_row_est(nrows);
@@ -3781,7 +3784,8 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
allclauses,
rel->relid, /* do not use 0! */
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
/* For safety, make sure result is not more than the base estimate */
if (nrows > rel->rows)
@@ -3919,12 +3923,14 @@ calc_joinrel_size_estimate(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = clauselist_selectivity(root,
pushedquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
list_free(joinquals);
@@ -3936,7 +3942,8 @@ calc_joinrel_size_estimate(PlannerInfo *root,
restrictlist,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = 0.0; /* not used, keep compiler quiet */
}
diff --git a/src/backend/optimizer/util/orclauses.c b/src/backend/optimizer/util/orclauses.c
index ea831f5..6299e75 100644
--- a/src/backend/optimizer/util/orclauses.c
+++ b/src/backend/optimizer/util/orclauses.c
@@ -280,7 +280,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
* saving work later.)
*/
or_selec = clause_selectivity(root, (Node *) or_rinfo,
- 0, JOIN_INNER, NULL);
+ 0, JOIN_INNER, NULL, NIL);
/*
* The clause is only worth adding to the query if it rejects a useful
@@ -342,7 +342,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
/* Compute inner-join size */
orig_selec = clause_selectivity(root, (Node *) join_or_rinfo,
- 0, JOIN_INNER, &sjinfo);
+ 0, JOIN_INNER, &sjinfo, NIL);
/* And hack cached selectivity so join size remains the same */
join_or_rinfo->norm_selec = orig_selec / or_selec;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 46c95b0..7d0a3a1 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1627,13 +1627,15 @@ booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg,
case IS_NOT_FALSE:
selec = (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
case IS_FALSE:
case IS_NOT_TRUE:
selec = 1.0 - (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
default:
elog(ERROR, "unrecognized booltesttype: %d",
@@ -6259,7 +6261,8 @@ genericcostestimate(PlannerInfo *root,
indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/*
* If caller didn't give us an estimate, estimate the number of index
@@ -6579,7 +6582,8 @@ btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
btreeSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
numIndexTuples = btreeSelectivity * index->rel->tuples;
/*
@@ -7330,7 +7334,8 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/* fetch estimated page cost for tablespace containing index */
get_tablespace_page_costs(index->reltablespace,
@@ -7560,7 +7565,7 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexSelectivity =
clauselist_selectivity(root, indexQuals,
path->indexinfo->rel->relid,
- JOIN_INNER, NULL);
+ JOIN_INNER, NULL, NIL);
*indexCorrelation = 1;
/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index ea5a09a..27a8de5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -75,6 +75,7 @@
#include "utils/bytea.h"
#include "utils/guc_tables.h"
#include "utils/memutils.h"
+#include "utils/mvstats.h"
#include "utils/pg_locale.h"
#include "utils/plancache.h"
#include "utils/portal.h"
@@ -393,6 +394,15 @@ static const struct config_enum_entry force_parallel_mode_options[] = {
};
/*
+ * Search algorithm for multivariate stats.
+ */
+static const struct config_enum_entry mvstat_search_options[] = {
+ {"greedy", MVSTAT_SEARCH_GREEDY, false},
+ {"exhaustive", MVSTAT_SEARCH_EXHAUSTIVE, false},
+ {NULL, 0, false}
+};
+
+/*
* Options for enum values stored in other modules
*/
extern const struct config_enum_entry wal_level_options[];
@@ -3707,6 +3717,16 @@ static struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"mvstat_search", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Sets the algorithm used for combining multivariate stats."),
+ NULL
+ },
+ &mvstat_search_type,
+ MVSTAT_SEARCH_GREEDY, mvstat_search_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index 3e4f4d1..d404914 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -90,6 +90,137 @@ even attempting to do the more expensive estimation.
Whenever we find there are no suitable stats, we skip the expensive steps.
+Combining multiple statistics
+-----------------------------
+
+When estimating the selectivity of a list of clauses, there may be no statistics
+covering all of them. If there are multiple statistics, each covering some
+subset of the attributes, the optimizer needs to figure out which of those
+statistics to apply.
+
+When the statistics do not overlap, the solution is trivial - we can simply
+split the conditions into groups by the matching statistics, and then multiply
+the selectivities. For example, assume multivariate statistics on (b,c) and (d,e),
+and a condition like this:
+
+ (a=1) AND (b=2) AND (c=3) AND (d=4) AND (e=5)
+
+Then (a=1) is not covered by any of the statistics, so it will be estimated
+using the regular per-column statistics. The conditions ((b=2) AND (c=3)) will
+be estimated using the (b,c) statistics, ((d=4) AND (e=5)) will be estimated
+using the (d,e) statistics, and the resulting selectivities will be multiplied
+together.
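+
+Spelled out as a formula (just restating the paragraph above), the estimate
+for this example is computed as
+
+    P(a=1 & b=2 & c=3 & d=4 & e=5)
+        = P(a=1) * P(b=2 & c=3) * P(d=4 & e=5)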
+
+Now, what if the statistics overlap? For example assume the same condition as
+above, but let's say we have statistics on (a,b,c) and (a,c,d,e). What then?
+
+As selectivity is just a probability that the condition holds for a random row,
+we can write the selectivity like this:
+
+ P(a=1 & b=2 & c=3 & d=4 & e=5)
+
+and we can rewrite it using conditional probability like this
+
+ P(a=1 & b=2 & c=3) * P(d=4 & e=5 | a=1 & b=2 & c=3)
+
+Notice that the first part already matches the (a,b,c) statistics. If we assume
+that columns that are not referenced by the same statistics are independent, we
+may rewrite the second half like this
+
+ P(d=4 & e=5 | a=1 & b=2 & c=3) = P(d=4 & e=5 | a=1 & c=3)
+
+which corresponds to the statistics on (a,c,d,e).
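+
+Putting the two parts together, the selectivity may then be estimated from the
+two statistics as
+
+    P(a=1 & b=2 & c=3) * P(d=4 & e=5 | a=1 & c=3)
+
+with the first term estimated using (a,b,c) and the second one using (a,c,d,e).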
+
+If there are multiple statistics defined on a table, it's not difficult to come
+up with examples where there are multiple ways to combine them to cover a list of
+clauses. We need a way to find the best combination of statistics.
+
+This is the purpose of choose_mv_statistics(). It searches through the possible
+combinations of statistics, looking for a combination that
+
+ (a) covers the most clauses of the list
+
+ (b) reuses the maximum number of clauses as conditions
+ (in conditional probabilities)
+
+While criteria (a) seems natural, (b) may seem a bit awkward at first. The
+idea is that conditions are a way of transferring information about dependencies
+between statistics.
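+
+To illustrate the two criteria on a simple constructed example (not taken from
+the code), consider the clauses
+
+    (a=1) AND (b=2) AND (c=3) AND (d=4)
+
+and statistics on (a,b), (a,b,c) and (c,d). Both combinations [(a,b), (c,d)]
+and [(a,b,c), (c,d)] cover all four clauses, but the latter also reuses the
+clause (c=3) as a condition when applying the (c,d) statistics, and thus
+should win by criteria (b).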
+
+There are two alternative implementations of choose_mv_statistics() - greedy
+and exhaustive. Exhaustive actually searches through all possible combinations
+of statistics, and for larger numbers of statistics may get quite expensive
+(as it, unsurprisingly, has exponential cost). Greedy terminates in less than
+K steps (when K is the number of clauses), and in each step chooses the best
+next statistics. I've been unable to come up with an example where those two
+approaches would produce different combinations.
+
+It's possible to choose the algorithm using the mvstat_search GUC, with either
+'greedy' or 'exhaustive' values (the default is 'greedy'):
+
+    SET mvstat_search = 'exhaustive';
+
+Note: This is meant mostly for experimentation. I do expect we'll choose one of
+the algorithms and remove the GUC before commit.
+
+
+Limitations of combining statistics
+-----------------------------------
+
+As described in the section 'Combining multiple statistics', the current
+approach is based on transferring information between statistics by means of
+conditional probabilities. This is a relatively cheap and efficient approach,
+but it relies on two assumptions:
+
+ (1) The overlap between the statistics needs to be sufficiently large, i.e.
+ there needs to be enough columns shared by the statistics to transfer
+ information about dependencies between the remaining columns.
+
+ (2) The query needs to include sufficient clauses on the shared columns.
+
+A simple example illustrates how violating those assumptions becomes a
+problem. Assume a table with three columns (a,b,c) containing exactly the
+same values, and statistics on (a,b) and (b,c):
+
+    CREATE TABLE test AS SELECT i AS a, i AS b, i AS c
+                           FROM generate_series(1,1000) s(i);
+
+ CREATE STATISTICS s1 ON test (a,b) WITH (mcv);
+ CREATE STATISTICS s2 ON test (b,c) WITH (mcv);
+
+ ANALYZE test;
+
+First, let's estimate this query:
+
+ SELECT * FROM test WHERE (a < 10) AND (c < 10);
+
+Clearly, there are no conditions on 'b' (the only column shared by the two
+statistics), so we'll end up with an estimate based on the independence
+assumption:
+
+ P(a < 10) * P(c < 10) = 0.01 * 0.01 = 0.0001
+
+This is a significant underestimate - the actual selectivity is about 0.01.
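+
+A quick sanity check against the example data (only the 9 rows with values
+1..9 satisfy both conditions):
+
+    SELECT count(*) FROM test WHERE (a < 10) AND (c < 10);  -- returns 9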
+
+But let's estimate another query:
+
+ SELECT * FROM test WHERE (a < 10) AND (b < 500) AND (c < 10);
+
+In this case, the estimate may be computed for example like this:
+
+ P[(a < 10) & (b < 500) & (c < 10)]
+ = P[(a < 10) & (b < 500)] * P[(c < 10) | (a < 10) & (b < 500)]
+ = P[(a < 10) & (b < 500)] * P[(c < 10) | (b < 500)]
+
+The trouble is that the probability P(c < 10 | b < 500) evaluates to 0.02: we
+assume (a) and (c) are independent, as there is no statistic containing both
+columns, and the condition on (b) does not transfer enough information between
+the two statistics.
+
+Currently, the only solution is to build statistics on all three columns, but
+see the 'Combining stats using convolution' section for ideas on how to
+improve this.
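+
+For the example above, that would be something like:
+
+    CREATE STATISTICS s3 ON test (a,b,c) WITH (mcv);
+    ANALYZE test;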
+
+
Further (possibly crazy) ideas
------------------------------
@@ -111,3 +242,38 @@ But of course, this may result in expensive estimation (CPU-wise).
So we might add a GUC to choose between the simple (single-statistics) and the
multi-statistics estimation, possibly as a table-level parameter (ALTER TABLE ...).
+
+
+Combining stats using convolution
+---------------------------------
+
+The current approach to combining statistics is based on conditional
+probabilities, and thus only works when the query includes conditions on the
+overlapping parts of the statistics. But there may be other ways to combine
+statistics, relaxing this requirement.
+
+Let's assume two histograms H1 and H2 - then combining them might work about
+like this:
+
+
+ for (buckets of H1, satisfying local conditions)
+ {
+ for (buckets of H2, overlapping with H1 bucket)
+ {
+ mark H2 bucket as 'valid'
+ }
+ }
+
+ s1 = s2 = 0.0
+ for (buckets of H2 marked as valid)
+ {
+ s1 += frequency
+
+        if (bucket satisfies local conditions)
+ s2 += frequency
+ }
+
+ s = (s2 / s1) /* final selectivity estimate */
+
+However this may quickly get non-trivial, e.g. when combining two statistics
+of different types (histogram vs. MCV).
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index fea2bb7..33f5a1b 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -192,11 +192,13 @@ extern Selectivity clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
extern Selectivity clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
#endif /* COST_H */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index f05a517..35b2f8e 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -17,6 +17,14 @@
#include "fmgr.h"
#include "commands/vacuum.h"
+typedef enum MVStatSearchType
+{
+ MVSTAT_SEARCH_EXHAUSTIVE, /* exhaustive search */
+ MVSTAT_SEARCH_GREEDY /* greedy search */
+} MVStatSearchType;
+
+extern int mvstat_search_type;
+
/*
* Degree of how much MCV item / histogram bucket matches a clause.
* This is then considered when computing the selectivity.
--
2.1.0
Attachment: 0007-multivariate-ndistinct-coefficients.patch (text/x-patch)
From bcc5f072c0d14e824c9f50b2b6f5f31e864d92e6 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Wed, 23 Dec 2015 02:07:58 +0100
Subject: [PATCH 7/9] multivariate ndistinct coefficients
---
doc/src/sgml/ref/create_statistics.sgml | 9 ++
src/backend/catalog/system_views.sql | 3 +-
src/backend/commands/analyze.c | 2 +-
src/backend/commands/statscmds.c | 11 +-
src/backend/optimizer/path/clausesel.c | 4 +
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/adt/selfuncs.c | 93 +++++++++++++++-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/README.ndistinct | 83 ++++++++++++++
src/backend/utils/mvstats/README.stats | 2 +
src/backend/utils/mvstats/common.c | 23 +++-
src/backend/utils/mvstats/mvdist.c | 171 +++++++++++++++++++++++++++++
src/include/catalog/pg_mv_statistic.h | 26 +++--
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 9 +-
src/test/regress/expected/rules.out | 3 +-
16 files changed, 424 insertions(+), 23 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.ndistinct
create mode 100644 src/backend/utils/mvstats/mvdist.c
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index fd3382e..80360a6 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -168,6 +168,15 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>ndistinct</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables ndistinct coefficients for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect2>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6afdee0..a550141 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -169,7 +169,8 @@ CREATE VIEW pg_mv_stats AS
length(S.stamcv) AS mcvbytes,
pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo,
length(S.stahist) AS histbytes,
- pg_mv_stats_histogram_info(S.stahist) AS histinfo
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo,
+ standcoeff AS ndcoeff
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8ac9915..b4f5927 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -582,7 +582,7 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
}
/* Build multivariate stats (if there are any). */
- build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
+ build_mv_stats(onerel, totalrows, numrows, rows, attr_cnt, vacattrstats);
}
/*
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index b974655..6ea0e13 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -138,7 +138,8 @@ CreateStatistics(CreateStatsStmt *stmt)
/* by default build nothing */
bool build_dependencies = false,
build_mcv = false,
- build_histogram = false;
+ build_histogram = false,
+ build_ndistinct = false;
int32 max_buckets = -1,
max_mcv_items = -1;
@@ -221,6 +222,8 @@ CreateStatistics(CreateStatsStmt *stmt)
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "ndistinct") == 0)
+ build_ndistinct = defGetBoolean(opt);
else if (strcmp(opt->defname, "mcv") == 0)
build_mcv = defGetBoolean(opt);
else if (strcmp(opt->defname, "max_mcv_items") == 0)
@@ -275,10 +278,10 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv || build_histogram))
+ if (! (build_dependencies || build_mcv || build_histogram || build_ndistinct))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram, ndistinct) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -311,6 +314,7 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
+ values[Anum_pg_mv_statistic_ndist_enabled-1] = BoolGetDatum(build_ndistinct);
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
@@ -318,6 +322,7 @@ CreateStatistics(CreateStatsStmt *stmt)
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
nulls[Anum_pg_mv_statistic_stahist -1] = true;
+ nulls[Anum_pg_mv_statistic_standist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index c1b8999..2540da9 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -59,6 +59,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
#define MV_CLAUSE_TYPE_HIST 0x04
+#define MV_CLAUSE_TYPE_NDIST 0x08
static bool clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums,
int type);
@@ -3246,6 +3247,9 @@ stats_type_matches(MVStatisticInfo *stat, int type)
if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
return true;
+ if ((type & MV_CLAUSE_TYPE_NDIST) && stat->ndist_built)
+ return true;
+
return false;
}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 40145e7..328633e 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -416,7 +416,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built)
+ if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built || mvstat->ndist_built)
{
info = makeNode(MVStatisticInfo);
@@ -427,11 +427,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
info->deps_enabled = mvstat->deps_enabled;
info->mcv_enabled = mvstat->mcv_enabled;
info->hist_enabled = mvstat->hist_enabled;
+ info->ndist_enabled = mvstat->ndist_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
info->mcv_built = mvstat->mcv_built;
info->hist_built = mvstat->hist_built;
+ info->ndist_built = mvstat->ndist_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 7d0a3a1..a84dd2b 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -132,6 +132,7 @@
#include "utils/fmgroids.h"
#include "utils/index_selfuncs.h"
#include "utils/lsyscache.h"
+#include "utils/mvstats.h"
#include "utils/nabstime.h"
#include "utils/pg_locale.h"
#include "utils/rel.h"
@@ -206,6 +207,7 @@ static Const *string_to_const(const char *str, Oid datatype);
static Const *string_to_bytea_const(const char *str, size_t str_len);
static List *add_predicate_to_quals(IndexOptInfo *index, List *indexQuals);
+static Oid find_ndistinct_coeff(PlannerInfo *root, RelOptInfo *rel, List *varinfos);
/*
* eqsel - Selectivity of "=" for any data types.
@@ -3422,12 +3424,26 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
* don't know by how much. We should never clamp to less than the
* largest ndistinct value for any of the Vars, though, since
* there will surely be at least that many groups.
+ *
+ * However we don't need to do this if we have ndistinct stats on
+ * the columns - in that case we can simply use the coefficient
+ * to get the (probably way more accurate) estimate.
+ *
+	 * XXX Probably needs refactoring (don't like mixing the clamp
+	 * and the coeff at the same time).
*/
double clamp = rel->tuples;
+ double coeff = 1.0;
if (relvarcount > 1)
{
- clamp *= 0.1;
+ Oid oid = find_ndistinct_coeff(root, rel, varinfos);
+
+ if (oid != InvalidOid)
+ coeff = load_mv_ndistinct(oid);
+ else
+ clamp *= 0.1;
+
if (clamp < relmaxndistinct)
{
clamp = relmaxndistinct;
@@ -3436,6 +3452,13 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
clamp = rel->tuples;
}
}
+
+ /*
+ * Apply ndistinct coefficient from multivar stats (we must do this
+	 * before clamping the estimate in any way).
+ */
+ reldistinct /= coeff;
+
if (reldistinct > clamp)
reldistinct = clamp;
@@ -7582,3 +7605,71 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
/* XXX what about pages_per_range? */
}
+
+/*
+ * Find applicable ndistinct statistics and compute the coefficient to
+ * correct the estimate (simply a product of per-column ndistincts).
+ *
+ * Currently we only look for a perfect match, i.e. a single statistics
+ * whose columns exactly match the grouped columns.
+ */
+static Oid
+find_ndistinct_coeff(PlannerInfo *root, RelOptInfo *rel, List *varinfos)
+{
+ ListCell *lc;
+ Bitmapset *attnums = NULL;
+ VariableStatData vardata;
+
+ foreach(lc, varinfos)
+ {
+ GroupVarInfo *varinfo = (GroupVarInfo *) lfirst(lc);
+
+ if (varinfo->rel != rel)
+ continue;
+
+		/* FIXME handle general expressions (not just plain Vars) */
+
+ /*
+ * examine the variable (or expression) so that we know which
+ * attribute we're dealing with - we need this for matching the
+ * ndistinct coefficient
+ *
+ * FIXME probably might remember this from estimate_num_groups
+ */
+ examine_variable(root, varinfo->var, 0, &vardata);
+
+ if (HeapTupleIsValid(vardata.statsTuple))
+ {
+ Form_pg_statistic stats
+ = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
+
+ attnums = bms_add_member(attnums, stats->staattnum);
+
+ ReleaseVariableStats(vardata);
+ }
+ }
+
+ /* look for a matching ndistinct statistics */
+ foreach (lc, rel->mvstatlist)
+ {
+ int i;
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ /* skip statistics without ndistinct coefficient built */
+ if (!info->ndist_built)
+ continue;
+
+ /* only exact matches for now (same set of columns) */
+ if (bms_num_members(attnums) != info->stakeys->dim1)
+ continue;
+
+		/* check that all the columns match */
+		for (i = 0; i < info->stakeys->dim1; i++)
+			if (!bms_is_member(info->stakeys->values[i], attnums))
+				break;
+
+		/* some column didn't match, try the next statistics */
+		if (i < info->stakeys->dim1)
+			continue;
+
+		return info->mvoid;
+ }
+
+ return InvalidOid;
+}
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 9dbb3b6..d4b88e9 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o histogram.o mcv.o
+OBJS = common.o dependencies.o histogram.o mcv.o mvdist.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.ndistinct b/src/backend/utils/mvstats/README.ndistinct
new file mode 100644
index 0000000..32d1624
--- /dev/null
+++ b/src/backend/utils/mvstats/README.ndistinct
@@ -0,0 +1,83 @@
+ndistinct coefficients
+======================
+
+Estimating the number of distinct groups in a combination of columns is tricky,
+and the estimation error is often significant. By ndistinct coefficient we
+mean the ratio
+
+ q = ndistinct(a) * ndistinct(b) / ndistinct(a,b)
+
+where 'a' and 'b' are columns, ndistinct(a) is (an estimate of) the number of
+distinct values in column 'a', and ndistinct(a,b) is the same thing for the
+pair of columns.
+
+The meaning of the coefficient may be illustrated by answering the following
+question: Given a combination of columns (a,b), how many distinct values of 'b'
+matches a chosen value of 'a' on average?
+
+Let's assume we know ndistinct(a) and ndistinct(a,b). Then the answer to the
+question clearly is
+
+ ndistinct(a,b) / ndistinct(a)
+
+and by using 'q' we may rewrite this as
+
+ ndistinct(b) / q
+
+so 'q' may be considered as a correction factor of the ndistinct estimate given
+a condition on one of the columns.
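+
+As a concrete (made-up) example, assume ndistinct(a) = 100, ndistinct(b) = 100
+and ndistinct(a,b) = 1000 (under independence we'd expect up to 10000). Then
+
+    q = 100 * 100 / 1000 = 10
+
+and a condition fixing 'a' reduces the expected number of distinct values of
+'b' from ndistinct(b) = 100 to ndistinct(b) / q = 10.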
+
+This may be generalized to a combination of 'n' columns
+
+ [ndistinct(c1) * ... * ndistinct(cn)] / ndistinct(c1, ..., cn)
+
+and the meaning is very similar, except that we need to use conditions on (n-1)
+of the columns.
+
+
+Selectivity estimation
+----------------------
+
+As explained in the previous paragraph, ndistinct coefficients may be used to
+estimate the cardinality of a column, given some a priori knowledge. Let's
+assume we need to estimate the selectivity of a condition
+
+ (a=1) AND (b=2)
+
+which we can expand like this
+
+ P(a=1 & b=2) = P(a=1) * P(b=2 | a=1)
+
+Let's also assume that the distributions are uniform, i.e. that
+
+ P(a=1) = 1/ndistinct(a)
+ P(b=2) = 1/ndistinct(b)
+ P(a=1 & b=2) = 1/ndistinct(a,b)
+
+ P(b=2 | a=1) = ndistinct(a) / ndistinct(a,b)
+
+which may be rewritten like
+
+ P(b=2 | a=1)
+	  = ndistinct(a) / ndistinct(a,b)
+ = (1/ndistinct(b)) * [(ndistinct(a) * ndistinct(b)) / ndistinct(a,b)]
+ = (1/ndistinct(b)) * q
+
+and therefore
+
+ P(a=1 & b=2) = (1/ndistinct(a)) * (1/ndistinct(b)) * q
+
+This also illustrates 'q' as a correction coefficient.
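+
+Continuing the numeric example above (ndistinct(a) = 100, ndistinct(b) = 100,
+ndistinct(a,b) = 1000, q = 10):
+
+    P(a=1 & b=2) = (1/100) * (1/100) * 10 = 1/1000
+
+i.e. ten times the estimate we'd get under the independence assumption.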
+
+It also explains why we store the coefficient and not simply ndistinct(a,b).
+This way we can estimate the individual clauses and then correct the result
+by multiplying it with 'q' - we don't have to mess with the ndistinct
+estimates at all.
+
+Naturally, as the coefficient is derived from ndistinct(a,b), it may also be
+used to estimate GROUP BY clauses on the combination of columns, replacing the
+existing heuristics in estimate_num_groups().
+
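+For example, on a hypothetical table 't' with correlated columns (a,b), the
+coefficient is built using the syntax from this patch series:
+
+    CREATE STATISTICS s ON t (a,b) WITH (ndistinct);
+    ANALYZE t;
+
+    -- the GROUP BY estimate may now use the (a,b) ndistinct coefficient
+    EXPLAIN SELECT 1 FROM t GROUP BY a, b;
+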
+Note: Currently only the GROUP BY estimation is implemented. It's a bit unclear
+how to implement the clause estimation when there are other statistics (esp.
+MCV lists and/or functional dependencies) available.
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index d404914..6d4b09b 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -20,6 +20,8 @@ Currently we only have two kinds of multivariate statistics
(c) multivariate histograms (README.histogram)
+ (d) ndistinct coefficients
+
Compatible clause types
-----------------------
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index ffb76f4..2be980d 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -32,7 +32,8 @@ static List* list_mv_stats(Oid relid);
* and serializes them back into the catalog (as bytea values).
*/
void
-build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+build_mv_stats(Relation onerel, double totalrows,
+ int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats)
{
ListCell *lc;
@@ -53,6 +54,7 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
MVHistogram histogram = NULL;
+ double ndist = -1;
int numrows_filtered = numrows;
VacAttrStats **stats = NULL;
@@ -92,6 +94,9 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (stat->ndist_enabled)
+ ndist = build_mv_ndistinct(totalrows, numrows, rows, attrs, stats);
+
/* build the MCV list */
if (stat->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
@@ -101,7 +106,7 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, mcvlist, histogram, attrs, stats);
+ update_mv_stats(stat->mvoid, deps, mcvlist, histogram, ndist, attrs, stats);
}
}
@@ -183,6 +188,8 @@ list_mv_stats(Oid relid)
info->mcv_built = stats->mcv_built;
info->hist_enabled = stats->hist_enabled;
info->hist_built = stats->hist_built;
+ info->ndist_enabled = stats->ndist_enabled;
+ info->ndist_built = stats->ndist_built;
result = lappend(result, info);
}
@@ -252,7 +259,7 @@ find_mv_attnums(Oid mvoid, Oid *relid)
void
update_mv_stats(Oid mvoid,
MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
- int2vector *attrs, VacAttrStats **stats)
+ double ndistcoeff, int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -292,26 +299,36 @@ update_mv_stats(Oid mvoid,
= PointerGetDatum(data);
}
+ if (ndistcoeff > 1.0)
+ {
+ nulls[Anum_pg_mv_statistic_standist -1] = false;
+ values[Anum_pg_mv_statistic_standist-1] = Float8GetDatum(ndistcoeff);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
replaces[Anum_pg_mv_statistic_stahist-1] = true;
+ replaces[Anum_pg_mv_statistic_standist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
nulls[Anum_pg_mv_statistic_hist_built-1] = false;
+ nulls[Anum_pg_mv_statistic_ndist_built-1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_ndist_built-1] = true;
replaces[Anum_pg_mv_statistic_hist_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
+ values[Anum_pg_mv_statistic_ndist_built-1] = BoolGetDatum(ndistcoeff > 1.0);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
diff --git a/src/backend/utils/mvstats/mvdist.c b/src/backend/utils/mvstats/mvdist.c
new file mode 100644
index 0000000..59b8358
--- /dev/null
+++ b/src/backend/utils/mvstats/mvdist.c
@@ -0,0 +1,171 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvdist.c
+ * POSTGRES multivariate distinct coefficients
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mvdist.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include <math.h>
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+static double estimate_ndistinct(double totalrows, int numrows, int d, int f1);
+
+/*
+ * Compute ndistinct coefficient for the combination of attributes. This
+ * computes the ndistinct estimate using the same estimator used in analyze.c
+ * and then computes the coefficient.
+ */
+double
+build_mv_ndistinct(double totalrows, int numrows, HeapTuple *rows,
+ int2vector *attrs, VacAttrStats **stats)
+{
+ int i, j;
+ int f1, cnt, d;
+	int			nmultiple = 0,
+				summultiple = 0;
+ int numattrs = attrs->dim1;
+ MultiSortSupport mss = multi_sort_init(numattrs);
+ double ndistcoeff;
+
+ /*
+	 * It's possible to sort the sample rows directly, but this seemed
+	 * somewhat simpler / less error prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * numattrs);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ Assert(numattrs >= 2);
+
+ for (i = 0; i < numattrs; i++)
+ {
+		/* prepare the sort function for this dimension */
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* accumulate all the data into the array and sort it */
+ for (j = 0; j < numrows; j++)
+ {
+ items[j].values[i]
+ = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+ }
+ }
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /* count number of distinct combinations */
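+	/* (d counts the distinct combinations, f1 those occurring exactly
+	 * once; both are inputs of the Duj1 estimator below) */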
+
+ f1 = 0;
+ cnt = 1;
+ d = 1;
+ for (i = 1; i < numrows; i++)
+ {
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ {
+ if (cnt == 1)
+ f1 += 1;
+ else
+ {
+ nmultiple += 1;
+ summultiple += cnt;
+ }
+
+ d++;
+ cnt = 0;
+ }
+
+ cnt += 1;
+ }
+
+ if (cnt == 1)
+ f1 += 1;
+ else
+ {
+ nmultiple += 1;
+ summultiple += cnt;
+ }
+
+ ndistcoeff = 1 / estimate_ndistinct(totalrows, numrows, d, f1);
+
+ /*
+ * now count distinct values for each attribute and incrementally
+ * compute ndistinct(a,b) / (ndistinct(a) * ndistinct(b))
+ *
+ * FIXME Probably need to handle cases when one of the ndistinct
+ * estimates is negative, and also check that the combined
+ * ndistinct is greater than any of those partial values.
+ */
+ for (i = 0; i < numattrs; i++)
+ ndistcoeff *= stats[i]->stadistinct;
+
+ return ndistcoeff;
+}
+
+double
+load_mv_ndistinct(Oid mvoid)
+{
+ bool isnull = false;
+ Datum deps;
+
+	/* Fetch the pg_mv_statistic tuple for this statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->ndist_enabled && mvstat->ndist_built);
+#endif
+
+ deps = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_standist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return DatumGetFloat8(deps);
+}
+
+/* The Duj1 estimator (already used in analyze.c). */
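+/*
+ * Computes n*d / (n - f1 + f1*n/N), where n is the sample size, N the
+ * total number of rows, d the number of distinct values observed in the
+ * sample and f1 the number of values observed exactly once.
+ */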
+static double
+estimate_ndistinct(double totalrows, int numrows, int d, int f1)
+{
+ double numer,
+ denom,
+ ndistinct;
+
+ numer = (double) numrows *(double) d;
+
+ denom = (double) (numrows - f1) +
+ (double) f1 * (double) numrows / totalrows;
+
+ ndistinct = numer / denom;
+
+ /* Clamp to sane range in case of roundoff error */
+ if (ndistinct < (double) d)
+ ndistinct = (double) d;
+
+ if (ndistinct > totalrows)
+ ndistinct = totalrows;
+
+ return floor(ndistinct + 0.5);
+}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index a5945af..ee353da 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -39,6 +39,7 @@ CATALOG(pg_mv_statistic,3381)
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
bool hist_enabled; /* build histogram? */
+ bool ndist_enabled; /* build ndist coefficient? */
/* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
@@ -48,6 +49,7 @@ CATALOG(pg_mv_statistic,3381)
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
bool hist_built; /* histogram was built */
+ bool ndist_built; /* ndistinct coeff built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -56,6 +58,7 @@ CATALOG(pg_mv_statistic,3381)
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
bytea stahist; /* MV histogram (serialized) */
+ float8 standcoeff; /* ndistinct coeff (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -71,21 +74,24 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 15
+#define Natts_pg_mv_statistic 18
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
#define Anum_pg_mv_statistic_deps_enabled 4
#define Anum_pg_mv_statistic_mcv_enabled 5
#define Anum_pg_mv_statistic_hist_enabled 6
-#define Anum_pg_mv_statistic_mcv_max_items 7
-#define Anum_pg_mv_statistic_hist_max_buckets 8
-#define Anum_pg_mv_statistic_deps_built 9
-#define Anum_pg_mv_statistic_mcv_built 10
-#define Anum_pg_mv_statistic_hist_built 11
-#define Anum_pg_mv_statistic_stakeys 12
-#define Anum_pg_mv_statistic_stadeps 13
-#define Anum_pg_mv_statistic_stamcv 14
-#define Anum_pg_mv_statistic_stahist 15
+#define Anum_pg_mv_statistic_ndist_enabled 7
+#define Anum_pg_mv_statistic_mcv_max_items 8
+#define Anum_pg_mv_statistic_hist_max_buckets 9
+#define Anum_pg_mv_statistic_deps_built 10
+#define Anum_pg_mv_statistic_mcv_built 11
+#define Anum_pg_mv_statistic_hist_built 12
+#define Anum_pg_mv_statistic_ndist_built 13
+#define Anum_pg_mv_statistic_stakeys 14
+#define Anum_pg_mv_statistic_stadeps 15
+#define Anum_pg_mv_statistic_stamcv 16
+#define Anum_pg_mv_statistic_stahist 17
+#define Anum_pg_mv_statistic_standist 18
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 8c50bfb..1923f2b 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -655,11 +655,13 @@ typedef struct MVStatisticInfo
bool deps_enabled; /* functional dependencies enabled */
bool mcv_enabled; /* MCV list enabled */
bool hist_enabled; /* histogram enabled */
+ bool ndist_enabled; /* ndistinct coefficient enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
bool mcv_built; /* MCV list built */
bool hist_built; /* histogram built */
+ bool ndist_built; /* ndistinct coefficient built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 35b2f8e..fb2c5d8 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -225,6 +225,7 @@ typedef MVSerializedHistogramData *MVSerializedHistogram;
MVDependencies load_mv_dependencies(Oid mvoid);
MCVList load_mv_mcvlist(Oid mvoid);
MVSerializedHistogram load_mv_histogram(Oid mvoid);
+double load_mv_ndistinct(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
@@ -266,11 +267,17 @@ MVHistogram
build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int numrows_total);
-void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+double
+build_mv_ndistinct(double totalrows, int numrows, HeapTuple *rows,
+ int2vector *attrs, VacAttrStats **stats);
+
+void build_mv_stats(Relation onerel, double totalrows,
+ int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
void update_mv_stats(Oid relid, MVDependencies dependencies,
MCVList mcvlist, MVHistogram histogram,
+ double ndistcoeff,
int2vector *attrs, VacAttrStats **stats);
#ifdef DEBUG_MVHIST
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 1a1a4ca..0ad935e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1377,7 +1377,8 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
length(s.stamcv) AS mcvbytes,
pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo,
length(s.stahist) AS histbytes,
- pg_mv_stats_histogram_info(s.stahist) AS histinfo
+ pg_mv_stats_histogram_info(s.stahist) AS histinfo,
+ s.standcoeff AS ndcoeff
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
--
2.1.0
Attachment: 0008-change-how-we-apply-selectivity-to-number-of-groups-.patch (text/x-patch)
From 19fae36e03b6e2b4cd2ea1702ffbe9676c0aca52 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Tue, 26 Jan 2016 18:14:33 +0100
Subject: [PATCH 8/9] change how we apply selectivity to number of groups
estimate
Instead of simply multiplying the ndistinct estimate by the selectivity,
we instead use the formula for the expected number of distinct values
observed in 'k' rows when there are 'd' distinct values in the bin
d * (1 - ((d - 1) / d)^k)
This is 'with replacement', which seems appropriate for this use, and it
mostly assumes uniform distribution of the distinct values. So if the
distribution is not uniform (e.g. there are very frequent groups) this
may be less accurate than the current algorithm in some cases, giving
over-estimates. But that's probably better than OOM.
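As a quick sanity check of the formula: with d = 100 distinct values and
k = 10 rows, it gives 100 * (1 - (99/100)^10) ~= 9.6 expected groups,
slightly below the natural upper bound min(k, d) = 10.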
---
src/backend/utils/adt/selfuncs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index a84dd2b..ce3ad19 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3465,7 +3465,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
/*
* Multiply by restriction selectivity.
*/
- reldistinct *= rel->rows / rel->tuples;
+	reldistinct = reldistinct * (1 - powl((reldistinct - 1) / reldistinct, rel->rows));
/*
* Update estimate of total distinct groups.
--
2.1.0
Attachment: 0009-fixup-of-regression-tests-plans-changes-by-group-by-.patch (text/x-patch)
From d37345b7e2a8868c5ca44507c3402affaaa0cb07 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Sun, 28 Feb 2016 21:16:40 +0100
Subject: [PATCH 9/9] fixup of regression tests (plans changes by group by
estimation)
---
src/test/regress/expected/join.out | 18 ++++++++++--------
src/test/regress/expected/subselect.out | 25 +++++++++++--------------
src/test/regress/expected/union.out | 16 ++++++++--------
3 files changed, 29 insertions(+), 30 deletions(-)
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index cafbc5e..151402d 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3965,18 +3965,20 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
explain (costs off)
select d.* from d left join (select distinct * from b) s
on d.a = s.id;
- QUERY PLAN
---------------------------------------
+ QUERY PLAN
+---------------------------------------------
Merge Right Join
- Merge Cond: (b.id = d.a)
- -> Unique
- -> Sort
- Sort Key: b.id, b.c_id
- -> Seq Scan on b
+ Merge Cond: (s.id = d.a)
+ -> Sort
+ Sort Key: s.id
+ -> Subquery Scan on s
+ -> HashAggregate
+ Group Key: b.id, b.c_id
+ -> Seq Scan on b
-> Sort
Sort Key: d.a
-> Seq Scan on d
-(9 rows)
+(11 rows)
-- check join removal works when uniqueness of the join condition is enforced
-- by a UNION
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index de64ca7..0fc93d9 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -807,27 +807,24 @@ select * from int4_tbl where
explain (verbose, costs off)
select * from int4_tbl o where (f1, f1) in
(select f1, generate_series(1,2) / 10 g from int4_tbl i group by f1);
- QUERY PLAN
-----------------------------------------------------------------------
- Hash Join
+ QUERY PLAN
+----------------------------------------------------------------
+ Hash Semi Join
Output: o.f1
Hash Cond: (o.f1 = "ANY_subquery".f1)
-> Seq Scan on public.int4_tbl o
Output: o.f1
-> Hash
Output: "ANY_subquery".f1, "ANY_subquery".g
- -> HashAggregate
+ -> Subquery Scan on "ANY_subquery"
Output: "ANY_subquery".f1, "ANY_subquery".g
- Group Key: "ANY_subquery".f1, "ANY_subquery".g
- -> Subquery Scan on "ANY_subquery"
- Output: "ANY_subquery".f1, "ANY_subquery".g
- Filter: ("ANY_subquery".f1 = "ANY_subquery".g)
- -> HashAggregate
- Output: i.f1, (generate_series(1, 2) / 10)
- Group Key: i.f1
- -> Seq Scan on public.int4_tbl i
- Output: i.f1
-(18 rows)
+ Filter: ("ANY_subquery".f1 = "ANY_subquery".g)
+ -> HashAggregate
+ Output: i.f1, (generate_series(1, 2) / 10)
+ Group Key: i.f1
+ -> Seq Scan on public.int4_tbl i
+ Output: i.f1
+(15 rows)
select * from int4_tbl o where (f1, f1) in
(select f1, generate_series(1,2) / 10 g from int4_tbl i group by f1);
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 016571b..f2e297e 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -263,16 +263,16 @@ ORDER BY 1;
SELECT q2 FROM int8_tbl INTERSECT SELECT q1 FROM int8_tbl;
q2
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
SELECT q2 FROM int8_tbl INTERSECT ALL SELECT q1 FROM int8_tbl;
q2
------------------
+ 123
4567890123456789
4567890123456789
- 123
(3 rows)
SELECT q2 FROM int8_tbl EXCEPT SELECT q1 FROM int8_tbl ORDER BY 1;
@@ -305,16 +305,16 @@ SELECT q1 FROM int8_tbl EXCEPT SELECT q2 FROM int8_tbl;
SELECT q1 FROM int8_tbl EXCEPT ALL SELECT q2 FROM int8_tbl;
q1
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
SELECT q1 FROM int8_tbl EXCEPT ALL SELECT DISTINCT q2 FROM int8_tbl;
q1
------------------
+ 123
4567890123456789
4567890123456789
- 123
(3 rows)
SELECT q1 FROM int8_tbl EXCEPT ALL SELECT q1 FROM int8_tbl FOR NO KEY UPDATE;
@@ -343,8 +343,8 @@ SELECT f1 FROM float8_tbl EXCEPT SELECT f1 FROM int4_tbl ORDER BY 1;
SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl;
q1
-------------------
- 4567890123456789
123
+ 4567890123456789
456
4567890123456789
123
@@ -355,15 +355,15 @@ SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FR
SELECT q1 FROM int8_tbl INTERSECT (((SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl)));
q1
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
(((SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl))) UNION ALL SELECT q2 FROM int8_tbl;
q1
-------------------
- 4567890123456789
123
+ 4567890123456789
456
4567890123456789
123
@@ -419,8 +419,8 @@ HINT: There is a column named "q2" in table "*SELECT* 2", but it cannot be refe
SELECT q1 FROM int8_tbl EXCEPT (((SELECT q2 FROM int8_tbl ORDER BY q2 LIMIT 1)));
q1
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
--
--
2.1.0
Hi,
I gave a very quick skim to patch 0002. Not a real review yet. But
there are a few trivial points to fix:
* You still have empty sections in the SGML docs (such as the EXAMPLES).
I suppose the syntax is now firm enough that we can get some. (I looked
at the other patches to see whether it was filled in, but couldn't find
any additional text there.)
* check_object_ownership() needs to be filled in
* Since you're adding a new object type, please add a case to cover it
in the object_address.sql pg_regress test.
* in analyze.c (and elsewhere), please put new #include lines sorted.
* I think the AT_PASS_ADD_STATS is a leftover which should be removed.
* The XXX comment in get_relation_info should probably be handled
differently (namely, in a way that makes the syscache not contain OIDs
of dropped stats)
* The README.dependencies has a lot of TODOs. Do we need to get them
done during the first cut? If not, I suggest creating a new section
"Future work" in the file.
* Please put the common.h header in src/include. Make sure not to
include "postgres.h" in it -- our policy is that postgres.h goes at the
top of every .c file and never in any .h file. Also please find a
better name for it; even mvstats_common.h would be a lot more
convincing. However:
* ISTM that the code in common.c properly belongs in
src/backend/catalog/pg_mvstats.c instead (or more properly
catalog/pg_mv_statistics.c), which probably means the common.h file
should be named something else; perhaps some of it could become
pg_mv_statistic_fn.h, while the rest continues to be
src/include/utils/mvstats_common.h? Not sure.
* The version check in psql/describe.c uses 90500; should probably be
updated to 90600.
* _copyCreateStatsStmt is missing if_not_exists
--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
thanks for the feedback. Attached is v14 of the patch series, fixing
most of the points you've raised.
On Wed, 2016-03-09 at 09:22 -0300, Alvaro Herrera wrote:
Hi,
I gave a very quick skim to patch 0002. Not a real review yet. But
there are a few trivial points to fix:
* You still have empty sections in the SGML docs (such as the EXAMPLES).
I suppose the syntax is now firm enough that we can get some. (I looked
at the other patches to see whether it was filled in, but couldn't find
any additional text there.)
Yes, that's one of the items I plan to work on next. Until now the
regression tests were a sufficient source of examples, but it's time to
do the SGML piece.
* check_object_ownership() needs to be filled in
Done.
I've added pg_statistics_ownercheck, which also required adding OID of
the owner to the catalog. Initially the plan was to use the same owner
as for the table, but now that we've switched to CREATE STATISTICS
partially because it will allow multi-table stats, that does not make
sense (multiple tables with different owners).
This probably means we also need an 'ALTER STATISTICS ... OWNER TO'
command, which does not exist at this point.
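Presumably something like this (not implemented yet, names made up):
    ALTER STATISTICS myschema.mystats OWNER TO new_owner;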
* Since you're adding a new object type, please add a case to cover it
in the object_address.sql pg_regress test.
Done.
Apparently there was a bunch of missing pieces in objectaddress.c, so
this adds them too.
* in analyze.c (and elsewhere), please put new #include lines sorted.
Done.
I've also significantly reduced the excessive list of includes in
statscmds.c. I expect the headers to require a bit more love, especially
in the subsequent patches (MCV, histograms etc.).
* I think the AT_PASS_ADD_STATS is a leftover which should be removed.
Yeah. Now that we've invented CREATE STATISTICS, all the changes to
tablecmds.c were just unnecessary leftovers. Removed.
* The XXX comment in get_relation_info should probably be handled
differently (namely, in a way that makes the syscache not contain OIDs
of dropped stats)
I believe that was actually an obsolete comment. Removed.
* The README.dependencies has a lot of TODOs. Do we need to get them
done during the first cut? If not, I suggest creating a new section
"Future work" in the file.
Right. Most of those TODOs are future work, or rather ideas (more or
less crazy). The one thing I definitely want to address now is support
for dependencies with multiple columns on the left side, because that
requires changes to serialized format. I might also look at handling IS
NULL clauses, but that may wait.
* Please put the common.h header in src/include. Make sure not to
include "postgres.h" in it -- our policy is that postgres.h goes at the
top of every .c file and never in any .h file. Also please find a
better name for it; even mvstats_common.h would be a lot more
convincing. However:
* ISTM that the code in common.c properly belongs in
src/backend/catalog/pg_mvstats.c instead (or more properly
catalog/pg_mv_statistics.c), which probably means the common.h file
should be named something else; perhaps some of it could become
pg_mv_statistic_fn.h, while the rest continues to be
src/include/utils/mvstats_common.h? Not sure.
Hmmm, not sure either. The idea was that the "common.h" is pretty much
just a private header with stuff that's not very useful anywhere else.
No changes here, for now.
* The version check in psql/describe.c uses 90500; should probably be
updated to 90600.
Fixed.
* _copyCreateStatsStmt is missing if_not_exists
Fixed.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
Attachment: 0001-teach-pull_-varno-varattno-_walker-about-RestrictInf.patch (text/x-patch)
From 5c28e5ca8feb2c2010d98bc69de952355bd6f3a5 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Tue, 28 Apr 2015 19:56:33 +0200
Subject: [PATCH 1/9] teach pull_(varno|varattno)_walker about RestrictInfo
otherwise pull_varnos fails when processing OR clauses
---
src/backend/optimizer/util/var.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index dff52c4..80d01bd 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -197,6 +197,13 @@ pull_varnos_walker(Node *node, pull_varnos_context *context)
context->sublevels_up--;
return result;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo*)node;
+ context->varnos = bms_add_members(context->varnos,
+ rinfo->clause_relids);
+ return false;
+ }
return expression_tree_walker(node, pull_varnos_walker,
(void *) context);
}
@@ -245,6 +252,15 @@ pull_varattnos_walker(Node *node, pull_varattnos_context *context)
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *)node;
+
+ return expression_tree_walker((Node*)rinfo->clause,
+ pull_varattnos_walker,
+ (void*) context);
+ }
+
/* Should not find an unplanned subquery */
Assert(!IsA(node, Query));
--
2.1.0
Attachment: 0002-shared-infrastructure-and-functional-dependencies.patch (text/x-patch)
From 1c42a02189088ba194e30f5878bb67bc61953a11 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 19:51:48 +0100
Subject: [PATCH 2/9] shared infrastructure and functional dependencies
Basic infrastructure shared by all kinds of multivariate stats, most
importantly:
- adds a new system catalog (pg_mv_statistic)
- CREATE STATISTICS name ON table (columns) WITH (options)
- DROP STATISTICS name
- implementation of functional dependencies (the simplest type of
multivariate statistics)
- building functional dependencies in ANALYZE
- updates regression tests (new catalog etc.)
This does not include any changes to the optimizer, i.e. it does not
influence the query planning (subject to follow-up patches).
The current implementation requires a valid 'ltopr' for the columns, so
that we can sort the sample rows in various ways, both in this patch
and other kinds of statistics. Maybe this restriction could be relaxed
in the future, requiring just 'eqopr' in case of stats not sorting the
data (e.g. functional dependencies and MCV lists).
Maybe some of the stats (functional dependencies and MCV list with
limited functionality) might be made to work with hashes of the values,
which is sufficient for equality comparisons. But the queries would
require the equality operator anyway, so it's not really a weaker
requirement. The hashes might reduce space requirements, though.
The algorithm detecting the dependencies is rather simple and probably
needs improvements, so that it detects more complicated dependencies,
and also validation of the math.
The name 'functional dependencies' is more correct (than 'association
rules') as it's exactly the name used in relational theory (esp. Normal
Forms) for tracking column-level dependencies.
The multivariate statistics are automatically removed in two situations
(a) after a DROP TABLE (obviously)
(b) after ALTER TABLE ... DROP COLUMN, if the statistics would be
defined on less than 2 columns (remaining)
If at least two columns remain, we keep the
statistics but perform cleanup on the next ANALYZE. The dropped columns
are removed from stakeys, and the new statistics is built on the
smaller set.
We can't do this at DROP COLUMN, because that'd leave us with invalid
statistics, or we'd have to throw it away although we can still use it.
This lazy approach lets us use the statistics although some of the
columns are dead.
This also adds a simple list of statistics to \d in psql.
This means the statistics are created within a schema by using a
qualified name (or using the default schema)
CREATE STATISTICS schema.statistics ON ...
and then dropped by specifying qualified name
DROP STATISTICS schema.statistics
or searching through search_path (just like with other objects).
This also gets rid of the "(opt_)stats_name" definitions in gram.y and
instead replaces them with just "opt_any_name", although the optional
case is not really handled currently - there's no generated name yet
(so either we should drop it or implement it).
I'm not entirely sure making statistics schema-specific is such a great
idea. Maybe it should be "global", but that does not seem right (e.g.
it makes multi-tenant systems based on schemas more difficult to
manage, because tenants would interact).
---
doc/src/sgml/ref/allfiles.sgml | 2 +
doc/src/sgml/ref/create_statistics.sgml | 174 ++++++++++
doc/src/sgml/ref/drop_statistics.sgml | 90 ++++++
doc/src/sgml/reference.sgml | 2 +
src/backend/catalog/Makefile | 1 +
src/backend/catalog/aclchk.c | 27 ++
src/backend/catalog/dependency.c | 11 +-
src/backend/catalog/heap.c | 102 ++++++
src/backend/catalog/namespace.c | 51 +++
src/backend/catalog/objectaddress.c | 54 ++++
src/backend/catalog/system_views.sql | 11 +
src/backend/commands/Makefile | 6 +-
src/backend/commands/analyze.c | 21 ++
src/backend/commands/dropcmds.c | 4 +
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/statscmds.c | 266 ++++++++++++++++
src/backend/nodes/copyfuncs.c | 17 +
src/backend/nodes/outfuncs.c | 18 ++
src/backend/optimizer/util/plancat.c | 59 ++++
src/backend/parser/gram.y | 34 +-
src/backend/tcop/utility.c | 11 +
src/backend/utils/Makefile | 2 +-
src/backend/utils/cache/relcache.c | 59 ++++
src/backend/utils/cache/syscache.c | 23 ++
src/backend/utils/mvstats/Makefile | 17 +
src/backend/utils/mvstats/README.dependencies | 222 +++++++++++++
src/backend/utils/mvstats/common.c | 356 +++++++++++++++++++++
src/backend/utils/mvstats/common.h | 75 +++++
src/backend/utils/mvstats/dependencies.c | 437 ++++++++++++++++++++++++++
src/bin/psql/describe.c | 44 +++
src/include/catalog/dependency.h | 5 +-
src/include/catalog/heap.h | 1 +
src/include/catalog/indexing.h | 7 +
src/include/catalog/namespace.h | 2 +
src/include/catalog/pg_mv_statistic.h | 75 +++++
src/include/catalog/pg_proc.h | 5 +
src/include/catalog/toasting.h | 1 +
src/include/commands/defrem.h | 4 +
src/include/nodes/nodes.h | 2 +
src/include/nodes/parsenodes.h | 12 +
src/include/nodes/relation.h | 28 ++
src/include/utils/acl.h | 1 +
src/include/utils/mvstats.h | 70 +++++
src/include/utils/rel.h | 4 +
src/include/utils/relcache.h | 1 +
src/include/utils/syscache.h | 2 +
src/test/regress/expected/object_address.out | 7 +-
src/test/regress/expected/rules.out | 9 +
src/test/regress/expected/sanity_check.out | 1 +
src/test/regress/sql/object_address.sql | 4 +-
50 files changed, 2429 insertions(+), 11 deletions(-)
create mode 100644 doc/src/sgml/ref/create_statistics.sgml
create mode 100644 doc/src/sgml/ref/drop_statistics.sgml
create mode 100644 src/backend/commands/statscmds.c
create mode 100644 src/backend/utils/mvstats/Makefile
create mode 100644 src/backend/utils/mvstats/README.dependencies
create mode 100644 src/backend/utils/mvstats/common.c
create mode 100644 src/backend/utils/mvstats/common.h
create mode 100644 src/backend/utils/mvstats/dependencies.c
create mode 100644 src/include/catalog/pg_mv_statistic.h
create mode 100644 src/include/utils/mvstats.h
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index bf95453..c0f7653 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -76,6 +76,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createSchema SYSTEM "create_schema.sgml">
<!ENTITY createSequence SYSTEM "create_sequence.sgml">
<!ENTITY createServer SYSTEM "create_server.sgml">
+<!ENTITY createStatistics SYSTEM "create_statistics.sgml">
<!ENTITY createTable SYSTEM "create_table.sgml">
<!ENTITY createTableAs SYSTEM "create_table_as.sgml">
<!ENTITY createTableSpace SYSTEM "create_tablespace.sgml">
@@ -119,6 +120,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropSchema SYSTEM "drop_schema.sgml">
<!ENTITY dropSequence SYSTEM "drop_sequence.sgml">
<!ENTITY dropServer SYSTEM "drop_server.sgml">
+<!ENTITY dropStatistics SYSTEM "drop_statistics.sgml">
<!ENTITY dropTable SYSTEM "drop_table.sgml">
<!ENTITY dropTableSpace SYSTEM "drop_tablespace.sgml">
<!ENTITY dropTransform SYSTEM "drop_transform.sgml">
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
new file mode 100644
index 0000000..a86eae3
--- /dev/null
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -0,0 +1,174 @@
+<!--
+doc/src/sgml/ref/create_statistics.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-CREATESTATISTICS">
+ <indexterm zone="sql-createstatistics">
+ <primary>CREATE STATISTICS</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>CREATE STATISTICS</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>CREATE STATISTICS</refname>
+ <refpurpose>define a new statistics</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_name</replaceable> ON <replaceable class="PARAMETER">table_name</replaceable> ( [
+ { <replaceable class="PARAMETER">column_name</replaceable> } ] [, ...])
+[ WITH ( <replaceable class="PARAMETER">statistics_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] ) ]
+</synopsis>
+
+ </refsynopsisdiv>
+
+ <refsect1 id="SQL-CREATESTATISTICS-description">
+ <title>Description</title>
+
+ <para>
+ <command>CREATE STATISTICS</command> will create a new multivariate
+   statistics on the table. The statistics will be created in the
+ current database. The statistics will be owned by the user issuing
+ the command.
+ </para>
+
+ <para>
+ If a schema name is given (for example, <literal>CREATE STATISTICS
+ myschema.mystat ...</>) then the statistics is created in the specified
+ schema. Otherwise it is created in the current schema. The name of
+ the table must be distinct from the name of any other statistics in the
+   the statistics must be distinct from the name of any other statistics in the
+ </para>
+
+ <para>
+   To be able to create statistics, you must have <literal>USAGE</literal>
+   privilege on all the column types.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>IF NOT EXISTS</></term>
+ <listitem>
+ <para>
+ Do not throw an error if a statistics with the same name already exists.
+ A notice is issued in this case. Note that there is no guarantee that
+ the existing statistics is anything like the one that would have been
+ created.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">statistics_name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the statistics to be created.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">table_name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the table the statistics should
+ be created on.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">column_name</replaceable></term>
+ <listitem>
+ <para>
+ The name of a column to be included in the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>WITH ( <replaceable class="PARAMETER">statistics_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] )</literal></term>
+ <listitem>
+ <para>
+ ...
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <refsect2 id="SQL-CREATESTATISTICS-parameters">
+ <title id="SQL-CREATESTATISTICS-parameters-title">Statistics Parameters</title>
+
+ <indexterm zone="sql-createstatistics-parameters">
+ <primary>statistics parameters</primary>
+ </indexterm>
+
+ <para>
+ The <literal>WITH</> clause can specify <firstterm>statistics parameters</>
+ for statistics. The currently available parameters are listed below.
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>dependencies</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables functional dependencies for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </refsect2>
+ </refsect1>
+
+ <refsect1 id="SQL-CREATESTATISTICS-notes">
+ <title>Notes</title>
+
+ <para>
+ ...
+ </para>
+
+ </refsect1>
+
+
+ <refsect1 id="SQL-CREATESTATISTICS-examples">
+ <title>Examples</title>
+
+ <para>
+ ...
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no <command>CREATE STATISTICS</command> command in the SQL standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-dropstatistics"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/ref/drop_statistics.sgml b/doc/src/sgml/ref/drop_statistics.sgml
new file mode 100644
index 0000000..4cc0b70
--- /dev/null
+++ b/doc/src/sgml/ref/drop_statistics.sgml
@@ -0,0 +1,90 @@
+<!--
+doc/src/sgml/ref/drop_statistics.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-DROPSTATISTICS">
+ <indexterm zone="sql-dropstatistics">
+ <primary>DROP STATISTICS</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>DROP STATISTICS</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>DROP STATISTICS</refname>
+ <refpurpose>remove multivariate statistics</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+DROP STATISTICS [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>DROP STATISTICS</command> removes statistics from the database.
+ Only the statistics owner, the schema owner, and a superuser can drop
+ statistics.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>IF EXISTS</literal></term>
+ <listitem>
+ <para>
+ Do not throw an error if the statistics does not exist. A notice is
+ issued in this case.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the statistics to drop.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </refsect1>
+
+ <refsect1>
+ <title>Examples</title>
+
+ <para>
+ ...
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no <command>DROP STATISTICS</command> command in the SQL standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createstatistics"></member>
+ </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 03020df..2b07b2d 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -104,6 +104,7 @@
&createSchema;
&createSequence;
&createServer;
+ &createStatistics;
&createTable;
&createTableAs;
&createTableSpace;
@@ -147,6 +148,7 @@
&dropSchema;
&dropSequence;
&dropServer;
+ &dropStatistics;
&dropTable;
&dropTableSpace;
&dropTSConfig;
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 25130ec..058b8a9 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -32,6 +32,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
+ pg_mv_statistic.h \
pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 0f3bc07..e21aacd 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -38,6 +38,7 @@
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
#include "catalog/pg_largeobject_metadata.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -5021,6 +5022,32 @@ pg_extension_ownercheck(Oid ext_oid, Oid roleid)
}
/*
+ * Ownership check for a multivariate statistics (specified by OID).
+ */
+bool
+pg_statistics_ownercheck(Oid stat_oid, Oid roleid)
+{
+ HeapTuple tuple;
+ Oid ownerId;
+
+ /* Superusers bypass all permission checking. */
+ if (superuser_arg(roleid))
+ return true;
+
+ tuple = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(stat_oid));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("statistics with OID %u does not exist", stat_oid)));
+
+ ownerId = ((Form_pg_mv_statistic) GETSTRUCT(tuple))->staowner;
+
+ ReleaseSysCache(tuple);
+
+ return has_privs_of_role(roleid, ownerId);
+}
+
+/*
* Check whether specified role has CREATEROLE privilege (or is a superuser)
*
* Note: roles do not have owners per se; instead we use this test in
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index c48e37b..8200454 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -40,6 +40,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -160,7 +161,8 @@ static const Oid object_classes[] = {
ExtensionRelationId, /* OCLASS_EXTENSION */
EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
PolicyRelationId, /* OCLASS_POLICY */
- TransformRelationId /* OCLASS_TRANSFORM */
+ TransformRelationId, /* OCLASS_TRANSFORM */
+ MvStatisticRelationId /* OCLASS_STATISTICS */
};
@@ -1272,6 +1274,10 @@ doDeletion(const ObjectAddress *object, int flags)
DropTransformById(object->objectId);
break;
+ case OCLASS_STATISTICS:
+ RemoveStatisticsById(object->objectId);
+ break;
+
default:
elog(ERROR, "unrecognized object class: %u",
object->classId);
@@ -2415,6 +2421,9 @@ getObjectClass(const ObjectAddress *object)
case TransformRelationId:
return OCLASS_TRANSFORM;
+
+ case MvStatisticRelationId:
+ return OCLASS_STATISTICS;
}
/* shouldn't get here */
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 6a4a9d9..e7d9aaa 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -47,6 +47,7 @@
#include "catalog/pg_constraint_fn.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_tablespace.h"
@@ -1613,7 +1614,10 @@ RemoveAttributeById(Oid relid, AttrNumber attnum)
heap_close(attr_rel, RowExclusiveLock);
if (attnum > 0)
+ {
RemoveStatistics(relid, attnum);
+ RemoveMVStatistics(relid, attnum);
+ }
relation_close(rel, NoLock);
}
@@ -1841,6 +1845,11 @@ heap_drop_with_catalog(Oid relid)
RemoveStatistics(relid, 0);
/*
+ * delete multi-variate statistics
+ */
+ RemoveMVStatistics(relid, 0);
+
+ /*
* delete attribute tuples
*/
DeleteAttributeTuples(relid);
@@ -2696,6 +2705,99 @@ RemoveStatistics(Oid relid, AttrNumber attnum)
/*
+ * RemoveMVStatistics --- remove entries in pg_mv_statistic for a rel
+ *
+ * If attnum is zero, remove all entries for rel; else remove only the one(s)
+ * for that column.
+ */
+void
+RemoveMVStatistics(Oid relid, AttrNumber attnum)
+{
+ Relation pgmvstatistic;
+ TupleDesc tupdesc = NULL;
+ SysScanDesc scan;
+ ScanKeyData key;
+ HeapTuple tuple;
+
+ /*
+ * When dropping a column, we also drop statistics that would be left
+ * with a single remaining (undropped) column. To check that, we need
+ * the tuple descriptor.
+ *
+ * We already have the relation locked (as we're running ALTER
+ * TABLE ... DROP COLUMN), so we'll just get the descriptor here.
+ */
+ if (attnum != 0)
+ {
+ Relation rel = relation_open(relid, NoLock);
+
+ /* multivariate stats are supported on tables and matviews */
+ if (rel->rd_rel->relkind == RELKIND_RELATION ||
+ rel->rd_rel->relkind == RELKIND_MATVIEW)
+ tupdesc = RelationGetDescr(rel);
+
+ relation_close(rel, NoLock);
+
+ /* nothing to do for relation kinds without multivariate stats */
+ if (tupdesc == NULL)
+ return;
+ }
+
+ pgmvstatistic = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ ScanKeyInit(&key,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ scan = systable_beginscan(pgmvstatistic,
+ MvStatisticRelidIndexId,
+ true, NULL, 1, &key);
+
+ /* we must loop even when attnum != 0, in case of inherited stats */
+ while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+ {
+ bool delete = true;
+
+ if (attnum != 0)
+ {
+ Datum adatum;
+ bool isnull;
+ int i;
+ int ncolumns = 0;
+ ArrayType *arr;
+ int16 *attnums;
+
+ /* get the columns */
+ adatum = SysCacheGetAttr(MVSTATOID, tuple,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+ attnums = (int16*)ARR_DATA_PTR(arr);
+
+ for (i = 0; i < ARR_DIMS(arr)[0]; i++)
+ {
+ /* count the column unless it has been / is being dropped */
+ if ((! tupdesc->attrs[attnums[i]-1]->attisdropped) &&
+ (attnums[i] != attnum))
+ ncolumns += 1;
+ }
+
+ /* delete the statistics if fewer than two columns remain */
+ delete = (ncolumns < 2);
+ }
+
+ if (delete)
+ simple_heap_delete(pgmvstatistic, &tuple->t_self);
+ }
+
+ systable_endscan(scan);
+
+ heap_close(pgmvstatistic, RowExclusiveLock);
+}
+
+
+/*
* RelationTruncateIndexes - truncate all indexes associated
* with the heap relation to zero tuples.
*
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 446b2ac..dfd5bef 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -4201,3 +4201,54 @@ pg_is_other_temp_schema(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(isOtherTempNamespace(oid));
}
+
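+/*
+ * get_statistics_oid - find multivariate statistics by possibly
+ * qualified name
+ *
+ * If not found, error out unless missing_ok is true, in which case
+ * InvalidOid is returned.
+ */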
+Oid
+get_statistics_oid(List *names, bool missing_ok)
+{
+ char *schemaname;
+ char *stats_name;
+ Oid namespaceId;
+ Oid stats_oid = InvalidOid;
+ ListCell *l;
+
+ /* deconstruct the name list */
+ DeconstructQualifiedName(names, &schemaname, &stats_name);
+
+ if (schemaname)
+ {
+ /* use exact schema given */
+ namespaceId = LookupExplicitNamespace(schemaname, missing_ok);
+ if (missing_ok && !OidIsValid(namespaceId))
+ stats_oid = InvalidOid;
+ else
+ stats_oid = GetSysCacheOid2(MVSTATNAMENSP,
+ PointerGetDatum(stats_name),
+ ObjectIdGetDatum(namespaceId));
+ }
+ else
+ {
+ /* search for it in search path */
+ recomputeNamespacePath();
+
+ foreach(l, activeSearchPath)
+ {
+ namespaceId = lfirst_oid(l);
+
+ if (namespaceId == myTempNamespace)
+ continue; /* do not look in temp namespace */
+ stats_oid = GetSysCacheOid2(MVSTATNAMENSP,
+ PointerGetDatum(stats_name),
+ ObjectIdGetDatum(namespaceId));
+ if (OidIsValid(stats_oid))
+ break;
+ }
+ }
+
+ if (!OidIsValid(stats_oid) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("statistics \"%s\" does not exist",
+ NameListToString(names))));
+
+ return stats_oid;
+}
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index d2aaa6d..85841e1 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -39,6 +39,7 @@
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
#include "catalog/pg_largeobject_metadata.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_opfamily.h"
@@ -438,9 +439,22 @@ static const ObjectPropertyType ObjectProperty[] =
Anum_pg_type_typacl,
ACL_KIND_TYPE,
true
+ },
+ {
+ MvStatisticRelationId,
+ MvStatisticOidIndexId,
+ MVSTATOID,
+ MVSTATNAMENSP,
+ Anum_pg_mv_statistic_staname,
+ Anum_pg_mv_statistic_stanamespace,
+ InvalidAttrNumber, /* XXX same owner as relation */
+ InvalidAttrNumber, /* no ACL (same as relation) */
+ -1, /* no ACL */
+ true
}
};
+
/*
* This struct maps the string object types as returned by
* getObjectTypeDescription into ObjType enum values. Note that some enum
@@ -640,6 +654,10 @@ static const struct object_type_map
/* OCLASS_TRANSFORM */
{
"transform", OBJECT_TRANSFORM
+ },
+ /* OBJECT_STATISTICS */
+ {
+ "statistics", OBJECT_STATISTICS
}
};
@@ -913,6 +931,11 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
address = get_object_address_defacl(objname, objargs,
missing_ok);
break;
+ case OBJECT_STATISTICS:
+ address.classId = MvStatisticRelationId;
+ address.objectId = get_statistics_oid(objname, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, in case it thinks elog might return */
@@ -2185,6 +2208,10 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser")));
break;
+ case OBJECT_STATISTICS:
+ if (!pg_statistics_ownercheck(address.objectId, roleid))
+ aclcheck_error_type(ACLCHECK_NOT_OWNER, address.objectId);
+ break;
default:
elog(ERROR, "unrecognized object type: %d",
(int) objtype);
@@ -3610,6 +3637,10 @@ getObjectTypeDescription(const ObjectAddress *object)
appendStringInfoString(&buffer, "transform");
break;
+ case OCLASS_STATISTICS:
+ appendStringInfoString(&buffer, "statistics");
+ break;
+
default:
appendStringInfo(&buffer, "unrecognized %u", object->classId);
break;
@@ -4566,6 +4597,29 @@ getObjectIdentityParts(const ObjectAddress *object,
}
break;
+ case OCLASS_STATISTICS:
+ {
+ HeapTuple tup;
+ Form_pg_mv_statistic formStatistic;
+ char *schema;
+
+ tup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for statistics %u",
+ object->objectId);
+ formStatistic = (Form_pg_mv_statistic) GETSTRUCT(tup);
+ schema = get_namespace_name_or_temp(formStatistic->stanamespace);
+ appendStringInfoString(&buffer,
+ quote_qualified_identifier(schema,
+ NameStr(formStatistic->staname)));
+ if (objname)
+ *objname = list_make2(schema,
+ pstrdup(NameStr(formStatistic->staname)));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index abf9a70..b8a264e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -158,6 +158,17 @@ CREATE VIEW pg_indexes AS
LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
+CREATE VIEW pg_mv_stats AS
+ SELECT
+ N.nspname AS schemaname,
+ C.relname AS tablename,
+ S.staname AS staname,
+ S.stakeys AS attnums,
+ length(S.stadeps) as depsbytes,
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
+ LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
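+
+-- For example, one might inspect the built statistics like this (a
+-- hypothetical query; it returns rows only once statistics have been
+-- created and ANALYZE has run):
+--
+--   SELECT staname, attnums, depsbytes FROM pg_mv_stats;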
+
CREATE VIEW pg_stats WITH (security_barrier) AS
SELECT
nspname AS schemaname,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index b1ac704..5151001 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -18,8 +18,8 @@ OBJS = aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
policy.o portalcmds.o prepare.o proclang.o \
- schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
- variable.o view.o
+ schemacmds.o seclabel.o sequence.o statscmds.o \
+ tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o \
+ user.o vacuum.o vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8a5f07c..9087532 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -17,6 +17,7 @@
#include <math.h>
#include "access/multixact.h"
+#include "access/sysattr.h"
#include "access/transam.h"
#include "access/tupconvert.h"
#include "access/tuptoaster.h"
@@ -27,6 +28,7 @@
#include "catalog/indexing.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "commands/dbcommands.h"
#include "commands/tablecmds.h"
@@ -45,10 +47,13 @@
#include "storage/procarray.h"
#include "utils/acl.h"
#include "utils/attoptcache.h"
+#include "utils/builtins.h"
#include "utils/datum.h"
+#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/mvstats.h"
#include "utils/pg_rusage.h"
#include "utils/sampling.h"
#include "utils/sortsupport.h"
@@ -460,6 +465,19 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
* all analyzable columns. We use a lower bound of 100 rows to avoid
* possible overflow in Vitter's algorithm. (Note: that will also be the
* target in the corner case where there are no analyzable columns.)
+ *
+ * FIXME This sample sizing is mostly OK when computing stats for
+ * individual columns, but rather insufficient when computing
+ * multivariate stats (histograms, MCV lists, ...). For stats on
+ * multiple columns / complex stats we need larger samples, because
+ * we need to build more detailed stats (more MCV items / histogram
+ * buckets) to get good accuracy. Maybe a sample proportional to the
+ * table size (say, 0.5% - 1%) would be more appropriate than a
+ * fixed size. Also, this should be bound to the requested statistics
+ * size - e.g. the number of MCV items or histogram buckets should
+ * require several sample rows per item/bucket (so the sample should
+ * be k*size).
*/
targrows = 100;
for (i = 0; i < attr_cnt; i++)
@@ -562,6 +580,9 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
update_attstats(RelationGetRelid(Irel[ind]), false,
thisdata->attr_cnt, thisdata->vacattrstats);
}
+
+ /* Build multivariate stats (if there are any). */
+ build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
}
/*
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index 522027a..cd65b58 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -292,6 +292,10 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
msg = gettext_noop("schema \"%s\" does not exist, skipping");
name = NameListToString(objname);
break;
+ case OBJECT_STATISTICS:
+ msg = gettext_noop("statistics \"%s\" does not exist, skipping");
+ name = NameListToString(objname);
+ break;
case OBJECT_TSPARSER:
if (!schema_does_not_exist_skipping(objname, &msg, &name))
{
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 9e32f8d..09061bb 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -110,6 +110,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"SCHEMA", true},
{"SEQUENCE", true},
{"SERVER", true},
+ {"STATISTICS", true},
{"TABLE", true},
{"TABLESPACE", false},
{"TRANSFORM", true},
@@ -1106,6 +1107,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_RULE:
case OBJECT_SCHEMA:
case OBJECT_SEQUENCE:
+ case OBJECT_STATISTICS:
case OBJECT_TABCONSTRAINT:
case OBJECT_TABLE:
case OBJECT_TRANSFORM:
@@ -1167,6 +1169,7 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
case OCLASS_POLICY:
+ case OCLASS_STATISTICS:
return true;
}
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
new file mode 100644
index 0000000..1b89bbe
--- /dev/null
+++ b/src/backend/commands/statscmds.c
@@ -0,0 +1,266 @@
+/*-------------------------------------------------------------------------
+ *
+ * statscmds.c
+ * Commands for creating and altering multivariate statistics
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/statscmds.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
+#include "catalog/namespace.h"
+#include "catalog/pg_mv_statistic.h"
+#include "catalog/pg_namespace.h"
+#include "commands/defrem.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
+#include "utils/inval.h"
+#include "utils/memutils.h"
+#include "utils/mvstats.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+
+
+/* used for sorting the attnums in CreateStatistics */
+static int
+compare_int16(const void *a, const void *b)
+{
+ /* memcmp compares bytes, which gives a wrong order on little-endian */
+ return (int) *(const int16 *) a - (int) *(const int16 *) b;
+}
+
+/*
+ * Implements CREATE STATISTICS name ON table (columns) WITH (options).
+ *
+ * TODO Check that the types support sort, although maybe we can live
+ * without it (and only build MCV list / association rules).
+ *
+ * TODO This should probably check for duplicate stats (i.e. same
+ * keys, same options). Although maybe it's useful to have
+ * multiple stats on the same columns with different options
+ * (say, a detailed MCV-only stats for some queries, histogram
+ * for others, etc.)
+ */
+ObjectAddress
+CreateStatistics(CreateStatsStmt *stmt)
+{
+ int i, j;
+ ListCell *l;
+ int16 attnums[INDEX_MAX_KEYS];
+ int numcols = 0;
+ ObjectAddress address = InvalidObjectAddress;
+ char *namestr;
+ NameData staname;
+ Oid statoid;
+ Oid namespaceId;
+
+ HeapTuple htup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ int2vector *stakeys;
+ Relation mvstatrel;
+ Relation rel;
+ ObjectAddress parentobject, childobject;
+
+ /* by default build nothing */
+ bool build_dependencies = false;
+
+ Assert(IsA(stmt, CreateStatsStmt));
+
+ /* resolve the pieces of the name (namespace etc.) */
+ namespaceId = QualifiedNameGetCreationNamespace(stmt->defnames, &namestr);
+ namestrcpy(&staname, namestr);
+
+ /*
+ * If if_not_exists was given and the statistics already exists, bail out.
+ */
+ if (stmt->if_not_exists &&
+ SearchSysCacheExists2(MVSTATNAMENSP,
+ PointerGetDatum(&staname),
+ ObjectIdGetDatum(namespaceId)))
+ {
+ ereport(NOTICE,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("statistics \"%s\" already exists, skipping",
+ namestr)));
+ return InvalidObjectAddress;
+ }
+
+ rel = heap_openrv(stmt->relation, AccessExclusiveLock);
+
+ /* transform the column names to attnum values */
+
+ foreach(l, stmt->keys)
+ {
+ char *attname = strVal(lfirst(l));
+ HeapTuple atttuple;
+
+ atttuple = SearchSysCacheAttName(RelationGetRelid(rel), attname);
+
+ if (!HeapTupleIsValid(atttuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" referenced in statistics does not exist",
+ attname)));
+
+ /* more than MVSTATS_MAX_DIMENSIONS columns not allowed */
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("cannot have more than %d keys in a statistics",
+ MVSTATS_MAX_DIMENSIONS)));
+
+ attnums[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->attnum;
+ ReleaseSysCache(atttuple);
+ numcols++;
+ }
+
+ /*
+ * Check the lower bound (at least 2 columns); the upper bound was
+ * already checked in the loop.
+ */
+ if (numcols < 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("multivariate stats require 2 or more columns")));
+
+ /* look for duplicate columns */
+ for (i = 0; i < numcols; i++)
+ for (j = 0; j < numcols; j++)
+ if ((i != j) && (attnums[i] == attnums[j]))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("duplicate column name in statistics definition")));
+
+ /* parse the statistics options */
+ foreach (l, stmt->options)
+ {
+ DefElem *opt = (DefElem*)lfirst(l);
+
+ if (strcmp(opt->defname, "dependencies") == 0)
+ build_dependencies = defGetBoolean(opt);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized STATISTICS option \"%s\"",
+ opt->defname)));
+ }
+
+ /* check that at least some statistics were requested */
+ if (! build_dependencies)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies) was requested")));
+
+ /* sort the attnums and build int2vector */
+ qsort(attnums, numcols, sizeof(int16), compare_int16);
+ stakeys = buildint2vector(attnums, numcols);
+
+ /*
+ * Okay, let's create the pg_mv_statistic entry.
+ */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ /* no stats collected yet, so just the keys */
+ values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
+ values[Anum_pg_mv_statistic_staname -1] = NameGetDatum(&staname);
+ values[Anum_pg_mv_statistic_stanamespace -1] = ObjectIdGetDatum(namespaceId);
+ values[Anum_pg_mv_statistic_staowner-1] = ObjectIdGetDatum(GetUserId());
+
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+
+ values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+
+ nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* insert the tuple into pg_mv_statistic */
+ mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ htup = heap_form_tuple(mvstatrel->rd_att, values, nulls);
+
+ simple_heap_insert(mvstatrel, htup);
+
+ CatalogUpdateIndexes(mvstatrel, htup);
+
+ statoid = HeapTupleGetOid(htup);
+
+ heap_freetuple(htup);
+
+
+ /*
+ * Store a dependency too, so that statistics are dropped on DROP TABLE
+ */
+ parentobject.classId = RelationRelationId;
+ parentobject.objectId = RelationGetRelid(rel);
+ parentobject.objectSubId = 0;
+ childobject.classId = MvStatisticRelationId;
+ childobject.objectId = statoid;
+ childobject.objectSubId = 0;
+
+ recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
+
+ /*
+ * Also record dependency on the schema (to drop statistics on DROP SCHEMA)
+ */
+ parentobject.classId = NamespaceRelationId;
+ parentobject.objectId = namespaceId;
+ parentobject.objectSubId = 0;
+ childobject.classId = MvStatisticRelationId;
+ childobject.objectId = statoid;
+ childobject.objectSubId = 0;
+
+ recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
+
+
+ heap_close(mvstatrel, RowExclusiveLock);
+
+ /*
+ * Invalidate relcache so that others see the new statistics. Do this
+ * before closing the relation.
+ */
+ CacheInvalidateRelcache(rel);
+
+ relation_close(rel, NoLock);
+
+ ObjectAddressSet(address, MvStatisticRelationId, statoid);
+
+ return address;
+}
+
+
+/*
+ * Implements DROP STATISTICS (and statistics removal during dependency
+ * cleanup) - deletes the pg_mv_statistic entry with the given OID.
+ */
+void
+RemoveStatisticsById(Oid statsOid)
+{
+ Relation relation;
+ HeapTuple tup;
+
+ /*
+ * Delete the pg_mv_statistic tuple.
+ */
+ relation = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ tup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(statsOid));
+ if (!HeapTupleIsValid(tup)) /* should not happen */
+ elog(ERROR, "cache lookup failed for statistics %u", statsOid);
+
+ simple_heap_delete(relation, &tup->t_self);
+
+ ReleaseSysCache(tup);
+
+ heap_close(relation, RowExclusiveLock);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index df7c2fa..3b7c87f 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4124,6 +4124,20 @@ _copyAlterPolicyStmt(const AlterPolicyStmt *from)
return newnode;
}
+static CreateStatsStmt *
+_copyCreateStatsStmt(const CreateStatsStmt *from)
+{
+ CreateStatsStmt *newnode = makeNode(CreateStatsStmt);
+
+ COPY_NODE_FIELD(defnames);
+ COPY_NODE_FIELD(relation);
+ COPY_NODE_FIELD(keys);
+ COPY_NODE_FIELD(options);
+ COPY_SCALAR_FIELD(if_not_exists);
+
+ return newnode;
+}
+
/* ****************************************************************
* pg_list.h copy functions
* ****************************************************************
@@ -4999,6 +5013,9 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_CreateStatsStmt:
+ retval = _copyCreateStatsStmt(from);
+ break;
case T_FuncWithArgs:
retval = _copyFuncWithArgs(from);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index eb0fc1e..07206d7 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2153,6 +2153,21 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
}
static void
+_outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
+{
+ WRITE_NODE_TYPE("MVSTATISTICINFO");
+
+ /* NB: this isn't a complete set of fields */
+ WRITE_OID_FIELD(mvoid);
+
+ /* enabled statistics */
+ WRITE_BOOL_FIELD(deps_enabled);
+
+ /* built/available statistics */
+ WRITE_BOOL_FIELD(deps_built);
+}
+
+static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
/*
@@ -3636,6 +3651,9 @@ _outNode(StringInfo str, const void *obj)
case T_PlannerParamItem:
_outPlannerParamItem(str, obj);
break;
+ case T_MVStatisticInfo:
+ _outMVStatisticInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index ad715bb..7fb2088 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -28,6 +28,7 @@
#include "catalog/dependency.h"
#include "catalog/heap.h"
#include "catalog/pg_am.h"
+#include "catalog/pg_mv_statistic.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -40,7 +41,9 @@
#include "parser/parsetree.h"
#include "rewrite/rewriteManip.h"
#include "storage/bufmgr.h"
+#include "utils/builtins.h"
#include "utils/lsyscache.h"
+#include "utils/syscache.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
@@ -94,6 +97,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
Relation relation;
bool hasindex;
List *indexinfos = NIL;
+ List *stainfos = NIL;
/*
* We need not lock the relation since it was already locked, either by
@@ -387,6 +391,61 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
rel->indexlist = indexinfos;
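+
+ /* Load any multivariate statistics defined on the relation. */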
+ if (true)
+ {
+ List *mvstatoidlist;
+ ListCell *l;
+
+ mvstatoidlist = RelationGetMVStatList(relation);
+
+ foreach(l, mvstatoidlist)
+ {
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ Oid mvoid = lfirst_oid(l);
+ Form_pg_mv_statistic mvstat;
+ MVStatisticInfo *info;
+
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ /* unavailable stats are not interesting for the planner */
+ if (mvstat->deps_built)
+ {
+ info = makeNode(MVStatisticInfo);
+
+ info->mvoid = mvoid;
+ info->rel = rel;
+
+ /* enabled statistics */
+ info->deps_enabled = mvstat->deps_enabled;
+
+ /* built/available statistics */
+ info->deps_built = mvstat->deps_built;
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ info->stakeys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+
+ stainfos = lcons(info, stainfos);
+ }
+
+ ReleaseSysCache(htup);
+ }
+
+ list_free(mvstatoidlist);
+ }
+
+ rel->mvstatlist = stainfos;
+
/* Grab foreign-table info using the relcache, while we have it */
if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
{
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b307b48..3be3f02 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -241,7 +241,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
- CreateSchemaStmt CreateSeqStmt CreateStmt CreateTableSpaceStmt
+ CreateSchemaStmt CreateSeqStmt CreateStmt CreateStatsStmt CreateTableSpaceStmt
CreateFdwStmt CreateForeignServerStmt CreateForeignTableStmt
CreateAssertStmt CreateTransformStmt CreateTrigStmt CreateEventTrigStmt
CreateUserStmt CreateUserMappingStmt CreateRoleStmt CreatePolicyStmt
@@ -809,6 +809,7 @@ stmt :
| CreateSchemaStmt
| CreateSeqStmt
| CreateStmt
+ | CreateStatsStmt
| CreateTableSpaceStmt
| CreateTransformStmt
| CreateTrigStmt
@@ -3436,6 +3437,36 @@ OptConsTableSpace: USING INDEX TABLESPACE name { $$ = $4; }
ExistingIndex: USING INDEX index_name { $$ = $3; }
;
+/*****************************************************************************
+ *
+ * QUERY :
+ * CREATE STATISTICS stats_name ON relname (columns) WITH (options)
+ *
+ *****************************************************************************/
+
+
+CreateStatsStmt: CREATE STATISTICS any_name ON qualified_name '(' columnList ')' opt_reloptions
+ {
+ CreateStatsStmt *n = makeNode(CreateStatsStmt);
+ n->defnames = $3;
+ n->relation = $5;
+ n->keys = $7;
+ n->options = $9;
+ n->if_not_exists = false;
+ $$ = (Node *)n;
+ }
+ | CREATE STATISTICS IF_P NOT EXISTS any_name ON qualified_name '(' columnList ')' opt_reloptions
+ {
+ CreateStatsStmt *n = makeNode(CreateStatsStmt);
+ n->defnames = $6;
+ n->relation = $8;
+ n->keys = $10;
+ n->options = $12;
+ n->if_not_exists = true;
+ $$ = (Node *)n;
+ }
+ ;
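+
+/*
+ * For illustration, statements matched by this production (the table and
+ * statistics names are made up):
+ *
+ *		CREATE STATISTICS s1 ON t1 (a, b) WITH (dependencies = true);
+ *		CREATE STATISTICS IF NOT EXISTS s1 ON t1 (a, b) WITH (dependencies = true);
+ */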
+
/*****************************************************************************
*
@@ -5621,6 +5652,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH DICTIONARY { $$ = OBJECT_TSDICTIONARY; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
+ | STATISTICS { $$ = OBJECT_STATISTICS; }
;
any_name_list:
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 045f7f0..2ba88e2 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1520,6 +1520,10 @@ ProcessUtilitySlow(Node *parsetree,
address = ExecSecLabelStmt((SecLabelStmt *) parsetree);
break;
+ case T_CreateStatsStmt: /* CREATE STATISTICS */
+ address = CreateStatistics((CreateStatsStmt *) parsetree);
+ break;
+
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(parsetree));
@@ -2160,6 +2164,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_TRANSFORM:
tag = "DROP TRANSFORM";
break;
+ case OBJECT_STATISTICS:
+ tag = "DROP STATISTICS";
+ break;
default:
tag = "???";
}
@@ -2527,6 +2534,10 @@ CreateCommandTag(Node *parsetree)
tag = "EXECUTE";
break;
+ case T_CreateStatsStmt:
+ tag = "CREATE STATISTICS";
+ break;
+
case T_DeallocateStmt:
{
DeallocateStmt *stmt = (DeallocateStmt *) parsetree;
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..eba0352 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,7 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr mvstats resowner sort time
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 130c06d..3bc4c8a 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -47,6 +47,7 @@
#include "catalog/pg_auth_members.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_database.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_proc.h"
@@ -3956,6 +3957,62 @@ RelationGetIndexList(Relation relation)
return result;
}
+
+List *
+RelationGetMVStatList(Relation relation)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result;
+ List *oldlist;
+ MemoryContext oldcxt;
+
+ /* Quick exit if we already computed the list. */
+ if (relation->rd_mvstatvalid != 0)
+ return list_copy(relation->rd_mvstatlist);
+
+ /*
+ * We build the list we intend to return (in the caller's context) while
+ * doing the scan. After successfully completing the scan, we copy that
+ * list into the relcache entry. This avoids cache-context memory leakage
+ * if we get some sort of error partway through.
+ */
+ result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(RelationGetRelid(relation)));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ /* TODO maybe include only already built statistics? */
+ result = insert_ordered_oid(result, HeapTupleGetOid(htup));
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* Now save a copy of the completed list in the relcache entry. */
+ oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+ oldlist = relation->rd_mvstatlist;
+ relation->rd_mvstatlist = list_copy(result);
+
+ relation->rd_mvstatvalid = true;
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Don't leak the old list, if there is one */
+ list_free(oldlist);
+
+ return result;
+}
+
/*
* insert_ordered_oid
* Insert a new Oid into a sorted list of Oids, preserving ordering
@@ -4920,6 +4977,8 @@ load_relcache_init_file(bool shared)
rel->rd_indexattr = NULL;
rel->rd_keyattr = NULL;
rel->rd_idattr = NULL;
+ rel->rd_mvstatvalid = false;
+ rel->rd_mvstatlist = NIL;
rel->rd_createSubid = InvalidSubTransactionId;
rel->rd_newRelfilenodeSubid = InvalidSubTransactionId;
rel->rd_amcache = NULL;
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 65ffe84..3c1bc4b 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -44,6 +44,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_language.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -502,6 +503,28 @@ static const struct cachedesc cacheinfo[] = {
},
4
},
+ {MvStatisticRelationId, /* MVSTATNAMENSP */
+ MvStatisticNameIndexId,
+ 2,
+ {
+ Anum_pg_mv_statistic_staname,
+ Anum_pg_mv_statistic_stanamespace,
+ 0,
+ 0
+ },
+ 4
+ },
+ {MvStatisticRelationId, /* MVSTATOID */
+ MvStatisticOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 4
+ },
{NamespaceRelationId, /* NAMESPACENAME */
NamespaceNameIndexId,
1,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
new file mode 100644
index 0000000..099f1ed
--- /dev/null
+++ b/src/backend/utils/mvstats/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/mvstats
+#
+# IDENTIFICATION
+# src/backend/utils/mvstats/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/mvstats
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = common.o dependencies.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.dependencies b/src/backend/utils/mvstats/README.dependencies
new file mode 100644
index 0000000..1f96fbc
--- /dev/null
+++ b/src/backend/utils/mvstats/README.dependencies
@@ -0,0 +1,222 @@
+Soft functional dependencies
+============================
+
+A type of multivariate statistics used to capture cases when one column (or
+possibly a combination of columns) determines values in another column. We may
+also say that one column implies the other one.
+
+A simple artificial example may be a table with two columns, created like this
+
+ CREATE TABLE t (a INT, b INT)
+ AS SELECT i, i/10 FROM generate_series(1,100000) s(i);
+
+Clearly, once we know the value for column 'a' the value for 'b' is trivially
+determined, as it's simply (a/10). A more practical example may be addresses,
+where (ZIP code -> city name), i.e. once we know the ZIP, we probably know the
+city it belongs to, as ZIP codes are usually assigned to one city. Larger cities
+may have multiple ZIP codes, so the dependency can't be reversed.
+
+Functional dependencies are a concept well described in relational theory,
+particularly in definition of normalization and "normal forms". Wikipedia has a
+nice definition of a functional dependency [1]:
+
+ In a given table, an attribute Y is said to have a functional dependency on
+ a set of attributes X (written X -> Y) if and only if each X value is
+ associated with precisely one Y value. For example, in an "Employee" table
+ that includes the attributes "Employee ID" and "Employee Date of Birth", the
+ functional dependency {Employee ID} -> {Employee Date of Birth} would hold.
+ It follows from the previous two sentences that each {Employee ID} is
+ associated with precisely one {Employee Date of Birth}.
+
+ [1] http://en.wikipedia.org/wiki/Database_normalization
+
+Many datasets might be normalized not to contain such dependencies, but often
+it's not practical for various reasons. In some cases it's actually a conscious
+design choice to model the dataset in a denormalized way, either because of
+performance or to make querying easier.
+
+The functional dependencies are called 'soft' because the implementation is
+meant to allow a small number of rows contradicting the dependency. Many actual
+data sets contain some sort of errors, either because of data entry mistakes
+(user mistyping the ZIP code) or issues in generating the data (e.g. a ZIP code
+mistakenly assigned to two cities in different states). A strict implementation
+would ignore dependencies on such noisy data, rendering the approach unusable on
+such data sets.
+
+
+Mining dependencies (ANALYZE)
+-----------------------------
+
+The current build algorithm is rather simple - for each pair (a,b) of columns,
+the data are sorted lexicographically (first by 'a', then by 'b'). Then for each
+group (rows with the same 'a' value) we decide whether the group is neutral,
+supporting or contradicting the dependency (a->b).
+
+A group is considered neutral when it's too small - e.g. when there's a single
+row in the group, there can't possibly be multiple values in 'b'. For this
+reason we ignore groups smaller than a threshold (currently 3 rows).
+
+For sufficiently large groups (3 rows or more), we count the number of distinct
+values in 'b'. When there's a single 'b' value, the group is considered to
+support the dependency (a->b), otherwise it's considered to contradict it.
+
+At the end, we compare the number of rows in supporting and contradicting groups,
+and if there are at least 10x as many supporting rows, we consider the
+functional dependency to be valid.
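+
+The same test can be sketched in plain SQL (only a sketch - the actual
+implementation works on the sampled rows, not the whole table, and uses
+the threshold constants described above):
+
+ SELECT sum(cnt) FILTER (WHERE ndist = 1) AS supporting,
+        sum(cnt) FILTER (WHERE ndist > 1) AS contradicting
+   FROM (SELECT a, count(*) AS cnt, count(DISTINCT b) AS ndist
+           FROM t GROUP BY a) g
+  WHERE cnt >= 3;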
+
+
+This approach has the negative property that the algorithm is a bit
+fragile with respect to the sample - there may be data sets producing quite
+different results for each ANALYZE execution (as even a single row may change
+the outcome of the final 10x test).
+
+It was proposed to make the dependencies "fuzzy" - e.g. track some coefficient
+between [0,1] determining how much the dependency holds. That would however mean
+we have to keep all the dependencies, as eliminating them based on the value of
+the coefficient (e.g. throw away dependencies <= 0.5) would result in exactly
+the same fragility issues. This would also make it more complicated to combine
+dependencies. So this does not seem like a practical approach.
+
+A better approach might be to replace the constants (min_group_size=3 and 10x)
+with values somehow related to the particular data set.
+
+
+Clause reduction (planner/optimizer)
+------------------------------------
+
+Applying the functional dependencies is quite simple - given a list of equality
+clauses, check which clauses are redundant (i.e. implied by some other clause).
+For example, given the clause list
+
+ (a = 2) AND (b = 2) AND (c = 3)
+
+and the dependency (a->b), the list of clauses may be simplified to
+
+ (a = 2) AND (c = 3)
+
+Functional dependencies may only be applied to equality clauses; all other
+types of clauses are ignored. See clauselist_apply_dependencies() for more
+details.
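+
+For example, using the syntax added by this patch (and the table 't' from the
+example above - the improved estimate assumes ANALYZE detects the (a->b)
+dependency):
+
+ CREATE STATISTICS s1 ON t (a, b) WITH (dependencies = true);
+ ANALYZE t;
+
+ -- (b = 2) is implied by (a = 20), so the planner may estimate the
+ -- query using the (a = 20) clause alone
+ SELECT * FROM t WHERE (a = 20) AND (b = 2);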
+
+
+Compatibility of clauses
+------------------------
+
+The reduction assumes the clauses really are redundant, i.e. that the value in
+the reduced clause (b=2) is the value determined by (a=2). If that's not the
+case and the values are "incompatible", the result will be an over-estimate.
+
+This may happen for example when using conditions on ZIP and city name with
+mismatching values (ZIP for a different city), etc. In such case the result
+set will be empty, but we'll estimate the selectivity using the ZIP condition.
+
+In this case the default estimation, based on the attribute value independence
+assumption (AVIA), happens to work better - but mostly by chance.
+
+
+Dependencies vs. MCV/histogram
+------------------------------
+
+In some cases the "compatibility" of the conditions might be verified using the
+other types of multivariate stats - MCV lists and histograms.
+
+For MCV lists the verification might be very simple - peek into the list to see
+whether there are any items matching the clause on the 'a' column (e.g. ZIP
+code), and if such an item is found, check that the 'b' column matches the
+other clause. If it does not, the clauses are contradictory. If no such item
+is found we can't really conclude anything, except maybe restricting the
+selectivity using the MCV data (e.g. using min/max selectivity, or something).
+
+With histograms, it might work similarly - we can't check the values directly
+(because histograms use buckets, unlike MCV lists, which store the actual values).
+So we can only observe the buckets matching the clauses - if those buckets have
+very low frequency, it probably means the two clauses are incompatible.
+
+It's unclear what 'low frequency' is, but if one of the clauses is implied
+(automatically true because of the other clause), then
+
+ selectivity[clause(A)] = selectivity[clause(A) & clause(B)]
+
+So we might compute selectivity of the first clause - for example using regular
+statistics. And then check if the selectivity computed from the histogram is
+about the same (or significantly lower).
+
+The problem is that histograms work well only when the data ordering matches the
+natural meaning. For values that serve as labels - like city names or ZIP codes,
+or even generated IDs - histograms really don't work all that well. For example
+sorting cities by name won't match the sorting of ZIP codes, rendering the
+histogram unusable.
+
+So MCVs are probably going to work much better, because they don't really assume
+any sort of ordering. And it's probably more appropriate for the label-like data.
+
+A good question however is why even use functional dependencies in such cases
+and not simply use the MCV/histogram instead. One reason is that the functional
+dependencies allow fallback to regular stats, and often produce more accurate
+estimates - especially compared to histograms, that are quite bad in estimating
+equality clauses.
+
+
+Limitations
+-----------
+
+Let's look at the main limitations of functional dependencies, especially those
+related to the current implementation.
+
+The current implementation supports only dependencies between two columns, but
+this is merely a simplification of the initial implementation. It's certainly
+useful to mine for dependencies involving multiple columns on the 'left' side,
+i.e. as the condition of the dependency - that is, dependencies like (a,b -> c).
+
+The implementation may/should be smart enough not to mine redundant conditions,
+e.g. (a->b) and (a,c -> b), because the latter is a trivial consequence of the
+former one (if values of 'a' determine 'b', adding another column won't change
+that relationship). The ANALYZE should first analyze 1:1 dependencies, then 2:1
+dependencies (and skip the already identified ones), etc.
+
+For example the dependency
+
+ (city name -> zip code)
+
+is much stronger, i.e. whenever it holds, then
+
+ (city name, state name -> zip code)
+
+holds too. But in case there are cities with the same name in different states,
+then only the latter dependency will be valid.
+
+Of course, there probably are cities with the same name within a single state,
+but hopefully this is a relatively rare occurrence (and thus we'll still detect
+the 'soft' dependency).
+
+Handling multiple columns on the right side of the dependency is not necessary,
+as those dependencies may be simply decomposed into a set of dependencies with
+the same meaning, one for each column on the right side. For example
+
+ (a -> b,c)
+
+is exactly the same as
+
+ (a -> b) & (a -> c)
+
+Of course, storing the first form may be more efficient than storing multiple
+'simple' dependencies separately.
+
+
+TODO Support dependencies with multiple columns on left/right.
+
+TODO Investigate using histogram and MCV list to verify the dependencies.
+
+TODO Investigate statistical testing of the distribution (to decide whether it
+ makes sense to build the histogram/MCV list).
+
+TODO Using a min/max of selectivities would probably make more sense for the
+ associated columns.
+
+TODO Consider eliminating the implied columns from the histogram and MCV lists
+ (but maybe that's not a good idea, because that'd make it impossible to use
+ these stats for non-equality clauses and also it wouldn't be possible to
+ use the stats for verification of the dependencies).
+
+TODO The reduction probably might be extended to also handle IS NULL clauses,
+ assuming we fix the ANALYZE to properly handle NULL values. We however
+ won't be able to reduce IS NOT NULL (unless I'm missing something).
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
new file mode 100644
index 0000000..a755c49
--- /dev/null
+++ b/src/backend/utils/mvstats/common.c
@@ -0,0 +1,356 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.c
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+static List* list_mv_stats(Oid relid);
+
+
+/*
+ * Compute requested multivariate stats, using the rows sampled for the
+ * plain (single-column) stats.
+ *
+ * This fetches a list of stats from pg_mv_statistic, computes the stats
+ * and serializes them back into the catalog (as bytea values).
+ */
+void
+build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats)
+{
+ ListCell *lc;
+ List *mvstats;
+
+ TupleDesc tupdesc = RelationGetDescr(onerel);
+
+ /*
+ * Fetch defined MV stats from pg_mv_statistic, and then compute
+ * the MV statistics (functional dependencies for now).
+ */
+ mvstats = list_mv_stats(RelationGetRelid(onerel));
+
+ foreach (lc, mvstats)
+ {
+ int j;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
+ MVDependencies deps = NULL;
+
+ VacAttrStats **stats = NULL;
+ int numatts = 0;
+
+ /* int2 vector of attnums the stats should be computed on */
+ int2vector * attrs = stat->stakeys;
+
+ /* see how many of the columns are not dropped */
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ numatts += 1;
+
+ /* if there are dropped attributes, build a filtered int2vector */
+ if (numatts != attrs->dim1)
+ {
+ int16 *tmp = palloc0(numatts * sizeof(int16));
+ int attnum = 0;
+
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ tmp[attnum++] = attrs->values[j];
+
+ pfree(attrs);
+ attrs = buildint2vector(tmp, numatts);
+ }
+
+ /* filter only the interesting vacattrstats records */
+ stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /* check allowed number of dimensions */
+ Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Analyze functional dependencies of columns.
+ */
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
+
+ /* store the computed dependencies in the catalog */
+ update_mv_stats(stat->mvoid, deps, attrs);
+ }
+}
+
+/*
+ * Lookup the VacAttrStats info for the selected columns, with indexes
+ * matching the attrs vector (to make it easy to work with when
+ * computing multivariate stats).
+ */
+static VacAttrStats **
+lookup_var_attr_stats(int2vector *attrs, int natts, VacAttrStats **vacattrstats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ VacAttrStats **stats = (VacAttrStats**)palloc0(numattrs * sizeof(VacAttrStats*));
+
+ /* lookup VacAttrStats info for the requested columns (same attnum) */
+ for (i = 0; i < numattrs; i++)
+ {
+ stats[i] = NULL;
+ for (j = 0; j < natts; j++)
+ {
+ if (attrs->values[i] == vacattrstats[j]->tupattnum)
+ {
+ stats[i] = vacattrstats[j];
+ break;
+ }
+ }
+
+ /*
+ * Check that we found the info, that the attnum matches, and
+ * that the requested 'lt' operator is available.
+ */
+ Assert(stats[i] != NULL);
+ Assert(stats[i]->tupattnum == attrs->values[i]);
+
+ /* FIXME This is a rather ugly way to check for 'ltopr' (which
+ * is defined for 'scalar' attributes).
+ */
+ Assert(((StdAnalyzeData *)stats[i]->extra_data)->ltopr != InvalidOid);
+ }
+
+ return stats;
+}
+
+/*
+ * Fetch list of MV stats defined on a table, without the actual data
+ * for histograms, MCV lists etc.
+ */
+static List*
+list_mv_stats(Oid relid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ Form_pg_mv_statistic stats = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ info->mvoid = HeapTupleGetOid(htup);
+ info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_built = stats->deps_built;
+
+ result = lappend(result, info);
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which was used as inspiration for this one). */
+
+ return result;
+}
+
+void
+update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+{
+ HeapTuple stup,
+ oldtup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ bool replaces[Natts_pg_mv_statistic];
+
+ Relation sd = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ memset(nulls, 1, Natts_pg_mv_statistic * sizeof(bool));
+ memset(replaces, 0, Natts_pg_mv_statistic * sizeof(bool));
+ memset(values, 0, Natts_pg_mv_statistic * sizeof(Datum));
+
+ /*
+ * Construct a new pg_mv_statistic tuple - replace only the dependencies,
+ * depending on whether they actually were computed.
+ */
+ if (dependencies != NULL)
+ {
+ nulls[Anum_pg_mv_statistic_stadeps -1] = false;
+ values[Anum_pg_mv_statistic_stadeps - 1]
+ = PointerGetDatum(serialize_mv_dependencies(dependencies));
+ }
+
+ /* always replace the value (either by bytea or NULL) */
+ replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* always change the availability flags */
+ nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_stakeys-1] = false;
+
+ /* use the new attnums, in case we removed some dropped ones */
+ replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_stakeys -1] = true;
+
+ values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
+
+ /* Is there already a pg_mv_statistic tuple for this attribute? */
+ oldtup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ if (HeapTupleIsValid(oldtup))
+ {
+ /* Yes, replace it */
+ stup = heap_modify_tuple(oldtup,
+ RelationGetDescr(sd),
+ values,
+ nulls,
+ replaces);
+ ReleaseSysCache(oldtup);
+ simple_heap_update(sd, &stup->t_self, stup);
+ }
+ else
+ elog(ERROR, "invalid pg_mv_statistic record (oid=%d)", mvoid);
+
+ /* update indexes too */
+ CatalogUpdateIndexes(sd, stup);
+
+ heap_freetuple(stup);
+
+ heap_close(sd, RowExclusiveLock);
+}
+
+/* multi-variate stats comparator */
+
+/*
+ * qsort_arg comparator for sorting Datums (MV stats)
+ *
+ * This does not maintain the tupnoLink array.
+ */
+int
+compare_scalars_simple(const void *a, const void *b, void *arg)
+{
+ Datum da = *(Datum*)a;
+ Datum db = *(Datum*)b;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting data when partitioning a MV bucket
+ */
+int
+compare_scalars_partition(const void *a, const void *b, void *arg)
+{
+ Datum da = ((ScalarItem*)a)->value;
+ Datum db = ((ScalarItem*)b)->value;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/* initialize multi-dimensional sort */
+MultiSortSupport
+multi_sort_init(int ndims)
+{
+ MultiSortSupport mss;
+
+ Assert(ndims >= 2);
+
+ mss = (MultiSortSupport)palloc0(offsetof(MultiSortSupportData, ssup)
+ + sizeof(SortSupportData)*ndims);
+
+ mss->ndims = ndims;
+
+ return mss;
+}
+
+/*
+ * add sort info for dimension 'dim' (index into vacattrstats) to mss,
+ * at position 'sortdim'
+ */
+void
+multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats)
+{
+ /* first, lookup StdAnalyzeData for the dimension (attribute) */
+ SortSupportData ssup;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)vacattrstats[dim]->extra_data;
+
+ Assert(mss != NULL);
+ Assert(sortdim < mss->ndims);
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup);
+
+ mss->ssup[sortdim] = ssup;
+}
+
+/* compare all the dimensions in the selected order */
+int
+multi_sort_compare(const void *a, const void *b, void *arg)
+{
+ int i;
+ SortItem *ia = (SortItem*)a;
+ SortItem *ib = (SortItem*)b;
+
+ MultiSortSupport mss = (MultiSortSupport)arg;
+
+ for (i = 0; i < mss->ndims; i++)
+ {
+ int compare;
+
+ compare = ApplySortComparator(ia->values[i], ia->isnull[i],
+ ib->values[i], ib->isnull[i],
+ &mss->ssup[i]);
+
+ if (compare != 0)
+ return compare;
+
+ }
+
+ /* equal by default */
+ return 0;
+}
+
+/* compare selected dimension */
+int
+multi_sort_compare_dim(int dim, const SortItem *a, const SortItem *b,
+ MultiSortSupport mss)
+{
+ return ApplySortComparator(a->values[dim], a->isnull[dim],
+ b->values[dim], b->isnull[dim],
+ &mss->ssup[dim]);
+}
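+
+/*
+ * Typical use of the multi-sort support (an illustrative sketch only;
+ * this mirrors what build_mv_dependencies does in dependencies.c, with
+ * 'items', 'numrows' and 'stats' prepared by the caller):
+ *
+ * MultiSortSupport mss = multi_sort_init(2);
+ *
+ * multi_sort_add_dimension(mss, 0, dima, stats);
+ * multi_sort_add_dimension(mss, 1, dimb, stats);
+ *
+ * qsort_arg((void *) items, numrows, sizeof(SortItem),
+ * multi_sort_compare, mss);
+ */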
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
new file mode 100644
index 0000000..d96422d
--- /dev/null
+++ b/src/backend/utils/mvstats/common.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.h
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/sysattr.h"
+#include "access/tuptoaster.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_mv_statistic.h"
+#include "foreign/fdwapi.h"
+#include "postmaster/autovacuum.h"
+#include "storage/lmgr.h"
+#include "utils/builtins.h"
+#include "utils/datum.h"
+#include "utils/fmgroids.h"
+#include "utils/mvstats.h"
+#include "utils/sortsupport.h"
+#include "utils/syscache.h"
+
+
+/* FIXME private structure copied from analyze.c */
+
+typedef struct
+{
+ Oid eqopr; /* '=' operator for datatype, if any */
+ Oid eqfunc; /* and associated function */
+ Oid ltopr; /* '<' operator for datatype, if any */
+} StdAnalyzeData;
+
+typedef struct
+{
+ Datum value; /* a data value */
+ int tupno; /* position index for tuple it came from */
+} ScalarItem;
+
+/* multi-sort */
+typedef struct MultiSortSupportData {
+ int ndims; /* number of dimensions supported by the sort */
+ SortSupportData ssup[1]; /* sort support data for each dimension */
+} MultiSortSupportData;
+
+typedef MultiSortSupportData* MultiSortSupport;
+
+typedef struct SortItem {
+ Datum *values;
+ bool *isnull;
+} SortItem;
+
+MultiSortSupport multi_sort_init(int ndims);
+
+void multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats);
+
+int multi_sort_compare(const void *a, const void *b, void *arg);
+
+int multi_sort_compare_dim(int dim, const SortItem *a,
+ const SortItem *b, MultiSortSupport mss);
+
+/* comparators, used when constructing multivariate stats */
+int compare_scalars_simple(const void *a, const void *b, void *arg);
+int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
new file mode 100644
index 0000000..2a064a0
--- /dev/null
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -0,0 +1,437 @@
+/*-------------------------------------------------------------------------
+ *
+ * dependencies.c
+ * POSTGRES multivariate functional dependencies
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/dependencies.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Detect functional dependencies between columns.
+ *
+ * TODO This builds a complete set of dependencies, i.e. including transitive
+ * dependencies - if we identify [A => B] and [B => C], we're likely to
+ * identify [A => C] too. It might be better to keep only the minimal set
+ * of dependencies, i.e. prune all the dependencies that we can recreate
+ * by transitivity.
+ *
+ * There are two conceptual ways to do that:
+ *
+ * (a) generate all the rules, and then prune the rules that may be
+ * recreated by combining other dependencies, or
+ *
+ * (b) perform the 'is combination of other dependencies' check before
+ * actually doing the work
+ *
+ * The second option has the advantage that we don't really need to perform
+ * the sort/count. It's not sufficient alone, though, because we may
+ * discover the dependencies in the wrong order. For example we may find
+ *
+ * (a -> b), (a -> c) and then (b -> c)
+ *
+ * None of those dependencies is a combination of the already known ones,
+ * yet (a -> c) is a combination of (a -> b) and (b -> c).
+ *
+ *
+ * FIXME Currently we simply replace NULL values with 0 and then handle it as
+ * a regular value, but that groups NULL and actual 0 values. That's
+ * clearly incorrect - we need to handle NULL values as a separate value.
+ */
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ /* result */
+ int ndeps = 0;
+ MVDependencies dependencies = NULL;
+ MultiSortSupport mss = multi_sort_init(2); /* 2 dimensions for now */
+
+ /* TODO Maybe this should be somehow related to the number of
+ * distinct values in the two columns we're currently analyzing.
+ * Assuming the distribution is uniform, we can estimate the
+ * average group size and use it as a threshold. Or something
+ * like that. Seems better than a static approach.
+ */
+ int min_group_size = 3;
+
+ /* dimension indexes we'll check for associations [a => b] */
+ int dima, dimb;
+
+ /*
+ * We'll reuse the same array for all the 2-column combinations.
+ *
+ * It's possible to sort the sample rows directly, but this seemed
+ * somewhat simpler / less error prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * 2);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * 2);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * 2];
+ items[i].isnull = &isnull[i * 2];
+ }
+
+ Assert(numattrs >= 2);
+
+ /*
+ * Evaluate all possible combinations of [A => B], using a simple algorithm:
+ *
+ * (a) sort the data by [A,B]
+ * (b) split the data into groups by A (new group whenever a value changes)
+ * (c) count different values in the B column (again, value changes)
+ *
+ * TODO It should be rather simple to merge [A => B] and [A => C] into
+ * [A => B,C]. Just keep A constant, collect all the "implied" columns
+ * and you're done.
+ */
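+ /*
+ * For illustration (hypothetical data): with rows sorted by [zip, city]
+ *
+ * (12345, Boston), (12345, Boston), (12345, Chicago), (67890, Dallas)
+ *
+ * the zip=12345 group contains two distinct city values, so it counts
+ * as contradicting [zip => city]; a group with a single city value
+ * (and at least min_group_size rows) would count as supporting it.
+ */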
+ for (dima = 0; dima < numattrs; dima++)
+ {
+ /* prepare the sort function for the first dimension */
+ multi_sort_add_dimension(mss, 0, dima, stats);
+
+ for (dimb = 0; dimb < numattrs; dimb++)
+ {
+ SortItem current;
+
+ /* number of groups supporting / contradicting the dependency */
+ int n_supporting = 0;
+ int n_contradicting = 0;
+
+ /* counters valid within a group */
+ int group_size = 0;
+ int n_violations = 0;
+
+ int n_supporting_rows = 0;
+ int n_contradicting_rows = 0;
+
+ /* make sure the columns are different (i.e. skip the A => A case) */
+ if (dima == dimb)
+ continue;
+
+ /* prepare the sort function for the second dimension */
+ multi_sort_add_dimension(mss, 1, dimb, stats);
+
+ /* reset the values and isnull flags */
+ memset(values, 0, sizeof(Datum) * numrows * 2);
+ memset(isnull, 0, sizeof(bool) * numrows * 2);
+
+ /* accumulate all the data for both columns into an array and sort it */
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values[0]
+ = heap_getattr(rows[i], attrs->values[dima],
+ stats[dima]->tupDesc, &items[i].isnull[0]);
+
+ items[i].values[1]
+ = heap_getattr(rows[i], attrs->values[dimb],
+ stats[dimb]->tupDesc, &items[i].isnull[1]);
+ }
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Walk through the array, split it into groups according to
+ * the A value, and count distinct values in the other one.
+ * If there's a single B value for the whole group, we count
+ * it as supporting the association, otherwise we count it
+ * as contradicting.
+ *
+ * Furthermore we require a group to have at least a certain
+ * number of rows to be considered useful for supporting the
+ * dependency. A contradicting group, however, is always counted.
+ */
+
+ /* start with values from the first row */
+ current = items[0];
+ group_size = 1;
+
+ for (i = 1; i < numrows; i++)
+ {
+ /* end of the group */
+ if (multi_sort_compare_dim(0, &items[i], &current, mss) != 0)
+ {
+ /*
+ * If there are no contradicting rows, count it as
+ * supporting (otherwise contradicting), but only if
+ * the group is large enough.
+ *
+ * The requirement of a minimum group size makes it
+ * impossible to identify [unique,unique] cases, but
+ * that's probably a different case. This is more
+ * about [zip => city] associations etc.
+ *
+ * If there are violations, count the group/rows as
+ * a violation.
+ *
+ * It may be neither, if the group is too small (does
+ * not contain at least min_group_size rows).
+ */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations > 0)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /* current values start a new group */
+ n_violations = 0;
+ group_size = 0;
+ }
+ /* mismatch of a B value is contradicting */
+ else if (multi_sort_compare_dim(1, &items[i], &current, mss) != 0)
+ {
+ n_violations += 1;
+ }
+
+ current = items[i];
+ group_size += 1;
+ }
+
+ /* handle the last group (just like above) */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /*
+ * See if the number of rows supporting the association is at least
+ * 10x the number of rows violating the hypothetical dependency.
+ *
+ * TODO This is a rather arbitrary limit - I guess it's possible to do
+ * some math to come up with a better rule (e.g. testing a hypothesis
+ * 'this is due to randomness'). We can create a contingency table
+ * from the values and use it for testing. Possibly only when
+ * there are no contradicting rows?
+ *
+ * TODO Also, if (a => b) and (b => a) at the same time, it pretty much
+ * means there's a 1:1 relation (or one is a 'label'), making the
+ * conditions rather redundant. Although it's possible that the
+ * query uses an incompatible combination of values.
+ */
+ if (n_supporting_rows > (n_contradicting_rows * 10))
+ {
+ if (dependencies == NULL)
+ {
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+ dependencies->magic = MVSTAT_DEPS_MAGIC;
+ }
+ else
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData, deps)
+ + sizeof(MVDependency) * (dependencies->ndeps + 1));
+
+ /* add the new dependency to the list */
+ dependencies->deps[ndeps] = (MVDependency)palloc0(sizeof(MVDependencyData));
+ dependencies->deps[ndeps]->a = attrs->values[dima];
+ dependencies->deps[ndeps]->b = attrs->values[dimb];
+
+ dependencies->ndeps = (++ndeps);
+ }
+ }
+ }
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+ pfree(stats);
+ pfree(mss);
+
+ return dependencies;
+}
+
+/*
+ * Store the dependencies into a bytea, so that it can be stored in the
+ * pg_mv_statistic catalog.
+ *
+ * Currently this only supports simple two-column rules, and stores them
+ * as a sequence of attnum pairs. In the future, this needs to be made
+ * more complex to support multiple columns on both sides of the
+ * implication (using AND on left, OR on right).
+ */
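+/*
+ * The resulting layout is (sizes in bytes):
+ *
+ * varlena header | magic (4) | ndeps (4) | a,b (2+2) | a,b (2+2) | ...
+ *
+ * i.e. the MVDependenciesData header up to offsetof(..., deps), followed
+ * by ndeps pairs of attnums.
+ */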
+bytea *
+serialize_mv_dependencies(MVDependencies dependencies)
+{
+ int i;
+
+ /* we need to store ndeps, and each needs 2 * int16 */
+ Size len = VARHDRSZ + offsetof(MVDependenciesData, deps)
+ + dependencies->ndeps * (sizeof(int16) * 2);
+
+ bytea * output = (bytea*)palloc0(len);
+
+ char * tmp = VARDATA(output);
+
+ SET_VARSIZE(output, len);
+
+ /* first, store the number of dimensions / items */
+ memcpy(tmp, dependencies, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ /* walk through the dependencies and copy both columns into the bytea */
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ memcpy(tmp, &(dependencies->deps[i]->a), sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(tmp, &(dependencies->deps[i]->b), sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return output;
+}
+
+/*
+ * Reads serialized dependencies into MVDependencies structure.
+ */
+MVDependencies
+deserialize_mv_dependencies(bytea * data)
+{
+ int i;
+ Size expected_size;
+ MVDependencies dependencies;
+ char *tmp;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVDependenciesData,deps))
+ elog(ERROR, "invalid MVDependencies size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVDependenciesData,deps));
+
+ /* read the MVDependencies header */
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(dependencies, tmp, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ if (dependencies->magic != MVSTAT_DEPS_MAGIC)
+ {
+ pfree(dependencies);
+ elog(WARNING, "not a MV Dependencies (magic number mismatch)");
+ return NULL;
+ }
+
+ Assert(dependencies->ndeps > 0);
+
+ /* what bytea size do we expect for those parameters */
+ expected_size = offsetof(MVDependenciesData,deps) +
+ dependencies->ndeps * sizeof(int16) * 2;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid dependencies size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* allocate space for the dependency pointers */
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData,deps)
+ + (dependencies->ndeps * sizeof(MVDependency)));
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ dependencies->deps[i] = (MVDependency)palloc0(sizeof(MVDependencyData));
+
+ memcpy(&(dependencies->deps[i]->a), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(&(dependencies->deps[i]->b), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return dependencies;
+}
+
+/* print some basic info about dependencies (number of dependencies) */
+Datum
+pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ result = palloc0(128);
+ snprintf(result, 128, "dependencies=%d", dependencies->ndeps);
+
+ /* FIXME free the deserialized data (pfree is not enough) */
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* print the dependencies
+ *
+ * TODO Would be nice if this knew the actual column names (instead of
+ * the attnums).
+ *
+ * FIXME This is really ugly and does not really check the lengths and
+ * strcpy/snprintf return values properly. Needs to be fixed.
+ */
+Datum
+pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
+{
+ int i = 0;
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result = NULL;
+ int len = 0;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ MVDependency dependency = dependencies->deps[i];
+ char buffer[128];
+
+ int tmp = snprintf(buffer, 128, "%s%d => %d",
+ ((i == 0) ? "" : ", "), dependency->a, dependency->b);
+
+ if (tmp < 127)
+ {
+ if (result == NULL)
+ result = palloc0(len + tmp + 1);
+ else
+ result = repalloc(result, len + tmp + 1);
+
+ strcpy(result + len, buffer);
+ len += tmp;
+ }
+ }
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index fd8dc91..8ce9c0e 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2104,6 +2104,50 @@ describeOneTableDetails(const char *schemaname,
PQclear(result);
}
+ /* print any multivariate statistics */
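+ /*
+ * This produces a footer like (illustrative, names made up):
+ *
+ * Statistics:
+ * "public.test_stat" (dependencies) ON (a, b, c)
+ */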
+ if (pset.sversion >= 90600)
+ {
+ printfPQExpBuffer(&buf,
+ "SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
+ " deps_enabled,\n"
+ " deps_built,\n"
+ " (SELECT string_agg(attname::text,', ')\n"
+ " FROM ((SELECT unnest(stakeys) AS attnum) s\n"
+ " JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
+ "FROM pg_mv_statistic stat WHERE starelid = '%s' ORDER BY 1;",
+ oid);
+
+ result = PSQLexec(buf.data);
+ if (!result)
+ goto error_return;
+ else
+ tuples = PQntuples(result);
+
+ if (tuples > 0)
+ {
+ printTableAddFooter(&cont, _("Statistics:"));
+ for (i = 0; i < tuples; i++)
+ {
+ printfPQExpBuffer(&buf, " ");
+
+ /* statistics name (qualified with namespace) */
+ appendPQExpBuffer(&buf, "\"%s.%s\" ",
+ PQgetvalue(result, i, 1),
+ PQgetvalue(result, i, 2));
+
+ /* options */
+ if (!strcmp(PQgetvalue(result, i, 4), "t"))
+ appendPQExpBuffer(&buf, "(dependencies)");
+
+ appendPQExpBuffer(&buf, " ON (%s)",
+ PQgetvalue(result, i, 6));
+
+ printTableAddFooter(&cont, buf.data);
+ }
+ }
+ PQclear(result);
+ }
+
/* print rules */
if (tableinfo.hasrules && tableinfo.relkind != 'm')
{
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 049bf9f..12211fe 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -153,10 +153,11 @@ typedef enum ObjectClass
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
OCLASS_POLICY, /* pg_policy */
- OCLASS_TRANSFORM /* pg_transform */
+ OCLASS_TRANSFORM, /* pg_transform */
+ OCLASS_STATISTICS /* pg_mv_statistics */
} ObjectClass;
-#define LAST_OCLASS OCLASS_TRANSFORM
+#define LAST_OCLASS OCLASS_STATISTICS
/* in dependency.c */
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index b80d8d8..5ae42f7 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -119,6 +119,7 @@ extern void RemoveAttrDefault(Oid relid, AttrNumber attnum,
DropBehavior behavior, bool complain, bool internal);
extern void RemoveAttrDefaultById(Oid attrdefId);
extern void RemoveStatistics(Oid relid, AttrNumber attnum);
+extern void RemoveMVStatistics(Oid relid, AttrNumber attnum);
extern Form_pg_attribute SystemAttributeDefinition(AttrNumber attno,
bool relhasoids);
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index ab2c1a8..a768bb5 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -173,6 +173,13 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
#define LargeObjectMetadataOidIndexId 2996
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_oid_index, 3380, on pg_mv_statistic using btree(oid oid_ops));
+#define MvStatisticOidIndexId 3380
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_name_index, 3997, on pg_mv_statistic using btree(staname name_ops, stanamespace oid_ops));
+#define MvStatisticNameIndexId 3997
+DECLARE_INDEX(pg_mv_statistic_relid_index, 3379, on pg_mv_statistic using btree(starelid oid_ops));
+#define MvStatisticRelidIndexId 3379
+
DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
#define NamespaceNameIndexId 2684
DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index 2ccb3a7..44cf9c6 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -137,6 +137,8 @@ extern Oid get_collation_oid(List *collname, bool missing_ok);
extern Oid get_conversion_oid(List *conname, bool missing_ok);
extern Oid FindDefaultConversionProc(int32 for_encoding, int32 to_encoding);
+extern Oid get_statistics_oid(List *names, bool missing_ok);
+
/* initialization & transaction cleanup code */
extern void InitializeSearchPath(void);
extern void AtEOXact_Namespace(bool isCommit, bool parallel);
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
new file mode 100644
index 0000000..c74af47
--- /dev/null
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_mv_statistic.h
+ * definition of the system "multivariate statistic" relation (pg_mv_statistic)
+ * along with the relation's initial contents.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_mv_statistic.h
+ *
+ * NOTES
+ * the genbki.pl script reads this file and generates .bki
+ * information from the DATA() statements.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_MV_STATISTIC_H
+#define PG_MV_STATISTIC_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_mv_statistic definition. cpp turns this into
+ * typedef struct FormData_pg_mv_statistic
+ * ----------------
+ */
+#define MvStatisticRelationId 3381
+
+CATALOG(pg_mv_statistic,3381)
+{
+ /* These fields form the unique key for the entry: */
+ Oid starelid; /* relation containing attributes */
+ NameData staname; /* statistics name */
+ Oid stanamespace; /* OID of namespace containing this statistics */
+ Oid staowner; /* statistics owner */
+
+ /* statistics requested to build */
+ bool deps_enabled; /* analyze dependencies? */
+
+ /* statistics that are available (if requested) */
+ bool deps_built; /* dependencies were built */
+
+ /* variable-length fields start here, but we allow direct access to stakeys */
+ int2vector stakeys; /* array of column keys */
+
+#ifdef CATALOG_VARLEN
+ bytea stadeps; /* dependencies (serialized) */
+#endif
+
+} FormData_pg_mv_statistic;
+
+/* ----------------
+ * Form_pg_mv_statistic corresponds to a pointer to a tuple with
+ * the format of pg_mv_statistic relation.
+ * ----------------
+ */
+typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
+
+/* ----------------
+ * compiler constants for pg_mv_statistic
+ * ----------------
+ */
+#define Natts_pg_mv_statistic 8
+#define Anum_pg_mv_statistic_starelid 1
+#define Anum_pg_mv_statistic_staname 2
+#define Anum_pg_mv_statistic_stanamespace 3
+#define Anum_pg_mv_statistic_staowner 4
+#define Anum_pg_mv_statistic_deps_enabled 5
+#define Anum_pg_mv_statistic_deps_built 6
+#define Anum_pg_mv_statistic_stakeys 7
+#define Anum_pg_mv_statistic_stadeps 8
+
+#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index cbbb883..eecce40 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2666,6 +2666,11 @@ DESCR("current user privilege on any column by rel name");
DATA(insert OID = 3029 ( has_any_column_privilege PGNSP PGUID 12 10 0 0 0 f f f f t f s s 2 0 16 "26 25" _null_ _null_ _null_ _null_ _null_ has_any_column_privilege_id _null_ _null_ _null_ ));
DESCR("current user privilege on any column by rel oid");
+DATA(insert OID = 3998 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies info");
+DATA(insert OID = 3999 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies show");
+
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
DATA(insert OID = 1929 ( pg_stat_get_tuples_returned PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_tuples_returned _null_ _null_ _null_ ));
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index b7a38ce..a52096b 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -49,6 +49,7 @@ extern void BootstrapToastTable(char *relName,
DECLARE_TOAST(pg_attrdef, 2830, 2831);
DECLARE_TOAST(pg_constraint, 2832, 2833);
DECLARE_TOAST(pg_description, 2834, 2835);
+DECLARE_TOAST(pg_mv_statistic, 3577, 3578);
DECLARE_TOAST(pg_proc, 2836, 2837);
DECLARE_TOAST(pg_rewrite, 2838, 2839);
DECLARE_TOAST(pg_seclabel, 3598, 3599);
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 54f67e9..99a6a62 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -75,6 +75,10 @@ extern ObjectAddress DefineOperator(List *names, List *parameters);
extern void RemoveOperatorById(Oid operOid);
extern ObjectAddress AlterOperator(AlterOperatorStmt *stmt);
+/* commands/statscmds.c */
+extern ObjectAddress CreateStatistics(CreateStatsStmt *stmt);
+extern void RemoveStatisticsById(Oid statsOid);
+
/* commands/aggregatecmds.c */
extern ObjectAddress DefineAggregate(List *name, List *args, bool oldstyle,
List *parameters, const char *queryString);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fad9988..545b62a 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -266,6 +266,7 @@ typedef enum NodeTag
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
+ T_MVStatisticInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
@@ -401,6 +402,7 @@ typedef enum NodeTag
T_CreatePolicyStmt,
T_AlterPolicyStmt,
T_CreateTransformStmt,
+ T_CreateStatsStmt,
/*
* TAGS FOR PARSE TREE NODES (parsenodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2fd0629..e1807fb 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -601,6 +601,17 @@ typedef struct ColumnDef
int location; /* parse location, or -1 if none/unknown */
} ColumnDef;
+typedef struct CreateStatsStmt
+{
+ NodeTag type;
+ List *defnames; /* qualified name (list of Value strings) */
+ RangeVar *relation; /* relation to build statistics on */
+ List *keys; /* String nodes naming referenced column(s) */
+ List *options; /* list of DefElem nodes */
+ bool if_not_exists; /* just do nothing if it already exists? */
+} CreateStatsStmt;
+
+
/*
* TableLikeClause - CREATE TABLE ( ... LIKE ... ) clause
*/
@@ -1410,6 +1421,7 @@ typedef enum ObjectType
OBJECT_RULE,
OBJECT_SCHEMA,
OBJECT_SEQUENCE,
+ OBJECT_STATISTICS,
OBJECT_TABCONSTRAINT,
OBJECT_TABLE,
OBJECT_TABLESPACE,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 641728b..e10dcf1 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -539,6 +539,7 @@ typedef struct RelOptInfo
List *lateral_vars; /* LATERAL Vars and PHVs referenced by rel */
Relids lateral_referencers; /* rels that reference me laterally */
List *indexlist; /* list of IndexOptInfo */
+ List *mvstatlist; /* list of MVStatisticInfo */
BlockNumber pages; /* size estimates derived from pg_class */
double tuples;
double allvisfrac;
@@ -634,6 +635,33 @@ typedef struct IndexOptInfo
void (*amcostestimate) (); /* AM's cost estimator */
} IndexOptInfo;
+/*
+ * MVStatisticInfo
+ * Information about multivariate stats for planning/optimization
+ *
+ * This contains information about which columns are covered by the
+ * statistics (stakeys), which options were requested while adding the
+ * statistics (*_enabled), and which kinds of statistics were actually
+ * built and are available for the optimizer (*_built).
+ */
+typedef struct MVStatisticInfo
+{
+ NodeTag type;
+
+ Oid mvoid; /* OID of the statistics row */
+ RelOptInfo *rel; /* back-link to index's table */
+
+ /* enabled statistics */
+ bool deps_enabled; /* functional dependencies enabled */
+
+ /* built/available statistics */
+ bool deps_built; /* functional dependencies built */
+
+ /* columns in the statistics (attnums) */
+ int2vector *stakeys; /* attnums of the columns covered */
+
+} MVStatisticInfo;
+
/*
* EquivalenceClasses
diff --git a/src/include/utils/acl.h b/src/include/utils/acl.h
index 4e15a14..3e11253 100644
--- a/src/include/utils/acl.h
+++ b/src/include/utils/acl.h
@@ -330,6 +330,7 @@ extern bool pg_foreign_data_wrapper_ownercheck(Oid srv_oid, Oid roleid);
extern bool pg_foreign_server_ownercheck(Oid srv_oid, Oid roleid);
extern bool pg_event_trigger_ownercheck(Oid et_oid, Oid roleid);
extern bool pg_extension_ownercheck(Oid ext_oid, Oid roleid);
+extern bool pg_statistics_ownercheck(Oid stat_oid, Oid roleid);
extern bool has_createrole_privilege(Oid roleid);
extern bool has_bypassrls_privilege(Oid roleid);
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
new file mode 100644
index 0000000..7ebd961
--- /dev/null
+++ b/src/include/utils/mvstats.h
@@ -0,0 +1,70 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvstats.h
+ * Multivariate statistics and selectivity estimation functions.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/mvstats.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MVSTATS_H
+#define MVSTATS_H
+
+#include "fmgr.h"
+#include "commands/vacuum.h"
+
+
+#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
+
+/* An associative rule, tracking [a => b] dependency.
+ *
+ * TODO Make this work with multiple columns on both sides.
+ */
+typedef struct MVDependencyData {
+ int16 a;
+ int16 b;
+} MVDependencyData;
+
+typedef MVDependencyData* MVDependency;
+
+typedef struct MVDependenciesData {
+ uint32 magic; /* magic constant marker */
+ int32 ndeps; /* number of dependencies */
+ MVDependency deps[1]; /* XXX why not a pointer? */
+} MVDependenciesData;
+
+typedef MVDependenciesData* MVDependencies;
+
+#define MVSTAT_DEPS_MAGIC 0xB4549A2C /* marks serialized bytea */
+#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
+
+/*
+ * TODO Maybe fetching the histogram/MCV list separately is inefficient?
+ * Consider adding a single `fetch_stats` method, fetching all
+ * stats specified using flags (or something like that).
+ */
+
+bytea * serialize_mv_dependencies(MVDependencies dependencies);
+
+/* deserialization of stats (the inverse of serialize_mv_dependencies) */
+MVDependencies deserialize_mv_dependencies(bytea * data);
+
+/* FIXME this probably belongs somewhere else (not to operations stats) */
+extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats);
+
+void update_mv_stats(Oid relid, MVDependencies dependencies, int2vector *attrs);
+
+#endif
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index f2bebf2..8771f9c 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -61,6 +61,7 @@ typedef struct RelationData
bool rd_isvalid; /* relcache entry is valid */
char rd_indexvalid; /* state of rd_indexlist: 0 = not valid, 1 =
* valid, 2 = temporarily forced */
+ bool rd_mvstatvalid; /* state of rd_mvstatlist: true/false */
/*
* rd_createSubid is the ID of the highest subtransaction the rel has
@@ -93,6 +94,9 @@ typedef struct RelationData
List *rd_indexlist; /* list of OIDs of indexes on relation */
Oid rd_oidindex; /* OID of unique index on OID, if any */
Oid rd_replidindex; /* OID of replica identity index, if any */
+
+ /* data managed by RelationGetMVStatList: */
+ List *rd_mvstatlist; /* list of OIDs of multivariate stats */
/* data managed by RelationGetIndexAttrBitmap: */
Bitmapset *rd_indexattr; /* identifies columns used in indexes */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 1b48304..9f03c8d 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -38,6 +38,7 @@ extern void RelationClose(Relation relation);
* Routines to compute/retrieve additional cached information
*/
extern List *RelationGetIndexList(Relation relation);
+extern List *RelationGetMVStatList(Relation relation);
extern Oid RelationGetOidIndex(Relation relation);
extern Oid RelationGetReplicaIndex(Relation relation);
extern List *RelationGetIndexExpressions(Relation relation);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 256615b..0e0658d 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -66,6 +66,8 @@ enum SysCacheIdentifier
INDEXRELID,
LANGNAME,
LANGOID,
+ MVSTATNAMENSP,
+ MVSTATOID,
NAMESPACENAME,
NAMESPACEOID,
OPERNAMENSP,
diff --git a/src/test/regress/expected/object_address.out b/src/test/regress/expected/object_address.out
index 75751be..eb60960 100644
--- a/src/test/regress/expected/object_address.out
+++ b/src/test/regress/expected/object_address.out
@@ -35,6 +35,7 @@ ALTER DEFAULT PRIVILEGES FOR ROLE regtest_addr_user REVOKE DELETE ON TABLES FROM
CREATE TRANSFORM FOR int LANGUAGE SQL (
FROM SQL WITH FUNCTION varchar_transform(internal),
TO SQL WITH FUNCTION int4recv(internal));
+CREATE STATISTICS addr_nsp.gentable_stat ON addr_nsp.gentable(a,b) WITH (dependencies);
-- test some error cases
SELECT pg_get_object_address('stone', '{}', '{}');
ERROR: unrecognized object type "stone"
@@ -373,7 +374,8 @@ WITH objects (type, name, args) AS (VALUES
-- extension
-- event trigger
('policy', '{addr_nsp, gentable, genpol}', '{}'),
- ('transform', '{int}', '{sql}')
+ ('transform', '{int}', '{sql}'),
+ ('statistics', '{addr_nsp, gentable_stat}', '{}')
)
SELECT (pg_identify_object(addr1.classid, addr1.objid, addr1.subobjid)).*,
-- test roundtrip through pg_identify_object_as_address
@@ -420,13 +422,14 @@ SELECT (pg_identify_object(addr1.classid, addr1.objid, addr1.subobjid)).*,
trigger | | | t on addr_nsp.gentable | t
operator family | pg_catalog | integer_ops | pg_catalog.integer_ops USING btree | t
policy | | | genpol on addr_nsp.gentable | t
+ statistics | addr_nsp | gentable_stat | addr_nsp.gentable_stat | t
collation | pg_catalog | "default" | pg_catalog."default" | t
transform | | | for integer on language sql | t
text search dictionary | addr_nsp | addr_ts_dict | addr_nsp.addr_ts_dict | t
text search parser | addr_nsp | addr_ts_prs | addr_nsp.addr_ts_prs | t
text search configuration | addr_nsp | addr_ts_conf | addr_nsp.addr_ts_conf | t
text search template | addr_nsp | addr_ts_temp | addr_nsp.addr_ts_temp | t
-(41 rows)
+(42 rows)
---
--- Cleanup resources
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 81bc5c9..84b4425 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1368,6 +1368,15 @@ pg_matviews| SELECT n.nspname AS schemaname,
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
WHERE (c.relkind = 'm'::"char");
+pg_mv_stats| SELECT n.nspname AS schemaname,
+ c.relname AS tablename,
+ s.staname,
+ s.stakeys AS attnums,
+ length(s.stadeps) AS depsbytes,
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ FROM ((pg_mv_statistic s
+ JOIN pg_class c ON ((c.oid = s.starelid)))
+ LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
pg_policies| SELECT n.nspname AS schemaname,
c.relname AS tablename,
pol.polname AS policyname,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index eb0bc88..92a0d8a 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -113,6 +113,7 @@ pg_inherits|t
pg_language|t
pg_largeobject|t
pg_largeobject_metadata|t
+pg_mv_statistic|t
pg_namespace|t
pg_opclass|t
pg_operator|t
diff --git a/src/test/regress/sql/object_address.sql b/src/test/regress/sql/object_address.sql
index 68e7cb0..3775b28 100644
--- a/src/test/regress/sql/object_address.sql
+++ b/src/test/regress/sql/object_address.sql
@@ -39,6 +39,7 @@ ALTER DEFAULT PRIVILEGES FOR ROLE regtest_addr_user REVOKE DELETE ON TABLES FROM
CREATE TRANSFORM FOR int LANGUAGE SQL (
FROM SQL WITH FUNCTION varchar_transform(internal),
TO SQL WITH FUNCTION int4recv(internal));
+CREATE STATISTICS addr_nsp.gentable_stat ON addr_nsp.gentable(a,b) WITH (dependencies);
-- test some error cases
SELECT pg_get_object_address('stone', '{}', '{}');
@@ -166,7 +167,8 @@ WITH objects (type, name, args) AS (VALUES
-- extension
-- event trigger
('policy', '{addr_nsp, gentable, genpol}', '{}'),
- ('transform', '{int}', '{sql}')
+ ('transform', '{int}', '{sql}'),
+ ('statistics', '{addr_nsp, gentable_stat}', '{}')
)
SELECT (pg_identify_object(addr1.classid, addr1.objid, addr1.subobjid)).*,
-- test roundtrip through pg_identify_object_as_address
--
2.1.0
Attachment: 0003-clause-reduction-using-functional-dependencies.patch (text/x-patch)
From 2433b5b3cb25a093f78857adb7f9c0b12ac88967 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 19:42:18 +0200
Subject: [PATCH 3/9] clause reduction using functional dependencies
During planning, use functional dependencies to decide which clauses to
skip during cardinality estimation. Initial and rather simplistic
implementation.
This only works with regular WHERE clauses, not with clauses used as
join conditions.
Note: clause_is_mv_compatible() needs to identify the relation (so
that we can fetch the list of multivariate stats by OID).
planner_rt_fetch() seems like the appropriate way to get the relation
OID, but apparently it only works with simple vars. Maybe
examine_variable() would make this work with more complex vars too?
Includes regression tests analyzing functional dependencies (part of
ANALYZE) on several datasets (no dependencies, no transitive
dependencies, ...).
Checks that for a query with conditions on two columns, where one (B) is
functionally dependent on the other one (A), the planner correctly
ignores the clause on (B) and chooses a bitmap index scan instead of a
plain index scan (which is what happens otherwise, thanks to the
assumption of independence).
Note: Functional dependencies only work with equality clauses, no
inequalities etc.
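For example (a sketch using the syntax from this series; the table and
statistics names are made up):

CREATE TABLE zips (zip INT, city INT);
-- load data where each zip implies exactly one city
CREATE STATISTICS zips_stat ON zips (zip, city) WITH (dependencies);
ANALYZE zips;

-- once ANALYZE detects [zip => city], the planner ignores the second
-- clause during estimation, as if the query contained (zip = 12345) alone
SELECT * FROM zips WHERE zip = 12345 AND city = 1;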
---
src/backend/optimizer/path/clausesel.c | 891 +++++++++++++++++++++++++-
src/backend/utils/mvstats/README.stats | 36 ++
src/backend/utils/mvstats/common.c | 5 +-
src/backend/utils/mvstats/dependencies.c | 24 +
src/include/utils/mvstats.h | 16 +-
src/test/regress/expected/mv_dependencies.out | 172 +++++
src/test/regress/parallel_schedule | 3 +
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_dependencies.sql | 150 +++++
9 files changed, 1293 insertions(+), 5 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.stats
create mode 100644 src/test/regress/expected/mv_dependencies.out
create mode 100644 src/test/regress/sql/mv_dependencies.sql
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 02660c2..80708fe 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -14,14 +14,19 @@
*/
#include "postgres.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/plancat.h"
+#include "optimizer/var.h"
#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
+#include "utils/mvstats.h"
#include "utils/selfuncs.h"
+#include "utils/typcache.h"
/*
@@ -41,6 +46,23 @@ typedef struct RangeQueryClause
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
+#define MV_CLAUSE_TYPE_FDEP 0x01
+
+static bool clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum);
+
+static Bitmapset *collect_mv_attnums(List *clauses, Index relid);
+
+static int count_mv_attnums(List *clauses, Index relid);
+
+static int count_varnos(List *clauses, Index *relid);
+
+static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Index relid, List *stats);
+
+static bool has_stats(List *stats, int type);
+
+static List * find_stats(PlannerInfo *root, Index relid);
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
@@ -60,7 +82,19 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
* subclauses. However, that's only right if the subclauses have independent
* probabilities, and in reality they are often NOT independent. So,
* we want to be smarter where we can.
-
+ *
+ * The first thing we try is applying multivariate statistics, in a way
+ * that minimizes the overhead when there are no multivariate stats on
+ * the relation. Thus we do several simple (and inexpensive) checks first,
+ * to verify that suitable multivariate statistics exist.
+ *
+ * If we find that suitable multivariate statistics apply, we use them.
+ * Currently we only have (soft) functional dependencies, so we try to reduce
+ * the list of clauses.
+ *
+ * Then we remove the clauses estimated using multivariate stats, and process
+ * the rest of the clauses using the regular per-column stats.
+ *
* Currently, the only extra smarts we have is to recognize "range queries",
* such as "x > 34 AND x < 42". Clauses are recognized as possible range
* query components if they are restriction opclauses whose operators have
@@ -99,6 +133,22 @@ clauselist_selectivity(PlannerInfo *root,
RangeQueryClause *rqlist = NULL;
ListCell *l;
+ /* processing mv stats */
+ Oid relid = InvalidOid;
+
+ /* list of multivariate stats on the relation */
+ List *stats = NIL;
+
+ /*
+ * To fetch the statistics, we first need to determine the rel. Currently
+ * we only support estimates of simple restrictions with all Vars
+ * referencing a single baserel. However set_baserel_size_estimates() sets
+ * varRelid=0 so we have to actually inspect the clauses by pull_varnos
+ * and see if there's just a single varno referenced.
+ */
+ if ((count_varnos(clauses, &relid) == 1) && ((varRelid == 0) || (varRelid == relid)))
+ stats = find_stats(root, relid);
+
/*
* If there's exactly one clause, then no use in trying to match up pairs,
* so just go directly to clause_selectivity().
@@ -108,6 +158,24 @@ clauselist_selectivity(PlannerInfo *root,
varRelid, jointype, sjinfo);
/*
+ * Apply functional dependencies, but first check that there are some stats
+ * with functional dependencies built (by simply walking the stats list),
+ * and that there are two or more attributes referenced by clauses that
+ * may be reduced using functional dependencies.
+ *
+ * We would find that anyway when trying to actually apply the functional
+ * dependencies, but let's do the cheap checks first.
+ *
+ * After applying the functional dependencies we get the remaining clauses
+ * that need to be estimated by other types of stats (MCV, histograms etc).
+ */
+ if (has_stats(stats, MV_CLAUSE_TYPE_FDEP) &&
+ (count_mv_attnums(clauses, relid) >= 2))
+ {
+ clauses = clauselist_apply_dependencies(root, clauses, relid, stats);
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -763,3 +831,824 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Pull varattnos from the clauses, similarly to pull_varattnos() but:
+ *
+ * (a) only get attributes for a particular relation (relid)
+ * (b) ignore system attributes (we can't build stats on them anyway)
+ *
+ * This makes it possible to directly compare the result with attnum
+ * values from pg_attribute etc.
+ */
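+/*
+ * Note that pull_varattnos() stores attnums offset by
+ * FirstLowInvalidHeapAttributeNumber (so that system attributes fit in
+ * the bitmapset); here we shift the members back and keep only the
+ * positive (user-defined) attnums.
+ */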
+static Bitmapset *
+get_varattnos(Node * node, Index relid)
+{
+ int k;
+ Bitmapset *varattnos = NULL;
+ Bitmapset *result = NULL;
+
+ /* get the varattnos */
+ pull_varattnos(node, relid, &varattnos);
+
+ k = -1;
+ while ((k = bms_next_member(varattnos, k)) >= 0)
+ {
+ if (k + FirstLowInvalidHeapAttributeNumber > 0)
+ result
+ = bms_add_member(result,
+ k + FirstLowInvalidHeapAttributeNumber);
+ }
+
+ bms_free(varattnos);
+
+ return result;
+}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ */
+static Bitmapset *
+collect_mv_attnums(List *clauses, Index relid)
+{
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(l);
+
+ /* ignore the result for now - we only need the info */
+ if (clause_is_mv_compatible(clause, relid, &attnum))
+ attnums = bms_add_member(attnums, attnum);
+ }
+
+ /*
+ * If there are not at least two attributes referenced by the clause(s),
+ * we can throw everything out (as we'll revert to simple stats).
+ */
+ if (bms_num_members(attnums) <= 1)
+ {
+ if (attnums != NULL)
+ pfree(attnums);
+ attnums = NULL;
+ }
+
+ return attnums;
+}
+
+/*
+ * Count the number of attributes in clauses compatible with multivariate stats.
+ */
+static int
+count_mv_attnums(List *clauses, Index relid)
+{
+ int c;
+ Bitmapset *attnums = collect_mv_attnums(clauses, relid);
+
+ c = bms_num_members(attnums);
+
+ bms_free(attnums);
+
+ return c;
+}
+
+/*
+ * Count varnos referenced in the clauses, and if there's a single varno
+ * then store it into 'relid'.
+ */
+static int
+count_varnos(List *clauses, Index *relid)
+{
+ int cnt;
+ Bitmapset *varnos = NULL;
+
+ varnos = pull_varnos((Node *) clauses);
+ cnt = bms_num_members(varnos);
+
+ /* if there's a single varno in the clauses, remember it */
+ if (bms_num_members(varnos) == 1)
+ *relid = bms_singleton_member(varnos);
+
+ bms_free(varnos);
+
+ return cnt;
+}
+
+typedef struct
+{
+ Index varno; /* relid we're interested in */
+ Bitmapset *varattnos; /* attnums referenced by the clauses */
+} mv_compatible_context;
+
+/*
+ * Recursive walker that checks compatibility of the clause with multivariate
+ * statistics, and collects attnums from the Vars.
+ *
+ * XXX The original idea was to combine this with expression_tree_walker, but
+ * I've been unable to make that work - it seems it does not quite allow
+ * checking the structure. Hence the explicit calls to the walker.
+ */
+static bool
+mv_compatible_walker(Node *node, mv_compatible_context *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+
+ /* Pseudoconstants are not really interesting here. */
+ if (rinfo->pseudoconstant)
+ return true;
+
+ /* clauses referencing multiple varnos are incompatible */
+ if (bms_membership(rinfo->clause_relids) != BMS_SINGLETON)
+ return true;
+
+ /* check the clause inside the RestrictInfo */
+ return mv_compatible_walker((Node*)rinfo->clause, (void *) context);
+ }
+
+ if (IsA(node, Var))
+ {
+ Var * var = (Var*)node;
+
+ /*
+ * Also, the variable needs to reference the right relid (this might be
+ * unnecessary given the other checks, but let's be sure).
+ */
+ if (var->varno != context->varno)
+ return true;
+
+ /* Also skip system attributes (we don't allow stats on those). */
+ if (! AttrNumberIsForUserDefinedAttr(var->varattno))
+ return true;
+
+ /* Seems fine, so let's remember the attnum. */
+ context->varattnos = bms_add_member(context->varattnos, var->varattno);
+
+ return false;
+ }
+
+ /*
+ * And finally the operator expressions - we only allow simple expressions
+ * with two arguments, where one is a Var and the other is a constant, and
+ * it's a simple comparison (which we detect using estimator function).
+ */
+ if (is_opclause(node))
+ {
+ OpExpr *expr = (OpExpr *) node;
+ Var *var;
+ bool varonleft = true;
+ bool ok;
+
+ /*
+ * Only expressions with two arguments are considered compatible.
+ *
+ * XXX Possibly unnecessary (can OpExpr have different arg count?).
+ */
+ if (list_length(expr->args) != 2)
+ return true;
+
+ /* see if it actually has the right shape (one Var, one pseudo-constant) */
+ ok = (NumRelids((Node*)expr) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ /* unsupported structure (two variables or so) */
+ if (! ok)
+ return true;
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the clause.
+ * Otherwise note the relid and attnum for the variable. This uses the
+ * function for estimating selectivity, not the operator directly (a bit
+ * awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_EQSEL:
+
+ /* equality conditions are compatible with all statistics */
+ break;
+
+ default:
+
+ /* unknown estimator */
+ return true;
+ }
+
+ var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+
+ return mv_compatible_walker((Node *) var, context);
+ }
+
+ /* Node not explicitly supported, so terminate */
+ return true;
+}
+
+/*
+ * Determines whether the clause is compatible with multivariate stats,
+ * and if it is, returns some additional information - varno (index
+ * into simple_rte_array) and a bitmap of attributes. This is then
+ * used to fetch related multivariate statistics.
+ *
+ * At this moment we only support basic conditions of the form
+ *
+ * variable OP constant
+ *
+ * where OP is one of [=,<,<=,>=,>] (which is however determined by
+ * looking at the associated function for estimating selectivity, just
+ * like with the single-dimensional case).
+ *
+ * TODO Support 'OR clauses' - shouldn't be all that difficult to
+ * evaluate them using multivariate stats.
+ */
+static bool
+clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum)
+{
+ mv_compatible_context context;
+
+ context.varno = relid;
+ context.varattnos = NULL; /* no attnums */
+
+ if (mv_compatible_walker(clause, (void *) &context))
+ return false;
+
+ /* remember the newly collected attnums */
+ *attnum = bms_singleton_member(context.varattnos);
+
+ return true;
+}
+
+/*
+ * collect attnums from functional dependencies
+ *
+ * Walk through all statistics on the relation, and collect attnums covered
+ * by those with functional dependencies. We only look at columns specified
+ * when creating the statistics, not at columns actually referenced by the
+ * dependencies (which may only be a subset of the attributes).
+ */
+static Bitmapset*
+fdeps_collect_attnums(List *stats)
+{
+ ListCell *lc;
+ Bitmapset *attnums = NULL;
+
+ foreach (lc, stats)
+ {
+ int j;
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ int2vector *stakeys = info->stakeys;
+
+ /* skip stats without functional dependencies built */
+ if (! info->deps_built)
+ continue;
+
+ for (j = 0; j < stakeys->dim1; j++)
+ attnums = bms_add_member(attnums, stakeys->values[j]);
+ }
+
+ return attnums;
+}
+
+/* transforms bitmapset into an array (index => value) */
+static int*
+make_idx_to_attnum_mapping(Bitmapset *attnums)
+{
+ int attidx = 0;
+ int attnum;
+
+ int *mapping = (int*)palloc0(bms_num_members(attnums) * sizeof(int));
+
+ attnum = -1;
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ mapping[attidx++] = attnum;
+
+ Assert(attidx == bms_num_members(attnums));
+
+ return mapping;
+}
+
+/* transforms bitmapset into an array (value => index) */
+static int*
+make_attnum_to_idx_mapping(Bitmapset *attnums)
+{
+ int attidx = 0;
+ int attnum;
+ int maxattnum = -1;
+ int *mapping;
+
+ attnum = -1;
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ maxattnum = attnum;
+
+ mapping = (int*)palloc0((maxattnum+1) * sizeof(int));
+
+ attnum = -1;
+ while ((attnum = bms_next_member(attnums, attnum)) >= 0)
+ mapping[attnum] = attidx++;
+
+ Assert(attidx == bms_num_members(attnums));
+
+ return mapping;
+}
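+
+/*
+ * For example, for attnums {2,5,7} the two mappings are:
+ *
+ * idx_to_attnum: [0 => 2, 1 => 5, 2 => 7]
+ * attnum_to_idx: [2 => 0, 5 => 1, 7 => 2] (other slots unused)
+ */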
+
+/* build adjacency matrix for the dependencies */
+static bool*
+build_adjacency_matrix(List *stats, Bitmapset *attnums,
+ int *idx_to_attnum, int *attnum_to_idx)
+{
+ ListCell *lc;
+ int natts = bms_num_members(attnums);
+ bool *matrix = (bool*)palloc0(natts * natts * sizeof(bool));
+
+ foreach (lc, stats)
+ {
+ int j;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
+ MVDependencies dependencies = NULL;
+
+ /* skip stats without functional dependencies built */
+ if (! stat->deps_built)
+ continue;
+
+ /* fetch and deserialize dependencies */
+ dependencies = load_mv_dependencies(stat->mvoid);
+ if (dependencies == NULL)
+ {
+ elog(WARNING, "failed to deserialize func deps %d", stat->mvoid);
+ continue;
+ }
+
+ /* set matrix[a,b] to 'true' if 'a=>b' */
+ for (j = 0; j < dependencies->ndeps; j++)
+ {
+ int aidx = attnum_to_idx[dependencies->deps[j]->a];
+ int bidx = attnum_to_idx[dependencies->deps[j]->b];
+
+ /* a=> b */
+ matrix[aidx * natts + bidx] = true;
+ }
+ }
+
+ return matrix;
+}
+
+/*
+ * multiply the adjacency matrix
+ *
+ * By multiplying the adjacency matrix, we derive dependencies implied by those
+ * stored in the catalog (but possibly in several separate rows). We need to
+ * repeat the multiplication until no new dependencies are discovered. The
+ * maximum number of multiplications is equal to the number of attributes.
+ *
+ * This is based on modeling the functional dependencies as edges in a directed
+ * graph with attributes as vertices.
+ */
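+/*
+ * For example, with (a => b) and (b => c) coming from two different
+ * statistics, the first round of multiplication derives (a => c):
+ *
+ * a b c a b c
+ * a [ . T . ] a [ . T T ]
+ * b [ . . T ] => b [ . . T ]
+ * c [ . . . ] c [ . . . ]
+ *
+ * (illustrative only; T = true, . = false)
+ */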
+static void
+multiply_adjacency_matrix(bool *matrix, int natts)
+{
+ int i;
+
+ /* repeat the multiplication up to natts-times */
+ for (i = 0; i < natts; i++)
+ {
+ bool changed = false; /* no changes in this round */
+ int k, l, m;
+
+ /* k => l */
+ for (k = 0; k < natts; k++)
+ {
+ for (l = 0; l < natts; l++)
+ {
+ /* skip already known dependencies */
+ if (matrix[k * natts + l])
+ continue;
+
+ /*
+ * compute (k,l) in the multiplied matrix
+ *
+ * We don't really care about the exact value, just true/false,
+ * so terminate the loop once we get a hit. Also, this makes it
+ * safe to modify the matrix in-place.
+ */
+ for (m = 0; m < natts; m++)
+ {
+ if (matrix[k * natts + m] * matrix[m * natts + l])
+ {
+ matrix[k * natts + l] = true;
+ changed = true;
+ break;
+ }
+ }
+ }
+ }
+
+ /* no transitive dependency added in this round, so terminate */
+ if (! changed)
+ break;
+ }
+}
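+
+/*
+ * An illustrative example (dependencies assumed, not read from any
+ * catalog): with three attributes and dependencies (a => b), (b => c),
+ * the adjacency matrix evolves like this:
+ *
+ *       a  b  c                      a  b  c
+ *   a [ .  1  . ]                a [ .  1  1 ]
+ *   b [ .  .  1 ]   -- round --> b [ .  .  1 ]
+ *   c [ .  .  . ]        1       c [ .  .  . ]
+ *
+ * The first round derives the transitive dependency (a => c), the second
+ * round adds nothing new, so the 'changed' flag terminates the loop early.
+ */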
+
+/*
+ * Reduce clauses using functional dependencies
+ *
+ * Walk through clauses and eliminate the redundant ones (implied by other
+ * clauses). This is done by first deriving a transitive closure of all the
+ * functional dependencies (by multiplying the adjacency matrix).
+ */
+static List*
+fdeps_reduce_clauses(List *clauses, Bitmapset *attnums, bool *matrix,
+ int *idx_to_attnum, int *attnum_to_idx, Index relid)
+{
+ int i;
+ ListCell *lc;
+ List *reduced_clauses = NIL;
+
+ int nmvclauses; /* size of the arrays */
+ bool *reduced;
+ AttrNumber *mvattnums;
+ Node **mvclauses;
+
+ int natts = bms_num_members(attnums);
+
+ /*
+ * Preallocate space for all clauses (the list only contains
+ * compatible clauses at this point). This makes it somewhat easier
+ * to access the stats / attnums randomly.
+ *
+ * XXX This assumes each clause references exactly one Var, so the
+ * arrays are sized accordingly - for functional dependencies
+ * this is safe, because it only works with Var=Const.
+ */
+ mvclauses = (Node**)palloc0(list_length(clauses) * sizeof(Node*));
+ mvattnums = (AttrNumber*)palloc0(list_length(clauses) * sizeof(AttrNumber));
+ reduced = (bool*)palloc0(list_length(clauses) * sizeof(bool));
+
+ /* fill the arrays */
+ nmvclauses = 0;
+ foreach (lc, clauses)
+ {
+ Node * clause = (Node*)lfirst(lc);
+ Bitmapset * attnums = get_varattnos(clause, relid);
+
+ mvclauses[nmvclauses] = clause;
+ mvattnums[nmvclauses] = bms_singleton_member(attnums);
+ nmvclauses++;
+ }
+
+ Assert(nmvclauses == list_length(clauses));
+
+ /* now try to reduce the clauses (using the dependencies) */
+ for (i = 0; i < nmvclauses; i++)
+ {
+ int j;
+
+ /* not covered by dependencies */
+ if (! bms_is_member(mvattnums[i], attnums))
+ continue;
+
+ /* this clause was already reduced, so let's skip it */
+ if (reduced[i])
+ continue;
+
+ /* walk the potentially 'implied' clauses */
+ for (j = 0; j < nmvclauses; j++)
+ {
+ int aidx, bidx;
+
+ /* not covered by dependencies */
+ if (! bms_is_member(mvattnums[j], attnums))
+ continue;
+
+ aidx = attnum_to_idx[mvattnums[i]];
+ bidx = attnum_to_idx[mvattnums[j]];
+
+ /* can't reduce the clause by itself, or if already reduced */
+ if ((i == j) || reduced[j])
+ continue;
+
+ /* mark the clause as reduced (if aidx => bidx) */
+ reduced[j] = matrix[aidx * natts + bidx];
+ }
+ }
+
+ /* now walk through the clauses, and keep only those not reduced */
+ for (i = 0; i < nmvclauses; i++)
+ if (! reduced[i])
+ reduced_clauses = lappend(reduced_clauses, mvclauses[i]);
+
+ pfree(reduced);
+ pfree(mvclauses);
+ pfree(mvattnums);
+
+ return reduced_clauses;
+}
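+
+/*
+ * An illustrative example of the cycle handling above (dependencies and
+ * clauses assumed): with (a => b) and (b => a), and clauses
+ * (a = 1) AND (b = 2), the clause on 'a' marks the clause on 'b' as
+ * reduced first. When the outer loop then reaches the clause on 'b', it
+ * is skipped as already reduced, so it cannot in turn eliminate the
+ * clause on 'a' - exactly one clause survives, instead of both being
+ * dropped.
+ */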
+
+/*
+ * filter clauses that are interesting for the reduction step
+ *
+ * Functional dependencies can only work with equality clauses with attributes
+ * covered by at least one of the statistics, so we walk through the clauses
+ * and copy the uninteresting ones directly to the result (reduced) clauses.
+ *
+ * That includes clauses that:
+ * (a) are not mv-compatible
+ * (b) reference more than a single attnum
+ * (c) use an attnum not covered by the functional dependencies
+ *
+ * The clauses interesting for the reduction step are copied to deps_clauses.
+ *
+ * root - planner root
+ * clauses - list of clauses (input)
+ * deps_attnums - attributes covered by dependencies
+ * reduced_clauses - resulting clauses (not subject to reduction step)
+ * deps_clauses - clauses to be processed by reduction
+ * relid - relid of the baserel
+ *
+ * The return value is a bitmap of attnums referenced by deps_clauses.
+ */
+static Bitmapset *
+fdeps_filter_clauses(PlannerInfo *root,
+ List *clauses, Bitmapset *deps_attnums,
+ List **reduced_clauses, List **deps_clauses,
+ Index relid)
+{
+ ListCell *lc;
+ Bitmapset *clause_attnums = NULL;
+
+ foreach (lc, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(lc);
+
+ if (! clause_is_mv_compatible(clause, relid, &attnum))
+
+ /* clause incompatible with functional dependencies */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else if (! bms_is_member(attnum, deps_attnums))
+
+ /* clause not covered by the dependencies */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else
+ {
+ *deps_clauses = lappend(*deps_clauses, clause);
+ clause_attnums = bms_add_member(clause_attnums, attnum);
+ }
+ }
+
+ return clause_attnums;
+}
+
+/*
+ * reduce list of equality clauses using soft functional dependencies
+ *
+ * We simply walk through list of functional dependencies, and for each one we
+ * check whether the dependency 'matches' the clauses, i.e. if there's a clause
+ * matching the condition. If yes, we attempt to remove all clauses matching
+ * the implied part of the dependency from the list.
+ *
+ * This only reduces equality clauses, and ignores all the other types. We might
+ * extend it to handle IS NULL clause, in the future.
+ *
+ * We also assume the equality clauses are 'compatible'. For example we can't
+ * identify when the clauses use a mismatching zip code and city name. In such
+ * case the usual approach (product of selectivities) would produce a better
+ * estimate, although mostly by chance.
+ *
+ * The implementation needs to be careful about cyclic dependencies, e.g. when
+ *
+ * (a -> b) and (b -> a)
+ *
+ * at the same time, which means there's a 1:1 relationship between the columns.
+ * In this case we must not reduce clauses on both attributes at the same time.
+ *
+ * TODO Currently we only apply functional dependencies at the same level, but
+ * maybe we could transfer the clauses from upper levels to the subtrees?
+ * For example let's say we have (a->b) dependency, and condition
+ *
+ * (a=1) AND (b=2 OR c=3)
+ *
+ * Currently, we won't be able to perform any reduction, because we'll
+ * consider (a=1) and (b=2 OR c=3) independently. But maybe we could pass
+ * (a=1) into the other expression, and only check it against conditions
+ * of the functional dependencies?
+ *
+ * In this case we'd end up with
+ *
+ * (a=1)
+ *
+ * as we'd consider (b=2) implied thanks to the rule, rendering the whole
+ * OR clause valid.
+ */
+static List *
+clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Index relid, List *stats)
+{
+ List *reduced_clauses = NIL;
+
+ /*
+ * matrix of (natts x natts), 1 means x=>y
+ *
+ * This serves two purposes - first, it merges dependencies from all
+ * the statistics, second it makes generating all the transitive
+ * dependencies easier.
+ *
+ * We need to build this only for attributes from the dependencies,
+ * not for all attributes in the table.
+ *
+ * We can't do that only for attributes from the clauses, because we
+ * want to build transitive dependencies (including those going
+ * through attributes not listed in the stats).
+ *
+ * This only works for A=>B dependencies, not sure how to do that
+ * for complex dependencies.
+ */
+ bool *deps_matrix;
+ int deps_natts; /* size of the matrix */
+
+ /* mapping attnum <=> matrix index */
+ int *deps_idx_to_attnum;
+ int *deps_attnum_to_idx;
+
+ /* attnums in dependencies and clauses (and intersection) */
+ List *deps_clauses = NIL;
+ Bitmapset *deps_attnums = NULL;
+ Bitmapset *clause_attnums = NULL;
+ Bitmapset *intersect_attnums = NULL;
+
+ /*
+ * Is there at least one statistics with functional dependencies?
+ * If not, return the original clauses right away.
+ *
+ * XXX Isn't this pointless, thanks to exactly the same check in
+ * clauselist_selectivity()? Can we trigger the condition here?
+ */
+ if (! has_stats(stats, MV_CLAUSE_TYPE_FDEP))
+ return clauses;
+
+ /*
+ * Build the dependency matrix, i.e. attribute adjacency matrix,
+ * where 1 means (a=>b). Once we have the adjacency matrix, we'll
+ * multiply it by itself, to get transitive dependencies.
+ *
+ * Note: This is pretty much transitive closure from graph theory.
+ *
+ * First, let's see which attributes are covered by functional
+ * dependencies (sides of the adjacency matrix), and also the maximum
+ * attnum (which determines the size of the attnum-to-index mapping).
+ */
+ deps_attnums = fdeps_collect_attnums(stats);
+
+ /*
+ * Walk through the clauses - clauses that are (one of)
+ *
+ * (a) not mv-compatible
+ * (b) reference more than a single attnum
+ * (c) use an attnum not covered by the functional dependencies
+ *
+ * may be copied directly to the result. The interesting clauses are
+ * kept in 'deps_clauses' and will be processed later.
+ */
+ clause_attnums = fdeps_filter_clauses(root, clauses, deps_attnums,
+ &reduced_clauses, &deps_clauses, relid);
+
+ /*
+ * We need at least two clauses, referencing two different attributes,
+ * to perform the reduction.
+ */
+ if ((list_length(deps_clauses) < 2) || (bms_num_members(clause_attnums) < 2))
+ {
+ bms_free(clause_attnums);
+ list_free(reduced_clauses);
+ list_free(deps_clauses);
+
+ return clauses;
+ }
+
+ /*
+ * We need at least two matching attributes in the clauses and
+ * dependencies, otherwise we can't really reduce anything.
+ */
+ intersect_attnums = bms_intersect(clause_attnums, deps_attnums);
+ if (bms_num_members(intersect_attnums) < 2)
+ {
+ bms_free(clause_attnums);
+ bms_free(deps_attnums);
+ bms_free(intersect_attnums);
+
+ list_free(deps_clauses);
+ list_free(reduced_clauses);
+
+ return clauses;
+ }
+
+ /*
+ * Build mapping between matrix indexes and attnums, and then the
+ * adjacency matrix itself.
+ */
+ deps_idx_to_attnum = make_idx_to_attnum_mapping(deps_attnums);
+ deps_attnum_to_idx = make_attnum_to_idx_mapping(deps_attnums);
+
+ /* build the adjacency matrix */
+ deps_matrix = build_adjacency_matrix(stats, deps_attnums,
+ deps_idx_to_attnum,
+ deps_attnum_to_idx);
+
+ deps_natts = bms_num_members(deps_attnums);
+
+ /*
+ * Multiply the matrix N-times (N = size of the matrix), so that we
+ * get all the transitive dependencies. That makes the next step
+ * much easier and faster.
+ *
+ * This is essentially an adjacency matrix from graph theory, and
+ * by multiplying it we get transitive edges. We don't really care
+ * about the exact number (number of paths between vertices) though,
+ * so we can do the multiplication in-place (we don't care whether
+ * we found the dependency in this round or in the previous one).
+ *
+ * Track how many new dependencies were added, and stop when 0, but
+ * we can't multiply more than N-times (longest path in the graph).
+ */
+ multiply_adjacency_matrix(deps_matrix, deps_natts);
+
+ /*
+ * Walk through the clauses, and see which other clauses we may
+ * reduce. The matrix contains all transitive dependencies, which
+ * makes this very fast.
+ *
+ * We have to be careful not to reduce the clause using itself, or
+ * reducing all clauses forming a cycle (so we have to skip already
+ * eliminated clauses).
+ *
+ * I'm not sure whether this guarantees finding the best solution,
+ * i.e. reducing the most clauses, but it probably does (thanks to
+ * having all the transitive dependencies).
+ */
+ deps_clauses = fdeps_reduce_clauses(deps_clauses,
+ deps_attnums, deps_matrix,
+ deps_idx_to_attnum,
+ deps_attnum_to_idx, relid);
+
+ /* join the two lists of clauses */
+ reduced_clauses = list_union(reduced_clauses, deps_clauses);
+
+ pfree(deps_matrix);
+ pfree(deps_idx_to_attnum);
+ pfree(deps_attnum_to_idx);
+
+ bms_free(deps_attnums);
+ bms_free(clause_attnums);
+ bms_free(intersect_attnums);
+
+ return reduced_clauses;
+}
+
+/*
+ * Check that there are stats with at least one of the requested types.
+ */
+static bool
+has_stats(List *stats, int type)
+{
+ ListCell *s;
+
+ foreach (s, stats)
+ {
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Look up stats for a given baserel.
+ */
+static List *
+find_stats(PlannerInfo *root, Index relid)
+{
+ Assert(root->simple_rel_array[relid] != NULL);
+
+ return root->simple_rel_array[relid]->mvstatlist;
+}
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
new file mode 100644
index 0000000..a38ea7b
--- /dev/null
+++ b/src/backend/utils/mvstats/README.stats
@@ -0,0 +1,36 @@
+Multivariate statistics
+=======================
+
+When estimating various quantities (e.g. condition selectivities) the default
+approach relies on the assumption of independence. In practice that's often
+not true, resulting in estimation errors.
+
+Multivariate stats track different types of dependencies between the columns,
+hopefully improving the estimates.
+
+Currently we only have one kind of multivariate statistics - soft functional
+dependencies - which we use to improve estimates of equality clauses. See
+README.dependencies for details.
+
+
+Selectivity estimation
+----------------------
+
+When estimating selectivity, we aim to achieve several things:
+
+ (a) maximize the estimate accuracy
+
+ (b) minimize the overhead, especially when no suitable multivariate stats
+ exist (so if you are not using multivariate stats, there's no overhead)
+
+Thus clauselist_selectivity() performs several inexpensive checks first,
+before even attempting the more expensive estimation:
+
+ (1) check if there are multivariate stats on the relation
+
+ (2) check there are at least two attributes referenced by clauses compatible
+ with multivariate statistics (equality clauses for func. dependencies)
+
+ (3) perform reduction of equality clauses using func. dependencies
+
+ (4) estimate the reduced list of clauses using regular statistics
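+
+As an illustration of steps (3) and (4) (table and column names are made up),
+consider a statistics object on (zip_code, city) with a functional dependency
+(zip_code => city), and a query
+
+    SELECT * FROM addresses WHERE zip_code = '12345' AND city = 'Anytown';
+
+Step (3) may remove the (city = ...) clause as implied by the zip code, so
+step (4) only estimates the (zip_code = ...) clause, avoiding the
+underestimate that multiplying the two per-column selectivities would cause.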
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index a755c49..bd200bc 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -84,7 +84,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
/*
* Analyze functional dependencies of columns.
*/
- deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (stat->deps_enabled)
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
/* store the histogram / MCV list in the catalog */
update_mv_stats(stat->mvoid, deps, attrs);
@@ -163,6 +164,7 @@ list_mv_stats(Oid relid)
info->mvoid = HeapTupleGetOid(htup);
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
result = lappend(result, info);
@@ -274,6 +276,7 @@ compare_scalars_partition(const void *a, const void *b, void *arg)
return ApplySortComparator(da, false, db, false, ssup);
}
+
/* initialize multi-dimensional sort */
MultiSortSupport
multi_sort_init(int ndims)
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
index 2a064a0..c80ba33 100644
--- a/src/backend/utils/mvstats/dependencies.c
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -435,3 +435,27 @@ pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(result));
}
+
+MVDependencies
+load_mv_dependencies(Oid mvoid)
+{
+ bool isnull = false;
+ Datum deps;
+
+ /* Fetch the pg_mv_statistic tuple for the requested statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->deps_enabled && mvstat->deps_built);
+#endif
+
+ deps = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stadeps, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_dependencies(DatumGetByteaP(deps));
+}
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 7ebd961..cc43a79 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -17,12 +17,20 @@
#include "fmgr.h"
#include "commands/vacuum.h"
+/*
+ * Degree of how much MCV item / histogram bucket matches a clause.
+ * This is then considered when computing the selectivity.
+ */
+#define MVSTATS_MATCH_NONE 0 /* no match at all */
+#define MVSTATS_MATCH_PARTIAL 1 /* partial match */
+#define MVSTATS_MATCH_FULL 2 /* full match */
#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
-/* An associative rule, tracking [a => b] dependency.
- *
- * TODO Make this work with multiple columns on both sides.
+
+/*
+ * Functional dependencies, tracking column-level relationships (values
+ * in one column determine values in another one).
*/
typedef struct MVDependencyData {
int16 a;
@@ -48,6 +56,8 @@ typedef MVDependenciesData* MVDependencies;
* stats specified using flags (or something like that).
*/
+MVDependencies load_mv_dependencies(Oid mvoid);
+
bytea * serialize_mv_dependencies(MVDependencies dependencies);
/* deserialization of stats (serialization is private to analyze) */
diff --git a/src/test/regress/expected/mv_dependencies.out b/src/test/regress/expected/mv_dependencies.out
new file mode 100644
index 0000000..e759997
--- /dev/null
+++ b/src/test/regress/expected/mv_dependencies.out
@@ -0,0 +1,172 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s1 ON functional_dependencies (unknown_column) WITH (dependencies);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s1 ON functional_dependencies (a) WITH (dependencies);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a,a) WITH (dependencies);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a, a, b) WITH (dependencies);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- correct command
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (dependencies);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+ QUERY PLAN
+---------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE functional_dependencies;
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s2 ON functional_dependencies (a, b, c) WITH (dependencies);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+DROP TABLE functional_dependencies;
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s3 ON functional_dependencies (a, b, c, d) WITH (dependencies);
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+----------------------------------------
+ t | t | 2 => 1, 3 => 1, 3 => 2, 4 => 1, 4 => 2
+(1 row)
+
+DROP TABLE functional_dependencies;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index bec0316..4f2ffb8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -110,3 +110,6 @@ test: event_trigger
# run stats by itself because its delay may be insufficient under heavy load
test: stats
+
+# run tests of multivariate stats
+test: mv_dependencies
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 7e9b319..097a04f 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -162,3 +162,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: mv_dependencies
diff --git a/src/test/regress/sql/mv_dependencies.sql b/src/test/regress/sql/mv_dependencies.sql
new file mode 100644
index 0000000..48dea4d
--- /dev/null
+++ b/src/test/regress/sql/mv_dependencies.sql
@@ -0,0 +1,150 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s1 ON functional_dependencies (unknown_column) WITH (dependencies);
+
+-- single column
+CREATE STATISTICS s1 ON functional_dependencies (a) WITH (dependencies);
+
+-- single column, duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a,a) WITH (dependencies);
+
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a, a, b) WITH (dependencies);
+
+-- unknown option
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (unknown_option);
+
+-- correct command
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (dependencies);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+
+DROP TABLE functional_dependencies;
+
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s2 ON functional_dependencies (a, b, c) WITH (dependencies);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+
+DROP TABLE functional_dependencies;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s3 ON functional_dependencies (a, b, c, d) WITH (dependencies);
+
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+DROP TABLE functional_dependencies;
--
2.1.0
[Attachment: 0004-multivariate-MCV-lists.patch (text/x-patch)]
From 11e08f7a0ffc186dbc23605d522c278e9b393ea5 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 16:52:15 +0200
Subject: [PATCH 4/9] multivariate MCV lists
- extends the pg_mv_statistic catalog (add 'mcv' fields)
- building the MCV lists during ANALYZE
- simple estimation while planning the queries
Includes regression tests, mostly mirroring the regression tests for
functional dependencies.
---
doc/src/sgml/ref/create_statistics.sgml | 18 +
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/statscmds.c | 45 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 829 ++++++++++++++++++++++-
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/README.mcv | 137 ++++
src/backend/utils/mvstats/README.stats | 89 ++-
src/backend/utils/mvstats/common.c | 104 ++-
src/backend/utils/mvstats/common.h | 11 +-
src/backend/utils/mvstats/mcv.c | 1094 +++++++++++++++++++++++++++++++
src/bin/psql/describe.c | 25 +-
src/include/catalog/pg_mv_statistic.h | 18 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 69 +-
src/test/regress/expected/mv_mcv.out | 207 ++++++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_mcv.sql | 178 +++++
22 files changed, 2776 insertions(+), 73 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.mcv
create mode 100644 src/backend/utils/mvstats/mcv.c
create mode 100644 src/test/regress/expected/mv_mcv.out
create mode 100644 src/test/regress/sql/mv_mcv.sql
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index a86eae3..193e4b0 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -132,6 +132,24 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>max_mcv_items</> (<type>integer</>)</term>
+ <listitem>
+ <para>
+ Maximum number of MCV list items.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>mcv</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables MCV list for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect2>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b8a264e..2d570ee 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -165,7 +165,9 @@ CREATE VIEW pg_mv_stats AS
S.staname AS staname,
S.stakeys AS attnums,
length(S.stadeps) as depsbytes,
- pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
+ length(S.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 1b89bbe..b04c583 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -70,7 +70,13 @@ CreateStatistics(CreateStatsStmt *stmt)
ObjectAddress parentobject, childobject;
/* by default build nothing */
- bool build_dependencies = false;
+ bool build_dependencies = false,
+ build_mcv = false;
+
+ int32 max_mcv_items = -1;
+
+ /* options required because of other options */
+ bool require_mcv = false;
Assert(IsA(stmt, CreateStatsStmt));
@@ -146,6 +152,29 @@ CreateStatistics(CreateStatsStmt *stmt)
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "mcv") == 0)
+ build_mcv = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_mcv_items") == 0)
+ {
+ max_mcv_items = defGetInt32(opt);
+
+ /* this option requires 'mcv' to be enabled */
+ require_mcv = true;
+
+ /* sanity check */
+ if (max_mcv_items < MVSTAT_MCVLIST_MIN_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items must be at least %d",
+ MVSTAT_MCVLIST_MIN_ITEMS)));
+
+ else if (max_mcv_items > MVSTAT_MCVLIST_MAX_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items is %d",
+ MVSTAT_MCVLIST_MAX_ITEMS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -154,10 +183,16 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! build_dependencies)
+ if (! (build_dependencies || build_mcv))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies, mcv) was requested")));
+
+ /* now do some checking of the options */
+ if (require_mcv && (! build_mcv))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies) was requested")));
+ errmsg("option 'mcv' is required by other options(s)")));
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
@@ -178,8 +213,12 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+ values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+
+ values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+ nulls[Anum_pg_mv_statistic_stamcv -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 07206d7..333e24b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2162,9 +2162,11 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
+ WRITE_BOOL_FIELD(mcv_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
+ WRITE_BOOL_FIELD(mcv_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 80708fe..977f88e 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -15,6 +15,7 @@
#include "postgres.h"
#include "access/sysattr.h"
+#include "catalog/pg_collation.h"
#include "catalog/pg_operator.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
@@ -47,23 +48,51 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
#define MV_CLAUSE_TYPE_FDEP 0x01
+#define MV_CLAUSE_TYPE_MCV 0x02
-static bool clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum);
+static bool clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums,
+ int type);
-static Bitmapset *collect_mv_attnums(List *clauses, Index relid);
+static Bitmapset *collect_mv_attnums(List *clauses, Index relid, int type);
-static int count_mv_attnums(List *clauses, Index relid);
+static int count_mv_attnums(List *clauses, Index relid, int type);
static int count_varnos(List *clauses, Index *relid);
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Index relid, List *stats);
+static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
+
+static List *clauselist_mv_split(PlannerInfo *root, Index relid,
+ List *clauses, List **mvclauses,
+ MVStatisticInfo *mvstats, int types);
+
+static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
+
+static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats,
+ bool *fullmatch, Selectivity *lowsel);
+
+static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, Index relid);
+/* used for merging bitmaps - AND (min), OR (max) */
+#define MAX(x, y) (((x) > (y)) ? (x) : (y))
+#define MIN(x, y) (((x) < (y)) ? (x) : (y))
+
+#define UPDATE_RESULT(m,r,isor) \
+ (m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -89,11 +118,13 @@ static List * find_stats(PlannerInfo *root, Index relid);
* to verify that suitable multivariate statistics exist.
*
* If we identify such multivariate statistics apply, we try to apply them.
- * Currently we only have (soft) functional dependencies, so we try to reduce
- * the list of clauses.
*
- * Then we remove the clauses estimated using multivariate stats, and process
- * the rest of the clauses using the regular per-column stats.
+ * First we try to reduce the list of clauses by applying (soft) functional
+ * dependencies, and then we try to estimate the selectivity of the reduced
+ * list of clauses using the multivariate MCV list.
+ *
+ * Finally we remove the portion of clauses estimated using multivariate stats,
+ * and process the rest of the clauses using the regular per-column stats.
*
* Currently, the only extra smarts we have is to recognize "range queries",
* such as "x > 34 AND x < 42". Clauses are recognized as possible range
@@ -170,12 +201,46 @@ clauselist_selectivity(PlannerInfo *root,
* that need to be estimated by other types of stats (MCV, histograms etc).
*/
if (has_stats(stats, MV_CLAUSE_TYPE_FDEP) &&
- (count_mv_attnums(clauses, relid) >= 2))
+ (count_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_FDEP) >= 2))
{
clauses = clauselist_apply_dependencies(root, clauses, relid, stats);
}
/*
+ * Check that there are statistics with MCV list or histogram, and also the
+ * number of attributes covered by these types of statistics.
+ *
+ * If there are no such stats or not enough attributes, don't waste time
+ * with the multivariate code and simply skip to estimation using the
+ * regular per-column stats.
+ */
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV) &&
+ (count_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV) >= 2))
+ {
+ /* collect attributes from the compatible conditions */
+ Bitmapset *mvattnums = collect_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV);
+
+ /* and search for the statistic covering the most attributes */
+ MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+
+ if (mvstat != NULL) /* we have matching stats */
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ /* split the clauselist into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, relid, clauses, &mvclauses,
+ mvstat, MV_CLAUSE_TYPE_MCV);
+
+ /* we've chosen the histogram to match the clauses */
+ Assert(mvclauses != NIL);
+
+ /* compute the multivariate stats */
+ s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ }
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -832,6 +897,69 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * estimate selectivity of clauses using multivariate statistic
+ *
+ * Perform estimation of the clauses using a MCV list.
+ *
+ * This assumes all the clauses are compatible with the selected statistics
+ * (e.g. only reference columns covered by the statistics, use a supported
+ * operator, etc.).
+ *
+ * TODO We may support some additional conditions, most importantly those
+ * matching multiple columns (e.g. "a = b" or "a < b").
+ *
+ * TODO Clamp the selectivity by min of the per-clause selectivities (i.e. the
+ * selectivity of the most restrictive clause), because that's the maximum
+ * we can ever get from an ANDed list of clauses. This should help prevent
+ * issues with hitting too many buckets and low precision histograms.
+ *
+ * TODO We may remember the lowest frequency in the MCV list, and then later use
+ * it as an upper boundary for the selectivity (had there been a more
+ * frequent item, it'd be in the MCV list). This might improve cases with
+ * low-detail histograms.
+ *
+ * TODO We may also derive some additional boundaries for the selectivity from
+ * the MCV list, because
+ *
+ * (a) if we have a "full equality condition" (one equality condition on
+ * each column of the statistic) and we found a match in the MCV list,
+ * then this is the final selectivity (and pretty accurate),
+ *
+ * (b) if we have a "full equality condition" and we haven't found a match
+ * in the MCV list, then the selectivity is below the lowest frequency
+ * found in the MCV list,
+ *
+ * TODO When applying the clauses to the histogram/MCV list, we can do
+ * that from the most selective clauses first, because that'll
+ * eliminate the buckets/items sooner (so we'll be able to skip
+ * them without inspection, which is more expensive). But this
+ * requires really knowing the per-clause selectivities in advance,
+ * and that's not what we do now.
+ */
+static Selectivity
+clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+{
+ bool fullmatch = false;
+
+ /*
+ * Lowest frequency in the MCV list (may be used as an upper bound
+ * for full equality conditions that did not match any MCV item).
+ */
+ Selectivity mcv_low = 0.0;
+
+ /* TODO Evaluate simple 1D selectivities, use the smallest one as
+ * an upper bound, product as lower bound, and sort the
+ * clauses in ascending order by selectivity (to optimize the
+ * MCV/histogram evaluation).
+ */
+
+ /* Evaluate the MCV selectivity */
+ return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ &fullmatch, &mcv_low);
+}
+
/*
* Pull varattnos from the clauses, similarly to pull_varattnos() but:
*
@@ -869,28 +997,26 @@ get_varattnos(Node * node, Index relid)
* Collect attributes from mv-compatible clauses.
*/
static Bitmapset *
-collect_mv_attnums(List *clauses, Index relid)
+collect_mv_attnums(List *clauses, Index relid, int types)
{
Bitmapset *attnums = NULL;
ListCell *l;
/*
- * Walk through the clauses and identify the ones we can estimate
- * using multivariate stats, and remember the relid/columns. We'll
- * then cross-check if we have suitable stats, and only if needed
- * we'll split the clauses into multivariate and regular lists.
+ * Walk through the clauses and identify the ones we can estimate using
+ * multivariate stats, and remember the relid/columns. We'll then
+ * cross-check if we have suitable stats, and only if needed we'll split
+ * the clauses into multivariate and regular lists.
*
- * For now we're only interested in RestrictInfo nodes with nested
- * OpExpr, using either a range or equality.
+ * For now we're only interested in RestrictInfo nodes with nested OpExpr,
+ * using either a range or equality.
*/
foreach (l, clauses)
{
- AttrNumber attnum;
Node *clause = (Node *) lfirst(l);
- /* ignore the result for now - we only need the info */
- if (clause_is_mv_compatible(clause, relid, &attnum))
- attnums = bms_add_member(attnums, attnum);
+ /* ignore the result here - we only need the attnums */
+ clause_is_mv_compatible(clause, relid, &attnums, types);
}
/*
@@ -911,10 +1037,10 @@ collect_mv_attnums(List *clauses, Index relid)
* Count the number of attributes in clauses compatible with multivariate stats.
*/
static int
-count_mv_attnums(List *clauses, Index relid)
+count_mv_attnums(List *clauses, Index relid, int type)
{
int c;
- Bitmapset *attnums = collect_mv_attnums(clauses, relid);
+ Bitmapset *attnums = collect_mv_attnums(clauses, relid, type);
c = bms_num_members(attnums);
@@ -944,9 +1070,183 @@ count_varnos(List *clauses, Index *relid)
return cnt;
}
+
+/*
+ * We're looking for statistics matching at least 2 attributes, referenced in
+ * clauses compatible with multivariate statistics. The current selection
+ * criterion is very simple - we choose the statistics referencing the most
+ * attributes.
+ *
+ * If there are multiple statistics referencing the same number of columns
+ * (from the clauses), the one with fewer source columns (as listed in the
+ * CREATE STATISTICS command) wins. Otherwise the first one wins.
+ *
+ * This is a very simple criterion, and it has several weaknesses:
+ *
+ * (a) does not consider the accuracy of the statistics
+ *
+ * If there are two histograms built on the same set of columns, but one
+ * has 100 buckets and the other one has 1000 buckets (thus likely
+ * providing better estimates), this is not currently considered.
+ *
+ * (b) does not consider the type of statistics
+ *
+ * If there are three statistics - one containing just a MCV list, another
+ * one with just a histogram and a third one with both, we treat them equally.
+ *
+ * (c) does not consider the number of clauses
+ *
+ * As explained, only the number of referenced attributes counts, so if
+ * there are multiple clauses on a single attribute, this still counts as
+ * a single attribute.
+ *
+ * (d) does not consider type of condition
+ *
+ * Some clauses may work better with some statistics - for example equality
+ * clauses probably work better with MCV lists than with histograms. But
+ * IS [NOT] NULL conditions may often work better with histograms (thanks
+ * to NULL-buckets).
+ *
+ * So for example with five WHERE conditions
+ *
+ * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
+ *
+ * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be selected
+ * as it references the most columns.
+ *
+ * Once we have selected the multivariate statistics, we split the list of
+ * clauses into two parts - conditions that are compatible with the selected
+ * stats, and conditions that will be estimated using simple statistics.
+ *
+ * From the example above, conditions
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
+ *
+ * will be estimated using the multivariate statistics (a,b,c,d) while the last
+ * condition (e = 1) will get estimated using the regular ones.
+ *
+ * There are various alternative selection criteria (e.g. counting conditions
+ * instead of just referenced attributes), but eventually the best option should
+ * be to combine multiple statistics. But that's much harder to do correctly.
+ *
+ * TODO Select multiple statistics and combine them when computing the estimate.
+ *
+ * TODO This will probably have to consider compatibility of clauses, because
+ * 'dependencies' will probably work only with equality clauses.
+ */
+static MVStatisticInfo *
+choose_mv_statistics(List *stats, Bitmapset *attnums)
+{
+ int i;
+ ListCell *lc;
+
+ MVStatisticInfo *choice = NULL;
+
+ int current_matches = 1; /* goal #1: maximize */
+ int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+
+ /*
+ * Walk through the statistics (simple array with nmvstats elements) and for
+ * each one count the referenced attributes (encoded in the 'attnums' bitmap).
+ */
+ foreach (lc, stats)
+ {
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ /* columns matching this statistics */
+ int matches = 0;
+
+ int2vector * attrs = info->stakeys;
+ int numattrs = attrs->dim1;
+
+ /* skip dependencies-only stats */
+ if (! info->mcv_built)
+ continue;
+
+ /* count columns covered by the histogram */
+ for (i = 0; i < numattrs; i++)
+ if (bms_is_member(attrs->values[i], attnums))
+ matches++;
+
+ /*
+ * Use this statistics when it improves the number of matches or
+ * when it matches the same number of attributes but is smaller.
+ */
+ if ((matches > current_matches) ||
+ ((matches == current_matches) && (current_dims > numattrs)))
+ {
+ choice = info;
+ current_matches = matches;
+ current_dims = numattrs;
+ }
+ }
+
+ return choice;
+}
+
+
+/*
+ * This splits the clauses list into two parts - one containing clauses that
+ * will be evaluated using the chosen statistics, and the remaining clauses
+ * (either non-mvcompatible, or not related to the histogram).
+ */
+static List *
+clauselist_mv_split(PlannerInfo *root, Index relid,
+ List *clauses, List **mvclauses,
+ MVStatisticInfo *mvstats, int types)
+{
+ int i;
+ ListCell *l;
+ List *non_mvclauses = NIL;
+
+ /* FIXME is there a better way to get info on int2vector? */
+ int2vector * attrs = mvstats->stakeys;
+ int numattrs = mvstats->stakeys->dim1;
+
+ Bitmapset *mvattnums = NULL;
+
+ /* build bitmap of attributes, so we can do bms_is_subset later */
+ for (i = 0; i < numattrs; i++)
+ mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+
+ /* erase the list of mv-compatible clauses */
+ *mvclauses = NIL;
+
+ foreach (l, clauses)
+ {
+ bool match = false; /* by default not mv-compatible */
+ Bitmapset *attnums = NULL;
+ Node *clause = (Node *) lfirst(l);
+
+ if (clause_is_mv_compatible(clause, relid, &attnums, types))
+ {
+ /* are all the attributes part of the selected stats? */
+ if (bms_is_subset(attnums, mvattnums))
+ match = true;
+ }
+
+ /*
+ * The clause matches the selected stats, so put it to the list of
+ * mv-compatible clauses. Otherwise, keep it in the list of 'regular'
+ * clauses (that may be selected later).
+ */
+ if (match)
+ *mvclauses = lappend(*mvclauses, clause);
+ else
+ non_mvclauses = lappend(non_mvclauses, clause);
+ }
+
+ /*
+ * Return the remaining clauses, to be estimated later using the regular
+ * per-column statistics (they are incompatible with the chosen stats).
+ */
+ return non_mvclauses;
+
+}
typedef struct
{
+ int types; /* types of statistics to consider */
Index varno; /* relid we're interested in */
Bitmapset *varattnos; /* attnums referenced by the clauses */
} mv_compatible_context;
@@ -964,23 +1264,66 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
{
if (node == NULL)
return false;
-
+
if (IsA(node, RestrictInfo))
{
RestrictInfo *rinfo = (RestrictInfo *) node;
-
+
/* Pseudoconstants are not really interesting here. */
if (rinfo->pseudoconstant)
return true;
-
+
/* clauses referencing multiple varnos are incompatible */
if (bms_membership(rinfo->clause_relids) != BMS_SINGLETON)
return true;
-
+
/* check the clause inside the RestrictInfo */
return mv_compatible_walker((Node*)rinfo->clause, (void *) context);
}
+ if (or_clause(node) || and_clause(node) || not_clause(node))
+ {
+ /*
+ * AND/OR/NOT-clauses are supported if all sub-clauses are supported
+ *
+ * TODO We might support mixed case, where some of the clauses are
+ * supported and some are not, and treat all supported subclauses
+ * as a single clause, compute it's selectivity using mv stats,
+ * and compute the total selectivity using the current algorithm.
+ *
+ * TODO For RestrictInfo above an OR-clause, we might use the orclause
+ * with nested RestrictInfo - we won't have to call pull_varnos()
+ * for each clause, saving time.
+ *
+ * TODO Perhaps this needs a bit more thought for functional
+ * dependencies? Those don't quite work for NOT cases.
+ */
+ BoolExpr *expr = (BoolExpr *) node;
+ ListCell *lc;
+
+ foreach (lc, expr->args)
+ {
+ if (mv_compatible_walker((Node *) lfirst(lc), context))
+ return true;
+ }
+
+ return false;
+ }
+
+ if (IsA(node, NullTest))
+ {
+ NullTest* nt = (NullTest*)node;
+
+ /*
+ * Only simple (Var IS NULL) expressions are supported for now. Maybe we could
+ * use examine_variable to fix this?
+ */
+ if (! IsA(nt->arg, Var))
+ return true;
+
+ return mv_compatible_walker((Node*)(nt->arg), context);
+ }
+
if (IsA(node, Var))
{
Var * var = (Var*)node;
@@ -1031,7 +1374,7 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
/* unsupported structure (two variables or so) */
if (! ok)
return true;
-
+
/*
* If it's not a "<" or ">" or "=" operator, just ignore the clause.
* Otherwise note the relid and attnum for the variable. This uses the
@@ -1041,10 +1384,18 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
switch (get_oprrest(expr->opno))
{
case F_EQSEL:
-
/* equality conditions are compatible with all statistics */
break;
+ case F_SCALARLTSEL:
+ case F_SCALARGTSEL:
+
+ /* not compatible with functional dependencies */
+ if (! (context->types & MV_CLAUSE_TYPE_MCV))
+ return true; /* terminate */
+
+ break;
+
default:
/* unknown estimator */
@@ -1055,11 +1406,11 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
return mv_compatible_walker((Node *) var, context);
}
-
+
/* Node not explicitly supported, so terminate */
return true;
}
-
+
/*
* Determines whether the clause is compatible with multivariate stats,
* and if it is, returns some additional information - varno (index
@@ -1078,10 +1429,11 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
* evaluate them using multivariate stats.
*/
static bool
-clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum)
+clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums, int types)
{
mv_compatible_context context;
+ context.types = types;
context.varno = relid;
context.varattnos = NULL; /* no attnums */
@@ -1089,7 +1441,7 @@ clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum)
return false;
/* remember the newly collected attnums */
- *attnum = bms_singleton_member(context.varattnos);
+ *attnums = bms_add_members(*attnums, context.varattnos);
return true;
}
@@ -1394,24 +1746,39 @@ fdeps_filter_clauses(PlannerInfo *root,
foreach (lc, clauses)
{
- AttrNumber attnum;
+ Bitmapset *attnums = NULL;
Node *clause = (Node *) lfirst(lc);
- if (! clause_is_mv_compatible(clause, relid, &attnum))
+ if (! clause_is_mv_compatible(clause, relid, &attnums,
+ MV_CLAUSE_TYPE_FDEP))
/* clause incompatible with functional dependencies */
*reduced_clauses = lappend(*reduced_clauses, clause);
- else if (! bms_is_member(attnum, deps_attnums))
+ else if (bms_num_members(attnums) > 1)
+
+ /*
+ * clause referencing multiple attributes (strange, should
+ * this be handled by clause_is_mv_compatible directly)
+ */
+ *reduced_clauses = lappend(*reduced_clauses, clause);
+
+ else if (! bms_is_member(bms_singleton_member(attnums), deps_attnums))
/* clause not covered by the dependencies */
*reduced_clauses = lappend(*reduced_clauses, clause);
else
{
+ /* ok, clause compatible with existing dependencies */
+ Assert(bms_num_members(attnums) == 1);
+
*deps_clauses = lappend(*deps_clauses, clause);
- clause_attnums = bms_add_member(clause_attnums, attnum);
+ clause_attnums = bms_add_member(clause_attnums,
+ bms_singleton_member(attnums));
}
+
+ bms_free(attnums);
}
return clause_attnums;
@@ -1637,6 +2004,9 @@ has_stats(List *stats, int type)
if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
return true;
+
+ if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
+ return true;
}
return false;
@@ -1652,3 +2022,392 @@ find_stats(PlannerInfo *root, Index relid)
return root->simple_rel_array[relid]->mvstatlist;
}
+
+/*
+ * Estimate selectivity of clauses using a MCV list.
+ *
+ * If there's no MCV list for the stats, the function returns 0.0.
+ *
+ * While computing the estimate, the function checks whether all the
+ * columns were matched with an equality condition. If that's the case,
+ * we can skip processing the histogram, as there can be no rows in
+ * it with the same values - all the rows matching the condition are
+ * represented by the MCV item. This can only happen with equality
+ * on all the attributes.
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all items as 'match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the items
+ * 4) skip items that are already 'no match'
+ * 5) check clause for items that still match
+ * 6) sum frequencies for items to get selectivity
+ *
+ * The function also returns the frequency of the least frequent item
+ * in the MCV list, which may be useful for clamping the estimate from
+ * the histogram (all items not present in the MCV list are less
+ * frequent). This however seems useful only for cases with conditions
+ * on all the attributes.
+ *
+ * TODO This only handles AND-ed clauses, but it might work for OR-ed
+ * lists too - it just needs to reverse the logic a bit. I.e. start
+ * with 'no match' for all items, and mark the items as a match
+ * as the clauses are processed (and skip items that are 'match').
+ */
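+/*
+ * For example (an illustration with made-up numbers, not actual
+ * output), take the clauses (a = 1) AND (b < 2) and a three-item
+ * MCV list
+ *
+ *     (a=1, b=1)   frequency 0.20
+ *     (a=1, b=3)   frequency 0.10
+ *     (a=2, b=1)   frequency 0.05
+ *
+ * The first clause eliminates the third item, the second clause the
+ * second one, so only the first item remains and the estimate is
+ * derived from its frequency (0.20), scaled by the total frequency
+ * covered by the MCV list.
+ */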
+static Selectivity
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats, bool *fullmatch,
+ Selectivity *lowsel)
+{
+ int i;
+ Selectivity s = 0.0;
+ Selectivity u = 0.0;
+
+ MCVList mcvlist = NULL;
+ int nmatches = 0;
+
+ /* match/mismatch bitmap for each MCV item */
+ char * matches = NULL;
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 2);
+
+ /* there's no MCV list built yet */
+ if (! mvstats->mcv_built)
+ return 0.0;
+
+ mcvlist = load_mv_mcvlist(mvstats->mvoid);
+
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* by default all the MCV items match the clauses fully */
+ matches = palloc0(sizeof(char) * mcvlist->nitems);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
+
+ /* number of matching MCV items */
+ nmatches = mcvlist->nitems;
+
+ nmatches = update_match_bitmap_mcvlist(root, clauses,
+ mvstats->stakeys, mcvlist,
+ nmatches, matches,
+ lowsel, fullmatch, false);
+
+ /* sum frequencies for all the matching MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* used to 'scale' for MCV lists not covering all tuples */
+ u += mcvlist->items[i]->frequency;
+
+ if (matches[i] != MVSTATS_MATCH_NONE)
+ s += mcvlist->items[i]->frequency;
+ }
+
+ pfree(matches);
+ pfree(mcvlist);
+
+ return s*u;
+}
+
+/*
+ * Evaluate clauses using the MCV list, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * TODO This works with 'bitmap' where each bit is represented as a char,
+ * which is slightly wasteful. Instead, we could use a regular
+ * bitmap, reducing the size to ~1/8. Another thing is merging the
+ * bitmaps using & and |, which might be faster than min/max.
+ */
+static int
+update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ Bitmapset *eqmatches = NULL; /* attributes with equality matches */
+
+ /* The bitmap may be partially built. */
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mcvlist->nitems);
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* no further changes possible: no matches remain (AND), or everything matches (OR) */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ return nmatches;
+
+ /*
+ * find the lowest frequency in the MCV list
+ *
+ * We need to do that here, because we do various tricks in the following
+ * code - skipping items already ruled out, etc.
+ *
+ * XXX A loop is necessary because the MCV list is not sorted by frequency.
+ */
+ *lowsel = 1.0;
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = mcvlist->items[i];
+
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+ }
+
+ /*
+ * Loop through the list of clauses, and for each of them evaluate
+ * all the MCV items not yet eliminated by the preceding clauses.
+ */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* if there are no remaining matches possible, we can stop */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /* it's an OpExpr, a NullTest, or a nested AND/OR clause */
+ if (is_opclause(clause))
+ {
+ OpExpr *expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+ FmgrInfo opproc;
+
+ /* get procedure computing operator selectivity */
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+
+ FmgrInfo gtproc;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_GT_OPR);
+
+ /* FIXME properly match the attribute to the dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->gt_opr), &gtproc);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ bool mismatch = false;
+ MCVItem item = mcvlist->items[i];
+
+ /*
+ * If there are no more matches (AND) or no remaining unmatched
+ * items (OR), we can stop processing this clause.
+ */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /*
+ * For AND-lists, we can also mark NULL items as 'no match' (and
+ * then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && item->isnull[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /* skip MCV items that were already ruled out */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ switch (oprrest)
+ {
+ case F_EQSEL:
+ /*
+ * We don't care about isgt in equality, because it does not
+ * matter whether it's (var = const) or (const = var).
+ */
+ mismatch = ! DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ if (! mismatch)
+ eqmatches = bms_add_member(eqmatches, idx);
+
+ break;
+
+ case F_SCALARLTSEL: /* column < constant */
+ case F_SCALARGTSEL: /* column > constant */
+
+ /*
+ * The operator is called with the constant on the left, i.e. as
+ * (const op value), so for (var op const) clauses the result
+ * directly indicates a mismatch, while for (const op var) it
+ * has to be inverted (that's what the isgt flag tracks).
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ /* invert the result if isgt=true */
+ mismatch = (isgt) ? (! mismatch) : mismatch;
+ break;
+ }
+
+ /* XXX The conditions on matches[i] are not needed, as we
+ * skip MCV items that can't become true/false, depending
+ * on the current flag. See beginning of the loop over
+ * MCV items.
+ */
+
+ if ((is_or) && (matches[i] == MVSTATS_MATCH_NONE) && (! mismatch))
+ {
+ /* OR - was MATCH_NONE, but will be MATCH_FULL */
+ matches[i] = MVSTATS_MATCH_FULL;
+ ++nmatches;
+ continue;
+ }
+ else if ((! is_or) && (matches[i] == MVSTATS_MATCH_FULL) && mismatch)
+ {
+ /* AND - was MATCH_FULL, but will be MATCH_NONE */
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ continue;
+ }
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME properly match the attribute to the dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = mcvlist->items[i];
+
+ /* if there are no more matches, we can stop processing this clause */
+ if (nmatches == 0)
+ break;
+
+ /* skip MCV items that were already ruled out */
+ if (matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /* if the clause mismatches the MCV item, set it as MATCH_NONE */
+ if (((expr->nulltesttype == IS_NULL) && (! item->isnull[idx])) ||
+ ((expr->nulltesttype == IS_NOT_NULL) && (item->isnull[idx])))
+ {
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ }
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each MCV item */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching MCV items */
+ or_nmatches = mcvlist->nitems;
+
+ /* by default none of the MCV items matches the clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the nested clauses */
+ or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ stakeys, mcvlist,
+ or_nmatches, or_matches,
+ lowsel, fullmatch, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ {
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+ }
+
+ /*
+ * If all the columns were matched by equality clauses, it's a full
+ * match. In that case at most a single MCV item can match the
+ * clauses (two matching items would have to have equal values in
+ * all the attributes, i.e. they'd be duplicates).
+ */
+ *fullmatch = (bms_num_members(eqmatches) == mcvlist->ndimensions);
+
+ /* free the allocated pieces */
+ if (eqmatches)
+ pfree(eqmatches);
+
+ return nmatches;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 7fb2088..8394111 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -412,7 +412,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built)
+ if (mvstat->deps_built || mvstat->mcv_built)
{
info = makeNode(MVStatisticInfo);
@@ -421,9 +421,11 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
+ info->mcv_enabled = mvstat->mcv_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
+ info->mcv_built = mvstat->mcv_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 099f1ed..f9bf10c 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o
+OBJS = common.o dependencies.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.mcv b/src/backend/utils/mvstats/README.mcv
new file mode 100644
index 0000000..e93cfe4
--- /dev/null
+++ b/src/backend/utils/mvstats/README.mcv
@@ -0,0 +1,137 @@
+MCV lists
+=========
+
+Multivariate MCV (most-common values) lists are a straightforward extension of
+the regular per-column MCV lists, tracking the most frequent combinations of
+values for a group of attributes.
+
+This works particularly well for columns with a small number of distinct values,
+as the list may include all the combinations and approximate the distribution
+very accurately.
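+
+For example (illustrative numbers), for two columns with 10 distinct values
+each there are at most 100 distinct combinations, so the whole distribution
+may be represented by a list of (at most) 100 items.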
+
+For columns with a large number of distinct values (e.g. those with continuous
+domains), the list will only track the most frequent combinations. If the
+distribution is mostly uniform (all combinations about equally frequent), the
+MCV list will be empty.
+
+Estimates of some clauses (e.g. equality) based on MCV lists are more accurate
+than when using histograms.
+
+Also, MCV lists don't necessarily require sorting of the values (the fact that
+we use sorting when building them is an implementation detail), and more
+importantly the ordering is not built into the approximation (while histograms
+are built on ordering). So MCV lists work well even for attributes where the
+ordering of the data type is disconnected from the meaning of the data. For
+example we know how to sort strings, but the ordering is unlikely to mean much
+for city names (or other label-like attributes).
+
+
+Selectivity estimation
+----------------------
+
+The estimation, implemented in clauselist_mv_selectivity_mcvlist(), is quite
+simple in principle - identify the MCV items matching all the clauses, and sum
+the frequencies of those items (see the sketch at the end of this section).
+
+Currently MCV lists support estimation of the following clause types:
+
+ (a) equality clauses WHERE (a = 1) AND (b = 2)
+ (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ (d) OR clauses WHERE (a < 1) OR (b >= 2)
+
+It's possible to add support for additional clauses, for example:
+
+ (e) multi-var clauses WHERE (a > b)
+
+and possibly others. These are tasks for the future, not yet implemented.
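+
+The following sketch (simplified - the actual implementation in
+update_match_bitmap_mcvlist() also handles OR clauses, NULL tests and early
+termination) illustrates the basic loop; clause_matches_item() is a
+hypothetical helper, standing in for the per-clause checks:
+
+    /* all items match initially */
+    for (i = 0; i < mcvlist->nitems; i++)
+        matches[i] = MVSTATS_MATCH_FULL;
+
+    /* each AND-ed clause may only eliminate items */
+    foreach (lc, clauses)
+        for (i = 0; i < mcvlist->nitems; i++)
+            if ((matches[i] != MVSTATS_MATCH_NONE) &&
+                (! clause_matches_item((Node *) lfirst(lc), mcvlist->items[i])))
+                matches[i] = MVSTATS_MATCH_NONE;
+
+    /* the estimate is the sum of frequencies of the remaining items */
+    for (i = 0; i < mcvlist->nitems; i++)
+        if (matches[i] != MVSTATS_MATCH_NONE)
+            s += mcvlist->items[i]->frequency;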
+
+
+Estimating equality clauses
+---------------------------
+
+When computing selectivity estimate for equality clauses
+
+ (a = 1) AND (b = 2)
+
+we can do this estimate pretty exactly assuming that two conditions are met:
+
+ (1) there's an equality condition on all attributes of the statistic
+
+ (2) we find a matching item in the MCV list
+
+In this case we know the MCV item represents all tuples matching the clauses,
+and the selectivity estimate is complete (i.e. we don't need to perform
+estimation using the histogram). This is what we call 'full match'.
+
+When only (1) holds, but there's no matching MCV item, we don't know whether
+there are no such rows at all, or whether they are just not frequent enough to
+make it into the list. We can however use the frequency of the least frequent
+MCV item as an upper bound for the selectivity.
+
+For a combination of equality conditions (not full-match case) we can clamp the
+selectivity by the minimum of selectivities for each condition. For example if
+we know the number of distinct values for each column, we can use 1/ndistinct
+as a per-column estimate. Or rather 1/ndistinct + selectivity derived from the
+MCV list.
+
+We should also probably use only the 'residual ndistinct' by excluding the items
+included in the MCV list (and also the residual frequency):
+
+ f = (1.0 - sum(MCV frequencies)) / (ndistinct - ndistinct(MCV list))
+
+but it's worth pointing out that the ndistinct values here are multivariate,
+i.e. computed for the combination of columns referenced by the equality clauses.
+
+Note: Only the "full match" limit is currently implemented.
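+
+As a worked example (hypothetical numbers): if the MCV list covers 80% of the
+sampled rows (sum of frequencies 0.8) and contains 50 of an estimated 150
+distinct combinations, the residual frequency works out as
+
+    f = (1.0 - 0.8) / (150 - 50) = 0.002
+
+i.e. each combination not on the list is assumed to match about 0.2% of rows.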
+
+
+Hashed MCV (not yet implemented)
+--------------------------------
+
+Regular MCV lists have to include actual values for each item, so if those items
+are large the list may be quite large. This is especially true for multivariate
+MCV lists, although the current implementation partially mitigates this by
+deduplicating the values before storing them on disk.
+
+It's possible to only store hashes (32-bit values) instead of the actual values,
+significantly reducing the space requirements. Obviously, this would only make
+the MCV lists useful for estimating equality conditions (assuming the 32-bit
+hashes make the collisions rare enough).
+
+This might also complicate matching the columns to available stats.
+
+
+TODO Consider implementing hashed MCV list, storing just 32-bit hashes instead
+ of the actual values. This type of MCV list will be useful only for
+ estimating equality clauses, and will reduce space requirements for large
+ varlena types (in such cases we usually only want equality anyway).
+
+TODO Currently there's no logic to consider building only a MCV list (and not
+     building the histogram at all), except for making this decision manually
+     in ADD STATISTICS.
+
+
+Inspecting the MCV list
+-----------------------
+
+Inspecting the regular (per-attribute) MCV lists is trivial, as it's enough
+to select the columns from pg_stats - the data is encoded as anyarrays, so we
+simply get the text representation of the arrays.
+
+With multivariate MCV lists it's not that simple due to the possible mix of
+data types. It might be possible to produce a similar array-like representation,
+but that'd unnecessarily complicate further processing and analysis of the MCV
+list. Instead, there's a SRF function returning the values, frequencies etc.
+
+ SELECT * FROM pg_mv_mcv_items(oid);
+
+It has a single input parameter:
+
+ oid - OID of the MCV list (pg_mv_statistic.staoid)
+
+and produces a table with these columns:
+
+ - item ID (0...nitems-1)
+ - values (string array)
+ - nulls only (boolean array)
+ - frequency (double precision)
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index a38ea7b..5c5c59a 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -8,9 +8,50 @@ not true, resulting in estimation errors.
Multivariate stats track different types of dependencies between the columns,
hopefully improving the estimates.
-Currently we only have one kind of multivariate statistics - soft functional
-dependencies, and we use it to improve estimates of equality clauses. See
-README.dependencies for details.
+
+Types of statistics
+-------------------
+
+Currently we have two kinds of multivariate statistics:
+
+ (a) soft functional dependencies (README.dependencies)
+
+ (b) MCV lists (README.mcv)
+
+
+Compatible clause types
+-----------------------
+
+Each type of statistics may be used to estimate some subset of clause types.
+
+ (a) functional dependencies - equality clauses (AND), possibly IS NULL
+
+ (b) MCV list - equality and inequality clauses, IS [NOT] NULL, AND/OR
+
+Currently only simple operator clauses (Var op Const) are supported, but it's
+possible to support more complex clause types, e.g. (Var op Var).
+
+
+Complex clauses
+---------------
+
+We also support estimating more complex clauses - essentially AND/OR clauses
+with (Var op Const) as leaves, as long as all the referenced attributes are
+covered by a single statistics.
+
+For example this condition
+
+ (a=1) AND ((b=2) OR ((c=3) AND (d=4)))
+
+may be estimated using statistics on (a,b,c,d). If we only have statistics on
+(b,c,d) we may estimate the second part, and estimate (a=1) using simple stats.
+
+If we only have statistics on (a,b,c), we can't apply it at all at this point,
+but it's worth pointing out that clauselist_selectivity() works recursively, and
+when handling the second part (the OR-clause) we'll be able to apply the
+statistics.
+
+Note: The multi-statistics estimation patch also makes it possible to pass some
+clauses as 'conditions' into the deeper parts of the expression tree.
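+
+For example, with statistics on (b,c,d) the condition above may be decomposed
+(assuming independence of the two parts) roughly as
+
+    P(a=1) * P((b=2) OR ((c=3) AND (d=4)))
+
+with the first factor estimated from the per-column statistics on "a" and the
+second one from the multivariate statistics.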
Selectivity estimation
@@ -23,14 +64,48 @@ When estimating selectivity, we aim to achieve several things:
(b) minimize the overhead, especially when no suitable multivariate stats
exist (so if you are not using multivariate stats, there's no overhead)
-This clauselist_selectivity() performs several inexpensive checks first, before
+Thus clauselist_selectivity() performs several inexpensive checks first, before
even attempting to do the more expensive estimation.
(1) check if there are multivariate stats on the relation
- (2) check there are at least two attributes referenced by clauses compatible
- with multivariate statistics (equality clauses for func. dependencies)
+ (2) check that there are functional dependencies on the table, and that
+ there are at least two attributes referenced by compatible clauses
+ (equality clauses for func. dependencies)
(3) perform reduction of equality clauses using func. dependencies
- (4) estimate the reduced list of clauses using regular statistics
+ (4) check that there are multivariate MCV lists on the table, and that
+ there are at least two attributes referenced by compatible clauses
+ (equalities, inequalities, etc.)
+
+ (5) find the best multivariate statistics (matching the most conditions)
+ and use it to compute the estimate
+
+ (6) estimate the remaining clauses (not estimated using multivariate stats)
+ using the regular per-column statistics
+
+Whenever we find there are no suitable stats, we skip the expensive steps.
+
+
+Further (possibly crazy) ideas
+------------------------------
+
+Currently the clauses are only estimated using a single statistics, even if
+there are multiple candidate statistics - for example assume we have statistics
+on (a,b,c) and (b,c,d), and estimate conditions
+
+ (b = 1) AND (c = 2)
+
+Then both statistics may be used, but we only use one of them. Maybe we could
+compute estimates using all the candidate stats, and somehow aggregate them
+into the final estimate, e.g. by using the average or median.
+
+Some stats may give better estimates than others, but it's very difficult to say
+in advance which stats are the best (it depends on the number of buckets, number
+of additional columns not referenced in the clauses, type of condition etc.).
+
+But of course, this may result in expensive estimation (CPU-wise).
+
+So we might add a GUC to choose between the simple (single-statistics) and the
+multi-statistics estimation, possibly as a table-level parameter (ALTER TABLE ...).
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index bd200bc..d1da714 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -16,12 +16,14 @@
#include "common.h"
+#include "utils/array.h"
+
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
- int natts, VacAttrStats **vacattrstats);
+ int natts,
+ VacAttrStats **vacattrstats);
static List* list_mv_stats(Oid relid);
-
/*
* Compute requested multivariate stats, using the rows sampled for the
* plain (single-column) stats.
@@ -49,6 +51,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int j;
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
+ MCVList mcvlist = NULL;
+ int numrows_filtered = 0;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -87,8 +91,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ /* build the MCV list */
+ if (stat->mcv_enabled)
+ mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, attrs);
+ update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
}
}
@@ -166,6 +174,8 @@ list_mv_stats(Oid relid)
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
+ info->mcv_enabled = stats->mcv_enabled;
+ info->mcv_built = stats->mcv_built;
result = lappend(result, info);
}
@@ -180,8 +190,56 @@ list_mv_stats(Oid relid)
return result;
}
+
+/*
+ * Find the attnums of MV stats using the mvoid.
+ */
+int2vector*
+find_mv_attnums(Oid mvoid, Oid *relid)
+{
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ HeapTuple htup;
+ int2vector *keys;
+
+ /* Fetch the pg_mv_statistic tuple for the given OID. */
+ htup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+ /* starelid */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_starelid, &isnull);
+ Assert(!isnull);
+
+ *relid = DatumGetObjectId(adatum);
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ keys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+ ReleaseSysCache(htup);
+
+ /* TODO Maybe save the list in the relcache, as in RelationGetIndexList
+ * (which served as an inspiration for this function)? */
+
+ return keys;
+}
+
+
void
-update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+update_mv_stats(Oid mvoid,
+ MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -206,18 +264,29 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
= PointerGetDatum(serialize_mv_dependencies(dependencies));
}
+ if (mcvlist != NULL)
+ {
+ bytea * data = serialize_mv_mcvlist(mcvlist, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stamcv -1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+ replaces[Anum_pg_mv_statistic_stamcv -1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
@@ -246,6 +315,21 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
heap_close(sd, RowExclusiveLock);
}
+
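+/*
+ * Translate the attribute number to an index (dimension) within the
+ * statistics, relying on the stakeys vector being sorted - e.g. for
+ * stakeys = (2, 5, 7) and varattno = 5 the result is 1 (the values
+ * here are illustrative only).
+ */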
+int
+mv_get_index(AttrNumber varattno, int2vector * stakeys)
+{
+ int i, idx = 0;
+ for (i = 0; i < stakeys->dim1; i++)
+ {
+ if (stakeys->values[i] < varattno)
+ idx += 1;
+ else
+ break;
+ }
+ return idx;
+}
+
/* multi-variate stats comparator */
/*
@@ -256,11 +340,15 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
int
compare_scalars_simple(const void *a, const void *b, void *arg)
{
- Datum da = *(Datum*)a;
- Datum db = *(Datum*)b;
- SortSupport ssup= (SortSupport) arg;
+ return compare_datums_simple(*(Datum*)a,
+ *(Datum*)b,
+ (SortSupport)arg);
+}
- return ApplySortComparator(da, false, db, false, ssup);
+int
+compare_datums_simple(Datum a, Datum b, SortSupport ssup)
+{
+ return ApplySortComparator(a, false, b, false, ssup);
}
/*
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
index d96422d..9f1bd59 100644
--- a/src/backend/utils/mvstats/common.h
+++ b/src/backend/utils/mvstats/common.h
@@ -46,7 +46,15 @@ typedef struct
Datum value; /* a data value */
int tupno; /* position index for tuple it came from */
} ScalarItem;
-
+
+/* (de)serialization info */
+typedef struct DimensionInfo {
+ int nvalues; /* number of deduplicated values */
+ int nbytes; /* number of bytes (serialized) */
+ int typlen; /* pg_type.typlen */
+ bool typbyval; /* pg_type.typbyval */
+} DimensionInfo;
+
/* multi-sort */
typedef struct MultiSortSupportData {
int ndims; /* number of dimensions supported by the */
@@ -71,5 +79,6 @@ int multi_sort_compare_dim(int dim, const SortItem *a,
const SortItem *b, MultiSortSupport mss);
/* comparators, used when constructing multivariate stats */
+int compare_datums_simple(Datum a, Datum b, SortSupport ssup);
int compare_scalars_simple(const void *a, const void *b, void *arg);
int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/mcv.c b/src/backend/utils/mvstats/mcv.c
new file mode 100644
index 0000000..551c934
--- /dev/null
+++ b/src/backend/utils/mvstats/mcv.c
@@ -0,0 +1,1094 @@
+/*-------------------------------------------------------------------------
+ *
+ * mcv.c
+ * POSTGRES multivariate MCV lists
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mcv.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+
+/*
+ * Each serialized item needs to store (in this order):
+ *
+ * - indexes (ndim * sizeof(uint16))
+ * - null flags (ndim * sizeof(bool))
+ * - frequency (sizeof(double))
+ *
+ * So in total:
+ *
+ * ndim * (sizeof(uint16) + sizeof(bool)) + sizeof(double)
+ */
+#define ITEM_SIZE(ndims) \
+ (ndims * (sizeof(uint16) + sizeof(bool)) + sizeof(double))
+
+/* pointers into a flat serialized item of ITEM_SIZE(n) bytes */
+#define ITEM_INDEXES(item) ((uint16*)item)
+#define ITEM_NULLS(item,ndims) ((bool*)(ITEM_INDEXES(item) + ndims))
+#define ITEM_FREQUENCY(item,ndims) ((double*)(ITEM_NULLS(item,ndims) + ndims))
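+
+/* e.g. for 2 dimensions: ITEM_SIZE(2) = 2 * (2 + 1) + 8 = 14 bytes per item */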
+
+/*
+ * Builds MCV list from sample rows, and removes rows represented by
+ * the MCV list from the sample (the number of remaining sample rows is
+ * returned by the numrows_filtered parameter).
+ *
+ * The method is quite simple - in short it does about these steps:
+ *
+ * (1) sort the data (default collation, '<' for the data type)
+ *
+ * (2) count distinct groups, decide how many to keep
+ *
+ * (3) build the MCV list using the threshold determined in (2)
+ *
+ * (4) remove rows represented by the MCV from the sample
+ *
+ * For more details, see the comments in the code.
+ *
+ * FIXME Use max_mcv_items from ALTER TABLE ADD STATISTICS command.
+ *
+ * FIXME Single-dimensional MCV is sorted by frequency (descending). We
+ * should do that too, because when walking through the list we
+ * want to check the most frequent items first.
+ *
+ * TODO We're using Datum (8B), even for smaller data types (e.g. int4
+ * or float4). Maybe we could save some space here, but the bytea
+ * compression should handle it just fine.
+ *
+ * TODO This probably should not use the ndistinct as computed from
+ * the sample directly, but rather an estimate of the number of
+ * distinct values in the whole table, no?
+ */
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ int ndistinct = 0;
+ int mcv_threshold = 0;
+ int count = 0;
+ int nitems = 0;
+
+ MCVList mcvlist = NULL;
+
+ /* Sort by multiple columns (using array of SortSupport) */
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * Preallocate space for all the items as a single chunk, and point
+ * the items to the appropriate parts of the array.
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * numattrs);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * numattrs);
+
+ /* keep all the rows by default (as if there was no MCV list) */
+ *numrows_filtered = numrows;
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* load the values/null flags from sample rows */
+ for (j = 0; j < numrows; j++)
+ for (i = 0; i < numattrs; i++)
+ items[j].values[i] = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+
+ /* prepare the sort functions for all the attributes */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* do the sort, using the multi-sort */
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Count the number of distinct groups - just walk through the
+ * sorted list and count the number of key changes. We use this to
+ * determine the threshold (125% of the average frequency).
+ */
+ ndistinct = 1;
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ ndistinct += 1;
+
+ /*
+ * Determine how many groups actually exceed the threshold, and then
+ * walk the array again and collect them into an array. We'll always
+ * require at least 4 rows per group.
+ *
+ * But if we can fit all the distinct values in the MCV list (i.e.
+ * if there are fewer distinct groups than MVSTAT_MCVLIST_MAX_ITEMS),
+ * we'll require only 2 rows per group.
+ *
+ * TODO For now the threshold is the same as in the single-column
+ * case (average + 25%), but maybe that's worth revisiting
+ * for the multivariate case.
+ *
+ * TODO We can do this only if we believe we got all the distinct
+ * values of the table.
+ *
+ * FIXME This should really reference mcv_max_items (from catalog)
+ * instead of the constant MVSTAT_MCVLIST_MAX_ITEMS.
+ */
+ mcv_threshold = 1.25 * numrows / ndistinct;
+ mcv_threshold = (mcv_threshold < 4) ? 4 : mcv_threshold;
+
+ if (ndistinct <= MVSTAT_MCVLIST_MAX_ITEMS)
+ mcv_threshold = 2;
+
+ /*
+ * Walk through the sorted data again, and see how many groups
+ * reach the mcv_threshold (and become an item in the MCV list).
+ */
+ count = 1;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or new group, so check if we exceed mcv_threshold */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* group hits the threshold, count the group as MCV item */
+ if (count >= mcv_threshold)
+ nitems += 1;
+
+ count = 1;
+ }
+ else /* within group, so increase the number of items */
+ count += 1;
+ }
+
+ /* we know the number of MCV list items, so let's build the list */
+ if (nitems > 0)
+ {
+ /* allocate the MCV list structure, set parameters we know */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ mcvlist->magic = MVSTAT_MCV_MAGIC;
+ mcvlist->type = MVSTAT_MCV_TYPE_BASIC;
+ mcvlist->ndimensions = numattrs;
+ mcvlist->nitems = nitems;
+
+ /*
+ * Preallocate the Datum/isnull arrays (not as a single chunk, as
+ * we'll pass this outside this method, so it needs to be easy to
+ * pfree() the pieces - with a single chunk we wouldn't know where
+ * the arrays start).
+ *
+ * TODO Maybe the reasoning that we can't allocate a single
+ * piece because we're passing it out is bogus? Who'd
+ * free a single item of the MCV list, anyway?
+ *
+ * TODO Maybe with a proper encoding (stuffing all the values
+ * into a list-level array), this will be untrue?
+ */
+ mcvlist->items = (MCVItem*)palloc0(sizeof(MCVItem)*nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ mcvlist->items[i] = (MCVItem)palloc0(sizeof(MCVItemData));
+ mcvlist->items[i]->values = (Datum*)palloc0(sizeof(Datum)*numattrs);
+ mcvlist->items[i]->isnull = (bool*)palloc0(sizeof(bool)*numattrs);
+ }
+
+ /*
+ * Repeat the same loop as above, but this time copy the data
+ * into the MCV list (for items exceeding the threshold).
+ *
+ * TODO Maybe we could simply remember indexes of the last item
+ * in each group (from the previous loop)?
+ */
+ count = 1;
+ nitems = 0;
+ for (i = 1; i <= numrows; i++)
+ {
+ /* last row or a new group */
+ if ((i == numrows) || (multi_sort_compare(&items[i], &items[i-1], mss) != 0))
+ {
+ /* count the MCV item if exceeding the threshold (and copy into the array) */
+ if (count >= mcv_threshold)
+ {
+ /* just pointer to the proper place in the list */
+ MCVItem item = mcvlist->items[nitems];
+
+ /* copy values from the _previous_ group (its last item) */
+ memcpy(item->values, items[(i-1)].values, sizeof(Datum) * numattrs);
+ memcpy(item->isnull, items[(i-1)].isnull, sizeof(bool) * numattrs);
+
+
+ /* and finally the group frequency */
+ item->frequency = (double)count / numrows;
+
+ /* next item */
+ nitems += 1;
+ }
+
+ count = 1;
+ }
+ else /* same group, just increase the number of items */
+ count += 1;
+ }
+
+ /* make sure the loops are consistent */
+ Assert(nitems == mcvlist->nitems);
+
+ /*
+ * Remove the rows matching the MCV list (i.e. keep only rows
+ * that are not represented by the MCV list).
+ *
+ * FIXME This implementation is rather naive, effectively O(N^2).
+ * As the MCV list grows, the check will take longer and
+ * longer. And as the number of sampled rows increases (by
+ * increasing statistics target), it will take longer and
+ * longer. One option is to sort the MCV items first and
+ * then perform a binary search.
+ *
+ * A better option would be keeping the ID of the row in
+ * the sort item, and then just walk through the items and
+ * mark rows to remove (in a bitmap of the same size).
+ * There's not space for that in SortItem at this moment,
+ * but it's trivial to add 'private' pointer, or just
+ * using another structure with extra field (starting with
+ * SortItem, so that the comparators etc. still work).
+ *
+ * Another option is to use the sorted array of items
+ * (because that's how we sorted the source data), and
+ * simply do a bsearch() into it. If we find a matching
+ * item, the row belongs to the MCV list.
+ */
+ if (nitems == ndistinct) /* all rows are covered by MCV items */
+ *numrows_filtered = 0;
+ else /* (nitems < ndistinct) && (nitems > 0) */
+ {
+ int nfiltered = 0;
+ HeapTuple *rows_filtered = (HeapTuple*)palloc0(sizeof(HeapTuple) * numrows);
+
+ /* used for the searches */
+ SortItem item, mcvitem;
+
+ item.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ item.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /*
+ * FIXME we don't need to allocate this, we can reference
+ * the MCV item directly ...
+ */
+ mcvitem.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ mcvitem.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* walk through the tuples, compare the values to MCV items */
+ for (i = 0; i < numrows; i++)
+ {
+ bool match = false;
+
+ /* collect the key values from the row */
+ for (j = 0; j < numattrs; j++)
+ item.values[j] = heap_getattr(rows[i], attrs->values[j],
+ stats[j]->tupDesc, &item.isnull[j]);
+
+ /* scan through the MCV list for matches */
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ /*
+ * TODO Create a SortItem/MCVItem comparator so that
+ * we don't need to do memcpy() like crazy.
+ */
+ memcpy(mcvitem.values, mcvlist->items[j]->values,
+ numattrs * sizeof(Datum));
+ memcpy(mcvitem.isnull, mcvlist->items[j]->isnull,
+ numattrs * sizeof(bool));
+
+ if (multi_sort_compare(&item, &mcvitem, mss) == 0)
+ {
+ match = true;
+ break;
+ }
+ }
+
+ /* if no match in the MCV list, copy the row into the filtered ones */
+ if (! match)
+ memcpy(&rows_filtered[nfiltered++], &rows[i], sizeof(HeapTuple));
+ }
+
+ /* replace the rows and remember how many rows we kept */
+ memcpy(rows, rows_filtered, sizeof(HeapTuple) * nfiltered);
+ *numrows_filtered = nfiltered;
+
+ /* free all the data used here */
+ pfree(rows_filtered);
+ pfree(item.values);
+ pfree(item.isnull);
+ pfree(mcvitem.values);
+ pfree(mcvitem.isnull);
+ }
+ }
+
+ pfree(values);
+ pfree(items);
+ pfree(isnull);
+
+ return mcvlist;
+}
+
+
+/* fetch the MCV list (as a bytea) from the pg_mv_statistic catalog */
+MCVList
+load_mv_mcvlist(Oid mvoid)
+{
+ bool isnull = false;
+ Datum mcvlist;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* Fetch the pg_mv_statistic tuple for the given OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->mcv_enabled && mvstat->mcv_built);
+#endif
+
+ mcvlist = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stamcv, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_mcvlist(DatumGetByteaP(mcvlist));
+}
+
+/*
+ * Print some basic info about the MCV list.
+ *
+ * TODO Add info about what part of the table this covers.
+ */
+Datum
+pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MCVList mcvlist = deserialize_mv_mcvlist(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nitems=%d", mcvlist->nitems);
+
+ pfree(mcvlist);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* used to pass context into bsearch() */
+static SortSupport ssup_private = NULL;
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Serialize MCV list into a bytea value. The basic algorithm is simple:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ * (a) collect all (non-NULL) attribute values from all MCV items
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all MCV list items
+ * (a) replace values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, because we're mixing
+ * different datatypes, and we don't know what equality means for them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ * We'll use uint16 values for the indexes in step (3), as we don't
+ * allow more than 8k MCV items (see max_mcv_items). We might
+ * increase this to 65k and still fit into uint16.
+ *
+ * We don't really expect the high compression as with histograms,
+ * because we're not doing any bucket splits etc. (which is the source
+ * of high redundancy there), but we need to do it anyway as we need
+ * to serialize varlena values etc. We might invent another way to
+ * serialize MCV lists, but let's keep it consistent.
+ *
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
+bytea *
+serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i, j;
+ int ndims = mcvlist->ndimensions;
+ int itemsize = ITEM_SIZE(ndims);
+
+ Size total_length = 0;
+
+ char *item = palloc0(itemsize);
+
+ /* serialized items (indexes into arrays, etc.) */
+ bytea *output;
+ char *data = NULL;
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /* allocate space for all values, including NULLs (won't use them) */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * mcvlist->nitems);
+
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ if (! mcvlist->items[j]->isnull[i]) /* skip NULL values */
+ {
+ values[i][counts[i]] = mcvlist->items[j]->values[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* do not exceed UINT16_MAX */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typbyval || (info[i].typlen > 0))
+ /* passed by value, or by reference with a fixed length */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so strlen plus 1B for the \0 terminator
+ * (that's how many bytes we copy when serializing) */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j])) + 1;
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * MCV list, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nitems (4B)
+ * - info (ndim * sizeof(DimensionInfo))
+ * - arrays of values for each dimension
+ * - serialized items (nitems * itemsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data.
+ */
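+
+ /*
+ * For example (illustrative numbers only): with 2 dimensions and
+ * 100 items this is 20B of header, 2 * sizeof(DimensionInfo) for
+ * the dimension infos, 100 * ITEM_SIZE(2) = 1400B for the items,
+ * plus the sizes of the two deduplicated value arrays.
+ */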
+ total_length = (sizeof(int32) + offsetof(MCVListData, items)
+ + ndims * sizeof(DimensionInfo)
+ + mcvlist->nitems * itemsize);
+
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 1MB */
+ if (total_length > 1024 * 1024)
+ elog(ERROR, "serialized MCV exceeds 1MB (%ld)", total_length);
+
+ /* allocate space for the serialized MCV list, set header fields */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write into */
+ data = VARDATA(output);
+
+ memcpy(data, mcvlist, offsetof(MCVListData, items));
+ data += offsetof(MCVListData, items);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - itemsize);
+
+ /* reset the values for each item */
+ memset(item, 0, itemsize);
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! mcvlist->items[i]->isnull[j])
+ {
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ v = (Datum*)bsearch(&mcvlist->items[i]->values[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ ITEM_INDEXES(item)[j] = (v - values[j]);
+
+ /* check the index is within expected bounds */
+ Assert(ITEM_INDEXES(item)[j] >= 0);
+ Assert(ITEM_INDEXES(item)[j] < info[j].nvalues);
+ }
+ }
+
+ /* copy NULL and frequency flags into the item */
+ memcpy(ITEM_NULLS(item, ndims),
+ mcvlist->items[i]->isnull, sizeof(bool) * ndims);
+ memcpy(ITEM_FREQUENCY(item, ndims),
+ &mcvlist->items[i]->frequency, sizeof(double));
+
+ /* copy the item into the array */
+ memcpy(data, item, itemsize);
+
+ data += itemsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ return output;
+}
+
+/*
+ * Inverse to serialize_mv_mcvlist() - see the comment there.
+ *
+ * We'll do full deserialization, because we don't really expect high
+ * duplication of values so the caching may not be as efficient as with
+ * histograms.
+ */
+MCVList
+deserialize_mv_mcvlist(bytea *data)
+{
+ int i, j;
+ Size expected_size;
+ MCVList mcvlist;
+ char *tmp;
+
+ int ndims, nitems, itemsize;
+ DimensionInfo *info = NULL;
+
+ uint16 *indexes = NULL;
+ Datum **values = NULL;
+
+ /* local allocation buffer (used only for deserialization) */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ /* buffer used for the result */
+ int rbufflen;
+ char *rbuff;
+ char *rptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MCVListData,items))
+ elog(ERROR, "invalid MCV Size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MCVListData,items));
+
+ /* read the MCV list header */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(mcvlist, tmp, offsetof(MCVListData,items));
+ tmp += offsetof(MCVListData,items);
+
+ if (mcvlist->magic != MVSTAT_MCV_MAGIC)
+ elog(ERROR, "invalid MCV magic %d (expected %dd)",
+ mcvlist->magic, MVSTAT_MCV_MAGIC);
+
+ if (mcvlist->type != MVSTAT_MCV_TYPE_BASIC)
+ elog(ERROR, "invalid MCV type %d (expected %dd)",
+ mcvlist->type, MVSTAT_MCV_TYPE_BASIC);
+
+ nitems = mcvlist->nitems;
+ ndims = mcvlist->ndimensions;
+ itemsize = ITEM_SIZE(ndims);
+
+ Assert(nitems > 0);
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Compute the size we expect with these parameters. It's incomplete
+ * at this point, as we have yet to add the sizes of the value arrays
+ * (from the DimensionInfo records).
+ */
+ expected_size = offsetof(MCVListData,items) +
+ ndims * sizeof(DimensionInfo) +
+ (nitems * itemsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /*
+ * We'll allocate one large chunk of memory for the intermediate
+ * data, needed only for deserializing the MCV list, and we'll use
+ * a local dense allocation to minimize the palloc overhead.
+ *
+ * Let's see how much space we'll actually need, and also include
+ * space for the array with pointers.
+ */
+ bufflen = sizeof(Datum*) * ndims; /* space for pointers */
+
+ for (i = 0; i < ndims; i++)
+ /* for full-size byval types, we reuse the serialized value */
+ if (! (info[i].typbyval && info[i].typlen == sizeof(Datum)))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+
+ buff = palloc0(bufflen);
+ ptr = buff;
+
+ values = (Datum**)buff;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mcv_list(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. This needs to
+ * copy the pieces.
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum - simply reuse the array */
+ if (info[i].typlen == sizeof(Datum))
+ {
+ values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value (shorter than Datum) from the array */
+ memcpy(&values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ }
+ else
+ {
+ /* all the varlena data need a chunk from the buffer */
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ /* passed by reference, but fixed length (name, tid, ...) */
+ if (info[i].typlen > 0)
+ {
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ /* we should exhaust the buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ /* allocate space for the MCV items in a single piece */
+ rbufflen = (sizeof(MCVItem) + sizeof(MCVItemData) +
+ sizeof(Datum)*ndims + sizeof(bool)*ndims) * nitems;
+
+ rbuff = palloc(rbufflen);
+ rptr = rbuff;
+
+ mcvlist->items = (MCVItem*)rbuff;
+ rptr += (sizeof(MCVItem) * nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ MCVItem item = (MCVItem)rptr;
+ rptr += (sizeof(MCVItemData));
+
+ item->values = (Datum*)rptr;
+ rptr += (sizeof(Datum)*ndims);
+
+ item->isnull = (bool*)rptr;
+ rptr += (sizeof(bool) *ndims);
+
+ /* just point to the right place */
+ indexes = ITEM_INDEXES(tmp);
+
+ memcpy(item->isnull, ITEM_NULLS(tmp, ndims), sizeof(bool) * ndims);
+ memcpy(&item->frequency, ITEM_FREQUENCY(tmp, ndims), sizeof(double));
+
+#ifdef USE_ASSERT_CHECKING
+ for (j = 0; j < ndims; j++)
+ Assert(indexes[j] <= UINT16_MAX);
+#endif
+
+ /* translate the values */
+ for (j = 0; j < ndims; j++)
+ if (! item->isnull[j])
+ item->values[j] = values[j][indexes[j]];
+
+ mcvlist->items[i] = item;
+
+ tmp += ITEM_SIZE(ndims);
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+ }
+
+ /* check that we processed all the data */
+ Assert(tmp == (char*)data + VARSIZE_ANY(data));
+
+ /* release the temporary buffer */
+ pfree(buff);
+
+ return mcvlist;
+}
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
+
+/*
+ * SRF with details about items of an MCV list:
+ *
+ * - item ID (0...nitems)
+ * - values (string array)
+ * - null flags (boolean array)
+ * - frequency (double precision)
+ *
+ * The input is the OID of the statistics, and there are no rows
+ * returned if the statistics contains no MCV list.
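+ *
+ * For example (an illustrative query; the OID comes from pg_mv_statistic,
+ * and 's4' is one of the statistics names used in the regression tests):
+ *
+ *     SELECT * FROM pg_mv_mcv_items(
+ *         (SELECT oid FROM pg_mv_statistic WHERE staname = 's4'));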
+ */
+PG_FUNCTION_INFO_V1(pg_mv_mcv_items);
+
+Datum
+pg_mv_mcv_items(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MCVList mcvlist;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ mcvlist = load_mv_mcvlist(PG_GETARG_OID(0));
+
+ funcctx->user_fctx = mcvlist;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = mcvlist->nitems;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples
+ * from raw C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MCVList mcvlist;
+ MCVItem item;
+
+ mcvlist = (MCVList)funcctx->user_fctx;
+
+ Assert(call_cntr < mcvlist->nitems);
+
+ item = mcvlist->items[call_cntr];
+
+ stakeys = find_mv_attnums(PG_GETARG_OID(0), &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple.
+ * This should be an array of C strings which will
+ * be processed later by the type input functions.
+ */
+ values = (char **) palloc(4 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+
+ /* frequency */
+ values[3] = (char *) palloc(64 * sizeof(char));
+
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * mcvlist->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * mcvlist->ndimensions);
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* item ID */
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ Datum val, valout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == mcvlist->ndimensions-1)
+ format = "%s, %s}";
+
+ val = item->values[i];
+ valout = FunctionCall1(&fmgrinfo[i], val);
+
+ snprintf(buff, 1024, format, values[1], DatumGetPointer(valout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], item->isnull[i] ? "t" : "f");
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[3], 64, "%f", item->frequency); /* frequency */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 8ce9c0e..2c22d31 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2109,8 +2109,9 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
- " deps_enabled,\n"
- " deps_built,\n"
+ " deps_enabled, mcv_enabled,\n"
+ " deps_built, mcv_built,\n"
+ " mcv_max_items,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
@@ -2128,6 +2129,8 @@ describeOneTableDetails(const char *schemaname,
printTableAddFooter(&cont, _("Statistics:"));
for (i = 0; i < tuples; i++)
{
+ bool first = true;
+
printfPQExpBuffer(&buf, " ");
/* statistics name (qualified with namespace) */
@@ -2137,10 +2140,22 @@ describeOneTableDetails(const char *schemaname,
/* options */
if (!strcmp(PQgetvalue(result, i, 4), "t"))
- appendPQExpBuffer(&buf, "(dependencies)");
+ {
+ appendPQExpBuffer(&buf, "(dependencies");
+ first = false;
+ }
+
+ if (!strcmp(PQgetvalue(result, i, 5), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", mcv");
+ else
+ appendPQExpBuffer(&buf, "(mcv");
+ first = false;
+ }
- appendPQExpBuffer(&buf, " ON (%s)",
- PQgetvalue(result, i, 6));
+ appendPQExpBuffer(&buf, ") ON (%s)",
+ PQgetvalue(result, i, 9));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index c74af47..3529b03 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -38,15 +38,21 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
+ bool mcv_enabled; /* build MCV list? */
+
+ /* MCV size */
+ int32 mcv_max_items; /* max MCV items */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
+ bool mcv_built; /* MCV list was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
+ bytea stamcv; /* MCV list (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -62,14 +68,18 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 8
+#define Natts_pg_mv_statistic 12
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
#define Anum_pg_mv_statistic_staowner 4
#define Anum_pg_mv_statistic_deps_enabled 5
-#define Anum_pg_mv_statistic_deps_built 6
-#define Anum_pg_mv_statistic_stakeys 7
-#define Anum_pg_mv_statistic_stadeps 8
+#define Anum_pg_mv_statistic_mcv_enabled 6
+#define Anum_pg_mv_statistic_mcv_max_items 7
+#define Anum_pg_mv_statistic_deps_built 8
+#define Anum_pg_mv_statistic_mcv_built 9
+#define Anum_pg_mv_statistic_stakeys 10
+#define Anum_pg_mv_statistic_stadeps 11
+#define Anum_pg_mv_statistic_stamcv 12
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index eecce40..b16eebc 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2670,6 +2670,10 @@ DATA(insert OID = 3998 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0
DESCR("multivariate stats: functional dependencies info");
DATA(insert OID = 3999 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
DESCR("multivariate stats: functional dependencies show");
+DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_mcvlist_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: MCV list info");
+DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
+DESCR("details about MCV list items");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index e10dcf1..2bcd582 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -653,9 +653,11 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
+ bool mcv_enabled; /* MCV list enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
+ bool mcv_built; /* MCV list built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index cc43a79..4535db7 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -51,30 +51,89 @@ typedef MVDependenciesData* MVDependencies;
#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
/*
+ * Multivariate MCV (most-common value) lists
+ *
+ * A straight-forward extension of MCV items - i.e. a list (array) of
+ * combinations of attribute values, together with a frequency and
+ * null flags.
+ */
+typedef struct MCVItemData {
+ double frequency; /* frequency of this combination */
+ bool *isnull; /* flags of NULL values (up to 32 columns) */
+ Datum *values; /* variable-length (ndimensions) */
+} MCVItemData;
+
+typedef MCVItemData *MCVItem;
+
+/* multivariate MCV list - essentially an array of MCV items */
+typedef struct MCVListData {
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of MCV list (BASIC) */
+ uint32 ndimensions; /* number of dimensions */
+ uint32 nitems; /* number of MCV items in the array */
+ MCVItem *items; /* array of MCV items */
+} MCVListData;
+
+typedef MCVListData *MCVList;
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_MCV_MAGIC 0xE1A651C2 /* marks serialized bytea */
+#define MVSTAT_MCV_TYPE_BASIC 1 /* basic MCV list type */
+
+/*
+ * Limits used for mcv_max_items option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_MCVLIST_MIN_ITEMS, and we cannot
+ * have more than MVSTAT_MCVLIST_MAX_ITEMS items.
+ *
+ * This is just a boundary for the 'max' threshold - the actual list
+ * may of course contain fewer items than MVSTAT_MCVLIST_MIN_ITEMS.
+ */
+#define MVSTAT_MCVLIST_MIN_ITEMS 128 /* min items in MCV list */
+#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
*/
MVDependencies load_mv_dependencies(Oid mvoid);
+MCVList load_mv_mcvlist(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
+bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
+MCVList deserialize_mv_mcvlist(bytea * data);
+
+/*
+ * Returns index of the attribute number within the vector (i.e. a
+ * dimension within the stats).
+ */
+int mv_get_index(AttrNumber varattno, int2vector * stakeys);
+
+int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
/* FIXME this probably belongs somewhere else (not to operations stats) */
extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_mcv_items(PG_FUNCTION_ARGS);
MVDependencies
-build_mv_dependencies(int numrows, HeapTuple *rows,
- int2vector *attrs,
- VacAttrStats **stats);
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats);
+
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered);
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
- int natts, VacAttrStats **vacattrstats);
+ int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, int2vector *attrs);
+void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats);
#endif
diff --git a/src/test/regress/expected/mv_mcv.out b/src/test/regress/expected/mv_mcv.out
new file mode 100644
index 0000000..075320b
--- /dev/null
+++ b/src/test/regress/expected/mv_mcv.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s4 ON mcv_list (unknown_column) WITH (mcv);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s4 ON mcv_list (a) WITH (mcv);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s4 ON mcv_list (a, a) WITH (mcv);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s4 ON mcv_list (a, a, b) WITH (mcv);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing MCV statistics
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (dependencies, max_mcv_items=200);
+ERROR: option 'mcv' is required by other option(s)
+-- invalid mcv_max_items value / too low
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10);
+ERROR: max number of MCV items must be at least 128
+-- invalid mcv_max_items value / too high
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10000);
+ERROR: max number of MCV items is 8192
+-- correct command
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s5 ON mcv_list (a, b, c) WITH (mcv);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s6 ON mcv_list (a, b, c, d) WITH (mcv);
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1200
+(1 row)
+
+DROP TABLE mcv_list;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 84b4425..66071d8 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1373,7 +1373,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
s.staname,
s.stakeys AS attnums,
length(s.stadeps) AS depsbytes,
- pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
+ length(s.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 4f2ffb8..85d94f1 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,4 +112,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies
+test: mv_dependencies mv_mcv
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 097a04f..6584d73 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -163,3 +163,4 @@ test: xml
test: event_trigger
test: stats
test: mv_dependencies
+test: mv_mcv
diff --git a/src/test/regress/sql/mv_mcv.sql b/src/test/regress/sql/mv_mcv.sql
new file mode 100644
index 0000000..b31d32d
--- /dev/null
+++ b/src/test/regress/sql/mv_mcv.sql
@@ -0,0 +1,178 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s4 ON mcv_list (unknown_column) WITH (mcv);
+
+-- single column
+CREATE STATISTICS s4 ON mcv_list (a) WITH (mcv);
+
+-- single column, duplicated
+CREATE STATISTICS s4 ON mcv_list (a, a) WITH (mcv);
+
+-- two columns, one duplicated
+CREATE STATISTICS s4 ON mcv_list (a, a, b) WITH (mcv);
+
+-- unknown option
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (unknown_option);
+
+-- missing MCV statistics
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (dependencies, max_mcv_items=200);
+
+-- invalid mcv_max_items value / too low
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10);
+
+-- invalid mcv_max_items value / too high
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10000);
+
+-- correct command
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+
+DROP TABLE mcv_list;
+
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s5 ON mcv_list (a, b, c) WITH (mcv);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mcv_list;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s6 ON mcv_list (a, b, c, d) WITH (mcv);
+
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+DROP TABLE mcv_list;
--
2.1.0
Attachment: 0005-multivariate-histograms.patch (text/x-patch)
From fea437ee38376fda67d177276ce9812f2b0e9d81 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 20:18:24 +0100
Subject: [PATCH 5/9] multivariate histograms
- extends the pg_mv_statistic catalog (add 'hist' fields)
- building the histograms during ANALYZE
- simple estimation while planning the queries
Includes regression tests mostly equal to those for functional
dependencies / MCV lists.
---
doc/src/sgml/ref/create_statistics.sgml | 18 +
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/statscmds.c | 44 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 571 +++++++-
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/README.histogram | 287 ++++
src/backend/utils/mvstats/README.stats | 2 +
src/backend/utils/mvstats/common.c | 37 +-
src/backend/utils/mvstats/histogram.c | 2032 ++++++++++++++++++++++++++++
src/bin/psql/describe.c | 17 +-
src/include/catalog/pg_mv_statistic.h | 24 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 136 +-
src/test/regress/expected/mv_histogram.out | 207 +++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_histogram.sql | 176 +++
21 files changed, 3538 insertions(+), 38 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.histogram
create mode 100644 src/backend/utils/mvstats/histogram.c
create mode 100644 src/test/regress/expected/mv_histogram.out
create mode 100644 src/test/regress/sql/mv_histogram.sql
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index 193e4b0..fd3382e 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -133,6 +133,24 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</varlistentry>
<varlistentry>
+ <term><literal>histogram</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables histogram for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>max_buckets</> (<type>integer</>)</term>
+ <listitem>
+ <para>
+ Maximum number of histogram buckets.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>max_mcv_items</> (<type>integer</>)</term>
<listitem>
<para>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2d570ee..6afdee0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -167,7 +167,9 @@ CREATE VIEW pg_mv_stats AS
length(S.stadeps) as depsbytes,
pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
length(S.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo,
+ length(S.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index b04c583..e2f3ff1 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -71,12 +71,15 @@ CreateStatistics(CreateStatsStmt *stmt)
/* by default build nothing */
bool build_dependencies = false,
- build_mcv = false;
+ build_mcv = false,
+ build_histogram = false;
- int32 max_mcv_items = -1;
+ int32 max_buckets = -1,
+ max_mcv_items = -1;
/* options required because of other options */
- bool require_mcv = false;
+ bool require_mcv = false,
+ require_histogram = false;
Assert(IsA(stmt, CreateStatsStmt));
@@ -175,6 +178,29 @@ CreateStatistics(CreateStatsStmt *stmt)
MVSTAT_MCVLIST_MAX_ITEMS)));
}
+ else if (strcmp(opt->defname, "histogram") == 0)
+ build_histogram = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_buckets") == 0)
+ {
+ max_buckets = defGetInt32(opt);
+
+ /* this option requires 'histogram' to be enabled */
+ require_histogram = true;
+
+ /* sanity check */
+ if (max_buckets < MVSTAT_HIST_MIN_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is %d",
+ MVSTAT_HIST_MIN_BUCKETS)));
+
+ else if (max_buckets > MVSTAT_HIST_MAX_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("maximum number of buckets is %d",
+ MVSTAT_HIST_MAX_BUCKETS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -183,10 +209,10 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv))
+ if (! (build_dependencies || build_mcv || build_histogram))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -194,6 +220,11 @@ CreateStatistics(CreateStatsStmt *stmt)
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("option 'mcv' is required by other options(s)")));
+ if (require_histogram && (! build_histogram))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option 'histogram' is required by other options(s)")));
+
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
stakeys = buildint2vector(attnums, numcols);
@@ -214,11 +245,14 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+ values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
+ values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
+ nulls[Anum_pg_mv_statistic_stahist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 333e24b..9172f21 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2163,10 +2163,12 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
WRITE_BOOL_FIELD(mcv_enabled);
+ WRITE_BOOL_FIELD(hist_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
WRITE_BOOL_FIELD(mcv_built);
+ WRITE_BOOL_FIELD(hist_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 977f88e..0de2418 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -49,6 +49,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
+#define MV_CLAUSE_TYPE_HIST 0x04
static bool clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums,
int type);
@@ -74,6 +75,8 @@ static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
List *clauses, MVStatisticInfo *mvstats,
bool *fullmatch, Selectivity *lowsel);
+static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -81,6 +84,12 @@ static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
Selectivity *lowsel, bool *fullmatch,
bool is_or);
+static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, Index relid);
@@ -93,6 +102,7 @@ static List * find_stats(PlannerInfo *root, Index relid);
#define UPDATE_RESULT(m,r,isor) \
(m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -121,7 +131,7 @@ static List * find_stats(PlannerInfo *root, Index relid);
*
* First we try to reduce the list of clauses by applying (soft) functional
* dependencies, and then we try to estimate the selectivity of the reduced
- * list of clauses using the multivariate MCV list.
+ * list of clauses using the multivariate MCV list and histograms.
*
* Finally we remove the portion of clauses estimated using multivariate stats,
* and process the rest of the clauses using the regular per-column stats.
@@ -214,11 +224,13 @@ clauselist_selectivity(PlannerInfo *root,
* with the multivariate code and simply skip to estimation using the
* regular per-column stats.
*/
- if (has_stats(stats, MV_CLAUSE_TYPE_MCV) &&
- (count_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV) >= 2))
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) &&
+ (count_mv_attnums(clauses, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2))
{
/* collect attributes from the compatible conditions */
- Bitmapset *mvattnums = collect_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV);
+ Bitmapset *mvattnums = collect_mv_attnums(clauses, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
/* and search for the statistic covering the most attributes */
MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
@@ -230,7 +242,7 @@ clauselist_selectivity(PlannerInfo *root,
/* split the clauselist into regular and mv-clauses */
clauses = clauselist_mv_split(root, relid, clauses, &mvclauses,
- mvstat, MV_CLAUSE_TYPE_MCV);
+ mvstat, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
/* we've chosen the histogram to match the clauses */
Assert(mvclauses != NIL);
@@ -942,6 +954,7 @@ static Selectivity
clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
{
bool fullmatch = false;
+ Selectivity s1 = 0.0, s2 = 0.0;
/*
* Lowest frequency in the MCV list (may be used as an upper bound
@@ -955,9 +968,24 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
* MCV/histogram evaluation).
*/
- /* Evaluate the MCV selectivity */
- return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ /* Evaluate the MCV first. */
+ s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
&fullmatch, &mcv_low);
+
+ /*
+ * If we got a full equality match on the MCV list, we're done (and
+ * the estimate is pretty good).
+ */
+ if (fullmatch && (s1 > 0.0))
+ return s1;
+
+ /* TODO if (fullmatch) without matching MCV item, use the mcv_low
+ * selectivity as upper bound */
+
+ s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+
+ /* TODO clamp to <= 1.0 (or more strictly, when possible) */
+ return s1 + s2;
}
/*
@@ -1160,7 +1188,7 @@ choose_mv_statistics(List *stats, Bitmapset *attnums)
int numattrs = attrs->dim1;
/* skip dependencies-only stats */
- if (! info->mcv_built)
+ if (! (info->mcv_built || info->hist_built))
continue;
/* count columns covered by the histogram */
@@ -1391,7 +1419,7 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
case F_SCALARGTSEL:
/* not compatible with functional dependencies */
- if (! (context->types & MV_CLAUSE_TYPE_MCV))
+ if (! (context->types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST)))
return true; /* terminate */
break;
@@ -2007,6 +2035,9 @@ has_stats(List *stats, int type)
if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
return true;
+
+ if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
+ return true;
}
return false;
@@ -2411,3 +2442,525 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
return nmatches;
}
+
+/*
+ * Estimate selectivity of clauses using a histogram.
+ *
+ * If there's no histogram for the stats, the function returns 0.0.
+ *
+ * The general idea of this method is similar to how MCV lists are
+ * processed, except that this introduces the concept of a partial
+ * match (MCV only works with full match / mismatch).
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all buckets as 'full match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the buckets
+ * 4) skip buckets that are already 'no match'
+ * 5) check clause for buckets that still match (at least partially)
+ * 6) sum frequencies for buckets to get selectivity
+ *
+ * Unlike MCV lists, histograms have a concept of a partial match. In
+ * that case we use 1/2 the bucket, to minimize the average error. The
+ * MV histograms are usually less detailed than the per-column ones,
+ * meaning the summed selectivity is often quite high (thanks to
+ * combining a lot of "partially hit" buckets).
+ *
+ * Maybe we could use per-bucket information with number of distinct
+ * values it contains (for each dimension), and then use that to correct
+ * the estimate (so with 10 distinct values, we'd use 1/10 of the bucket
+ * frequency). We might also scale the value depending on the actual
+ * ndistinct estimate (not just the values observed in the sample).
+ *
+ * Another option would be to multiply the selectivities, i.e. if we get
+ * 'partial match' for a bucket for multiple conditions, we might use
+ * 0.5^k (where k is the number of conditions), instead of 0.5. This
+ * probably does not minimize the average error, though.
+ *
+ * TODO This might use a similar shortcut to MCV lists - count buckets
+ * marked as partial/full match, and terminate once this drops to 0.
+ * Not sure if it's really worth it - for MCV lists a situation like
+ * this is not uncommon, but for histograms it's not that clear.
+ */
+static Selectivity
+clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats)
+{
+ int i;
+ Selectivity s = 0.0;
+ Selectivity u = 0.0;
+
+ int nmatches = 0;
+ char *matches = NULL;
+
+ MVSerializedHistogram mvhist = NULL;
+
+ /* there's no histogram */
+ if (! mvstats->hist_built)
+ return 0.0;
+
+ /* There may be no histogram in the stats (check hist_built flag) */
+ mvhist = load_mv_histogram(mvstats->mvoid);
+
+ Assert (mvhist != NULL);
+ Assert (clauses != NIL);
+ Assert (list_length(clauses) >= 2);
+
+ /*
+ * Bitmap of bucket matches (mismatch, partial, full). By default
+ * all buckets fully match (and we gradually eliminate them).
+ */
+ matches = palloc0(sizeof(char) * mvhist->nbuckets);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+
+ nmatches = mvhist->nbuckets;
+
+ /* build the match bitmap */
+ update_match_bitmap_histogram(root, clauses,
+ mvstats->stakeys, mvhist,
+ nmatches, matches, false);
+
+ /* now, walk through the buckets and sum the selectivities */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * Find out what part of the data is covered by the histogram,
+ * so that we can 'scale' the selectivity properly (e.g. when
+ * only 50% of the sample got into the histogram, and the rest
+ * is in a MCV list).
+ *
+ * TODO This might be handled by keeping a global "frequency"
+ * for the whole histogram, which might save us some time
+ * spent accessing the not-matching part of the histogram.
+ * Although it's likely in a cache, so it's very fast.
+ */
+ u += mvhist->buckets[i]->ntuples;
+
+ if (matches[i] == MVSTATS_MATCH_FULL)
+ s += mvhist->buckets[i]->ntuples;
+ else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+ s += 0.5 * mvhist->buckets[i]->ntuples;
+ }
+
+#ifdef DEBUG_MVHIST
+ debug_histogram_matches(mvhist, matches);
+#endif
+
+ /* release the allocated bitmap and deserialized histogram */
+ pfree(matches);
+ pfree(mvhist);
+
+ return s * u;
+}
+
+/* cached result of bucket boundary comparison for a single dimension */
+
+#define HIST_CACHE_NOT_FOUND 0x00
+#define HIST_CACHE_FALSE 0x01
+#define HIST_CACHE_TRUE 0x03
+#define HIST_CACHE_MASK 0x02
+
+static char
+bucket_contains_value(FmgrInfo ltproc, Datum constvalue,
+ Datum min_value, Datum max_value,
+ int min_index, int max_index,
+ bool min_include, bool max_include,
+ char * callcache)
+{
+ bool a, b;
+
+ char min_cached = callcache[min_index];
+ char max_cached = callcache[max_index];
+
+ /*
+ * First some quick checks on equality - if any of the boundaries equals,
+ * we have a partial match (so no need to call the comparator).
+ */
+ if (((min_value == constvalue) && (min_include)) ||
+ ((max_value == constvalue) && (max_include)))
+ return MVSTATS_MATCH_PARTIAL;
+
+ /* Keep the values 0/1 because of the XOR at the end. */
+ a = ((min_cached & HIST_CACHE_MASK) >> 1);
+ b = ((max_cached & HIST_CACHE_MASK) >> 1);
+
+ /*
+ * If the result for the bucket lower bound is not in the cache,
+ * evaluate the function and store the result in the cache.
+ */
+ if (! min_cached)
+ {
+ a = DatumGetBool(FunctionCall2Coll(<proc,
+ DEFAULT_COLLATION_OID,
+ constvalue, min_value));
+ /* remember the result */
+ callcache[min_index] = (a) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ /* And do the same for the upper bound. */
+ if (! max_cached)
+ {
+ b = DatumGetBool(FunctionCall2Coll(&ltproc,
+ DEFAULT_COLLATION_OID,
+ constvalue, max_value));
+ /* remember the result */
+ callcache[max_index] = (b) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ return (a ^ b) ? MVSTATS_MATCH_PARTIAL : MVSTATS_MATCH_NONE;
+}
+
+static char
+bucket_is_smaller_than_value(FmgrInfo opproc, Datum constvalue,
+ Datum min_value, Datum max_value,
+ int min_index, int max_index,
+ bool min_include, bool max_include,
+ char * callcache, bool isgt)
+{
+ char min_cached = callcache[min_index];
+ char max_cached = callcache[max_index];
+
+ /* Keep the values 0/1 because of the XOR at the end. */
+ bool a = ((min_cached & HIST_CACHE_MASK) >> 1);
+ bool b = ((max_cached & HIST_CACHE_MASK) >> 1);
+
+ if (! min_cached)
+ {
+ a = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ min_value,
+ constvalue));
+ /* remember the result */
+ callcache[min_index] = (a) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ if (! max_cached)
+ {
+ b = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ max_value,
+ constvalue));
+ /* remember the result */
+ callcache[max_index] = (b) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ /*
+ * Now, we need to combine both results into the final answer, and we need
+ * to be careful about the 'isgt' flag, which effectively inverts the meaning.
+ *
+ * First, we handle the case when each boundary returns different results.
+ * In that case the outcome can only be 'partial' match.
+ */
+ if (a != b)
+ return MVSTATS_MATCH_PARTIAL;
+
+ /*
+ * When the results are the same, then it depends on the 'isgt' value. There
+ * are four options:
+ *
+ * isgt=false a=b=true => full match
+ * isgt=false a=b=false => empty
+ * isgt=true a=b=true => empty
+ * isgt=true a=b=false => full match
+ *
+ * We'll cheat a bit, because we know that (a=b) so we'll use just one of them.
+ */
+ if (isgt)
+ return (!a) ? MVSTATS_MATCH_FULL : MVSTATS_MATCH_NONE;
+ else
+ return ( a) ? MVSTATS_MATCH_FULL : MVSTATS_MATCH_NONE;
+}
+
+/*
+ * Evaluate clauses using the histogram, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * Note: This is not a simple bitmap in the sense that there are more
+ * than two possible values for each item - no match, partial
+ * match and full match. So we need 2 bits per item.
+ *
+ * TODO This works with 'bitmap' where each item is represented as a
+ * char, which is slightly wasteful. Instead, we could use a bitmap
+ * with 2 bits per item, reducing the size to ~1/4. By using values
+ * 0, 1 and 3 (instead of 0, 1 and 2), the operations (merging etc.)
+ * might be performed just like for simple bitmap by using & and |,
+ * which might be faster than min/max.
+ */
+static int
+update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ /*
+ * Used for caching function calls, only once per deduplicated value.
+ *
+ * We may have up to (2 * nbuckets) boundary values per dimension.
+ * It's probably overkill, but let's allocate that once for all
+ * clauses, to minimize overhead.
+ *
+ * Also, we only need two bits per value, but this allocates a byte
+ * per value. Might be worth optimizing.
+ *
+ * 0x00 - not yet called
+ * 0x01 - called, result is 'false'
+ * 0x03 - called, result is 'true'
+ */
+ char *callcache = palloc(2 * mvhist->nbuckets);
+
+ Assert(mvhist != NULL);
+ Assert(mvhist->nbuckets > 0);
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mvhist->nbuckets);
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+
+ /* loop through the clauses and do the estimation */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* it's either OpClause, or NullTest */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ FmgrInfo opproc; /* operator */
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ /* reset the cache (per clause) */
+ memset(callcache, 0, 2 * mvhist->nbuckets);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+ FmgrInfo ltproc;
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_LT_OPR);
+
+ /* lookup dimension for the attribute */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), &ltproc);
+
+ /*
+ * Check this for all buckets that still have "true" in the bitmap
+ *
+ * We already know the clauses use suitable operators (because that's
+ * how we filtered them).
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ char res = MVSTATS_MATCH_NONE;
+
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /* histogram boundaries */
+ Datum minval, maxval;
+ bool mininclude, maxinclude;
+ int minidx, maxidx;
+
+ /*
+ * For AND-lists, we can also mark NULL buckets as 'no match'
+ * (and then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && bucket->nullsonly[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match).
+ * We can't really do anything about the MATCH_PARTIAL buckets.
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* lookup the values and cache of function calls */
+ minidx = bucket->min[idx];
+ maxidx = bucket->max[idx];
+
+ minval = mvhist->values[idx][bucket->min[idx]];
+ maxval = mvhist->values[idx][bucket->max[idx]];
+
+ mininclude = bucket->min_inclusive[idx];
+ maxinclude = bucket->max_inclusive[idx];
+
+ /*
+ * TODO Maybe it's possible to add here a similar optimization
+ * as for the MCV lists:
+ *
+ * (nmatches == 0) && AND-list => all eliminated (FALSE)
+ * (nmatches == N) && OR-list => all eliminated (TRUE)
+ *
+ * But it's more complex because of the partial matches.
+ */
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ *
+ * TODO I'm really unsure the handling of 'isgt' flag (that is, clauses
+ * with reverse order of variable/constant) is correct. I wouldn't
+ * be surprised if there was some mixup. Using the lt/gt operators
+ * instead of messing with the opproc could make it simpler.
+ * It would however be using a different operator than the query,
+ * although it's not any shadier than using the selectivity function
+ * as is done currently.
+ */
+ switch (oprrest)
+ {
+ case F_SCALARLTSEL: /* Var < Const */
+ case F_SCALARGTSEL: /* Var > Const */
+
+ res = bucket_is_smaller_than_value(opproc, cst->constvalue,
+ minval, maxval,
+ minidx, maxidx,
+ mininclude, maxinclude,
+ callcache, isgt);
+ break;
+
+ case F_EQSEL:
+
+ /*
+ * We only check whether the value is within the bucket, using the
+ * lt operator, and we also check for equality with the boundaries.
+ */
+
+ res = bucket_contains_value(ltproc, cst->constvalue,
+ minval, maxval,
+ minidx, maxidx,
+ mininclude, maxinclude,
+ callcache);
+ break;
+ }
+
+ UPDATE_RESULT(matches[i], res, is_or);
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the buckets and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining buckets that might possibly match.
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match)
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* if the clause mismatches the bucket, set it as MATCH_NONE */
+ if ((expr->nulltesttype == IS_NULL)
+ && (! bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+
+ else if ((expr->nulltesttype == IS_NOT_NULL) &&
+ (bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each bucket */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching buckets */
+ or_nmatches = mvhist->nbuckets;
+
+ /* by default none of the buckets matches the clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the OR-clauses */
+ or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ stakeys, mvhist,
+ or_nmatches, or_matches, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+
+ /* free the call cache */
+ pfree(callcache);
+
+ return nmatches;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8394111..2519249 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -412,7 +412,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built || mvstat->mcv_built)
+ if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built)
{
info = makeNode(MVStatisticInfo);
@@ -422,10 +422,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
info->mcv_enabled = mvstat->mcv_enabled;
+ info->hist_enabled = mvstat->hist_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
info->mcv_built = mvstat->mcv_built;
+ info->hist_built = mvstat->hist_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index f9bf10c..9dbb3b6 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o mcv.o
+OBJS = common.o dependencies.o histogram.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.histogram b/src/backend/utils/mvstats/README.histogram
new file mode 100644
index 0000000..8234d2c
--- /dev/null
+++ b/src/backend/utils/mvstats/README.histogram
@@ -0,0 +1,287 @@
+Multivariate histograms
+=======================
+
+Histograms on individual attributes consist of buckets represented by ranges,
+covering the domain of the attribute. That is, each bucket is a [min,max]
+interval, and contains all values in this range. The histogram is built in such
+a way that all buckets have about the same frequency.
+
+Multivariate histograms are an extension into n-dimensional space - the buckets
+are n-dimensional intervals (i.e. n-dimensional rectangles), covering the domain
+of the combination of attributes. That is, each bucket has a vector of lower
+and upper boundaries, denoted min[i] and max[i] (where i = 1..n).
+
+In addition to the boundaries, each bucket tracks additional info (see the
+sketch below):
+
+ * frequency (fraction of tuples in the bucket)
+ * whether the boundaries are inclusive or exclusive
+ * whether the dimension contains only NULL values
+ * number of distinct values in each dimension (for building only)
+
+It's possible that in the future we'll have multiple histogram types, with
+different features. We do however expect all the types to share the same
+representation (buckets as ranges) and only differ in how we build them.
+
+The current implementation builds non-overlapping buckets, but that may not be
+true for other histogram types, and the code should not rely on this assumption.
+There are interesting types of histograms (or build algorithms) with overlapping
+buckets.
+
+When used on low-cardinality data, histograms usually perform considerably worse
+than MCV lists (which are a good fit for this kind of data). This is especially
+true for label-like values, where the ordering of the values is mostly unrelated
+to the meaning of the data, as a meaningful ordering is crucial for histograms.
+
+On high-cardinality data the histograms are usually a better choice, because MCV
+lists can't represent the distribution accurately enough.
+
+
+Selectivity estimation
+----------------------
+
+The estimation is implemented in clauselist_mv_selectivity_histogram(), and
+works very similarly to clauselist_mv_selectivity_mcvlist().
+
+The main difference is that while MCV lists support exact matches, histograms
+often result in approximate matches - e.g. with an equality clause we can only
+say whether the constant falls into the bucket, but not whether it actually
+appears in the data or what fraction of the bucket it represents. In such cases
+we rely on defaults, just like the per-column histograms do.
+
+The current implementation uses histograms to estimate these types of clauses
+(think of WHERE conditions):
+
+ (a) equality clauses WHERE (a = 1) AND (b = 2)
+ (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ (d) OR-clauses WHERE (a = 1) OR (b = 2)
+
+Similarly to MCV lists, it's possible to add support for additional types of
+clauses, for example:
+
+ (e) multi-var clauses WHERE (a > b)
+
+and so on. These are tasks for the future, not yet implemented.
+
+
+When evaluating a clause on a bucket, we may get one of three results:
+
+ (a) FULL_MATCH - The bucket definitely matches the clause.
+
+ (b) PARTIAL_MATCH - The bucket matches the clause, but not necessarily all
+ the tuples it represents.
+
+ (c) NO_MATCH - The bucket definitely does not match the clause.
+
+This may be illustrated using a range [1, 5], which is essentially a 1-D bucket.
+With clause
+
+ WHERE (a < 10) => FULL_MATCH (all range values are below
+ 10, so the whole bucket matches)
+
+ WHERE (a < 3) => PARTIAL_MATCH (there may be values matching
+ the clause, but we don't know how many)
+
+ WHERE (a < 0) => NO_MATCH (the whole range is above 0, so
+ no values from the bucket can match)
+
+Some clauses may produce only some of those results - for example equality
+clauses can never produce FULL_MATCH, as we always hit only part of the bucket
+(we can't match both boundaries at the same time). This results in less accurate
+estimates compared to MCV lists, where we can match an MCV item exactly (there's
+no PARTIAL match for MCV lists).
+
+There are also clauses that may never produce a PARTIAL_MATCH result. A nice
+example is the 'IS [NOT] NULL' clause, which either matches the bucket
+completely (FULL_MATCH) or not at all (NO_MATCH), thanks to how the NULL-buckets
+are constructed.
+
+Computing the total selectivity estimate is then trivial - simply sum the
+frequencies of all the FULL_MATCH and PARTIAL_MATCH buckets (but multiply the
+frequency of each PARTIAL_MATCH bucket by 0.5, to minimize the average error).
+
+
+Building a histogram
+--------------------
+
+The algorithm of building a histogram in general is quite simple:
+
+ (a) create an initial bucket (containing all sample rows)
+
+ (b) create NULL buckets (by splitting the initial bucket)
+
+ (c) repeat
+
+ (1) choose bucket to split next
+
+ (2) terminate if no bucket that might be split is found, or if we've
+ reached the maximum number of buckets (16384)
+
+ (3) choose dimension to partition the bucket by
+
+ (4) partition the bucket by the selected dimension
+
+The main complexity is hidden in steps (c.1) and (c.3), i.e. how we choose the
+bucket and dimension for the split.
+
+Similarly to one-dimensional histograms, we want to produce buckets with roughly
+the same frequency. We also need to produce "regular" buckets, because buckets
+with one "side" much longer than the others are very likely to match a lot of
+conditions (which increases error, even if the bucket frequency is very low).
+
+To achieve this, we choose the largest bucket (containing the most sample rows),
+but we only choose buckets that can actually be split (have at least 3 different
+combinations of values).
+
+Then we choose the "longest" dimension of the bucket, which is computed by using
+the distinct values in the sample as a measure.
+
+For details see functions select_bucket_to_partition() and partition_bucket().
+
+The current limit on the number of buckets (16384) is mostly arbitrary, but
+chosen so that it guarantees we don't exceed the number of distinct values
+indexable by uint16 in any of the dimensions (16384 buckets produce at most
+2 * 16384 = 32768 boundary values per dimension, which fits into uint16). In
+practice we could handle more buckets, as we index each dimension separately
+and the splits should use the dimensions evenly.
+
+Also, histograms this large (with up to 16k buckets over multiple dimensions)
+would be quite expensive to build and process, so the 16k limit is rather
+reasonable.
+
+The actual number of buckets is also related to statistics target, because we
+require MIN_BUCKET_ROWS (10) tuples per bucket before a split, so we can't have
+more than (2 * 300 * target / 10) buckets. For the default target (100) this
+evaluates to ~6k.
+
+
+NULL handling (create_null_buckets)
+-----------------------------------
+
+When building histograms on a single attribute, we first filter out NULL values.
+In the multivariate case, we can't really do that because the rows may contain
+a mix of NULL and non-NULL values in different columns (so we can't simply
+filter all of them out).
+
+For this reason, the histograms are built so that in each bucket, each dimension
+contains either only NULL or only non-NULL values. Building the NULL-buckets
+happens as the first step of the build, in the create_null_buckets() function.
+The number of NULL-buckets produced this way has an obvious upper bound of 2^N,
+where N is the number of dimensions (attributes the histogram is built on) - or
+rather 2^K, where K is the number of attributes not marked as not-NULL. For
+example, for two nullable attributes (a,b) we may get up to four buckets:
+(NULL, NULL), (NULL, non-NULL), (non-NULL, NULL) and (non-NULL, non-NULL).
+
+The buckets with NULL dimensions are then subject to the same build algorithm
+(i.e. may be split into smaller buckets) just like any other bucket, but may
+only be split by a non-NULL dimension.
+
+
+Serialization
+-------------
+
+To store the histogram in the pg_mv_statistic catalog, it is serialized into a
+more efficient form. We also use this representation during estimation, i.e. we
+don't fully deserialize the histogram.
+
+For example the boundary values are deduplicated to minimize the required space.
+How much redundancy is there, actually? Let's assume there are no NULL values,
+so we start with a single bucket - in that case we have 2*N boundaries. Each
+time we split a bucket we introduce one new value (in the "middle" of one of
+the dimensions), and keep boundaries for all the other dimensions. So after K
+splits, we have up to
+
+ 2*N + K
+
+unique boundary values (we may have fewer values, if the same value is used for
+several splits). But after K splits we have (K+1) buckets, and thus
+
+ (K+1) * 2 * N
+
+boundary values. Using e.g. N=4 and K=999, we arrive at these numbers:
+
+ 2*N + K = 1007
+ (K+1) * 2 * N = 8000
+
+which means a lot of redundancy. It's somewhat counter-intuitive that the number
+of distinct values does not really depend on the number of dimensions (except
+for the initial bucket, but that's negligible compared to the total).
+
+By deduplicating the values and replacing them with 16-bit indexes (uint16), we
+reduce the required space to
+
+ 1007 * 8 + 8000 * 2 ~= 24kB
+
+which is significantly less than 64kB required for the 'raw' histogram (assuming
+the values are 8B).
+
+While the bytea compression (pglz) might achieve the same reduction of space,
+the deduplicated representation is used to optimize the estimation by caching
+results of function calls for already visited values. This significantly
+reduces the number of calls to (often quite expensive) operators.
+
+Note: Of course, this reasoning only holds for histograms built by the algorithm
+that simply splits the buckets in half. Other histogram types (e.g. containing
+overlapping buckets) may behave differently and require different serialization.
+
+Serialized histograms are marked with a 'magic' constant, which makes it easier
+to check that a bytea value really is a serialized histogram.
+
+
+varlena compression
+-------------------
+
+This serialization may however defeat the automatic varlena compression: the
+array of unique values is placed at the beginning of the serialized form, which
+is exactly the chunk pglz inspects to decide whether the data is compressible -
+and it will probably decide it's not very compressible. This is similar to the
+issue we initially had with JSONB.
+
+Maybe storing buckets first would make it work, as the buckets may be better
+compressible.
+
+On the other hand the serialization is actually a context-aware compression,
+usually compressing to ~30% (or even less, with large data types). So the lack
+of additional pglz compression may be acceptable.
+
+
+Deserialization
+---------------
+
+The deserialization is not a perfect inverse of the serialization, as we keep
+the deduplicated arrays. This reduces the amount of memory and also allows
+optimizations during estimation (e.g. we can cache results for the distinct
+values, saving expensive function calls).
+
+
+Inspecting the histogram
+------------------------
+
+Inspecting the regular (per-attribute) histograms is trivial, as it's enough
+to select the columns from pg_stats - the data is encoded as anyarray, so we
+simply get the text representation of the array.
+
+With multivariate histograms it's not that simple due to the possible mix of
+data types in the histogram. It might be possible to produce a similar
+array-like text representation, but that'd unnecessarily complicate further
+processing and analysis of the histogram. Instead, there's an SRF that allows
+access to lower/upper boundaries, frequencies etc.
+
+ SELECT * FROM pg_mv_histogram_buckets(oid, otype);
+
+It has two input parameters:
+
+ oid - OID of the histogram (pg_mv_statistic.staoid)
+ otype - type of output
+
+and produces a table with these columns:
+
+ - bucket ID (0...nbuckets-1)
+ - lower bucket boundaries (string array)
+ - upper bucket boundaries (string array)
+ - nulls only dimensions (boolean array)
+ - lower boundary inclusive (boolean array)
+ - upper boundary inclusive (boolean array)
+ - frequency (double precision)
+
+The 'otype' accepts three values, determining what will be returned in the
+lower/upper boundary arrays:
+
+ - 0 - values stored in the histogram, encoded as text
+ - 1 - indexes into the deduplicated arrays
+ - 2 - indexes into the deduplicated arrays, scaled to [0,1]
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index 5c5c59a..3e4f4d1 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -18,6 +18,8 @@ Currently we only have two kinds of multivariate statistics
(b) MCV lists (README.mcv)
+ (c) multivariate histograms (README.histogram)
+
Compatible clause types
-----------------------
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index d1da714..ffb76f4 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -13,11 +13,11 @@
*
*-------------------------------------------------------------------------
*/
+#include "postgres.h"
+#include "utils/array.h"
#include "common.h"
-#include "utils/array.h"
-
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
int natts,
VacAttrStats **vacattrstats);
@@ -52,7 +52,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
- int numrows_filtered = 0;
+ MVHistogram histogram = NULL;
+ int numrows_filtered = numrows;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -95,8 +96,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+ /* build a multivariate histogram on the columns */
+ if ((numrows_filtered > 0) && (stat->hist_enabled))
+ histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
+ update_mv_stats(stat->mvoid, deps, mcvlist, histogram, attrs, stats);
}
}
@@ -176,6 +181,8 @@ list_mv_stats(Oid relid)
info->deps_built = stats->deps_built;
info->mcv_enabled = stats->mcv_enabled;
info->mcv_built = stats->mcv_built;
+ info->hist_enabled = stats->hist_enabled;
+ info->hist_built = stats->hist_built;
result = lappend(result, info);
}
@@ -190,7 +197,6 @@ list_mv_stats(Oid relid)
return result;
}
-
/*
* Find attnims of MV stats using the mvoid.
*/
@@ -236,9 +242,16 @@ find_mv_attnums(Oid mvoid, Oid *relid)
}
+/*
+ * FIXME This adds statistics, but we need to drop statistics when the
+ * table is dropped. Not sure what to do when a column is dropped.
+ * Either we can (a) remove all stats on that column, (b) remove
+ * the column from defined stats and force rebuild, (c) remove the
+ * column on next ANALYZE. Or maybe something else?
+ */
void
update_mv_stats(Oid mvoid,
- MVDependencies dependencies, MCVList mcvlist,
+ MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
@@ -271,22 +284,34 @@ update_mv_stats(Oid mvoid,
values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
}
+ if (histogram != NULL)
+ {
+ bytea * data = serialize_mv_histogram(histogram, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stahist-1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stahist - 1]
+ = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
+ replaces[Anum_pg_mv_statistic_stahist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
+ nulls[Anum_pg_mv_statistic_hist_built-1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_hist_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
+ values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
diff --git a/src/backend/utils/mvstats/histogram.c b/src/backend/utils/mvstats/histogram.c
new file mode 100644
index 0000000..9e5620a
--- /dev/null
+++ b/src/backend/utils/mvstats/histogram.c
@@ -0,0 +1,2032 @@
+/*-------------------------------------------------------------------------
+ *
+ * histogram.c
+ * POSTGRES multivariate histograms
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/histogram.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+#include <math.h>
+
+
+static MVBucket create_initial_mv_bucket(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+static MVBucket select_bucket_to_partition(int nbuckets, MVBucket * buckets);
+
+static MVBucket partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues);
+
+static MVBucket copy_mv_bucket(MVBucket bucket, uint32 ndimensions);
+
+static void update_bucket_ndistinct(MVBucket bucket, int2vector *attrs,
+ VacAttrStats ** stats);
+
+static void update_dimension_ndistinct(MVBucket bucket, int dimension,
+ int2vector *attrs,
+ VacAttrStats ** stats,
+ bool update_boundaries);
+
+static void create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats);
+
+static int bsearch_comparator(const void * a, const void * b);
+
+/*
+ * Each serialized bucket needs to store (in this order):
+ *
+ * - number of tuples (float)
+ * - min inclusive flags (ndim * sizeof(bool))
+ * - max inclusive flags (ndim * sizeof(bool))
+ * - null dimension flags (ndim * sizeof(bool))
+ * - min boundary indexes (ndim * sizeof(uint16))
+ * - max boundary indexes (ndim * sizeof(uint16))
+ *
+ * So in total:
+ *
+ * ndim * (2 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float)
+ */
+#define BUCKET_SIZE(ndims) \
+ ((ndims) * (2 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float))
+
+/* pointers into a flat serialized bucket of BUCKET_SIZE(n) bytes */
+#define BUCKET_NTUPLES(b) ((float*)b)
+#define BUCKET_MIN_INCL(b,n) ((bool*)(b + sizeof(float)))
+#define BUCKET_MAX_INCL(b,n) (BUCKET_MIN_INCL(b,n) + n)
+#define BUCKET_NULLS_ONLY(b,n) (BUCKET_MAX_INCL(b,n) + n)
+#define BUCKET_MIN_INDEXES(b,n) ((uint16*)(BUCKET_NULLS_ONLY(b,n) + n))
+#define BUCKET_MAX_INDEXES(b,n) ((BUCKET_MIN_INDEXES(b,n) + n))
+
+/* can't split bucket with less than 10 rows */
+#define MIN_BUCKET_ROWS 10
+
+/*
+ * Data used while building the histogram.
+ */
+typedef struct HistogramBuildData {
+
+ float ndistinct; /* number of distinct value combinations */
+
+ HeapTuple *rows; /* array of sample rows */
+ uint32 numrows; /* number of sample rows (array size) */
+
+ /*
+ * Number of distinct values in each dimension. This is used when
+ * building the histogram (and is not serialized/deserialized).
+ */
+ uint32 *ndistincts;
+
+} HistogramBuildData;
+
+typedef HistogramBuildData *HistogramBuild;
+
+/*
+ * Building a multivariate histogram. In short, it first creates a single
+ * bucket containing all the rows, and then repeatedly splits it, each time
+ * searching for the bucket / dimension most in need of a split.
+ *
+ * The current criterion is rather simple, chosen so that the algorithm
+ * produces buckets with about equal frequency and regular size.
+ *
+ * See the discussion at select_bucket_to_partition and partition_bucket
+ * for more details about the algorithm.
+ *
+ * The current algorithm works like this:
+ *
+ * build NULL-buckets (create_null_buckets)
+ *
+ * while [not reaching maximum number of buckets]
+ *
+ * choose bucket to partition (largest bucket)
+ * if no bucket to partition
+ * terminate the algorithm
+ *
+ * choose bucket dimension to partition (largest dimension)
+ * split the bucket into two buckets
+ */
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ int *ndistvalues;
+ Datum **distvalues;
+
+ MVHistogram histogram = (MVHistogram)palloc0(sizeof(MVHistogramData));
+
+ HeapTuple * rows_copy = (HeapTuple*)palloc0(numrows * sizeof(HeapTuple));
+ memcpy(rows_copy, rows, sizeof(HeapTuple) * numrows);
+
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ histogram->ndimensions = numattrs;
+
+ histogram->magic = MVSTAT_HIST_MAGIC;
+ histogram->type = MVSTAT_HIST_TYPE_BASIC;
+ histogram->nbuckets = 1;
+
+ /* create max buckets (better than repalloc for short-lived objects) */
+ histogram->buckets
+ = (MVBucket*)palloc0(MVSTAT_HIST_MAX_BUCKETS * sizeof(MVBucket));
+
+ /* create the initial bucket, covering the whole sample set */
+ histogram->buckets[0]
+ = create_initial_mv_bucket(numrows, rows_copy, attrs, stats);
+
+ /*
+ * Collect info on distinct values in each dimension (used later
+ * to select dimension to partition).
+ */
+ ndistvalues = (int*)palloc0(sizeof(int) * numattrs);
+ distvalues = (Datum**)palloc0(sizeof(Datum*) * numattrs);
+
+ for (i = 0; i < numattrs; i++)
+ {
+ int j;
+ int nvals;
+ Datum *tmp;
+
+ SortSupportData ssup;
+ StdAnalyzeData *mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ nvals = 0;
+ tmp = (Datum*)palloc0(sizeof(Datum) * numrows);
+
+ for (j = 0; j < numrows; j++)
+ {
+ bool isnull;
+
+ /* fetch the value of this attribute for the sample row */
+ Datum value = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &isnull);
+
+ if (isnull)
+ continue;
+
+ tmp[nvals++] = value;
+ }
+
+ /* do the sort and stuff only if there are non-NULL values */
+ if (nvals > 0)
+ {
+ /* sort the array of values */
+ qsort_arg((void *) tmp, nvals, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /* count distinct values */
+ ndistvalues[i] = 1;
+ for (j = 1; j < nvals; j++)
+ if (compare_scalars_simple(&tmp[j], &tmp[j-1], &ssup) != 0)
+ ndistvalues[i] += 1;
+
+ /* allocate exactly the needed space (ndistinct was counted above) */
+ distvalues[i] = (Datum*)palloc0(sizeof(Datum) * ndistvalues[i]);
+
+ /* now collect distinct values into the array */
+ distvalues[i][0] = tmp[0];
+ ndistvalues[i] = 1;
+
+ for (j = 1; j < nvals; j++)
+ {
+ if (compare_scalars_simple(&tmp[j], &tmp[j-1], &ssup) != 0)
+ {
+ distvalues[i][ndistvalues[i]] = tmp[j];
+ ndistvalues[i] += 1;
+ }
+ }
+ }
+
+ pfree(tmp);
+ }
+
+ /*
+ * The initial bucket may contain NULL values, so we have to create
+ * buckets with NULL-only dimensions.
+ *
+ * FIXME We may need up to 2^ndims buckets - check that there are
+ * enough buckets (MVSTAT_HIST_MAX_BUCKETS >= 2^ndims).
+ */
+ create_null_buckets(histogram, 0, attrs, stats);
+
+ while (histogram->nbuckets < MVSTAT_HIST_MAX_BUCKETS)
+ {
+ MVBucket bucket = select_bucket_to_partition(histogram->nbuckets,
+ histogram->buckets);
+
+ /* no more buckets to partition */
+ if (bucket == NULL)
+ break;
+
+ histogram->buckets[histogram->nbuckets]
+ = partition_bucket(bucket, attrs, stats,
+ ndistvalues, distvalues);
+
+ histogram->nbuckets += 1;
+ }
+
+ /* finalize the frequencies etc. */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ HistogramBuild build_data
+ = ((HistogramBuild)histogram->buckets[i]->build_data);
+
+ /*
+ * The frequency has to be computed from the whole sample, in
+ * case some of the rows were used for MCV (and thus are missing
+ * from the histogram).
+ */
+ histogram->buckets[i]->ntuples
+ = (build_data->numrows * 1.0) / numrows_total;
+ }
+
+ return histogram;
+}
+
+/* fetch the histogram (as a bytea) from the pg_mv_statistic catalog */
+MVSerializedHistogram
+load_mv_histogram(Oid mvoid)
+{
+ bool isnull = false;
+ Datum histogram;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* fetch the pg_mv_statistic tuple for the given statistics OID */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->hist_enabled && mvstat->hist_built);
+#endif
+
+ histogram = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stahist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_histogram(DatumGetByteaP(histogram));
+}
+
+/* print some basic info about the histogram */
+Datum
+pg_mv_stats_histogram_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVSerializedHistogram hist = deserialize_mv_histogram(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nbuckets=%d", hist->nbuckets);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+
+/* used to pass context into bsearch() */
+static SortSupport ssup_private = NULL;
+
+/*
+ * Serialize the MV histogram into a bytea value. The basic algorithm is quite
+ * simple, and mostly mimics the MCV serialization:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ *
+ * (a) collect all (non-NULL) attribute values from all buckets
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all buckets
+ *
+ * (a) replace min/max values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, as we're mixing different
+ * datatypes, and we need to use the right operators to compare/sort them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ *
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
+bytea *
+serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i = 0, j = 0;
+ Size total_length = 0;
+
+ bytea *output = NULL;
+ char *data = NULL;
+
+ int nbuckets = histogram->nbuckets;
+ int ndims = histogram->ndimensions;
+
+ /* allocated for serialized bucket data */
+ int bucketsize = BUCKET_SIZE(ndims);
+ char *bucket = palloc0(bucketsize);
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ DimensionInfo * info
+ = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ SortSupport ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension separately */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /*
+ * Allocate space for all min/max values, including NULLs
+ * (we won't use them, but we don't know how many there are),
+ * and then collect all non-NULL values.
+ */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * nbuckets * 2);
+
+ for (j = 0; j < histogram->nbuckets; j++)
+ {
+ /* skip buckets where this dimension is NULL-only */
+ if (! histogram->buckets[j]->nullsonly[i])
+ {
+ values[i][counts[i]] = histogram->buckets[j]->min[i];
+ counts[i] += 1;
+
+ values[i][counts[i]] = histogram->buckets[j]->max[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* make sure we fit into uint16 */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typlen > 0)
+ /* byval or byref, but with fixed length (name, tid, ...) */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so strlen plus 1B for the '\0' terminator */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j])) + 1;
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * histogram, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nbuckets (4B)
+ * - info (ndim * sizeof(DimensionInfo)
+ * - arrays of values for each dimension
+ * - serialized buckets (nbuckets * bucketsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data (and buckets).
+ */
+ total_length = (sizeof(int32) + offsetof(MVHistogramData, buckets)
+ + ndims * sizeof(DimensionInfo)
+ + nbuckets * bucketsize);
+
+ /* account for the deduplicated data */
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 10MB */
+ if (total_length > (10 * 1024 * 1024))
+ elog(ERROR, "serialized histogram exceeds 10MB (%ld > %d)",
+ total_length, (10 * 1024 * 1024));
+
+ /* allocate space for the serialized histogram list, set header */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write data */
+ data = VARDATA(output);
+
+ memcpy(data, histogram, offsetof(MVHistogramData, buckets));
+ data += offsetof(MVHistogramData, buckets);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* value array for each dimension */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ if (info[i].typlen > 0)
+ {
+ /* passed by value or reference, but fixed length */
+ memcpy(data, &values[i][j], info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ VARSIZE_ANY(values[i][j]));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring (don't forget the \0 terminator!) */
+ memcpy(data, DatumGetPointer(values[i][j]),
+ strlen(DatumGetPointer(values[i][j])) + 1);
+ data += strlen(DatumGetPointer(values[i][j])) + 1;
+ }
+ }
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* and finally, the histogram buckets */
+ for (i = 0; i < nbuckets; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - bucketsize);
+
+ /* reset the values for each item */
+ memset(bucket, 0, bucketsize);
+
+ *BUCKET_NTUPLES(bucket) = histogram->buckets[i]->ntuples;
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! histogram->buckets[i]->nullsonly[j])
+ {
+ uint16 idx;
+ Datum * v = NULL;
+ ssup_private = &ssup[j];
+
+ /* min boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->min[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MIN_INDEXES(bucket, ndims)[j] = idx;
+
+ /* max boundary */
+ v = (Datum*)bsearch(&histogram->buckets[i]->max[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ bsearch_comparator);
+
+ if (v == NULL)
+ elog(ERROR, "value for dim %d not found in array", j);
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MAX_INDEXES(bucket, ndims)[j] = idx;
+ }
+ }
+
+ /* copy flags (nulls, min/max inclusive) */
+ memcpy(BUCKET_NULLS_ONLY(bucket, ndims),
+ histogram->buckets[i]->nullsonly, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MIN_INCL(bucket, ndims),
+ histogram->buckets[i]->min_inclusive, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MAX_INCL(bucket, ndims),
+ histogram->buckets[i]->max_inclusive, sizeof(bool) * ndims);
+
+ /* copy the item into the array */
+ memcpy(data, bucket, bucketsize);
+
+ data += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ /* FIXME free the values/counts arrays here */
+
+ return output;
+}
+
+/*
+ * Returns histogram in a partially-serialized form (keeps the boundary
+ * values deduplicated, so that it's possible to optimize the estimation
+ * part by caching function call results between buckets etc.).
+ */
+MVSerializedHistogram
+deserialize_mv_histogram(bytea * data)
+{
+ int i = 0, j = 0;
+
+ Size expected_size;
+ char *tmp = NULL;
+
+ MVSerializedHistogram histogram;
+ DimensionInfo *info;
+
+ int nbuckets;
+ int ndims;
+ int bucketsize;
+
+ /* temporary deserialization buffer */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVSerializedHistogramData,buckets))
+ elog(ERROR, "invalid histogram size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVSerializedHistogramData,buckets));
+
+ /* read the histogram header */
+ histogram
+ = (MVSerializedHistogram)palloc(sizeof(MVSerializedHistogramData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(histogram, tmp, offsetof(MVSerializedHistogramData, buckets));
+ tmp += offsetof(MVSerializedHistogramData, buckets);
+
+ if (histogram->magic != MVSTAT_HIST_MAGIC)
+ elog(ERROR, "invalid histogram magic %d (expected %dd)",
+ histogram->magic, MVSTAT_HIST_MAGIC);
+
+ if (histogram->type != MVSTAT_HIST_TYPE_BASIC)
+ elog(ERROR, "invalid histogram type %d (expected %dd)",
+ histogram->type, MVSTAT_HIST_TYPE_BASIC);
+
+ nbuckets = histogram->nbuckets;
+ ndims = histogram->ndimensions;
+ bucketsize = BUCKET_SIZE(ndims);
+
+ Assert((nbuckets > 0) && (nbuckets <= MVSTAT_HIST_MAX_BUCKETS));
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * What size do we expect with these parameters? (This is still
+ * incomplete, as we have yet to add the value array sizes from
+ * the DimensionInfo records.)
+ */
+ expected_size = offsetof(MVSerializedHistogramData,buckets) +
+ ndims * sizeof(DimensionInfo) +
+ (nbuckets * bucketsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /* now let's allocate a single buffer for all the values and counts */
+
+ bufflen = (sizeof(int) + sizeof(Datum*)) * ndims;
+ for (i = 0; i < ndims; i++)
+ {
+ /* don't allocate space for byval types, matching Datum */
+ if (! (info[i].typbyval && (info[i].typlen == sizeof(Datum))))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+ }
+
+ /* also, include space for the result, tracking the buckets */
+ bufflen += nbuckets * (
+ sizeof(MVSerializedBucket) + /* bucket pointer */
+ sizeof(MVSerializedBucketData)); /* bucket data */
+
+ buff = palloc0(bufflen);
+ ptr = buff;
+
+ histogram->nvalues = (int*)ptr;
+ ptr += (sizeof(int) * ndims);
+
+ histogram->values = (Datum**)ptr;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mcv_list(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. This needs to
+ * copy the pieces.
+ *
+ * TODO same as in MCV deserialization / consider moving to common.c
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ histogram->nvalues[i] = info[i].nvalues;
+
+ if (info[i].typbyval && info[i].typlen == sizeof(Datum))
+ {
+ /* passed by value / Datum - simply reuse the array */
+ histogram->values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ /* all the varlena data need a chunk from the buffer */
+ histogram->values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ if (info[i].typbyval)
+ {
+ /* passed by value, but smaller than Datum */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value into the Datum array */
+ memcpy(&histogram->values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ histogram->buckets = (MVSerializedBucket*)ptr;
+ ptr += (sizeof(MVSerializedBucket) * nbuckets);
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ MVSerializedBucket bucket = (MVSerializedBucket)ptr;
+ ptr += sizeof(MVSerializedBucketData);
+
+ bucket->ntuples = *BUCKET_NTUPLES(tmp);
+ bucket->nullsonly = BUCKET_NULLS_ONLY(tmp, ndims);
+ bucket->min_inclusive = BUCKET_MIN_INCL(tmp, ndims);
+ bucket->max_inclusive = BUCKET_MAX_INCL(tmp, ndims);
+
+ bucket->min = BUCKET_MIN_INDEXES(tmp, ndims);
+ bucket->max = BUCKET_MAX_INDEXES(tmp, ndims);
+
+ histogram->buckets[i] = bucket;
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+
+ tmp += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((tmp - VARDATA(data)) == expected_size);
+
+ /* we should exhaust the output buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ return histogram;
+}
+
+/*
+ * Build the initial bucket, which will then be split into smaller ones.
+ */
+static MVBucket
+create_initial_mv_bucket(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+ HistogramBuild data = NULL;
+
+ /* TODO allocate bucket as a single piece, including all the fields. */
+ MVBucket bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ Assert(numrows > 0);
+ Assert(rows != NULL);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /* allocate the per-dimension arrays */
+
+ /* flags for null-only dimensions */
+ bucket->nullsonly = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ bucket->min_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+ bucket->max_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* lower/upper boundaries */
+ bucket->min = (Datum*)palloc0(numattrs * sizeof(Datum));
+ bucket->max = (Datum*)palloc0(numattrs * sizeof(Datum));
+
+ /* build-data */
+ data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /* number of distinct values (per dimension) */
+ data->ndistincts = (uint32*)palloc0(numattrs * sizeof(uint32));
+
+ /* all the sample rows fall into the initial bucket */
+ data->numrows = numrows;
+ data->rows = rows;
+
+ bucket->build_data = data;
+
+ /*
+ * Update the number of distinct value combinations in the bucket
+ * (which we use when selecting the bucket to partition), and then the
+ * number of distinct values for each dimension (which we use when
+ * choosing which dimension to split).
+ */
+ update_bucket_ndistinct(bucket, attrs, stats);
+
+ /* Update ndistinct (and also set min/max) for all dimensions. */
+ for (i = 0; i < numattrs; i++)
+ update_dimension_ndistinct(bucket, i, attrs, stats, true);
+
+ return bucket;
+}
+
+/*
+ * Choose the bucket to partition next.
+ *
+ * The current criterion is rather simple, chosen so that the algorithm
+ * produces buckets with about equal frequency and regular size. We
+ * select the bucket with the most sample rows (among those that may
+ * still be split), and then split it by the longest dimension.
+ *
+ * The distinct values are uniformly mapped to [0,1] interval, and this
+ * is used to compute length of the value range.
+ *
+ * NOTE: This is not the same array used for deduplication, as this
+ * contains values for all the tuples from the sample, not just
+ * the boundary values.
+ *
+ * Returns either a pointer to the bucket selected to be partitioned,
+ * or NULL if there are no buckets that may be split (i.e. all buckets
+ * contain a single distinct value).
+ *
+ * TODO Consider other partitioning criteria (v-optimal, maxdiff etc.).
+ * For example use the "bucket volume" (product of dimension
+ * lengths) to select the bucket.
+ *
+ * We need buckets containing about the same number of tuples (so
+ * about the same frequency), as that limits the error when we
+ * match the bucket partially (in that case use 1/2 the bucket).
+ *
+ * We also need buckets with "regular" size, i.e. not "narrow" in
+ * some dimensions and "wide" in the others, because that makes
+ * partial matches more likely and increases the estimation error,
+ * especially when the clauses match many buckets partially. This
+ * is especially serious for OR-clauses, because in that case any
+ * of them may add the bucket as a (partial) match. With AND-clauses
+ * all the clauses have to match the bucket, which makes this issue
+ * somewhat less pressing.
+ *
+ * For example this table:
+ *
+ * CREATE TABLE t AS SELECT i AS a, i AS b
+ * FROM generate_series(1,1000000) s(i);
+ * ALTER TABLE t ADD STATISTICS (histogram) ON (a,b);
+ * ANALYZE t;
+ *
+ * It's a very specific (and perhaps artificial) example, because
+ * every bucket always has exactly the same number of distinct
+ * values in all dimensions, which makes the partitioning tricky.
+ *
+ * Then:
+ *
+ * SELECT * FROM t WHERE a < 10 AND b < 10;
+ *
+ * is estimated to return ~120 rows, while in reality it returns 9.
+ *
+ * QUERY PLAN
+ * ----------------------------------------------------------------
+ * Seq Scan on t (cost=0.00..19425.00 rows=117 width=8)
+ * (actual time=0.185..270.774 rows=9 loops=1)
+ * Filter: ((a < 10) AND (b < 10))
+ * Rows Removed by Filter: 999991
+ *
+ * while the query using OR clauses is estimated like this:
+ *
+ * QUERY PLAN
+ * ----------------------------------------------------------------
+ * Seq Scan on t (cost=0.00..19425.00 rows=8100 width=8)
+ * (actual time=0.118..189.919 rows=9 loops=1)
+ * Filter: ((a < 10) OR (b < 10))
+ * Rows Removed by Filter: 999991
+ *
+ * which is clearly much worse. This happens because the histogram
+ * contains buckets like this:
+ *
+ * bucket 592 [3 30310] [30134 30593] => [0.000233]
+ *
+ * i.e. the length of "a" dimension is (30310-3)=30307, while the
+ * length of "b" is (30593-30134)=459. So the "b" dimension is much
+ * narrower than "a". Of course, there are buckets where "b" is the
+ * wider dimension.
+ *
+ * This is partially mitigated by selecting the "longest" dimension
+ * in partition_bucket() but that only happens after we already
+ * selected the bucket. So if we never select the bucket, we can't
+ * really fix it there.
+ *
+ * The other reason why this particular example behaves so poorly
+ * is due to the way we split the partition in partition_bucket().
+ * Currently we attempt to divide the bucket into two parts with
+ * the same number of sampled tuples (frequency), but that does not
+ * work well when all the tuples are squashed on one end of the
+ * bucket (e.g. exactly at the diagonal, as a=b). In that case we
+ * split the bucket into a tiny bucket on the diagonal, and a huge
+ * remaining part of the bucket, which is still going to be narrow
+ * and we're unlikely to fix that.
+ *
+ * So perhaps we need two partitioning strategies - one aiming to
+ * split buckets with high frequency (number of sampled rows), the
+ * other aiming to split "large" buckets. And alternating between
+ * them, somehow.
+ *
+ * TODO Allowing the bucket to degenerate to a single combination of
+ * values makes it a rather strange MCV list. Maybe we should use a
+ * higher lower boundary, or maybe make the selection criteria more
+ * complex (e.g. consider the number of rows in the bucket, etc.).
+ *
+ * That however is different from buckets 'degenerated' only for
+ * some dimensions (e.g. half of them), which is perfectly
+ * appropriate for statistics on a combination of low and high
+ * cardinality columns.
+ *
+ * TODO Consider using similar lower boundary for row count as for simple
+ * histograms, i.e. 300 tuples per bucket.
+ */
+static MVBucket
+select_bucket_to_partition(int nbuckets, MVBucket * buckets)
+{
+ int i;
+ int numrows = 0;
+ MVBucket bucket = NULL;
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ HistogramBuild data = (HistogramBuild)buckets[i]->build_data;
+ /* if the number of rows is higher, use this bucket */
+ if ((data->ndistinct > 2) &&
+ (data->numrows > numrows) &&
+ (data->numrows >= MIN_BUCKET_ROWS))
+ {
+ bucket = buckets[i];
+ numrows = data->numrows;
+ }
+ }
+
+ /* may be NULL if there are no buckets with (ndistinct > 2) */
+ return bucket;
+}
+
+/*
+ * A simple bucket partitioning implementation - we choose the longest
+ * bucket dimension, measured using the array of distinct values built
+ * at the very beginning of the build.
+ *
+ * We map all the distinct values to a [0,1] interval, uniformly
+ * distributed, and then use this to measure length. It's essentially
+ * a number of distinct values within the range, normalized to [0,1].
+ *
+ * Then we choose a 'middle' value splitting the bucket into two parts
+ * with roughly the same frequency.
+ *
+ * This splits the bucket by tweaking the existing one, and returning
+ * the new bucket (essentially shrinking the existing one in-place and
+ * returning the other "half" as a new bucket). The caller is responsible
+ * for adding the new bucket into the list of buckets.
+ *
+ * There are multiple histogram options, centered around the partitioning
+ * criteria, specifying both how to choose a bucket and the dimension
+ * most in need of a split. For a nice summary and general overview, see
+ * "rK-Hist : an R-Tree based histogram for multi-dimensional selectivity
+ * estimation" thesis by J. A. Lopez, Concordia University, p.34-37 (and
+ * possibly p. 32-34 for explanation of the terms).
+ *
+ * TODO It requires care to prevent splitting only one dimension and not
+ * splitting another one at all (which might happen easily in case
+ * of strongly dependent columns - e.g. y=x). The current algorithm
+ * minimizes this, but may still happen for perfectly dependent
+ * examples (when all the dimensions have equal length, the first
+ * one will be selected).
+ *
+ * TODO Should probably consider statistics target for the columns (e.g.
+ * to split dimensions with higher statistics target more frequently).
+ */
+static MVBucket
+partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues)
+{
+ int i;
+ int dimension;
+ int numattrs = attrs->dim1;
+
+ Datum split_value;
+ MVBucket new_bucket;
+ HistogramBuild new_data;
+
+ /* needed for sort, when looking for the split value */
+ bool isNull;
+ int nvalues = 0;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ StdAnalyzeData * mystats = NULL;
+ ScalarItem * values = (ScalarItem*)palloc0(data->numrows * sizeof(ScalarItem));
+ SortSupportData ssup;
+
+ /* looking for the split value */
+ int nrows = 1; /* number of rows below current value */
+ double delta;
+
+ /* needed when splitting the values */
+ HeapTuple * oldrows = data->rows;
+ int oldnrows = data->numrows;
+
+ /*
+ * We can't split buckets with a single distinct value (this also
+ * disqualifies NULL-only dimensions). Also, there have to be multiple
+ * sample rows (otherwise there couldn't be multiple distinct values).
+ */
+ Assert(data->ndistinct > 1);
+ Assert(data->numrows > 1);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Look for the next dimension to split.
+ */
+ delta = 0.0;
+ dimension = -1;
+
+ for (i = 0; i < numattrs; i++)
+ {
+ Datum *a, *b;
+
+ mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ /* can't split NULL-only dimension */
+ if (bucket->nullsonly[i])
+ continue;
+
+ /* can't split dimension with a single ndistinct value */
+ if (data->ndistincts[i] <= 1)
+ continue;
+
+ /* sort support for the bsearch_comparator */
+ ssup_private = &ssup;
+
+ /* search for min boundary in the distinct list */
+ a = (Datum*)bsearch(&bucket->min[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), bsearch_comparator);
+
+ b = (Datum*)bsearch(&bucket->max[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), bsearch_comparator);
+
+ /* if this dimension is 'longer', partition by it */
+ if (((b-a)*1.0 / ndistvalues[i]) > delta)
+ {
+ delta = ((b-a)*1.0 / ndistvalues[i]);
+ dimension = i;
+ }
+ }
+
+ /*
+ * If we haven't found a dimension here, we've done something
+ * wrong in select_bucket_to_partition.
+ */
+ Assert(dimension != -1);
+
+ /*
+ * Walk through the selected dimension, collect and sort the values
+ * and then choose the value to use as the new boundary.
+ */
+ mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (i = 0; i < data->numrows; i++)
+ {
+ /* remember the index of the sample row, to make the partitioning simpler */
+ values[nvalues].value = heap_getattr(data->rows[i], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+ values[nvalues].tupno = i;
+
+ /* no NULL values allowed here (we don't do splits by null-only dimensions) */
+ Assert(!isNull);
+
+ nvalues++;
+ }
+
+ /* sort the array of values */
+ qsort_arg((void *) values, nvalues, sizeof(ScalarItem),
+ compare_scalars_partition, (void *) &ssup);
+
+ /*
+ * We know there are data->ndistincts[dimension] distinct values
+ * in this dimension, and we want to split this into half, so walk
+ * through the array and stop once we see (ndistinct/2) values.
+ *
+ * We always choose the "next" value, i.e. (n/2+1)-th distinct value,
+ * and use it as an exclusive upper boundary (and inclusive lower
+ * boundary).
+ *
+ * TODO Maybe we should use "average" of the two middle distinct
+ * values (at least for even distinct counts), but that would
+ * require being able to do an average (which does not work
+ * for non-arithmetic types).
+ *
+ * TODO Another option is to look for a split that'd give about
+ * 50% tuples (not distinct values) in each partition. That
+ * might work better when there are a few very frequent
+ * values, and many rare ones.
+ */
+ delta = fabs(data->numrows);
+ split_value = values[0].value;
+
+ for (i = 1; i < data->numrows; i++)
+ {
+ /* compare via the comparator (Datum equality is wrong for by-ref types) */
+ if (compare_datums_simple(values[i-1].value, values[i].value, &ssup) != 0)
+ {
+ /* are we closer to splitting the bucket in half? */
+ if (fabs(i - data->numrows/2.0) < delta)
+ {
+ /* let's assume we'll use this value for the split */
+ split_value = values[i].value;
+ delta = fabs(i - data->numrows/2.0);
+ nrows = i;
+ }
+ }
+ }
+
+ Assert(nrows > 0);
+ Assert(nrows < data->numrows);
+
+ /* create the new bucket as an (incomplete) copy of the one being partitioned */
+ new_bucket = copy_mv_bucket(bucket, numattrs);
+ new_data = (HistogramBuild)new_bucket->build_data;
+
+ /*
+ * Do the actual split of the chosen dimension, using the split value as the
+ * upper bound for the existing bucket, and lower bound for the new one.
+ */
+ bucket->max[dimension] = split_value;
+ new_bucket->min[dimension] = split_value;
+
+ bucket->max_inclusive[dimension] = false;
+ new_bucket->min_inclusive[dimension] = true;
+
+ /*
+ * Redistribute the sample tuples using the 'ScalarItem->tupno'
+ * index. We know 'nrows' rows should remain in the original
+ * bucket and the rest goes to the new one.
+ */
+
+ data->rows = (HeapTuple*)palloc0(nrows * sizeof(HeapTuple));
+ new_data->rows = (HeapTuple*)palloc0((oldnrows - nrows) * sizeof(HeapTuple));
+
+ data->numrows = nrows;
+ new_data->numrows = (oldnrows - nrows);
+
+ /*
+ * The first nrows should go to the first bucket, the rest should
+ * go to the new one. Use the tupno field to get the actual HeapTuple
+ * row from the original array of sample rows.
+ */
+ for (i = 0; i < nrows; i++)
+ memcpy(&data->rows[i], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ for (i = nrows; i < oldnrows; i++)
+ memcpy(&new_data->rows[i-nrows], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(new_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each partition.
+ */
+ for (i = 0; i < numattrs; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(new_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+ pfree(values);
+
+ return new_bucket;
+}
+
+/*
+ * Copy a histogram bucket. The copy does not include the build-time
+ * data, i.e. sampled rows etc.
+ */
+static MVBucket
+copy_mv_bucket(MVBucket bucket, uint32 ndimensions)
+{
+ /* TODO allocate as a single piece (including all the fields) */
+ MVBucket new_bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+ HistogramBuild data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /*
+ * Copy only the attributes that will stay the same after the split;
+ * the rest will be recomputed after the split.
+ */
+
+ /* allocate the per-dimension arrays */
+ new_bucket->nullsonly = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ new_bucket->min_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+ new_bucket->max_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* lower/upper boundaries */
+ new_bucket->min = (Datum*)palloc0(ndimensions * sizeof(Datum));
+ new_bucket->max = (Datum*)palloc0(ndimensions * sizeof(Datum));
+
+ /* copy data */
+ memcpy(new_bucket->nullsonly, bucket->nullsonly, ndimensions * sizeof(bool));
+
+ memcpy(new_bucket->min_inclusive, bucket->min_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->min, bucket->min, ndimensions*sizeof(Datum));
+
+ memcpy(new_bucket->max_inclusive, bucket->max_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->max, bucket->max, ndimensions*sizeof(Datum));
+
+ /* allocate and copy the interesting part of the build data */
+ data->ndistincts = (uint32*)palloc0(ndimensions * sizeof(uint32));
+
+ new_bucket->build_data = data;
+
+ return new_bucket;
+}
+
+/*
+ * Counts the number of distinct value combinations in the bucket. This
+ * just copies the Datum values into a simple array, sorts it using a
+ * multi-column comparator, and counts distinct combinations by
+ * comparing neighboring items. That only works reliably for
+ * pass-by-value data types (assuming they don't use collations etc.)
+ *
+ * TODO This might evaluate and store the distinct counts for all
+ * possible attribute combinations. The assumption is this might be
+ * useful for estimating things like GROUP BY cardinalities (e.g.
+ * in cases when some buckets contain a lot of low-frequency
+ * combinations, and other buckets contain few high-frequency ones).
+ *
+ * But it's unclear whether it's worth the price. Computing this
+ * is actually quite cheap, because it may be evaluated at the very
+ * end, when the buckets are rather small (so sorting it in 2^N ways
+ * is not a big deal). Assuming the partitioning algorithm does not
+ * use these values to make its decisions, of course (the current
+ * algorithm does not).
+ *
+ * The overhead of storing, fetching and parsing the data is more
+ * concerning - adding 2^N values per bucket (even if it's just
+ * a 1B or 2B value) would significantly bloat the histogram, and
+ * thus its impact on the optimizer, which is not really desirable.
+ *
+ * TODO This only updates the ndistinct for the sample (or bucket), but
+ * we eventually need an estimate of the total number of distinct
+ * values in the dataset. We can either use the current 1D approach
+ * (i.e., if it's more than 10% of the sample, assume the count is
+ * proportional to the number of rows), or implement the estimator
+ * suggested in the article, supposedly giving 'optimal' estimates
+ * (w.r.t. probability of error).
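+ *
+ * A small example of the counting step below: for sorted combinations
+ * (1,1), (1,1), (1,2), (2,2), comparing each item with its predecessor
+ * yields 3 distinct combinations.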
+ */
+static void
+update_bucket_ndistinct(MVBucket bucket, int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ int numrows = data->numrows;
+
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * We could collect this while walking through the attributes in the
+ * caller (as it is, heap_getattr gets called twice for each value).
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(numrows * sizeof(Datum) * numattrs);
+ bool *isnull = (bool*)palloc0(numrows * sizeof(bool) * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* prepare the sort functions for all the dimensions */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* collect the values */
+ for (i = 0; i < numrows; i++)
+ for (j = 0; j < numattrs; j++)
+ items[i].values[j]
+ = heap_getattr(data->rows[i], attrs->values[j],
+ stats[j]->tupDesc, &items[i].isnull[j]);
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ data->ndistinct = 1;
+
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ data->ndistinct += 1;
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+}
+
+/*
+ * Count distinct values per bucket dimension.
+ */
+static void
+update_dimension_ndistinct(MVBucket bucket, int dimension, int2vector *attrs,
+ VacAttrStats ** stats, bool update_boundaries)
+{
+ int j;
+ int nvalues = 0;
+ bool isNull;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ Datum * values = (Datum*)palloc0(data->numrows * sizeof(Datum));
+ SortSupportData ssup;
+
+ StdAnalyzeData * mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* we may already know this is a NULL-only dimension */
+ if (bucket->nullsonly[dimension])
+ {
+ data->ndistincts[dimension] = 1;
+ pfree(values);
+ return;
+ }
+
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (j = 0; j < data->numrows; j++)
+ {
+ values[nvalues] = heap_getattr(data->rows[j], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+
+ /* ignore NULL values */
+ if (! isNull)
+ nvalues++;
+ }
+
+ /* there's always at least 1 distinct value (may be NULL) */
+ data->ndistincts[dimension] = 1;
+
+ /* if there are only NULL values in the column, mark it so and bail
+ * out (the remaining dimensions are handled by the caller) */
+ if (nvalues == 0)
+ {
+ pfree(values);
+ bucket->nullsonly[dimension] = true;
+ return;
+ }
+
+ /* sort the array of (pass-by-value) datums */
+ qsort_arg((void *) values, nvalues, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /*
+ * Update min/max boundaries to the smallest bounding box. Generally, this
+ * needs to be done only when constructing the initial bucket.
+ */
+ if (update_boundaries)
+ {
+ /* store the min/max values */
+ bucket->min[dimension] = values[0];
+ bucket->min_inclusive[dimension] = true;
+
+ bucket->max[dimension] = values[nvalues-1];
+ bucket->max_inclusive[dimension] = true;
+ }
+
+ /*
+ * Walk through the array and count distinct values by comparing
+ * succeeding values.
+ *
+ * FIXME This only works for pass-by-value types (i.e. not VARCHARs
+ * etc.). Although thanks to the deduplication it might work
+ * even for those types (equal values will get the same item
+ * in the deduplicated array).
+ */
+ for (j = 1; j < nvalues; j++)
+ {
+ if (values[j] != values[j-1])
+ data->ndistincts[dimension] += 1;
+ }
+
+ pfree(values);
+}
+
+/*
+ * A properly built histogram must not contain buckets mixing NULL and
+ * non-NULL values in a single dimension. Each dimension either is
+ * marked as 'nulls only' (and thus contains only NULL values), or
+ * it must not contain any NULL values.
+ *
+ * Therefore, if the sample contains NULL values in any of the columns,
+ * it's necessary to build those NULL-buckets. This is done recursively,
+ * using the following algorithm, operating on a single bucket:
+ *
+ * (1) Check that all dimensions are well-formed (not mixing NULL
+ * and non-NULL values).
+ *
+ * (2) If all dimensions are well-formed, terminate.
+ *
+ * (3) If a dimension contains only NULL values, but is not
+ * marked as NULL-only, mark it as NULL-only and run the
+ * algorithm again (on this bucket).
+ *
+ * (4) If a dimension mixes NULL and non-NULL values, split the
+ * bucket into two parts - one with NULL values, one with
+ * non-NULL values (replacing the current one). Then run
+ * the algorithm on both buckets.
+ *
+ * This is executed in a recursive manner, but the number of executions
+ * should be quite low - limited by the number of NULL-buckets. Also,
+ * in each branch the number of nested calls is limited by the number
+ * of dimensions (attributes) of the histogram.
+ *
+ * At the end, there should be buckets with no mixed dimensions. The
+ * number of buckets produced by this algorithm is rather limited - with
+ * N dimensions, there may be only 2^N such buckets (each dimension may
+ * be either NULL or non-NULL). So with 8 dimensions (current value of
+ * MVSTATS_MAX_DIMENSIONS) there may be only 256 such buckets.
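+ *
+ * For example (a sketch): with 2 dimensions (a,b) and NULL values
+ * present in both columns, this can produce at most 4 buckets:
+ * (NOT NULL, NOT NULL), (NOT NULL, NULL), (NULL, NOT NULL) and
+ * (NULL, NULL).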
+ *
+ * After this, the 'regular' bucket-split algorithm runs, further
+ * optimizing the histogram.
+ */
+static void
+create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int null_dim = -1;
+ int null_count = 0;
+ bool null_found = false;
+ MVBucket bucket, null_bucket;
+ int null_idx, curr_idx;
+ HistogramBuild data, null_data;
+
+ /* remember original values from the bucket */
+ int numrows;
+ HeapTuple *oldrows = NULL;
+
+ Assert(bucket_idx < histogram->nbuckets);
+ Assert(histogram->ndimensions == attrs->dim1);
+
+ bucket = histogram->buckets[bucket_idx];
+ data = (HistogramBuild)bucket->build_data;
+
+ numrows = data->numrows;
+ oldrows = data->rows;
+
+ /*
+ * Walk through all rows / dimensions, and stop once we find NULL
+ * in a dimension not yet marked as NULL-only.
+ */
+ for (i = 0; i < data->numrows; i++)
+ {
+ /*
+ * FIXME We don't need to start from the first attribute
+ * here - we can start from the last known dimension.
+ */
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ /* Is this a NULL-only dimension? If yes, skip. */
+ if (bucket->nullsonly[j])
+ continue;
+
+ /* found a NULL in that dimension? */
+ if (heap_attisnull(data->rows[i], attrs->values[j]))
+ {
+ null_found = true;
+ null_dim = j;
+ break;
+ }
+ }
+
+ /* terminate if we found attribute with NULL values */
+ if (null_found)
+ break;
+ }
+
+ /* no regular dimension contains NULL values => we're done */
+ if (! null_found)
+ return;
+
+ /* walk through the rows again, count NULL values in 'null_dim' */
+ for (i = 0; i < data->numrows; i++)
+ {
+ if (heap_attisnull(data->rows[i], attrs->values[null_dim]))
+ null_count += 1;
+ }
+
+ Assert(null_count <= data->numrows);
+
+ /*
+ * If (null_count == numrows) the dimension already is NULL-only,
+ * but is not yet marked as such. It's enough to mark it and
+ * repeat the process recursively (until we run out of dimensions).
+ */
+ if (null_count == data->numrows)
+ {
+ bucket->nullsonly[null_dim] = true;
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+ return;
+ }
+
+ /*
+ * We have to split the bucket into two - one with NULL values in
+ * the dimension, one with non-NULL values. We don't need to sort
+ * the data or anything, but otherwise it's similar to what's done
+ * in partition_bucket().
+ */
+
+ /* create bucket with NULL-only dimension 'dim' */
+ null_bucket = copy_mv_bucket(bucket, histogram->ndimensions);
+ null_data = (HistogramBuild)null_bucket->build_data;
+
+ /* remember the current array info */
+ oldrows = data->rows;
+ numrows = data->numrows;
+
+ /* we'll keep non-NULL values in the current bucket */
+ data->numrows = (numrows - null_count);
+ data->rows
+ = (HeapTuple*)palloc0(data->numrows * sizeof(HeapTuple));
+
+ /* and the NULL values will go to the new one */
+ null_data->numrows = null_count;
+ null_data->rows
+ = (HeapTuple*)palloc0(null_data->numrows * sizeof(HeapTuple));
+
+ /* mark the dimension as NULL-only (in the new bucket) */
+ null_bucket->nullsonly[null_dim] = true;
+
+ /* walk through the sample rows and distribute them accordingly */
+ null_idx = 0;
+ curr_idx = 0;
+ for (i = 0; i < numrows; i++)
+ {
+ if (heap_attisnull(oldrows[i], attrs->values[null_dim]))
+ /* NULL => copy to the new bucket */
+ memcpy(&null_data->rows[null_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ else
+ memcpy(&data->rows[curr_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ }
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(null_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each
+ * bucket (NULL is not a value, so 0, and the other bucket got
+ * all the ndistinct values).
+ */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(null_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+
+ /* add the NULL bucket to the histogram */
+ histogram->buckets[histogram->nbuckets++] = null_bucket;
+
+ /*
+ * And now run the function recursively on both buckets (the new
+ * one first, because the call may change the number of buckets, and
+ * it's used as an index).
+ */
+ create_null_buckets(histogram, (histogram->nbuckets-1), attrs, stats);
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+
+}
+
+/*
+ * We need to pass the SortSupport to the comparator, but bsearch()
+ * has no 'context' parameter, so we use a global variable (ugly).
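+ *
+ * A sketch of the intended use (the caller sets the global before
+ * searching the deduplicated values):
+ *
+ * ssup_private = &ssup;
+ * match = bsearch(&value, values, nvalues, sizeof(Datum),
+ * bsearch_comparator);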
+ */
+static int
+bsearch_comparator(const void * a, const void * b)
+{
+ Assert(ssup_private != NULL);
+ return compare_scalars_simple(a, b, (void*)ssup_private);
+}
+
+/*
+ * SRF with details about buckets of a histogram:
+ *
+ * - bucket ID (0 .. nbuckets-1)
+ * - min values (string array)
+ * - max values (string array)
+ * - nulls only (boolean array)
+ * - min inclusive flags (boolean array)
+ * - max inclusive flags (boolean array)
+ * - frequency (double precision)
+ *
+ * The input is the OID of the statistics, and there are no rows
+ * returned if the statistics contains no histogram (or if there's no
+ * statistics for the OID).
+ *
+ * The second parameter (type) determines what values will be returned
+ * in the (minvals, maxvals) columns. There are three possible values:
+ *
+ * 0 (actual values)
+ * -----------------
+ * - prints actual values
+ * - using the output function of the data type (as string)
+ * - handy for investigating the histogram
+ *
+ * 1 (distinct index)
+ * ------------------
+ * - prints index of the distinct value (into the serialized array)
+ * - makes it easier to spot neighbor buckets, etc.
+ * - handy for plotting the histogram
+ *
+ * 2 (normalized distinct index)
+ * -----------------------------
+ * - prints index of the distinct value, but normalized into [0,1]
+ * - similar to 1, but shows how 'long' the bucket range is
+ * - handy for plotting the histogram
+ *
+ * When plotting the histogram, be careful as the (1) and (2) options
+ * skew the lengths by distributing the distinct values uniformly. For
+ * data types without a clear meaning of 'distance' (e.g. strings) that
+ * is not a big deal, but for numbers it may be confusing.
+ */
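+
+/*
+ * For example (a sketch - the statistics OID comes from the
+ * pg_mv_statistic catalog):
+ *
+ * SELECT * FROM pg_mv_histogram_buckets(
+ *     (SELECT oid FROM pg_mv_statistic LIMIT 1), 0);
+ */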
+PG_FUNCTION_INFO_V1(pg_mv_histogram_buckets);
+
+Datum
+pg_mv_histogram_buckets(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ Oid mvoid = PG_GETARG_OID(0);
+ int otype = PG_GETARG_INT32(1);
+
+ if ((otype < 0) || (otype > 2))
+ elog(ERROR, "invalid output type specified");
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MVSerializedHistogram histogram;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ histogram = load_mv_histogram(mvoid);
+
+ funcctx->user_fctx = histogram;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = histogram->nbuckets;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples
+ * from raw C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+ double bucket_size = 1.0;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MVSerializedHistogram histogram;
+ MVSerializedBucket bucket;
+
+ histogram = (MVSerializedHistogram)funcctx->user_fctx;
+
+ Assert(call_cntr < histogram->nbuckets);
+
+ bucket = histogram->buckets[call_cntr];
+
+ stakeys = find_mv_attnums(mvoid, &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple.
+ * This should be an array of C strings which will
+ * be processed later by the type input functions.
+ */
+ values = (char **) palloc(9 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+ values[3] = (char *) palloc0(1024 * sizeof(char));
+ values[4] = (char *) palloc0(1024 * sizeof(char));
+ values[5] = (char *) palloc0(1024 * sizeof(char));
+
+ values[6] = (char *) palloc(64 * sizeof(char));
+ values[7] = (char *) palloc(64 * sizeof(char));
+ values[8] = (char *) palloc(64 * sizeof(char));
+
+ /* XXX only really needed when printing the actual values (otype == 0) */
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * histogram->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * histogram->ndimensions);
+
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* bucket ID */
+
+ /*
+ * Print the bucket boundaries. For otype=0 this prints the actual
+ * values (using the output function of the attribute type), otherwise
+ * it prints indexes into the deduplicated arrays - those arrays are
+ * sorted, so even the indexes are quite useful.
+ */
+
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bucket_size *= (bucket->max[i] - bucket->min[i]) * 1.0
+ / (histogram->nvalues[i]-1);
+
+ /* print the actual values, i.e. use output function etc. */
+ if (otype == 0)
+ {
+ Datum minval, maxval;
+ Datum minout, maxout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %s}";
+
+ minval = histogram->values[i][bucket->min[i]];
+ minout = FunctionCall1(&fmgrinfo[i], minval);
+
+ maxval = histogram->values[i][bucket->max[i]];
+ maxout = FunctionCall1(&fmgrinfo[i], maxval);
+
+ snprintf(buff, 1024, format, values[1], DatumGetPointer(minout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], DatumGetPointer(maxout));
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+ else if (otype == 1)
+ {
+ format = "%s, %d";
+ if (i == 0)
+ format = "{%s%d";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %d}";
+
+ snprintf(buff, 1024, format, values[1], bucket->min[i]);
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], bucket->max[i]);
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+ else
+ {
+ format = "%s, %f";
+ if (i == 0)
+ format = "{%s%f";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %f}";
+
+ snprintf(buff, 1024, format, values[1],
+ bucket->min[i] * 1.0 / (histogram->nvalues[i]-1));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2],
+ bucket->max[i] * 1.0 / (histogram->nvalues[i]-1));
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == histogram->ndimensions-1)
+ format = "%s, %s}";
+
+ snprintf(buff, 1024, format, values[3], bucket->nullsonly[i] ? "t" : "f");
+ strncpy(values[3], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[4], bucket->min_inclusive[i] ? "t" : "f");
+ strncpy(values[4], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[5], bucket->max_inclusive[i] ? "t" : "f");
+ strncpy(values[5], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[6], 64, "%f", bucket->ntuples); /* frequency */
+ snprintf(values[7], 64, "%f", bucket->ntuples / bucket_size); /* density */
+ snprintf(values[8], 64, "%f", bucket_size); /* bucket_size */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (not strictly necessary - the memory context would
+ * release this anyway) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+ pfree(values[4]);
+ pfree(values[5]);
+ pfree(values[6]);
+ pfree(values[7]);
+ pfree(values[8]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
+
+#ifdef DEBUG_MVHIST
+/*
+ * prints debugging info about matched histogram buckets (full/partial)
+ *
+ * XXX Currently works only for INT data type.
+ */
+void
+debug_histogram_matches(MVSerializedHistogram mvhist, char *matches)
+{
+ int i, j;
+
+ float ffull = 0, fpartial = 0;
+ int nfull = 0, npartial = 0;
+
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ char ranges[1024];
+
+ if (! matches[i])
+ continue;
+
+ /* increment the counters */
+ nfull += (matches[i] == MVSTATS_MATCH_FULL) ? 1 : 0;
+ npartial += (matches[i] == MVSTATS_MATCH_PARTIAL) ? 1 : 0;
+
+ /* and also update the frequencies */
+ ffull += (matches[i] == MVSTATS_MATCH_FULL) ? bucket->ntuples : 0;
+ fpartial += (matches[i] == MVSTATS_MATCH_PARTIAL) ? bucket->ntuples : 0;
+
+ memset(ranges, 0, sizeof(ranges));
+
+ /* build ranges for all the dimensions */
+ for (j = 0; j < mvhist->ndimensions; j++)
+ {
+ sprintf(ranges + strlen(ranges), " [%d %d]",
+ DatumGetInt32(mvhist->values[j][bucket->min[j]]),
+ DatumGetInt32(mvhist->values[j][bucket->max[j]]));
+ }
+
+ elog(WARNING, "bucket %d %s => %d [%f]", i, ranges, matches[i], bucket->ntuples);
+ }
+
+ elog(WARNING, "full=%f partial=%f (%f)", ffull, fpartial, (ffull + 0.5 * fpartial));
+}
+#endif
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 2c22d31..b693f36 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2109,9 +2109,9 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
- " deps_enabled, mcv_enabled,\n"
- " deps_built, mcv_built,\n"
- " mcv_max_items,\n"
+ " deps_enabled, mcv_enabled, hist_enabled,\n"
+ " deps_built, mcv_built, hist_built,\n"
+ " mcv_max_items, hist_max_buckets,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
@@ -2154,8 +2154,17 @@ describeOneTableDetails(const char *schemaname,
first = false;
}
+ if (!strcmp(PQgetvalue(result, i, 6), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", histogram");
+ else
+ appendPQExpBuffer(&buf, "(histogram");
+ first = false;
+ }
+
appendPQExpBuffer(&buf, ") ON (%s)",
- PQgetvalue(result, i, 9));
+ PQgetvalue(result, i, 12));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index 3529b03..37f473f 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -39,13 +39,16 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
+ bool hist_enabled; /* build histogram? */
- /* MCV size */
+ /* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
+ int32 hist_max_buckets; /* max histogram buckets */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
+ bool hist_built; /* histogram was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -53,6 +56,7 @@ CATALOG(pg_mv_statistic,3381)
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
+ bytea stahist; /* MV histogram (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -68,18 +72,22 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 12
+#define Natts_pg_mv_statistic 16
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
#define Anum_pg_mv_statistic_staowner 4
#define Anum_pg_mv_statistic_deps_enabled 5
#define Anum_pg_mv_statistic_mcv_enabled 6
-#define Anum_pg_mv_statistic_mcv_max_items 7
-#define Anum_pg_mv_statistic_deps_built 8
-#define Anum_pg_mv_statistic_mcv_built 9
-#define Anum_pg_mv_statistic_stakeys 10
-#define Anum_pg_mv_statistic_stadeps 11
-#define Anum_pg_mv_statistic_stamcv 12
+#define Anum_pg_mv_statistic_hist_enabled 7
+#define Anum_pg_mv_statistic_mcv_max_items 8
+#define Anum_pg_mv_statistic_hist_max_buckets 9
+#define Anum_pg_mv_statistic_deps_built 10
+#define Anum_pg_mv_statistic_mcv_built 11
+#define Anum_pg_mv_statistic_hist_built 12
+#define Anum_pg_mv_statistic_stakeys 13
+#define Anum_pg_mv_statistic_stadeps 14
+#define Anum_pg_mv_statistic_stamcv 15
+#define Anum_pg_mv_statistic_stahist 16
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index b16eebc..19a490a 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2674,6 +2674,10 @@ DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f
DESCR("multi-variate statistics: MCV list info");
DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
DESCR("details about MCV list items");
+DATA(insert OID = 3375 ( pg_mv_stats_histogram_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_histogram_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: histogram info");
+DATA(insert OID = 3374 ( pg_mv_histogram_buckets PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 2 0 2249 "26 23" "{26,23,23,1009,1009,1000,1000,1000,701,701,701}" "{i,i,o,o,o,o,o,o,o,o,o}" "{oid,otype,index,minvals,maxvals,nullsonly,mininclusive,maxinclusive,frequency,density,bucket_size}" _null_ _null_ pg_mv_histogram_buckets _null_ _null_ _null_ ));
+DESCR("details about histogram buckets");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 2bcd582..8c50bfb 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -654,10 +654,12 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
bool mcv_enabled; /* MCV list enabled */
+ bool hist_enabled; /* histogram enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
bool mcv_built; /* MCV list built */
+ bool hist_built; /* histogram built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 4535db7..f05a517 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -92,6 +92,123 @@ typedef MCVListData *MCVList;
#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
/*
+ * Multivariate histograms
+ */
+typedef struct MVBucketData {
+
+ /* Frequencies of this bucket. */
+ float ntuples; /* frequency of tuples in this bucket */
+
+ /*
+ * Information about dimensions being NULL-only. Not yet used.
+ */
+ bool *nullsonly;
+
+ /* lower boundaries - values and information about the inequalities */
+ Datum *min;
+ bool *min_inclusive;
+
+ /* upper boundaries - values and information about the inequalities */
+ Datum *max;
+ bool *max_inclusive;
+
+ /* used when building the histogram (not serialized/deserialized) */
+ void *build_data;
+
+} MVBucketData;
+
+typedef MVBucketData *MVBucket;
+
+
+typedef struct MVHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ MVBucket *buckets; /* array of buckets */
+
+} MVHistogramData;
+
+typedef MVHistogramData *MVHistogram;
+
+/*
+ * Histogram in a partially serialized form, with deduplicated boundary
+ * values etc.
+ *
+ * TODO add more detailed description here
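+ *
+ * A rough sketch of the deduplication: if the boundary values in
+ * dimension 0 are {10, 20, 30}, then values[0] = {10, 20, 30} and
+ * nvalues[0] = 3, and a bucket with min[0] = 0 and max[0] = 2 covers
+ * the range [10, 30] in that dimension.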
+ */
+
+typedef struct MVSerializedBucketData {
+
+ /* Frequencies of this bucket. */
+ float ntuples; /* frequency of tuples in this bucket */
+
+ /*
+ * Information about dimensions being NULL-only. Not yet used.
+ */
+ bool *nullsonly;
+
+ /* lower boundaries - values and information about the inequalities */
+ uint16 *min;
+ bool *min_inclusive;
+
+ /* indexes of upper boundaries - values and information about the
+ * inequalities (exclusive vs. inclusive) */
+ uint16 *max;
+ bool *max_inclusive;
+
+} MVSerializedBucketData;
+
+typedef MVSerializedBucketData *MVSerializedBucket;
+
+typedef struct MVSerializedHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ /*
+ * keep this the same as in MVHistogramData, because deserialization
+ * relies on the buckets field being at the same offset
+ */
+ MVSerializedBucket *buckets; /* array of buckets */
+
+ /*
+ * serialized boundary values, one array per dimension, deduplicated
+ * (the min/max indexes point into these arrays)
+ */
+ int *nvalues;
+ Datum **values;
+
+} MVSerializedHistogramData;
+
+typedef MVSerializedHistogramData *MVSerializedHistogram;
+
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_HIST_MAGIC 0x7F8C5670 /* marks serialized bytea */
+#define MVSTAT_HIST_TYPE_BASIC 1 /* basic histogram type */
+
+/*
+ * Limits used for max_buckets option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_HIST_MIN_BUCKETS, and we cannot
+ * have more than MVSTAT_HIST_MAX_BUCKETS buckets.
+ *
+ * This is just a boundary for the 'max' threshold - the actual
+ * histogram may use fewer buckets than MVSTAT_HIST_MAX_BUCKETS.
+ *
+ * TODO The MVSTAT_HIST_MIN_BUCKETS should be related to the number of
+ * attributes (MVSTATS_MAX_DIMENSIONS) because of NULL-buckets.
+ * There should be at least 2^N buckets, otherwise we may be unable
+ * to build the NULL buckets.
+ */
+#define MVSTAT_HIST_MIN_BUCKETS 128 /* min number of buckets */
+#define MVSTAT_HIST_MAX_BUCKETS 16384 /* max number of buckets */
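+
+/*
+ * For example (using the syntax from the regression tests), this
+ * requests a histogram with at most 1024 buckets:
+ *
+ * CREATE STATISTICS s ON t (a, b, c) WITH (histogram, max_buckets = 1024);
+ */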
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
@@ -99,20 +216,25 @@ typedef MCVListData *MCVList;
MVDependencies load_mv_dependencies(Oid mvoid);
MCVList load_mv_mcvlist(Oid mvoid);
+MVSerializedHistogram load_mv_histogram(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
VacAttrStats **stats);
+bytea * serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
MCVList deserialize_mv_mcvlist(bytea * data);
+MVSerializedHistogram deserialize_mv_histogram(bytea * data);
/*
* Returns index of the attribute number within the vector (i.e. a
* dimension within the stats).
*/
int mv_get_index(AttrNumber varattno, int2vector * stakeys);
int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
@@ -121,6 +243,8 @@ extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_mcvlist_items(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_histogram_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_histogram_buckets(PG_FUNCTION_ARGS);
MVDependencies
build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
@@ -130,10 +254,20 @@ MCVList
build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int *numrows_filtered);
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total);
+
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+void update_mv_stats(Oid relid, MVDependencies dependencies,
+ MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats);
+#ifdef DEBUG_MVHIST
+extern void debug_histogram_matches(MVSerializedHistogram mvhist, char *matches);
+#endif
+
#endif
diff --git a/src/test/regress/expected/mv_histogram.out b/src/test/regress/expected/mv_histogram.out
new file mode 100644
index 0000000..e830816
--- /dev/null
+++ b/src/test/regress/expected/mv_histogram.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s7 ON mv_histogram (unknown_column) WITH (histogram);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s7 ON mv_histogram (a) WITH (histogram);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s7 ON mv_histogram (a, a) WITH (histogram);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s7 ON mv_histogram (a, a, b) WITH (histogram);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing histogram statistics
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (dependencies, max_buckets=200);
+ERROR: option 'histogram' is required by other options(s)
+-- invalid max_buckets value / too low
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=10);
+ERROR: minimum number of buckets is 128
+-- invalid max_buckets value / too high
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=100000);
+ERROR: maximum number of buckets is 16384
+-- correct command
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (histogram);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s8 ON mv_histogram (a, b, c) WITH (histogram);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s9 ON mv_histogram (a, b, c, d) WITH (histogram);
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+DROP TABLE mv_histogram;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 66071d8..1a1a4ca 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1375,7 +1375,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
length(s.stadeps) AS depsbytes,
pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
length(s.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo,
+ length(s.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(s.stahist) AS histinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 85d94f1..a885235 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,4 +112,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies mv_mcv
+test: mv_dependencies mv_mcv mv_histogram
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 6584d73..2efdcd7 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -164,3 +164,4 @@ test: event_trigger
test: stats
test: mv_dependencies
test: mv_mcv
+test: mv_histogram
diff --git a/src/test/regress/sql/mv_histogram.sql b/src/test/regress/sql/mv_histogram.sql
new file mode 100644
index 0000000..27c2510
--- /dev/null
+++ b/src/test/regress/sql/mv_histogram.sql
@@ -0,0 +1,176 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s7 ON mv_histogram (unknown_column) WITH (histogram);
+
+-- single column
+CREATE STATISTICS s7 ON mv_histogram (a) WITH (histogram);
+
+-- single column, duplicated
+CREATE STATISTICS s7 ON mv_histogram (a, a) WITH (histogram);
+
+-- two columns, one duplicated
+CREATE STATISTICS s7 ON mv_histogram (a, a, b) WITH (histogram);
+
+-- unknown option
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (unknown_option);
+
+-- missing histogram statistics
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (dependencies, max_buckets=200);
+
+-- invalid max_buckets value / too low
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=10);
+
+-- invalid max_buckets value / too high
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=100000);
+
+-- correct command
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (histogram);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+
+DROP TABLE mv_histogram;
+
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s8 ON mv_histogram (a, b, c) WITH (histogram);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mv_histogram;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s9 ON mv_histogram (a, b, c, d) WITH (histogram);
+
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+DROP TABLE mv_histogram;
--
2.1.0
0006-multi-statistics-estimation.patch
From 3a564dbf9aa2c734d80c5e385f105cf8a48da1f5 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Fri, 6 Feb 2015 01:42:38 +0100
Subject: [PATCH 6/9] multi-statistics estimation
The general idea is that a probability (which is what selectivity is)
can be split into a product of conditional probabilities like this:
P(A & B & C) = P(A & B) * P(C|A & B)
If we assume that C and B are independent, the last part may be
simplified like this
P(A & B & C) = P(A & B) * P(C|A)
so we only need probabilities on [A,B] and [C,A] to compute the
original probability.
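For instance (a made-up example), suppose each of the three conditions
matches 10% of the table, and that B and C are perfectly determined by
A. The independence assumption then gives 0.1 * 0.1 * 0.1 = 0.001,
while the conditional form gives P(A & B) * P(C|A) = 0.1 * 1.0 = 0.1,
which is the correct selectivity.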
The implementation works in the other direction, though. We know what
probability P(A & B & C) we need to compute, and also what statistics
are available.
So we search for a combination of statistics covering the clauses in
an optimal way (most clauses covered, most dependencies exploited).
There are two possible approaches - exhaustive and greedy. The
exhaustive one walks through all permutations of stats using dynamic
programming, so it's guaranteed to find the optimal solution, but it
soon gets very slow as it's roughly O(N!). The dynamic programming may
improve that a bit, but it's still far too expensive for large numbers
of statistics (on a single table).
The greedy algorithm is very simple - at every step it chooses the
statistic that looks best at that point. That may not guarantee the
globally best solution (but maybe it does?), but it only needs N steps
to find the solution, so it's very
fast (processing the selected stats is usually way more expensive).
There's a GUC for selecting the search algorithm
mvstat_search = {'greedy', 'exhaustive'}
The default value is 'greedy' as that's much safer (with respect to
runtime). See choose_mv_statistics().
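For instance, to try the exhaustive search in a session:
    SET mvstat_search = 'exhaustive';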
Once we have found a sequence of statistics, we apply them to the
clauses using the conditional probabilities. We process the selected
stats one by one, and for each we select the clauses to estimate and
the conditions to apply. See clauselist_selectivity() for more details.
Limitations
-----------
It's still true that each clause at a given level has to be covered by
a single MV statistic. So with this query
WHERE (clause1) AND (clause2) AND (clause3 OR clause4)
each parenthesized clause has to be covered by a single multivariate
statistic.
Clauses not covered by a single statistic at this level will be passed
to clause_selectivity() but this will treat them as a collection of
simpler clauses (connected by AND or OR), and the clauses from the
previous level will be used as conditions.
So using the same example, the last clause will be passed to
clause_selectivity() with 'clause1' and 'clause2' as conditions, and it
will be processed using multivariate stats if possible.
The other limitation is that all the expressions have to be
mv-compatible, i.e. there can't be a mix of expressions. If this is
violated, the clause may be passed to the next level (just like with
list of clauses not covered by a single statistics), which splits that
into clauses handled by multivariate stats and clauses handler by
regular statistics.
rework clauselist_selectivity_or to handle OR-clauses correctly
We might invent a completely new set of functions here, resembling
clauselist_selectivity but adapting the ideas to OR-clauses.
But luckily we know that each OR-clause
(a OR b OR c)
may be rewritten as an equivalent AND-clause using negation:
NOT ((NOT a) AND (NOT b) AND (NOT c))
And that's something we can pass to clauselist_selectivity.
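So the selectivity of the OR-clause may be computed as
    P(a OR b OR c) = 1 - P((NOT a) AND (NOT b) AND (NOT c))
with the AND-clause estimated by clauselist_selectivity.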
---
contrib/file_fdw/file_fdw.c | 3 +-
contrib/postgres_fdw/postgres_fdw.c | 11 +-
src/backend/optimizer/path/clausesel.c | 1990 ++++++++++++++++++++++++++------
src/backend/optimizer/path/costsize.c | 23 +-
src/backend/optimizer/util/orclauses.c | 4 +-
src/backend/utils/adt/selfuncs.c | 17 +-
src/backend/utils/misc/guc.c | 20 +
src/backend/utils/mvstats/README.stats | 166 +++
src/include/optimizer/cost.h | 6 +-
src/include/utils/mvstats.h | 8 +
10 files changed, 1890 insertions(+), 358 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index dc035d7..8f11b7a 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -969,7 +969,8 @@ estimate_size(PlannerInfo *root, RelOptInfo *baserel,
baserel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 76d0e15..e78f140 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -498,7 +498,8 @@ postgresGetForeignRelSize(PlannerInfo *root,
fpinfo->local_conds,
baserel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
@@ -2149,7 +2150,8 @@ estimate_path_cost_size(PlannerInfo *root,
local_param_join_conds,
foreignrel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
local_sel *= fpinfo->local_conds_sel;
rows = clamp_row_est(rows * local_sel);
@@ -3618,7 +3620,8 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
fpinfo->local_conds,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
/*
@@ -3637,7 +3640,7 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
*/
fpinfo->joinclause_sel = clauselist_selectivity(root, fpinfo->joinclauses,
0, fpinfo->jointype,
- extra->sjinfo);
+ extra->sjinfo, NIL);
}
fpinfo->server = GetForeignServer(joinrel->serverid);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 0de2418..c1b8999 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -29,6 +29,8 @@
#include "utils/selfuncs.h"
#include "utils/typcache.h"
+#include "miscadmin.h"
+
/*
* Data structure for accumulating info about possible range-query
@@ -44,6 +46,13 @@ typedef struct RangeQueryClause
Selectivity hibound; /* Selectivity of a var < something clause */
} RangeQueryClause;
+static Selectivity clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
+
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
@@ -60,23 +69,25 @@ static int count_mv_attnums(List *clauses, Index relid, int type);
static int count_varnos(List *clauses, Index *relid);
+static List *clauses_matching_statistic(List **clauses, MVStatisticInfo *statistic,
+ Index relid, int types, bool remove);
+
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Index relid, List *stats);
-static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
-
-static List *clauselist_mv_split(PlannerInfo *root, Index relid,
- List *clauses, List **mvclauses,
- MVStatisticInfo *mvstats, int types);
-
static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats, List *clauses,
+ List *conditions, bool is_or);
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats,
- bool *fullmatch, Selectivity *lowsel);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or, bool *fullmatch,
+ Selectivity *lowsel);
static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -90,10 +101,33 @@ static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
int nmatches, char * matches,
bool is_or);
+/*
+ * Describes a combination of multiple statistics used to cover the
+ * attributes referenced by the clauses. The 'stats' array (with nstats
+ * elements) lists the statistics in the order they are applied, and
+ * nclauses/nconditions track how many clauses and conditions the
+ * solution covers.
+ *
+ * choose_mv_statistics_exhaustive() uses this to track both the current
+ * and the best solution, while walking through the space of possible
+ * combinations.
+ */
+typedef struct mv_solution_t {
+ int nclauses; /* number of clauses covered */
+ int nconditions; /* number of conditions covered */
+ int nstats; /* number of stats applied */
+ int *stats; /* stats (in the apply order) */
+} mv_solution_t;
+
+static List *choose_mv_statistics(PlannerInfo *root, Index relid,
+ List *mvstats, List *clauses, List *conditions);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, Index relid);
+static bool stats_type_matches(MVStatisticInfo *stat, int type);
+
+int mvstat_search_type = MVSTAT_SEARCH_GREEDY;
/* used for merging bitmaps - AND (min), OR (max) */
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
@@ -168,14 +202,15 @@ clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 1.0;
RangeQueryClause *rqlist = NULL;
ListCell *l;
/* processing mv stats */
- Oid relid = InvalidOid;
+ Index relid = InvalidOid;
/* list of multivariate stats on the relation */
List *stats = NIL;
@@ -191,12 +226,13 @@ clauselist_selectivity(PlannerInfo *root,
stats = find_stats(root, relid);
/*
- * If there's exactly one clause, then no use in trying to match up pairs,
- * so just go directly to clause_selectivity().
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, or matching multivariate statistics, so just go directly
+ * to clause_selectivity().
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
- varRelid, jointype, sjinfo);
+ varRelid, jointype, sjinfo, conditions);
/*
* Apply functional dependencies, but first check that there are some stats
@@ -228,31 +264,96 @@ clauselist_selectivity(PlannerInfo *root,
(count_mv_attnums(clauses, relid,
MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2))
{
- /* collect attributes from the compatible conditions */
- Bitmapset *mvattnums = collect_mv_attnums(clauses, relid,
- MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+ ListCell *s;
+
+ /*
+ * Copy the conditions we got from the upper part of the expression tree
+ * so that we can add local conditions to it (we need to keep the
+ * original list intact, for sibling expressions - other expressions
+ * at the same level).
+ */
+ List *conditions_local = list_copy(conditions);
- /* and search for the statistic covering the most attributes */
- MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+ /* find the best combination of statistics */
+ List *solution = choose_mv_statistics(root, relid, stats,
+ clauses, conditions);
- if (mvstat != NULL) /* we have a matching stats */
+ /*
+ * We have a good solution, which is merely a list of statistics that
+ * we need to apply. We'll apply the statistics one by one (in the
+ * order in which they appear in the list), and for each statistic we'll
+ *
+ * (1) find clauses compatible with the statistic (and remove them
+ * from the list)
+ *
+ * (2) find local conditions compatible with the statistic
+ *
+ * (3) do the estimation P(clauses | conditions)
+ *
+ * (4) append the estimated clauses to the local conditions
+ *
+ * so the set of conditions grows continuously as we walk through
+ * the list of statistics.
+ */
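+ /*
+ * A sketch of the decomposition on a hypothetical example: given
+ * statistics on (a,b) and (b,c), and clauses (a=1), (b=1), (c=1),
+ * the loop below effectively computes
+ *
+ * P(a=1,b=1,c=1) = P(a=1,b=1) * P(c=1 | b=1)
+ *
+ * where the first term comes from the (a,b) statistics, and the
+ * second term from the (b,c) statistics, with (b=1) as a condition.
+ */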
+ foreach (s, solution)
{
- /* clauses compatible with multi-variate stats */
- List *mvclauses = NIL;
+ MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
- /* split the clauselist into regular and mv-clauses */
- clauses = clauselist_mv_split(root, relid, clauses, &mvclauses,
- mvstat, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+ /* clauses compatible with the statistic we're applying right now */
+ List *stat_clauses = NIL;
+ List *stat_conditions = NIL;
- /* we've chosen the histogram to match the clauses */
- Assert(mvclauses != NIL);
+ /*
+ * Find clauses and conditions matching the statistic - the clauses
+ * need to be removed from the list, while conditions should remain
+ * there (so that we can apply them repeatedly).
+ */
+ stat_clauses
+ = clauses_matching_statistic(&clauses, mvstat, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST,
+ true);
+
+ stat_conditions
+ = clauses_matching_statistic(&conditions_local, mvstat, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST,
+ false);
+
+ /*
+ * If we got no clauses to estimate, we've done something wrong -
+ * either during the optimization, when detecting compatible clauses,
+ * or somewhere else.
+ *
+ * Also, we need at least two attributes in clauses and conditions.
+ */
+ Assert(stat_clauses != NIL);
+ Assert(count_mv_attnums(list_union(stat_clauses, stat_conditions),
+ relid, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2);
/* compute the multivariate stats */
- s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ s1 *= clauselist_mv_selectivity(root, mvstat,
+ stat_clauses, stat_conditions,
+ false); /* AND */
+
+ /*
+ * Add the new clauses to the local conditions, so that we can use
+ * them for the subsequent statistics. We only add the clauses,
+ * because the conditions are already there (or should be).
+ */
+ conditions_local = list_concat(conditions_local, stat_clauses);
}
+
+ /* from now on, work only with the 'local' list of conditions */
+ conditions = conditions_local;
}
/*
+ * If there's exactly one clause left after the multivariate estimation,
+ * there's no use in trying to match up pairs, so just go directly to
+ * clause_selectivity().
+ */
+ if (list_length(clauses) == 1)
+ return s1 * clause_selectivity(root, (Node *) linitial(clauses),
+ varRelid, jointype, sjinfo, conditions);
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -264,7 +365,8 @@ clauselist_selectivity(PlannerInfo *root,
Selectivity s2;
/* Always compute the selectivity using clause_selectivity */
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo);
+ s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo,
+ conditions);
/*
* Check for being passed a RestrictInfo.
@@ -423,6 +525,55 @@ clauselist_selectivity(PlannerInfo *root,
}
/*
+ * Similar to clauselist_selectivity(), but for OR-clauses. We can't simply
+ * reuse the multi-statistic estimation logic for AND-clauses, at least not
+ * directly, because there are a few key differences:
+ *
+ * - functional dependencies don't really apply to OR-clauses
+ *
+ * - clauselist_selectivity() is based on decomposing the selectivity into
+ * a sequence of conditional probabilities (selectivities), but that can
+ * be done only for AND-clauses
+ *
+ * We might invent a similar infrastructure for optimizing OR-clauses, doing
+ * something similar to what clauselist_selectivity does for AND-clauses,
+ * but luckily De Morgan's laws tell us that each disjunction (OR-clause)
+ *
+ * (a OR b OR c)
+ *
+ * may be rewritten as an equivalent negated conjunction (AND-clause):
+ *
+ * NOT ((NOT a) AND (NOT b) AND (NOT c))
+ *
+ * And that's something we can pass to clauselist_selectivity and let it do
+ * all the heavy lifting.
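+ *
+ * A quick sanity check on a hypothetical example: with two independent
+ * clauses of selectivities s(a) = 0.1 and s(b) = 0.2, the rewrite gives
+ *
+ * 1.0 - (1.0 - 0.1) * (1.0 - 0.2) = 0.28
+ *
+ * which matches the usual s1 + s2 - s1*s2 formula for OR-clauses.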
+ */
+static Selectivity
+clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
+{
+ List *args = NIL;
+ ListCell *l;
+ Expr *expr;
+
+ /* build arguments for the AND-clause by negating args of the OR-clause */
+ foreach (l, clauses)
+ args = lappend(args, makeBoolExpr(NOT_EXPR, list_make1(lfirst(l)), -1));
+
+ /* and then build the AND-clause over the negated args */
+ expr = makeBoolExpr(AND_EXPR, args, -1);
+
+ /* instead of constructing the NOT expression on top, just compute (1.0 - s) */
+ return 1.0 - clauselist_selectivity(root, list_make1(expr), varRelid,
+ jointype, sjinfo, conditions);
+}
+
+/*
* addRangeClause --- add a new range clause for clauselist_selectivity
*
* Here is where we try to match up pairs of range-query clauses
@@ -629,7 +780,8 @@ clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 0.5; /* default for any unhandled clause type */
RestrictInfo *rinfo = NULL;
@@ -749,7 +901,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) get_notclausearg((Expr *) clause),
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (and_clause(clause))
{
@@ -758,29 +911,18 @@ clause_selectivity(PlannerInfo *root,
((BoolExpr *) clause)->args,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (or_clause(clause))
{
- /*
- * Selectivities for an OR clause are computed as s1+s2 - s1*s2 to
- * account for the probable overlap of selected tuple sets.
- *
- * XXX is this too conservative?
- */
- ListCell *arg;
-
- s1 = 0.0;
- foreach(arg, ((BoolExpr *) clause)->args)
- {
- Selectivity s2 = clause_selectivity(root,
- (Node *) lfirst(arg),
- varRelid,
- jointype,
- sjinfo);
-
- s1 = s1 + s2 - s1 * s2;
- }
+ /* just call to clauselist_selectivity_or() */
+ s1 = clauselist_selectivity_or(root,
+ ((BoolExpr *) clause)->args,
+ varRelid,
+ jointype,
+ sjinfo,
+ conditions);
}
else if (is_opclause(clause) || IsA(clause, DistinctExpr))
{
@@ -870,7 +1012,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((RelabelType *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (IsA(clause, CoerceToDomain))
{
@@ -879,7 +1022,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((CoerceToDomain *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else
{
@@ -943,15 +1087,16 @@ clause_selectivity(PlannerInfo *root,
* in the MCV list, then the selectivity is below the lowest frequency
* found in the MCV list,
*
- * TODO When applying the clauses to the histogram/MCV list, we can do
- * that from the most selective clauses first, because that'll
- * eliminate the buckets/items sooner (so we'll be able to skip
- * them without inspection, which is more expensive). But this
- * requires really knowing the per-clause selectivities in advance,
- * and that's not what we do now.
+ * TODO When applying the clauses to the histogram/MCV list, we can do that from
+ * the most selective clauses first, because that'll eliminate the
+ * buckets/items sooner (so we'll be able to skip them without inspection,
+ * which is more expensive). But this requires really knowing the
+ * per-clause selectivities in advance, and that's not what we do now.
+ *
*/
static Selectivity
-clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+clauselist_mv_selectivity(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
bool fullmatch = false;
Selectivity s1 = 0.0, s2 = 0.0;
@@ -969,7 +1114,8 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
*/
/* Evaluate the MCV first. */
- s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ s1 = clauselist_mv_selectivity_mcvlist(root, mvstats,
+ clauses, conditions, is_or,
&fullmatch, &mcv_low);
/*
@@ -982,7 +1128,8 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
/* TODO if (fullmatch) without matching MCV item, use the mcv_low
* selectivity as upper bound */
- s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+ s2 = clauselist_mv_selectivity_histogram(root, mvstats,
+ clauses, conditions, is_or);
/* TODO clamp to <= 1.0 (or more strictly, when possible) */
return s1 + s2;
@@ -1016,260 +1163,1325 @@ get_varattnos(Node * node, Index relid)
k + FirstLowInvalidHeapAttributeNumber);
}
- bms_free(varattnos);
+ bms_free(varattnos);
+
+ return result;
+}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ */
+static Bitmapset *
+collect_mv_attnums(List *clauses, Index relid, int types)
+{
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ Node *clause = (Node *) lfirst(l);
+
+ /* ignore the result here - we only need the attnums */
+ clause_is_mv_compatible(clause, relid, &attnums, types);
+ }
+
+ /*
+ * If there are not at least two attributes referenced by the clause(s),
+ * we can throw everything out (as we'll revert to simple stats).
+ */
+ if (bms_num_members(attnums) <= 1)
+ {
+ bms_free(attnums);
+ attnums = NULL;
+ }
+
+ return attnums;
+}
+
+/*
+ * Count the number of attributes in clauses compatible with multivariate stats.
+ */
+static int
+count_mv_attnums(List *clauses, Index relid, int type)
+{
+ int c;
+ Bitmapset *attnums = collect_mv_attnums(clauses, relid, type);
+
+ c = bms_num_members(attnums);
+
+ bms_free(attnums);
+
+ return c;
+}
+
+/*
+ * Count varnos referenced in the clauses, and if there's a single varno then
+ * return the index in 'relid'.
+ */
+static int
+count_varnos(List *clauses, Index *relid)
+{
+ int cnt;
+ Bitmapset *varnos = NULL;
+
+ varnos = pull_varnos((Node *) clauses);
+ cnt = bms_num_members(varnos);
+
+ /* if there's a single varno in the clauses, remember it */
+ if (bms_num_members(varnos) == 1)
+ *relid = bms_singleton_member(varnos);
+
+ bms_free(varnos);
+
+ return cnt;
+}
+
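+/*
+ * Find the clauses whose attributes are fully covered by the statistic,
+ * and return them as a new list. With remove=true the matching clauses
+ * are also deleted from the input list - that's what we do for estimated
+ * clauses, while conditions are kept in place so that the following
+ * statistics can reuse them.
+ */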
+static List *
+clauses_matching_statistic(List **clauses, MVStatisticInfo *statistic,
+ Index relid, int types, bool remove)
+{
+ int i;
+ Bitmapset *stat_attnums = NULL;
+ List *matching_clauses = NIL;
+ ListCell *lc;
+
+ /* build attnum bitmapset for this statistics */
+ for (i = 0; i < statistic->stakeys->dim1; i++)
+ stat_attnums = bms_add_member(stat_attnums,
+ statistic->stakeys->values[i]);
+
+ /*
+ * We can't use foreach here, because we may need to remove some of the
+ * clauses if (remove=true).
+ */
+ lc = list_head(*clauses);
+ while (lc)
+ {
+ Node *clause = (Node*)lfirst(lc);
+ Bitmapset *attnums = NULL;
+
+ /* must advance lc before list_delete possibly pfree's it */
+ lc = lnext(lc);
+
+ /*
+ * skip clauses that are not compatible with stats (just leave them
+ * in the original list)
+ *
+ * XXX Perhaps this should check what stats are actually available in
+ * the statistics (not a big deal now, because MCV and histograms
+ * handle the same types of conditions).
+ */
+ if (! clause_is_mv_compatible(clause, relid, &attnums, types))
+ {
+ bms_free(attnums);
+ continue;
+ }
+
+ /* if the clause is covered by the statistic, add it to the list */
+ if (bms_is_subset(attnums, stat_attnums))
+ {
+ matching_clauses = lappend(matching_clauses, clause);
+
+ /* if remove=true, remove the matching item from the main list */
+ if (remove)
+ *clauses = list_delete_ptr(*clauses, clause);
+ }
+
+ bms_free(attnums);
+ }
+
+ bms_free(stat_attnums);
+
+ return matching_clauses;
+}
+
+/*
+ * Selects the best combination of multivariate statistics, in an exhaustive
+ * way, where 'best' means:
+ *
+ * (a) covering the most attributes (referenced by clauses)
+ * (b) using the least number of multivariate stats
+ * (c) using the most conditions to exploit dependency
+ *
+ * Don't call this directly but through choose_mv_statistics(), which does some
+ * additional tricks to minimize the runtime.
+ *
+ *
+ * Algorithm
+ * ---------
+ * The algorithm is a recursive implementation of backtracking, with maximum
+ * depth equal to the number of multi-variate statistics available on the table.
+ * It actually explores all valid combinations of stats.
+ *
+ * Whenever it considers adding the next statistics, the clauses it matches are
+ * divided into 'conditions' (clauses already matched by at least one previous
+ * statistics) and clauses that are estimated.
+ *
+ * Then several checks are performed:
+ *
+ * (a) The statistics covers at least 2 columns referenced in the estimated
+ * clauses (otherwise multi-variate stats are useless).
+ *
+ * (b) The statistics covers at least 1 new column, i.e. a column not referenced
+ * by the already used stats (and the new column has to be referenced by
+ * the clauses, of course). Otherwise the statistics would not add any new
+ * information.
+ *
+ * There are some other sanity checks (e.g. stats must not be used twice etc.).
+ *
+ *
+ * Weaknesses
+ * ----------
+ * The current implementation uses a rather simple optimality criterion, so
+ * it may not make the best choice when
+ *
+ * (a) There may be multiple solutions with the same number of covered
+ * attributes and number of statistics (e.g. the same solution but with
+ * statistics in a different order). It's unclear which solution is the
+ * best one - in a sense, all of them are equal.
+ *
+ * TODO It might be possible to compute estimate for each of those solutions,
+ * and then combine them to get the final estimate (e.g. by using average
+ * or median).
+ *
+ * (b) Does not consider that some types of stats are a better match for some
+ * types of clauses (e.g. MCV list is generally a better match for equality
+ * conditions than a histogram).
+ *
+ * But maybe this is pointless - generally, each column is either a label
+ * (it's not important whether because of the data type or how it's used),
+ * or a value with ordering that makes sense. So either a MCV list is more
+ * appropriate (labels) or a histogram (values with orderings).
+ *
+ * Not sure what to do with statistics on columns mixing both types of data
+ * (some columns would work best with MCVs, some with histograms). Maybe we
+ * could invent a new type of statistics combining MCV list and histogram
+ * (keeping a small histogram for each MCV item, and a separate histogram
+ * for values not on the MCV list).
+ *
+ * TODO The algorithm should probably count number of Vars (not just attnums)
+ * when computing the 'score' of each solution. Computing the ratio of
+ * (num of all vars) / (num of condition vars) as a measure of how well
+ * the solution uses conditions might be useful.
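+ *
+ * To illustrate the search space on a hypothetical example: with three
+ * statistics S0, S1 and S2, the backtracking explores (prefixes of) the
+ * sequences
+ *
+ * [S0], [S0,S1], [S0,S1,S2], [S0,S2], [S0,S2,S1], [S1], [S1,S0], ...
+ *
+ * pruning branches where the next statistics covers no new clauses or
+ * references fewer than two attributes.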
+ */
+static void
+choose_mv_statistics_exhaustive(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ /* this may run for a long time, so let's make it interruptible */
+ CHECK_FOR_INTERRUPTS();
+
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ /*
+ * Now try to apply each statistics, matching at least two attributes,
+ * unless it's already used in one of the previous steps.
+ */
+ for (i = 0; i < nmvstats; i++)
+ {
+ int c;
+
+ int ncovered_clauses = 0; /* number of covered clauses */
+ int ncovered_conditions = 0; /* number of covered conditions */
+ int nattnums = 0; /* number of covered attributes */
+
+ Bitmapset *all_attnums = NULL;
+ Bitmapset *new_attnums = NULL;
+
+ /* skip statistics that were already used or eliminated */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /*
+ * See if we have clauses covered by this statistics, but not
+ * yet covered by any of the preceding ones.
+ */
+ for (c = 0; c < nclauses; c++)
+ {
+ bool covered = false;
+ Bitmapset *clause_attnums = clauses_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this clause is not covered by this stats, we can't
+ * use the stats to estimate it at all.
+ */
+ if (! cover_map[i * nclauses + c])
+ continue;
+
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add the
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
+
+ /* let's see if it's covered by any of the previous stats */
+ for (j = 0; j < step; j++)
+ {
+ /* already covered by the previous stats */
+ if (cover_map[current->stats[j] * nclauses + c])
+ covered = true;
+
+ if (covered)
+ break;
+ }
+
+ /* if already covered, continue with the next clause */
+ if (covered)
+ {
+ ncovered_conditions += 1;
+ continue;
+ }
+
+ /*
+ * OK, this clause is covered by this statistics (and not by
+ * any of the previous ones)
+ */
+ ncovered_clauses += 1;
+
+ /* XXX attnums from 'new clauses' are not tracked at the moment */
+ /* new_attnums = bms_union(new_attnums, clause_attnums); */
+ }
+
+ /* can't have more new clauses than original clauses */
+ Assert(nclauses >= ncovered_clauses);
+ Assert(ncovered_clauses >= 0); /* mostly paranoia */
+
+ nattnums = bms_num_members(all_attnums);
+
+ /* free all the bitmapsets - we don't need them anymore */
+ bms_free(all_attnums);
+ bms_free(new_attnums);
+
+ all_attnums = NULL;
+ new_attnums = NULL;
+
+ /*
+ * Now do the same for the conditions - see which of them are
+ * covered by this statistics.
+ */
+ for (c = 0; c < nconditions; c++)
+ {
+ Bitmapset *clause_attnums = conditions_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this condition is not covered by this stats, we can't
+ * use the stats to apply it at all.
+ */
+ if (! condition_map[i * nconditions + c])
+ continue;
+
+ /* count this as a condition */
+ ncovered_conditions += 1;
+
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add the
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
+ }
+
+ /*
+ * Let's mark the statistics as 'ruled out' - either we'll use
+ * it (and proceed to the next step), or it's incompatible.
+ */
+ ruled_out[i] = step;
+
+ /*
+ * There are no clauses usable with this statistics (i.e. not
+ * already covered by some of the previous stats).
+ *
+ * Similarly, if the clauses only use a single attribute, we
+ * can't really use that.
+ */
+ if ((ncovered_clauses == 0) || (nattnums < 2))
+ continue;
+
+ /*
+ * TODO Not sure if it's possible to add a clause referencing only
+ * attributes already covered by the previous stats, introducing
+ * only a new dependency and no new attribute. Couldn't come up
+ * with an example, though. Might be worth adding an assert.
+ */
+
+ /*
+ * got a suitable statistics - let's update the current solution,
+ * maybe use it as the best solution
+ */
+ current->nclauses += ncovered_clauses;
+ current->nconditions += ncovered_conditions;
+ current->nstats += 1;
+ current->stats[step] = i;
+
+ /*
+ * We can never cover more clauses, or use more stats, than we
+ * actually have at the beginning.
+ */
+ Assert(nclauses >= current->nclauses);
+ Assert(nmvstats >= current->nstats);
+ Assert(step < nmvstats);
+
+ if (*best == NULL)
+ {
+ *best = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ (*best)->nstats = 0;
+ (*best)->nclauses = 0;
+ (*best)->nconditions = 0;
+ }
+
+ /* see if it's better than the current 'best' solution */
+ if ((current->nclauses > (*best)->nclauses) ||
+ ((current->nclauses == (*best)->nclauses) &&
+ ((current->nstats < (*best)->nstats))))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+
+ /*
+ * Recurse only if there are more statistics to apply - each step
+ * uses one, so after nmvstats steps there's nothing left to add.
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_exhaustive(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= ncovered_clauses;
+ current->nconditions -= ncovered_conditions;
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[i] = -1;
+
+ Assert(current->nclauses >= 0);
+ Assert(current->nstats >= 0);
+ }
+
+ /* reset all statistics eliminated in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+}
+
+/*
+ * Greedy search for a multivariate solution - a sequence of statistics covering
+ * the clauses. This chooses the "best" statistics at each step, so the
+ * resulting solution may not be the best solution globally, but this produces
+ * the solution in only N steps (where N is the number of statistics), while
+ * the exhaustive approach may have to walk through ~N! combinations (although
+ * some of those are terminated early).
+ *
+ * See the comments at choose_mv_statistics_exhaustive() as this does the same
+ * thing (but in a different way).
+ *
+ * Don't call this directly, but through choose_mv_statistics().
+ *
+ * TODO There are probably other metrics we might use - e.g. using number of
+ * columns (num_cond_columns / num_cov_columns), which might work better
+ * with a mix of simple and complex clauses.
+ *
+ * TODO Also the choice at the very first step should be handled in a special
+ * way, because there will be 0 conditions at that moment, so there needs
+ * to be some other criteria - e.g. using the simplest (or most complex?)
+ * clause might be a good idea.
+ *
+ * TODO We might also select multiple stats using different criteria, and branch
+ * the search. This is however tricky, because if we choose k statistics at
+ * each step, we get k^N branches to walk through (with N steps). That's
+ * not really good with large number of stats (yet better than exhaustive
+ * search).
+ */
+static void
+choose_mv_statistics_greedy(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
+ int best_stat = -1;
+ double gain, max_gain = -1.0;
+
+ /*
+ * Bitmap tracking which clauses are already covered (by the previous
+ * statistics) and may thus serve only as a condition in this step.
+ */
+ bool *covered_clauses = (bool*)palloc0(nclauses * sizeof(bool));
+
+ /*
+ * Number of clauses and columns covered by each statistics - this
+ * includes both conditions and clauses covered by the statistics for
+ * the first time. The number of columns may count some columns
+ * repeatedly - if a column is shared by multiple clauses, it will
+ * be counted once for each clause (covered by the statistics).
+ * So with two clauses [(a=1 OR b=2),(a<2 OR c>1)] the column "a"
+ * will be counted twice (if both clauses are covered).
+ *
+ * The values for ruled-out statistics (those that can't be applied)
+ * are not computed, because that'd be pointless.
+ */
+ int *num_cov_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cov_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Same as above, but this only includes clauses that are already
+ * covered by the previous stats (and the current one).
+ */
+ int *num_cond_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cond_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Number of attributes for each clause.
+ *
+ * TODO Might be computed in choose_mv_statistics() and then passed
+ * here, but then the function would not have the same signature
+ * as _exhaustive().
+ */
+ int *attnum_counts = (int*)palloc0(sizeof(int) * nclauses);
+ int *attnum_cond_counts = (int*)palloc0(sizeof(int) * nconditions);
+
+ CHECK_FOR_INTERRUPTS();
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ /* compute attributes (columns) for each clause */
+ for (i = 0; i < nclauses; i++)
+ attnum_counts[i] = bms_num_members(clauses_attnums[i]);
+
+ /* compute attributes (columns) for each condition */
+ for (i = 0; i < nconditions; i++)
+ attnum_cond_counts[i] = bms_num_members(conditions_attnums[i]);
+
+ /* see which clauses are already covered at this point (by previous stats) */
+ for (i = 0; i < step; i++)
+ for (j = 0; j < nclauses; j++)
+ covered_clauses[j] |= (cover_map[current->stats[i] * nclauses + j]);
+
+ /* which remaining statistics covers most clauses / uses most conditions? */
+ for (i = 0; i < nmvstats; i++)
+ {
+ Bitmapset *attnums_covered = NULL;
+ Bitmapset *attnums_conditions = NULL;
+
+ /* skip stats that are already ruled out (either used or inapplicable) */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* count covered clauses and conditions (for the statistics) */
+ for (j = 0; j < nclauses; j++)
+ {
+ if (cover_map[i * nclauses + j])
+ {
+ Bitmapset *attnums_new
+ = bms_union(attnums_covered, clauses_attnums[j]);
+
+ /* get rid of the old bitmap and keep the unified result */
+ bms_free(attnums_covered);
+ attnums_covered = attnums_new;
+
+ num_cov_clauses[i] += 1;
+ num_cov_columns[i] += attnum_counts[j];
+
+ /* is the clause already covered (i.e. a condition)? */
+ if (covered_clauses[j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_counts[j];
+ attnums_new = bms_union(attnums_conditions,
+ clauses_attnums[j]);
+
+ bms_free(attnums_conditions);
+ attnums_conditions = attnums_new;
+ }
+ }
+ }
+
+ /* if all covered clauses are covered by prev stats (thus conditions) */
+ if (num_cov_clauses[i] == num_cond_clauses[i])
+ ruled_out[i] = step;
+
+ /* same if there are no new attributes */
+ else if (bms_num_members(attnums_conditions) == bms_num_members(attnums_covered))
+ ruled_out[i] = step;
+
+ bms_free(attnums_covered);
+ bms_free(attnums_conditions);
+
+ /* if the statistics is inapplicable, try the next one */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* now let's walk through conditions and count the covered */
+ for (j = 0; j < nconditions; j++)
+ {
+ if (condition_map[i * nconditions + j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_cond_counts[j];
+ }
+ }
+
+ /* otherwise see if this statistics improves the gain metric */
+ gain = num_cond_columns[i] / (double)num_cov_columns[i];
+
+ if (gain > max_gain)
+ {
+ max_gain = gain;
+ best_stat = i;
+ }
+ }
+
+ /*
+ * Have we found a suitable statistics? Add it to the solution and
+ * try the next step.
+ */
+ if (best_stat != -1)
+ {
+ /* mark the statistics, so that we skip it in next steps */
+ ruled_out[best_stat] = step;
+
+ /* allocate current solution if necessary */
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ current->nclauses += num_cov_clauses[best_stat];
+ current->nconditions += num_cond_clauses[best_stat];
+ current->stats[step] = best_stat;
+ current->nstats++;
+
+ if (*best == NULL)
+ {
+ (*best) = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ else
+ {
+ /* see if this is a better solution */
+ double current_gain = (double)current->nconditions / current->nclauses;
+ double best_gain = (double)(*best)->nconditions / (*best)->nclauses;
+
+ if ((current_gain > best_gain) ||
+ ((current_gain == best_gain) && (current->nstats < (*best)->nstats)))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ }
+
+ /*
+ * Recurse only if there are more statistics to apply - each step
+ * uses one, so after nmvstats steps there's nothing left to add.
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_greedy(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= num_cov_clauses[best_stat];
+ current->nconditions -= num_cond_clauses[best_stat];
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[best_stat] = -1;
+ }
+
+ /* reset all statistics eliminated in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+ /* free everything allocated in this step */
+ pfree(covered_clauses);
+ pfree(attnum_counts);
+ pfree(attnum_cond_counts);
+ pfree(num_cov_clauses);
+ pfree(num_cov_columns);
+ pfree(num_cond_clauses);
+ pfree(num_cond_columns);
+}
+
+/*
+ * Remove clauses not covered by any of the available statistics
+ *
+ * This helps us to reduce the amount of work done in choose_mv_statistics()
+ * by not having to deal with clauses that can't possibly be useful.
+ */
+static List *
+filter_clauses(PlannerInfo *root, Index relid, int type,
+ List *stats, List *clauses, Bitmapset **attnums)
+{
+ ListCell *c;
+ ListCell *s;
+
+ /* results (list of compatible clauses, attnums) */
+ List *rclauses = NIL;
+
+ foreach (c, clauses)
+ {
+ Node *clause = (Node*)lfirst(c);
+ Bitmapset *clause_attnums = NULL;
+
+ /*
+ * We do assume that thanks to previous checks, we should not run into
+ * clauses that are incompatible with multivariate stats here. We also
+ * need to collect the attnums for the clause.
+ *
+ * XXX Maybe turn this into an assert?
+ */
+ if (! clause_is_mv_compatible(clause, relid, &clause_attnums, type))
+ elog(ERROR, "should not get non-mv-compatible cluase");
+
+ /* Is there a multivariate statistics covering the clause? */
+ foreach (s, stats)
+ {
+ int k, matches = 0;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ /* skip statistics not matching the required type */
+ if (! stats_type_matches(stat, type))
+ continue;
+
+ /*
+ * see if all clause attributes are covered by the statistic
+ *
+ * We'll do that in the opposite direction, i.e. we'll see how many
+ * attributes of the statistic are referenced in the clause, and then
+ * compare the counts.
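+ *
+ * E.g. for a statistic on (a,b,c) and a clause referencing (a,c),
+ * two of the statistic's attributes are found in the clause, which
+ * equals bms_num_members(clause_attnums), so the clause is covered.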
+ */
+ for (k = 0; k < stat->stakeys->dim1; k++)
+ if (bms_is_member(stat->stakeys->values[k], clause_attnums))
+ matches += 1;
+
+ /*
+ * If the number of matches equals the number of attributes referenced
+ * by the clause, then the clause is covered by the statistic.
+ */
+ if (bms_num_members(clause_attnums) == matches)
+ {
+ *attnums = bms_union(*attnums, clause_attnums);
+ rclauses = lappend(rclauses, clause);
+ break;
+ }
+ }
+
+ bms_free(clause_attnums);
+ }
+
+ /* we can't have more compatible clauses than we started with */
+ Assert(list_length(clauses) >= list_length(rclauses));
+
+ return rclauses;
+}
+
+/*
+ * Remove statistics not covering any new clauses
+ *
+ * Statistics not covering any new clauses (conditions don't count) are not
+ * really useful, so let's ignore them. Also, we need the statistics to
+ * reference at least two different attributes (both in conditions and clauses
+ * combined), and at least one of them in the clauses alone.
+ *
+ * This check might be made more strict by checking against individual clauses,
+ * because by using the bitmapsets of all attnums we may actually use attnums
+ * from clauses that are not covered by the statistics. For example, we may
+ * have a condition
+ *
+ * (a=1 AND b=2)
+ *
+ * and a new clause
+ *
+ * (c=1 AND d=1)
+ *
+ * With only bitmapsets, statistics on [b,c] will pass through this (assuming
+ * there are some statistics covering both clauses).
+ *
+ * Parameters:
+ *
+ * stats - list of statistics to filter
+ * new_attnums - attnums referenced in new clauses
+ * all_attnums - attnums referenced by conditions and new clauses combined
+ *
+ * Returns filtered list of statistics.
+ *
+ * TODO Do the more strict check, i.e. walk through individual clauses and
+ * conditions and only use those covered by the statistics.
+ */
+static List *
+filter_stats(List *stats, Bitmapset *new_attnums, Bitmapset *all_attnums)
+{
+ ListCell *s;
+ List *stats_filtered = NIL;
+
+ foreach (s, stats)
+ {
+ int k;
+ int matches_new = 0,
+ matches_all = 0;
+
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ /* see how many attributes the statistics covers */
+ for (k = 0; k < stat->stakeys->dim1; k++)
+ {
+ /* attributes from new clauses */
+ if (bms_is_member(stat->stakeys->values[k], new_attnums))
+ matches_new += 1;
+
+ /* attributes from conditions and new clauses combined */
+ if (bms_is_member(stat->stakeys->values[k], all_attnums))
+ matches_all += 1;
+ }
+
+ /* check we have enough attributes for this statistics */
+ if ((matches_new >= 1) && (matches_all >= 2))
+ stats_filtered = lappend(stats_filtered, stat);
+ }
+
+ /* we can't have more useful stats than we had originally */
+ Assert(list_length(stats) >= list_length(stats_filtered));
+
+ return stats_filtered;
+}
+
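+/*
+ * Convert a list of MVStatisticInfo elements into a plain array, setting
+ * *nmvstats to the number of elements. The optimization code addresses
+ * the statistics by index, so an array is easier to work with.
+ */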
+static MVStatisticInfo *
+make_stats_array(List *stats, int *nmvstats)
+{
+ int i;
+ ListCell *l;
+
+ MVStatisticInfo *mvstats = NULL;
+ *nmvstats = list_length(stats);
+
+ mvstats
+ = (MVStatisticInfo*)palloc0((*nmvstats) * sizeof(MVStatisticInfo));
+
+ i = 0;
+ foreach (l, stats)
+ {
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(l);
+ memcpy(&mvstats[i++], stat, sizeof(MVStatisticInfo));
+ }
+
+ return mvstats;
+}
+
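+/*
+ * Build a bitmapset of attnums for each statistics, so that the coverage
+ * checks can use cheap bms_is_subset() calls instead of walking the
+ * stakeys arrays over and over.
+ */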
+static Bitmapset **
+make_stats_attnums(MVStatisticInfo *mvstats, int nmvstats)
+{
+ int i, j;
+ Bitmapset **stats_attnums = NULL;
+
+ Assert(nmvstats > 0);
- return result;
+ /* build bitmaps of attnums for the stats (easier to compare) */
+ stats_attnums = (Bitmapset **)palloc0(nmvstats * sizeof(Bitmapset*));
+
+ for (i = 0; i < nmvstats; i++)
+ for (j = 0; j < mvstats[i].stakeys->dim1; j++)
+ stats_attnums[i]
+ = bms_add_member(stats_attnums[i],
+ mvstats[i].stakeys->values[j]);
+
+ return stats_attnums;
}
+
/*
- * Collect attributes from mv-compatible clauses.
+ * Remove redundant statistics
+ *
+ * If there are multiple statistics covering the same set of columns (counting
+ * only those referenced by clauses and conditions), it's enough to keep just
+ * one of them, which further reduces the size of the optimization problem.
+ *
+ * Thus when redundant stats are detected, we keep the smaller one (the one with
+ * fewer columns), based on the assumption that it's more accurate and also
+ * faster to process. That may be untrue for two reasons - first, the accuracy
+ * really depends on number of buckets/MCV items, not the number of columns.
+ * Second, some types of statistics may work better for certain types of clauses
+ * (e.g. MCV lists for equality conditions) etc.
*/
-static Bitmapset *
-collect_mv_attnums(List *clauses, Index relid, int types)
+static List*
+filter_redundant_stats(List *stats, List *clauses, List *conditions)
{
- Bitmapset *attnums = NULL;
- ListCell *l;
+ int i, j, nmvstats;
+
+ MVStatisticInfo *mvstats;
+ bool *redundant;
+ Bitmapset **stats_attnums;
+ Bitmapset *varattnos;
+ Index relid;
+
+ Assert(list_length(stats) > 0);
+ Assert(list_length(clauses) > 0);
/*
- * Walk through the clauses and identify the ones we can estimate using
- * multivariate stats, and remember the relid/columns. We'll then
- * cross-check if we have suitable stats, and only if needed we'll split
- * the clauses into multivariate and regular lists.
+ * We'll convert the list of statistics into an array now, because
+ * the reduction of redundant statistics is easier to do that way
+ * (we can mark previous stats as redundant, etc.).
+ */
+ mvstats = make_stats_array(stats, &nmvstats);
+ stats_attnums = make_stats_attnums(mvstats, nmvstats);
+
+ /* by default, none of the stats is redundant (so palloc0) */
+ redundant = palloc0(nmvstats * sizeof(bool));
+
+ /*
+ * We only expect a single relid here, and also we should get the
+ * same relid from clauses and conditions (but we get it from
+ * clauses, because those are certainly non-empty).
+ */
+ relid = bms_singleton_member(pull_varnos((Node*)clauses));
+
+ /*
+ * Get the varattnos from both conditions and clauses.
+ *
+ * This skips system attributes, although that should be impossible
+ * thanks to previous filtering out of incompatible clauses.
*
- * For now we're only interested in RestrictInfo nodes with nested OpExpr,
- * using either a range or equality.
+ * XXX Is that really true?
*/
- foreach (l, clauses)
+ varattnos = bms_union(get_varattnos((Node*)clauses, relid),
+ get_varattnos((Node*)conditions, relid));
+
+ for (i = 1; i < nmvstats; i++)
{
- Node *clause = (Node *) lfirst(l);
+ /* intersect with current statistics */
+ Bitmapset *curr = bms_intersect(stats_attnums[i], varattnos);
- /* ignore the result here - we only need the attnums */
- clause_is_mv_compatible(clause, relid, &attnums, types);
+ /* walk through 'previous' stats and check redundancy */
+ for (j = 0; j < i; j++)
+ {
+ /* intersection for the previous statistics (computed below) */
+ Bitmapset *prev;
+
+ /* skip stats already identified as redundant */
+ if (redundant[j])
+ continue;
+
+ prev = bms_intersect(stats_attnums[j], varattnos);
+
+ switch (bms_subset_compare(curr, prev))
+ {
+ case BMS_EQUAL:
+ /*
+ * Use the smaller one (hopefully more accurate).
+ * If both have the same size, use the first one.
+ */
+ if (mvstats[i].stakeys->dim1 >= mvstats[j].stakeys->dim1)
+ redundant[i] = TRUE;
+ else
+ redundant[j] = TRUE;
+
+ break;
+
+ case BMS_SUBSET1: /* curr is subset of prev */
+ redundant[i] = TRUE;
+ break;
+
+ case BMS_SUBSET2: /* prev is subset of curr */
+ redundant[j] = TRUE;
+ break;
+
+ case BMS_DIFFERENT:
+ /* do nothing - keep both stats */
+ break;
+ }
+
+ bms_free(prev);
+ }
+
+ bms_free(curr);
}
- /*
- * If there are not at least two attributes referenced by the clause(s),
- * we can throw everything out (as we'll revert to simple stats).
- */
- if (bms_num_members(attnums) <= 1)
+ /* can't reduce all statistics (at least one has to remain) */
+ Assert(nmvstats > 0);
+
+ /* now, let's remove the redundant statistics and rebuild the list */
+ list_free(stats);
+ stats = NIL;
+
+ for (i = 0; i < nmvstats; i++)
{
- if (attnums != NULL)
- pfree(attnums);
- attnums = NULL;
+ MVStatisticInfo *info;
+
+ pfree(stats_attnums[i]);
+
+ if (redundant[i])
+ continue;
+
+ info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[i], sizeof(MVStatisticInfo));
+
+ stats = lappend(stats, info);
}
- return attnums;
+ pfree(mvstats);
+ pfree(stats_attnums);
+ pfree(redundant);
+
+ return stats;
}
-/*
- * Count the number of attributes in clauses compatible with multivariate stats.
- */
-static int
-count_mv_attnums(List *clauses, Index relid, int type)
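+/*
+ * Convert a list of clauses into a plain array, setting *nclauses to the
+ * number of elements (mirrors make_stats_array).
+ */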
+static Node**
+make_clauses_array(List *clauses, int *nclauses)
{
- int c;
- Bitmapset *attnums = collect_mv_attnums(clauses, relid, type);
+ int i;
+ ListCell *l;
- c = bms_num_members(attnums);
+ Node** clauses_array;
- bms_free(attnums);
+ *nclauses = list_length(clauses);
+ clauses_array = (Node **)palloc0((*nclauses) * sizeof(Node *));
- return c;
+ i = 0;
+ foreach (l, clauses)
+ clauses_array[i++] = (Node *)lfirst(l);
+
+ *nclauses = i;
+
+ return clauses_array;
}
-/*
- * Count varnos referenced in the clauses, and if there's a single varno then
- * return the index in 'relid'.
- */
-static int
-count_varnos(List *clauses, Index *relid)
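+/*
+ * Build a bitmapset of attnums referenced by each clause. At this point
+ * all the clauses are expected to be mv-compatible (we error out if not).
+ */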
+static Bitmapset **
+make_clauses_attnums(PlannerInfo *root, Index relid,
+ int type, Node **clauses, int nclauses)
{
- int cnt;
- Bitmapset *varnos = NULL;
+ int i;
+ Bitmapset **clauses_attnums
+ = (Bitmapset **)palloc0(nclauses * sizeof(Bitmapset *));
- varnos = pull_varnos((Node *) clauses);
- cnt = bms_num_members(varnos);
+ for (i = 0; i < nclauses; i++)
+ {
+ Bitmapset * attnums = NULL;
- /* if there's a single varno in the clauses, remember it */
- if (bms_num_members(varnos) == 1)
- *relid = bms_singleton_member(varnos);
+ if (! clause_is_mv_compatible(clauses[i], relid, &attnums, type))
+ elog(ERROR, "should not get non-mv-compatible clause");
- bms_free(varnos);
+ clauses_attnums[i] = attnums;
+ }
- return cnt;
+ return clauses_attnums;
}
-
+
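+/*
+ * Build a map of which clauses are fully covered by which statistics,
+ * as a flat array indexed by [stat * nclauses + clause].
+ *
+ * A hypothetical example: with statistics on (a,b) and (b,c), and three
+ * clauses referencing a, b and c (one column each), the map is
+ *
+ * (a,b): {true,  true,  false}
+ * (b,c): {false, true,  true}
+ */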
+static bool*
+make_cover_map(Bitmapset **stats_attnums, int nmvstats,
+ Bitmapset **clauses_attnums, int nclauses)
+{
+ int i, j;
+ bool *cover_map = (bool*)palloc0(nclauses * nmvstats * sizeof(bool));
+
+ for (i = 0; i < nmvstats; i++)
+ for (j = 0; j < nclauses; j++)
+ cover_map[i * nclauses + j]
+ = bms_is_subset(clauses_attnums[j], stats_attnums[i]);
+
+ return cover_map;
+}
+
/*
- * We're looking for statistics matching at least 2 attributes, referenced in
- * clauses compatible with multivariate statistics. The current selection
- * criteria is very simple - we choose the statistics referencing the most
- * attributes.
- *
- * If there are multiple statistics referencing the same number of columns
- * (from the clauses), the one with less source columns (as listed in the
- * ADD STATISTICS when creating the statistics) wins. Else the first one wins.
- *
- * This is a very simple criteria, and has several weaknesses:
- *
- * (a) does not consider the accuracy of the statistics
- *
- * If there are two histograms built on the same set of columns, but one
- * has 100 buckets and the other one has 1000 buckets (thus likely
- * providing better estimates), this is not currently considered.
- *
- * (b) does not consider the type of statistics
- *
- * If there are three statistics - one containing just a MCV list, another
- * one with just a histogram and a third one with both, we treat them equally.
+ * Chooses the combination of statistics, optimal for estimation of a particular
+ * clause list.
*
- * (c) does not consider the number of clauses
+ * This only handles the 'preparation' phase shared by the exhaustive and
+ * greedy implementations (see the previous methods), mostly trying to reduce
+ * the size of the problem (eliminating clauses/statistics that can't possibly
+ * be used in the solution).
*
- * As explained, only the number of referenced attributes counts, so if
- * there are multiple clauses on a single attribute, this still counts as
- * a single attribute.
+ * It also precomputes bitmaps for attributes covered by clauses and statistics,
+ * so that we don't need to do that over and over in the actual optimizations
+ * (as it's both CPU and memory intensive).
*
- * (d) does not consider type of condition
*
- * Some clauses may work better with some statistics - for example equality
- * clauses probably work better with MCV lists than with histograms. But
- * IS [NOT] NULL conditions may often work better with histograms (thanks
- * to NULL-buckets).
+ * TODO Another way to make the optimization problems smaller might be splitting
+ * the statistics into several disjoint subsets, i.e. if we can split the
+ * graph of statistics (after the elimination) into multiple components
+ * (so that stats in different components share no attributes), we can do
+ * the optimization for each component separately.
*
- * So for example with five WHERE conditions
- *
- * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
- *
- * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be selected
- * as it references the most columns.
- *
- * Once we have selected the multivariate statistics, we split the list of
- * clauses into two parts - conditions that are compatible with the selected
- * stats, and conditions are estimated using simple statistics.
- *
- * From the example above, conditions
- *
- * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
- *
- * will be estimated using the multivariate statistics (a,b,c,d) while the last
- * condition (e = 1) will get estimated using the regular ones.
- *
- * There are various alternative selection criteria (e.g. counting conditions
- * instead of just referenced attributes), but eventually the best option should
- * be to combine multiple statistics. But that's much harder to do correctly.
- *
- * TODO Select multiple statistics and combine them when computing the estimate.
- *
- * TODO This will probably have to consider compatibility of clauses, because
- * 'dependencies' will probably work only with equality clauses.
+ * TODO If we could compute what is a "perfect solution" maybe we could
+ * terminate the search after reaching ~90% of it? Say, if we knew that we
+ * can cover 10 clauses and reuse 8 dependencies, maybe covering 9 clauses
+ * and 7 dependencies would be OK?
*/
-static MVStatisticInfo *
-choose_mv_statistics(List *stats, Bitmapset *attnums)
+static List*
+choose_mv_statistics(PlannerInfo *root, Index relid, List *stats,
+ List *clauses, List *conditions)
{
int i;
- ListCell *lc;
+ mv_solution_t *best = NULL;
+ List *result = NIL;
+
+ int nmvstats;
+ MVStatisticInfo *mvstats;
+
+ /* we only work with MCV lists and histograms here */
+ int type = (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+
+ bool *clause_cover_map = NULL,
+ *condition_cover_map = NULL;
+ int *ruled_out = NULL;
+
+ /* build bitmapsets for all stats and clauses */
+ Bitmapset **stats_attnums;
+ Bitmapset **clauses_attnums;
+ Bitmapset **conditions_attnums;
- MVStatisticInfo *choice = NULL;
+ int nclauses, nconditions;
+ Node ** clauses_array;
+ Node ** conditions_array;
- int current_matches = 1; /* goal #1: maximize */
- int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+ /* copy lists, so that we can free them during elimination easily */
+ clauses = list_copy(clauses);
+ conditions = list_copy(conditions);
+ stats = list_copy(stats);
/*
- * Walk through the statistics (simple array with nmvstats elements) and for
- * each one count the referenced attributes (encoded in the 'attnums' bitmap).
+ * Reduce the optimization problem size as much as possible.
+ *
+ * Eliminate clauses and conditions not covered by any statistics,
+ * or statistics not matching at least two attributes (one of them
+ * has to be in a regular clause).
+ *
+ * It's possible that removing a statistics in one iteration
+ * eliminates a clause in the next one, so we repeat this until an
+ * iteration eliminates no clauses/stats.
+ *
+ * This can only happen after eliminating a statistics - clauses are
+ * eliminated first, so statistics always reflect that.
*/
- foreach (lc, stats)
+ while (true)
{
- MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
-
- /* columns matching this statistics */
- int matches = 0;
+ List *tmp;
- int2vector * attrs = info->stakeys;
- int numattrs = attrs->dim1;
+ Bitmapset *compatible_attnums = NULL;
+ Bitmapset *condition_attnums = NULL;
+ Bitmapset *all_attnums = NULL;
- /* skip dependencies-only stats */
- if (! (info->mcv_built || info->hist_built))
- continue;
+ /*
+ * Clauses
+ *
+ * Walk through clauses and keep only those covered by at least
+ * one of the statistics we still have. We'll also keep info
+ * about attnums in clauses (without conditions) so that we can
+ * ignore stats covering just conditions (which is pointless).
+ */
+ tmp = filter_clauses(root, relid, type,
+ stats, clauses, &compatible_attnums);
- /* count columns covered by the histogram */
- for (i = 0; i < numattrs; i++)
- if (bms_is_member(attrs->values[i], attnums))
- matches++;
+ /* discard the original list */
+ list_free(clauses);
+ clauses = tmp;
/*
- * Use this statistics when it improves the number of matches or
- * when it matches the same number of attributes but is smaller.
+ * Conditions
+ *
+ * Walk through conditions and keep only those covered by at least
+ * one of the statistics we still have. Also, collect bitmap of
+ * attributes so that we can make sure we add at least one new
+ * attribute (by comparing with clauses).
*/
- if ((matches > current_matches) ||
- ((matches == current_matches) && (current_dims > numattrs)))
+ if (conditions != NIL)
{
- choice = info;
- current_matches = matches;
- current_dims = numattrs;
+ tmp = filter_clauses(root, relid, type,
+ stats, conditions, &condition_attnums);
+
+ /* discard the original list */
+ list_free(conditions);
+ conditions = tmp;
}
- }
- return choice;
-}
+ /* get a union of attnums (from conditions and new clauses) */
+ all_attnums = bms_union(compatible_attnums, condition_attnums);
+
+ /*
+ * Statistics
+ *
+ * Walk through statistics and only keep those covering at least
+ * one new attribute (excluding conditions) and at least two
+ * attributes in clauses and conditions combined.
+ */
+ tmp = filter_stats(stats, compatible_attnums, all_attnums);
+ /* if we've not eliminated anything, terminate */
+ if (list_length(stats) == list_length(tmp))
+ break;
-/*
- * This splits the clauses list into two parts - one containing clauses that
- * will be evaluated using the chosen statistics, and the remaining clauses
- * (either non-mvcompatible, or not related to the histogram).
- */
-static List *
-clauselist_mv_split(PlannerInfo *root, Index relid,
- List *clauses, List **mvclauses,
- MVStatisticInfo *mvstats, int types)
-{
- int i;
- ListCell *l;
- List *non_mvclauses = NIL;
+ /* work only with the filtered statistics from now on */
+ list_free(stats);
+ stats = tmp;
+ }
- /* FIXME is there a better way to get info on int2vector? */
- int2vector * attrs = mvstats->stakeys;
- int numattrs = mvstats->stakeys->dim1;
+ /* only do the optimization if we have clauses/statistics */
+ if ((list_length(stats) == 0) || (list_length(clauses) == 0))
+ return NIL;
- Bitmapset *mvattnums = NULL;
+ /* remove redundant stats (stats covered by another stats) */
+ stats = filter_redundant_stats(stats, clauses, conditions);
- /* build bitmap of attributes, so we can do bms_is_subset later */
- for (i = 0; i < numattrs; i++)
- mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+ /*
+ * TODO We should sort the stats to make the order deterministic,
+ * otherwise we may get different estimates on different
+ * executions - if there are multiple "equally good" solutions,
+ * we'll keep the first solution we see.
+ *
+ * Sorting by OID probably is not the right solution though,
+ * because we'd like it to be somehow reproducible,
+ * irrespective of the order of ADD STATISTICS commands.
+ * So maybe statkeys?
+ */
+ mvstats = make_stats_array(stats, &nmvstats);
+ stats_attnums = make_stats_attnums(mvstats, nmvstats);
- /* erase the list of mv-compatible clauses */
- *mvclauses = NIL;
+ /* collect clauses and bitmaps of attnums */
+ clauses_array = make_clauses_array(clauses, &nclauses);
+ clauses_attnums = make_clauses_attnums(root, relid, type,
+ clauses_array, nclauses);
- foreach (l, clauses)
- {
- bool match = false; /* by default not mv-compatible */
- Bitmapset *attnums = NULL;
- Node *clause = (Node *) lfirst(l);
+ /* collect conditions and bitmap of attnums */
+ conditions_array = make_clauses_array(conditions, &nconditions);
+ conditions_attnums = make_clauses_attnums(root, relid, type,
+ conditions_array, nconditions);
- if (clause_is_mv_compatible(clause, relid, &attnums, types))
+ /*
+ * Build bitmaps with info about which clauses/conditions are
+ * covered by each statistics (so that we don't need to call the
+ * bms_is_subset over and over again).
+ */
+ clause_cover_map = make_cover_map(stats_attnums, nmvstats,
+ clauses_attnums, nclauses);
+
+ condition_cover_map = make_cover_map(stats_attnums, nmvstats,
+ conditions_attnums, nconditions);
+
+ ruled_out = (int*)palloc0(nmvstats * sizeof(int));
+
+ /* no stats are ruled out by default */
+ for (i = 0; i < nmvstats; i++)
+ ruled_out[i] = -1;
+
+ /* do the optimization itself */
+ if (mvstat_search_type == MVSTAT_SEARCH_EXHAUSTIVE)
+ choose_mv_statistics_exhaustive(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+ else
+ choose_mv_statistics_greedy(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+
+ /* create a list of statistics from the array */
+ if (best != NULL)
+ {
+ for (i = 0; i < best->nstats; i++)
{
- /* are all the attributes part of the selected stats? */
- if (bms_is_subset(attnums, mvattnums))
- match = true;
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[best->stats[i]], sizeof(MVStatisticInfo));
+ result = lappend(result, info);
}
- /*
- * The clause matches the selected stats, so put it to the list of
- * mv-compatible clauses. Otherwise, keep it in the list of 'regular'
- * clauses (that may be selected later).
- */
- if (match)
- *mvclauses = lappend(*mvclauses, clause);
- else
- non_mvclauses = lappend(non_mvclauses, clause);
+ pfree(best);
}
- /*
- * Perform regular estimation using the clauses incompatible with the chosen
- * histogram (or MV stats in general).
- */
- return non_mvclauses;
+ /* cleanup (maybe leave it up to the memory context?) */
+ for (i = 0; i < nmvstats; i++)
+ bms_free(stats_attnums[i]);
+
+ for (i = 0; i < nclauses; i++)
+ bms_free(clauses_attnums[i]);
+
+ for (i = 0; i < nconditions; i++)
+ bms_free(conditions_attnums[i]);
+
+ pfree(stats_attnums);
+ pfree(clauses_attnums);
+ pfree(conditions_attnums);
+ pfree(clauses_array);
+ pfree(conditions_array);
+ pfree(clause_cover_map);
+ pfree(condition_cover_map);
+ pfree(ruled_out);
+ pfree(mvstats);
+
+ list_free(clauses);
+ list_free(conditions);
+ list_free(stats);
+
+ return result;
}
typedef struct
@@ -1474,6 +2686,7 @@ clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums, int type
return true;
}
+
/*
* collect attnums from functional dependencies
*
@@ -2022,6 +3235,24 @@ clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
* Check that there are stats with at least one of the requested types.
*/
static bool
+stats_type_matches(MVStatisticInfo *stat, int type)
+{
+ if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
+ return true;
+
+ if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
+ return true;
+
+ if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
+ return true;
+
+ return false;
+}
+
+/*
+ * Check that there are stats with at least one of the requested types.
+ */
+static bool
has_stats(List *stats, int type)
{
ListCell *s;
@@ -2030,13 +3261,8 @@ has_stats(List *stats, int type)
{
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
- if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
- return true;
-
- if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
- return true;
-
- if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
+ /* terminate if we've found at least one matching statistics */
+ if (stats_type_matches(stat, type))
return true;
}
@@ -2087,22 +3313,26 @@ find_stats(PlannerInfo *root, Index relid)
* as the clauses are processed (and skip items that are 'match').
*/
static Selectivity
-clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats, bool *fullmatch,
- Selectivity *lowsel)
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or,
+ bool *fullmatch, Selectivity *lowsel)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
MCVList mcvlist = NULL;
+
int nmatches = 0;
+ int nconditions = 0;
/* match/mismatch bitmap for each MCV item */
char * matches = NULL;
+ char * condition_matches = NULL;
Assert(clauses != NIL);
- Assert(list_length(clauses) >= 2);
+ Assert(list_length(clauses) >= 1);
/* there's no MCV list built yet */
if (! mvstats->mcv_built)
@@ -2113,32 +3343,85 @@ clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
Assert(mcvlist != NULL);
Assert(mcvlist->nitems > 0);
- /* by default all the MCV items match the clauses fully */
- matches = palloc0(sizeof(char) * mcvlist->nitems);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
-
/* number of matching MCV items */
nmatches = mcvlist->nitems;
+ nconditions = mcvlist->nitems;
+
+ /*
+ * Bitmap of bucket matches (mismatch, partial, full).
+ *
+ * For AND clauses all buckets match (and we'll eliminate them).
+ * For OR clauses no buckets match (and we'll add them).
+ *
+ * We only need to do the memset for AND clauses (for OR clauses
+ * it's already set correctly by the palloc0).
+ */
+ matches = palloc0(sizeof(char) * nmatches);
+
+ if (! is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+ /* Conditions are treated as an AND clause, so all items match by default. */
+ condition_matches = palloc0(sizeof(char) * nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+
+ /*
+ * build the match bitmap for the conditions (conditions are always
+ * connected by AND)
+ */
+ if (conditions != NIL)
+ nconditions = update_match_bitmap_mcvlist(root, conditions,
+ mvstats->stakeys, mcvlist,
+ nconditions, condition_matches,
+ lowsel, fullmatch, false);
+
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all MCV items, even those
+ * ruled out by the conditions. The final result should be the
+ * same, but skipping them might be faster.
+ */
nmatches = update_match_bitmap_mcvlist(root, clauses,
mvstats->stakeys, mcvlist,
- nmatches, matches,
- lowsel, fullmatch, false);
+ ((is_or) ? 0 : nmatches), matches,
+ lowsel, fullmatch, is_or);
/* sum frequencies for all the matching MCV items */
for (i = 0; i < mcvlist->nitems; i++)
{
- /* used to 'scale' for MCV lists not covering all tuples */
+ /*
+ * Find out what part of the data is covered by the MCV list,
+ * so that we can 'scale' the selectivity properly (e.g. when
+ * only 50% of the sample items got into the MCV, and the rest
+ * is either in a histogram, or not covered by stats).
+ *
+ * TODO This might be handled by keeping a global "frequency"
+ * for the whole list, which might save us a bit of time
+ * spent on accessing the not-matching part of the MCV list.
+ * Although it's likely in a cache, so it's very fast.
+ */
u += mcvlist->items[i]->frequency;
+ /* skip MCV items not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
if (matches[i] != MVSTATS_MATCH_NONE)
s += mcvlist->items[i]->frequency;
+
+ t += mcvlist->items[i]->frequency;
}
pfree(matches);
+ pfree(condition_matches);
pfree(mcvlist);
- return s*u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
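+ /*
+ * A note on the formula: (s / t) is the conditional probability of the
+ * clauses given the conditions, estimated over the MCV items, and 'u'
+ * scales the result to the part of the data covered by the MCV list.
+ */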
+ return (s / t) * u;
}
/*
@@ -2369,64 +3652,57 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
}
}
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/* AND/OR clause, with all clauses compatible with the selected MV stat */
int i;
- BoolExpr *orclause = ((BoolExpr*)clause);
- List *orclauses = orclause->args;
+ List *tmp_clauses = ((BoolExpr*)clause)->args;
/* match/mismatch bitmap for each MCV item */
- int or_nmatches = 0;
- char * or_matches = NULL;
+ int tmp_nmatches = 0;
+ char * tmp_matches = NULL;
- Assert(orclauses != NIL);
- Assert(list_length(orclauses) >= 2);
+ Assert(tmp_clauses != NIL);
+ Assert((list_length(tmp_clauses) >= 2) || (not_clause(clause) && (list_length(tmp_clauses)==1)));
/* number of matching MCV items */
- or_nmatches = mcvlist->nitems;
+ tmp_nmatches = (or_clause(clause)) ? 0 : mcvlist->nitems;
/* by default none of the MCV items matches the clauses */
- or_matches = palloc0(sizeof(char) * or_nmatches);
+ tmp_matches = palloc0(sizeof(char) * mcvlist->nitems);
- if (or_clause(clause))
- {
- /* OR clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
- or_nmatches = 0;
- }
- else
- {
- /* AND clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
- }
+ /* AND (and NOT) clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
/* build the match bitmap for the OR-clauses */
- or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ tmp_nmatches = update_match_bitmap_mcvlist(root, tmp_clauses,
stakeys, mcvlist,
- or_nmatches, or_matches,
+ tmp_nmatches, tmp_matches,
lowsel, fullmatch, or_clause(clause));
/* merge the bitmap into the existing one*/
for (i = 0; i < mcvlist->nitems; i++)
{
+ /* if this is a NOT clause, we need to invert the results first */
+ if (not_clause(clause))
+ tmp_matches[i] = (MVSTATS_MATCH_FULL - tmp_matches[i]);
+
/*
* To AND-merge the bitmaps, a MIN() semantics is used.
* For OR-merge, use MAX().
*
* FIXME this does not decrease the number of matches
*/
- UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
}
- pfree(or_matches);
+ pfree(tmp_matches);
}
else
- {
elog(ERROR, "unknown clause type: %d", clause->type);
- }
}
/*
@@ -2484,15 +3760,18 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
* this is not uncommon, but for histograms it's not that clear.
*/
static Selectivity
-clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats)
+clauselist_mv_selectivity_histogram(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
int nmatches = 0;
+ int nconditions = 0;
char *matches = NULL;
+ char *condition_matches = NULL;
MVSerializedHistogram mvhist = NULL;
@@ -2505,25 +3784,55 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
Assert (mvhist != NULL);
Assert (clauses != NIL);
- Assert (list_length(clauses) >= 2);
+ Assert (list_length(clauses) >= 1);
+
+ nmatches = mvhist->nbuckets;
+ nconditions = mvhist->nbuckets;
/*
- * Bitmap of bucket matches (mismatch, partial, full). by default
- * all buckets fully match (and we'll eliminate them).
+ * Bitmap of bucket matches (mismatch, partial, full).
+ *
+ * For AND clauses all buckets match (and we'll eliminate them).
+ * For OR clauses no buckets match (and we'll add them).
+ *
+ * We only need to do the memset for AND clauses (for OR clauses
+ * it's already set correctly by the palloc0).
*/
- matches = palloc0(sizeof(char) * mvhist->nbuckets);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+ matches = palloc0(sizeof(char) * nmatches);
- nmatches = mvhist->nbuckets;
+ if (! is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+
+ /* Conditions are treated as an AND clause, so all buckets match by default. */
+ condition_matches = palloc0(sizeof(char)*nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+
+ /*
+ * build the match bitmap for the conditions (conditions are always
+ * connected by AND)
+ */
+ if (conditions != NIL)
+ update_match_bitmap_histogram(root, conditions,
+ mvstats->stakeys, mvhist,
+ nconditions, condition_matches, false);
- /* build the match bitmap */
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all buckets, even those
+ * ruled out by the conditions. The final result should be
+ * the same, but skipping them might be faster.
+ */
update_match_bitmap_histogram(root, clauses,
mvstats->stakeys, mvhist,
- nmatches, matches, false);
+ ((is_or) ? 0 : nmatches), matches,
+ is_or);
/* now, walk through the buckets and sum the selectivities */
for (i = 0; i < mvhist->nbuckets; i++)
{
+ float coeff = 1.0;
+
/*
* Find out what part of the data is covered by the histogram,
* so that we can 'scale' the selectivity properly (e.g. when
@@ -2537,10 +3846,23 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
*/
u += mvhist->buckets[i]->ntuples;
+ /* skip buckets not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+ else if (condition_matches[i] == MVSTATS_MATCH_PARTIAL)
+ coeff = 0.5;
+
+ t += coeff * mvhist->buckets[i]->ntuples;
+
if (matches[i] == MVSTATS_MATCH_FULL)
- s += mvhist->buckets[i]->ntuples;
+ s += coeff * mvhist->buckets[i]->ntuples;
else if (matches[i] == MVSTATS_MATCH_PARTIAL)
- s += 0.5 * mvhist->buckets[i]->ntuples;
+ /*
+ * TODO If both conditions and clauses match partially, this
+ * will use a 0.25 match - not sure that's the right solution,
+ * but it seems about right.
+ */
+ s += coeff * 0.5 * mvhist->buckets[i]->ntuples;
}
#ifdef DEBUG_MVHIST
@@ -2549,9 +3871,14 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
/* release the allocated bitmap and deserialized histogram */
pfree(matches);
+ pfree(condition_matches);
pfree(mvhist);
- return s * u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
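+ /* analogous to the MCV case: conditional (s / t), scaled by coverage 'u' */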
+ return (s / t) * u;
}
/* cached result of bucket boundary comparison for a single dimension */
@@ -2699,7 +4026,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
{
int i;
ListCell * l;
-
+
/*
* Used for caching function calls, only once per deduplicated value.
*
@@ -2742,7 +4069,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
FmgrInfo opproc; /* operator */
fmgr_info(get_opcode(expr->opno), &opproc);
-
+
/* reset the cache (per clause) */
memset(callcache, 0, mvhist->nbuckets);
@@ -2902,64 +4229,57 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
}
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/* AND/OR clause, with all clauses compatible with the selected MV stat */
int i;
- BoolExpr *orclause = ((BoolExpr*)clause);
- List *orclauses = orclause->args;
+ List *tmp_clauses = ((BoolExpr*)clause)->args;
/* match/mismatch bitmap for each bucket */
- int or_nmatches = 0;
- char * or_matches = NULL;
+ int tmp_nmatches = 0;
+ char * tmp_matches = NULL;
- Assert(orclauses != NIL);
- Assert(list_length(orclauses) >= 2);
+ Assert(tmp_clauses != NIL);
+ Assert((list_length(tmp_clauses) >= 2) || (not_clause(clause) && (list_length(tmp_clauses)==1)));
/* number of matching buckets */
- or_nmatches = mvhist->nbuckets;
+ tmp_nmatches = (or_clause(clause)) ? 0 : mvhist->nbuckets;
- /* by default none of the buckets matches the clauses */
- or_matches = palloc0(sizeof(char) * or_nmatches);
+ /* by default none of the buckets matches the clauses (OR clause) */
+ tmp_matches = palloc0(sizeof(char) * mvhist->nbuckets);
- if (or_clause(clause))
- {
- /* OR clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
- or_nmatches = 0;
- }
- else
- {
- /* AND clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
- }
+ /* but AND (and NOT) clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
/* build the match bitmap for the OR-clauses */
- or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ tmp_nmatches = update_match_bitmap_histogram(root, tmp_clauses,
stakeys, mvhist,
- or_nmatches, or_matches, or_clause(clause));
+ tmp_nmatches, tmp_matches, or_clause(clause));
/* merge the bitmap into the existing one*/
for (i = 0; i < mvhist->nbuckets; i++)
{
+ /* if this is a NOT clause, we need to invert the results first */
+ if (not_clause(clause))
+ tmp_matches[i] = (MVSTATS_MATCH_FULL - tmp_matches[i]);
+
/*
* To AND-merge the bitmaps, a MIN() semantics is used.
* For OR-merge, use MAX().
*
* FIXME this does not decrease the number of matches
*/
- UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
}
- pfree(or_matches);
-
+ pfree(tmp_matches);
}
else
elog(ERROR, "unknown clause type: %d", clause->type);
}
- /* free the call cache */
pfree(callcache);
return nmatches;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5350329..57214e0 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3518,7 +3518,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/*
* Also get the normal inner-join selectivity of the join clauses.
@@ -3541,7 +3542,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
JOIN_INNER,
- &norm_sjinfo);
+ &norm_sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
if (jointype == JOIN_ANTI)
@@ -3708,7 +3710,7 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
Node *qual = (Node *) lfirst(l);
/* Note that clause_selectivity will be able to cache its result */
- selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo);
+ selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo, NIL);
}
/* Apply it to the input relation sizes */
@@ -3744,7 +3746,8 @@ set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
rel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
rel->rows = clamp_row_est(nrows);
@@ -3781,7 +3784,8 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
allclauses,
rel->relid, /* do not use 0! */
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
/* For safety, make sure result is not more than the base estimate */
if (nrows > rel->rows)
@@ -3919,12 +3923,14 @@ calc_joinrel_size_estimate(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = clauselist_selectivity(root,
pushedquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
list_free(joinquals);
@@ -3936,7 +3942,8 @@ calc_joinrel_size_estimate(PlannerInfo *root,
restrictlist,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = 0.0; /* not used, keep compiler quiet */
}
diff --git a/src/backend/optimizer/util/orclauses.c b/src/backend/optimizer/util/orclauses.c
index ea831f5..6299e75 100644
--- a/src/backend/optimizer/util/orclauses.c
+++ b/src/backend/optimizer/util/orclauses.c
@@ -280,7 +280,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
* saving work later.)
*/
or_selec = clause_selectivity(root, (Node *) or_rinfo,
- 0, JOIN_INNER, NULL);
+ 0, JOIN_INNER, NULL, NIL);
/*
* The clause is only worth adding to the query if it rejects a useful
@@ -342,7 +342,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
/* Compute inner-join size */
orig_selec = clause_selectivity(root, (Node *) join_or_rinfo,
- 0, JOIN_INNER, &sjinfo);
+ 0, JOIN_INNER, &sjinfo, NIL);
/* And hack cached selectivity so join size remains the same */
join_or_rinfo->norm_selec = orig_selec / or_selec;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 46c95b0..7d0a3a1 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1627,13 +1627,15 @@ booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg,
case IS_NOT_FALSE:
selec = (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
case IS_FALSE:
case IS_NOT_TRUE:
selec = 1.0 - (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
default:
elog(ERROR, "unrecognized booltesttype: %d",
@@ -6259,7 +6261,8 @@ genericcostestimate(PlannerInfo *root,
indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/*
* If caller didn't give us an estimate, estimate the number of index
@@ -6579,7 +6582,8 @@ btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
btreeSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
numIndexTuples = btreeSelectivity * index->rel->tuples;
/*
@@ -7330,7 +7334,8 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/* fetch estimated page cost for tablespace containing index */
get_tablespace_page_costs(index->reltablespace,
@@ -7560,7 +7565,7 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexSelectivity =
clauselist_selectivity(root, indexQuals,
path->indexinfo->rel->relid,
- JOIN_INNER, NULL);
+ JOIN_INNER, NULL, NIL);
*indexCorrelation = 1;
/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index ea5a09a..27a8de5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -75,6 +75,7 @@
#include "utils/bytea.h"
#include "utils/guc_tables.h"
#include "utils/memutils.h"
+#include "utils/mvstats.h"
#include "utils/pg_locale.h"
#include "utils/plancache.h"
#include "utils/portal.h"
@@ -393,6 +394,15 @@ static const struct config_enum_entry force_parallel_mode_options[] = {
};
/*
+ * Search algorithm for multivariate stats.
+ */
+static const struct config_enum_entry mvstat_search_options[] = {
+ {"greedy", MVSTAT_SEARCH_GREEDY, false},
+ {"exhaustive", MVSTAT_SEARCH_EXHAUSTIVE, false},
+ {NULL, 0, false}
+};
+
+/*
* Options for enum values stored in other modules
*/
extern const struct config_enum_entry wal_level_options[];
@@ -3707,6 +3717,16 @@ static struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"mvstat_search", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Sets the algorithm used for combining multivariate stats."),
+ NULL
+ },
+ &mvstat_search_type,
+ MVSTAT_SEARCH_GREEDY, mvstat_search_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index 3e4f4d1..d404914 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -90,6 +90,137 @@ even attempting to do the more expensive estimation.
Whenever we find there are no suitable stats, we skip the expensive steps.
+Combining multiple statistics
+-----------------------------
+
+When estimating selectivity of a list of clauses, there may exist no statistics
+covering all of them. If there are multiple statistics, each covering some
+subset of the attributes, the optimizer needs to figure out which of those
+statistics to apply.
+
+When the statistics do not overlap, the solution is trivial - we can simply
+split the conditions into groups by the matching statistics, and then multiply the
+selectivities. For example assume multivariate statistics on (b,c) and (d,e),
+and a condition like this:
+
+ (a=1) AND (b=2) AND (c=3) AND (d=4) AND (e=5)
+
+Then (a=1) is not covered by any of the statistics, so it will be estimated
+using the regular per-column statistics. The conditions ((b=2) AND (c=3)) will
+be estimated using the (b,c) statistics, and ((d=4) AND (e=5)) using the (d,e)
+statistics. The resulting selectivities are then multiplied together.
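+
+In terms of probabilities, the whole estimate is
+
+ P(a=1) * P(b=2 & c=3) * P(d=4 & e=5)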
+
+Now, what if the statistics overlap? For example assume the same condition as
+above, but let's say we have statistics on (a,b,c) and (a,c,d,e). What then?
+
+As selectivity is just a probability that the condition holds for a random row,
+we can write the selectivity like this:
+
+ P(a=1 & b=2 & c=3 & d=4 & e=5)
+
+and we can rewrite it using conditional probability like this
+
+ P(a=1 & b=2 & c=3) * P(d=4 & e=5 | a=1 & b=2 & c=3)
+
+Notice that the first part already matches the (a,b,c) statistics. If we assume
+that columns that are not referenced by the same statistics are independent, we
+may rewrite the second half like this
+
+ P(d=4 & e=5 | a=1 & b=2 & c=3) = P(d=4 & e=5 | a=1 & c=3)
+
+which corresponds to the statistics on (a,c,d,e).
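+
+Putting the two pieces together, the whole selectivity may be estimated as
+
+ P(a=1 & b=2 & c=3) * P(d=4 & e=5 | a=1 & c=3)
+
+with the first factor estimated from the (a,b,c) statistics and the second one
+from the (a,c,d,e) statistics.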
+
+If there are multiple statistics defined on a table, it's not difficult to come
+up with examples when there are multiple ways to combine them to cover a list of
+clauses. We need a way to find the best combination of statistics.
+
+This is the purpose of choose_mv_statistics(). It searches through the possible
+combinations of statistics, and selects the combination that
+
+ (a) covers the most clauses of the list
+
+ (b) reuses the maximum number of clauses as conditions
+ (in conditional probabilities)
+
+While criterion (a) seems natural, (b) may seem a bit awkward at first. The
+idea is that conditions are a way of transferring information about
+dependencies between statistics.
+
+There are two alternative implementations of choose_mv_statistics() - greedy
+and exhaustive. Exhaustive actually searches through all possible combinations
+of statistics, and for larger numbers of statistics may get quite expensive
+(as it, unsurprisingly, has exponential cost). Greedy terminates in less than
+K steps (where K is the number of clauses), and in each step chooses the best
+next statistics. I've been unable to come up with an example where those two
+approaches would produce different combinations.
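+
+A rough sketch of the greedy variant (simplified, with details omitted):
+
+ while (some clauses are not covered yet)
+ {
+ pick the statistics covering most of the remaining clauses,
+ preferring statistics that reuse already estimated clauses
+ as conditions
+
+ if (no statistics covers any remaining clause)
+ break;
+ }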
+
+It's possible to choose the algorithm using the mvstat_search GUC, with either
+'greedy' or 'exhaustive' values (default is 'greedy'):
+
+ SET mvstat_search = 'exhaustive';
+
+Note: This is meant mostly for experimentation. I do expect we'll choose one of
+the algorithms and remove the GUC before commit.
+
+
+Limitations of combining statistics
+-----------------------------------
+
+As described in the section 'Combining multiple statistics', the current approach
+is based on transferring information between statistics by means of conditional
+probabilities. This is a relatively cheap and efficient approach, but it is
+based on two assumptions:
+
+ (1) The overlap between the statistics needs to be sufficiently large, i.e.
+ there needs to be enough columns shared by the statistics to transfer
+ information about dependencies between the remaining columns.
+
+ (2) The query needs to include sufficient clauses on the shared columns.
+
+How a violation of those assumptions may be a problem can be illustrated by
+a simple example. Assume a table with three columns (a,b,c) containing exactly
+the same values, and statistics on (a,b) and (b,c):
+
+ CREATE TABLE test AS SELECT i AS a, i AS b, i AS c
+ FROM generate_series(1,1000) s(i);
+
+ CREATE STATISTICS s1 ON test (a,b) WITH (mcv);
+ CREATE STATISTICS s2 ON test (b,c) WITH (mcv);
+
+ ANALYZE test;
+
+First, let's estimate this query:
+
+ SELECT * FROM test WHERE (a < 10) AND (c < 10);
+
+Clearly, there are no conditions on 'b' (which is the only column shared by the
+two statistics), so we'll end up with an estimate based on the assumption of
+independence:
+
+ P(a < 10) * P(c < 10) = 0.01 * 0.01 = 0.0001
+
+This is a significant under-estimate, as the proper selectivity is 0.01.
+
+But let's estimate another query:
+
+ SELECT * FROM test WHERE (a < 10) AND (b < 500) AND (c < 10);
+
+In this case, the estimate may be computed for example like this:
+
+ P[(a < 10) & (b < 500) & (c < 10)]
+ = P[(a < 10) & (b < 500)] * P[(c < 10) | (a < 10) & (b < 500)]
+ = P[(a < 10) & (b < 500)] * P[(c < 10) | (b < 500)]
+
+The trouble is that P(c < 10 | b < 500) evaluates to 0.02 - we have to assume
+(a) and (c) are independent, as there is no statistic covering both columns,
+and the condition on (b) does not transfer a sufficient amount of information
+between the two statistics.
+
+Currently, the only solution is to build statistics on all three columns, but
+see the 'combining statistics using convolution' section for ideas on how to
+improve this.
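+
+For illustration, the workaround for the example above (using the same
+CREATE STATISTICS syntax) might look like this:
+
+ CREATE STATISTICS s3 ON test (a,b,c) WITH (mcv);
+ ANALYZE test;
+
+after which the (a,b,c) statistics covers all three clauses directly, with no
+combining needed.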
+
+
Further (possibly crazy) ideas
------------------------------
@@ -111,3 +242,38 @@ But of course, this may result in expensive estimation (CPU-wise).
So we might add a GUC to choose between the simple (single statistics) and the
multi-statistic estimation, possibly as a table-level parameter (ALTER TABLE ...).
+
+
+Combining stats using convolution
+---------------------------------
+
+The current approach to combining statistics is based on conditional
+probabilities, and thus only works when the query includes conditions on the
+overlapping parts of the statistics. There may however be other ways to
+combine statistics, relaxing this requirement.
+
+Let's assume two histograms H1 and H2 - then combining them might work roughly
+like this:
+
+
+ for (buckets of H1, satisfying local conditions)
+ {
+ for (buckets of H2, overlapping with H1 bucket)
+ {
+ mark H2 bucket as 'valid'
+ }
+ }
+
+ s1 = s2 = 0.0
+ for (buckets of H2 marked as valid)
+ {
+ s1 += frequency
+
+ if (bucket satisfies local conditions)
+ s2 += frequency
+ }
+
+ s = (s2 / s1) /* final selectivity estimate */
+
+However this may quickly get non-trivial, e.g. when combining two statistics
+of different types (histogram vs. MCV).
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index fea2bb7..33f5a1b 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -192,11 +192,13 @@ extern Selectivity clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
extern Selectivity clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
#endif /* COST_H */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index f05a517..35b2f8e 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -17,6 +17,14 @@
#include "fmgr.h"
#include "commands/vacuum.h"
+typedef enum MVStatSearchType
+{
+ MVSTAT_SEARCH_EXHAUSTIVE, /* exhaustive search */
+ MVSTAT_SEARCH_GREEDY /* greedy search */
+} MVStatSearchType;
+
+extern int mvstat_search_type;
+
/*
* Degree of how much MCV item / histogram bucket matches a clause.
* This is then considered when computing the selectivity.
--
2.1.0
0007-multivariate-ndistinct-coefficients.patch
From d9b0afd75f2f678079d50f3d520bdd478c75bc89 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Wed, 23 Dec 2015 02:07:58 +0100
Subject: [PATCH 7/9] multivariate ndistinct coefficients
---
doc/src/sgml/ref/create_statistics.sgml | 9 ++
src/backend/catalog/system_views.sql | 3 +-
src/backend/commands/analyze.c | 2 +-
src/backend/commands/statscmds.c | 11 +-
src/backend/optimizer/path/clausesel.c | 4 +
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/adt/selfuncs.c | 93 +++++++++++++++-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/README.ndistinct | 83 ++++++++++++++
src/backend/utils/mvstats/README.stats | 2 +
src/backend/utils/mvstats/common.c | 23 +++-
src/backend/utils/mvstats/mvdist.c | 171 +++++++++++++++++++++++++++++
src/include/catalog/pg_mv_statistic.h | 26 +++--
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 9 +-
src/test/regress/expected/rules.out | 3 +-
16 files changed, 424 insertions(+), 23 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.ndistinct
create mode 100644 src/backend/utils/mvstats/mvdist.c
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index fd3382e..80360a6 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -168,6 +168,15 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>ndistinct</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables ndistinct coefficients for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect2>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6afdee0..a550141 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -169,7 +169,8 @@ CREATE VIEW pg_mv_stats AS
length(S.stamcv) AS mcvbytes,
pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo,
length(S.stahist) AS histbytes,
- pg_mv_stats_histogram_info(S.stahist) AS histinfo
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo,
+ standcoeff AS ndcoeff
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 9087532..c29f1be 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -582,7 +582,7 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
}
/* Build multivariate stats (if there are any). */
- build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
+ build_mv_stats(onerel, totalrows, numrows, rows, attr_cnt, vacattrstats);
}
/*
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index e2f3ff1..11de1c5 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -72,7 +72,8 @@ CreateStatistics(CreateStatsStmt *stmt)
/* by default build nothing */
bool build_dependencies = false,
build_mcv = false,
- build_histogram = false;
+ build_histogram = false,
+ build_ndistinct = false;
int32 max_buckets = -1,
max_mcv_items = -1;
@@ -155,6 +156,8 @@ CreateStatistics(CreateStatsStmt *stmt)
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "ndistinct") == 0)
+ build_ndistinct = defGetBoolean(opt);
else if (strcmp(opt->defname, "mcv") == 0)
build_mcv = defGetBoolean(opt);
else if (strcmp(opt->defname, "max_mcv_items") == 0)
@@ -209,10 +212,10 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv || build_histogram))
+ if (! (build_dependencies || build_mcv || build_histogram || build_ndistinct))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram, ndistinct) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -246,6 +249,7 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
+ values[Anum_pg_mv_statistic_ndist_enabled-1] = BoolGetDatum(build_ndistinct);
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
@@ -253,6 +257,7 @@ CreateStatistics(CreateStatsStmt *stmt)
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
nulls[Anum_pg_mv_statistic_stahist -1] = true;
+ nulls[Anum_pg_mv_statistic_standist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index c1b8999..2540da9 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -59,6 +59,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
#define MV_CLAUSE_TYPE_HIST 0x04
+#define MV_CLAUSE_TYPE_NDIST 0x08
static bool clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums,
int type);
@@ -3246,6 +3247,9 @@ stats_type_matches(MVStatisticInfo *stat, int type)
if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
return true;
+ if ((type & MV_CLAUSE_TYPE_NDIST) && stat->ndist_built)
+ return true;
+
return false;
}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 2519249..3741b7a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -412,7 +412,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built)
+ if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built || mvstat->ndist_built)
{
info = makeNode(MVStatisticInfo);
@@ -423,11 +423,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
info->deps_enabled = mvstat->deps_enabled;
info->mcv_enabled = mvstat->mcv_enabled;
info->hist_enabled = mvstat->hist_enabled;
+ info->ndist_enabled = mvstat->ndist_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
info->mcv_built = mvstat->mcv_built;
info->hist_built = mvstat->hist_built;
+ info->ndist_built = mvstat->ndist_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 7d0a3a1..a84dd2b 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -132,6 +132,7 @@
#include "utils/fmgroids.h"
#include "utils/index_selfuncs.h"
#include "utils/lsyscache.h"
+#include "utils/mvstats.h"
#include "utils/nabstime.h"
#include "utils/pg_locale.h"
#include "utils/rel.h"
@@ -206,6 +207,7 @@ static Const *string_to_const(const char *str, Oid datatype);
static Const *string_to_bytea_const(const char *str, size_t str_len);
static List *add_predicate_to_quals(IndexOptInfo *index, List *indexQuals);
+static Oid find_ndistinct_coeff(PlannerInfo *root, RelOptInfo *rel, List *varinfos);
/*
* eqsel - Selectivity of "=" for any data types.
@@ -3422,12 +3424,26 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
* don't know by how much. We should never clamp to less than the
* largest ndistinct value for any of the Vars, though, since
* there will surely be at least that many groups.
+ *
+ * However we don't need to do this if we have ndistinct stats on
+ * the columns - in that case we can simply use the coefficient
+ * to get the (probably way more accurate) estimate.
+ *
+ * XXX Probably needs refactoring (don't like to mix with clamp
+ * and coeff at the same time).
*/
double clamp = rel->tuples;
+ double coeff = 1.0;
if (relvarcount > 1)
{
- clamp *= 0.1;
+ Oid oid = find_ndistinct_coeff(root, rel, varinfos);
+
+ if (oid != InvalidOid)
+ coeff = load_mv_ndistinct(oid);
+ else
+ clamp *= 0.1;
+
if (clamp < relmaxndistinct)
{
clamp = relmaxndistinct;
@@ -3436,6 +3452,13 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
clamp = rel->tuples;
}
}
+
+ /*
+ * Apply the ndistinct coefficient from the multivariate stats (we
+ * must do this before clamping the estimate in any way). Dividing
+ * the product of per-column ndistincts by the coefficient yields
+ * the combined ndistinct estimate.
+ */
+ reldistinct /= coeff;
+
if (reldistinct > clamp)
reldistinct = clamp;
@@ -7582,3 +7605,71 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
/* XXX what about pages_per_range? */
}
+
+/*
+ * Find applicable ndistinct statistics and compute the coefficient to
+ * correct the estimate (simply a product of per-column ndistincts).
+ *
+ * Currently we only look for a perfect match, i.e. a single ndistinct
+ * estimate exactly matching all the columns of the statistics.
+ */
+static Oid
+find_ndistinct_coeff(PlannerInfo *root, RelOptInfo *rel, List *varinfos)
+{
+ ListCell *lc;
+ Bitmapset *attnums = NULL;
+ VariableStatData vardata;
+
+ foreach(lc, varinfos)
+ {
+ GroupVarInfo *varinfo = (GroupVarInfo *) lfirst(lc);
+
+ if (varinfo->rel != rel)
+ continue;
+
+ /* FIXME handle general expressions, not just plain Vars */
+
+ /*
+ * examine the variable (or expression) so that we know which
+ * attribute we're dealing with - we need this for matching the
+ * ndistinct coefficient
+ *
+ * FIXME probably might remember this from estimate_num_groups
+ */
+ examine_variable(root, varinfo->var, 0, &vardata);
+
+ if (HeapTupleIsValid(vardata.statsTuple))
+ {
+ Form_pg_statistic stats
+ = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
+
+ attnums = bms_add_member(attnums, stats->staattnum);
+
+ ReleaseVariableStats(vardata);
+ }
+ }
+
+ /* look for a matching ndistinct statistics */
+ foreach (lc, rel->mvstatlist)
+ {
+ int i;
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ /* skip statistics without ndistinct coefficient built */
+ if (!info->ndist_built)
+ continue;
+
+ /* only exact matches for now (same set of columns) */
+ if (bms_num_members(attnums) != info->stakeys->dim1)
+ continue;
+
+ /* check that all the stats columns match the grouped columns */
+ for (i = 0; i < info->stakeys->dim1; i++)
+ if (!bms_is_member(info->stakeys->values[i], attnums))
+ break;
+
+ /* some column is not covered, try the next statistics */
+ if (i < info->stakeys->dim1)
+ continue;
+
+ return info->mvoid;
+ }
+
+ return InvalidOid;
+}
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 9dbb3b6..d4b88e9 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o histogram.o mcv.o
+OBJS = common.o dependencies.o histogram.o mcv.o mvdist.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.ndistinct b/src/backend/utils/mvstats/README.ndistinct
new file mode 100644
index 0000000..32d1624
--- /dev/null
+++ b/src/backend/utils/mvstats/README.ndistinct
@@ -0,0 +1,83 @@
+ndistinct coefficients
+======================
+
+Estimating number of distinct groups in a combination of columns is tricky,
+and the estimation error is often significant. By ndistinct coefficient we
+mean a ratio
+
+ q = ndistinct(a) * ndistinct(b) / ndistinct(a,b)
+
+where 'a' and 'b' are columns, ndistinct(a) is (an estimate of) the number of
+distinct values in column 'a', and ndistinct(a,b) is the same thing for the
+pair of columns.
+
+The meaning of the coefficient may be illustrated by answering the following
+question: Given a combination of columns (a,b), how many distinct values of 'b'
+matches a chosen value of 'a' on average?
+
+Let's assume we know ndistinct(a) and ndistinct(a,b). Then the answer to the
+question clearly is
+
+ ndistinct(a,b) / ndistinct(a)
+
+and by using 'q' we may rewrite this as
+
+ ndistinct(b) / q
+
+so 'q' may be considered as a correction factor of the ndistinct estimate given
+a condition on one of the columns.
+
+This may be generalized to a combination of 'n' columns
+
+ [ndistinct(c1) * ... * ndistinct(cn)] / ndistinct(c1, ..., cn)
+
+and the meaning is very similar, except that we need to use conditions on (n-1)
+of the columns.
+
+
+Selectivity estimation
+----------------------
+
+As explained in the previous paragraph, ndistinct coefficients may be used to
+estimate the cardinality of a column, given some a priori knowledge. Let's assume
+we need to estimate selectivity of a condition
+
+ (a=1) AND (b=2)
+
+which we can expand like this
+
+ P(a=1 & b=2) = P(a=1) * P(b=2 | a=1)
+
+Let's also assume the distributions are uniform, i.e. that
+
+ P(a=1) = 1/ndistinct(a)
+ P(b=2) = 1/ndistinct(b)
+ P(a=1 & b=2) = 1/ndistinct(a,b)
+
+ P(b=2 | a=1) = ndistinct(a) / ndistinct(a,b)
+
+which may be rewritten like
+
+ P(b=2 | a=1)
+ = ndistinct(a) / ndistinct(a,b)
+ = (1/ndistinct(b)) * [(ndistinct(a) * ndistinct(b)) / ndistinct(a,b)]
+ = (1/ndistinct(b)) * q
+
+and therefore
+
+ P(a=1 & b=2) = (1/ndistinct(a)) * (1/ndistinct(b)) * q
+
+This also illustrates 'q' as a correction coefficient.
+
+It also explains why we store the coefficient and not simply ndistinct(a,b).
+This way we can estimate the individual clauses and then correct the result by
+multiplying it with 'q' - we don't have to mess with ndistinct estimates at
+all.
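+
+A hypothetical example: with ndistinct(a) = ndistinct(b) = 100 and the two
+columns perfectly correlated (so that ndistinct(a,b) = 100), the coefficient is
+
+ q = (100 * 100) / 100 = 100
+
+and the corrected estimate is (1/100) * (1/100) * 100 = 1/100, instead of the
+1/10000 we'd get under the independence assumption.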
+
+Naturally, as the coefficient is derived from ndistinct(a,b), it may also be
+used to estimate GROUP BY clauses on the combination of columns, replacing the
+existing heuristics in estimate_num_groups().
+
+Note: Currently only the GROUP BY estimation is implemented. It's a bit unclear
+how to implement the clause estimation when there are other statistics (esp.
+MCV lists and/or functional dependencies) available.
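+
+A usage sketch (with a hypothetical table 't' and correlated columns a, b):
+
+ CREATE STATISTICS s ON t (a,b) WITH (ndistinct);
+ ANALYZE t;
+
+ -- estimate_num_groups() may now use the ndistinct coefficient
+ SELECT a, b, count(*) FROM t GROUP BY a, b;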
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index d404914..6d4b09b 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -20,6 +20,8 @@ Currently we only have two kinds of multivariate statistics
(c) multivariate histograms (README.histogram)
+ (d) ndistinct coefficients
+
Compatible clause types
-----------------------
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index ffb76f4..2be980d 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -32,7 +32,8 @@ static List* list_mv_stats(Oid relid);
* and serializes them back into the catalog (as bytea values).
*/
void
-build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+build_mv_stats(Relation onerel, double totalrows,
+ int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats)
{
ListCell *lc;
@@ -53,6 +54,7 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
MVHistogram histogram = NULL;
+ double ndist = -1;
int numrows_filtered = numrows;
VacAttrStats **stats = NULL;
@@ -92,6 +94,9 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (stat->ndist_enabled)
+ ndist = build_mv_ndistinct(totalrows, numrows, rows, attrs, stats);
+
/* build the MCV list */
if (stat->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
@@ -101,7 +106,7 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, mcvlist, histogram, attrs, stats);
+ update_mv_stats(stat->mvoid, deps, mcvlist, histogram, ndist, attrs, stats);
}
}
@@ -183,6 +188,8 @@ list_mv_stats(Oid relid)
info->mcv_built = stats->mcv_built;
info->hist_enabled = stats->hist_enabled;
info->hist_built = stats->hist_built;
+ info->ndist_enabled = stats->ndist_enabled;
+ info->ndist_built = stats->ndist_built;
result = lappend(result, info);
}
@@ -252,7 +259,7 @@ find_mv_attnums(Oid mvoid, Oid *relid)
void
update_mv_stats(Oid mvoid,
MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
- int2vector *attrs, VacAttrStats **stats)
+ double ndistcoeff, int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -292,26 +299,36 @@ update_mv_stats(Oid mvoid,
= PointerGetDatum(data);
}
+ if (ndistcoeff > 1.0)
+ {
+ nulls[Anum_pg_mv_statistic_standist -1] = false;
+ values[Anum_pg_mv_statistic_standist-1] = Float8GetDatum(ndistcoeff);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
replaces[Anum_pg_mv_statistic_stahist-1] = true;
+ replaces[Anum_pg_mv_statistic_standist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
nulls[Anum_pg_mv_statistic_hist_built-1] = false;
+ nulls[Anum_pg_mv_statistic_ndist_built-1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_ndist_built-1] = true;
replaces[Anum_pg_mv_statistic_hist_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
+ values[Anum_pg_mv_statistic_ndist_built-1] = BoolGetDatum(ndistcoeff > 1.0);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
diff --git a/src/backend/utils/mvstats/mvdist.c b/src/backend/utils/mvstats/mvdist.c
new file mode 100644
index 0000000..59b8358
--- /dev/null
+++ b/src/backend/utils/mvstats/mvdist.c
@@ -0,0 +1,171 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvdist.c
+ * POSTGRES multivariate distinct coefficients
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mvdist.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include <math.h>
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+static double estimate_ndistinct(double totalrows, int numrows, int d, int f1);
+
+/*
+ * Compute ndistinct coefficient for the combination of attributes. This
+ * computes the ndistinct estimate using the same estimator used in analyze.c
+ * and then computes the coefficient.
+ */
+double
+build_mv_ndistinct(double totalrows, int numrows, HeapTuple *rows,
+ int2vector *attrs, VacAttrStats **stats)
+{
+ int i, j;
+ int f1, cnt, d;
+ int nmultiple = 0, summultiple = 0;
+ int numattrs = attrs->dim1;
+ MultiSortSupport mss = multi_sort_init(numattrs);
+ double ndistcoeff;
+
+ /*
+ * It's possible to sort the sample rows directly, but this seemed
+ * somehow simpler / less error prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * numattrs);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ Assert(numattrs >= 2);
+
+ for (i = 0; i < numattrs; i++)
+ {
+ /* prepare the sort function for this dimension */
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* accumulate all the data into the array and sort it */
+ for (j = 0; j < numrows; j++)
+ {
+ items[j].values[i]
+ = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+ }
+ }
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /* count number of distinct combinations */
+
+ f1 = 0;
+ cnt = 1;
+ d = 1;
+ for (i = 1; i < numrows; i++)
+ {
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ {
+ if (cnt == 1)
+ f1 += 1;
+ else
+ {
+ nmultiple += 1;
+ summultiple += cnt;
+ }
+
+ d++;
+ cnt = 0;
+ }
+
+ cnt += 1;
+ }
+
+ if (cnt == 1)
+ f1 += 1;
+ else
+ {
+ nmultiple += 1;
+ summultiple += cnt;
+ }
+
+ ndistcoeff = 1 / estimate_ndistinct(totalrows, numrows, d, f1);
+
+ /*
+ * now count distinct values for each attribute and incrementally
+ * compute ndistinct(a,b) / (ndistinct(a) * ndistinct(b))
+ *
+ * FIXME Probably need to handle cases when one of the ndistinct
+ * estimates is negative, and also check that the combined
+ * ndistinct is greater than any of those partial values.
+ */
+ for (i = 0; i < numattrs; i++)
+ ndistcoeff *= stats[i]->stadistinct;
+
+ return ndistcoeff;
+}
+
+double
+load_mv_ndistinct(Oid mvoid)
+{
+ bool isnull = false;
+ Datum deps;
+
+ /* Fetch the pg_mv_statistic tuple for the given statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->ndist_enabled && mvstat->ndist_built);
+#endif
+
+ deps = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_standist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return DatumGetFloat8(deps);
+}
+
+/* The Duj1 estimator (already used in analyze.c). */
+static double
+estimate_ndistinct(double totalrows, int numrows, int d, int f1)
+{
+ double numer,
+ denom,
+ ndistinct;
+
+ numer = (double) numrows *(double) d;
+
+ denom = (double) (numrows - f1) +
+ (double) f1 * (double) numrows / totalrows;
+
+ ndistinct = numer / denom;
+
+ /* Clamp to sane range in case of roundoff error */
+ if (ndistinct < (double) d)
+ ndistinct = (double) d;
+
+ if (ndistinct > totalrows)
+ ndistinct = totalrows;
+
+ return floor(ndistinct + 0.5);
+}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index 37f473f..e46cc6b 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -40,6 +40,7 @@ CATALOG(pg_mv_statistic,3381)
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
bool hist_enabled; /* build histogram? */
+ bool ndist_enabled; /* build ndist coefficient? */
/* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
@@ -49,6 +50,7 @@ CATALOG(pg_mv_statistic,3381)
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
bool hist_built; /* histogram was built */
+ bool ndist_built; /* ndistinct coeff built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -57,6 +59,7 @@ CATALOG(pg_mv_statistic,3381)
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
bytea stahist; /* MV histogram (serialized) */
+ float8 standcoeff; /* ndistinct coefficient */
#endif
} FormData_pg_mv_statistic;
@@ -72,7 +75,7 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 15
+#define Natts_pg_mv_statistic 19
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
@@ -80,14 +83,17 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
#define Anum_pg_mv_statistic_deps_enabled 5
#define Anum_pg_mv_statistic_mcv_enabled 6
#define Anum_pg_mv_statistic_hist_enabled 7
-#define Anum_pg_mv_statistic_mcv_max_items 8
-#define Anum_pg_mv_statistic_hist_max_buckets 9
-#define Anum_pg_mv_statistic_deps_built 10
-#define Anum_pg_mv_statistic_mcv_built 11
-#define Anum_pg_mv_statistic_hist_built 12
-#define Anum_pg_mv_statistic_stakeys 13
-#define Anum_pg_mv_statistic_stadeps 14
-#define Anum_pg_mv_statistic_stamcv 15
-#define Anum_pg_mv_statistic_stahist 16
+#define Anum_pg_mv_statistic_ndist_enabled 8
+#define Anum_pg_mv_statistic_mcv_max_items 9
+#define Anum_pg_mv_statistic_hist_max_buckets 10
+#define Anum_pg_mv_statistic_deps_built 11
+#define Anum_pg_mv_statistic_mcv_built 12
+#define Anum_pg_mv_statistic_hist_built 13
+#define Anum_pg_mv_statistic_ndist_built 14
+#define Anum_pg_mv_statistic_stakeys 15
+#define Anum_pg_mv_statistic_stadeps 16
+#define Anum_pg_mv_statistic_stamcv 17
+#define Anum_pg_mv_statistic_stahist 18
+#define Anum_pg_mv_statistic_standist 19
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 8c50bfb..1923f2b 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -655,11 +655,13 @@ typedef struct MVStatisticInfo
bool deps_enabled; /* functional dependencies enabled */
bool mcv_enabled; /* MCV list enabled */
bool hist_enabled; /* histogram enabled */
+ bool ndist_enabled; /* ndistinct coefficient enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
bool mcv_built; /* MCV list built */
bool hist_built; /* histogram built */
+ bool ndist_built; /* ndistinct coefficient built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 35b2f8e..fb2c5d8 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -225,6 +225,7 @@ typedef MVSerializedHistogramData *MVSerializedHistogram;
MVDependencies load_mv_dependencies(Oid mvoid);
MCVList load_mv_mcvlist(Oid mvoid);
MVSerializedHistogram load_mv_histogram(Oid mvoid);
+double load_mv_ndistinct(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
@@ -266,11 +267,17 @@ MVHistogram
build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int numrows_total);
-void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+double
+build_mv_ndistinct(double totalrows, int numrows, HeapTuple *rows,
+ int2vector *attrs, VacAttrStats **stats);
+
+void build_mv_stats(Relation onerel, double totalrows,
+ int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
void update_mv_stats(Oid relid, MVDependencies dependencies,
MCVList mcvlist, MVHistogram histogram,
+ double ndistcoeff,
int2vector *attrs, VacAttrStats **stats);
#ifdef DEBUG_MVHIST
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 1a1a4ca..0ad935e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1377,7 +1377,8 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
length(s.stamcv) AS mcvbytes,
pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo,
length(s.stahist) AS histbytes,
- pg_mv_stats_histogram_info(s.stahist) AS histinfo
+ pg_mv_stats_histogram_info(s.stahist) AS histinfo,
+ s.standcoeff AS ndcoeff
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
--
2.1.0
0008-change-how-we-apply-selectivity-to-number-of-groups-.patch
From 3e6238b1651b37c0fc3f1dbad6be3c5bdbae5be8 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Tue, 26 Jan 2016 18:14:33 +0100
Subject: [PATCH 8/9] change how we apply selectivity to number of groups
estimate
Instead of simply multiplying the ndistinct estimate with selectivity,
we instead use the formula for the expected number of distinct values
observed in 'k' rows when there are 'd' distinct values in the bin
d * (1 - ((d - 1) / d)^k)
This is 'with replacement', which seems appropriate for this use, and it
mostly assumes uniform distribution of the distinct values. So if the
distribution is not uniform (e.g. there are very frequent groups) this
may be less accurate than the current algorithm in some cases, giving
over-estimates. But that's probably better than OOM.
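
For illustration (hypothetical numbers): with d = 1000 distinct values in
the bin and k = 100 rows, the formula gives

    1000 * (1 - (999/1000)^100) ~= 95

expected distinct values, whereas simple multiplication by the selectivity
(say, 100 rows out of 1M tuples) would estimate 1000 * 0.0001 = 0.1.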
---
src/backend/utils/adt/selfuncs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index a84dd2b..ce3ad19 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3465,7 +3465,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
/*
* Multiply by restriction selectivity.
*/
- reldistinct *= rel->rows / rel->tuples;
+ reldistinct = reldistinct * (1 - powl((reldistinct - 1) / reldistinct, rel->rows));
/*
* Update estimate of total distinct groups.
--
2.1.0
0009-fixup-of-regression-tests-plans-changes-by-group-by-.patch
From 82510eec9a98e24bc86deb313f3c031d54420996 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Sun, 28 Feb 2016 21:16:40 +0100
Subject: [PATCH 9/9] fixup of regression tests (plans changes by group by
estimation)
---
src/test/regress/expected/join.out | 18 ++++++++++--------
src/test/regress/expected/subselect.out | 25 +++++++++++--------------
src/test/regress/expected/union.out | 16 ++++++++--------
3 files changed, 29 insertions(+), 30 deletions(-)
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index cafbc5e..151402d 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3965,18 +3965,20 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
explain (costs off)
select d.* from d left join (select distinct * from b) s
on d.a = s.id;
- QUERY PLAN
---------------------------------------
+ QUERY PLAN
+---------------------------------------------
Merge Right Join
- Merge Cond: (b.id = d.a)
- -> Unique
- -> Sort
- Sort Key: b.id, b.c_id
- -> Seq Scan on b
+ Merge Cond: (s.id = d.a)
+ -> Sort
+ Sort Key: s.id
+ -> Subquery Scan on s
+ -> HashAggregate
+ Group Key: b.id, b.c_id
+ -> Seq Scan on b
-> Sort
Sort Key: d.a
-> Seq Scan on d
-(9 rows)
+(11 rows)
-- check join removal works when uniqueness of the join condition is enforced
-- by a UNION
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index de64ca7..0fc93d9 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -807,27 +807,24 @@ select * from int4_tbl where
explain (verbose, costs off)
select * from int4_tbl o where (f1, f1) in
(select f1, generate_series(1,2) / 10 g from int4_tbl i group by f1);
- QUERY PLAN
-----------------------------------------------------------------------
- Hash Join
+ QUERY PLAN
+----------------------------------------------------------------
+ Hash Semi Join
Output: o.f1
Hash Cond: (o.f1 = "ANY_subquery".f1)
-> Seq Scan on public.int4_tbl o
Output: o.f1
-> Hash
Output: "ANY_subquery".f1, "ANY_subquery".g
- -> HashAggregate
+ -> Subquery Scan on "ANY_subquery"
Output: "ANY_subquery".f1, "ANY_subquery".g
- Group Key: "ANY_subquery".f1, "ANY_subquery".g
- -> Subquery Scan on "ANY_subquery"
- Output: "ANY_subquery".f1, "ANY_subquery".g
- Filter: ("ANY_subquery".f1 = "ANY_subquery".g)
- -> HashAggregate
- Output: i.f1, (generate_series(1, 2) / 10)
- Group Key: i.f1
- -> Seq Scan on public.int4_tbl i
- Output: i.f1
-(18 rows)
+ Filter: ("ANY_subquery".f1 = "ANY_subquery".g)
+ -> HashAggregate
+ Output: i.f1, (generate_series(1, 2) / 10)
+ Group Key: i.f1
+ -> Seq Scan on public.int4_tbl i
+ Output: i.f1
+(15 rows)
select * from int4_tbl o where (f1, f1) in
(select f1, generate_series(1,2) / 10 g from int4_tbl i group by f1);
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 016571b..f2e297e 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -263,16 +263,16 @@ ORDER BY 1;
SELECT q2 FROM int8_tbl INTERSECT SELECT q1 FROM int8_tbl;
q2
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
SELECT q2 FROM int8_tbl INTERSECT ALL SELECT q1 FROM int8_tbl;
q2
------------------
+ 123
4567890123456789
4567890123456789
- 123
(3 rows)
SELECT q2 FROM int8_tbl EXCEPT SELECT q1 FROM int8_tbl ORDER BY 1;
@@ -305,16 +305,16 @@ SELECT q1 FROM int8_tbl EXCEPT SELECT q2 FROM int8_tbl;
SELECT q1 FROM int8_tbl EXCEPT ALL SELECT q2 FROM int8_tbl;
q1
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
SELECT q1 FROM int8_tbl EXCEPT ALL SELECT DISTINCT q2 FROM int8_tbl;
q1
------------------
+ 123
4567890123456789
4567890123456789
- 123
(3 rows)
SELECT q1 FROM int8_tbl EXCEPT ALL SELECT q1 FROM int8_tbl FOR NO KEY UPDATE;
@@ -343,8 +343,8 @@ SELECT f1 FROM float8_tbl EXCEPT SELECT f1 FROM int4_tbl ORDER BY 1;
SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl;
q1
-------------------
- 4567890123456789
123
+ 4567890123456789
456
4567890123456789
123
@@ -355,15 +355,15 @@ SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FR
SELECT q1 FROM int8_tbl INTERSECT (((SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl)));
q1
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
(((SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl))) UNION ALL SELECT q2 FROM int8_tbl;
q1
-------------------
- 4567890123456789
123
+ 4567890123456789
456
4567890123456789
123
@@ -419,8 +419,8 @@ HINT: There is a column named "q2" in table "*SELECT* 2", but it cannot be refe
SELECT q1 FROM int8_tbl EXCEPT (((SELECT q2 FROM int8_tbl ORDER BY q2 LIMIT 1)));
q1
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
--
--
2.1.0
On Wed, Mar 9, 2016 at 7:02 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Hi,
thanks for the feedback. Attached is v14 of the patch series, fixing
most of the points you've raised.
Hi Tomas,
Applied to aa09cd242fa7e3a694a31f, I still get the seg faults in make
check if I configure without --enable-cassert.
With --enable-cassert, it passes the regression test.
I got the core file, configured and compiled with:
CFLAGS="-fno-omit-frame-pointer" --enable-debug
The first core dump is on this statement:
-- check explain (expect bitmap index scan, not plain index scan)
INSERT INTO functional_dependencies
SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
bt
#0 0x00000000006e1160 in cost_qual_eval (cost=0x2494418,
quals=0x2495550, root=0x2541b88) at costsize.c:3181
#1 0x00000000006e1ee5 in set_baserel_size_estimates (root=0x2541b88,
rel=0x2494300) at costsize.c:3754
#2 0x00000000006d37e8 in set_plain_rel_size (root=0x2541b88,
rel=0x2494300, rte=0x247e660) at allpaths.c:480
#3 0x00000000006d353d in set_rel_size (root=0x2541b88, rel=0x2494300,
rti=1, rte=0x247e660) at allpaths.c:350
#4 0x00000000006d338f in set_base_rel_sizes (root=0x2541b88) at allpaths.c:270
#5 0x00000000006d3233 in make_one_rel (root=0x2541b88,
joinlist=0x2494628) at allpaths.c:169
#6 0x000000000070012e in query_planner (root=0x2541b88,
tlist=0x2541e58, qp_callback=0x7048d4 <standard_qp_callback>,
qp_extra=0x7ffefa6474e0)
at planmain.c:246
#7 0x0000000000702a33 in grouping_planner (root=0x2541b88,
inheritance_update=0 '\000', tuple_fraction=0) at planner.c:1647
#8 0x0000000000701310 in subquery_planner (glob=0x2541af8,
parse=0x246a838, parent_root=0x0, hasRecursion=0 '\000',
tuple_fraction=0) at planner.c:740
#9 0x000000000070055b in standard_planner (parse=0x246a838,
cursorOptions=256, boundParams=0x0) at planner.c:290
#10 0x000000000070023f in planner (parse=0x246a838, cursorOptions=256,
boundParams=0x0) at planner.c:160
#11 0x00000000007b8bf9 in pg_plan_query (querytree=0x246a838,
cursorOptions=256, boundParams=0x0) at postgres.c:798
#12 0x00000000005d1967 in ExplainOneQuery (query=0x246a838, into=0x0,
es=0x246a778,
queryString=0x2443d80 "EXPLAIN (COSTS off)\n SELECT * FROM
mcv_list WHERE a = 10 AND b = 5;", params=0x0) at explain.c:350
#13 0x00000000005d16a3 in ExplainQuery (stmt=0x2444f90,
queryString=0x2443d80 "EXPLAIN (COSTS off)\n SELECT * FROM mcv_list
WHERE a = 10 AND b = 5;",
params=0x0, dest=0x246a6e8) at explain.c:244
#14 0x00000000007c0afb in standard_ProcessUtility (parsetree=0x2444f90,
queryString=0x2443d80 "EXPLAIN (COSTS off)\n SELECT * FROM
mcv_list WHERE a = 10 AND b = 5;", context=PROCESS_UTILITY_TOPLEVEL,
params=0x0,
dest=0x246a6e8, completionTag=0x7ffefa647b60 "") at utility.c:659
#15 0x00000000007c0299 in ProcessUtility (parsetree=0x2444f90,
queryString=0x2443d80 "EXPLAIN (COSTS off)\n SELECT * FROM mcv_list
WHERE a = 10 AND b = 5;",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x246a6e8,
completionTag=0x7ffefa647b60 "") at utility.c:335
#16 0x00000000007bf47b in PortalRunUtility (portal=0x23ed510,
utilityStmt=0x2444f90, isTopLevel=1 '\001', dest=0x246a6e8,
completionTag=0x7ffefa647b60 "")
at pquery.c:1183
#17 0x00000000007bf1ce in FillPortalStore (portal=0x23ed510,
isTopLevel=1 '\001') at pquery.c:1057
#18 0x00000000007beb19 in PortalRun (portal=0x23ed510,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x253f6c0,
altdest=0x253f6c0,
completionTag=0x7ffefa647d40 "") at pquery.c:781
#19 0x00000000007b90ae in exec_simple_query (query_string=0x2443d80
"EXPLAIN (COSTS off)\n SELECT * FROM mcv_list WHERE a = 10 AND b =
5;")
at postgres.c:1094
#20 0x00000000007bcfac in PostgresMain (argc=1, argv=0x23d5070,
dbname=0x23d4e48 "regression", username=0x23d4e30 "jjanes") at
postgres.c:4021
#21 0x0000000000745a62 in BackendRun (port=0x23f4110) at postmaster.c:4258
#22 0x00000000007451d6 in BackendStartup (port=0x23f4110) at postmaster.c:3932
#23 0x0000000000741ab7 in ServerLoop () at postmaster.c:1690
#24 0x00000000007411c0 in PostmasterMain (argc=8, argv=0x23d3f20) at
postmaster.c:1298
#25 0x0000000000690026 in main (argc=8, argv=0x23d3f20) at main.c:223
Cheers,
Jeff
Hi,
On Wed, 2016-03-09 at 08:45 -0800, Jeff Janes wrote:
On Wed, Mar 9, 2016 at 7:02 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Hi,
thanks for the feedback. Attached is v14 of the patch series, fixing
most of the points you've raised.
Hi Tomas,
Applied to aa09cd242fa7e3a694a31f, I still get the seg faults in make
check if I configure without --enable-cassert.
Ah, after disabling asserts I can reproduce it too. And the reason why
it fails is quite simple - clauselist_selectivity modifies the original
list of clauses, which then confuses cost_qual_eval.
Can you check whether the attached patch fixes the issue? I'll need to rework a
bit more of the code, but let's see if this fixes the issue on your
machine too.
With --enable-cassert, it passes the regression test.
I wonder how it can work with casserts and fail without them. That's
exactly the opposite of what I'd expect ...
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
mvstats-segfault-fix.patch (text/x-patch)
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 2540da9..ddfdc3b 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -279,6 +279,10 @@ clauselist_selectivity(PlannerInfo *root,
List *solution = choose_mv_statistics(root, relid, stats,
clauses, conditions);
+ /* FIXME we must not scribble over the original list */
+ if (solution)
+ clauses = list_copy(clauses);
+
/*
* We have a good solution, which is merely a list of statistics that
* we need to apply. We'll apply the statistics one by one (in the order
On Wed, Mar 9, 2016 at 9:21 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Hi,
On Wed, 2016-03-09 at 08:45 -0800, Jeff Janes wrote:
On Wed, Mar 9, 2016 at 7:02 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Hi,
thanks for the feedback. Attached is v14 of the patch series, fixing
most of the points you've raised.
Hi Tomas,
Applied to aa09cd242fa7e3a694a31f, I still get the seg faults in make
check if I configure without --enable-cassert.
Ah, after disabling asserts I can reproduce it too. And the reason why
it fails is quite simple - clauselist_selectivity modifies the original
list of clauses, which then confuses cost_qual_eval.
Can you check whether the attached patch fixes the issue? I'll need to rework a
bit more of the code, but let's see if this fixes the issue on your
machine too.
Yes, that fixes it.
With --enable-cassert, it passes the regression test.
I wonder how it can work with casserts and fail without them. That's
exactly the opposite of what I'd expect ...
I too was surprised by that. Maybe cassert makes a copy of some data
structure which is used in-place without cassert?
Thanks,
Jeff
On Wed, 2016-03-09 at 18:21 +0100, Tomas Vondra wrote:
Hi,
On Wed, 2016-03-09 at 08:45 -0800, Jeff Janes wrote:
On Wed, Mar 9, 2016 at 7:02 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Hi,
thanks for the feedback. Attached is v14 of the patch series, fixing
most of the points you've raised.
Hi Tomas,
Applied to aa09cd242fa7e3a694a31f, I still get the seg faults in make
check if I configure without --enable-cassert.
Ah, after disabling asserts I can reproduce it too. And the reason why
it fails is quite simple - clauselist_selectivity modifies the original
list of clauses, which then confuses cost_qual_eval.
More precisely, it gets confused because the first clause in the list
gets deleted, but cost_qual_eval never learns about that and follows a
stale pointer to the next cell - hence the segfault.
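For illustration, a toy standalone sketch of that failure mode (plain C
with a hand-rolled list, not the actual PostgreSQL List API):

#include <stdio.h>
#include <stdlib.h>

typedef struct Cell
{
	int		value;
	struct Cell *next;
} Cell;

static Cell *
cons(int value, Cell *next)
{
	Cell   *c = malloc(sizeof(Cell));

	c->value = value;
	c->next = next;
	return c;
}

/* stands in for clauselist_selectivity scribbling on the shared list */
static Cell *
drop_head(Cell *list)
{
	Cell   *rest = list->next;

	free(list);		/* the caller's copy of the pointer now dangles */
	return rest;
}

int
main(void)
{
	Cell   *clauses = cons(1, cons(2, cons(3, NULL)));
	Cell   *head = clauses;		/* cost_qual_eval's view of the list */
	Cell   *c;

	clauses = drop_head(clauses);

	/* iterating from the stale head reads freed memory - it may crash,
	 * or it may appear to work depending on what happens to occupy the
	 * freed chunk, which is how a build option can mask or expose it */
	for (c = head; c != NULL; c = c->next)
		printf("%d\n", c->value);

	return 0;
}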
Can you check whether the attached patch fixes the issue? I'll need to rework a
bit more of the code, but let's see if this fixes the issue on your
machine too.
With --enable-cassert, it passes the regression test.
I wonder how it can work with casserts and fail without them. That's
exactly the opposite of what I'd expect ...
FWIW it seems to be somehow related to this assert in clausesel.c:
Assert(count_mv_attnums(list_union(stat_clauses, stat_conditions),
relid, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2);
With the assert in place, the code passes without a failure. After
removing the assert (commenting it out), or even just changing it to
Assert(count_mv_attnums(stat_clauses, relid,
MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST)
+ count_mv_attnums(stat_conditions, relid,
MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2);
i.e. removing the list_union, it fails as expected.
The only thing that I can think of is that list_union happens to place
the right stuff at the right position in memory - pure luck.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 9, 2016 at 9:21 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Hi,
On Wed, 2016-03-09 at 08:45 -0800, Jeff Janes wrote:
On Wed, Mar 9, 2016 at 7:02 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Hi,
thanks for the feedback. Attached is v14 of the patch series, fixing
most of the points you've raised.
Hi Tomas,
Applied to aa09cd242fa7e3a694a31f, I still get the seg faults in make
check if I configure without --enable-cassert.
Ah, after disabling asserts I can reproduce it too. And the reason why
it fails is quite simple - clauselist_selectivity modifies the original
list of clauses, which then confuses cost_qual_eval.
Can you check whether the attached patch fixes the issue? I'll need to rework a
bit more of the code, but let's see if this fixes the issue on your
machine too.
That patch on top of v14 did fix the original problem. But I got
another segfault:
jjanes=# create table foo as select x, floor(x/(10000000/500))::int as
y from generate_series(1,10000000) f(x);
jjanes=# create index on foo (x,y);
jjanes=# create index on foo (y,x);
jjanes=# create statistics jjj on foo (x,y) with (dependencies,histogram);
jjanes=# analyze ;
server closed the connection unexpectedly
#0 multi_sort_add_dimension (mss=mss@entry=0x7f45dafc7c88,
sortdim=sortdim@entry=0, dim=dim@entry=0,
vacattrstats=vacattrstats@entry=0x16f0dd0) at common.c:436
#1 0x00000000007d022a in update_bucket_ndistinct (attrs=0x166fdf8,
stats=0x16f0dd0, bucket=<optimized out>) at histogram.c:1384
#2 0x00000000007d09aa in create_initial_mv_bucket (stats=0x16f0dd0,
attrs=0x166fdf8, rows=0x17cda20, numrows=30000) at histogram.c:880
#3 build_mv_histogram (numrows=30000, rows=rows@entry=0x170ecf0,
attrs=attrs@entry=0x166fdf8, stats=stats@entry=0x16f0dd0,
numrows_total=numrows_total@entry=30000)
at histogram.c:156
#4 0x00000000007ced19 in build_mv_stats
(onerel=onerel@entry=0x7f45e797d040, totalrows=9999985,
numrows=numrows@entry=30000, rows=rows@entry=0x170ecf0,
natts=natts@entry=2,
vacattrstats=vacattrstats@entry=0x166efa0) at common.c:106
#5 0x000000000055ff6b in do_analyze_rel
(onerel=onerel@entry=0x7f45e797d040, options=options@entry=2,
va_cols=va_cols@entry=0x0, acquirefunc=<optimized out>,
relpages=44248,
inh=inh@entry=0 '\000', in_outer_xact=in_outer_xact@entry=0
'\000', elevel=elevel@entry=13, params=0x7ffcbe382a30) at
analyze.c:585
#6 0x0000000000560ced in analyze_rel (relid=relid@entry=16441,
relation=relation@entry=0x16bc9d0, options=options@entry=2,
params=params@entry=0x7ffcbe382a30,
va_cols=va_cols@entry=0x0, in_outer_xact=<optimized out>,
bstrategy=0x16640f0) at analyze.c:262
#7 0x00000000005b70fd in vacuum (options=2, relation=0x16bc9d0,
relid=relid@entry=0, params=params@entry=0x7ffcbe382a30, va_cols=0x0,
bstrategy=<optimized out>,
bstrategy@entry=0x0, isTopLevel=isTopLevel@entry=1 '\001') at vacuum.c:313
#8 0x00000000005b748e in ExecVacuum (vacstmt=vacstmt@entry=0x16bca20,
isTopLevel=isTopLevel@entry=1 '\001') at vacuum.c:121
#9 0x00000000006c90f3 in standard_ProcessUtility
(parsetree=0x16bca20, queryString=0x16bbfc0 "analyze foo ;",
context=<optimized out>, params=0x0, dest=0x16bcd60,
completionTag=0x7ffcbe382fa0 "") at utility.c:654
#10 0x00007f45e413b1d1 in pgss_ProcessUtility (parsetree=0x16bca20,
queryString=0x16bbfc0 "analyze foo ;",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x16bcd60,
completionTag=0x7ffcbe382fa0 "") at pg_stat_statements.c:986
#11 0x00000000006c6841 in PortalRunUtility (portal=0x16f7700,
utilityStmt=0x16bca20, isTopLevel=<optimized out>, dest=0x16bcd60,
completionTag=0x7ffcbe382fa0 "") at pquery.c:1175
#12 0x00000000006c73c5 in PortalRunMulti
(portal=portal@entry=0x16f7700, isTopLevel=isTopLevel@entry=1 '\001',
dest=dest@entry=0x16bcd60, altdest=altdest@entry=0x16bcd60,
completionTag=completionTag@entry=0x7ffcbe382fa0 "") at pquery.c:1306
#13 0x00000000006c7dd9 in PortalRun (portal=portal@entry=0x16f7700,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1
'\001', dest=dest@entry=0x16bcd60,
altdest=altdest@entry=0x16bcd60,
completionTag=completionTag@entry=0x7ffcbe382fa0 "") at pquery.c:813
#14 0x00000000006c5c98 in exec_simple_query (query_string=0x16bbfc0
"analyze foo ;") at postgres.c:1094
#15 PostgresMain (argc=<optimized out>, argv=argv@entry=0x164baf8,
dbname=0x164b9a8 "jjanes", username=<optimized out>) at
postgres.c:4021
#16 0x000000000047cb1e in BackendRun (port=0x1669d40) at postmaster.c:4258
#17 BackendStartup (port=0x1669d40) at postmaster.c:3932
#18 ServerLoop () at postmaster.c:1690
#19 0x000000000066ff27 in PostmasterMain (argc=argc@entry=1,
argv=argv@entry=0x164aa10) at postmaster.c:1298
#20 0x000000000047d35e in main (argc=1, argv=0x164aa10) at main.c:228
Cheers,
Jeff
On Sat, 2016-03-12 at 23:30 -0800, Jeff Janes wrote:
On Wed, Mar 9, 2016 at 9:21 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Hi,
On Wed, 2016-03-09 at 08:45 -0800, Jeff Janes wrote:
On Wed, Mar 9, 2016 at 7:02 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Hi,
thanks for the feedback. Attached is v14 of the patch series, fixing
most of the points you've raised.
Hi Tomas,
Applied to aa09cd242fa7e3a694a31f, I still get the seg faults in make
check if I configure without --enable-cassert.
Ah, after disabling asserts I can reproduce it too. And the reason why
it fails is quite simple - clauselist_selectivity modifies the original
list of clauses, which then confuses cost_qual_eval.
Can you check whether the attached patch fixes the issue? I'll need to rework a
bit more of the code, but let's see if this fixes the issue on your
machine too.
That patch on top of v14 did fix the original problem. But I got
another segfault:
Oh, yeah. There was an extra pfree().
Attached is v15 of the patch series, fixing this and also doing quite a
few additional improvements:
* added some basic examples into the SGML documentation
* addressing the objectaddress omissions, as pointed out by Alvaro
* support for ALTER STATISTICS ... OWNER TO / RENAME / SET SCHEMA
* significant refactoring of MCV and histogram code, particularly
serialization, deserialization and building
* reworking the functional dependencies to support more complex
dependencies, with multiple columns as 'conditions'
* the reduction using functional dependencies is also significantly
simplified (I decided to get rid of computing the transitive closure
for now - it got too complex after the multi-condition dependencies,
so I'll leave that for the future)
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0008-change-how-we-apply-selectivity-to-number-of-groups-.patch (text/x-patch)
From 494a31e1ed7976e0f965a32e81c769e1c3dfad66 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Tue, 26 Jan 2016 18:14:33 +0100
Subject: [PATCH 8/9] change how we apply selectivity to number of groups
estimate
Instead of simply multiplying the ndistinct estimate by the selectivity,
we use the formula for the expected number of distinct values
observed in 'k' rows when there are 'd' distinct values in the bin
d * (1 - ((d - 1) / d)^k)
This is 'with replacement', which seems appropriate for this use, and it
assumes a roughly uniform distribution of the distinct values. So if the
distribution is not uniform (e.g. there are very frequent groups) this
may be less accurate than the current algorithm in some cases, giving
over-estimates. But that's probably better than the OOM risk that
under-estimates cause (e.g. in hash aggregates).
---
src/backend/utils/adt/selfuncs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index f8d39aa..6eceedf 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3466,7 +3466,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
/*
* Multiply by restriction selectivity.
*/
- reldistinct *= rel->rows / rel->tuples;
+ reldistinct = reldistinct * (1 - powl((reldistinct - 1) / reldistinct, rel->rows));
/*
* Update estimate of total distinct groups.
--
2.5.0
0007-multivariate-ndistinct-coefficients.patch (text/x-patch)
From 1b905c77e851d34229da72c2a84107fa0925f54a Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Wed, 23 Dec 2015 02:07:58 +0100
Subject: [PATCH 7/9] multivariate ndistinct coefficients
---
doc/src/sgml/ref/create_statistics.sgml | 9 ++
src/backend/catalog/system_views.sql | 3 +-
src/backend/commands/analyze.c | 2 +-
src/backend/commands/statscmds.c | 11 +-
src/backend/optimizer/path/clausesel.c | 4 +
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/adt/selfuncs.c | 93 +++++++++++++++-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/README.ndistinct | 83 ++++++++++++++
src/backend/utils/mvstats/README.stats | 2 +
src/backend/utils/mvstats/common.c | 23 +++-
src/backend/utils/mvstats/mvdist.c | 171 +++++++++++++++++++++++++++++
src/include/catalog/pg_mv_statistic.h | 26 +++--
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 9 +-
src/test/regress/expected/rules.out | 3 +-
16 files changed, 424 insertions(+), 23 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.ndistinct
create mode 100644 src/backend/utils/mvstats/mvdist.c
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index f7336fd..80e472f 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -168,6 +168,15 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>ndistinct</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables ndistinct coefficients for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect2>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b151db1..8d2b435 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -169,7 +169,8 @@ CREATE VIEW pg_mv_stats AS
length(S.stamcv) AS mcvbytes,
pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo,
length(S.stahist) AS histbytes,
- pg_mv_stats_histogram_info(S.stahist) AS histinfo
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo,
+ standcoeff AS ndcoeff
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 9087532..c29f1be 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -582,7 +582,7 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
}
/* Build multivariate stats (if there are any). */
- build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
+ build_mv_stats(onerel, totalrows, numrows, rows, attr_cnt, vacattrstats);
}
/*
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index e0b085f..a7c569d 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -72,7 +72,8 @@ CreateStatistics(CreateStatsStmt *stmt)
/* by default build nothing */
bool build_dependencies = false,
build_mcv = false,
- build_histogram = false;
+ build_histogram = false,
+ build_ndistinct = false;
int32 max_buckets = -1,
max_mcv_items = -1;
@@ -155,6 +156,8 @@ CreateStatistics(CreateStatsStmt *stmt)
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "ndistinct") == 0)
+ build_ndistinct = defGetBoolean(opt);
else if (strcmp(opt->defname, "mcv") == 0)
build_mcv = defGetBoolean(opt);
else if (strcmp(opt->defname, "max_mcv_items") == 0)
@@ -209,10 +212,10 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv || build_histogram))
+ if (! (build_dependencies || build_mcv || build_histogram || build_ndistinct))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram, ndistinct) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -246,6 +249,7 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
+ values[Anum_pg_mv_statistic_ndist_enabled-1] = BoolGetDatum(build_ndistinct);
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
@@ -253,6 +257,7 @@ CreateStatistics(CreateStatsStmt *stmt)
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
nulls[Anum_pg_mv_statistic_stahist -1] = true;
+ nulls[Anum_pg_mv_statistic_standist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index e06fd99..255d275 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -59,6 +59,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
#define MV_CLAUSE_TYPE_HIST 0x04
+#define MV_CLAUSE_TYPE_NDIST 0x08
static bool clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums,
int type);
@@ -2860,6 +2861,9 @@ stats_type_matches(MVStatisticInfo *stat, int type)
if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
return true;
+ if ((type & MV_CLAUSE_TYPE_NDIST) && stat->ndist_built)
+ return true;
+
return false;
}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 2519249..3741b7a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -412,7 +412,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built)
+ if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built || mvstat->ndist_built)
{
info = makeNode(MVStatisticInfo);
@@ -423,11 +423,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
info->deps_enabled = mvstat->deps_enabled;
info->mcv_enabled = mvstat->mcv_enabled;
info->hist_enabled = mvstat->hist_enabled;
+ info->ndist_enabled = mvstat->ndist_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
info->mcv_built = mvstat->mcv_built;
info->hist_built = mvstat->hist_built;
+ info->ndist_built = mvstat->ndist_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 805d633..f8d39aa 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -132,6 +132,7 @@
#include "utils/fmgroids.h"
#include "utils/index_selfuncs.h"
#include "utils/lsyscache.h"
+#include "utils/mvstats.h"
#include "utils/nabstime.h"
#include "utils/pg_locale.h"
#include "utils/rel.h"
@@ -206,6 +207,7 @@ static Const *string_to_const(const char *str, Oid datatype);
static Const *string_to_bytea_const(const char *str, size_t str_len);
static List *add_predicate_to_quals(IndexOptInfo *index, List *indexQuals);
+static Oid find_ndistinct_coeff(PlannerInfo *root, RelOptInfo *rel, List *varinfos);
/*
* eqsel - Selectivity of "=" for any data types.
@@ -3423,12 +3425,26 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
* don't know by how much. We should never clamp to less than the
* largest ndistinct value for any of the Vars, though, since
* there will surely be at least that many groups.
+ *
+ * However we don't need to do this if we have ndistinct stats on
+ * the columns - in that case we can simply use the coefficient
+ * to get the (probably way more accurate) estimate.
+ *
+ * XXX Probably needs refactoring (don't like mixing the clamp
+ * and the coefficient at the same time).
*/
double clamp = rel->tuples;
+ double coeff = 1.0;
if (relvarcount > 1)
{
- clamp *= 0.1;
+ Oid oid = find_ndistinct_coeff(root, rel, varinfos);
+
+ if (oid != InvalidOid)
+ coeff = load_mv_ndistinct(oid);
+ else
+ clamp *= 0.1;
+
if (clamp < relmaxndistinct)
{
clamp = relmaxndistinct;
@@ -3437,6 +3453,13 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
clamp = rel->tuples;
}
}
+
+ /*
+ * Apply ndistinct coefficient from multivar stats (we must do this
+ * before clamping the estimate in any way).
+ */
+ reldistinct /= coeff;
+
if (reldistinct > clamp)
reldistinct = clamp;
@@ -7583,3 +7606,71 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
/* XXX what about pages_per_range? */
}
+
+/*
+ * Find ndistinct statistics applicable to the grouping columns and return
+ * its OID (the coefficient itself is loaded by the caller).
+ *
+ * Currently we only look for a perfect match, i.e. a single ndistinct
+ * statistics covering exactly the grouping columns.
+ */
+static Oid
+find_ndistinct_coeff(PlannerInfo *root, RelOptInfo *rel, List *varinfos)
+{
+ ListCell *lc;
+ Bitmapset *attnums = NULL;
+ VariableStatData vardata;
+
+ foreach(lc, varinfos)
+ {
+ GroupVarInfo *varinfo = (GroupVarInfo *) lfirst(lc);
+
+ if (varinfo->rel != rel)
+ continue;
+
+ /* FIXME handle general expressions, not only plain Vars */
+
+ /*
+ * examine the variable (or expression) so that we know which
+ * attribute we're dealing with - we need this for matching the
+ * ndistinct coefficient
+ *
+ * FIXME we could probably remember this from estimate_num_groups
+ */
+ examine_variable(root, varinfo->var, 0, &vardata);
+
+ if (HeapTupleIsValid(vardata.statsTuple))
+ {
+ Form_pg_statistic stats
+ = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
+
+ attnums = bms_add_member(attnums, stats->staattnum);
+
+ ReleaseVariableStats(vardata);
+ }
+ }
+
+ /* look for a matching ndistinct statistics */
+ foreach (lc, rel->mvstatlist)
+ {
+ int i;
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ /* skip statistics without ndistinct coefficient built */
+ if (!info->ndist_built)
+ continue;
+
+ /* only exact matches for now (same set of columns) */
+ if (bms_num_members(attnums) != info->stakeys->dim1)
+ continue;
+
+ /* check that all the columns match */
+ for (i = 0; i < info->stakeys->dim1; i++)
+ if (!bms_is_member(info->stakeys->values[i], attnums))
+ break;
+
+ /* some column is not covered by the clauses, try the next stats */
+ if (i < info->stakeys->dim1)
+ continue;
+
+ return info->mvoid;
+ }
+
+ return InvalidOid;
+}
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 9dbb3b6..d4b88e9 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o histogram.o mcv.o
+OBJS = common.o dependencies.o histogram.o mcv.o mvdist.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.ndistinct b/src/backend/utils/mvstats/README.ndistinct
new file mode 100644
index 0000000..32d1624
--- /dev/null
+++ b/src/backend/utils/mvstats/README.ndistinct
@@ -0,0 +1,83 @@
+ndistinct coefficients
+======================
+
+Estimating the number of distinct groups in a combination of columns is tricky,
+and the estimation error is often significant. By ndistinct coefficient we
+mean a ratio
+
+ q = ndistinct(a) * ndistinct(b) / ndistinct(a,b)
+
+where 'a' and 'b' are columns, ndistinct(a) is (an estimate of) the number of
+distinct values in column 'a', and ndistinct(a,b) is the same thing for the
+pair of columns.
+
+The meaning of the coefficient may be illustrated by answering the following
+question: Given a combination of columns (a,b), how many distinct values of 'b'
+match a chosen value of 'a' on average?
+
+Let's assume we know ndistinct(a) and ndistinct(a,b). Then the answer to the
+question clearly is
+
+ ndistinct(a,b) / ndistinct(a)
+
+and by using 'q' we may rewrite this as
+
+ ndistinct(b) / q
+
+so 'q' may be considered a correction factor for the ndistinct estimate, given
+a condition on one of the columns.
+
+This may be generalized to a combination of 'n' columns
+
+ [ndistinct(c1) * ... * ndistinct(cn)] / ndistinct(c1, ..., cn)
+
+and the meaning is very similar, except that we need to use conditions on (n-1)
+of the columns.
+
+
+Selectivity estimation
+----------------------
+
+As explained in the previous paragraph, ndistinct coefficients may be used to
+estimate cardinality of a column, given some a priori knowledge. Let's assume
+we need to estimate selectivity of a condition
+
+ (a=1) AND (b=2)
+
+which we can expand like this
+
+ P(a=1 & b=2) = P(a=1) * P(b=2 | a=1)
+
+Let's also assume that the distributions are uniform, i.e. that
+
+ P(a=1) = 1/ndistinct(a)
+ P(b=2) = 1/ndistinct(b)
+ P(a=1 & b=2) = 1/ndistinct(a,b)
+
+ P(b=2 | a=1) = ndistinct(a) / ndistinct(a,b)
+
+which may be rewritten like
+
+ P(b=2 | a=1)
+ = ndistinct(a) / ndistinct(a,b)
+ = (1/ndistinct(b)) * [(ndistinct(a) * ndistinct(b)) / ndistinct(a,b)]
+ = (1/ndistinct(b)) * q
+
+and therefore
+
+ P(a=1 & b=2) = (1/ndistinct(a)) * (1/ndistinct(b)) * q
+
+This also illustrates 'q' as a correction coefficient.
+
+It also explains why we store the coefficient and not simply ndistinct(a,b).
+This way we can estimate the individual clauses as usual, and then correct
+the estimate by multiplying the result by 'q' - we don't have to mess with
+ndistinct estimates at all.
+
+Naturally, as the coefficient is derived from ndistinct(a,b), it may also be
+used to estimate GROUP BY clauses on the combination of columns, replacing the
+existing heuristics in estimate_num_groups().
+
+Note: Currently only the GROUP BY estimation is implemented. It's a bit unclear
+how to implement the clause estimation when there are other statistics (esp.
+MCV lists and/or functional dependencies) available.
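+
+A short worked example (with made-up counts): assume
+
+ ndistinct(a) = 100, ndistinct(b) = 50, ndistinct(a,b) = 1000
+
+so q = (100 * 50) / 1000 = 5. The naive product estimate of the number
+of (a,b) groups is 100 * 50 = 5000; dividing it by q gives back the
+actual ndistinct(a,b) = 1000. And under the uniformity assumption
+
+ P(a=1 & b=2) = (1/100) * (1/50) * 5 = 1/1000 = 1/ndistinct(a,b)
+
+exactly as derived above.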
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index d404914..6d4b09b 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -20,6 +20,8 @@ Currently we only have two kinds of multivariate statistics
(c) multivariate histograms (README.histogram)
+ (d) ndistinct coefficients
+
Compatible clause types
-----------------------
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index f6d1074..d34d072 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -32,7 +32,8 @@ static List* list_mv_stats(Oid relid);
* and serializes them back into the catalog (as bytea values).
*/
void
-build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+build_mv_stats(Relation onerel, double totalrows,
+ int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats)
{
ListCell *lc;
@@ -53,6 +54,7 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
MVHistogram histogram = NULL;
+ double ndist = -1;
int numrows_filtered = numrows;
VacAttrStats **stats = NULL;
@@ -92,6 +94,9 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (stat->ndist_enabled)
+ ndist = build_mv_ndistinct(totalrows, numrows, rows, attrs, stats);
+
/* build the MCV list */
if (stat->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
@@ -101,7 +106,7 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, mcvlist, histogram, attrs, stats);
+ update_mv_stats(stat->mvoid, deps, mcvlist, histogram, ndist, attrs, stats);
}
}
@@ -183,6 +188,8 @@ list_mv_stats(Oid relid)
info->mcv_built = stats->mcv_built;
info->hist_enabled = stats->hist_enabled;
info->hist_built = stats->hist_built;
+ info->ndist_enabled = stats->ndist_enabled;
+ info->ndist_built = stats->ndist_built;
result = lappend(result, info);
}
@@ -252,7 +259,7 @@ find_mv_attnums(Oid mvoid, Oid *relid)
void
update_mv_stats(Oid mvoid,
MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
- int2vector *attrs, VacAttrStats **stats)
+ double ndistcoeff, int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -292,26 +299,36 @@ update_mv_stats(Oid mvoid,
= PointerGetDatum(data);
}
+ if (ndistcoeff > 1.0)
+ {
+ nulls[Anum_pg_mv_statistic_standist -1] = false;
+ values[Anum_pg_mv_statistic_standist-1] = Float8GetDatum(ndistcoeff);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
replaces[Anum_pg_mv_statistic_stahist-1] = true;
+ replaces[Anum_pg_mv_statistic_standist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
nulls[Anum_pg_mv_statistic_hist_built-1] = false;
+ nulls[Anum_pg_mv_statistic_ndist_built-1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_ndist_built-1] = true;
replaces[Anum_pg_mv_statistic_hist_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
+ values[Anum_pg_mv_statistic_ndist_built-1] = BoolGetDatum(ndistcoeff > 1.0);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
diff --git a/src/backend/utils/mvstats/mvdist.c b/src/backend/utils/mvstats/mvdist.c
new file mode 100644
index 0000000..59b8358
--- /dev/null
+++ b/src/backend/utils/mvstats/mvdist.c
@@ -0,0 +1,171 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvdist.c
+ * POSTGRES multivariate distinct coefficients
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mvdist.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include <math.h>
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+static double estimate_ndistinct(double totalrows, int numrows, int d, int f1);
+
+/*
+ * Compute ndistinct coefficient for the combination of attributes. This
+ * computes the ndistinct estimate using the same estimator used in analyze.c
+ * and then computes the coefficient.
+ */
+double
+build_mv_ndistinct(double totalrows, int numrows, HeapTuple *rows,
+ int2vector *attrs, VacAttrStats **stats)
+{
+ int i, j;
+ int f1, cnt, d;
+ int nmultiple = 0, summultiple = 0;
+ int numattrs = attrs->dim1;
+ MultiSortSupport mss = multi_sort_init(numattrs);
+ double ndistcoeff;
+
+ /*
+ * It's possible to sort the sample rows directly, but this seemed
+ * somewhat simpler / less error prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * numattrs);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ Assert(numattrs >= 2);
+
+ for (i = 0; i < numattrs; i++)
+ {
+ /* prepare the sort function for this dimension */
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* accumulate all the data into the array and sort it */
+ for (j = 0; j < numrows; j++)
+ {
+ items[j].values[i]
+ = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+ }
+ }
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /* count number of distinct combinations */
+
+ f1 = 0;
+ cnt = 1;
+ d = 1;
+ for (i = 1; i < numrows; i++)
+ {
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ {
+ if (cnt == 1)
+ f1 += 1;
+ else
+ {
+ nmultiple += 1;
+ summultiple += cnt;
+ }
+
+ d++;
+ cnt = 0;
+ }
+
+ cnt += 1;
+ }
+
+ if (cnt == 1)
+ f1 += 1;
+ else
+ {
+ nmultiple += 1;
+ summultiple += cnt;
+ }
+
+ ndistcoeff = 1 / estimate_ndistinct(totalrows, numrows, d, f1);
+
+ /*
+ * now count distinct values for each attribute and incrementally
+ * compute ndistinct(a,b) / (ndistinct(a) * ndistinct(b))
+ *
+ * FIXME Probably need to handle cases when one of the ndistinct
+ * estimates is negative, and also check that the combined
+ * ndistinct is greater than any of those partial values.
+ */
+ for (i = 0; i < numattrs; i++)
+ ndistcoeff *= stats[i]->stadistinct;
+
+ return ndistcoeff;
+}
+
+double
+load_mv_ndistinct(Oid mvoid)
+{
+ bool isnull = false;
+ Datum deps;
+
+ /* Fetch the pg_mv_statistic tuple for this statistics object. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->ndist_enabled && mvstat->ndist_built);
+#endif
+
+ deps = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_standist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return DatumGetFloat8(deps);
+}
+
+/*
+ * The Duj1 estimator (already used in analyze.c):
+ *
+ * n * d / (n - f1 + f1 * n / N)
+ *
+ * where n is the sample size, N the total number of rows, d the number
+ * of distinct values in the sample and f1 the number of values that
+ * occurred only once in the sample.
+ */
+static double
+estimate_ndistinct(double totalrows, int numrows, int d, int f1)
+{
+ double numer,
+ denom,
+ ndistinct;
+
+ numer = (double) numrows *(double) d;
+
+ denom = (double) (numrows - f1) +
+ (double) f1 * (double) numrows / totalrows;
+
+ ndistinct = numer / denom;
+
+ /* Clamp to sane range in case of roundoff error */
+ if (ndistinct < (double) d)
+ ndistinct = (double) d;
+
+ if (ndistinct > totalrows)
+ ndistinct = totalrows;
+
+ return floor(ndistinct + 0.5);
+}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index 7020772..e46cc6b 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -40,6 +40,7 @@ CATALOG(pg_mv_statistic,3381)
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
bool hist_enabled; /* build histogram? */
+ bool ndist_enabled; /* build ndist coefficient? */
/* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
@@ -49,6 +50,7 @@ CATALOG(pg_mv_statistic,3381)
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
bool hist_built; /* histogram was built */
+ bool ndist_built; /* ndistinct coeff built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -57,6 +59,7 @@ CATALOG(pg_mv_statistic,3381)
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
bytea stahist; /* MV histogram (serialized) */
+ float8 standcoeff; /* ndistinct coefficient */
#endif
} FormData_pg_mv_statistic;
@@ -72,7 +75,7 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 16
+#define Natts_pg_mv_statistic 19
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
@@ -80,14 +83,17 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
#define Anum_pg_mv_statistic_deps_enabled 5
#define Anum_pg_mv_statistic_mcv_enabled 6
#define Anum_pg_mv_statistic_hist_enabled 7
-#define Anum_pg_mv_statistic_mcv_max_items 8
-#define Anum_pg_mv_statistic_hist_max_buckets 9
-#define Anum_pg_mv_statistic_deps_built 10
-#define Anum_pg_mv_statistic_mcv_built 11
-#define Anum_pg_mv_statistic_hist_built 12
-#define Anum_pg_mv_statistic_stakeys 13
-#define Anum_pg_mv_statistic_stadeps 14
-#define Anum_pg_mv_statistic_stamcv 15
-#define Anum_pg_mv_statistic_stahist 16
+#define Anum_pg_mv_statistic_ndist_enabled 8
+#define Anum_pg_mv_statistic_mcv_max_items 9
+#define Anum_pg_mv_statistic_hist_max_buckets 10
+#define Anum_pg_mv_statistic_deps_built 11
+#define Anum_pg_mv_statistic_mcv_built 12
+#define Anum_pg_mv_statistic_hist_built 13
+#define Anum_pg_mv_statistic_ndist_built 14
+#define Anum_pg_mv_statistic_stakeys 15
+#define Anum_pg_mv_statistic_stadeps 16
+#define Anum_pg_mv_statistic_stamcv 17
+#define Anum_pg_mv_statistic_stahist 18
+#define Anum_pg_mv_statistic_standist 19
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 8c50bfb..1923f2b 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -655,11 +655,13 @@ typedef struct MVStatisticInfo
bool deps_enabled; /* functional dependencies enabled */
bool mcv_enabled; /* MCV list enabled */
bool hist_enabled; /* histogram enabled */
+ bool ndist_enabled; /* ndistinct coefficient enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
bool mcv_built; /* MCV list built */
bool hist_built; /* histogram built */
+ bool ndist_built; /* ndistinct coefficient built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 80bf96f..0ff24ce 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -226,6 +226,7 @@ typedef MVSerializedHistogramData *MVSerializedHistogram;
MVDependencies load_mv_dependencies(Oid mvoid);
MCVList load_mv_mcvlist(Oid mvoid);
MVSerializedHistogram load_mv_histogram(Oid mvoid);
+double load_mv_ndistinct(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
@@ -267,11 +268,17 @@ MVHistogram
build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int numrows_total);
-void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+double
+build_mv_ndistinct(double totalrows, int numrows, HeapTuple *rows,
+ int2vector *attrs, VacAttrStats **stats);
+
+void build_mv_stats(Relation onerel, double totalrows,
+ int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
void update_mv_stats(Oid relid, MVDependencies dependencies,
MCVList mcvlist, MVHistogram histogram,
+ double ndistcoeff,
int2vector *attrs, VacAttrStats **stats);
#ifdef DEBUG_MVHIST
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 528ac36..7a914da 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1377,7 +1377,8 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
length(s.stamcv) AS mcvbytes,
pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo,
length(s.stahist) AS histbytes,
- pg_mv_stats_histogram_info(s.stahist) AS histinfo
+ pg_mv_stats_histogram_info(s.stahist) AS histinfo,
+ s.standcoeff AS ndcoeff
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
--
2.5.0
0006-multi-statistics-estimation.patch (text/x-patch)
From 91b9b31cbeb22767b33c2f58b912b7a14c943b28 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Fri, 6 Feb 2015 01:42:38 +0100
Subject: [PATCH 6/9] multi-statistics estimation
The general idea is that a probability (which is what selectivity is)
can be split into a product of conditional probabilities like this:
P(A & B & C) = P(A & B) * P(C|A & B)
If we assume that C and B are independent, the last part may be
simplified like this
P(A & B & C) = P(A & B) * P(C|A)
we only need probabilities on [A,B] and [C,A] to compute the original
probability.
The implementation works in the other direction, though. We know what
probability P(A & B & C) we need to compute, and also what statistics
are available.
So we search for a combination of statistics covering the clauses in
an optimal way (most clauses covered, most dependencies exploited).
There are two possible approaches - exhaustive and greedy. The
exhaustive one walks through all permutations of stats using dynamic
programming, so it's guaranteed to find the optimal solution, but it
soon gets very slow as it's roughly O(N!). The dynamic programming may
improve that a bit, but it's still far too expensive for large numbers
of statistics (on a single table).
The greedy algorithm is very simple - in every step it picks the
statistics that looks best at that point. That may not guarantee the
globally best solution (but maybe it does?), but it only needs N steps
to find a solution, so it's very fast (processing the selected stats is
usually way more expensive).
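A minimal sketch of the greedy step (standalone C with made-up bitmask
inputs, not the real data structures):

#include <stdio.h>

static int
popcount(unsigned x)
{
	int	n = 0;

	while (x)
	{
		n += x & 1;
		x >>= 1;
	}
	return n;
}

int
main(void)
{
	/* attributes referenced by the clauses (attnums 0..3 as bits),
	 * and the attributes covered by three statistics objects */
	unsigned	clause_attrs = 0x0F;
	unsigned	stat_attrs[3] = {0x03, 0x0C, 0x06};
	unsigned	covered = 0;
	int		used[3] = {0, 0, 0};

	/* in every step pick the statistics covering the most
	 * still-uncovered clause attributes, until nothing helps */
	for (;;)
	{
		int		best = -1;
		int		best_gain = 0;
		int		i;

		for (i = 0; i < 3; i++)
		{
			int		gain = popcount(stat_attrs[i] & clause_attrs & ~covered);

			if (!used[i] && gain > best_gain)
			{
				best = i;
				best_gain = gain;
			}
		}

		if (best < 0)
			break;

		used[best] = 1;
		covered |= stat_attrs[best];
		printf("apply statistics %d (%d new attributes)\n", best, best_gain);
	}
	return 0;
}

The real choose_mv_statistics() ranks candidates on more than raw
attribute counts (clauses covered, conditions, exploitable dependencies),
but the overall shape is the same N-step loop.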
There's a GUC for selecting the search algorithm
mvstat_search = {'greedy', 'exhaustive'}
The default value is 'greedy' as that's much safer (with respect to
runtime). See choose_mv_statistics().
Once we have found a sequence of statistics, we apply them to the
clauses using the conditional probabilities. We process the selected
stats one by one, and for each we select the estimated clauses and
conditions. See clauselist_selectivity() for more details.
Limitations
-----------
It's still true that each clause at a given level has to be covered by
a single MV statistic. So with this query
WHERE (clause1) AND (clause2) AND (clause3 OR clause4)
each parenthesized clause has to be covered by a single multivariate
statistic.
Clauses not covered by a single statistic at this level will be passed
to clause_selectivity(), which will treat them as a collection of
simpler clauses (connected by AND or OR), with the clauses from the
previous level used as conditions.
So using the same example, the last clause will be passed to
clause_selectivity() with 'clause1' and 'clause2' as conditions, and it
will be processed using multivariate stats if possible.
The other limitation is that all the expressions have to be
mv-compatible, i.e. there can't be a mix of expression types. If this is
violated, the clause may be passed to the next level (just like a
list of clauses not covered by a single statistic), which splits it
into clauses handled by multivariate stats and clauses handled by
regular statistics.
rework clauselist_selectivity_or to handle OR-clauses correctly
We might invent a completely new set of functions here, resembling
clauselist_selectivity but adapting the ideas to OR-clauses.
But luckily we know that each OR-clause
(a OR b OR c)
may be rewritten as an equivalent AND-clause using negation:
NOT ((NOT a) AND (NOT b) AND (NOT c))
And that's something we can pass to clauselist_selectivity.
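For instance, with made-up selectivities s(a) = 0.1, s(b) = 0.2 and
s(c) = 0.05, and treating the negated clauses as independent just for
the arithmetic:

s(a OR b OR c) = 1 - (1 - 0.1) * (1 - 0.2) * (1 - 0.05)
= 1 - 0.9 * 0.8 * 0.95
= 0.316

The AND machinery (including any multivariate statistics it can apply)
does the real work; we only negate on the way in and on the way out.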
---
contrib/file_fdw/file_fdw.c | 3 +-
contrib/postgres_fdw/postgres_fdw.c | 11 +-
src/backend/optimizer/path/clausesel.c | 2024 ++++++++++++++++++++++++++------
src/backend/optimizer/path/costsize.c | 23 +-
src/backend/optimizer/util/orclauses.c | 4 +-
src/backend/utils/adt/selfuncs.c | 17 +-
src/backend/utils/misc/guc.c | 20 +
src/backend/utils/mvstats/README.stats | 166 +++
src/include/optimizer/cost.h | 6 +-
src/include/utils/mvstats.h | 8 +
10 files changed, 1913 insertions(+), 369 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index dc035d7..8f11b7a 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -969,7 +969,8 @@ estimate_size(PlannerInfo *root, RelOptInfo *baserel,
baserel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 40bffd6..d458a81 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -500,7 +500,8 @@ postgresGetForeignRelSize(PlannerInfo *root,
fpinfo->local_conds,
baserel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
@@ -2136,7 +2137,8 @@ estimate_path_cost_size(PlannerInfo *root,
local_param_join_conds,
foreignrel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
local_sel *= fpinfo->local_conds_sel;
rows = clamp_row_est(rows * local_sel);
@@ -3663,7 +3665,8 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
fpinfo->local_conds,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
/*
@@ -3682,7 +3685,7 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
*/
fpinfo->joinclause_sel = clauselist_selectivity(root, fpinfo->joinclauses,
0, fpinfo->jointype,
- extra->sjinfo);
+ extra->sjinfo, NIL);
}
fpinfo->server = GetForeignServer(joinrel->serverid);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 5e73a4e..e06fd99 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -29,6 +29,8 @@
#include "utils/selfuncs.h"
#include "utils/typcache.h"
+#include "miscadmin.h"
+
/*
* Data structure for accumulating info about possible range-query
@@ -44,6 +46,13 @@ typedef struct RangeQueryClause
Selectivity hibound; /* Selectivity of a var < something clause */
} RangeQueryClause;
+static Selectivity clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
+
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
@@ -60,23 +69,25 @@ static int count_mv_attnums(List *clauses, Index relid, int type);
static int count_varnos(List *clauses, Index *relid);
+static List *clauses_matching_statistic(List **clauses, MVStatisticInfo *statistic,
+ Index relid, int types, bool remove);
+
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Index relid, List *stats);
-static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
-
-static List *clauselist_mv_split(PlannerInfo *root, Index relid,
- List *clauses, List **mvclauses,
- MVStatisticInfo *mvstats, int types);
-
static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats, List *clauses,
+ List *conditions, bool is_or);
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats,
- bool *fullmatch, Selectivity *lowsel);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or, bool *fullmatch,
+ Selectivity *lowsel);
static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -90,12 +101,33 @@ static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
int nmatches, char * matches,
bool is_or);
+/*
+ * Describes a combination of multiple statistics covering the attributes
+ * referenced by the clauses. The array 'stats' (with nstats elements)
+ * lists the statistics in the order they are applied, and the struct
+ * also tracks the number of clauses and conditions covered by the
+ * solution.
+ *
+ * choose_mv_statistics_exhaustive() uses this to track both the current
+ * and the best solution while walking through the space of possible
+ * combinations.
+ */
+typedef struct mv_solution_t {
+ int nclauses; /* number of clauses covered */
+ int nconditions; /* number of conditions covered */
+ int nstats; /* number of stats applied */
+ int *stats; /* stats (in the apply order) */
+} mv_solution_t;
+
+static List *choose_mv_statistics(PlannerInfo *root, Index relid,
+ List *mvstats, List *clauses, List *conditions);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, Index relid);
static bool stats_type_matches(MVStatisticInfo *stat, int type);
+int mvstat_search_type = MVSTAT_SEARCH_GREEDY;
/* used for merging bitmaps - AND (min), OR (max) */
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
@@ -170,14 +202,15 @@ clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 1.0;
RangeQueryClause *rqlist = NULL;
ListCell *l;
/* processing mv stats */
- Oid relid = InvalidOid;
+ Index relid = InvalidOid;
/* list of multivariate stats on the relation */
List *stats = NIL;
@@ -193,12 +226,13 @@ clauselist_selectivity(PlannerInfo *root,
stats = find_stats(root, relid);
/*
- * If there's exactly one clause, then no use in trying to match up pairs,
- * so just go directly to clause_selectivity().
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, or matching multivariate statistics, so just go directly
+ * to clause_selectivity().
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
- varRelid, jointype, sjinfo);
+ varRelid, jointype, sjinfo, conditions);
/*
* Apply functional dependencies, but first check that there are some stats
@@ -230,31 +264,100 @@ clauselist_selectivity(PlannerInfo *root,
(count_mv_attnums(clauses, relid,
MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2))
{
- /* collect attributes from the compatible conditions */
- Bitmapset *mvattnums = collect_mv_attnums(clauses, relid,
- MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+ ListCell *s;
+
+ /*
+ * Copy the conditions we got from the upper part of the expression tree
+ * so that we can add local conditions to it (we need to keep the
+ * original list intact, for sibling expressions - other expressions
+ * at the same level).
+ */
+ List *conditions_local = list_copy(conditions);
+
+ /* find the best combination of statistics */
+ List *solution = choose_mv_statistics(root, relid, stats,
+ clauses, conditions);
- /* and search for the statistic covering the most attributes */
- MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+ /* FIXME we must not scribble over the original list */
+ if (solution)
+ clauses = list_copy(clauses);
- if (mvstat != NULL) /* we have a matching stats */
+ /*
+ * We have a good solution, which is merely a list of statistics that
+ * we need to apply. We'll apply the statistics one by one (in the order
+ * as they appear in the list), and for each statistic we'll
+ *
+ * (1) find clauses compatible with the statistic (and remove them
+ * from the list)
+ *
+ * (2) find local conditions compatible with the statistic
+ *
+ * (3) do the estimation P(clauses | conditions)
+ *
+ * (4) append the estimated clauses to local conditions
+ *
+ * and then continue with the next statistic, now using the extended
+ * list of conditions.
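+ *
+ * For example (a sketch, assuming statistics on (a,b) and (b,c), and
+ * clauses a=1, b=1 and c=1), this computes P(a=1,b=1) from the (a,b)
+ * statistics, then P(c=1 | b=1) from the (b,c) statistics, and the
+ * final estimate is the product of the two.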
+ */
+ foreach (s, solution)
{
- /* clauses compatible with multi-variate stats */
- List *mvclauses = NIL;
+ MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
- /* split the clauselist into regular and mv-clauses */
- clauses = clauselist_mv_split(root, relid, clauses, &mvclauses,
- mvstat, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+ /* clauses compatible with the statistic we're applying right now */
+ List *stat_clauses = NIL;
+ List *stat_conditions = NIL;
- /* we've chosen the histogram to match the clauses */
- Assert(mvclauses != NIL);
+ /*
+ * Find clauses and conditions matching the statistic - the clauses
+ * need to be removed from the list, while conditions should remain
+ * there (so that we can apply them repeatedly).
+ */
+ stat_clauses
+ = clauses_matching_statistic(&clauses, mvstat, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST,
+ true);
+
+ stat_conditions
+ = clauses_matching_statistic(&conditions_local, mvstat, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST,
+ false);
+
+ /*
+ * If we got no clauses to estimate, we've done something wrong, either
+ * during the optimization, while detecting compatible clauses, or
+ * somewhere else.
+ *
+ * Also, we need at least two attributes in clauses and conditions.
+ */
+ Assert(stat_clauses != NIL);
+ Assert(count_mv_attnums(list_union(stat_clauses, stat_conditions),
+ relid, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2);
/* compute the multivariate stats */
- s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ s1 *= clauselist_mv_selectivity(root, mvstat,
+ stat_clauses, stat_conditions,
+ false); /* AND */
+
+ /*
+ * Add the new clauses to the local conditions, so that we can use
+ * them for the subsequent statistics. We only add the clauses,
+ * because the conditions are already there (or should be).
+ */
+ conditions_local = list_concat(conditions_local, stat_clauses);
}
+
+ /* from now on, work only with the 'local' list of conditions */
+ conditions = conditions_local;
}
/*
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, so just go directly to clause_selectivity().
+ */
+ if (list_length(clauses) == 1)
+ return s1 * clause_selectivity(root, (Node *) linitial(clauses),
+ varRelid, jointype, sjinfo, conditions);
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -266,7 +369,8 @@ clauselist_selectivity(PlannerInfo *root,
Selectivity s2;
/* Always compute the selectivity using clause_selectivity */
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo);
+ s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo,
+ conditions);
/*
* Check for being passed a RestrictInfo.
@@ -425,6 +529,55 @@ clauselist_selectivity(PlannerInfo *root,
}
/*
+ * Similar to clauselist_selectivity(), but for OR-clauses. We can't simply
+ * reuse the multi-statistics estimation logic as for AND-clauses, at least
+ * not directly, because there are a few key differences:
+ *
+ * - functional dependencies don't really apply to OR-clauses
+ *
+ * - clauselist_selectivity() is based on decomposing the selectivity into
+ * a sequence of conditional probabilities (selectivities), but that can
+ * be done only for AND-clauses
+ *
+ * We might invent a similar infrastructure for optimizing OR-clauses, doing
+ * something similar to what clauselist_selectivity() does for AND-clauses,
+ * but luckily we know that each disjunction (aka OR-clause)
+ *
+ * (a OR b OR c)
+ *
+ * may be rewritten as an equivalent conjunction (aka AND-clause) by using
+ * negation:
+ *
+ * NOT ((NOT a) AND (NOT b) AND (NOT c))
+ *
+ * And that's something we can pass to clauselist_selectivity and let it do
+ * all the heavy lifting.
+ */
+static Selectivity
+clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
+{
+ List *args = NIL;
+ ListCell *l;
+ Expr *expr;
+
+ /* build arguments for the AND-clause by negating args of the OR-clause */
+ foreach (l, clauses)
+ args = lappend(args, makeBoolExpr(NOT_EXPR, list_make1(lfirst(l)), -1));
+
+ /* and then build the AND-clause from the negated args */
+ expr = makeBoolExpr(AND_EXPR, args, -1);
+
+ /* instead of constructing the NOT expression, just compute (1.0 - s) */
+ return 1.0 - clauselist_selectivity(root, list_make1(expr), varRelid,
+ jointype, sjinfo, conditions);
+}
+
+/*
* addRangeClause --- add a new range clause for clauselist_selectivity
*
* Here is where we try to match up pairs of range-query clauses
@@ -631,7 +784,8 @@ clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 0.5; /* default for any unhandled clause type */
RestrictInfo *rinfo = NULL;
@@ -751,7 +905,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) get_notclausearg((Expr *) clause),
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (and_clause(clause))
{
@@ -760,29 +915,18 @@ clause_selectivity(PlannerInfo *root,
((BoolExpr *) clause)->args,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (or_clause(clause))
{
- /*
- * Selectivities for an OR clause are computed as s1+s2 - s1*s2 to
- * account for the probable overlap of selected tuple sets.
- *
- * XXX is this too conservative?
- */
- ListCell *arg;
-
- s1 = 0.0;
- foreach(arg, ((BoolExpr *) clause)->args)
- {
- Selectivity s2 = clause_selectivity(root,
- (Node *) lfirst(arg),
- varRelid,
- jointype,
- sjinfo);
-
- s1 = s1 + s2 - s1 * s2;
- }
+ /* just call clauselist_selectivity_or() */
+ s1 = clauselist_selectivity_or(root,
+ ((BoolExpr *) clause)->args,
+ varRelid,
+ jointype,
+ sjinfo,
+ conditions);
}
else if (is_opclause(clause) || IsA(clause, DistinctExpr))
{
@@ -872,7 +1016,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((RelabelType *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (IsA(clause, CoerceToDomain))
{
@@ -881,7 +1026,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((CoerceToDomain *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else
{
@@ -945,15 +1091,16 @@ clause_selectivity(PlannerInfo *root,
* in the MCV list, then the selectivity is below the lowest frequency
* found in the MCV list,
*
- * TODO When applying the clauses to the histogram/MCV list, we can do
- * that from the most selective clauses first, because that'll
- * eliminate the buckets/items sooner (so we'll be able to skip
- * them without inspection, which is more expensive). But this
- * requires really knowing the per-clause selectivities in advance,
- * and that's not what we do now.
+ * TODO When applying the clauses to the histogram/MCV list, we can do that from
+ * the most selective clauses first, because that'll eliminate the
+ * buckets/items sooner (so we'll be able to skip them without inspection,
+ * which is more expensive). But this requires really knowing the
+ * per-clause selectivities in advance, and that's not what we do now.
+ *
*/
static Selectivity
-clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+clauselist_mv_selectivity(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
bool fullmatch = false;
Selectivity s1 = 0.0, s2 = 0.0;
@@ -964,281 +1111,1375 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
*/
Selectivity mcv_low = 0.0;
- /* TODO Evaluate simple 1D selectivities, use the smallest one as
- * an upper bound, product as lower bound, and sort the
- * clauses in ascending order by selectivity (to optimize the
- * MCV/histogram evaluation).
- */
+ /* TODO Evaluate simple 1D selectivities, use the smallest one as
+ * an upper bound, product as lower bound, and sort the
+ * clauses in ascending order by selectivity (to optimize the
+ * MCV/histogram evaluation).
+ */
+
+ /* Evaluate the MCV first. */
+ s1 = clauselist_mv_selectivity_mcvlist(root, mvstats,
+ clauses, conditions, is_or,
+ &fullmatch, &mcv_low);
+
+ /*
+ * If we got a full equality match on the MCV list, we're done (and
+ * the estimate is pretty good).
+ */
+ if (fullmatch && (s1 > 0.0))
+ return s1;
+
+ /* TODO if (fullmatch) without matching MCV item, use the mcv_low
+ * selectivity as upper bound */
+
+ s2 = clauselist_mv_selectivity_histogram(root, mvstats,
+ clauses, conditions, is_or);
+
+ /* TODO clamp to <= 1.0 (or more strictly, when possible) */
+ return s1 + s2;
+}
+
+/*
+ * Pull varattnos from the clauses, similarly to pull_varattnos() but:
+ *
+ * (a) only get attributes for a particular relation (relid)
+ * (b) ignore system attributes (we can't build stats on them anyway)
+ *
+ * This makes it possible to directly compare the result with attnum
+ * values from pg_attribute etc.
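+ *
+ * For example, a user column with attnum 2 is stored by pull_varattnos()
+ * as (2 - FirstLowInvalidHeapAttributeNumber), so we shift each member
+ * back here; system attributes end up with non-positive values after the
+ * shift and get skipped.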
+ */
+static Bitmapset *
+get_varattnos(Node * node, Index relid)
+{
+ int k;
+ Bitmapset *varattnos = NULL;
+ Bitmapset *result = NULL;
+
+ /* get the varattnos */
+ pull_varattnos(node, relid, &varattnos);
+
+ k = -1;
+ while ((k = bms_next_member(varattnos, k)) >= 0)
+ {
+ if (k + FirstLowInvalidHeapAttributeNumber > 0)
+ result
+ = bms_add_member(result,
+ k + FirstLowInvalidHeapAttributeNumber);
+ }
+
+ bms_free(varattnos);
+
+ return result;
+}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ */
+static Bitmapset *
+collect_mv_attnums(List *clauses, Index relid, int types)
+{
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ Node *clause = (Node *) lfirst(l);
+
+ /* ignore the result here - we only need the attnums */
+ clause_is_mv_compatible(clause, relid, &attnums, types);
+ }
+
+ /*
+ * If there are not at least two attributes referenced by the clause(s),
+ * we can throw everything out (as we'll revert to simple stats).
+ */
+ if (bms_num_members(attnums) <= 1)
+ {
+ bms_free(attnums);
+ attnums = NULL;
+ }
+
+ return attnums;
+}
+
+/*
+ * Count the number of attributes in clauses compatible with multivariate stats.
+ */
+static int
+count_mv_attnums(List *clauses, Index relid, int type)
+{
+ int c;
+ Bitmapset *attnums = collect_mv_attnums(clauses, relid, type);
+
+ c = bms_num_members(attnums);
+
+ bms_free(attnums);
+
+ return c;
+}
+
+/*
+ * Count varnos referenced in the clauses, and if there's a single varno then
+ * return the index in 'relid'.
+ */
+static int
+count_varnos(List *clauses, Index *relid)
+{
+ int cnt;
+ Bitmapset *varnos = NULL;
+
+ varnos = pull_varnos((Node *) clauses);
+ cnt = bms_num_members(varnos);
+
+ /* if there's a single varno in the clauses, remember it */
+ if (bms_num_members(varnos) == 1)
+ *relid = bms_singleton_member(varnos);
+
+ bms_free(varnos);
+
+ return cnt;
+}
+
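+/*
+ * Find the clauses covered by the given statistics (i.e. clauses referencing
+ * only attributes the statistics is built on) and return them in a new list.
+ * With remove=true the matching clauses are also deleted from the input
+ * list (we do that for clauses, but not for conditions, which need to stay
+ * around for the subsequent statistics).
+ */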
+static List *
+clauses_matching_statistic(List **clauses, MVStatisticInfo *statistic,
+ Index relid, int types, bool remove)
+{
+ int i;
+ Bitmapset *stat_attnums = NULL;
+ List *matching_clauses = NIL;
+ ListCell *lc;
+
+ /* build attnum bitmapset for this statistics */
+ for (i = 0; i < statistic->stakeys->dim1; i++)
+ stat_attnums = bms_add_member(stat_attnums,
+ statistic->stakeys->values[i]);
+
+ /*
+ * We can't use foreach here, because we may need to remove some of the
+ * clauses if (remove=true).
+ */
+ lc = list_head(*clauses);
+ while (lc)
+ {
+ Node *clause = (Node*)lfirst(lc);
+ Bitmapset *attnums = NULL;
+
+ /* must advance lc before list_delete possibly pfree's it */
+ lc = lnext(lc);
+
+ /*
+ * skip clauses that are not compatible with stats (just leave them
+ * in the original list)
+ *
+ * XXX Perhaps this should check what stats are actually available in
+ * the statistics (not a big deal now, because MCV and histograms
+ * handle the same types of conditions).
+ */
+ if (! clause_is_mv_compatible(clause, relid, &attnums, types))
+ {
+ bms_free(attnums);
+ continue;
+ }
+
+ /* if the clause is covered by the statistic, add it to the list */
+ if (bms_is_subset(attnums, stat_attnums))
+ {
+ matching_clauses = lappend(matching_clauses, clause);
+
+ /* if remove=true, remove the matching item from the main list */
+ if (remove)
+ *clauses = list_delete_ptr(*clauses, clause);
+ }
+
+ bms_free(attnums);
+ }
+
+ bms_free(stat_attnums);
+
+ return matching_clauses;
+}
+
+/*
+ * Selects the best combination of multivariate statistics, in an exhaustive
+ * way, where 'best' means:
+ *
+ * (a) covering the most attributes (referenced by clauses)
+ * (b) using the least number of multivariate stats
+ * (c) using the most conditions to exploit dependency
+ *
+ * Don't call this directly but through choose_mv_statistics(), which does some
+ * additional tricks to minimize the runtime.
+ *
+ *
+ * Algorithm
+ * ---------
+ * The algorithm is a recursive implementation of backtracking, with maximum
+ * depth equal to the number of multi-variate statistics available on the table.
+ * It actually explores all valid combinations of stats.
+ *
+ * Whenever it considers adding the next statistics, the clauses it matches are
+ * divided into 'conditions' (clauses already matched by at least one previous
+ * statistics) and clauses that are estimated.
+ *
+ * Then several checks are performed:
+ *
+ * (a) The statistics covers at least 2 columns, referenced in the estimated
+ * clauses (otherwise multi-variate stats are useless).
+ *
+ * (b) The statistics covers at least 1 new column, i.e. a column not referenced
+ * by the already used stats (and the new column has to be referenced by
+ * the clauses, of course). Otherwise the statistics would not add any new
+ * information.
+ *
+ * There are some other sanity checks (e.g. stats must not be used twice etc.).
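+ *
+ * For example (a sketch), with statistics on (a,b) and (b,c), and clauses
+ * referencing columns a, b and c, the algorithm tries the sequences
+ * [(a,b)], [(a,b),(b,c)], [(b,c)] and [(b,c),(a,b)], and keeps the one
+ * covering the most clauses (with fewer statistics as a tie-breaker).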
+ *
+ *
+ * Weaknesses
+ * ----------
+ * The current implementation uses a rather simple optimality criterion, so it
+ * may not make the best choice when
+ *
+ * (a) There are multiple solutions with the same number of covered
+ * attributes and number of statistics (e.g. the same solution but with
+ * statistics in a different order). It's unclear which solution is the best
+ * one - in a sense all of them are equal.
+ *
+ * TODO It might be possible to compute estimate for each of those solutions,
+ * and then combine them to get the final estimate (e.g. by using average
+ * or median).
+ *
+ * (b) Does not consider that some types of stats are a better match for some
+ * types of clauses (e.g. MCV list is generally a better match for equality
+ * conditions than a histogram).
+ *
+ * But maybe this is pointless - generally, each column is either a label
+ * (it's not important whether because of the data type or how it's used),
+ * or a value with ordering that makes sense. So either a MCV list is more
+ * appropriate (labels) or a histogram (values with orderings).
+ *
+ * Not sure what to do with statistics on columns mixing both types of data
+ * (some columns would work best with MCVs, some with histograms). Maybe we
+ * could invent a new type of statistics combining MCV list and histogram
+ * (keeping a small histogram for each MCV item, and a separate histogram
+ * for values not on the MCV list).
+ *
+ * TODO The algorithm should probably count number of Vars (not just attnums)
+ * when computing the 'score' of each solution. Computing the ratio of
+ * (num of all vars) / (num of condition vars) as a measure of how well
+ * the solution uses conditions might be useful.
+ */
+static void
+choose_mv_statistics_exhaustive(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ /* this may run for a long time, so let's make it interruptible */
+ CHECK_FOR_INTERRUPTS();
+
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ /*
+ * Now try to apply each statistics, matching at least two attributes,
+ * unless it's already used in one of the previous steps.
+ */
+ for (i = 0; i < nmvstats; i++)
+ {
+ int c;
+
+ int ncovered_clauses = 0; /* number of covered clauses */
+ int ncovered_conditions = 0; /* number of covered conditions */
+ int nattnums = 0; /* number of covered attributes */
+
+ Bitmapset *all_attnums = NULL;
+
+ /* skip statistics that were already used or eliminated */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /*
+ * See if we have clauses covered by this statistics, but not
+ * yet covered by any of the preceding ones.
+ */
+ for (c = 0; c < nclauses; c++)
+ {
+ bool covered = false;
+ Bitmapset *clause_attnums = clauses_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this clause is not covered by this stats, we can't
+ * use the stats to estimate that at all.
+ */
+ if (! cover_map[i * nclauses + c])
+ continue;
+
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add its
+ * attributes to the set of attnums from all the clauses usable
+ * with this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
+
+ /* let's see if it's covered by any of the previous stats */
+ for (j = 0; j < step; j++)
+ {
+ /* already covered by the previous stats */
+ if (cover_map[current->stats[j] * nclauses + c])
+ covered = true;
+
+ if (covered)
+ break;
+ }
+
+ /* if already covered, continue with the next clause */
+ if (covered)
+ {
+ ncovered_conditions += 1;
+ continue;
+ }
+
+ /*
+ * OK, this clause is covered by this statistics (and not by
+ * any of the previous ones)
+ */
+ ncovered_clauses += 1;
+ }
+
+ /* can't have more new clauses than original clauses */
+ Assert(nclauses >= ncovered_clauses);
+ Assert(ncovered_clauses >= 0); /* mostly paranoia */
+
+ nattnums = bms_num_members(all_attnums);
+
+ /* free all the bitmapsets - we don't need them anymore */
+ bms_free(all_attnums);
+
+ all_attnums = NULL;
+
+ /*
+ * Now do the same for the conditions covered by this statistics
+ * (these always count as conditions, not as new clauses).
+ */
+ for (c = 0; c < nconditions; c++)
+ {
+ Bitmapset *clause_attnums = conditions_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this clause is not covered by this stats, we can't
+ * use the stats to estimate that at all.
+ */
+ if (! condition_map[i * nconditions + c])
+ continue;
+
+ /* count this as a condition */
+ ncovered_conditions += 1;
+
+ /*
+ * We know we'll use this clause as a condition, so let's add its
+ * attributes to the set of attnums from all the clauses usable
+ * with this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
+ }
+
+ /*
+ * Let's mark the statistics as 'ruled out' - either we'll use
+ * it (and proceed to the next step), or it's incompatible.
+ */
+ ruled_out[i] = step;
+
+ /*
+ * There are no clauses usable with this statistics (not already
+ * covered by some of the previous stats).
+ *
+ * Similarly, if the clauses only use a single attribute, we
+ * can't really use that.
+ */
+ if ((ncovered_clauses == 0) || (nattnums < 2))
+ continue;
+
+ /*
+ * TODO Not sure if it's possible to add a clause referencing
+ * only attributes already covered by previous stats?
+ * Introducing only some new dependency, not a new
+ * attribute. Couldn't come up with an example, though.
+ * Might be worth adding some assert.
+ */
+
+ /*
+ * got a suitable statistics - let's update the current solution,
+ * maybe use it as the best solution
+ */
+ current->nclauses += ncovered_clauses;
+ current->nconditions += ncovered_conditions;
+ current->nstats += 1;
+ current->stats[step] = i;
+
+ /*
+ * We can never cover more clauses, or use more stats, than we
+ * actually have at the beginning.
+ */
+ Assert(nclauses >= current->nclauses);
+ Assert(nmvstats >= current->nstats);
+ Assert(step < nmvstats);
+
+ if (*best == NULL)
+ {
+ *best = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ (*best)->nstats = 0;
+ (*best)->nclauses = 0;
+ (*best)->nconditions = 0;
+ }
+
+ /* see if it's better than the current 'best' solution */
+ if ((current->nclauses > (*best)->nclauses) ||
+ ((current->nclauses == (*best)->nclauses) &&
+ ((current->nstats < (*best)->nstats))))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+
+ /*
+ * The recursion only makes sense if there are still unused
+ * statistics left (otherwise there's nothing more to apply).
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_exhaustive(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= ncovered_clauses;
+ current->nconditions -= ncovered_conditions;
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[i] = -1;
+
+ Assert(current->nclauses >= 0);
+ Assert(current->nstats >= 0);
+ }
+
+ /* reset all statistics marked as ruled out in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+}
+
+/*
+ * Greedy search for a multivariate solution - a sequence of statistics covering
+ * the clauses. This chooses the "best" statistics at each step, so the
+ * resulting solution may not be the best solution globally, but this produces
+ * the solution in only N steps (where N is the number of statistics), while
+ * the exhaustive approach may have to walk through ~N! combinations (although
+ * some of those are terminated early).
+ *
+ * See the comments at choose_mv_statistics_exhaustive() as this does the same
+ * thing (but in a different way).
+ *
+ * Don't call this directly, but through choose_mv_statistics().
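+ *
+ * For example (a sketch of the gain metric used below), a statistics
+ * covering 4 clause columns of which 2 are already covered by conditions
+ * has gain 2/4 = 0.5, and is preferred over a statistics covering 3
+ * columns with only 1 condition column (gain 1/3), because it reuses
+ * more of the already estimated columns.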
+ *
+ * TODO There are probably other metrics we might use - e.g. using number of
+ * columns (num_cond_columns / num_cov_columns), which might work better
+ * with a mix of simple and complex clauses.
+ *
+ * TODO Also the choice at the very first step should be handled in a special
+ * way, because there will be 0 conditions at that moment, so there needs
+ * to be some other criteria - e.g. using the simplest (or most complex?)
+ * clause might be a good idea.
+ *
+ * TODO We might also select multiple stats using different criteria, and branch
+ * the search. This is however tricky, because if we choose k statistics at
+ * each step, we get k^N branches to walk through (with N steps). That's
+ * not really good with large number of stats (yet better than exhaustive
+ * search).
+ */
+static void
+choose_mv_statistics_greedy(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
+ int best_stat = -1;
+ double gain, max_gain = -1.0;
+
+ /*
+ * Bitmap tracking which clauses are already covered (by the previous
+ * statistics) and may thus serve only as a condition in this step.
+ */
+ bool *covered_clauses = (bool*)palloc0(nclauses);
+
+ /*
+ * Number of clauses and columns covered by each statistics - this
+ * includes both conditions and clauses covered by the statistics for
+ * the first time. The number of columns may count some columns
+ * repeatedly - if a column is shared by multiple clauses, it will
+ * be counted once for each clause (covered by the statistics).
+ * So with two clauses [(a=1 OR b=2),(a<2 OR c>1)] the column "a"
+ * will be counted twice (if both clauses are covered).
+ *
+ * The values for ruled-out statistics (that can't be applied) are
+ * not computed, because that'd be pointless.
+ */
+ int *num_cov_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cov_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Same as above, but this only includes clauses that are already
+ * covered by the previous stats (and the current one).
+ */
+ int *num_cond_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cond_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Number of attributes for each clause.
+ *
+ * TODO Might be computed in choose_mv_statistics() and then passed
+ * here, but then the function would not have the same signature
+ * as _exhaustive().
+ */
+ int *attnum_counts = (int*)palloc0(sizeof(int) * nclauses);
+ int *attnum_cond_counts = (int*)palloc0(sizeof(int) * nconditions);
+
+ CHECK_FOR_INTERRUPTS();
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ /* compute attributes (columns) for each clause */
+ for (i = 0; i < nclauses; i++)
+ attnum_counts[i] = bms_num_members(clauses_attnums[i]);
+
+ /* compute attributes (columns) for each condition */
+ for (i = 0; i < nconditions; i++)
+ attnum_cond_counts[i] = bms_num_members(conditions_attnums[i]);
+
+ /* see which clauses are already covered at this point (by previous stats) */
+ for (i = 0; i < step; i++)
+ for (j = 0; j < nclauses; j++)
+ covered_clauses[j] |= (cover_map[current->stats[i] * nclauses + j]);
+
+ /* which remaining statistics covers most clauses / uses most conditions? */
+ for (i = 0; i < nmvstats; i++)
+ {
+ Bitmapset *attnums_covered = NULL;
+ Bitmapset *attnums_conditions = NULL;
+
+ /* skip stats that are already ruled out (either used or inapplicable) */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* count covered clauses and conditions (for the statistics) */
+ for (j = 0; j < nclauses; j++)
+ {
+ if (cover_map[i * nclauses + j])
+ {
+ Bitmapset *attnums_new
+ = bms_union(attnums_covered, clauses_attnums[j]);
+
+ /* get rid of the old bitmap and keep the unified result */
+ bms_free(attnums_covered);
+ attnums_covered = attnums_new;
+
+ num_cov_clauses[i] += 1;
+ num_cov_columns[i] += attnum_counts[j];
+
+ /* is the clause already covered (i.e. a condition)? */
+ if (covered_clauses[j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_counts[j];
+ attnums_new = bms_union(attnums_conditions,
+ clauses_attnums[j]);
+
+ bms_free(attnums_conditions);
+ attnums_conditions = attnums_new;
+ }
+ }
+ }
+
+ /* if all covered clauses are covered by prev stats (thus conditions) */
+ if (num_cov_clauses[i] == num_cond_clauses[i])
+ ruled_out[i] = step;
+
+ /* same if there are no new attributes */
+ else if (bms_num_members(attnums_conditions) == bms_num_members(attnums_covered))
+ ruled_out[i] = step;
+
+ bms_free(attnums_covered);
+ bms_free(attnums_conditions);
+
+ /* if the statistics is inapplicable, try the next one */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* now let's walk through conditions and count the covered ones */
+ for (j = 0; j < nconditions; j++)
+ {
+ if (condition_map[i * nconditions + j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_cond_counts[j];
+ }
+ }
+
+ /* otherwise see if this improves the interesting metrics */
+ gain = num_cond_columns[i] / (double)num_cov_columns[i];
+
+ if (gain > max_gain)
+ {
+ max_gain = gain;
+ best_stat = i;
+ }
+ }
+
+ /*
+ * Have we found a suitable statistics? Add it to the solution and
+ * try next step.
+ */
+ if (best_stat != -1)
+ {
+ /* mark the statistics, so that we skip it in next steps */
+ ruled_out[best_stat] = step;
+
+ /* allocate current solution if necessary */
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ current->nclauses += num_cov_clauses[best_stat];
+ current->nconditions += num_cond_clauses[best_stat];
+ current->stats[step] = best_stat;
+ current->nstats++;
+
+ if (*best == NULL)
+ {
+ (*best) = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ else
+ {
+ /* see if this is a better solution */
+ double current_gain = (double)current->nconditions / current->nclauses;
+ double best_gain = (double)(*best)->nconditions / (*best)->nclauses;
+
+ if ((current_gain > best_gain) ||
+ ((current_gain == best_gain) && (current->nstats < (*best)->nstats)))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ }
+
+ /*
+ * The recursion only makes sense if there are still unused
+ * statistics left (otherwise there's nothing more to apply).
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_greedy(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= num_cov_clauses[best_stat];
+ current->nconditions -= num_cond_clauses[best_stat];
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[best_stat] = -1;
+ }
+
+ /* reset all statistics eliminated in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+ /* free everything allocated in this step */
+ pfree(covered_clauses);
+ pfree(attnum_counts);
+ pfree(attnum_cond_counts);
+ pfree(num_cov_clauses);
+ pfree(num_cov_columns);
+ pfree(num_cond_clauses);
+ pfree(num_cond_columns);
+}
+
+/*
+ * Remove clauses not covered by any of the available statistics
+ *
+ * This helps us to reduce the amount of work done in choose_mv_statistics()
+ * by not having to deal with clauses that can't possibly be useful.
+ */
+static List *
+filter_clauses(PlannerInfo *root, Index relid, int type,
+ List *stats, List *clauses, Bitmapset **attnums)
+{
+ ListCell *c;
+ ListCell *s;
+
+ /* results (list of compatible clauses, attnums) */
+ List *rclauses = NIL;
+
+ foreach (c, clauses)
+ {
+ Node *clause = (Node*)lfirst(c);
+ Bitmapset *clause_attnums = NULL;
+
+ /*
+ * We do assume that thanks to previous checks, we should not run into
+ * clauses that are incompatible with multivariate stats here. We also
+ * need to collect the attnums for the clause.
+ *
+ * XXX Maybe turn this into an assert?
+ */
+ if (! clause_is_mv_compatible(clause, relid, &clause_attnums, type))
+ elog(ERROR, "should not get non-mv-compatible cluase");
+
+ /* Is there a multivariate statistics covering the clause? */
+ foreach (s, stats)
+ {
+ int k, matches = 0;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ /* skip statistics not matching the required type */
+ if (! stats_type_matches(stat, type))
+ continue;
+
+ /*
+ * see if all clause attributes are covered by the statistic
+ *
+ * We'll do that in the opposite direction, i.e. we'll see how many
+ * attributes of the statistic are referenced in the clause, and then
+ * compare the counts.
+ */
+ for (k = 0; k < stat->stakeys->dim1; k++)
+ if (bms_is_member(stat->stakeys->values[k], clause_attnums))
+ matches += 1;
+
+ /*
+ * If the number of matches is equal to attributes referenced by the
+ * clause, then the clause is covered by the statistic.
+ */
+ if (bms_num_members(clause_attnums) == matches)
+ {
+ *attnums = bms_union(*attnums, clause_attnums);
+ rclauses = lappend(rclauses, clause);
+ break;
+ }
+ }
+
+ bms_free(clause_attnums);
+ }
+
+ /* we can't have more compatible conditions than source conditions */
+ Assert(list_length(clauses) >= list_length(rclauses));
+
+ return rclauses;
+}
+
+/*
+ * Remove statistics not covering any new clauses
+ *
+ * Statistics not covering any new clauses (conditions don't count) are not
+ * really useful, so let's ignore them. Also, we need the statistics to
+ * reference at least two different attributes (both in conditions and clauses
+ * combined), and at least one of them in the clauses alone.
+ *
+ * This check might be made more strict by checking against individual clauses,
+ * because by using the bitmapsets of all attnums we may actually use attnums
+ * from clauses that are not covered by the statistics. For example, we may
+ * have a condition
+ *
+ * (a=1 AND b=2)
+ *
+ * and a new clause
+ *
+ * (c=1 AND d=1)
+ *
+ * With only bitmapsets, statistics on [b,c] will pass through this (assuming
+ * there are some statistics covering both clauses).
+ *
+ * Parameters:
+ *
+ * stats - list of statistics to filter
+ * new_attnums - attnums referenced in new clauses
+ * all_attnums - attnums referenced by conditions and new clauses combined
+ *
+ * Returns filtered list of statistics.
+ *
+ * TODO Do the more strict check, i.e. walk through individual clauses and
+ * conditions and only use those covered by the statistics.
+ */
+static List *
+filter_stats(List *stats, Bitmapset *new_attnums, Bitmapset *all_attnums)
+{
+ ListCell *s;
+ List *stats_filtered = NIL;
+
+ foreach (s, stats)
+ {
+ int k;
+ int matches_new = 0,
+ matches_all = 0;
+
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ /* see how many attributes the statistics covers */
+ for (k = 0; k < stat->stakeys->dim1; k++)
+ {
+ /* attributes from new clauses */
+ if (bms_is_member(stat->stakeys->values[k], new_attnums))
+ matches_new += 1;
+
+ /* attributes from conditions and new clauses combined */
+ if (bms_is_member(stat->stakeys->values[k], all_attnums))
+ matches_all += 1;
+ }
+
+ /* check we have enough attributes for this statistics */
+ if ((matches_new >= 1) && (matches_all >= 2))
+ stats_filtered = lappend(stats_filtered, stat);
+ }
+
+ /* we can't have more useful stats than we had originally */
+ Assert(list_length(stats) >= list_length(stats_filtered));
+
+ return stats_filtered;
+}
+
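+/*
+ * Convert a list of MVStatisticInfo nodes into a plain array (setting
+ * *nmvstats to the number of elements), which makes the processing in
+ * the optimization easier.
+ */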
+static MVStatisticInfo *
+make_stats_array(List *stats, int *nmvstats)
+{
+ int i;
+ ListCell *l;
+
+ MVStatisticInfo *mvstats = NULL;
+ *nmvstats = list_length(stats);
- /* Evaluate the MCV first. */
- s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
- &fullmatch, &mcv_low);
+ mvstats
+ = (MVStatisticInfo*)palloc0((*nmvstats) * sizeof(MVStatisticInfo));
- /*
- * If we got a full equality match on the MCV list, we're done (and
- * the estimate is pretty good).
- */
- if (fullmatch && (s1 > 0.0))
- return s1;
+ i = 0;
+ foreach (l, stats)
+ {
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(l);
+ memcpy(&mvstats[i++], stat, sizeof(MVStatisticInfo));
+ }
- /* TODO if (fullmatch) without matching MCV item, use the mcv_low
- * selectivity as upper bound */
+ return mvstats;
+}
- s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
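+/*
+ * Build a bitmapset of attnums for each statistics, so that we can
+ * cheaply check (using bms_is_subset) which clauses each statistics
+ * covers.
+ */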
+static Bitmapset **
+make_stats_attnums(MVStatisticInfo *mvstats, int nmvstats)
+{
+ int i, j;
+ Bitmapset **stats_attnums = NULL;
- /* TODO clamp to <= 1.0 (or more strictly, when possible) */
- return s1 + s2;
+ Assert(nmvstats > 0);
+
+ /* build bitmaps of attnums for the stats (easier to compare) */
+ stats_attnums = (Bitmapset **)palloc0(nmvstats * sizeof(Bitmapset*));
+
+ for (i = 0; i < nmvstats; i++)
+ for (j = 0; j < mvstats[i].stakeys->dim1; j++)
+ stats_attnums[i]
+ = bms_add_member(stats_attnums[i],
+ mvstats[i].stakeys->values[j]);
+
+ return stats_attnums;
}
+
/*
- * Collect attributes from mv-compatible clauses.
+ * Remove redundant statistics
+ *
+ * If there are multiple statistics covering the same set of columns (counting
+ * only those referenced by clauses and conditions), we only need to apply one
+ * of them, which further reduces the size of the optimization problem.
+ *
+ * Thus when redundant stats are detected, we keep the smaller one (the one with
+ * fewer columns), based on the assumption that it's more accurate and also
+ * faster to process. That may be untrue for two reasons - first, the accuracy
+ * really depends on number of buckets/MCV items, not the number of columns.
+ * Second, some types of statistics may work better for certain types of clauses
+ * (e.g. MCV lists for equality conditions) etc.
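+ *
+ * For example (a sketch), with statistics on (a,b) and (a,b,c), and clauses
+ * referencing only columns a and b, both statistics cover the same set of
+ * referenced attributes, so the (a,b,c) statistics is dropped as redundant
+ * and the smaller (a,b) one is kept.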
*/
-static Bitmapset *
-collect_mv_attnums(List *clauses, Index relid, int types)
+static List*
+filter_redundant_stats(List *stats, List *clauses, List *conditions)
{
- Bitmapset *attnums = NULL;
- ListCell *l;
+ int i, j, nmvstats;
+
+ MVStatisticInfo *mvstats;
+ bool *redundant;
+ Bitmapset **stats_attnums;
+ Bitmapset *varattnos;
+ Index relid;
+
+ Assert(list_length(stats) > 0);
+ Assert(list_length(clauses) > 0);
+
+ /*
+ * We'll convert the list of statistics into an array now, because
+ * the reduction of redundant statistics is easier to do that way
+ * (we can mark previous stats as redundant, etc.).
+ */
+ mvstats = make_stats_array(stats, &nmvstats);
+ stats_attnums = make_stats_attnums(mvstats, nmvstats);
+
+ /* by default, none of the stats is redundant (so palloc0) */
+ redundant = palloc0(nmvstats * sizeof(bool));
+
+ /*
+ * We only expect a single relid here, and also we should get the
+ * same relid from clauses and conditions (but we get it from
+ * clauses, because those are certainly non-empty).
+ */
+ relid = bms_singleton_member(pull_varnos((Node*)clauses));
/*
- * Walk through the clauses and identify the ones we can estimate using
- * multivariate stats, and remember the relid/columns. We'll then
- * cross-check if we have suitable stats, and only if needed we'll split
- * the clauses into multivariate and regular lists.
+ * Get the varattnos from both conditions and clauses.
*
- * For now we're only interested in RestrictInfo nodes with nested OpExpr,
- * using either a range or equality.
+ * This skips system attributes, although that should be impossible
+ * thanks to previous filtering out of incompatible clauses.
+ *
+ * XXX Is that really true?
*/
- foreach (l, clauses)
+ varattnos = bms_union(get_varattnos((Node*)clauses, relid),
+ get_varattnos((Node*)conditions, relid));
+
+ for (i = 1; i < nmvstats; i++)
{
- Node *clause = (Node *) lfirst(l);
+ /* intersect with current statistics */
+ Bitmapset *curr = bms_intersect(stats_attnums[i], varattnos);
- /* ignore the result here - we only need the attnums */
- clause_is_mv_compatible(clause, relid, &attnums, types);
+ /* walk through 'previous' stats and check redundancy */
+ for (j = 0; j < i; j++)
+ {
+ /* intersect with the previous statistics */
+ Bitmapset *prev;
+
+ /* skip stats already identified as redundant */
+ if (redundant[j])
+ continue;
+
+ prev = bms_intersect(stats_attnums[j], varattnos);
+
+ switch (bms_subset_compare(curr, prev))
+ {
+ case BMS_EQUAL:
+ /*
+ * Use the smaller one (hopefully more accurate).
+ * If both have the same size, use the first one.
+ */
+ if (mvstats[i].stakeys->dim1 >= mvstats[j].stakeys->dim1)
+ redundant[i] = TRUE;
+ else
+ redundant[j] = TRUE;
+
+ break;
+
+ case BMS_SUBSET1: /* curr is subset of prev */
+ redundant[i] = TRUE;
+ break;
+
+ case BMS_SUBSET2: /* prev is subset of curr */
+ redundant[j] = TRUE;
+ break;
+
+ case BMS_DIFFERENT:
+ /* do nothing - keep both stats */
+ break;
+ }
+
+ bms_free(prev);
+ }
+
+ bms_free(curr);
}
- /*
- * If there are not at least two attributes referenced by the clause(s),
- * we can throw everything out (as we'll revert to simple stats).
- */
- if (bms_num_members(attnums) <= 1)
+ /* can't reduce all statistics (at least one has to remain) */
+ Assert(nmvstats > 0);
+
+ /* now, let's remove the redundant statistics from the arrays */
+ list_free(stats);
+ stats = NIL;
+
+ for (i = 0; i < nmvstats; i++)
{
- if (attnums != NULL)
- pfree(attnums);
- attnums = NULL;
+ MVStatisticInfo *info;
+
+ pfree(stats_attnums[i]);
+
+ if (redundant[i])
+ continue;
+
+ info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[i], sizeof(MVStatisticInfo));
+
+ stats = lappend(stats, info);
}
- return attnums;
+ pfree(mvstats);
+ pfree(stats_attnums);
+ pfree(redundant);
+
+ return stats;
}
-/*
- * Count the number of attributes in clauses compatible with multivariate stats.
- */
-static int
-count_mv_attnums(List *clauses, Index relid, int type)
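+/*
+ * Convert a list of clauses into a plain array (setting *nclauses to
+ * the number of elements).
+ */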
+static Node**
+make_clauses_array(List *clauses, int *nclauses)
{
- int c;
- Bitmapset *attnums = collect_mv_attnums(clauses, relid, type);
+ int i;
+ ListCell *l;
- c = bms_num_members(attnums);
+ Node** clauses_array;
- bms_free(attnums);
+ *nclauses = list_length(clauses);
+ clauses_array = (Node **)palloc0((*nclauses) * sizeof(Node *));
- return c;
+ i = 0;
+ foreach (l, clauses)
+ clauses_array[i++] = (Node *)lfirst(l);
+
+ *nclauses = i;
+
+ return clauses_array;
}
-/*
- * Count varnos referenced in the clauses, and if there's a single varno then
- * return the index in 'relid'.
- */
-static int
-count_varnos(List *clauses, Index *relid)
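+/*
+ * Build a bitmapset of attnums referenced by each clause. Only clauses
+ * compatible with multivariate stats are expected here (we error out
+ * otherwise).
+ */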
+static Bitmapset **
+make_clauses_attnums(PlannerInfo *root, Index relid,
+ int type, Node **clauses, int nclauses)
{
- int cnt;
- Bitmapset *varnos = NULL;
+ int i;
+ Bitmapset **clauses_attnums
+ = (Bitmapset **)palloc0(nclauses * sizeof(Bitmapset *));
- varnos = pull_varnos((Node *) clauses);
- cnt = bms_num_members(varnos);
+ for (i = 0; i < nclauses; i++)
+ {
+ Bitmapset * attnums = NULL;
- /* if there's a single varno in the clauses, remember it */
- if (bms_num_members(varnos) == 1)
- *relid = bms_singleton_member(varnos);
+ if (! clause_is_mv_compatible(clauses[i], relid, &attnums, type))
+ elog(ERROR, "should not get non-mv-compatible clause");
- bms_free(varnos);
+ clauses_attnums[i] = attnums;
+ }
- return cnt;
+ return clauses_attnums;
}
-
+
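+/*
+ * Build a map of which clauses are covered by which statistics, i.e.
+ * cover_map[i * nclauses + j] is true iff clause j references only
+ * attributes covered by statistics i.
+ */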
+static bool*
+make_cover_map(Bitmapset **stats_attnums, int nmvstats,
+ Bitmapset **clauses_attnums, int nclauses)
+{
+ int i, j;
+ bool *cover_map = (bool*)palloc0(nclauses * nmvstats);
+
+ for (i = 0; i < nmvstats; i++)
+ for (j = 0; j < nclauses; j++)
+ cover_map[i * nclauses + j]
+ = bms_is_subset(clauses_attnums[j], stats_attnums[i]);
+
+ return cover_map;
+}
+
/*
- * We're looking for statistics matching at least 2 attributes, referenced in
- * clauses compatible with multivariate statistics. The current selection
- * criteria is very simple - we choose the statistics referencing the most
- * attributes.
- *
- * If there are multiple statistics referencing the same number of columns
- * (from the clauses), the one with less source columns (as listed in the
- * ADD STATISTICS when creating the statistics) wins. Else the first one wins.
- *
- * This is a very simple criteria, and has several weaknesses:
- *
- * (a) does not consider the accuracy of the statistics
- *
- * If there are two histograms built on the same set of columns, but one
- * has 100 buckets and the other one has 1000 buckets (thus likely
- * providing better estimates), this is not currently considered.
- *
- * (b) does not consider the type of statistics
- *
- * If there are three statistics - one containing just a MCV list, another
- * one with just a histogram and a third one with both, we treat them equally.
+ * Chooses the combination of statistics optimal for estimating a particular
+ * clause list.
*
- * (c) does not consider the number of clauses
+ * This only handles the 'preparation' phase shared by the exhaustive and greedy
+ * implementations (see the preceding functions), mostly trying to reduce the
+ * size of the problem (eliminating clauses/statistics that can't really be
+ * used in the solution).
*
- * As explained, only the number of referenced attributes counts, so if
- * there are multiple clauses on a single attribute, this still counts as
- * a single attribute.
+ * It also precomputes bitmaps for attributes covered by clauses and statistics,
+ * so that we don't need to do that over and over in the actual optimizations
+ * (as it's both CPU and memory intensive).
*
- * (d) does not consider type of condition
*
- * Some clauses may work better with some statistics - for example equality
- * clauses probably work better with MCV lists than with histograms. But
- * IS [NOT] NULL conditions may often work better with histograms (thanks
- * to NULL-buckets).
+ * TODO Another way to make the optimization problems smaller might be splitting
+ * the statistics into several disjoint subsets, i.e. if we can split the
+ * graph of statistics (after the elimination) into multiple components
+ * (so that stats in different components share no attributes), we can do
+ * the optimization for each component separately.
*
- * So for example with five WHERE conditions
- *
- * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
- *
- * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be selected
- * as it references the most columns.
- *
- * Once we have selected the multivariate statistics, we split the list of
- * clauses into two parts - conditions that are compatible with the selected
- * stats, and conditions are estimated using simple statistics.
- *
- * From the example above, conditions
- *
- * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
- *
- * will be estimated using the multivariate statistics (a,b,c,d) while the last
- * condition (e = 1) will get estimated using the regular ones.
- *
- * There are various alternative selection criteria (e.g. counting conditions
- * instead of just referenced attributes), but eventually the best option should
- * be to combine multiple statistics. But that's much harder to do correctly.
- *
- * TODO Select multiple statistics and combine them when computing the estimate.
- *
- * TODO This will probably have to consider compatibility of clauses, because
- * 'dependencies' will probably work only with equality clauses.
+ * TODO If we could compute what is a "perfect solution" maybe we could
+ * terminate the search after reaching ~90% of it? Say, if we knew that we
+ * can cover 10 clauses and reuse 8 dependencies, maybe covering 9 clauses
+ * and 7 dependencies would be OK?
*/
-static MVStatisticInfo *
-choose_mv_statistics(List *stats, Bitmapset *attnums)
+static List*
+choose_mv_statistics(PlannerInfo *root, Index relid, List *stats,
+ List *clauses, List *conditions)
{
int i;
- ListCell *lc;
+ mv_solution_t *best = NULL;
+ List *result = NIL;
+
+ int nmvstats;
+ MVStatisticInfo *mvstats;
+
+ /* we only work with MCV lists and histograms here */
+ int type = (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+
+ bool *clause_cover_map = NULL,
+ *condition_cover_map = NULL;
+ int *ruled_out = NULL;
+
+ /* build bitmapsets for all stats and clauses */
+ Bitmapset **stats_attnums;
+ Bitmapset **clauses_attnums;
+ Bitmapset **conditions_attnums;
- MVStatisticInfo *choice = NULL;
+ int nclauses, nconditions;
+ Node ** clauses_array;
+ Node ** conditions_array;
- int current_matches = 1; /* goal #1: maximize */
- int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+ /* copy lists, so that we can free them during elimination easily */
+ clauses = list_copy(clauses);
+ conditions = list_copy(conditions);
+ stats = list_copy(stats);
/*
- * Walk through the statistics (simple array with nmvstats elements) and for
- * each one count the referenced attributes (encoded in the 'attnums' bitmap).
+ * Reduce the optimization problem size as much as possible.
+ *
+ * Eliminate clauses and conditions not covered by any statistics,
+ * or statistics not matching at least two attributes (one of them
+ * has to be in a regular clause).
+ *
+ * It's possible that removing a statistics in one iteration
+ * eliminates a clause in the next one, so we'll repeat this until
+ * an iteration eliminates no clauses/stats.
+ *
+ * This can only happen after eliminating a statistics - clauses are
+ * eliminated first, so statistics always reflect that.
*/
- foreach (lc, stats)
+ while (true)
{
- MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
-
- /* columns matching this statistics */
- int matches = 0;
+ List *tmp;
- int2vector * attrs = info->stakeys;
- int numattrs = attrs->dim1;
+ Bitmapset *compatible_attnums = NULL;
+ Bitmapset *condition_attnums = NULL;
+ Bitmapset *all_attnums = NULL;
- /* skip dependencies-only stats */
- if (! (info->mcv_built || info->hist_built))
- continue;
+ /*
+ * Clauses
+ *
+ * Walk through clauses and keep only those covered by at least
+ * one of the statistics we still have. We'll also keep info
+ * about attnums in clauses (without conditions) so that we can
+ * ignore stats covering just conditions (which is pointless).
+ */
+ tmp = filter_clauses(root, relid, type,
+ stats, clauses, &compatible_attnums);
- /* count columns covered by the histogram */
- for (i = 0; i < numattrs; i++)
- if (bms_is_member(attrs->values[i], attnums))
- matches++;
+ /* discard the original list */
+ list_free(clauses);
+ clauses = tmp;
/*
- * Use this statistics when it improves the number of matches or
- * when it matches the same number of attributes but is smaller.
+ * Conditions
+ *
+ * Walk through conditions and keep only those covered by at least
+ * one of the statistics we still have. Also, collect bitmap of
+ * attributes so that we can make sure we add at least one new
+ * attribute (by comparing with clauses).
*/
- if ((matches > current_matches) ||
- ((matches == current_matches) && (current_dims > numattrs)))
+ if (conditions != NIL)
{
- choice = info;
- current_matches = matches;
- current_dims = numattrs;
+ tmp = filter_clauses(root, relid, type,
+ stats, conditions, &condition_attnums);
+
+ /* discard the original list */
+ list_free(conditions);
+ conditions = tmp;
}
- }
- return choice;
-}
+ /* get a union of attnums (from conditions and new clauses) */
+ all_attnums = bms_union(compatible_attnums, condition_attnums);
+
+ /*
+ * Statistics
+ *
+ * Walk through the statistics and only keep those covering at least
+ * one new attribute (excluding conditions) and at least two attributes
+ * in clauses and conditions combined.
+ */
+ tmp = filter_stats(stats, compatible_attnums, all_attnums);
+ /* if we've not eliminated anything, terminate */
+ if (list_length(stats) == list_length(tmp))
+ break;
-/*
- * This splits the clauses list into two parts - one containing clauses that
- * will be evaluated using the chosen statistics, and the remaining clauses
- * (either non-mvcompatible, or not related to the histogram).
- */
-static List *
-clauselist_mv_split(PlannerInfo *root, Index relid,
- List *clauses, List **mvclauses,
- MVStatisticInfo *mvstats, int types)
-{
- int i;
- ListCell *l;
- List *non_mvclauses = NIL;
+ /* work only with filtered statistics from now */
+ list_free(stats);
+ stats = tmp;
+ }
- /* FIXME is there a better way to get info on int2vector? */
- int2vector * attrs = mvstats->stakeys;
- int numattrs = mvstats->stakeys->dim1;
+ /* only do the optimization if we have clauses/statistics */
+ if ((list_length(stats) == 0) || (list_length(clauses) == 0))
+ return NIL;
- Bitmapset *mvattnums = NULL;
+ /* remove redundant stats (stats covered by another stats) */
+ stats = filter_redundant_stats(stats, clauses, conditions);
- /* build bitmap of attributes, so we can do bms_is_subset later */
- for (i = 0; i < numattrs; i++)
- mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+ /*
+ * TODO We should sort the stats to make the order deterministic,
+ * otherwise we may get different estimates on different
+ * executions - if there are multiple "equally good" solutions,
+ * we'll keep the first solution we see.
+ *
+ * Sorting by OID probably is not the right solution though,
+ * because we'd like it to be reproducible, irrespective of
+ * the order of ADD STATISTICS commands. So maybe statkeys?
+ */
+ mvstats = make_stats_array(stats, &nmvstats);
+ stats_attnums = make_stats_attnums(mvstats, nmvstats);
- /* erase the list of mv-compatible clauses */
- *mvclauses = NIL;
+ /* collect clauses and a bitmap of attnums */
+ clauses_array = make_clauses_array(clauses, &nclauses);
+ clauses_attnums = make_clauses_attnums(root, relid, type,
+ clauses_array, nclauses);
- foreach (l, clauses)
- {
- bool match = false; /* by default not mv-compatible */
- Bitmapset *attnums = NULL;
- Node *clause = (Node *) lfirst(l);
+ /* collect conditions and a bitmap of attnums */
+ conditions_array = make_clauses_array(conditions, &nconditions);
+ conditions_attnums = make_clauses_attnums(root, relid, type,
+ conditions_array, nconditions);
- if (clause_is_mv_compatible(clause, relid, &attnums, types))
+ /*
+ * Build bitmaps with info about which clauses/conditions are
+ * covered by each statistics (so that we don't need to call the
+ * bms_is_subset over and over again).
+ */
+ clause_cover_map = make_cover_map(stats_attnums, nmvstats,
+ clauses_attnums, nclauses);
+
+ condition_cover_map = make_cover_map(stats_attnums, nmvstats,
+ conditions_attnums, nconditions);
+
+ ruled_out = (int*)palloc0(nmvstats * sizeof(int));
+
+ /* no stats are ruled out by default */
+ for (i = 0; i < nmvstats; i++)
+ ruled_out[i] = -1;
+
+ /* do the optimization itself */
+ if (mvstat_search_type == MVSTAT_SEARCH_EXHAUSTIVE)
+ choose_mv_statistics_exhaustive(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+ else
+ choose_mv_statistics_greedy(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+
+ /* create a list of statistics from the array */
+ if (best != NULL)
+ {
+ for (i = 0; i < best->nstats; i++)
{
- /* are all the attributes part of the selected stats? */
- if (bms_is_subset(attnums, mvattnums))
- match = true;
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[best->stats[i]], sizeof(MVStatisticInfo));
+ result = lappend(result, info);
}
- /*
- * The clause matches the selected stats, so put it to the list of
- * mv-compatible clauses. Otherwise, keep it in the list of 'regular'
- * clauses (that may be selected later).
- */
- if (match)
- *mvclauses = lappend(*mvclauses, clause);
- else
- non_mvclauses = lappend(non_mvclauses, clause);
+ pfree(best);
}
- /*
- * Perform regular estimation using the clauses incompatible with the chosen
- * histogram (or MV stats in general).
- */
- return non_mvclauses;
+ /* cleanup (maybe leave it up to the memory context?) */
+ for (i = 0; i < nmvstats; i++)
+ bms_free(stats_attnums[i]);
+ for (i = 0; i < nclauses; i++)
+ bms_free(clauses_attnums[i]);
+
+ for (i = 0; i < nconditions; i++)
+ bms_free(conditions_attnums[i]);
+
+ pfree(stats_attnums);
+ pfree(clauses_attnums);
+ pfree(conditions_attnums);
+
+ pfree(clauses_array);
+ pfree(conditions_array);
+ pfree(clause_cover_map);
+ pfree(condition_cover_map);
+ pfree(ruled_out);
+ pfree(mvstats);
+
+ list_free(clauses);
+ list_free(conditions);
+ list_free(stats);
+
+ return result;
}
typedef struct
@@ -1637,9 +2878,6 @@ has_stats(List *stats, int type)
/* terminate if we've found at least one matching statistics */
if (stats_type_matches(stat, type))
return true;
-
- if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
- return true;
}
return false;
@@ -1689,22 +2927,26 @@ find_stats(PlannerInfo *root, Index relid)
* as the clauses are processed (and skip items that are 'match').
*/
static Selectivity
-clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats, bool *fullmatch,
- Selectivity *lowsel)
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or,
+ bool *fullmatch, Selectivity *lowsel)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
MCVList mcvlist = NULL;
+
int nmatches = 0;
+ int nconditions = 0;
/* match/mismatch bitmap for each MCV item */
char * matches = NULL;
+ char * condition_matches = NULL;
Assert(clauses != NIL);
- Assert(list_length(clauses) >= 2);
+ Assert(list_length(clauses) >= 1);
/* there's no MCV list built yet */
if (! mvstats->mcv_built)
@@ -1715,32 +2957,85 @@ clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
Assert(mcvlist != NULL);
Assert(mcvlist->nitems > 0);
- /* by default all the MCV items match the clauses fully */
- matches = palloc0(sizeof(char) * mcvlist->nitems);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
-
/* number of matching MCV items */
nmatches = mcvlist->nitems;
+ nconditions = mcvlist->nitems;
+
+ /*
+ * Bitmap of MCV item matches (mismatch, partial, full).
+ *
+ * For AND clauses all items match initially (and we'll eliminate them).
+ * For OR clauses no items match initially (and we'll add them).
+ *
+ * We only need to do the memset for AND clauses (for OR clauses
+ * it's already set correctly by the palloc0).
+ */
+ matches = palloc0(sizeof(char) * nmatches);
+
+ if (! is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+ /* Conditions are treated as an AND clause, so all items match by default. */
+ condition_matches = palloc0(sizeof(char) * nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+
+ /*
+ * build the match bitmap for the conditions (conditions are always
+ * connected by AND)
+ */
+ if (conditions != NIL)
+ nconditions = update_match_bitmap_mcvlist(root, conditions,
+ mvstats->stakeys, mcvlist,
+ nconditions, condition_matches,
+ lowsel, fullmatch, false);
+
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all MCV items, even those
+ * ruled out by the conditions. The final result would be the
+ * same, but skipping them might be faster.
+ */
nmatches = update_match_bitmap_mcvlist(root, clauses,
mvstats->stakeys, mcvlist,
- nmatches, matches,
- lowsel, fullmatch, false);
+ ((is_or) ? 0 : nmatches), matches,
+ lowsel, fullmatch, is_or);
/* sum frequencies for all the matching MCV items */
for (i = 0; i < mcvlist->nitems; i++)
{
- /* used to 'scale' for MCV lists not covering all tuples */
+ /*
+ * Find out what part of the data is covered by the MCV list,
+ * so that we can 'scale' the selectivity properly (e.g. when
+ * only 50% of the sample items got into the MCV, and the rest
+ * is either in a histogram, or not covered by stats).
+ *
+ * TODO This might be handled by keeping a global "frequency"
+ * for the whole list, which might save us a bit of time
+ * spent on accessing the not-matching part of the MCV list.
+ * Although it's likely in a cache, so it's very fast.
+ */
u += mcvlist->items[i]->frequency;
+ /* skip MCV items not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
if (matches[i] != MVSTATS_MATCH_NONE)
s += mcvlist->items[i]->frequency;
+
+ t += mcvlist->items[i]->frequency;
}
pfree(matches);
+ pfree(condition_matches);
pfree(mcvlist);
- return s*u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
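+ /*
+  * (s / t) is the conditional probability P(clauses | conditions) within
+  * the MCV list, and u scales that to the fraction of rows the MCV list
+  * actually covers.
+  */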
+ return (s / t) * u;
}
/*
@@ -1971,64 +3266,57 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
}
}
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/* AND/OR clause, with all clauses compatible with the selected MV stat */
int i;
- BoolExpr *orclause = ((BoolExpr*)clause);
- List *orclauses = orclause->args;
+ List *tmp_clauses = ((BoolExpr*)clause)->args;
/* match/mismatch bitmap for each MCV item */
- int or_nmatches = 0;
- char * or_matches = NULL;
+ int tmp_nmatches = 0;
+ char * tmp_matches = NULL;
- Assert(orclauses != NIL);
- Assert(list_length(orclauses) >= 2);
+ Assert(tmp_clauses != NIL);
+ Assert((list_length(tmp_clauses) >= 2) || (not_clause(clause) && (list_length(tmp_clauses)==1)));
/* number of matching MCV items */
- or_nmatches = mcvlist->nitems;
+ tmp_nmatches = (or_clause(clause)) ? 0 : mcvlist->nitems;
/* by default none of the MCV items matches the clauses */
- or_matches = palloc0(sizeof(char) * or_nmatches);
+ tmp_matches = palloc0(sizeof(char) * mcvlist->nitems);
- if (or_clause(clause))
- {
- /* OR clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
- or_nmatches = 0;
- }
- else
- {
- /* AND clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
- }
+ /* AND (and NOT) clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
/* build the match bitmap for the OR-clauses */
- or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ tmp_nmatches = update_match_bitmap_mcvlist(root, tmp_clauses,
stakeys, mcvlist,
- or_nmatches, or_matches,
+ tmp_nmatches, tmp_matches,
lowsel, fullmatch, or_clause(clause));
/* merge the bitmap into the existing one*/
for (i = 0; i < mcvlist->nitems; i++)
{
+ /* if this is a NOT clause, we need to invert the results first */
+ if (not_clause(clause))
+ tmp_matches[i] = (MVSTATS_MATCH_FULL - tmp_matches[i]);
+
/*
* To AND-merge the bitmaps, a MIN() semantics is used.
* For OR-merge, use MAX().
*
* FIXME this does not decrease the number of matches
*/
- UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
}
- pfree(or_matches);
+ pfree(tmp_matches);
}
else
- {
elog(ERROR, "unknown clause type: %d", clause->type);
- }
}
/*
@@ -2086,15 +3374,18 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
* this is not uncommon, but for histograms it's not that clear.
*/
static Selectivity
-clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats)
+clauselist_mv_selectivity_histogram(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
int nmatches = 0;
+ int nconditions = 0;
char *matches = NULL;
+ char *condition_matches = NULL;
MVSerializedHistogram mvhist = NULL;
@@ -2107,25 +3398,55 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
Assert (mvhist != NULL);
Assert (clauses != NIL);
- Assert (list_length(clauses) >= 2);
+ Assert (list_length(clauses) >= 1);
+
+ nmatches = mvhist->nbuckets;
+ nconditions = mvhist->nbuckets;
/*
- * Bitmap of bucket matches (mismatch, partial, full). by default
- * all buckets fully match (and we'll eliminate them).
+ * Bitmap of bucket matches (mismatch, partial, full).
+ *
+ * For AND clauses all buckets match (and we'll eliminate them).
+ * For OR clauses no buckets match (and we'll add them).
+ *
+ * We only need to do the memset for AND clauses (for OR clauses
+ * it's already set correctly by the palloc0).
*/
- matches = palloc0(sizeof(char) * mvhist->nbuckets);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+ matches = palloc0(sizeof(char) * nmatches);
- nmatches = mvhist->nbuckets;
+ if (! is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+
+ /* Conditions are treated as an AND clause, so all buckets match by default. */
+ condition_matches = palloc0(sizeof(char)*nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+
+ /*
+ * build the match bitmap for the conditions (conditions are always
+ * connected by AND)
+ */
+ if (conditions != NIL)
+ update_match_bitmap_histogram(root, conditions,
+ mvstats->stakeys, mvhist,
+ nconditions, condition_matches, false);
- /* build the match bitmap */
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all buckets, even those
+ * ruled out by the conditions. The final result would be
+ * the same, but skipping them might be faster.
+ */
update_match_bitmap_histogram(root, clauses,
mvstats->stakeys, mvhist,
- nmatches, matches, false);
+ ((is_or) ? 0 : nmatches), matches,
+ is_or);
/* now, walk through the buckets and sum the selectivities */
for (i = 0; i < mvhist->nbuckets; i++)
{
+ float coeff = 1.0;
+
/*
* Find out what part of the data is covered by the histogram,
* so that we can 'scale' the selectivity properly (e.g. when
@@ -2139,10 +3460,23 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
*/
u += mvhist->buckets[i]->ntuples;
+ /* skip buckets not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+ else if (condition_matches[i] == MVSTATS_MATCH_PARTIAL)
+ coeff = 0.5;
+
+ t += coeff * mvhist->buckets[i]->ntuples;
+
if (matches[i] == MVSTATS_MATCH_FULL)
- s += mvhist->buckets[i]->ntuples;
+ s += coeff * mvhist->buckets[i]->ntuples;
else if (matches[i] == MVSTATS_MATCH_PARTIAL)
- s += 0.5 * mvhist->buckets[i]->ntuples;
+ /*
+ * TODO If both conditions and clauses match partially, this
+ * will use a 0.25 match - not sure if that's the right
+ * solution, but it seems about right.
+ */
+ s += coeff * 0.5 * mvhist->buckets[i]->ntuples;
}
#ifdef DEBUG_MVHIST
@@ -2151,9 +3485,14 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
/* release the allocated bitmap and deserialized histogram */
pfree(matches);
+ pfree(condition_matches);
pfree(mvhist);
- return s * u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
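+ /*
+  * As for the MCV list: (s / t) is P(clauses | conditions) within the
+  * histogram, and u scales that to the fraction of rows the histogram
+  * covers.
+  */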
+ return (s / t) * u;
}
/* cached result of bucket boundary comparison for a single dimension */
@@ -2301,7 +3640,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
{
int i;
ListCell * l;
-
+
/*
* Used for caching function calls, only once per deduplicated value.
*
@@ -2344,7 +3683,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
FmgrInfo opproc; /* operator */
fmgr_info(get_opcode(expr->opno), &opproc);
-
+
/* reset the cache (per clause) */
memset(callcache, 0, mvhist->nbuckets);
@@ -2504,64 +3843,57 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
}
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/* AND/OR clause, with all clauses compatible with the selected MV stat */
int i;
- BoolExpr *orclause = ((BoolExpr*)clause);
- List *orclauses = orclause->args;
+ List *tmp_clauses = ((BoolExpr*)clause)->args;
/* match/mismatch bitmap for each bucket */
- int or_nmatches = 0;
- char * or_matches = NULL;
+ int tmp_nmatches = 0;
+ char * tmp_matches = NULL;
- Assert(orclauses != NIL);
- Assert(list_length(orclauses) >= 2);
+ Assert(tmp_clauses != NIL);
+ Assert((list_length(tmp_clauses) >= 2) || (not_clause(clause) && (list_length(tmp_clauses)==1)));
/* number of matching buckets */
- or_nmatches = mvhist->nbuckets;
+ tmp_nmatches = (or_clause(clause)) ? 0 : mvhist->nbuckets;
- /* by default none of the buckets matches the clauses */
- or_matches = palloc0(sizeof(char) * or_nmatches);
+ /* by default none of the buckets matches the clauses (OR clause) */
+ tmp_matches = palloc0(sizeof(char) * mvhist->nbuckets);
- if (or_clause(clause))
- {
- /* OR clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
- or_nmatches = 0;
- }
- else
- {
- /* AND clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
- }
+ /* but AND (and NOT) clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
/* build the match bitmap for the OR-clauses */
- or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ tmp_nmatches = update_match_bitmap_histogram(root, tmp_clauses,
stakeys, mvhist,
- or_nmatches, or_matches, or_clause(clause));
+ tmp_nmatches, tmp_matches, or_clause(clause));
/* merge the bitmap into the existing one*/
for (i = 0; i < mvhist->nbuckets; i++)
{
+ /* if this is a NOT clause, we need to invert the results first */
+ if (not_clause(clause))
+ tmp_matches[i] = (MVSTATS_MATCH_FULL - tmp_matches[i]);
+
/*
* To AND-merge the bitmaps, a MIN() semantics is used.
* For OR-merge, use MAX().
*
* FIXME this does not decrease the number of matches
*/
- UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
}
- pfree(or_matches);
-
+ pfree(tmp_matches);
}
else
elog(ERROR, "unknown clause type: %d", clause->type);
}
- /* free the call cache */
pfree(callcache);
return nmatches;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5350329..57214e0 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3518,7 +3518,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/*
* Also get the normal inner-join selectivity of the join clauses.
@@ -3541,7 +3542,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
JOIN_INNER,
- &norm_sjinfo);
+ &norm_sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
if (jointype == JOIN_ANTI)
@@ -3708,7 +3710,7 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
Node *qual = (Node *) lfirst(l);
/* Note that clause_selectivity will be able to cache its result */
- selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo);
+ selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo, NIL);
}
/* Apply it to the input relation sizes */
@@ -3744,7 +3746,8 @@ set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
rel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
rel->rows = clamp_row_est(nrows);
@@ -3781,7 +3784,8 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
allclauses,
rel->relid, /* do not use 0! */
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
/* For safety, make sure result is not more than the base estimate */
if (nrows > rel->rows)
@@ -3919,12 +3923,14 @@ calc_joinrel_size_estimate(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = clauselist_selectivity(root,
pushedquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
list_free(joinquals);
@@ -3936,7 +3942,8 @@ calc_joinrel_size_estimate(PlannerInfo *root,
restrictlist,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = 0.0; /* not used, keep compiler quiet */
}
diff --git a/src/backend/optimizer/util/orclauses.c b/src/backend/optimizer/util/orclauses.c
index ea831f5..6299e75 100644
--- a/src/backend/optimizer/util/orclauses.c
+++ b/src/backend/optimizer/util/orclauses.c
@@ -280,7 +280,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
* saving work later.)
*/
or_selec = clause_selectivity(root, (Node *) or_rinfo,
- 0, JOIN_INNER, NULL);
+ 0, JOIN_INNER, NULL, NIL);
/*
* The clause is only worth adding to the query if it rejects a useful
@@ -342,7 +342,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
/* Compute inner-join size */
orig_selec = clause_selectivity(root, (Node *) join_or_rinfo,
- 0, JOIN_INNER, &sjinfo);
+ 0, JOIN_INNER, &sjinfo, NIL);
/* And hack cached selectivity so join size remains the same */
join_or_rinfo->norm_selec = orig_selec / or_selec;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d396ef1..805d633 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1627,13 +1627,15 @@ booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg,
case IS_NOT_FALSE:
selec = (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
case IS_FALSE:
case IS_NOT_TRUE:
selec = 1.0 - (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
default:
elog(ERROR, "unrecognized booltesttype: %d",
@@ -6260,7 +6262,8 @@ genericcostestimate(PlannerInfo *root,
indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/*
* If caller didn't give us an estimate, estimate the number of index
@@ -6580,7 +6583,8 @@ btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
btreeSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
numIndexTuples = btreeSelectivity * index->rel->tuples;
/*
@@ -7331,7 +7335,8 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/* fetch estimated page cost for tablespace containing index */
get_tablespace_page_costs(index->reltablespace,
@@ -7561,7 +7566,7 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexSelectivity =
clauselist_selectivity(root, indexQuals,
path->indexinfo->rel->relid,
- JOIN_INNER, NULL);
+ JOIN_INNER, NULL, NIL);
*indexCorrelation = 1;
/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index edcafce..b7aabed 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -75,6 +75,7 @@
#include "utils/bytea.h"
#include "utils/guc_tables.h"
#include "utils/memutils.h"
+#include "utils/mvstats.h"
#include "utils/pg_locale.h"
#include "utils/plancache.h"
#include "utils/portal.h"
@@ -393,6 +394,15 @@ static const struct config_enum_entry force_parallel_mode_options[] = {
};
/*
+ * Search algorithm for multivariate stats.
+ */
+static const struct config_enum_entry mvstat_search_options[] = {
+ {"greedy", MVSTAT_SEARCH_GREEDY, false},
+ {"exhaustive", MVSTAT_SEARCH_EXHAUSTIVE, false},
+ {NULL, 0, false}
+};
+
+/*
* Options for enum values stored in other modules
*/
extern const struct config_enum_entry wal_level_options[];
@@ -3743,6 +3753,16 @@ static struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"mvstat_search", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Sets the algorithm used for combining multivariate stats."),
+ NULL
+ },
+ &mvstat_search_type,
+ MVSTAT_SEARCH_GREEDY, mvstat_search_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index 3e4f4d1..d404914 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -90,6 +90,137 @@ even attempting to do the more expensive estimation.
Whenever we find there are no suitable stats, we skip the expensive steps.
+Combining multiple statistics
+-----------------------------
+
+When estimating selectivity of a list of clauses, there may exist no statistics
+covering all of them. If there are multiple statistics, each covering some
+subset of the attributes, the optimizer needs to figure out which of those
+statistics to apply.
+
+When the statistics do not overlap, the solution is trivial - we can simply
+split the groups of conditions by the matching statistics, and then multiply the
+selectivities. For example assume multivariate statistics on (b,c) and (d,e),
+and a condition like this:
+
+ (a=1) AND (b=2) AND (c=3) AND (d=4) AND (e=5)
+
+Then (a=1) is not covered by any of the statistics, so it will be estimated
+using the regular per-column statistics. The two conditions ((b=2) AND (c=3))
+will be estimated using the (b,c) statistics, and ((d=4) AND (e=5)) using the
+(d,e) statistics. The resulting selectivities are then simply multiplied:
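+
+    P(a=1) * P(b=2 & c=3) * P(d=4 & e=5)
+
+where the first factor comes from the per-column statistics and the other two
+from the multivariate statistics.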
+
+Now, what if the statistics overlap? For example assume the same condition as
+above, but let's say we have statistics on (a,b,c) and (a,c,d,e). What then?
+
+As selectivity is just the probability that the condition holds for a random row,
+we can write the selectivity like this:
+
+ P(a=1 & b=2 & c=3 & d=4 & e=5)
+
+and we can rewrite it using conditional probability like this
+
+ P(a=1 & b=2 & c=3) * P(d=4 & e=5 | a=1 & b=2 & c=3)
+
+Notice that the first part already matches the (a,b,c) statistics. If we assume
+that columns that are not referenced by the same statistics are independent, we
+may rewrite the second half like this
+
+ P(d=4 & e=5 | a=1 & b=2 & c=3) = P(d=4 & e=5 | a=1 & c=3)
+
+which corresponds to the statistics on (a,c,d,e).
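+
+To illustrate with made-up numbers: if the (a,b,c) statistics gives
+P(a=1 & b=2 & c=3) = 0.01, and the (a,c,d,e) statistics gives
+P(d=4 & e=5 | a=1 & c=3) = 0.5, the combined estimate is
+
+    0.01 * 0.5 = 0.005
+
+likely a much better result than multiplying the plain per-column
+selectivities of (d=4) and (e=5) on top of the 0.01.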
+
+If there are multiple statistics defined on a table, it's not difficult to come
+up with examples where there are multiple ways to combine them to cover a list of
+clauses. We need a way to find the best combination of statistics.
+
+This is the purpose of choose_mv_statistics(). It searches through the possible
+combinations of statistics, looking for a combination that
+
+ (a) covers the most clauses of the list
+
+ (b) reuses the maximum number of clauses as conditions
+ (in conditional probabilities)
+
+While criterion (a) seems natural, (b) may seem a bit awkward at first. The
+idea is that conditions are a way of transferring information about
+dependencies between statistics.
+
+There are two alternative implementations of choose_mv_statistics() - greedy
+and exhaustive. Exhaustive actually searches through all possible combinations
+of statistics, and for larger numbers of statistics may get quite expensive
+(as it, unsurprisingly, has exponential cost). Greedy terminates in fewer than
+K steps (where K is the number of clauses), and in each step chooses the best
+next statistics. I've been unable to come up with an example where those two
+approaches would produce different combinations.
+
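+A minimal sketch of a single greedy step is below, assuming the attnum sets
+are plain 64-bit masks; the real implementation works with Bitmapsets and the
+cover maps built in choose_mv_statistics(), so greedy_pick() and its scoring
+are illustrative only (it also uses the GCC/Clang __builtin_popcountll):
+
+    #include <stdint.h>
+
+    /*
+     * Pick the next statistics for the greedy search: the one covering
+     * the most not-yet-estimated clause attributes, breaking ties by
+     * the number of condition attributes it can reuse. Returns -1 if
+     * no statistics covers any new attribute.
+     */
+    static int
+    greedy_pick(int nstats, const uint64_t *stats_attnums,
+                uint64_t clause_attnums, uint64_t condition_attnums)
+    {
+        int     i;
+        int     best = -1;
+        int     best_new = 0;
+        int     best_reused = 0;
+
+        for (i = 0; i < nstats; i++)
+        {
+            /* clause attributes this statistics would newly cover */
+            int new_attrs = __builtin_popcountll(stats_attnums[i] &
+                                clause_attnums & ~condition_attnums);
+
+            /* condition attributes usable to transfer information */
+            int reused = __builtin_popcountll(stats_attnums[i] &
+                                condition_attnums);
+
+            if (new_attrs == 0)
+                continue;       /* adds nothing new, skip */
+
+            if ((new_attrs > best_new) ||
+                ((new_attrs == best_new) && (reused > best_reused)))
+            {
+                best = i;
+                best_new = new_attrs;
+                best_reused = reused;
+            }
+        }
+
+        return best;
+    }
+
+The caller then moves the attributes covered by the chosen statistics from
+the clause set into the condition set and repeats, so the search terminates
+after at most K steps.
+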
+It's possible to choose the algorithm using the mvstat_search_type GUC, set to
+either 'greedy' or 'exhaustive' (the default is 'greedy'):
+
+ SET mvstat_search_type = 'exhaustive';
+
+Note: This is meant mostly for experimentation. I do expect we'll choose one of
+the algorithms and remove the GUC before commit.
+
+
+Limitations of combining statistics
+-----------------------------------
+
+As described in the section 'Combining multiple statistics', the current approach
+is based on transferring information between statistics by means of conditional
+probabilities. This is a relatively cheap and efficient approach, but it is
+based on two assumptions:
+
+ (1) The overlap between the statistics needs to be sufficiently large, i.e.
+ there needs to be enough columns shared by the statistics to transfer
+ information about dependencies between the remaining columns.
+
+ (2) The query needs to include sufficient clauses on the shared columns.
+
+How violating those assumptions causes problems can be illustrated by
+a simple example. Assume a table with three columns (a,b,c) containing exactly
+the same values, and statistics on (a,b) and (b,c):
+
+ CREATE TABLE test (a, b, c) AS SELECT i, i, i
+ FROM generate_series(1,1000) s(i);
+
+ CREATE STATISTICS s1 ON test (a,b) WITH (mcv);
+ CREATE STATISTICS s2 ON test (b,c) WITH (mcv);
+
+ ANALYZE test;
+
+First, let's estimate this query:
+
+ SELECT * FROM test WHERE (a < 10) AND (c < 10);
+
+Clearly, there are no conditions on 'b' (which is the only column shared by the
+two statistics), so we'll end up with an estimate based on the assumption of
+independence:
+
+ P(a < 10) * P(c < 10) = 0.01 * 0.01 = 0.0001
+
+That is a significant under-estimate, as the actual selectivity is 0.01.
+
+But let's estimate another query:
+
+ SELECT * FROM test WHERE (a < 10) AND (b < 500) AND (c < 10);
+
+In this case, the estimate may be computed for example like this:
+
+ P[(a < 10) & (b < 500) & (c < 10)]
+ = P[(a < 10) & (b < 500)] * P[(c < 10) | (a < 10) & (b < 500)]
+ = P[(a < 10) & (b < 500)] * P[(c < 10) | (b < 500)]
+
+The trouble is that the probability P(c < 10 | b < 500) evaluates to 0.02,
+because we have to assume (a) and (c) are independent (there is no statistics
+covering both these columns), and the condition on (b) does not transfer a
+sufficient amount of information between the two statistics.
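+
+Plugging in the numbers: (b < 500) matches half the rows, and since all the
+columns are identical, P[(a < 10) & (b < 500)] = P(a < 10) = 0.01, so the
+estimate works out as
+
+    0.01 * P[(c < 10) | (b < 500)] = 0.01 * 0.02 = 0.0002
+
+still a severe under-estimate of the actual selectivity 0.01.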
+
+Currently, the only solution is to build statistics on all three columns, but
+see the 'Combining stats using convolution' section for ideas on how to
+improve this.
+
+
Further (possibly crazy) ideas
------------------------------
@@ -111,3 +242,38 @@ But of course, this may result in expensive estimation (CPU-wise).
So we might add a GUC to choose between a simple (single statistics) and thus
multi-statistic estimation, possibly table-level parameter (ALTER TABLE ...).
+
+
+Combining stats using convolution
+---------------------------------
+
+The current approach to combining statistics is based on conditional
+probabilities, and thus only works when the query includes conditions on the
+overlapping parts of the statistics. But there may be other ways to combine
+statistics, relaxing this requirement.
+
+Let's assume two histograms H1 and H2 - then combining them might work roughly
+like this:
+
+
+ for (buckets of H1, satisfying local conditions)
+ {
+ for (buckets of H2, overlapping with H1 bucket)
+ {
+ mark H2 bucket as 'valid'
+ }
+ }
+
+ s1 = s2 = 0.0
+ for (buckets of H2 marked as valid)
+ {
+ s1 += frequency
+
+ if (bucket satisfies local conditions)
+ s2 += frequency
+ }
+
+ s = (s2 / s1) /* final selectivity estimate */
+
+However this may quickly get non-trivial, e.g. when combining two statistics
+of different types (histogram vs. MCV).
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index fea2bb7..33f5a1b 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -192,11 +192,13 @@ extern Selectivity clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
extern Selectivity clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
#endif /* COST_H */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 6708139..80bf96f 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -17,6 +17,14 @@
#include "fmgr.h"
#include "commands/vacuum.h"
+typedef enum MVStatSearchType
+{
+ MVSTAT_SEARCH_EXHAUSTIVE, /* exhaustive search */
+ MVSTAT_SEARCH_GREEDY /* greedy search */
+} MVStatSearchType;
+
+extern int mvstat_search_type;
+
/*
* Degree of how much MCV item / histogram bucket matches a clause.
* This is then considered when computing the selectivity.
--
2.5.0
Attachment: 0005-multivariate-histograms.patch (text/x-patch)
From 93e428970d3d814f6b61e4a5f4384237cf94ed41 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 20:18:24 +0100
Subject: [PATCH 5/9] multivariate histograms
- extends the pg_mv_statistic catalog (add 'hist' fields)
- building the histograms during ANALYZE
- simple estimation while planning the queries
Includes regression tests mostly equal to those for functional
dependencies / MCV lists.
---
doc/src/sgml/ref/create_statistics.sgml | 44 +
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/statscmds.c | 44 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 574 +++++++-
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/README.histogram | 299 ++++
src/backend/utils/mvstats/README.stats | 2 +
src/backend/utils/mvstats/common.c | 37 +-
src/backend/utils/mvstats/histogram.c | 2023 ++++++++++++++++++++++++++++
src/bin/psql/describe.c | 17 +-
src/include/catalog/pg_mv_statistic.h | 24 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 136 +-
src/test/regress/expected/mv_histogram.out | 207 +++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_histogram.sql | 176 +++
21 files changed, 3570 insertions(+), 38 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.histogram
create mode 100644 src/backend/utils/mvstats/histogram.c
create mode 100644 src/test/regress/expected/mv_histogram.out
create mode 100644 src/test/regress/sql/mv_histogram.sql
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index d6973e8..f7336fd 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -133,6 +133,24 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</varlistentry>
<varlistentry>
+ <term><literal>histogram</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables histogram for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>max_buckets</> (<type>integer</>)</term>
+ <listitem>
+ <para>
+ Maximum number of histogram buckets.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>max_mcv_items</> (<type>integer</>)</term>
<listitem>
<para>
@@ -220,6 +238,32 @@ EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 2);
</programlisting>
</para>
+ <para>
+ Create table <structname>t3</> with two strongly correlated columns, and
+ a histogram on those two columns:
+
+<programlisting>
+CREATE TABLE t3 (
+ a float,
+ b float
+);
+
+INSERT INTO t3 SELECT mod(i,1000), mod(i,1000) + 50 * (r - 0.5) FROM (
+ SELECT i, random() r FROM generate_series(1,1000000) s(i)
+ ) foo;
+
+CREATE STATISTICS s3 ON t3 (a, b) WITH (histogram);
+
+ANALYZE t3;
+
+-- small overlap
+EXPLAIN ANALYZE SELECT * FROM t3 WHERE (a < 500) AND (b > 500);
+
+-- no overlap
+EXPLAIN ANALYZE SELECT * FROM t3 WHERE (a < 400) AND (b > 600);
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5c40334..b151db1 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -167,7 +167,9 @@ CREATE VIEW pg_mv_stats AS
length(S.stadeps) as depsbytes,
pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
length(S.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo,
+ length(S.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index c480fbe..e0b085f 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -71,12 +71,15 @@ CreateStatistics(CreateStatsStmt *stmt)
/* by default build nothing */
bool build_dependencies = false,
- build_mcv = false;
+ build_mcv = false,
+ build_histogram = false;
- int32 max_mcv_items = -1;
+ int32 max_buckets = -1,
+ max_mcv_items = -1;
/* options required because of other options */
- bool require_mcv = false;
+ bool require_mcv = false,
+ require_histogram = false;
Assert(IsA(stmt, CreateStatsStmt));
@@ -175,6 +178,29 @@ CreateStatistics(CreateStatsStmt *stmt)
MVSTAT_MCVLIST_MAX_ITEMS)));
}
+ else if (strcmp(opt->defname, "histogram") == 0)
+ build_histogram = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_buckets") == 0)
+ {
+ max_buckets = defGetInt32(opt);
+
+ /* this option requires 'histogram' to be enabled */
+ require_histogram = true;
+
+ /* sanity check */
+ if (max_buckets < MVSTAT_HIST_MIN_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is %d",
+ MVSTAT_HIST_MIN_BUCKETS)));
+
+ else if (max_buckets > MVSTAT_HIST_MAX_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("maximum number of buckets is %d",
+ MVSTAT_HIST_MAX_BUCKETS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -183,10 +209,10 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv))
+ if (! (build_dependencies || build_mcv || build_histogram))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -194,6 +220,11 @@ CreateStatistics(CreateStatsStmt *stmt)
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("option 'mcv' is required by other options(s)")));
+ if (require_histogram && (! build_histogram))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option 'histogram' is required by other options(s)")));
+
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
stakeys = buildint2vector(attnums, numcols);
@@ -214,11 +245,14 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+ values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
+ values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
+ nulls[Anum_pg_mv_statistic_stahist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 333e24b..9172f21 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2163,10 +2163,12 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
WRITE_BOOL_FIELD(mcv_enabled);
+ WRITE_BOOL_FIELD(hist_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
WRITE_BOOL_FIELD(mcv_built);
+ WRITE_BOOL_FIELD(hist_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 7fc0c49..5e73a4e 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -49,6 +49,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
+#define MV_CLAUSE_TYPE_HIST 0x04
static bool clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums,
int type);
@@ -74,6 +75,8 @@ static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
List *clauses, MVStatisticInfo *mvstats,
bool *fullmatch, Selectivity *lowsel);
+static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -81,6 +84,12 @@ static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
Selectivity *lowsel, bool *fullmatch,
bool is_or);
+static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, Index relid);
@@ -95,6 +104,7 @@ static bool stats_type_matches(MVStatisticInfo *stat, int type);
#define UPDATE_RESULT(m,r,isor) \
(m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -123,7 +133,7 @@ static bool stats_type_matches(MVStatisticInfo *stat, int type);
*
* First we try to reduce the list of clauses by applying (soft) functional
* dependencies, and then we try to estimate the selectivity of the reduced
- * list of clauses using the multivariate MCV list.
+ * list of clauses using the multivariate MCV list and histograms.
*
* Finally we remove the portion of clauses estimated using multivariate stats,
* and process the rest of the clauses using the regular per-column stats.
@@ -216,11 +226,13 @@ clauselist_selectivity(PlannerInfo *root,
* with the multivariate code and simply skip to estimation using the
* regular per-column stats.
*/
- if (has_stats(stats, MV_CLAUSE_TYPE_MCV) &&
- (count_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV) >= 2))
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) &&
+ (count_mv_attnums(clauses, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2))
{
/* collect attributes from the compatible conditions */
- Bitmapset *mvattnums = collect_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV);
+ Bitmapset *mvattnums = collect_mv_attnums(clauses, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
/* and search for the statistic covering the most attributes */
MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
@@ -232,7 +244,7 @@ clauselist_selectivity(PlannerInfo *root,
/* split the clauselist into regular and mv-clauses */
clauses = clauselist_mv_split(root, relid, clauses, &mvclauses,
- mvstat, MV_CLAUSE_TYPE_MCV);
+ mvstat, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
/* we've chosen the histogram to match the clauses */
Assert(mvclauses != NIL);
@@ -944,6 +956,7 @@ static Selectivity
clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
{
bool fullmatch = false;
+ Selectivity s1 = 0.0, s2 = 0.0;
/*
* Lowest frequency in the MCV list (may be used as an upper bound
@@ -957,9 +970,24 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
* MCV/histogram evaluation).
*/
- /* Evaluate the MCV selectivity */
- return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ /* Evaluate the MCV first. */
+ s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
&fullmatch, &mcv_low);
+
+ /*
+ * If we got a full equality match on the MCV list, we're done (and
+ * the estimate is pretty good).
+ */
+ if (fullmatch && (s1 > 0.0))
+ return s1;
+
+ /* TODO if (fullmatch) without matching MCV item, use the mcv_low
+ * selectivity as upper bound */
+
+ s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+
+ /* TODO clamp to <= 1.0 (or more strictly, when possible) */
+ return s1 + s2;
}
/*
@@ -1129,7 +1157,7 @@ choose_mv_statistics(List *stats, Bitmapset *attnums)
int numattrs = attrs->dim1;
/* skip dependencies-only stats */
- if (! info->mcv_built)
+ if (! (info->mcv_built || info->hist_built))
continue;
/* count columns covered by the histogram */
@@ -1360,7 +1388,7 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
case F_SCALARGTSEL:
/* not compatible with functional dependencies */
- if (! (context->types & MV_CLAUSE_TYPE_MCV))
+ if (! (context->types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST)))
return true; /* terminate */
break;
@@ -1588,6 +1616,9 @@ stats_type_matches(MVStatisticInfo *stat, int type)
if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
return true;
+ if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
+ return true;
+
return false;
}
@@ -1606,6 +1637,9 @@ has_stats(List *stats, int type)
/* terminate if we've found at least one matching statistics */
if (stats_type_matches(stat, type))
return true;
+
+ if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
+ return true;
}
return false;
@@ -2010,3 +2044,525 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
return nmatches;
}
+
+/*
+ * Estimate selectivity of clauses using a histogram.
+ *
+ * If there's no histogram for the stats, the function returns 0.0.
+ *
+ * The general idea of this method is similar to how MCV lists are
+ * processed, except that this introduces the concept of a partial
+ * match (MCV only works with full match / mismatch).
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all buckets as 'full match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the buckets
+ * 4) skip buckets that are already 'no match'
+ * 5) check clause for buckets that still match (at least partially)
+ * 6) sum frequencies for buckets to get selectivity
+ *
+ * Unlike MCV lists, histograms have a concept of a partial match. In
+ * that case we use 1/2 the bucket, to minimize the average error. The
+ * MV histograms are usually less detailed than the per-column ones,
+ * meaning the sum is often quite high (thanks to combining a lot of
+ * "partially hit" buckets).
+ *
+ * Maybe we could use per-bucket information with the number of distinct
+ * values it contains (for each dimension), and then use that to correct
+ * the estimate (so with 10 distinct values, we'd use 1/10 of the bucket
+ * frequency). We might also scale the value depending on the actual
+ * ndistinct estimate (not just the values observed in the sample).
+ *
+ * Another option would be to multiply the selectivities, i.e. if we get
+ * 'partial match' for a bucket for multiple conditions, we might use
+ * 0.5^k (where k is the number of conditions), instead of 0.5. This
+ * probably does not minimize the average error, though.
+ *
+ * TODO This might use a similar shortcut to MCV lists - count buckets
+ * marked as partial/full match, and terminate once this drops to 0.
+ * Not sure if it's really worth it - for MCV lists a situation like
+ * this is not uncommon, but for histograms it's not that clear.
+ */
+static Selectivity
+clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats)
+{
+ int i;
+ Selectivity s = 0.0;
+ Selectivity u = 0.0;
+
+ int nmatches = 0;
+ char *matches = NULL;
+
+ MVSerializedHistogram mvhist = NULL;
+
+ /* there's no histogram */
+ if (! mvstats->hist_built)
+ return 0.0;
+
+ /* There may be no histogram in the stats (check hist_built flag) */
+ mvhist = load_mv_histogram(mvstats->mvoid);
+
+ Assert (mvhist != NULL);
+ Assert (clauses != NIL);
+ Assert (list_length(clauses) >= 2);
+
+ /*
+ * Bitmap of bucket matches (mismatch, partial, full). By default
+ * all buckets fully match (and we'll eliminate them).
+ */
+ matches = palloc0(sizeof(char) * mvhist->nbuckets);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+
+ nmatches = mvhist->nbuckets;
+
+ /* build the match bitmap */
+ update_match_bitmap_histogram(root, clauses,
+ mvstats->stakeys, mvhist,
+ nmatches, matches, false);
+
+ /* now, walk through the buckets and sum the selectivities */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * Find out what part of the data is covered by the histogram,
+ * so that we can 'scale' the selectivity properly (e.g. when
+ * only 50% of the sample got into the histogram, and the rest
+ * is in a MCV list).
+ *
+ * TODO This might be handled by keeping a global "frequency"
+ * for the whole histogram, which might save us some time
+ * spent accessing the not-matching part of the histogram.
+ * Although it's likely in a cache, so it's very fast.
+ */
+ u += mvhist->buckets[i]->ntuples;
+
+ if (matches[i] == MVSTATS_MATCH_FULL)
+ s += mvhist->buckets[i]->ntuples;
+ else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+ s += 0.5 * mvhist->buckets[i]->ntuples;
+ }
+
+#ifdef DEBUG_MVHIST
+ debug_histogram_matches(mvhist, matches);
+#endif
+
+ /* release the allocated bitmap and deserialized histogram */
+ pfree(matches);
+ pfree(mvhist);
+
+ return s * u;
+}
+
+/* cached result of bucket boundary comparison for a single dimension */
+
+#define HIST_CACHE_NOT_FOUND 0x00
+#define HIST_CACHE_FALSE 0x01
+#define HIST_CACHE_TRUE 0x03
+#define HIST_CACHE_MASK 0x02
+
+static char
+bucket_contains_value(FmgrInfo ltproc, Datum constvalue,
+ Datum min_value, Datum max_value,
+ int min_index, int max_index,
+ bool min_include, bool max_include,
+ char * callcache)
+{
+ bool a, b;
+
+ char min_cached = callcache[min_index];
+ char max_cached = callcache[max_index];
+
+ /*
+ * First some quick checks on equality - if either (inclusive) boundary
+ * equals the constant, we have a partial match (so no need to call the
+ * comparator).
+ */
+ if (((min_value == constvalue) && (min_include)) ||
+ ((max_value == constvalue) && (max_include)))
+ return MVSTATS_MATCH_PARTIAL;
+
+ /* Keep the values 0/1 because of the XOR at the end. */
+ a = ((min_cached & HIST_CACHE_MASK) >> 1);
+ b = ((max_cached & HIST_CACHE_MASK) >> 1);
+
+ /*
+ * If the result for the bucket lower bound is not in the cache, evaluate
+ * the function and store the result in the cache.
+ */
+ if (! min_cached)
+ {
+ a = DatumGetBool(FunctionCall2Coll(<proc,
+ DEFAULT_COLLATION_OID,
+ constvalue, min_value));
+ /* remember the result */
+ callcache[min_index] = (a) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ /* And do the same for the upper bound. */
+ if (! max_cached)
+ {
+ b = DatumGetBool(FunctionCall2Coll(<proc,
+ DEFAULT_COLLATION_OID,
+ constvalue, max_value));
+ /* remember the result */
+ callcache[max_index] = (b) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ return (a ^ b) ? MVSTATS_MATCH_PARTIAL : MVSTATS_MATCH_NONE;
+}
+
+static char
+bucket_is_smaller_than_value(FmgrInfo opproc, Datum constvalue,
+ Datum min_value, Datum max_value,
+ int min_index, int max_index,
+ bool min_include, bool max_include,
+ char * callcache, bool isgt)
+{
+ char min_cached = callcache[min_index];
+ char max_cached = callcache[max_index];
+
+ /* Keep the values 0/1 because of the XOR at the end. */
+ bool a = ((min_cached & HIST_CACHE_MASK) >> 1);
+ bool b = ((max_cached & HIST_CACHE_MASK) >> 1);
+
+ if (! min_cached)
+ {
+ a = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ min_value,
+ constvalue));
+ /* remember the result */
+ callcache[min_index] = (a) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ if (! max_cached)
+ {
+ b = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ max_value,
+ constvalue));
+ /* remember the result */
+ callcache[max_index] = (b) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ /*
+ * Now, we need to combine both results into the final answer, and we need
+ * to be careful about the 'isgt' variable which kinda inverts the meaning.
+ *
+ * First, we handle the case when each boundary returns different results.
+ * In that case the outcome can only be 'partial' match.
+ */
+ if (a != b)
+ return MVSTATS_MATCH_PARTIAL;
+
+ /*
+ * When the results are the same, then it depends on the 'isgt' value. There
+ * are four options:
+ *
+ * isgt=false a=b=true => full match
+ * isgt=false a=b=false => empty
+ * isgt=true a=b=true => empty
+ * isgt=true a=b=false => full match
+ *
+ * We'll cheat a bit, because we know that (a=b) so we'll use just one of them.
+ */
+ if (isgt)
+ return (!a) ? MVSTATS_MATCH_FULL : MVSTATS_MATCH_NONE;
+ else
+ return ( a) ? MVSTATS_MATCH_FULL : MVSTATS_MATCH_NONE;
+}
+
+/*
+ * Evaluate clauses using the histogram, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * Note: This is not a simple bitmap in the sense that there are more
+ * than two possible values for each item - no match, partial
+ * match and full match. So we need 2 bits per item.
+ *
+ * TODO This works with 'bitmap' where each item is represented as a
+ * char, which is slightly wasteful. Instead, we could use a bitmap
+ * with 2 bits per item, reducing the size to ~1/4. By using values
+ * 0, 1 and 3 (instead of 0, 1 and 2), the operations (merging etc.)
+ * might be performed just like for simple bitmap by using & and |,
+ * which might be faster than min/max.
+ */
+static int
+update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ /*
+ * Used for caching function calls, only once per deduplicated value.
+ *
+ * We know we may have up to (2 * nbuckets) values per dimension. It's
+ * probably overkill, but let's allocate that once for all clauses,
+ * to minimize overhead.
+ *
+ * Also, we only need two bits per value, but this allocates a byte
+ * per value. Might be worth optimizing.
+ *
+ * 0x00 - not yet called
+ * 0x01 - called, result is 'false'
+ * 0x03 - called, result is 'true'
+ */
+ char *callcache = palloc(mvhist->nbuckets);
+
+ Assert(mvhist != NULL);
+ Assert(mvhist->nbuckets > 0);
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mvhist->nbuckets);
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+
+ /* loop through the clauses and do the estimation */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* it's either OpClause, or NullTest */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ FmgrInfo opproc; /* operator */
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ /* reset the cache (per clause) */
+ memset(callcache, 0, mvhist->nbuckets);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+ FmgrInfo ltproc;
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_LT_OPR);
+
+ /* lookup dimension for the attribute */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), <proc);
+
+ /*
+ * Check this for all buckets that still have "true" in the bitmap
+ *
+ * We already know the clauses use suitable operators (because that's
+ * how we filtered them).
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ char res = MVSTATS_MATCH_NONE;
+
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /* histogram boundaries */
+ Datum minval, maxval;
+ bool mininclude, maxinclude;
+ int minidx, maxidx;
+
+ /*
+ * For AND-lists, we can also mark NULL buckets as 'no match'
+ * (and then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && bucket->nullsonly[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match).
+ * We can't really do anything about the MATCH_PARTIAL buckets.
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* lookup the values and cache of function calls */
+ minidx = bucket->min[idx];
+ maxidx = bucket->max[idx];
+
+ minval = mvhist->values[idx][bucket->min[idx]];
+ maxval = mvhist->values[idx][bucket->max[idx]];
+
+ mininclude = bucket->min_inclusive[idx];
+ maxinclude = bucket->max_inclusive[idx];
+
+ /*
+ * TODO Maybe it's possible to add here a similar optimization
+ * as for the MCV lists:
+ *
+ * (nmatches == 0) && AND-list => all eliminated (FALSE)
+ * (nmatches == N) && OR-list => all eliminated (TRUE)
+ *
+ * But it's more complex because of the partial matches.
+ */
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ *
+ * TODO I'm really unsure whether the handling of the 'isgt' flag (that
+ * is, clauses with reverse order of variable/constant) is correct. I wouldn't
+ * be surprised if there was some mixup. Using the lt/gt operators
+ * instead of messing with the opproc could make it simpler.
+ * It would however be using a different operator than the query,
+ * although it's not any shadier than using the selectivity function
+ * as is done currently.
+ */
+ switch (oprrest)
+ {
+ case F_SCALARLTSEL: /* Var < Const */
+ case F_SCALARGTSEL: /* Var > Const */
+
+ res = bucket_is_smaller_than_value(opproc, cst->constvalue,
+ minval, maxval,
+ minidx, maxidx,
+ mininclude, maxinclude,
+ callcache, isgt);
+ break;
+
+ case F_EQSEL:
+
+ /*
+ * We only check whether the value is within the bucket, using the
+ * lt operator, and we also check for equality with the boundaries.
+ */
+
+ res = bucket_contains_value(ltproc, cst->constvalue,
+ minval, maxval,
+ minidx, maxidx,
+ mininclude, maxinclude,
+ callcache);
+ break;
+ }
+
+ UPDATE_RESULT(matches[i], res, is_or);
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the buckets and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining buckets that might possibly match.
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match)
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* if the clause mismatches the bucket, set it as MATCH_NONE */
+ if ((expr->nulltesttype == IS_NULL)
+ && (! bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+
+ else if ((expr->nulltesttype == IS_NOT_NULL) &&
+ (bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each bucket */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching buckets */
+ or_nmatches = mvhist->nbuckets;
+
+ /* by default none of the buckets matches the clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the OR-clauses */
+ or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ stakeys, mvhist,
+ or_nmatches, or_matches, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
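+ *
+ * For example, a bucket marked MATCH_FULL by one subclause and
+ * MATCH_PARTIAL by another ends up MATCH_PARTIAL when AND-merging
+ * (MIN), but MATCH_FULL when OR-merging (MAX).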
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+
+ /* free the call cache */
+ pfree(callcache);
+
+ return nmatches;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8394111..2519249 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -412,7 +412,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built || mvstat->mcv_built)
+ if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built)
{
info = makeNode(MVStatisticInfo);
@@ -422,10 +422,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
info->mcv_enabled = mvstat->mcv_enabled;
+ info->hist_enabled = mvstat->hist_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
info->mcv_built = mvstat->mcv_built;
+ info->hist_built = mvstat->hist_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index f9bf10c..9dbb3b6 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o mcv.o
+OBJS = common.o dependencies.o histogram.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.histogram b/src/backend/utils/mvstats/README.histogram
new file mode 100644
index 0000000..cd640e5
--- /dev/null
+++ b/src/backend/utils/mvstats/README.histogram
@@ -0,0 +1,299 @@
+Multivariate histograms
+=======================
+
+Histograms on individual attributes consist of buckets represented by ranges,
+covering the domain of the attribute. That is, each bucket is a [min,max]
+interval, and contains all values in this range. The histogram is built in such
+a way that all buckets have about the same frequency.
+
+Multivariate histograms are an extension into n-dimensional space - the buckets
+are n-dimensional intervals (i.e. n-dimensional rectangles), covering the domain
+of the combination of attributes. That is, each bucket has a vector of lower
+and upper boundaries, denoted min[i] and max[i] (where i = 1..n).
+
+In addition to the boundaries, each bucket tracks additional info:
+
+ * frequency (fraction of tuples in the bucket)
+ * whether the boundaries are inclusive or exclusive
+ * whether the dimension contains only NULL values
+ * number of distinct values in each dimension (for building only)
+
+It's possible that in the future we'll have multiple histogram types, with different
+features. We do however expect all the types to share the same representation
+(buckets as ranges) and only differ in how we build them.
+
+The current implementation builds non-overlapping buckets, but that may not be
+true for other histogram types, and the code should not rely on this assumption.
+There are interesting types of histograms (or algorithms) with overlapping buckets.
+
+When used on low-cardinality data, histograms usually perform considerably worse
+than MCV lists (which are a good fit for this kind of data). This is especially
+true on label-like values, where ordering of the values is mostly unrelated to
+meaning of the data, as proper ordering is crucial for histograms.
+
+On high-cardinality data the histograms are usually a better choice, because MCV
+lists can't represent the distribution accurately enough.
+
+
+Selectivity estimation
+----------------------
+
+The estimation is implemented in clauselist_mv_selectivity_histogram(), and
+works very similarly to clauselist_mv_selectivity_mcvlist.
+
+The main difference is that while MCV lists support exact matches, histograms
+often result in approximate matches - e.g. with equality we can only say if
+the constant would be part of the bucket, but not whether it really is there
+or what fraction of the bucket it corresponds to. In this case we rely on
+some defaults just like in the per-column histograms.
+
+The current implementation uses histograms to estimate these types of clauses
+(think of WHERE conditions):
+
+ (a) equality clauses WHERE (a = 1) AND (b = 2)
+ (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ (d) OR-clauses WHERE (a = 1) OR (b = 2)
+
+Similarly to MCV lists, it's possible to add support for additional types of
+clauses, for example:
+
+ (e) multi-var clauses WHERE (a > b)
+
+and so on. These are tasks for the future, not yet implemented.
+
+
+When evaluating a clause on a bucket, we may get one of three results:
+
+ (a) FULL_MATCH - The bucket definitely matches the clause.
+
+ (b) PARTIAL_MATCH - The bucket matches the clause, but not necessarily all
+ the tuples it represents.
+
+ (c) NO_MATCH - The bucket definitely does not match the clause.
+
+This may be illustrated using a range [1, 5], which is essentially a 1-D bucket.
+With these clauses:
+
+ WHERE (a < 10) => FULL_MATCH (all range values are below
+ 10, so the whole bucket matches)
+
+ WHERE (a < 3) => PARTIAL_MATCH (there may be values matching
+ the clause, but we don't know how many)
+
+ WHERE (a < 0) => NO_MATCH (the whole range is above 0, so
+ no values from the bucket can match)
+
+Some clauses may produce only some of those results - for example equality
+clauses may never produce FULL_MATCH as we always hit only part of the bucket
+(we can't match both boundaries at the same time). This results in less accurate
+estimates compared to MCV lists, where we can match the MCV items exactly
+(there's no PARTIAL match in MCV).
+
+There are also clauses that may not produce any PARTIAL_MATCH results. A nice
+example of that is the 'IS [NOT] NULL' clause, which either matches the bucket
+completely (FULL_MATCH) or not at all (NO_MATCH), thanks to how the NULL-buckets
+are constructed.
+
+Computing the total selectivity estimate is trivial - simply sum selectivities
+from all the FULL_MATCH and PARTIAL_MATCH buckets (but for buckets marked with
+PARTIAL_MATCH, multiply the frequency by 0.5 to minimize the average error).
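+
+For example, given two fully matching buckets with frequencies 0.10 and 0.05,
+and one partially matching bucket with frequency 0.20 (the frequencies here
+are of course made up), the estimate would be computed as
+
+    0.10 + 0.05 + 0.5 * 0.20 = 0.25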
+
+
+Building a histogram
+---------------------
+
+The algorithm of building a histogram in general is quite simple:
+
+ (a) create an initial bucket (containing all sample rows)
+
+ (b) create NULL buckets (by splitting the initial bucket)
+
+ (c) repeat
+
+ (1) choose bucket to split next
+
+ (2) terminate if no bucket that might be split is found, or if we've
+ reached the maximum number of buckets (16384)
+
+ (3) choose dimension to partition the bucket by
+
+ (4) partition the bucket by the selected dimension
+
+The main complexity is hidden in steps (c.1) and (c.3), i.e. how we choose the
+bucket and dimension for the split, as discussed in the next section.
+
+
+Partitioning criteria
+---------------------
+
+Similarly to one-dimensional histograms, we want to produce buckets with roughly
+the same frequency.
+
+We also need to produce "regular" buckets, because buckets with one dimension
+much longer than the others are very likely to match a lot of conditions (which
+increases error, even if the bucket frequency is very low).
+
+This is especially important when handling OR-clauses, because in that case each
+clause may add buckets independently. With AND-clauses all the clauses have to
+match each bucket, which makes this issue somewhat less concerning.
+
+To achieve this, we choose the largest bucket (containing the most sample rows),
+but we only choose buckets that can actually be split (have at least 3 different
+combinations of values).
+
+Then we choose the "longest" dimension of the bucket, which is computed by using
+the distinct values in the sample as a measure.
+
+For details see functions select_bucket_to_partition() and partition_bucket(),
+which also includes further discussion.
+
+
+The current limit on number of buckets (16384) is mostly arbitrary, but chosen
+so that it guarantees we don't exceed the number of distinct values indexable by
+uint16 in any of the dimensions. In practice we could handle more buckets as we
+index each dimension separately and the splits should use the dimensions evenly.
+
+Also, histograms this large (with 16k values in multiple dimensions) would be
+quite expensive to build and process, so the 16k limit is rather reasonable.
+
+The actual number of buckets is also related to statistics target, because we
+require MIN_BUCKET_ROWS (10) tuples per bucket before a split, so we can't have
+more than (2 * 300 * target / 10) buckets. For the default target (100) this
+evaluates to ~6k.
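+
+For illustration, even with the maximum statistics target (10000) the formula
+gives 2 * 300 * 10000 / 10 = 600000 buckets, so in that case it's the 16384
+cap that actually limits the histogram size.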
+
+
+NULL handling (create_null_buckets)
+-----------------------------------
+
+When building histograms on a single attribute, we first filter out NULL values.
+In the multivariate case, we can't really do that because the rows may contain
+a mix of NULL and non-NULL values in different columns (so we can't simply
+filter all of them out).
+
+For this reason, the histograms are built so that in each bucket, each
+dimension contains either only NULL or only non-NULL values. The NULL-buckets
+are built as the first step, by the create_null_buckets() function.
+The number of NULL buckets, as produced by this function, has a clear upper
+boundary (2^N) where N is the number of dimensions (attributes the histogram is
+built on). Or rather 2^K where K is the number of attributes that are not marked
+as not-NULL.
+
+The buckets with NULL dimensions are then subject to the same build algorithm
+(i.e. may be split into smaller buckets) just like any other bucket, but may
+only be split by a non-NULL dimension.
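+
+For example, with a histogram on two nullable columns the initial bucket may
+be split into up to 2^2 = 4 NULL-buckets: (NULL, NULL), (NULL, non-NULL),
+(non-NULL, NULL) and (non-NULL, non-NULL).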
+
+
+Serialization
+-------------
+
+To store the histogram in the pg_mv_statistic table, it is serialized into a
+more efficient form. We also use this representation during estimation, i.e. we don't
+fully deserialize the histogram.
+
+For example the boundary values are deduplicated to minimize the required space.
+How much redundancy is there, actually? Let's assume there are no NULL values,
+so we start with a single bucket - in that case we have 2*N boundaries. Each
+time we split a bucket we introduce one new value (in the "middle" of one of
+the dimensions), and keep boundaries for all the other dimensions. So after K
+splits, we have up to
+
+ 2*N + K
+
+unique boundary values (we may have fewer values, if the same value is used for
+several splits). But after K splits we do have (K+1) buckets, so
+
+ (K+1) * 2 * N
+
+boundary values. Using e.g. N=4 and K=999, we arrive at these numbers:
+
+ 2*N + K = 1007
+ (K+1) * 2 * N = 8000
+
+which means a lot of redundancy. It's somewhat counter-intuitive that the number
+of distinct values does not really depend on the number of dimensions (except
+for the initial bucket, but that's negligible compared to the total).
+
+By deduplicating the values and replacing them with 16-bit indexes (uint16), we
+reduce the required space to
+
+ 1007 * 8 + 8000 * 2 ~= 24kB
+
+which is significantly less than 64kB required for the 'raw' histogram (assuming
+the values are 8B).
+
+While the bytea compression (pglz) might achieve the same reduction of space,
+the deduplicated representation is used to optimize the estimation by caching
+results of function calls for already visited values. This significantly
+reduces the number of calls to (often quite expensive) operators.
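+
+Continuing the example above (N=4, K=999), there are only ~1007 distinct
+boundary values in total, so a clause needs at most that many operator calls,
+with the cached results then reused for all 8000 boundary references.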
+
+Note: Of course, this reasoning only holds for histograms built by the algorithm
+that simply splits the buckets in half. Other histograms types (e.g. containing
+overlapping buckets) may behave differently and require different serialization.
+
+Serialized histograms are marked with 'magic' constant, to make it easier to
+check the bytea value really is a serialized histogram.
+
+
+varlena compression
+-------------------
+
+This serialization may however disable automatic varlena compression, because
+the array of unique values is placed at the beginning of the serialized form.
+That is exactly the chunk pglz inspects to check whether the data is
+compressible, and it will probably decide it's not very compressible. This is
+similar to the issue we had with JSONB initially.
+
+Maybe storing buckets first would make it work, as the buckets may be better
+compressible.
+
+On the other hand the serialization is actually a context-aware compression,
+usually compressing to ~30% (or even less, with large data types). So the lack
+of additional pglz compression may be acceptable.
+
+
+Deserialization
+---------------
+
+The deserialization is not a perfect inverse of the serialization, as we keep
+the deduplicated arrays. This reduces the amount of memory and also allows
+optimizations during estimation (e.g. we can cache results for the distinct
+values, saving expensive function calls).
+
+
+Inspecting the histogram
+------------------------
+
+Inspecting the regular (per-attribute) histograms is trivial, as it's enough
+to select the columns from pg_stats - the data is encoded as anyarray, so we
+simply get the text representation of the array.
+
+With multivariate histograms it's not that simple due to the possible mix of
+data types in the histogram. It might be possible to produce similar array-like
+text representation, but that'd unnecessarily complicate further processing
+and analysis of the histogram. Instead, there's a SRF function that allows
+access to lower/upper boundaries, frequencies etc.
+
+ SELECT * FROM pg_mv_histogram_buckets();
+
+It has two input parameters:
+
+ oid - OID of the histogram (pg_mv_statistic.staoid)
+ otype - type of output
+
+and produces a table with these columns:
+
+ - bucket ID (0...nbuckets-1)
+ - lower bucket boundaries (string array)
+ - upper bucket boundaries (string array)
+ - nulls only dimensions (boolean array)
+ - lower boundary inclusive (boolean array)
+ - upper boundary inclusive (boolean array)
+ - frequency (double precision)
+
+The 'otype' accepts three values, determining what will be returned in the
+lower/upper boundary arrays:
+
+ - 0 - values stored in the histogram, encoded as text
+ - 1 - indexes into the deduplicated arrays
+ - 2 - indexes into the deduplicated arrays, scaled to [0,1]
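+
+For example, to list the buckets with the boundary values encoded as text
+(assuming 16428 is the OID of the statistics):
+
+    SELECT * FROM pg_mv_histogram_buckets(16428, 0);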
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index 5c5c59a..3e4f4d1 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -18,6 +18,8 @@ Currently we only have two kinds of multivariate statistics
(b) MCV lists (README.mcv)
+ (c) multivariate histograms (README.histogram)
+
Compatible clause types
-----------------------
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index 4f5a842..f6d1074 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -13,11 +13,11 @@
*
*-------------------------------------------------------------------------
*/
+#include "postgres.h"
+#include "utils/array.h"
#include "common.h"
-#include "utils/array.h"
-
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
int natts,
VacAttrStats **vacattrstats);
@@ -52,7 +52,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
- int numrows_filtered = 0;
+ MVHistogram histogram = NULL;
+ int numrows_filtered = numrows;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -95,8 +96,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+ /* build a multivariate histogram on the columns */
+ if ((numrows_filtered > 0) && (stat->hist_enabled))
+ histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
+ update_mv_stats(stat->mvoid, deps, mcvlist, histogram, attrs, stats);
}
}
@@ -176,6 +181,8 @@ list_mv_stats(Oid relid)
info->deps_built = stats->deps_built;
info->mcv_enabled = stats->mcv_enabled;
info->mcv_built = stats->mcv_built;
+ info->hist_enabled = stats->hist_enabled;
+ info->hist_built = stats->hist_built;
result = lappend(result, info);
}
@@ -190,7 +197,6 @@ list_mv_stats(Oid relid)
return result;
}
-
/*
* Find attnims of MV stats using the mvoid.
*/
@@ -236,9 +242,16 @@ find_mv_attnums(Oid mvoid, Oid *relid)
}
+/*
+ * FIXME This adds statistics, but we need to drop statistics when the
+ * table is dropped. Not sure what to do when a column is dropped.
+ * Either we can (a) remove all stats on that column, (b) remove
+ * the column from defined stats and force rebuild, (c) remove the
+ * column on next ANALYZE. Or maybe something else?
+ */
void
update_mv_stats(Oid mvoid,
- MVDependencies dependencies, MCVList mcvlist,
+ MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
@@ -271,22 +284,34 @@ update_mv_stats(Oid mvoid,
values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
}
+ if (histogram != NULL)
+ {
+ bytea * data = serialize_mv_histogram(histogram, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stahist-1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stahist - 1]
+ = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
+ replaces[Anum_pg_mv_statistic_stahist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
+ nulls[Anum_pg_mv_statistic_hist_built-1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_hist_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
+ values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
diff --git a/src/backend/utils/mvstats/histogram.c b/src/backend/utils/mvstats/histogram.c
new file mode 100644
index 0000000..6b07b51
--- /dev/null
+++ b/src/backend/utils/mvstats/histogram.c
@@ -0,0 +1,2023 @@
+/*-------------------------------------------------------------------------
+ *
+ * histogram.c
+ * POSTGRES multivariate histograms
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/histogram.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+#include <math.h>
+
+
+static MVBucket create_initial_mv_bucket(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+static MVBucket select_bucket_to_partition(int nbuckets, MVBucket * buckets);
+
+static MVBucket partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues);
+
+static MVBucket copy_mv_bucket(MVBucket bucket, uint32 ndimensions);
+
+static void update_bucket_ndistinct(MVBucket bucket, int2vector *attrs,
+ VacAttrStats ** stats);
+
+static void update_dimension_ndistinct(MVBucket bucket, int dimension,
+ int2vector *attrs,
+ VacAttrStats ** stats,
+ bool update_boundaries);
+
+static void create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats);
+
+static Datum * build_ndistinct(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int i, int *nvals);
+
+/*
+ * Each serialized bucket needs to store (in this order):
+ *
+ * - number of tuples (float)
+ * - min inclusive flags (ndim * sizeof(bool))
+ * - max inclusive flags (ndim * sizeof(bool))
+ * - null dimension flags (ndim * sizeof(bool))
+ * - min boundary indexes (ndim * sizeof(uint16))
+ * - max boundary indexes (ndim * sizeof(uint16))
+ *
+ * So in total (the macro reserves twice the space actually needed for
+ * the boundary indexes, and stores just a single float):
+ *
+ * ndim * (4 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float)
+ */
+#define BUCKET_SIZE(ndims) \
+ (ndims * (4 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float))
+
+/* pointers into a flat serialized bucket of BUCKET_SIZE(n) bytes */
+#define BUCKET_NTUPLES(b) (*(float*)b)
+#define BUCKET_MIN_INCL(b,n) ((bool*)(b + sizeof(float)))
+#define BUCKET_MAX_INCL(b,n) (BUCKET_MIN_INCL(b,n) + n)
+#define BUCKET_NULLS_ONLY(b,n) (BUCKET_MAX_INCL(b,n) + n)
+#define BUCKET_MIN_INDEXES(b,n) ((uint16*)(BUCKET_NULLS_ONLY(b,n) + n))
+#define BUCKET_MAX_INDEXES(b,n) ((BUCKET_MIN_INDEXES(b,n) + n))
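+
+/*
+ * For example (assuming a 4-byte float, 1-byte bool and 2-byte uint16),
+ * a bucket with ndims=2 is laid out as: ntuples at offset 0, min/max
+ * inclusive flags at offsets 4 and 6, nulls-only flags at offset 8, and
+ * min/max boundary indexes at offsets 10 and 14.
+ */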
+
+/* can't split bucket with less than 10 rows */
+#define MIN_BUCKET_ROWS 10
+
+/*
+ * Data used while building the histogram.
+ */
+typedef struct HistogramBuildData {
+
+ float ndistinct; /* frequency of distinct values */
+
+ HeapTuple *rows; /* array of sample rows */
+ uint32 numrows; /* number of sample rows (array size) */
+
+ /*
+ * Number of distinct values in each dimension. This is used when
+ * building the histogram (and is not serialized/deserialized).
+ */
+ uint32 *ndistincts;
+
+} HistogramBuildData;
+
+typedef HistogramBuildData *HistogramBuild;
+
+/*
+ * builds a multivariate histogram
+ *
+ * The build algorithm is iterative - initially a single bucket containing all
+ * the sample rows is formed, and then repeatedly split into smaller buckets.
+ * In each step the largest bucket (in some sense) is chosen to be split next.
+ *
+ * The criteria for selecting the largest bucket (and the dimension for the
+ * split) needs to be elaborate enough to produce buckets of roughly the same
+ * size, and also regular shape (not very long in one dimension).
+ *
+ * The current algorithm works like this:
+ *
+ * build NULL-buckets (create_null_buckets)
+ *
+ * while [maximum number of buckets not reached]
+ *
+ * choose bucket to partition (largest bucket)
+ * if no bucket to partition
+ * terminate the algorithm
+ *
+ * choose bucket dimension to partition (largest dimension)
+ * split the bucket into two buckets
+ *
+ * See the discussion at select_bucket_to_partition and partition_bucket for
+ * more details about the algorithm.
+ */
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ int *ndistvalues;
+ Datum **distvalues;
+
+ MVHistogram histogram;
+
+ HeapTuple * rows_copy = (HeapTuple*)palloc0(numrows * sizeof(HeapTuple));
+ memcpy(rows_copy, rows, sizeof(HeapTuple) * numrows);
+
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /* build histogram header */
+
+ histogram = (MVHistogram)palloc0(sizeof(MVHistogramData));
+
+ histogram->magic = MVSTAT_HIST_MAGIC;
+ histogram->type = MVSTAT_HIST_TYPE_BASIC;
+
+ histogram->nbuckets = 1;
+ histogram->ndimensions = numattrs;
+
+ /* create max buckets (better than repalloc for short-lived objects) */
+ histogram->buckets
+ = (MVBucket*)palloc0(MVSTAT_HIST_MAX_BUCKETS * sizeof(MVBucket));
+
+ /* create the initial bucket, covering the whole sample set */
+ histogram->buckets[0]
+ = create_initial_mv_bucket(numrows, rows_copy, attrs, stats);
+
+ /*
+ * Collect info on distinct values in each dimension (used later to select
+ * dimension to partition).
+ */
+ ndistvalues = (int*)palloc0(sizeof(int) * numattrs);
+ distvalues = (Datum**)palloc0(sizeof(Datum*) * numattrs);
+
+ for (i = 0; i < numattrs; i++)
+ distvalues[i] = build_ndistinct(numrows, rows, attrs, stats, i,
+ &ndistvalues[i]);
+
+ /*
+ * Split the initial bucket into buckets that don't mix NULL and non-NULL
+ * values in a single dimension.
+ */
+ create_null_buckets(histogram, 0, attrs, stats);
+
+ /*
+ * Do the actual histogram build - select a bucket and split it.
+ *
+ * FIXME This should use the max_buckets specified in CREATE STATISTICS.
+ */
+ while (histogram->nbuckets < MVSTAT_HIST_MAX_BUCKETS)
+ {
+ MVBucket bucket = select_bucket_to_partition(histogram->nbuckets,
+ histogram->buckets);
+
+ /* no buckets eligible for partitioning */
+ if (bucket == NULL)
+ break;
+
+ /* we modify the bucket in-place and add one new bucket */
+ histogram->buckets[histogram->nbuckets++]
+ = partition_bucket(bucket, attrs, stats, ndistvalues, distvalues);
+ }
+
+ /* finalize the histogram build - compute the frequencies etc. */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ HistogramBuild build_data
+ = ((HistogramBuild)histogram->buckets[i]->build_data);
+
+ /*
+ * The frequency has to be computed from the whole sample, in case some
+ * of the rows were used for MCV.
+ *
+ * XXX Perhaps this should simply compute frequency with respect to the
+ * local frequency, and then factor-in the MCV later.
+ *
+ * FIXME The 'ntuples' sounds a bit inappropriate for frequency.
+ */
+ histogram->buckets[i]->ntuples
+ = (build_data->numrows * 1.0) / numrows_total;
+ }
+
+ return histogram;
+}
+
+/* build array of distinct values for a single attribute */
+static Datum *
+build_ndistinct(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int i, int *nvals)
+{
+ int j;
+ int nvalues,
+ ndistinct;
+ Datum *values,
+ *distvalues;
+
+ SortSupportData ssup;
+ StdAnalyzeData *mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ nvalues = 0;
+ values = (Datum*)palloc0(sizeof(Datum) * numrows);
+
+ /* collect values from the sample rows, ignore NULLs */
+ for (j = 0; j < numrows; j++)
+ {
+ Datum value;
+ bool isnull;
+
+ /* fetch the attribute value (NULLs are skipped below) */
+ value = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &isnull);
+
+ if (isnull)
+ continue;
+
+ values[nvalues++] = value;
+ }
+
+ /* if no non-NULL values were found, free the memory and terminate */
+ if (nvalues == 0)
+ {
+ pfree(values);
+ return NULL;
+ }
+
+ /* sort the array of values using the SortSupport */
+ qsort_arg((void *) values, nvalues, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /* count the distinct values first, and allocate just enough memory */
+ ndistinct = 1;
+ for (j = 1; j < nvalues; j++)
+ if (compare_scalars_simple(&values[j], &values[j-1], &ssup) != 0)
+ ndistinct += 1;
+
+ distvalues = (Datum*)palloc0(sizeof(Datum) * ndistinct);
+
+ /* now collect distinct values into the array */
+ distvalues[0] = values[0];
+ ndistinct = 1;
+
+ for (j = 1; j < nvalues; j++)
+ {
+ if (compare_scalars_simple(&values[j], &values[j-1], &ssup) != 0)
+ {
+ distvalues[ndistinct] = values[j];
+ ndistinct += 1;
+ }
+ }
+
+ pfree(values);
+
+ *nvals = ndistinct;
+ return distvalues;
+}
+
+/* fetch the histogram (as a bytea) from the pg_mv_statistic catalog */
+MVSerializedHistogram
+load_mv_histogram(Oid mvoid)
+{
+ bool isnull = false;
+ Datum histogram;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* Prepare to scan pg_mv_statistic for entries having indrelid = this rel. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->hist_enabled && mvstat->hist_built);
+#endif
+
+ histogram = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stahist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_histogram(DatumGetByteaP(histogram));
+}
+
+/* print some basic info about the histogram */
+Datum
+pg_mv_stats_histogram_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVSerializedHistogram hist = deserialize_mv_histogram(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nbuckets=%d", hist->nbuckets);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/*
+ * Serialize the MV histogram into a bytea value. The basic algorithm is quite
+ * simple, and mostly mimics the MCV serialization:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ *
+ * (a) collect all (non-NULL) attribute values from all buckets
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all buckets
+ *
+ * (a) replace min/max values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, as we're mixing different
+ * datatypes, and we need to use the right operators to compare/sort them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ *
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
+bytea *
+serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i = 0, j = 0;
+ Size total_length = 0;
+
+ bytea *output = NULL;
+ char *data = NULL;
+
+ DimensionInfo *info;
+ SortSupport ssup;
+
+ int nbuckets = histogram->nbuckets;
+ int ndims = histogram->ndimensions;
+
+ /* allocated for serialized bucket data */
+ int bucketsize = BUCKET_SIZE(ndims);
+ char *bucket = palloc0(bucketsize);
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ info = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension separately */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /*
+ * Allocate space for all min/max values, including NULLs (we won't use
+ * them, but we don't know how many are there), and then collect all
+ * non-NULL values.
+ */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * nbuckets * 2);
+
+ for (j = 0; j < histogram->nbuckets; j++)
+ {
+ /* skip buckets where this dimension is NULL-only */
+ if (! histogram->buckets[j]->nullsonly[i])
+ {
+ values[i][counts[i]] = histogram->buckets[j]->min[i];
+ counts[i] += 1;
+
+ values[i][counts[i]] = histogram->buckets[j]->max[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* make sure we fit into uint16 */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typlen > 0)
+ /* byval or byref, but with fixed length (name, tid, ...) */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so simply strlen */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j]));
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * histogram, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nbuckets (4B)
+ * - info (ndim * sizeof(DimensionInfo)
+ * - arrays of values for each dimension
+ * - serialized buckets (nbuckets * bucketsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data (and buckets).
+ */
+ total_length = (sizeof(int32) + offsetof(MVHistogramData, buckets)
+ + ndims * sizeof(DimensionInfo)
+ + nbuckets * bucketsize);
+
+ /* account for the deduplicated data */
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 1MB */
+ if (total_length > (1024 * 1024))
+ elog(ERROR, "serialized histogram exceeds 1MB (%ld > %d)",
+ total_length, (1024 * 1024));
+
+ /* allocate space for the serialized histogram list, set header */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write data */
+ data = VARDATA(output);
+
+ memcpy(data, histogram, offsetof(MVHistogramData, buckets));
+ data += offsetof(MVHistogramData, buckets);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* serialize the deduplicated values for all attributes */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ Datum v = values[i][j];
+
+ if (info[i].typbyval) /* passed by value */
+ {
+ memcpy(data, &v, info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen > 0) /* passed by reference */
+ {
+ memcpy(data, DatumGetPointer(v), info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1) /* varlena */
+ {
+ memcpy(data, DatumGetPointer(v), VARSIZE_ANY(v));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2) /* cstring */
+ {
+ memcpy(data, DatumGetPointer(v), strlen(DatumGetPointer(v))+1);
+ data += strlen(DatumGetPointer(v)) + 1;
+ }
+ }
+
+ /* make sure we got exactly the amount of data we expected */
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* finally serialize the items, with uint16 indexes instead of the values */
+ for (i = 0; i < nbuckets; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - bucketsize);
+
+ /* reset the values for each item */
+ memset(bucket, 0, bucketsize);
+
+ BUCKET_NTUPLES(bucket) = histogram->buckets[i]->ntuples;
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! histogram->buckets[i]->nullsonly[j])
+ {
+ uint16 idx;
+ Datum * v = NULL;
+
+ /* min boundary */
+ v = (Datum*)bsearch_arg(&histogram->buckets[i]->min[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ compare_scalars_simple, &ssup[j]);
+
+ Assert(v != NULL); /* serialization or deduplication error */
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MIN_INDEXES(bucket, ndims)[j] = idx;
+
+ /* max boundary */
+ v = (Datum*)bsearch_arg(&histogram->buckets[i]->max[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ compare_scalars_simple, &ssup[j]);
+
+ Assert(v != NULL); /* serialization or deduplication error */
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MAX_INDEXES(bucket, ndims)[j] = idx;
+ }
+ }
+
+ /* copy flags (nulls, min/max inclusive) */
+ memcpy(BUCKET_NULLS_ONLY(bucket, ndims),
+ histogram->buckets[i]->nullsonly, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MIN_INCL(bucket, ndims),
+ histogram->buckets[i]->min_inclusive, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MAX_INCL(bucket, ndims),
+ histogram->buckets[i]->max_inclusive, sizeof(bool) * ndims);
+
+ /* copy the item into the array */
+ memcpy(data, bucket, bucketsize);
+
+ data += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ /* free the values/counts arrays here */
+ pfree(counts);
+ pfree(info);
+ pfree(ssup);
+
+ for (i = 0; i < ndims; i++)
+ pfree(values[i]);
+
+ pfree(values);
+
+ return output;
+}
+
+/*
+ * Returns histogram in a partially-serialized form (keeps the boundary values
+ * deduplicated, so that it's possible to optimize the estimation part by
+ * caching function call results between buckets etc.).
+ */
+MVSerializedHistogram
+deserialize_mv_histogram(bytea * data)
+{
+ int i = 0, j = 0;
+
+ Size expected_size;
+ char *tmp = NULL;
+
+ MVSerializedHistogram histogram;
+ DimensionInfo *info;
+
+ int nbuckets;
+ int ndims;
+ int bucketsize;
+
+ /* temporary deserialization buffer */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVSerializedHistogramData,buckets))
+ elog(ERROR, "invalid histogram size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVSerializedHistogramData,buckets));
+
+ /* read the histogram header */
+ histogram
+ = (MVSerializedHistogram)palloc(sizeof(MVSerializedHistogramData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(histogram, tmp, offsetof(MVSerializedHistogramData, buckets));
+ tmp += offsetof(MVSerializedHistogramData, buckets);
+
+ if (histogram->magic != MVSTAT_HIST_MAGIC)
+ elog(ERROR, "invalid histogram magic %d (expected %dd)",
+ histogram->magic, MVSTAT_HIST_MAGIC);
+
+ if (histogram->type != MVSTAT_HIST_TYPE_BASIC)
+ elog(ERROR, "invalid histogram type %d (expected %dd)",
+ histogram->type, MVSTAT_HIST_TYPE_BASIC);
+
+ nbuckets = histogram->nbuckets;
+ ndims = histogram->ndimensions;
+ bucketsize = BUCKET_SIZE(ndims);
+
+ Assert((nbuckets > 0) && (nbuckets <= MVSTAT_HIST_MAX_BUCKETS));
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * What size do we expect with those parameters? (It's incomplete, as we
+ * have yet to add the array sizes from the DimensionInfo records.)
+ */
+ expected_size = offsetof(MVSerializedHistogramData,buckets) +
+ ndims * sizeof(DimensionInfo) +
+ (nbuckets * bucketsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /* a single buffer for all the values and counts */
+ bufflen = (sizeof(int) + sizeof(Datum*)) * ndims;
+
+ for (i = 0; i < ndims; i++)
+ /* don't allocate space for byval types, matching Datum */
+ if (! (info[i].typbyval && (info[i].typlen == sizeof(Datum))))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+
+ /* also, include space for the result, tracking the buckets */
+ bufflen += nbuckets * (
+ sizeof(MVSerializedBucket) + /* bucket pointer */
+ sizeof(MVSerializedBucketData)); /* bucket data */
+
+ buff = palloc0(bufflen);
+ ptr = buff;
+
+ histogram->nvalues = (int*)ptr;
+ ptr += (sizeof(int) * ndims);
+
+ histogram->values = (Datum**)ptr;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mcv_list(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. This needs to
+ * copy the pieces.
+ *
+ * TODO same as in MCV deserialization / consider moving to common.c
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ histogram->nvalues[i] = info[i].nvalues;
+
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum - simply reuse the array */
+ if (info[i].typlen == sizeof(Datum))
+ {
+ histogram->values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ histogram->values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ memcpy(&histogram->values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ }
+ else
+ {
+ /* all the other types need a chunk of the buffer */
+ histogram->values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ histogram->buckets = (MVSerializedBucket*)ptr;
+ ptr += (sizeof(MVSerializedBucket) * nbuckets);
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ MVSerializedBucket bucket = (MVSerializedBucket)ptr;
+ ptr += sizeof(MVSerializedBucketData);
+
+ bucket->ntuples = BUCKET_NTUPLES(tmp);
+ bucket->nullsonly = BUCKET_NULLS_ONLY(tmp, ndims);
+ bucket->min_inclusive = BUCKET_MIN_INCL(tmp, ndims);
+ bucket->max_inclusive = BUCKET_MAX_INCL(tmp, ndims);
+
+ bucket->min = BUCKET_MIN_INDEXES(tmp, ndims);
+ bucket->max = BUCKET_MAX_INDEXES(tmp, ndims);
+
+ histogram->buckets[i] = bucket;
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+
+ tmp += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((tmp - VARDATA(data)) == expected_size);
+
+ /* we should exhaust the output buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ return histogram;
+}
+
+/*
+ * Build the initial bucket, which will be then split into smaller ones.
+ */
+static MVBucket
+create_initial_mv_bucket(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+ HistogramBuild data = NULL;
+
+ /* TODO allocate bucket as a single piece, including all the fields. */
+ MVBucket bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ Assert(numrows > 0);
+ Assert(rows != NULL);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /* allocate the per-dimension arrays */
+
+ /* flags for null-only dimensions */
+ bucket->nullsonly = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ bucket->min_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+ bucket->max_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* lower/upper boundaries */
+ bucket->min = (Datum*)palloc0(numattrs * sizeof(Datum));
+ bucket->max = (Datum*)palloc0(numattrs * sizeof(Datum));
+
+ /* build-data */
+ data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /* number of distinct values (per dimension) */
+ data->ndistincts = (uint32*)palloc0(numattrs * sizeof(uint32));
+
+ /* all the sample rows fall into the initial bucket */
+ data->numrows = numrows;
+ data->rows = rows;
+
+ bucket->build_data = data;
+
+ /*
+ * Update the number of ndistinct combinations in the bucket (which we use
+ * when selecting a bucket to partition), and then the number of distinct values
+ * for each dimension (which we use when choosing which dimension to split).
+ */
+ update_bucket_ndistinct(bucket, attrs, stats);
+
+ /* Update ndistinct (and also set min/max) for all dimensions. */
+ for (i = 0; i < numattrs; i++)
+ update_dimension_ndistinct(bucket, i, attrs, stats, true);
+
+ return bucket;
+}
+
+/*
+ * Choose the bucket to partition next.
+ *
+ * The current criterion is rather simple, chosen so that the algorithm produces
+ * buckets with about equal frequency and regular size. We select the bucket
+ * with the highest number of distinct values, and then split it by the longest
+ * dimension.
+ *
+ * The distinct values are uniformly mapped to [0,1] interval, and this is used
+ * to compute length of the value range.
+ *
+ * NOTE: This is not the same array used for deduplication, as this contains
+ * values for all the tuples from the sample, not just the boundary values.
+ *
+ * Returns either pointer to the bucket selected to be partitioned, or NULL if
+ * there are no buckets that may be split (e.g. if all buckets are too small
+ * or contain too few distinct values).
+ *
+ *
+ * Tricky example
+ * --------------
+ *
+ * Consider this table:
+ *
+ * CREATE TABLE t AS SELECT i AS a, i AS b
+ * FROM generate_series(1,1000000) s(i);
+ *
+ * CREATE STATISTICS s1 ON t (a,b) WITH (histogram);
+ *
+ * ANALYZE t;
+ *
+ * It's a very specific (and perhaps artificial) example, because every bucket
+ * always has exactly the same number of distinct values in all dimensions,
+ * which makes the partitioning tricky.
+ *
+ * Then:
+ *
+ * SELECT * FROM t WHERE (a < 100) AND (b < 100);
+ *
+ * is estimated to return ~120 rows, while in reality it returns only 99.
+ *
+ * QUERY PLAN
+ * -------------------------------------------------------------
+ * Seq Scan on t (cost=0.00..19425.00 rows=117 width=8)
+ * (actual time=0.129..82.776 rows=99 loops=1)
+ * Filter: ((a < 100) AND (b < 100))
+ * Rows Removed by Filter: 999901
+ * Planning time: 1.286 ms
+ * Execution time: 82.984 ms
+ * (5 rows)
+ *
+ * So this estimate is reasonably close. Let's change the query to OR clause:
+ *
+ * SELECT * FROM t WHERE (a < 100) OR (b < 100);
+ *
+ * QUERY PLAN
+ * -------------------------------------------------------------
+ * Seq Scan on t (cost=0.00..19425.00 rows=8100 width=8)
+ * (actual time=0.145..99.910 rows=99 loops=1)
+ * Filter: ((a < 100) OR (b < 100))
+ * Rows Removed by Filter: 999901
+ * Planning time: 1.578 ms
+ * Execution time: 100.132 ms
+ * (5 rows)
+ *
+ * That's clearly a much worse estimate. This happens because the histogram
+ * contains buckets like this:
+ *
+ * bucket 592 [3 30310] [30134 30593] => [0.000233]
+ *
+ * i.e. the length of "a" dimension is (30310-3)=30307, while the length of "b"
+ * is (30593-30134)=459. So the "b" dimension is much narrower than "a".
+ * Of course, there are also buckets where "b" is the wider dimension.
+ *
+ * This is partially mitigated by selecting the "longest" dimension but that
+ * only happens after we already selected the bucket. So if we never select the
+ * bucket, this optimization does not apply.
+ *
+ * The other reason why this particular example behaves so poorly is due to the
+ * way we actually split the selected bucket. We do attempt to divide the bucket
+ * into two parts containing about the same number of tuples, but that does not
+ * work too well when most of the tuples are squashed on one side of the bucket.
+ *
+ * For example for columns with data on the diagonal (i.e. when a=b), we end up
+ * with a narrow bucket on the diagonal and a huge bucket covering the remaining
+ * part (with much lower density).
+ *
+ * So perhaps we need two partitioning strategies - one aiming to split buckets
+ * with high frequency (number of sampled rows), the other aiming to split
+ * "large" buckets. And alternating between them, somehow.
+ *
+ * TODO Consider using a similar lower boundary for row count as for simple
+ * histograms, i.e. 300 tuples per bucket.
+ */
+static MVBucket
+select_bucket_to_partition(int nbuckets, MVBucket * buckets)
+{
+ int i;
+ int numrows = 0;
+ MVBucket bucket = NULL;
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ HistogramBuild data = (HistogramBuild)buckets[i]->build_data;
+
+ /* if the number of rows is higher, use this bucket */
+ if ((data->ndistinct > 2) &&
+ (data->numrows > numrows) &&
+ (data->numrows >= MIN_BUCKET_ROWS)) {
+ bucket = buckets[i];
+ numrows = data->numrows;
+ }
+ }
+
+ /* may be NULL if there are no buckets with (ndistinct > 2) */
+ return bucket;
+}
+
+/*
+ * A simple bucket partitioning implementation - we choose the longest bucket
+ * dimension, measured using the array of distinct values built at the very
+ * beginning of the build.
+ *
+ * We map all the distinct values to a [0,1] interval, uniformly distributed,
+ * and then use this to measure length. It's essentially a number of distinct
+ * values within the range, normalized to [0,1].
+ *
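+ * For example, if a dimension has 100 distinct values in the sample, and
+ * the bucket boundaries happen to be the 20th and 50th of those values,
+ * the bucket has length (50 - 20) / 100 = 0.3 in that dimension.
+ *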
+ * Then we choose a 'middle' value splitting the bucket into two parts with
+ * roughly the same frequency.
+ *
+ * This splits the bucket by tweaking the existing one, and returning the new
+ * bucket (essentially shrinking the existing one in-place and returning the
+ * other "half" as a new bucket). The caller is responsible for adding the new
+ * bucket into the list of buckets.
+ *
+ * There are multiple histogram options, centered around the partitioning
+ * criteria, specifying both how to choose a bucket and the dimension most in
+ * need of a split. For a nice summary and general overview, see "rK-Hist : an
+ * R-Tree based histogram for multi-dimensional selectivity estimation" thesis
+ * by J. A. Lopez, Concordia University, p.34-37 (and possibly p. 32-34 for
+ * explanation of the terms).
+ *
+ * It requires care to prevent splitting only one dimension and not splitting
+ * another one at all (which might happen easily in case of strongly dependent
+ * columns - e.g. y=x). The current algorithm minimizes this, but may still
+ * happen for perfectly dependent examples (when all the dimensions have equal
+ * length, the first one will be selected).
+ *
+ * TODO Should probably consider statistics target for the columns (e.g.
+ * to split dimensions with higher statistics target more frequently).
+ */
+static MVBucket
+partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues)
+{
+ int i;
+ int dimension;
+ int numattrs = attrs->dim1;
+
+ Datum split_value;
+ MVBucket new_bucket;
+ HistogramBuild new_data;
+
+ /* needed for sort, when looking for the split value */
+ bool isNull;
+ int nvalues = 0;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ StdAnalyzeData * mystats = NULL;
+ ScalarItem * values = (ScalarItem*)palloc0(data->numrows * sizeof(ScalarItem));
+ SortSupportData ssup;
+
+ int nrows = 1; /* number of rows below current value */
+ double delta;
+
+ /* needed when splitting the values */
+ HeapTuple * oldrows = data->rows;
+ int oldnrows = data->numrows;
+
+ /*
+ * We can't split buckets with a single distinct value (this also
+ * disqualifies NULL-only dimensions). Also, there have to be multiple
+ * sample rows (otherwise there couldn't be multiple distinct values).
+ */
+ Assert(data->ndistinct > 1);
+ Assert(data->numrows > 1);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /* Look for the next dimension to split. */
+ delta = 0.0;
+ dimension = -1;
+
+ for (i = 0; i < numattrs; i++)
+ {
+ Datum *a, *b;
+
+ mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ /* can't split NULL-only dimension */
+ if (bucket->nullsonly[i])
+ continue;
+
+ /* can't split dimension with a single ndistinct value */
+ if (data->ndistincts[i] <= 1)
+ continue;
+
+ /* search for min boundary in the distinct list */
+ a = (Datum*)bsearch_arg(&bucket->min[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), compare_scalars_simple, &ssup);
+
+ b = (Datum*)bsearch_arg(&bucket->max[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), compare_scalars_simple, &ssup);
+
+ /* if this dimension is 'larger' then partition by it */
+ if (((b-a)*1.0 / ndistvalues[i]) > delta)
+ {
+ delta = ((b-a)*1.0 / ndistvalues[i]);
+ dimension = i;
+ }
+ }
+
+ /*
+ * If we haven't found a dimension here, we've done something
+ * wrong in select_bucket_to_partition.
+ */
+ Assert(dimension != -1);
+
+ /*
+ * Walk through the selected dimension, collect and sort the values and
+ * then choose the value to use as the new boundary.
+ */
+ mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (i = 0; i < data->numrows; i++)
+ {
+ /* remember the index of the sample row, to make the partitioning simpler */
+ values[nvalues].value = heap_getattr(data->rows[i], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+ values[nvalues].tupno = i;
+
+ /* no NULL values allowed here (we never split null-only dimension) */
+ Assert(!isNull);
+
+ nvalues++;
+ }
+
+ /* sort the array of values */
+ qsort_arg((void *) values, nvalues, sizeof(ScalarItem),
+ compare_scalars_partition, (void *) &ssup);
+
+ /*
+ * Walk through the sorted array and consider each distinct-value
+ * boundary, checking how close it is to splitting the sample rows in
+ * half. The boundary closest to (numrows/2) wins, and the distinct value
+ * starting there becomes the split value, used as an exclusive upper
+ * boundary (and inclusive lower boundary).
+ *
+ * TODO Maybe we should use "average" of the two middle distinct values
+ * (at least for even distinct counts), but that would require being
+ * able to do an average (which does not work for non-numeric types).
+ *
+ * TODO Another option would be to split by distinct values, i.e. to put
+ * about (ndistinct/2) distinct values into each partition. The
+ * tuple-based split used here should however work better when there
+ * are a few very frequent values, and many rare ones.
+ */
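+ /*
+ * Illustration (hypothetical sample): with six sorted values
+ * {1, 1, 2, 2, 3, 3}, the candidate boundaries are i=2 (first '2') and
+ * i=4 (first '3'), both at distance 1 from numrows/2 = 3. The first one
+ * wins, so split_value = 2 and the two '1' rows stay in this bucket.
+ */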
+ delta = fabs(data->numrows);
+ split_value = values[0].value;
+
+ for (i = 1; i < data->numrows; i++)
+ {
+ if (values[i].value != values[i-1].value)
+ {
+ /* are we closer to splitting the bucket in half? */
+ if (fabs(i - data->numrows/2.0) < delta)
+ {
+ /* let's assume we'll use this value for the split */
+ split_value = values[i].value;
+ delta = fabs(i - data->numrows/2.0);
+ nrows = i;
+ }
+ }
+ }
+
+ Assert(nrows > 0);
+ Assert(nrows < data->numrows);
+
+ /* create the new bucket as an (incomplete) copy of the one being partitioned */
+ new_bucket = copy_mv_bucket(bucket, numattrs);
+ new_data = (HistogramBuild)new_bucket->build_data;
+
+ /*
+ * Do the actual split of the chosen dimension, using the split value as the
+ * upper bound for the existing bucket, and lower bound for the new one.
+ */
+ bucket->max[dimension] = split_value;
+ new_bucket->min[dimension] = split_value;
+
+ bucket->max_inclusive[dimension] = false;
+ new_bucket->min_inclusive[dimension] = true;
+
+ /*
+ * Redistribute the sample tuples using the 'ScalarItem->tupno' index. We
+ * know 'nrows' rows should remain in the original bucket and the rest goes
+ * to the new one.
+ */
+
+ data->rows = (HeapTuple*)palloc0(nrows * sizeof(HeapTuple));
+ new_data->rows = (HeapTuple*)palloc0((oldnrows - nrows) * sizeof(HeapTuple));
+
+ data->numrows = nrows;
+ new_data->numrows = (oldnrows - nrows);
+
+ /*
+ * The first nrows should go to the first bucket, the rest should go to the
+ * new one. Use the tupno field to get the actual HeapTuple row from the
+ * original array of sample rows.
+ */
+ for (i = 0; i < nrows; i++)
+ memcpy(&data->rows[i], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ for (i = nrows; i < oldnrows; i++)
+ memcpy(&new_data->rows[i-nrows], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(new_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each partition.
+ */
+ for (i = 0; i < numattrs; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(new_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+ pfree(values);
+
+ return new_bucket;
+}
+
+/*
+ * Copy a histogram bucket. The copy does not include the build-time data, i.e.
+ * sampled rows etc.
+ */
+static MVBucket
+copy_mv_bucket(MVBucket bucket, uint32 ndimensions)
+{
+ /* TODO allocate as a single piece (including all the fields) */
+ MVBucket new_bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+ HistogramBuild data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /*
+ * Copy only the attributes that will stay the same after the split;
+ * the rest will be recomputed afterwards.
+ */
+
+ /* allocate the per-dimension arrays */
+ new_bucket->nullsonly = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ new_bucket->min_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+ new_bucket->max_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* lower/upper boundaries */
+ new_bucket->min = (Datum*)palloc0(ndimensions * sizeof(Datum));
+ new_bucket->max = (Datum*)palloc0(ndimensions * sizeof(Datum));
+
+ /* copy data */
+ memcpy(new_bucket->nullsonly, bucket->nullsonly, ndimensions * sizeof(bool));
+
+ memcpy(new_bucket->min_inclusive, bucket->min_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->min, bucket->min, ndimensions*sizeof(Datum));
+
+ memcpy(new_bucket->max_inclusive, bucket->max_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->max, bucket->max, ndimensions*sizeof(Datum));
+
+ /* allocate the interesting part of the build data */
+ data->ndistincts = (uint32*)palloc0(ndimensions * sizeof(uint32));
+
+ new_bucket->build_data = data;
+
+ return new_bucket;
+}
+
+/*
+ * Count the number of distinct values in the bucket. This just copies the
+ * Datum values into a simple array, and sorts them using a memcmp-based
+ * comparator. That means it only works for pass-by-value data types
+ * (assuming they don't use collations etc.)
+ */
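+/*
+ * Illustration (hypothetical sample): for rows {(1,2), (1,2), (2,3)} the
+ * sort places the two (1,2) rows next to each other, so only one of the two
+ * adjacent comparisons differs and ndistinct ends up as 2.
+ */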
+static void
+update_bucket_ndistinct(MVBucket bucket, int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ int numrows = data->numrows;
+
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * We could have collected this while walking through the attributes
+ * earlier (as it is, we have to call heap_getattr twice per value).
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(numrows * sizeof(Datum) * numattrs);
+ bool *isnull = (bool*)palloc0(numrows * sizeof(bool) * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* prepare the sort functions for all the dimensions */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* collect the values */
+ for (i = 0; i < numrows; i++)
+ for (j = 0; j < numattrs; j++)
+ items[i].values[j]
+ = heap_getattr(data->rows[i], attrs->values[j],
+ stats[j]->tupDesc, &items[i].isnull[j]);
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ data->ndistinct = 1;
+
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ data->ndistinct += 1;
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+}
+
+/*
+ * Count distinct values per bucket dimension.
+ */
+static void
+update_dimension_ndistinct(MVBucket bucket, int dimension, int2vector *attrs,
+ VacAttrStats ** stats, bool update_boundaries)
+{
+ int j;
+ int nvalues = 0;
+ bool isNull;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ Datum * values = (Datum*)palloc0(data->numrows * sizeof(Datum));
+ SortSupportData ssup;
+
+ StdAnalyzeData * mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* we may already know this is a NULL-only dimension */
+ if (bucket->nullsonly[dimension])
+ data->ndistincts[dimension] = 1;
+
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (j = 0; j < data->numrows; j++)
+ {
+ values[nvalues] = heap_getattr(data->rows[j], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+
+ /* ignore NULL values */
+ if (! isNull)
+ nvalues++;
+ }
+
+ /* there's always at least 1 distinct value (may be NULL) */
+ data->ndistincts[dimension] = 1;
+
+ /* if there are only NULL values in the column, mark the dimension as
+ * NULL-only and bail out */
+ if (nvalues == 0)
+ {
+ pfree(values);
+ bucket->nullsonly[dimension] = true;
+ return;
+ }
+
+ /* sort the array of (pass-by-value) datums */
+ qsort_arg((void *) values, nvalues, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /*
+ * Update min/max boundaries to the smallest bounding box. Generally, this
+ * needs to be done only when constructing the initial bucket.
+ */
+ if (update_boundaries)
+ {
+ /* store the min/max values */
+ bucket->min[dimension] = values[0];
+ bucket->min_inclusive[dimension] = true;
+
+ bucket->max[dimension] = values[nvalues-1];
+ bucket->max_inclusive[dimension] = true;
+ }
+
+ /*
+ * Walk through the array and count distinct values by comparing
+ * succeeding values.
+ *
+ * FIXME This only works for pass-by-value types (i.e. not VARCHARs
+ * etc.). Although thanks to the deduplication it might work
+ * even for those types (equal values will get the same item
+ * in the deduplicated array).
+ */
+ for (j = 1; j < nvalues; j++) {
+ if (values[j] != values[j-1])
+ data->ndistincts[dimension] += 1;
+ }
+
+ pfree(values);
+}
+
+/*
+ * A properly built histogram must not contain buckets mixing NULL and non-NULL
+ * values in a single dimension. Each dimension either is marked as 'nulls
+ * only' (and thus contains only NULL values), or it contains no NULL
+ * values at all.
+ *
+ * Therefore, if the sample contains NULL values in any of the columns, it's
+ * necessary to build those NULL-buckets. This is done in an iterative way
+ * using this algorithm, operating on a single bucket:
+ *
+ * (1) Check that all dimensions are well-formed (not mixing NULL and
+ * non-NULL values).
+ *
+ * (2) If all dimensions are well-formed, terminate.
+ *
+ * (3) If a dimension contains only NULL values but is not yet marked as
+ * NULL-only, mark it as NULL-only and run the algorithm again (on
+ * this bucket).
+ *
+ * (4) If a dimension mixes NULL and non-NULL values, split the bucket
+ * into two parts - one with the NULL values, one with the non-NULL
+ * values (replacing the current one). Then run the algorithm on both
+ * buckets.
+ *
+ * This is executed in a recursive manner, but the number of executions should
+ * be quite low - limited by the number of NULL-buckets. Also, in each branch
+ * the number of nested calls is limited by the number of dimensions
+ * (attributes) of the histogram.
+ *
+ * At the end, there should be buckets with no mixed dimensions. The number of
+ * buckets produced by this algorithm is rather limited - with N dimensions,
+ * there may be at most 2^N such buckets (each dimension may be either NULL
+ * or non-NULL). So with 8 dimensions (the current value of
+ * MVSTATS_MAX_DIMENSIONS) there may be at most 256 such buckets.
+ *
+ * After this, a 'regular' bucket-split algorithm shall run, further optimizing
+ * the histogram.
+ */
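+/*
+ * Illustration (hypothetical, two dimensions): a bucket with sample rows
+ * {(1, NULL), (2, 3), (NULL, NULL)} first hits the NULL in dimension 1 (in
+ * the first row), so it gets split into {(2, 3)} and
+ * {(1, NULL), (NULL, NULL)}, marking dimension 1 as NULL-only in the latter.
+ * Recursing into that bucket then splits dimension 0, leaving three buckets
+ * {(2, 3)}, {(1, NULL)} and {(NULL, NULL)} with no mixed dimensions.
+ */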
+static void
+create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int null_dim = -1;
+ int null_count = 0;
+ bool null_found = false;
+ MVBucket bucket, null_bucket;
+ int null_idx, curr_idx;
+ HistogramBuild data, null_data;
+
+ /* remember original values from the bucket */
+ int numrows;
+ HeapTuple *oldrows = NULL;
+
+ Assert(bucket_idx < histogram->nbuckets);
+ Assert(histogram->ndimensions == attrs->dim1);
+
+ bucket = histogram->buckets[bucket_idx];
+ data = (HistogramBuild)bucket->build_data;
+
+ numrows = data->numrows;
+ oldrows = data->rows;
+
+ /*
+ * Walk through all rows / dimensions, and stop once we find NULL in a
+ * dimension not yet marked as NULL-only.
+ */
+ for (i = 0; i < data->numrows; i++)
+ {
+ /*
+ * FIXME We don't need to start from the first attribute here - we can
+ * start from the last known dimension.
+ */
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ /* Is this a NULL-only dimension? If yes, skip. */
+ if (bucket->nullsonly[j])
+ continue;
+
+ /* found a NULL in that dimension? */
+ if (heap_attisnull(data->rows[i], attrs->values[j]))
+ {
+ null_found = true;
+ null_dim = j;
+ break;
+ }
+ }
+
+ /* terminate if we found an attribute with NULL values */
+ if (null_found)
+ break;
+ }
+
+ /* no regular dimension contains NULL values => we're done */
+ if (! null_found)
+ return;
+
+ /* walk through the rows again, count NULL values in 'null_dim' */
+ for (i = 0; i < data->numrows; i++)
+ {
+ if (heap_attisnull(data->rows[i], attrs->values[null_dim]))
+ null_count += 1;
+ }
+
+ Assert(null_count <= data->numrows);
+
+ /*
+ * If (null_count == numrows) the dimension already is NULL-only, but is
+ * not yet marked as such. It's enough to mark it and repeat the process
+ * recursively (until we run out of dimensions).
+ */
+ if (null_count == data->numrows)
+ {
+ bucket->nullsonly[null_dim] = true;
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+ return;
+ }
+
+ /*
+ * We have to split the bucket into two - one with NULL values in the
+ * dimension, one with non-NULL values. We don't need to sort the data or
+ * anything, but otherwise it's similar to what partition_bucket() does.
+ */
+
+ /* create bucket with NULL-only dimension 'dim' */
+ null_bucket = copy_mv_bucket(bucket, histogram->ndimensions);
+ null_data = (HistogramBuild)null_bucket->build_data;
+
+ /* remember the current array info */
+ oldrows = data->rows;
+ numrows = data->numrows;
+
+ /* we'll keep non-NULL values in the current bucket */
+ data->numrows = (numrows - null_count);
+ data->rows
+ = (HeapTuple*)palloc0(data->numrows * sizeof(HeapTuple));
+
+ /* and the NULL values will go to the new one */
+ null_data->numrows = null_count;
+ null_data->rows
+ = (HeapTuple*)palloc0(null_data->numrows * sizeof(HeapTuple));
+
+ /* mark the dimension as NULL-only (in the new bucket) */
+ null_bucket->nullsonly[null_dim] = true;
+
+ /* walk through the sample rows and distribute them accordingly */
+ null_idx = 0;
+ curr_idx = 0;
+ for (i = 0; i < numrows; i++)
+ {
+ if (heap_attisnull(oldrows[i], attrs->values[null_dim]))
+ /* NULL => copy to the new bucket */
+ memcpy(&null_data->rows[null_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ else
+ memcpy(&data->rows[curr_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ }
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(null_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each bucket (NULL
+ * is not a value, so NULL buckets get 0, and the other bucket got all
+ * the distinct values).
+ */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(null_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+
+ /* add the NULL bucket to the histogram */
+ histogram->buckets[histogram->nbuckets++] = null_bucket;
+
+ /*
+ * And now run the function recursively on both buckets (the new
+ * one first, because recursive calls may append more buckets, and
+ * (nbuckets-1) would then no longer point at the new bucket).
+ */
+ create_null_buckets(histogram, (histogram->nbuckets-1), attrs, stats);
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+}
+
+/*
+ * SRF with details about buckets of a histogram:
+ *
+ * - bucket ID (0 .. nbuckets-1)
+ * - min values (string array)
+ * - max values (string array)
+ * - nulls only (boolean array)
+ * - min inclusive flags (boolean array)
+ * - max inclusive flags (boolean array)
+ * - frequency (double precision)
+ *
+ * The input is the OID of the statistics, and no rows are returned if the
+ * statistics contains no histogram (or if there's no statistics for the OID).
+ *
+ * The second parameter (type) determines what values will be returned
+ * in the (minvals,maxvals). There are three possible values:
+ *
+ * 0 (actual values)
+ * -----------------
+ * - prints actual values
+ * - using the output function of the data type (as string)
+ * - handy for investigating the histogram
+ *
+ * 1 (distinct index)
+ * ------------------
+ * - prints index of the distinct value (into the serialized array)
+ * - makes it easier to spot neighbor buckets, etc.
+ * - handy for plotting the histogram
+ *
+ * 2 (normalized distinct index)
+ * -----------------------------
+ * - prints index of the distinct value, but normalized into [0,1]
+ * - similar to 1, but shows how 'long' the bucket range is
+ * - handy for plotting the histogram
+ *
+ * When plotting the histogram, be careful as the (1) and (2) options skew the
+ * lengths by distributing the distinct values uniformly. For data types
+ * without a clear meaning of 'distance' (e.g. strings) that is not a big deal,
+ * but for numbers it may be confusing.
+ */
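+/*
+ * Example usage (illustrative; assumes at least one statistics entry with a
+ * built histogram exists):
+ *
+ * SELECT * FROM pg_mv_histogram_buckets(
+ * (SELECT oid FROM pg_mv_statistic LIMIT 1), 0);
+ */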
+PG_FUNCTION_INFO_V1(pg_mv_histogram_buckets);
+
+#define OUTPUT_FORMAT_RAW 0 /* actual values */
+#define OUTPUT_FORMAT_INDEXES 1 /* distinct indexes */
+#define OUTPUT_FORMAT_DISTINCT 2 /* normalized distinct indexes */
+
+Datum
+pg_mv_histogram_buckets(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ Oid mvoid = PG_GETARG_OID(0);
+ int otype = PG_GETARG_INT32(1);
+
+ if ((otype < 0) || (otype > 2))
+ elog(ERROR, "invalid output type specified");
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MVSerializedHistogram histogram;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ histogram = load_mv_histogram(mvoid);
+
+ funcctx->user_fctx = histogram;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = histogram->nbuckets;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples
+ * from raw C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+ double bucket_volume = 1.0;
+ StringInfo bufs;
+
+ char *format;
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MVSerializedHistogram histogram;
+ MVSerializedBucket bucket;
+
+ histogram = (MVSerializedHistogram)funcctx->user_fctx;
+
+ Assert(call_cntr < histogram->nbuckets);
+
+ bucket = histogram->buckets[call_cntr];
+
+ stakeys = find_mv_attnums(mvoid, &relid);
+
+ /*
+ * The scalar values will be formatted directly, using snprintf.
+ *
+ * The 'array' values will be formatted through StringInfo.
+ */
+ values = (char **) palloc0(9 * sizeof(char *));
+ bufs = (StringInfo) palloc0(9 * sizeof(StringInfoData));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ initStringInfo(&bufs[1]); /* lower boundaries */
+ initStringInfo(&bufs[2]); /* upper boundaries */
+ initStringInfo(&bufs[3]); /* nulls-only */
+ initStringInfo(&bufs[4]); /* lower inclusive */
+ initStringInfo(&bufs[5]); /* upper inclusive */
+
+ values[6] = (char *) palloc(64 * sizeof(char));
+ values[7] = (char *) palloc(64 * sizeof(char));
+ values[8] = (char *) palloc(64 * sizeof(char));
+
+ /* we need to do this only when printing the actual values */
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * histogram->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * histogram->ndimensions);
+
+ /*
+ * lookup output functions for all histogram dimensions
+ *
+ * XXX This might be done once in the first call and stored in user_fctx.
+ */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* bucket ID */
+
+ /* for the arrays of lower/upper boundaries, formatted according to otype */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ Datum *vals = histogram->values[i];
+
+ uint16 minidx = bucket->min[i];
+ uint16 maxidx = bucket->max[i];
+
+ /* compute bucket volume, using distinct values as a measure
+ *
+ * XXX Not really sure what to do for NULL dimensions here, so let's
+ * simply count them as '1'.
+ */
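+ /* e.g. (illustrative) minidx = 2, maxidx = 5 and nvalues = 11 give
+ * (5 - 2 + 1) / 10 = 0.4 for this dimension */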
+ bucket_volume
+ *= (double)(maxidx - minidx + 1) / (histogram->nvalues[i]-1);
+
+ if (i == 0)
+ format = "{%s"; /* fist dimension */
+ else if (i < (histogram->ndimensions - 1))
+ format = ", %s"; /* medium dimensions */
+ else
+ format = ", %s}"; /* last dimension */
+
+ appendStringInfo(&bufs[3], format, bucket->nullsonly[i] ? "t" : "f");
+ appendStringInfo(&bufs[4], format, bucket->min_inclusive[i] ? "t" : "f");
+ appendStringInfo(&bufs[5], format, bucket->max_inclusive[i] ? "t" : "f");
+
+ /* for a NULL-only dimension, simply print NULL and continue */
+ if (bucket->nullsonly[i])
+ {
+ if (i == 0)
+ format = "{%s";
+ else if (i < (histogram->ndimensions - 1))
+ format = ", %s";
+ else
+ format = ", %s}";
+
+ appendStringInfo(&bufs[1], format, "NULL");
+ appendStringInfo(&bufs[2], format, "NULL");
+
+ continue;
+ }
+
+ /* otherwise we really need to format the value */
+ switch (otype)
+ {
+ case OUTPUT_FORMAT_RAW: /* actual boundary values */
+
+ if (i == 0)
+ format = "{%s";
+ else if (i < (histogram->ndimensions - 1))
+ format = ", %s";
+ else
+ format = ", %s}";
+
+ appendStringInfo(&bufs[1], format,
+ OutputFunctionCall(&fmgrinfo[i], vals[minidx]));
+
+ appendStringInfo(&bufs[2], format,
+ OutputFunctionCall(&fmgrinfo[i], vals[maxidx]));
+
+ break;
+
+ case OUTPUT_FORMAT_INDEXES: /* indexes into deduplicated arrays */
+
+ if (i == 0)
+ format = "{%d";
+ else if (i < (histogram->ndimensions - 1))
+ format = ", %d";
+ else
+ format = ", %d}";
+
+ appendStringInfo(&bufs[1], format, minidx);
+
+ appendStringInfo(&bufs[2], format, maxidx);
+
+ break;
+
+ case OUTPUT_FORMAT_DISTINCT: /* distinct arrays as measure */
+
+ if (i == 0)
+ format = "{%f";
+ else if (i < (histogram->ndimensions - 1))
+ format = ", %f";
+ else
+ format = ", %f}";
+
+ appendStringInfo(&bufs[1], format,
+ (minidx * 1.0 / (histogram->nvalues[i]-1)));
+
+ appendStringInfo(&bufs[2], format,
+ (maxidx * 1.0 / (histogram->nvalues[i]-1)));
+
+ break;
+
+ default:
+ elog(ERROR, "unknown output type: %d", otype);
+ }
+ }
+
+ values[1] = bufs[1].data;
+ values[2] = bufs[2].data;
+ values[3] = bufs[3].data;
+ values[4] = bufs[4].data;
+ values[5] = bufs[5].data;
+
+ snprintf(values[6], 64, "%f", bucket->ntuples); /* frequency */
+ snprintf(values[7], 64, "%f", bucket->ntuples / bucket_volume); /* density */
+ snprintf(values[8], 64, "%f", bucket_volume); /* volume (as a fraction) */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[6]);
+ pfree(values[7]);
+ pfree(values[8]);
+
+ resetStringInfo(&bufs[1]);
+ resetStringInfo(&bufs[2]);
+ resetStringInfo(&bufs[3]);
+ resetStringInfo(&bufs[4]);
+ resetStringInfo(&bufs[5]);
+
+ pfree(bufs);
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
+
+#ifdef DEBUG_MVHIST
+/*
+ * prints debugging info about matched histogram buckets (full/partial)
+ *
+ * XXX Currently works only for INT data type.
+ */
+void
+debug_histogram_matches(MVSerializedHistogram mvhist, char *matches)
+{
+ int i, j;
+
+ float ffull = 0, fpartial = 0;
+ int nfull = 0, npartial = 0;
+
+ StringInfoData buf;
+
+ initStringInfo(&buf);
+
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ if (! matches[i])
+ continue;
+
+ /* increment the counters */
+ nfull += (matches[i] == MVSTATS_MATCH_FULL) ? 1 : 0;
+ npartial += (matches[i] == MVSTATS_MATCH_PARTIAL) ? 1 : 0;
+
+ /* and also update the frequencies */
+ ffull += (matches[i] == MVSTATS_MATCH_FULL) ? bucket->ntuples : 0;
+ fpartial += (matches[i] == MVSTATS_MATCH_PARTIAL) ? bucket->ntuples : 0;
+
+ resetStringInfo(&buf);
+
+ /* build ranges for all the dimensions */
+ for (j = 0; j < mvhist->ndimensions; j++)
+ {
+ appendStringInfo(&buf, "[%d %d]",
+ DatumGetInt32(mvhist->values[j][bucket->min[j]]),
+ DatumGetInt32(mvhist->values[j][bucket->max[j]]));
+ }
+
+ elog(WARNING, "bucket %d %s => %d [%f]", i, buf.data, matches[i], bucket->ntuples);
+ }
+
+ elog(WARNING, "full=%f partial=%f (%f)", ffull, fpartial, (ffull + 0.5 * fpartial));
+}
+#endif
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 2c22d31..b693f36 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2109,9 +2109,9 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
- " deps_enabled, mcv_enabled,\n"
- " deps_built, mcv_built,\n"
- " mcv_max_items,\n"
+ " deps_enabled, mcv_enabled, hist_enabled,\n"
+ " deps_built, mcv_built, hist_built,\n"
+ " mcv_max_items, hist_max_buckets,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
@@ -2154,8 +2154,17 @@ describeOneTableDetails(const char *schemaname,
first = false;
}
+ if (!strcmp(PQgetvalue(result, i, 6), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", histogram");
+ else
+ appendPQExpBuffer(&buf, "(histogram");
+ first = false;
+ }
+
appendPQExpBuffer(&buf, ") ON (%s)",
- PQgetvalue(result, i, 9));
+ PQgetvalue(result, i, 12));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index 3529b03..7020772 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -39,13 +39,16 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
+ bool hist_enabled; /* build histogram? */
- /* MCV size */
+ /* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
+ int32 hist_max_buckets; /* max histogram buckets */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
+ bool hist_built; /* histogram was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -53,6 +56,7 @@ CATALOG(pg_mv_statistic,3381)
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
+ bytea stahist; /* MV histogram (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -68,18 +72,22 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 12
+#define Natts_pg_mv_statistic 16
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
#define Anum_pg_mv_statistic_staowner 4
#define Anum_pg_mv_statistic_deps_enabled 5
#define Anum_pg_mv_statistic_mcv_enabled 6
-#define Anum_pg_mv_statistic_mcv_max_items 7
-#define Anum_pg_mv_statistic_deps_built 8
-#define Anum_pg_mv_statistic_mcv_built 9
-#define Anum_pg_mv_statistic_stakeys 10
-#define Anum_pg_mv_statistic_stadeps 11
-#define Anum_pg_mv_statistic_stamcv 12
+#define Anum_pg_mv_statistic_hist_enabled 7
+#define Anum_pg_mv_statistic_mcv_max_items 8
+#define Anum_pg_mv_statistic_hist_max_buckets 9
+#define Anum_pg_mv_statistic_deps_built 10
+#define Anum_pg_mv_statistic_mcv_built 11
+#define Anum_pg_mv_statistic_hist_built 12
+#define Anum_pg_mv_statistic_stakeys 13
+#define Anum_pg_mv_statistic_stadeps 14
+#define Anum_pg_mv_statistic_stamcv 15
+#define Anum_pg_mv_statistic_stahist 16
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index f8ceabf..0ca4957 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2674,6 +2674,10 @@ DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f
DESCR("multi-variate statistics: MCV list info");
DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
DESCR("details about MCV list items");
+DATA(insert OID = 3375 ( pg_mv_stats_histogram_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_histogram_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: histogram info");
+DATA(insert OID = 3374 ( pg_mv_histogram_buckets PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 2 0 2249 "26 23" "{26,23,23,1009,1009,1000,1000,1000,701,701,701}" "{i,i,o,o,o,o,o,o,o,o,o}" "{oid,otype,index,minvals,maxvals,nullsonly,mininclusive,maxinclusive,frequency,density,bucket_volume}" _null_ _null_ pg_mv_histogram_buckets _null_ _null_ _null_ ));
+DESCR("details about histogram buckets");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 2bcd582..8c50bfb 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -654,10 +654,12 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
bool mcv_enabled; /* MCV list enabled */
+ bool hist_enabled; /* histogram enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
bool mcv_built; /* MCV list built */
+ bool hist_built; /* histogram built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index ce7c3ad..6708139 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -93,6 +93,123 @@ typedef MCVListData *MCVList;
#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
/*
+ * Multivariate histograms
+ */
+typedef struct MVBucketData {
+
+ /* Frequency of this bucket. */
+ float ntuples; /* frequency of the tuples in this bucket */
+
+ /*
+ * Information about dimensions being NULL-only. Not yet used.
+ */
+ bool *nullsonly;
+
+ /* lower boundaries - values and information about the inequalities */
+ Datum *min;
+ bool *min_inclusive;
+
+ /* upper boundaries - values and information about the inequalities */
+ Datum *max;
+ bool *max_inclusive;
+
+ /* used when building the histogram (not serialized/deserialized) */
+ void *build_data;
+
+} MVBucketData;
+
+typedef MVBucketData *MVBucket;
+
+
+typedef struct MVHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ MVBucket *buckets; /* array of buckets */
+
+} MVHistogramData;
+
+typedef MVHistogramData *MVHistogram;
+
+/*
+ * Histogram in a partially serialized form, with deduplicated boundary
+ * values etc.
+ *
+ * TODO add more detailed description here
+ */
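+/*
+ * For example (illustrative): if the deduplicated values for dimension 0 are
+ * values[0] = {10, 20, 30} (so nvalues[0] = 3), a bucket with min[0] = 0 and
+ * max[0] = 2 covers the range [10, 30] in that dimension.
+ */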
+
+typedef struct MVSerializedBucketData {
+
+ /* Frequency of this bucket. */
+ float ntuples; /* frequency of the tuples in this bucket */
+
+ /*
+ * Information about dimensions being NULL-only. Not yet used.
+ */
+ bool *nullsonly;
+
+ /* indexes of lower boundaries - values and information about the
+ * inequalities (exclusive vs. inclusive) */
+ uint16 *min;
+ bool *min_inclusive;
+
+ /* indexes of upper boundaries - values and information about the
+ * inequalities (exclusive vs. inclusive) */
+ uint16 *max;
+ bool *max_inclusive;
+
+} MVSerializedBucketData;
+
+typedef MVSerializedBucketData *MVSerializedBucket;
+
+typedef struct MVSerializedHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ /*
+ * keep this the same as in MVHistogramData, because deserialization
+ * relies on the same offsets
+ */
+ MVSerializedBucket *buckets; /* array of buckets */
+
+ /*
+ * serialized boundary values, one array per dimension, deduplicated
+ * (the min/max indexes point into these arrays)
+ */
+ int *nvalues;
+ Datum **values;
+
+} MVSerializedHistogramData;
+
+typedef MVSerializedHistogramData *MVSerializedHistogram;
+
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_HIST_MAGIC 0x7F8C5670 /* marks serialized bytea */
+#define MVSTAT_HIST_TYPE_BASIC 1 /* basic histogram type */
+
+/*
+ * Limits used for max_buckets option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_HIST_MIN_BUCKETS, and we cannot
+ * have more than MVSTAT_HIST_MAX_BUCKETS buckets.
+ *
+ * This is just a boundary for the 'max' threshold - the actual
+ * histogram may use fewer buckets than MVSTAT_HIST_MAX_BUCKETS.
+ *
+ * TODO The MVSTAT_HIST_MIN_BUCKETS should be related to the number of
+ * attributes (MVSTATS_MAX_DIMENSIONS) because of NULL-buckets.
+ * There should be at least 2^N buckets, otherwise we may be unable
+ * to build the NULL buckets.
+ */
+#define MVSTAT_HIST_MIN_BUCKETS 128 /* min number of buckets */
+#define MVSTAT_HIST_MAX_BUCKETS 16384 /* max number of buckets */
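+/*
+ * For example, with MVSTATS_MAX_DIMENSIONS = 8 the NULL-bucket step may need
+ * up to 2^8 = 256 buckets, which already exceeds the current minimum of 128
+ * buckets - hence the TODO above.
+ */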
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
@@ -100,20 +217,25 @@ typedef MCVListData *MCVList;
MVDependencies load_mv_dependencies(Oid mvoid);
MCVList load_mv_mcvlist(Oid mvoid);
+MVSerializedHistogram load_mv_histogram(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
VacAttrStats **stats);
+bytea * serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
MCVList deserialize_mv_mcvlist(bytea * data);
+MVSerializedHistogram deserialize_mv_histogram(bytea * data);
/*
* Returns index of the attribute number within the vector (i.e. a
* dimension within the stats).
*/
int mv_get_index(AttrNumber varattno, int2vector * stakeys);
int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
@@ -122,6 +244,8 @@ extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_mcvlist_items(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_histogram_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_histogram_buckets(PG_FUNCTION_ARGS);
MVDependencies
build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
@@ -131,10 +255,20 @@ MCVList
build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int *numrows_filtered);
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total);
+
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+void update_mv_stats(Oid relid, MVDependencies dependencies,
+ MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats);
+#ifdef DEBUG_MVHIST
+extern void debug_histogram_matches(MVSerializedHistogram mvhist, char *matches);
+#endif
+
+
#endif
diff --git a/src/test/regress/expected/mv_histogram.out b/src/test/regress/expected/mv_histogram.out
new file mode 100644
index 0000000..e830816
--- /dev/null
+++ b/src/test/regress/expected/mv_histogram.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s7 ON mv_histogram (unknown_column) WITH (histogram);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s7 ON mv_histogram (a) WITH (histogram);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s7 ON mv_histogram (a, a) WITH (histogram);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s7 ON mv_histogram (a, a, b) WITH (histogram);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing histogram statistics
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (dependencies, max_buckets=200);
+ERROR: option 'histogram' is required by other option(s)
+-- invalid max_buckets value / too low
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (histogram, max_buckets=10);
+ERROR: minimum number of buckets is 128
+-- invalid max_buckets value / too high
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (histogram, max_buckets=100000);
+ERROR: maximum number of buckets is 16384
+-- correct command
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (histogram);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s8 ON mv_histogram (a, b, c) WITH (histogram);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s9 ON mv_histogram (a, b, c, d) WITH (histogram);
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+DROP TABLE mv_histogram;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 3d55ffe..528ac36 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1375,7 +1375,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
length(s.stadeps) AS depsbytes,
pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
length(s.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo,
+ length(s.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(s.stahist) AS histinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 85d94f1..a885235 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,4 +112,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies mv_mcv
+test: mv_dependencies mv_mcv mv_histogram
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 6584d73..2efdcd7 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -164,3 +164,4 @@ test: event_trigger
test: stats
test: mv_dependencies
test: mv_mcv
+test: mv_histogram
diff --git a/src/test/regress/sql/mv_histogram.sql b/src/test/regress/sql/mv_histogram.sql
new file mode 100644
index 0000000..27c2510
--- /dev/null
+++ b/src/test/regress/sql/mv_histogram.sql
@@ -0,0 +1,176 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s7 ON mv_histogram (unknown_column) WITH (histogram);
+
+-- single column
+CREATE STATISTICS s7 ON mv_histogram (a) WITH (histogram);
+
+-- single column, duplicated
+CREATE STATISTICS s7 ON mv_histogram (a, a) WITH (histogram);
+
+-- two columns, one duplicated
+CREATE STATISTICS s7 ON mv_histogram (a, a, b) WITH (histogram);
+
+-- unknown option
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (unknown_option);
+
+-- missing histogram statistics
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (dependencies, max_buckets=200);
+
+-- invalid max_buckets value / too low
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (histogram, max_buckets=10);
+
+-- invalid max_buckets value / too high
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (histogram, max_buckets=100000);
+
+-- correct command
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (histogram);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+
+DROP TABLE mv_histogram;
+
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s8 ON mv_histogram (a, b, c) WITH (histogram);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mv_histogram;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s9 ON mv_histogram (a, b, c, d) WITH (histogram);
+
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+DROP TABLE mv_histogram;
--
2.5.0
Attachment: 0004-multivariate-MCV-lists.patch (text/x-patch)
From 9786256d4dec9b3d6ea90ebbbeebd41568453b1b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 16:52:15 +0200
Subject: [PATCH 4/9] multivariate MCV lists
- extends the pg_mv_statistic catalog (add 'mcv' fields)
- building the MCV lists during ANALYZE
- simple estimation while planning the queries
Includes regression tests, mostly equal to regression tests for
functional dependencies.
---
doc/src/sgml/ref/create_statistics.sgml | 43 ++
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/statscmds.c | 45 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 814 +++++++++++++++++++++-
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/README.mcv | 137 ++++
src/backend/utils/mvstats/README.stats | 89 ++-
src/backend/utils/mvstats/common.c | 133 +++-
src/backend/utils/mvstats/common.h | 17 +-
src/backend/utils/mvstats/mcv.c | 1120 +++++++++++++++++++++++++++++++
src/bin/psql/describe.c | 25 +-
src/include/catalog/pg_mv_statistic.h | 18 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 69 +-
src/test/regress/expected/mv_mcv.out | 207 ++++++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_mcv.sql | 178 +++++
22 files changed, 2847 insertions(+), 73 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.mcv
create mode 100644 src/backend/utils/mvstats/mcv.c
create mode 100644 src/test/regress/expected/mv_mcv.out
create mode 100644 src/test/regress/sql/mv_mcv.sql
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index ff09fa5..d6973e8 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -132,6 +132,24 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>max_mcv_items</> (<type>integer</>)</term>
+ <listitem>
+ <para>
+ Maximum number of MCV list items.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>mcv</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables MCV list for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect2>
@@ -177,6 +195,31 @@ EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 2);
</programlisting>
</para>
+ <para>
+ Create table <structname>t2</> with two perfectly correlated columns
+ (containing identical data), and an MCV list on those columns:
+
+<programlisting>
+CREATE TABLE t2 (
+ a int,
+ b int
+);
+
+INSERT INTO t2 SELECT mod(i,100), mod(i,100)
+ FROM generate_series(1,1000000) s(i);
+
+CREATE STATISTICS s2 ON t2 (a, b) WITH (mcv);
+
+ANALYZE t2;
+
+-- valid combination (found in MCV)
+EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1);
+
+-- invalid combination (not found in MCV)
+EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 2);
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 31dbb2c..5c40334 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -165,7 +165,9 @@ CREATE VIEW pg_mv_stats AS
S.staname AS staname,
S.stakeys AS attnums,
length(S.stadeps) as depsbytes,
- pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
+ length(S.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index f43b053..c480fbe 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -70,7 +70,13 @@ CreateStatistics(CreateStatsStmt *stmt)
ObjectAddress parentobject, childobject;
/* by default build nothing */
- bool build_dependencies = false;
+ bool build_dependencies = false,
+ build_mcv = false;
+
+ int32 max_mcv_items = -1;
+
+ /* options required because of other options */
+ bool require_mcv = false;
Assert(IsA(stmt, CreateStatsStmt));
@@ -146,6 +152,29 @@ CreateStatistics(CreateStatsStmt *stmt)
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "mcv") == 0)
+ build_mcv = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_mcv_items") == 0)
+ {
+ max_mcv_items = defGetInt32(opt);
+
+ /* this option requires 'mcv' to be enabled */
+ require_mcv = true;
+
+ /* sanity check */
+ if (max_mcv_items < MVSTAT_MCVLIST_MIN_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items must be at least %d",
+ MVSTAT_MCVLIST_MIN_ITEMS)));
+
+ else if (max_mcv_items > MVSTAT_MCVLIST_MAX_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items is %d",
+ MVSTAT_MCVLIST_MAX_ITEMS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -154,10 +183,16 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! build_dependencies)
+ if (! (build_dependencies || build_mcv))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies, mcv) was requested")));
+
+ /* now do some checking of the options */
+ if (require_mcv && (! build_mcv))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies) was requested")));
+ errmsg("option 'mcv' is required by other options(s)")));
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
@@ -178,8 +213,12 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+ values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+
+ values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+ nulls[Anum_pg_mv_statistic_stamcv -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 07206d7..333e24b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2162,9 +2162,11 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
+ WRITE_BOOL_FIELD(mcv_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
+ WRITE_BOOL_FIELD(mcv_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 5ab7f15..7fc0c49 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -15,6 +15,7 @@
#include "postgres.h"
#include "access/sysattr.h"
+#include "catalog/pg_collation.h"
#include "catalog/pg_operator.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
@@ -47,18 +48,39 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
#define MV_CLAUSE_TYPE_FDEP 0x01
+#define MV_CLAUSE_TYPE_MCV 0x02
-static bool clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum);
+static bool clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums,
+ int type);
-static Bitmapset *collect_mv_attnums(List *clauses, Index relid);
+static Bitmapset *collect_mv_attnums(List *clauses, Index relid, int type);
-static int count_mv_attnums(List *clauses, Index relid);
+static int count_mv_attnums(List *clauses, Index relid, int type);
static int count_varnos(List *clauses, Index *relid);
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Index relid, List *stats);
+static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
+
+static List *clauselist_mv_split(PlannerInfo *root, Index relid,
+ List *clauses, List **mvclauses,
+ MVStatisticInfo *mvstats, int types);
+
+static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
+
+static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats,
+ bool *fullmatch, Selectivity *lowsel);
+
+static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, Index relid);
@@ -66,6 +88,13 @@ static List * find_stats(PlannerInfo *root, Index relid);
static bool stats_type_matches(MVStatisticInfo *stat, int type);
+/* used for merging bitmaps - AND (min), OR (max) */
+#define MAX(x, y) (((x) > (y)) ? (x) : (y))
+#define MIN(x, y) (((x) < (y)) ? (x) : (y))
+
+#define UPDATE_RESULT(m,r,isor) \
+ (m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
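+
+/*
+ * For example, merging MATCH_FULL with MATCH_NONE yields MATCH_NONE for AND
+ * (MIN) and MATCH_FULL for OR (MAX), assuming MATCH_NONE < MATCH_FULL.
+ */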
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -91,11 +120,13 @@ static bool stats_type_matches(MVStatisticInfo *stat, int type);
* to verify that suitable multivariate statistics exist.
*
* If we identify such multivariate statistics apply, we try to apply them.
- * Currently we only have (soft) functional dependencies, so we try to reduce
- * the list of clauses.
*
- * Then we remove the clauses estimated using multivariate stats, and process
- * the rest of the clauses using the regular per-column stats.
+ * First we try to reduce the list of clauses by applying (soft) functional
+ * dependencies, and then we try to estimate the selectivity of the reduced
+ * list of clauses using the multivariate MCV list.
+ *
+ * Finally we remove the portion of clauses estimated using multivariate stats,
+ * and process the rest of the clauses using the regular per-column stats.
*
* Currently, the only extra smarts we have is to recognize "range queries",
* such as "x > 34 AND x < 42". Clauses are recognized as possible range
@@ -172,12 +203,46 @@ clauselist_selectivity(PlannerInfo *root,
* that need to be estimated by other types of stats (MCV, histograms etc).
*/
if (has_stats(stats, MV_CLAUSE_TYPE_FDEP) &&
- (count_mv_attnums(clauses, relid) >= 2))
+ (count_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_FDEP) >= 2))
{
clauses = clauselist_apply_dependencies(root, clauses, relid, stats);
}
/*
+ * Check that there are statistics with MCV list or histogram, and also the
+ * number of attributes covered by these types of statistics.
+ *
+ * If there are no such stats or not enough attributes, don't waste time
+ * with the multivariate code and simply skip to estimation using the
+ * regular per-column stats.
+ */
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV) &&
+ (count_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV) >= 2))
+ {
+ /* collect attributes from the compatible conditions */
+ Bitmapset *mvattnums = collect_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV);
+
+ /* and search for the statistic covering the most attributes */
+ MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+
+ if (mvstat != NULL) /* we have matching statistics */
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ /* split the clauselist into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, relid, clauses, &mvclauses,
+ mvstat, MV_CLAUSE_TYPE_MCV);
+
+ /* we've chosen the statistics to match the clauses */
+ Assert(mvclauses != NIL);
+
+ /* compute the multivariate stats */
+ s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ }
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -834,32 +899,93 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Estimate selectivity of clauses using multivariate statistics.
+ *
+ * Perform estimation of the clauses using an MCV list.
+ *
+ * This assumes all the clauses are compatible with the selected statistics
+ * (e.g. only reference columns covered by the statistics, use supported
+ * operator, etc.).
+ *
+ * TODO We may support some additional conditions, most importantly those
+ * matching multiple columns (e.g. "a = b" or "a < b").
+ *
+ * TODO Clamp the selectivity by min of the per-clause selectivities (i.e. the
+ * selectivity of the most restrictive clause), because that's the maximum
+ * we can ever get from an ANDed list of clauses. This would probably prevent
+ * issues with hitting too many buckets and low-precision histograms.
+ *
+ * TODO We may remember the lowest frequency in the MCV list, and then later use
+ * it as an upper bound for the selectivity (had there been a more
+ * frequent item, it'd be in the MCV list). This might improve cases with
+ * low-detail histograms.
+ *
+ * TODO We may also derive some additional boundaries for the selectivity from
+ * the MCV list, because
+ *
+ * (a) if we have a "full equality condition" (one equality condition on
+ * each column of the statistic) and we found a match in the MCV list,
+ * then this is the final selectivity (and pretty accurate),
+ *
+ * (b) if we have a "full equality condition" and we haven't found a match
+ * in the MCV list, then the selectivity is below the lowest frequency
+ * found in the MCV list,
+ *
+ * TODO When applying the clauses to the histogram/MCV list, we can do
+ * that from the most selective clauses first, because that'll
+ * eliminate the buckets/items sooner (so we'll be able to skip
+ * them without inspection, which is more expensive). But this
+ * requires really knowing the per-clause selectivities in advance,
+ * and that's not what we do now.
+ */
+static Selectivity
+clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+{
+ bool fullmatch = false;
+
+ /*
+ * Lowest frequency in the MCV list (may be used as an upper bound
+ * for full equality conditions that did not match any MCV item).
+ */
+ Selectivity mcv_low = 0.0;
+
+ /* TODO Evaluate simple 1D selectivities, use the smallest one as
+ * an upper bound, product as lower bound, and sort the
+ * clauses in ascending order by selectivity (to optimize the
+ * MCV/histogram evaluation).
+ */
+
+ /* Evaluate the MCV selectivity */
+ return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ &fullmatch, &mcv_low);
+}
+
/*
* Collect attributes from mv-compatible clauses.
*/
static Bitmapset *
-collect_mv_attnums(List *clauses, Index relid)
+collect_mv_attnums(List *clauses, Index relid, int types)
{
Bitmapset *attnums = NULL;
ListCell *l;
/*
- * Walk through the clauses and identify the ones we can estimate
- * using multivariate stats, and remember the relid/columns. We'll
- * then cross-check if we have suitable stats, and only if needed
- * we'll split the clauses into multivariate and regular lists.
+ * Walk through the clauses and identify the ones we can estimate using
+ * multivariate stats, and remember the relid/columns. We'll then
+ * cross-check if we have suitable stats, and only if needed we'll split
+ * the clauses into multivariate and regular lists.
*
- * For now we're only interested in RestrictInfo nodes with nested
- * OpExpr, using either a range or equality.
+ * For now we're only interested in RestrictInfo nodes with nested OpExpr,
+ * using either a range or equality.
*/
foreach (l, clauses)
{
- AttrNumber attnum;
Node *clause = (Node *) lfirst(l);
- /* ignore the result for now - we only need the info */
- if (clause_is_mv_compatible(clause, relid, &attnum))
- attnums = bms_add_member(attnums, attnum);
+ /* ignore the result here - we only need the attnums */
+ clause_is_mv_compatible(clause, relid, &attnums, types);
}
/*
@@ -880,10 +1006,10 @@ collect_mv_attnums(List *clauses, Index relid)
* Count the number of attributes in clauses compatible with multivariate stats.
*/
static int
-count_mv_attnums(List *clauses, Index relid)
+count_mv_attnums(List *clauses, Index relid, int type)
{
int c;
- Bitmapset *attnums = collect_mv_attnums(clauses, relid);
+ Bitmapset *attnums = collect_mv_attnums(clauses, relid, type);
c = bms_num_members(attnums);
@@ -913,9 +1039,183 @@ count_varnos(List *clauses, Index *relid)
return cnt;
}
+
+/*
+ * We're looking for statistics matching at least 2 attributes, referenced in
+ * clauses compatible with multivariate statistics. The current selection
+ * criterion is very simple - we choose the statistics referencing the most
+ * attributes.
+ *
+ * If there are multiple statistics referencing the same number of columns
+ * (from the clauses), the one with fewer source columns (as listed in the
+ * ADD STATISTICS when creating the statistics) wins. Otherwise the first one wins.
+ *
+ * This is a very simple criterion, and it has several weaknesses:
+ *
+ * (a) does not consider the accuracy of the statistics
+ *
+ * If there are two histograms built on the same set of columns, but one
+ * has 100 buckets and the other one has 1000 buckets (thus likely
+ * providing better estimates), this is not currently considered.
+ *
+ * (b) does not consider the type of statistics
+ *
+ * If there are three statistics - one containing just an MCV list, another
+ * one with just a histogram and a third one with both, we treat them equally.
+ *
+ * (c) does not consider the number of clauses
+ *
+ * As explained, only the number of referenced attributes counts, so if
+ * there are multiple clauses on a single attribute, this still counts as
+ * a single attribute.
+ *
+ * (d) does not consider type of condition
+ *
+ * Some clauses may work better with some statistics - for example equality
+ * clauses probably work better with MCV lists than with histograms. But
+ * IS [NOT] NULL conditions may often work better with histograms (thanks
+ * to NULL-buckets).
+ *
+ * So for example with five WHERE conditions
+ *
+ * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
+ *
+ * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be selected
+ * as it references the most columns.
+ *
+ * Once we have selected the multivariate statistics, we split the list of
+ * clauses into two parts - conditions that are compatible with the selected
+ * stats, and conditions that will be estimated using simple statistics.
+ *
+ * From the example above, conditions
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
+ *
+ * will be estimated using the multivariate statistics (a,b,c,d) while the last
+ * condition (e = 1) will get estimated using the regular ones.
+ *
+ * There are various alternative selection criteria (e.g. counting conditions
+ * instead of just referenced attributes), but eventually the best option should
+ * be to combine multiple statistics. But that's much harder to do correctly.
+ *
+ * TODO Select multiple statistics and combine them when computing the estimate.
+ *
+ * TODO This will probably have to consider compatibility of clauses, because
+ * 'dependencies' will probably work only with equality clauses.
+ */
+static MVStatisticInfo *
+choose_mv_statistics(List *stats, Bitmapset *attnums)
+{
+ int i;
+ ListCell *lc;
+
+ MVStatisticInfo *choice = NULL;
+
+ int current_matches = 1; /* goal #1: maximize */
+ int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+
+ /*
+ * Walk through the list of statistics, and for each one count the
+ * referenced attributes (encoded in the 'attnums' bitmap).
+ */
+ foreach (lc, stats)
+ {
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ /* columns matching this statistics */
+ int matches = 0;
+
+ int2vector * attrs = info->stakeys;
+ int numattrs = attrs->dim1;
+
+ /* skip dependencies-only stats */
+ if (! info->mcv_built)
+ continue;
+
+ /* count columns covered by the statistics */
+ for (i = 0; i < numattrs; i++)
+ if (bms_is_member(attrs->values[i], attnums))
+ matches++;
+
+ /*
+ * Use this statistics when it improves the number of matches, or when
+ * it matches the same number of attributes but has fewer dimensions.
+ */
+ if ((matches > current_matches) ||
+ ((matches == current_matches) && (current_dims > numattrs)))
+ {
+ choice = info;
+ current_matches = matches;
+ current_dims = numattrs;
+ }
+ }
+
+ return choice;
+}
+
+
+/*
+ * This splits the clauses list into two parts - one containing clauses that
+ * will be evaluated using the chosen statistics, and the remaining clauses
+ * (either not mv-compatible, or not covered by the chosen statistics).
+ */
+static List *
+clauselist_mv_split(PlannerInfo *root, Index relid,
+ List *clauses, List **mvclauses,
+ MVStatisticInfo *mvstats, int types)
+{
+ int i;
+ ListCell *l;
+ List *non_mvclauses = NIL;
+
+ /* FIXME is there a better way to get info on int2vector? */
+ int2vector * attrs = mvstats->stakeys;
+ int numattrs = mvstats->stakeys->dim1;
+
+ Bitmapset *mvattnums = NULL;
+
+ /* build bitmap of attributes, so we can do bms_is_subset later */
+ for (i = 0; i < numattrs; i++)
+ mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+
+ /* erase the list of mv-compatible clauses */
+ *mvclauses = NIL;
+
+ foreach (l, clauses)
+ {
+ bool match = false; /* by default not mv-compatible */
+ Bitmapset *attnums = NULL;
+ Node *clause = (Node *) lfirst(l);
+
+ if (clause_is_mv_compatible(clause, relid, &attnums, types))
+ {
+ /* are all the attributes part of the selected stats? */
+ if (bms_is_subset(attnums, mvattnums))
+ match = true;
+ }
+
+ /*
+ * The clause matches the selected stats, so add it to the list of
+ * mv-compatible clauses. Otherwise, keep it in the list of 'regular'
+ * clauses (that may be selected later).
+ */
+ if (match)
+ *mvclauses = lappend(*mvclauses, clause);
+ else
+ non_mvclauses = lappend(non_mvclauses, clause);
+ }
+
+ /*
+ * Return the remaining clauses, to be estimated by the caller using the
+ * regular per-column statistics.
+ */
+ return non_mvclauses;
+
+}
typedef struct
{
+ int types; /* types of statistics to consider */
Index varno; /* relid we're interested in */
Bitmapset *varattnos; /* attnums referenced by the clauses */
} mv_compatible_context;
@@ -933,23 +1233,66 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
{
if (node == NULL)
return false;
-
+
if (IsA(node, RestrictInfo))
{
RestrictInfo *rinfo = (RestrictInfo *) node;
-
+
/* Pseudoconstants are not really interesting here. */
if (rinfo->pseudoconstant)
return true;
-
+
/* clauses referencing multiple varnos are incompatible */
if (bms_membership(rinfo->clause_relids) != BMS_SINGLETON)
return true;
-
+
/* check the clause inside the RestrictInfo */
return mv_compatible_walker((Node*)rinfo->clause, (void *) context);
}
+ if (or_clause(node) || and_clause(node) || not_clause(node))
+ {
+ /*
+ * AND/OR/NOT-clauses are supported if all sub-clauses are supported
+ *
+ * TODO We might support mixed case, where some of the clauses are
+ * supported and some are not, and treat all supported subclauses
+ * as a single clause, compute its selectivity using mv stats,
+ * and compute the total selectivity using the current algorithm.
+ *
+ * TODO For RestrictInfo above an OR-clause, we might use the orclause
+ * with nested RestrictInfo - we won't have to call pull_varnos()
+ * for each clause, saving time.
+ *
+ * TODO Perhaps this needs a bit more thought for functional
+ * dependencies? Those don't quite work for NOT cases.
+ */
+ BoolExpr *expr = (BoolExpr *) node;
+ ListCell *lc;
+
+ foreach (lc, expr->args)
+ {
+ if (mv_compatible_walker((Node *) lfirst(lc), context))
+ return true;
+ }
+
+ return false;
+ }
+
+ if (IsA(node, NullTest))
+ {
+ NullTest* nt = (NullTest*)node;
+
+ /*
+ * Only simple (Var IS NULL) expressions supported for now. Maybe we could
+ * use examine_variable to fix this?
+ */
+ if (! IsA(nt->arg, Var))
+ return true;
+
+ return mv_compatible_walker((Node*)(nt->arg), context);
+ }
+
if (IsA(node, Var))
{
Var * var = (Var*)node;
@@ -1000,7 +1343,7 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
/* unsupported structure (two variables or so) */
if (! ok)
return true;
-
+
/*
* If it's not a "<" or ">" or "=" operator, just ignore the clause.
* Otherwise note the relid and attnum for the variable. This uses the
@@ -1010,10 +1353,18 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
switch (get_oprrest(expr->opno))
{
case F_EQSEL:
-
/* equality conditions are compatible with all statistics */
break;
+ case F_SCALARLTSEL:
+ case F_SCALARGTSEL:
+
+ /* not compatible with functional dependencies */
+ if (! (context->types & MV_CLAUSE_TYPE_MCV))
+ return true; /* terminate */
+
+ break;
+
default:
/* unknown estimator */
@@ -1024,11 +1375,11 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
return mv_compatible_walker((Node *) var, context);
}
-
+
/* Node not explicitly supported, so terminate */
return true;
}
-
+
/*
* Determines whether the clause is compatible with multivariate stats,
* and if it is, returns some additional information - varno (index
@@ -1047,10 +1398,11 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
* evaluate them using multivariate stats.
*/
static bool
-clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum)
+clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums, int types)
{
mv_compatible_context context;
+ context.types = types;
context.varno = relid;
context.varattnos = NULL; /* no attnums */
@@ -1058,7 +1410,7 @@ clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum)
return false;
/* remember the newly collected attnums */
- *attnum = bms_singleton_member(context.varattnos);
+ *attnums = bms_add_members(*attnums, context.varattnos);
return true;
}
@@ -1075,15 +1427,15 @@ fdeps_reduce_clauses(List *clauses, Index relid, Bitmapset *reduced_attnums)
foreach (lc, clauses)
{
- AttrNumber attnum = InvalidAttrNumber;
+ Bitmapset *attnums = NULL;
Node * clause = (Node*)lfirst(lc);
/* ignore clauses that are not compatible with functional dependencies */
- if (! clause_is_mv_compatible(clause, relid, &attnum))
+ if (! clause_is_mv_compatible(clause, relid, &attnums, MV_CLAUSE_TYPE_FDEP))
reduced_clauses = lappend(reduced_clauses, clause);
/* for equality clauses, only keep those not on reduced attributes */
- if (! bms_is_member(attnum, reduced_attnums))
+ if (! bms_is_subset(attnums, reduced_attnums))
reduced_clauses = lappend(reduced_clauses, clause);
}
@@ -1208,7 +1560,7 @@ clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
return clauses;
/* collect attnums from clauses compatible with dependencies (equality) */
- clause_attnums = collect_mv_attnums(clauses, relid);
+ clause_attnums = collect_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_FDEP);
/* decide which attnums may be eliminated */
reduced_attnums = fdeps_reduce_attnums(stats, clause_attnums);
@@ -1233,6 +1585,9 @@ stats_type_matches(MVStatisticInfo *stat, int type)
if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
return true;
+ if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
+ return true;
+
return false;
}
@@ -1266,3 +1621,392 @@ find_stats(PlannerInfo *root, Index relid)
return root->simple_rel_array[relid]->mvstatlist;
}
+
+/*
+ * Estimate selectivity of clauses using an MCV list.
+ *
+ * If there's no MCV list for the stats, the function returns 0.0.
+ *
+ * While computing the estimate, the function checks whether all the
+ * columns were matched with an equality condition. If that's the case,
+ * we can skip processing the histogram, as there can be no rows in
+ * it with the same values - all the rows matching the condition are
+ * represented by the MCV item. This can only happen with equality
+ * on all the attributes.
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all items as 'match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the items
+ * 4) skip items that are already 'no match'
+ * 5) check clause for items that still match
+ * 6) sum frequencies for items to get selectivity
+ *
+ * The function also returns the frequency of the least frequent item
+ * on the MCV list, which may be useful for clamping the estimate from the
+ * histogram (all items not present in the MCV list are less frequent).
+ * This however seems useful only for cases with conditions on all
+ * attributes.
+ *
+ * TODO This only handles AND-ed clauses, but it might work for OR-ed
+ * lists too - it just needs to reverse the logic a bit. I.e. start
+ * with 'no match' for all items, and mark the items as a match
+ * as the clauses are processed (and skip items that are 'match').
+ */
+static Selectivity
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats, bool *fullmatch,
+ Selectivity *lowsel)
+{
+ int i;
+ Selectivity s = 0.0;
+ Selectivity u = 0.0;
+
+ MCVList mcvlist = NULL;
+ int nmatches = 0;
+
+ /* match/mismatch bitmap for each MCV item */
+ char * matches = NULL;
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 2);
+
+ /* there's no MCV list built yet */
+ if (! mvstats->mcv_built)
+ return 0.0;
+
+ mcvlist = load_mv_mcvlist(mvstats->mvoid);
+
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* by default all the MCV items match the clauses fully */
+ matches = palloc0(sizeof(char) * mcvlist->nitems);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
+
+ /* number of matching MCV items */
+ nmatches = mcvlist->nitems;
+
+ nmatches = update_match_bitmap_mcvlist(root, clauses,
+ mvstats->stakeys, mcvlist,
+ nmatches, matches,
+ lowsel, fullmatch, false);
+
+ /* sum frequencies for all the matching MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* used to 'scale' for MCV lists not covering all tuples */
+ u += mcvlist->items[i]->frequency;
+
+ if (matches[i] != MVSTATS_MATCH_NONE)
+ s += mcvlist->items[i]->frequency;
+ }
+
+ pfree(matches);
+ pfree(mcvlist);
+
+ return s*u;
+}
+
+/*
+ * Evaluate clauses using the MCV list, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * TODO This works with 'bitmap' where each bit is represented as a char,
+ * which is slightly wasteful. Instead, we could use a regular
+ * bitmap, reducing the size to ~1/8. Another thing is merging the
+ * bitmaps using & and |, which might be faster than min/max.
+ */
+static int
+update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ Bitmapset *eqmatches = NULL; /* attributes with equality matches */
+
+ /* The bitmap may be partially built. */
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mcvlist->nitems);
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* No possible matches (only works for AND-ed clauses) */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ return nmatches;
+
+ /*
+ * find the lowest frequency in the MCV list
+ *
+ * We need to do that here, because we do various tricks in the following
+ * code - skipping items already ruled out, etc.
+ *
+ * XXX A loop is necessary because the MCV list is not sorted by frequency.
+ */
+ *lowsel = 1.0;
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = mcvlist->items[i];
+
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+ }
+
+ /*
+ * Loop through the list of clauses, and for each of them evaluate
+ * all the MCV items not yet eliminated by the preceding clauses.
+ */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* if there are no remaining matches possible, we can stop */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /* it's either an OpExpr, a NullTest, or an AND/OR clause */
+ if (is_opclause(clause))
+ {
+ OpExpr *expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+ FmgrInfo opproc;
+
+ /* get procedure computing operator selectivity */
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+
+ FmgrInfo gtproc;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_GT_OPR);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->gt_opr), >proc);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ bool mismatch = false;
+ MCVItem item = mcvlist->items[i];
+
+ /*
+ * If there are no more matches (AND) or no remaining unmatched
+ * items (OR), we can stop processing this clause.
+ */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /*
+ * For AND-lists, we can also mark NULL items as 'no match' (and
+ * then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && item->isnull[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /* skip MCV items that were already ruled out */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ switch (oprrest)
+ {
+ case F_EQSEL:
+ /*
+ * We don't care about isgt in equality, because it does not
+ * matter whether it's (var = const) or (const = var).
+ */
+ mismatch = ! DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ if (! mismatch)
+ eqmatches = bms_add_member(eqmatches, idx);
+
+ break;
+
+ case F_SCALARLTSEL: /* column < constant */
+ case F_SCALARGTSEL: /* column > constant */
+
+ /*
+ * Evaluate the inequality for the MCV item value (unlike a histogram
+ * bucket, a single value either matches the clause or not).
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ /* invert the result if isgt=true */
+ mismatch = (isgt) ? (! mismatch) : mismatch;
+ break;
+ }
+
+ /* XXX The conditions on matches[i] are not needed, as we
+ * skip MCV items that can't become true/false, depending
+ * on the current flag. See beginning of the loop over
+ * MCV items.
+ */
+
+ if ((is_or) && (matches[i] == MVSTATS_MATCH_NONE) && (! mismatch))
+ {
+ /* OR - was MATCH_NONE, but will be MATCH_FULL */
+ matches[i] = MVSTATS_MATCH_FULL;
+ ++nmatches;
+ continue;
+ }
+ else if ((! is_or) && (matches[i] == MVSTATS_MATCH_FULL) && mismatch)
+ {
+ /* AND - was MATCH_FULL, but will be MATCH_NONE */
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ continue;
+ }
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = mcvlist->items[i];
+
+ /* if there are no more matches, we can stop processing this clause */
+ if (nmatches == 0)
+ break;
+
+ /* skip MCV items that were already ruled out */
+ if (matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /* if the clause mismatches the MCV item, set it as MATCH_NONE */
+ if (((expr->nulltesttype == IS_NULL) && (! item->isnull[idx])) ||
+ ((expr->nulltesttype == IS_NOT_NULL) && (item->isnull[idx])))
+ {
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ }
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each MCV item */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching MCV items */
+ or_nmatches = mcvlist->nitems;
+
+ /* allocate the match bitmap for the nested clauses (initialized below) */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the OR-clauses */
+ or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ stakeys, mcvlist,
+ or_nmatches, or_matches,
+ lowsel, fullmatch, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ {
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+ }
+
+ /*
+ * If all the columns were matched by equality, it's a full match.
+ * In this case there can be at most one matching MCV item (two different
+ * items could not both match the same set of equality clauses).
+ */
+ *fullmatch = (bms_num_members(eqmatches) == mcvlist->ndimensions);
+
+ /* free the allocated pieces */
+ if (eqmatches)
+ pfree(eqmatches);
+
+ return nmatches;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 7fb2088..8394111 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -412,7 +412,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built)
+ if (mvstat->deps_built || mvstat->mcv_built)
{
info = makeNode(MVStatisticInfo);
@@ -421,9 +421,11 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
+ info->mcv_enabled = mvstat->mcv_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
+ info->mcv_built = mvstat->mcv_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 099f1ed..f9bf10c 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o
+OBJS = common.o dependencies.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.mcv b/src/backend/utils/mvstats/README.mcv
new file mode 100644
index 0000000..e93cfe4
--- /dev/null
+++ b/src/backend/utils/mvstats/README.mcv
@@ -0,0 +1,137 @@
+MCV lists
+=========
+
+Multivariate MCV (most-common values) lists are a straightforward extension of
+regular MCV lists, tracking the most frequent combinations of values for a
+group of attributes.
+
+This works particularly well for columns with a small number of distinct values,
+as the list may include all the combinations and approximate the distribution
+very accurately.
+
+For columns with a large number of distinct values (e.g. those with continuous
+domains), the list will only track the most frequent combinations. If the
+distribution is mostly uniform (all combinations about equally frequent), the
+MCV list will be empty.
+
+Estimates of some clauses (e.g. equality) based on MCV lists are more accurate
+than when using histograms.
+
+Also, MCV lists don't necessarily require sorting of the values (the fact that
+we use sorting when building them is an implementation detail), but even more
+importantly, the ordering is not built into the approximation (while histograms
+are built on ordering). So MCV lists work well even for attributes where the
+ordering of the data type is disconnected from the meaning of the data. For
+example we know how to sort strings, but it's unlikely to make much sense for
+city names (or other label-like attributes).
+
+
+Selectivity estimation
+----------------------
+
+The estimation, implemented in clauselist_mv_selectivity_mcvlist(), is quite
+simple in principle - we need to identify MCV items matching all the clauses
+and sum frequencies of all those items.
+
+Currently MCV lists support estimation of the following clause types:
+
+ (a) equality clauses WHERE (a = 1) AND (b = 2)
+ (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ (d) OR clauses WHERE (a < 1) OR (b >= 2)
+
+It's possible to add support for additional clauses, for example:
+
+ (e) multi-var clauses WHERE (a > b)
+
+and possibly others. These are tasks for the future, not yet implemented.
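+
+As a usage sketch (assuming the ALTER TABLE ... ADD STATISTICS syntax from
+this patch, with the 'mcv' option enabled), the supported clause types may
+be exercised like this:
+
+    ALTER TABLE t ADD STATISTICS (mcv true) ON (a, b);
+    ANALYZE t;
+
+    -- estimated using the multivariate MCV list on (a,b)
+    SELECT * FROM t WHERE (a = 1) AND (b >= 2);
+    SELECT * FROM t WHERE (a IS NULL) OR (b < 10);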
+
+
+Estimating equality clauses
+---------------------------
+
+When computing selectivity estimate for equality clauses
+
+ (a = 1) AND (b = 2)
+
+we can do this estimate pretty exactly assuming that two conditions are met:
+
+ (1) there's an equality condition on all attributes of the statistic
+
+ (2) we find a matching item in the MCV list
+
+In this case we know the MCV item represents all tuples matching the clauses,
+and the selectivity estimate is complete (i.e. we don't need to perform
+estimation using the histogram). This is what we call 'full match'.
+
+When only (1) holds, but there's no matching MCV item, we don't know whether
+there are no such rows or they are just not very frequent. We can however use the
+frequency of the least frequent MCV item as an upper bound for the selectivity.
+
+For a combination of equality conditions (not full-match case) we can clamp the
+selectivity by the minimum of selectivities for each condition. For example if
+we know the number of distinct values for each column, we can use 1/ndistinct
+as a per-column estimate. Or rather 1/ndistinct + selectivity derived from the
+MCV list.
+
+We should also probably use only the 'residual ndistinct', excluding the items
+included in the MCV list (and also the residual frequency):
+
+ f = (1.0 - sum(MCV frequencies)) / (ndistinct - ndistinct(MCV list))
+
+but it's worth pointing out the ndistinct values are multi-variate for the
+columns referenced by the equality conditions.
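+
+As a purely illustrative example, with multivariate ndistinct = 1000 and an
+MCV list covering 900 of those groups with sum(MCV frequencies) = 0.8, a
+group not present in the list would get
+
+    f = (1.0 - 0.8) / (1000 - 900) = 0.002
+
+(the numbers are made up, the formula is the one above).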
+
+Note: Only the "full match" limit is currently implemented.
+
+
+Hashed MCV (not yet implemented)
+--------------------------------
+
+Regular MCV lists have to include actual values for each item, so if those items
+are large the list may be quite large. This is especially true for multi-variate
+MCV lists, although the current implementation partially mitigates this by
+de-duplicating the values before storing them on disk.
+
+It's possible to only store hashes (32-bit values) instead of the actual values,
+significantly reducing the space requirements. Obviously, this would only make
+the MCV lists useful for estimating equality conditions (assuming the 32-bit
+hashes make the collisions rare enough).
+
+This might also complicate matching the columns to available stats.
+
+
+TODO Consider implementing hashed MCV list, storing just 32-bit hashes instead
+ of the actual values. This type of MCV list will be useful only for
+ estimating equality clauses, and will reduce space requirements for large
+ varlena types (in such cases we usually only want equality anyway).
+
+TODO Currently there's no logic to consider building only a MCV list (and not
+ building the histogram at all), except for making this decision manually in
+ ADD STATISTICS.
+
+
+Inspecting the MCV list
+-----------------------
+
+Inspecting the regular (per-attribute) MCV lists is trivial, as it's enough
+to select the columns from pg_stats - the data is encoded as anyarrays, so we
+simply get the text representation of the arrays.
+
+With multivariate MCV lists it's not that simple due to the possible mix of
+data types. It might be possible to produce similar array-like representation,
+but that'd unnecessarily complicate further processing and analysis of the MCV
+list. Instead, there's a SRF function providing values, frequencies etc.
+
+    SELECT * FROM pg_mv_mcv_items(oid);
+
+It has a single input parameter:
+
+ oid - OID of the MCV list (pg_mv_statistic.staoid)
+
+and produces a table with these columns:
+
+ - item ID (0...nitems-1)
+ - values (string array)
+ - nulls only (boolean array)
+ - frequency (double precision)
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index a38ea7b..5c5c59a 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -8,9 +8,50 @@ not true, resulting in estimation errors.
Multivariate stats track different types of dependencies between the columns,
hopefully improving the estimates.
-Currently we only have one kind of multivariate statistics - soft functional
-dependencies, and we use it to improve estimates of equality clauses. See
-README.dependencies for details.
+
+Types of statistics
+-------------------
+
+Currently we have only two kinds of multivariate statistics:
+
+ (a) soft functional dependencies (README.dependencies)
+
+ (b) MCV lists (README.mcv)
+
+
+Compatible clause types
+-----------------------
+
+Each type of statistics may be used to estimate some subset of clause types.
+
+ (a) functional dependencies - equality clauses (AND), possibly IS NULL
+
+ (b) MCV list - equality and inequality clauses, IS [NOT] NULL, AND/OR
+
+Currently only simple operator clauses (Var op Const) are supported, but it's
+possible to support more complex clause types, e.g. (Var op Var).
+
+
+Complex clauses
+---------------
+
+We also support estimating more complex clauses - essentially AND/OR clauses
+with (Var op Const) as leaves, as long as all the referenced attributes are
+covered by a single statistics.
+
+For example this condition
+
+ (a=1) AND ((b=2) OR ((c=3) AND (d=4)))
+
+may be estimated using statistics on (a,b,c,d). If we only have statistics on
+(b,c,d) we may estimate the second part, and estimate (a=1) using simple stats.
+
+If we only have statistics on (a,b,c) we can't apply it at all at this point,
+but it's worth pointing out clauselist_selectivity() works recursively and when
+handling the second part (the OR-clause), we'll be able to apply the statistics.
+
+Note: The multi-statistics estimation patch also makes it possible to pass some
+clauses as 'conditions' into the deeper parts of the expression tree.
Selectivity estimation
@@ -23,14 +64,48 @@ When estimating selectivity, we aim to achieve several things:
(b) minimize the overhead, especially when no suitable multivariate stats
exist (so if you are not using multivariate stats, there's no overhead)
-This clauselist_selectivity() performs several inexpensive checks first, before
+Thus clauselist_selectivity() performs several inexpensive checks first, before
even attempting to do the more expensive estimation.
(1) check if there are multivariate stats on the relation
- (2) check there are at least two attributes referenced by clauses compatible
- with multivariate statistics (equality clauses for func. dependencies)
+ (2) check that there are functional dependencies on the table, and that
+ there are at least two attributes referenced by compatible clauses
+ (equality clauses for func. dependencies)
(3) perform reduction of equality clauses using func. dependencies
- (4) estimate the reduced list of clauses using regular statistics
+ (4) check that there are multivariate MCV lists on the table, and that
+ there are at least two attributes referenced by compatible clauses
+ (equalities, inequalities, etc.)
+
+ (5) find the best multivariate statistics (matching the most conditions)
+ and use it to compute the estimate
+
+ (6) estimate the remaining clauses (not estimated using multivariate stats)
+ using the regular per-column statistics
+
+Whenever we find there are no suitable stats, we skip the expensive steps.
+
+
+Further (possibly crazy) ideas
+------------------------------
+
+Currently the clauses are only estimated using a single statistics, even if
+there are multiple candidate statistics - for example assume we have statistics
+on (a,b,c) and (b,c,d), and estimate conditions
+
+ (b = 1) AND (c = 2)
+
+Then both statistics may be used, but we only use one of them. Maybe we could
+compute estimates using all the candidate stats, and somehow aggregate them
+into the final estimate, e.g. by using the average or median.
+
+Some stats may give better estimates than others, but it's very difficult to say
+in advance which stats are the best (it depends on the number of buckets, number
+of additional columns not referenced in the clauses, type of condition etc.).
+
+But of course, this may result in expensive estimation (CPU-wise).
+
+So we might add a GUC to choose between the simple (single-statistics) and the
+multi-statistics estimation, possibly as a table-level parameter (ALTER TABLE ...).
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index dcb7c78..4f5a842 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -16,12 +16,14 @@
#include "common.h"
+#include "utils/array.h"
+
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
- int natts, VacAttrStats **vacattrstats);
+ int natts,
+ VacAttrStats **vacattrstats);
static List* list_mv_stats(Oid relid);
-
/*
* Compute requested multivariate stats, using the rows sampled for the
* plain (single-column) stats.
@@ -49,6 +51,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int j;
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
+ MCVList mcvlist = NULL;
+ int numrows_filtered = 0;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -87,8 +91,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ /* build the MCV list */
+ if (stat->mcv_enabled)
+ mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, attrs);
+ update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
}
}
@@ -166,6 +174,8 @@ list_mv_stats(Oid relid)
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
+ info->mcv_enabled = stats->mcv_enabled;
+ info->mcv_built = stats->mcv_built;
result = lappend(result, info);
}
@@ -180,8 +190,56 @@ list_mv_stats(Oid relid)
return result;
}
+
+/*
+ * Find attnums of MV stats using the mvoid.
+ */
+int2vector*
+find_mv_attnums(Oid mvoid, Oid *relid)
+{
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ HeapTuple htup;
+ int2vector *keys;
+
+ /* Fetch the pg_mv_statistic tuple with the given mvoid from the syscache. */
+ htup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+ /* starelid */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_starelid, &isnull);
+ Assert(!isnull);
+
+ *relid = DatumGetObjectId(adatum);
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ keys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+ ReleaseSysCache(htup);
+
+ /* TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which served as an inspiration for this one). */
+
+ return keys;
+}
+
+
void
-update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+update_mv_stats(Oid mvoid,
+ MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -206,18 +264,29 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
= PointerGetDatum(serialize_mv_dependencies(dependencies));
}
+ if (mcvlist != NULL)
+ {
+ bytea * data = serialize_mv_mcvlist(mcvlist, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stamcv -1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+ replaces[Anum_pg_mv_statistic_stamcv -1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
@@ -246,6 +315,21 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
heap_close(sd, RowExclusiveLock);
}
+
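+/*
+ * Return the dimension index of the attribute within the (sorted) stakeys
+ * vector, i.e. the number of keys preceding varattno.
+ */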
+int
+mv_get_index(AttrNumber varattno, int2vector * stakeys)
+{
+ int i, idx = 0;
+ for (i = 0; i < stakeys->dim1; i++)
+ {
+ if (stakeys->values[i] < varattno)
+ idx += 1;
+ else
+ break;
+ }
+ return idx;
+}
+
/* multi-variate stats comparator */
/*
@@ -256,11 +340,15 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
int
compare_scalars_simple(const void *a, const void *b, void *arg)
{
- Datum da = *(Datum*)a;
- Datum db = *(Datum*)b;
- SortSupport ssup= (SortSupport) arg;
+ return compare_datums_simple(*(Datum*)a,
+ *(Datum*)b,
+ (SortSupport)arg);
+}
- return ApplySortComparator(da, false, db, false, ssup);
+int
+compare_datums_simple(Datum a, Datum b, SortSupport ssup)
+{
+ return ApplySortComparator(a, false, b, false, ssup);
}
/*
@@ -377,3 +465,32 @@ multi_sort_compare_dims(int start, int end,
return 0;
}
+
+/* simple counterpart to qsort_arg */
+void *
+bsearch_arg(const void *key, const void *base, size_t nmemb, size_t size,
+ int (*compar) (const void *, const void *, void *),
+ void *arg)
+{
+ size_t l, u, idx;
+ const void *p;
+ int comparison;
+
+ l = 0;
+ u = nmemb;
+ while (l < u)
+ {
+ idx = (l + u) / 2;
+ p = (void *) (((const char *) base) + (idx * size));
+ comparison = (*compar) (key, p, arg);
+
+ if (comparison < 0)
+ u = idx;
+ else if (comparison > 0)
+ l = idx + 1;
+ else
+ return (void *) p;
+ }
+
+ return NULL;
+}
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
index a019ea6..350760b 100644
--- a/src/backend/utils/mvstats/common.h
+++ b/src/backend/utils/mvstats/common.h
@@ -46,7 +46,15 @@ typedef struct
Datum value; /* a data value */
int tupno; /* position index for tuple it came from */
} ScalarItem;
-
+
+/* (de)serialization info */
+typedef struct DimensionInfo {
+ int nvalues; /* number of deduplicated values */
+ int nbytes; /* number of bytes (serialized) */
+ int typlen; /* pg_type.typlen */
+ bool typbyval; /* pg_type.typbyval */
+} DimensionInfo;
+
/* multi-sort */
typedef struct MultiSortSupportData {
int ndims; /* number of dimensions supported by the */
@@ -58,6 +66,7 @@ typedef MultiSortSupportData* MultiSortSupport;
typedef struct SortItem {
Datum *values;
bool *isnull;
+ int count;
} SortItem;
MultiSortSupport multi_sort_init(int ndims);
@@ -74,5 +83,11 @@ int multi_sort_compare_dims(int start, int end, const SortItem *a,
const SortItem *b, MultiSortSupport mss);
/* comparators, used when constructing multivariate stats */
+int compare_datums_simple(Datum a, Datum b, SortSupport ssup);
int compare_scalars_simple(const void *a, const void *b, void *arg);
int compare_scalars_partition(const void *a, const void *b, void *arg);
+
+void * bsearch_arg(const void *key, const void *base,
+ size_t nmemb, size_t size,
+ int (*compar) (const void *, const void *, void *),
+ void *arg);
diff --git a/src/backend/utils/mvstats/mcv.c b/src/backend/utils/mvstats/mcv.c
new file mode 100644
index 0000000..b300c1a
--- /dev/null
+++ b/src/backend/utils/mvstats/mcv.c
@@ -0,0 +1,1120 @@
+/*-------------------------------------------------------------------------
+ *
+ * mcv.c
+ * POSTGRES multivariate MCV lists
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mcv.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+
+/*
+ * Each serialized item needs to store (in this order):
+ *
+ * - indexes (ndim * sizeof(uint16))
+ * - null flags (ndim * sizeof(bool))
+ * - frequency (sizeof(double))
+ *
+ * So in total:
+ *
+ * ndim * (sizeof(uint16) + sizeof(bool)) + sizeof(double)
+ */
+#define ITEM_SIZE(ndims) \
+ (ndims * (sizeof(uint16) + sizeof(bool)) + sizeof(double))
+
+/* Macros for convenient access to parts of the serialized MCV item */
+#define ITEM_INDEXES(item) ((uint16*)item)
+#define ITEM_NULLS(item,ndims) ((bool*)(ITEM_INDEXES(item) + ndims))
+#define ITEM_FREQUENCY(item,ndims) ((double*)(ITEM_NULLS(item,ndims) + ndims))
+
+static MultiSortSupport build_mss(VacAttrStats **stats, int2vector *attrs);
+
+static SortItem *build_sorted_items(int numrows, HeapTuple *rows,
+ TupleDesc tdesc, MultiSortSupport mss,
+ int2vector *attrs);
+
+static SortItem *build_distinct_groups(int numrows, SortItem *items,
+ MultiSortSupport mss, int *ndistinct);
+
+static int count_distinct_groups(int numrows, SortItem *items,
+ MultiSortSupport mss);
+
+/*
+ * Builds MCV list from the set of sampled rows.
+ *
+ * The algorithm is quite simple:
+ *
+ * (1) sort the data (default collation, '<' for the data type)
+ *
+ * (2) count distinct groups, decide how many to keep
+ *
+ * (3) build the MCV list using the threshold determined in (2)
+ *
+ * (4) remove rows represented by the MCV from the sample
+ *
+ * The method also removes rows matching the MCV items from the input array,
+ * and passes the number of remaining rows (useful for building histograms)
+ * using the numrows_filtered parameter.
+ *
+ * FIXME Use max_mcv_items from ALTER TABLE ADD STATISTICS command.
+ *
+ * FIXME Single-dimensional MCV is sorted by frequency (descending). We should
+ * do that too, because when walking through the list we want to check
+ * the most frequent items first.
+ *
+ * TODO We're using Datum (8B), even for smaller data types (e.g. int4 or float4).
+ * Maybe we could save some space here, but the bytea compression should
+ * handle it just fine.
+ *
+ * TODO This probably should not use the ndistinct directly (as computed from
+ * the sample), but rather estimate the number of distinct values in the
+ * table, no?
+ */
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered)
+{
+ int i;
+ int numattrs = attrs->dim1;
+ int ndistinct = 0;
+ int mcv_threshold = 0;
+ int nitems = 0;
+
+ MCVList mcvlist = NULL;
+
+ /* comparator for all the columns */
+ MultiSortSupport mss = build_mss(stats, attrs);
+
+ /* sort the rows */
+ SortItem *items = build_sorted_items(numrows, rows, stats[0]->tupDesc,
+ mss, attrs);
+
+ /* transform the sorted rows into groups (sorted by frequency) */
+ SortItem *groups = build_distinct_groups(numrows, items, mss, &ndistinct);
+
+ /*
+ * Determine the minimum size of a group to be eligible for MCV list, and
+ * check how many groups actually pass that threshold. We use 1.25x the
+ * average group size, just like for regular statistics.
+ *
+ * But if we can fit all the distinct values in the MCV list (i.e. if there
+ * are fewer distinct groups than MVSTAT_MCVLIST_MAX_ITEMS), we'll require
+ * only 2 rows per group.
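+ *
+ * For example, with numrows = 30000 and ndistinct = 1000 the threshold
+ * works out as (int) (1.25 * 30000 / 1000) = 37 rows per group.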
+ *
+ * FIXME This should really reference mcv_max_items (from catalog) instead
+ * of the constant MVSTAT_MCVLIST_MAX_ITEMS.
+ */
+ mcv_threshold = 1.25 * numrows / ndistinct;
+ mcv_threshold = (mcv_threshold < 4) ? 4 : mcv_threshold;
+
+ if (ndistinct <= MVSTAT_MCVLIST_MAX_ITEMS)
+ mcv_threshold = 2;
+
+ /* Walk through the groups and stop once we fall below the threshold. */
+ nitems = 0;
+ for (i = 0; i < ndistinct; i++)
+ {
+ if (groups[i].count < mcv_threshold)
+ break;
+
+ nitems++;
+ }
+
+ /* we know the number of MCV list items, so let's build the list */
+ if (nitems > 0)
+ {
+ /* allocate the MCV list structure, set parameters we know */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ mcvlist->magic = MVSTAT_MCV_MAGIC;
+ mcvlist->type = MVSTAT_MCV_TYPE_BASIC;
+ mcvlist->ndimensions = numattrs;
+ mcvlist->nitems = nitems;
+
+ /*
+ * Preallocate Datum/isnull arrays (not as a single chunk, as we will
+ * pass the result outside and thus it needs to be easy to pfree()).
+ *
+ * XXX Although we're the only ones dealing with this.
+ */
+ mcvlist->items = (MCVItem*)palloc0(sizeof(MCVItem)*nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ mcvlist->items[i] = (MCVItem)palloc0(sizeof(MCVItemData));
+ mcvlist->items[i]->values = (Datum*)palloc0(sizeof(Datum)*numattrs);
+ mcvlist->items[i]->isnull = (bool*)palloc0(sizeof(bool)*numattrs);
+ }
+
+ /* Copy the first chunk of groups into the result. */
+ for (i = 0; i < nitems; i++)
+ {
+ /* just pointer to the proper place in the list */
+ MCVItem item = mcvlist->items[i];
+
+ /* copy the values and null flags from the group */
+ memcpy(item->values, groups[i].values, sizeof(Datum) * numattrs);
+ memcpy(item->isnull, groups[i].isnull, sizeof(bool) * numattrs);
+
+ /* and finally the group frequency */
+ item->frequency = (double)groups[i].count / numrows;
+ }
+
+ /* make sure the loops are consistent */
+ Assert(nitems == mcvlist->nitems);
+
+ /*
+ * Remove the rows matching the MCV list (i.e. keep only rows that are
+ * not represented by the MCV list). We will first sort the groups
+ * by the keys (not by count) and then use binary search.
+ */
+ if (nitems < ndistinct)
+ {
+ int i, j;
+ int nfiltered = 0;
+
+ /* used for the searches */
+ SortItem key;
+
+ /* we'll fill this with data from the rows */
+ key.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ key.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /*
+ * Sort the groups for bsearch_arg (but only the items that actually
+ * made it to the MCV list).
+ */
+ qsort_arg((void *) groups, nitems, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /* walk through the tuples, compare the values to MCV items */
+ for (i = 0; i < numrows; i++)
+ {
+ /* collect the key values from the row */
+ for (j = 0; j < numattrs; j++)
+ key.values[j]
+ = heap_getattr(rows[i], attrs->values[j],
+ stats[j]->tupDesc, &key.isnull[j]);
+
+ /* if not included in the MCV list, keep it in the array */
+ if (bsearch_arg(&key, groups, nitems, sizeof(SortItem),
+ multi_sort_compare, mss) == NULL)
+ rows[nfiltered++] = rows[i];
+ }
+
+ /* remember how many rows we actually kept */
+ *numrows_filtered = nfiltered;
+
+ /* free all the data used here */
+ pfree(key.values);
+ pfree(key.isnull);
+ }
+ else
+ /* the MCV list covers all the rows */
+ *numrows_filtered = 0;
+ }
+
+ pfree(items);
+ pfree(groups);
+
+ return mcvlist;
+}
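+
+/*
+ * Note on usage (a sketch of the expected caller behavior): on return,
+ * 'rows' has been compacted in place so that the first *numrows_filtered
+ * entries are the sample rows not covered by the MCV list, allowing the
+ * caller to build further statistics from the remainder.
+ */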
+
+/* build MultiSortSupport for the attributes passed in attrs */
+static MultiSortSupport
+build_mss(VacAttrStats **stats, int2vector *attrs)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ /* Sort by multiple columns (using array of SortSupport) */
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /* prepare the sort functions for all the attributes */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ return mss;
+}
+
+/* build sorted array of SortItem with values from rows */
+static SortItem *
+build_sorted_items(int numrows, HeapTuple *rows, TupleDesc tdesc,
+ MultiSortSupport mss, int2vector *attrs)
+{
+ int i, j, len;
+ int numattrs = attrs->dim1;
+ int nvalues = numrows * numattrs;
+
+ /*
+ * We won't allocate the arrays for each item independently, but in one
+ * large chunk, and then just set the pointers.
+ */
+ SortItem *items;
+ Datum *values;
+ bool *isnull;
+ char *ptr;
+
+ /* Compute the total amount of memory we need (both items and values). */
+ len = numrows * sizeof(SortItem) + nvalues * (sizeof(Datum) + sizeof(bool));
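+
+ /*
+ * Illustrative sizing: for numrows = 100 and numattrs = 3 we get
+ * nvalues = 300, i.e. len = 100 * sizeof(SortItem)
+ * + 300 * (sizeof(Datum) + sizeof(bool)).
+ */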
+
+ /* Allocate the memory and split it into the pieces. */
+ ptr = palloc0(len);
+
+ /* items to sort */
+ items = (SortItem*)ptr;
+ ptr += numrows * sizeof(SortItem);
+
+ /* values and null flags */
+ values = (Datum*)ptr;
+ ptr += nvalues * sizeof(Datum);
+
+ isnull = (bool*)ptr;
+ ptr += nvalues * sizeof(bool);
+
+ /* make sure we consumed the whole buffer exactly */
+ Assert((ptr - (char*)items) == len);
+
+ /* fix the pointers to Datum and bool arrays */
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+
+ /* load the values/null flags from sample rows */
+ for (j = 0; j < numattrs; j++)
+ {
+ items[i].values[j] = heap_getattr(rows[i],
+ attrs->values[j], /* attnum */
+ tdesc,
+ &items[i].isnull[j]); /* isnull */
+ }
+ }
+
+ /* do the sort, using the multi-sort */
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ return items;
+}
+
+/* count distinct combinations of SortItems in the array */
+static int
+count_distinct_groups(int numrows, SortItem *items, MultiSortSupport mss)
+{
+ int i;
+ int ndistinct;
+
+ ndistinct = 1;
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ ndistinct += 1;
+
+ return ndistinct;
+}
+
+/* compares frequencies of the SortItem entries (in descending order) */
+static int
+compare_sort_item_count(const void *a, const void *b)
+{
+ SortItem *ia = (SortItem *)a;
+ SortItem *ib = (SortItem *)b;
+
+ if (ia->count == ib->count)
+ return 0;
+ else if (ia->count > ib->count)
+ return -1;
+
+ return 1;
+}
+
+/* builds SortItems for distinct groups and counts the matching items */
+static SortItem *
+build_distinct_groups(int numrows, SortItem *items, MultiSortSupport mss,
+ int *ndistinct)
+{
+ int i, j;
+ int ngroups = count_distinct_groups(numrows, items, mss);
+
+ SortItem *groups = (SortItem*)palloc0(ngroups * sizeof(SortItem));
+
+ j = 0;
+ groups[0] = items[0];
+ groups[0].count = 1;
+
+ for (i = 1; i < numrows; i++)
+ {
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ groups[++j] = items[i];
+
+ groups[j].count++;
+ }
+
+ pg_qsort((void *) groups, ngroups, sizeof(SortItem),
+ compare_sort_item_count);
+
+ *ndistinct = ngroups;
+ return groups;
+}
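+
+/*
+ * Example: sorted items (1,1), (1,1), (2,3) yield ngroups = 2, i.e. the
+ * group (1,1) with count = 2 followed by (2,3) with count = 1, ordered
+ * by count in descending order.
+ */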
+
+
+/* fetch the MCV list (as a bytea) from the pg_mv_statistic catalog */
+MCVList
+load_mv_mcvlist(Oid mvoid)
+{
+ bool isnull = false;
+ Datum mcvlist;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* Fetch the pg_mv_statistic tuple with the given statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->mcv_enabled && mvstat->mcv_built);
+#endif
+
+ mcvlist = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stamcv, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_mcvlist(DatumGetByteaP(mcvlist));
+}
+
+/*
+ * Print some basic info about the MCV list.
+ *
+ * TODO Add info about what part of the table this covers.
+ */
+Datum
+pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MCVList mcvlist = deserialize_mv_mcvlist(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nitems=%d", mcvlist->nitems);
+
+ pfree(mcvlist);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/*
+ * serialize MCV list into a bytea value
+ *
+ *
+ * The basic algorithm is simple:
+ *
+ * (1) perform deduplication (for each attribute separately)
+ * (a) collect all (non-NULL) attribute values from all MCV items
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all MCV list items
+ * (a) replace values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, because we may be mixing
+ * different datatypes, with different sort operators, etc.
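+ *
+ * For example, if one attribute of a 100-item MCV list contains only the
+ * values 10, 20 and 30, step (1) reduces them to the sorted array
+ * {10, 20, 30}, and step (3) stores a uint16 index (0, 1 or 2) in each
+ * item instead of the full Datum.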
+ *
+ * We'll use uint16 values for the indexes in step (3), as we don't allow
+ * more than 8k MCV items (see the max_mcv_items option), although that's a
+ * mostly arbitrary limit. We might increase it up to 65k and still fit
+ * into uint16.
+ *
+ * We don't really expect the serialization to save as much space as for
+ * histograms, because we are not doing any bucket splits (which is the source
+ * of high redundancy in histograms).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into a single char
+ * (or a longer type) instead of using an array of bool items.
+ */
+bytea *
+serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i, j;
+ int ndims = mcvlist->ndimensions;
+ int itemsize = ITEM_SIZE(ndims);
+
+ SortSupport ssup;
+ DimensionInfo *info;
+
+ Size total_length;
+
+ /* allocate just once */
+ char *item = palloc0(itemsize);
+
+ /* serialized items (indexes into arrays, etc.) */
+ bytea *output;
+ char *data = NULL;
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /*
+ * We'll include some rudimentary information about the attributes (type
+ * length, etc.), so that we don't have to look them up while deserializing
+ * the MCV list.
+ */
+ info = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data for all attributes included in the MCV list */
+ ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for all attributes */
+ for (i = 0; i < ndims; i++)
+ {
+ int ndistinct;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* copy important info about the data type (length, by-value) */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /* allocate space for values in the attribute and collect them */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * mcvlist->nitems);
+
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ /* skip NULL values - we don't need to serialize them */
+ if (mcvlist->items[j]->isnull[i])
+ continue;
+
+ values[i][counts[i]] = mcvlist->items[j]->values[i];
+ counts[i] += 1;
+ }
+
+ /* there are just NULL values in this dimension, we're done */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate the data */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but keep the
+ * ordering (so that we can do bsearch later). We know there's at least
+ * one item as (counts[i] != 0), so we can skip the first element.
+ */
+ ndistinct = 1; /* number of distinct values */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if the value is the same as the previous one, we can skip it */
+ if (! compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]))
+ continue;
+
+ values[i][ndistinct] = values[i][j];
+ ndistinct += 1;
+ }
+
+ /* we must not exceed UINT16_MAX, as we use uint16 indexes */
+ Assert(ndistinct <= UINT16_MAX);
+
+ /*
+ * Store additional info about the attribute - number of deduplicated
+ * values, and also size of the serialized data. For fixed-length data
+ * types this is trivial to compute, for varwidth types we need to
+ * actually walk the array and sum the sizes.
+ */
+ info[i].nvalues = ndistinct;
+
+ if (info[i].typlen > 0) /* fixed-length data types */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1) /* varlena */
+ {
+ info[i].nbytes = 0;
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2) /* cstring */
+ {
+ info[i].nbytes = 0;
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j]));
+ }
+
+ /* we know (count>0) so there must be some data */
+ Assert(info[i].nbytes > 0);
+ }
+
+ /*
+ * Now we can finally compute how much space we'll actually need for the
+ * serialized MCV list, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nitems (4B)
+ * - info (ndim * sizeof(DimensionInfo)
+ * - arrays of values for each dimension
+ * - serialized items (nitems * itemsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and then we
+ * will place all the data (values + indexes).
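+ *
+ * Layout sketch (not to scale):
+ *
+ * [len][magic][type][ndims][nitems][DimensionInfo x ndims]
+ * [values dim 0] ... [values dim ndims-1][item 0] ... [item nitems-1]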
+ */
+ total_length = (sizeof(int32) + offsetof(MCVListData, items)
+ + ndims * sizeof(DimensionInfo)
+ + mcvlist->nitems * itemsize);
+
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 1MB */
+ if (total_length > (1024 * 1024))
+ elog(ERROR, "serialized MCV list exceeds 1MB (%ld)", total_length);
+
+ /* allocate space for the serialized MCV list, set header fields */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* 'data' points to the current position in the output buffer */
+ data = VARDATA(output);
+
+ /* MCV list header (number of items, ...) */
+ memcpy(data, mcvlist, offsetof(MCVListData, items));
+ data += offsetof(MCVListData, items);
+
+ /* information about the attributes */
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* now serialize the deduplicated values for all attributes */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data; /* remember the starting point */
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ Datum v = values[i][j];
+
+ if (info[i].typbyval) /* passed by value */
+ {
+ memcpy(data, &v, info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen > 0) /* passed by reference */
+ {
+ memcpy(data, DatumGetPointer(v), info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1) /* varlena */
+ {
+ memcpy(data, DatumGetPointer(v), VARSIZE_ANY(v));
+ data += VARSIZE_ANY(v);
+ }
+ else if (info[i].typlen == -2) /* cstring */
+ {
+ memcpy(data, DatumGetPointer(v), strlen(DatumGetPointer(v))+1);
+ data += strlen(DatumGetPointer(v)) + 1; /* terminator */
+ }
+ }
+
+ /* make sure we got exactly the amount of data we expected */
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* finally serialize the items, with uint16 indexes instead of the values */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem mcvitem = mcvlist->items[i];
+
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - itemsize);
+
+ /* reset the item (we only allocate it once and reuse it) */
+ memset(item, 0, itemsize);
+
+ for (j = 0; j < ndims; j++)
+ {
+ Datum *v = NULL;
+
+ /* do the lookup only for non-NULL values */
+ if (mcvlist->items[i]->isnull[j])
+ continue;
+
+ v = (Datum*)bsearch_arg(&mcvitem->values[j], values[j],
+ info[j].nvalues, sizeof(Datum),
+ compare_scalars_simple, &ssup[j]);
+
+ Assert(v != NULL); /* serialization or deduplication error */
+
+ /* compute index within the array */
+ ITEM_INDEXES(item)[j] = (v - values[j]);
+
+ /* check the index is within expected bounds */
+ Assert(ITEM_INDEXES(item)[j] >= 0);
+ Assert(ITEM_INDEXES(item)[j] < info[j].nvalues);
+ }
+
+ /* copy NULL and frequency flags into the item */
+ memcpy(ITEM_NULLS(item, ndims), mcvitem->isnull, sizeof(bool) * ndims);
+ memcpy(ITEM_FREQUENCY(item, ndims), &mcvitem->frequency, sizeof(double));
+
+ /* copy the serialized item into the array */
+ memcpy(data, item, itemsize);
+
+ data += itemsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ return output;
+}
+
+/*
+ * deserialize MCV list from the varlena value
+ *
+ *
+ * We deserialize the MCV list fully, because we don't expect there to be
+ * a lot of duplicate values. But perhaps we should keep the MCV list in
+ * serialized form, just like histograms.
+ */
+MCVList
+deserialize_mv_mcvlist(bytea *data)
+{
+ int i, j;
+ Size expected_size;
+ MCVList mcvlist;
+ char *tmp;
+
+ int ndims, nitems, itemsize;
+ DimensionInfo *info = NULL;
+
+ uint16 *indexes = NULL;
+ Datum **values = NULL;
+
+ /* local allocation buffer (used only for deserialization) */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ /* buffer used for the result */
+ int rbufflen;
+ char *rbuff;
+ char *rptr;
+
+ if (data == NULL)
+ return NULL;
+
+ /* we can't deserialize the MCV if there's not even a complete header */
+ expected_size = offsetof(MCVListData,items);
+
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MCVListData,items));
+
+ /* read the MCV list header */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform further sanity checks */
+ memcpy(mcvlist, tmp, offsetof(MCVListData,items));
+ tmp += offsetof(MCVListData,items);
+
+ if (mcvlist->magic != MVSTAT_MCV_MAGIC)
+ elog(ERROR, "invalid MCV magic %d (expected %dd)",
+ mcvlist->magic, MVSTAT_MCV_MAGIC);
+
+ if (mcvlist->type != MVSTAT_MCV_TYPE_BASIC)
+ elog(ERROR, "invalid MCV type %d (expected %dd)",
+ mcvlist->type, MVSTAT_MCV_TYPE_BASIC);
+
+ nitems = mcvlist->nitems;
+ ndims = mcvlist->ndimensions;
+ itemsize = ITEM_SIZE(ndims);
+
+ Assert((nitems > 0) && (nitems <= MVSTAT_MCVLIST_MAX_ITEMS));
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Check amount of data including DimensionInfo for all dimensions and
+ * also the serialized items (including uint16 indexes). Also, walk
+ * through the dimension information and add it to the sum.
+ */
+ expected_size += ndims * sizeof(DimensionInfo) +
+ (nitems * itemsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid MCV size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ {
+ Assert(info[i].nvalues >= 0);
+ Assert(info[i].nbytes >= 0);
+
+ expected_size += info[i].nbytes;
+ }
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid MCV size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /*
+ * Allocate one large chunk of memory for the intermediate data, needed
+ * only for deserializing the MCV list (and allocate densely to minimize
+ * the palloc overhead).
+ *
+ * Let's see how much space we'll actually need, and also include space
+ * for the array with pointers.
+ */
+ bufflen = sizeof(Datum*) * ndims; /* space for pointers */
+
+ for (i = 0; i < ndims; i++)
+ /* for full-size byval types, we reuse the serialized value */
+ if (! (info[i].typbyval && info[i].typlen == sizeof(Datum)))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+
+ buff = palloc0(bufflen);
+ ptr = buff;
+
+ values = (Datum**)buff;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * XXX This uses pointers to the original data array (the types not passed
+ * by value), so when someone frees the memory, e.g. by doing something
+ * like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mv_mcvlist(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. Should copy the pieces.
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum - simply reuse the array */
+ if (info[i].typlen == sizeof(Datum))
+ {
+ values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value into the Datum array */
+ memcpy(&values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ }
+ else
+ {
+ /* all the other types need a chunk of the buffer */
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ /* pased by reference, but fixed length (name, tid, ...) */
+ if (info[i].typlen > 0)
+ {
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ /* we should have exhausted the buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ /* allocate space for all the MCV items in a single piece */
+ rbufflen = (sizeof(MCVItem) + sizeof(MCVItemData) +
+ sizeof(Datum)*ndims + sizeof(bool)*ndims) * nitems;
+
+ rbuff = palloc0(rbufflen);
+ rptr = rbuff;
+
+ mcvlist->items = (MCVItem*)rbuff;
+ rptr += (sizeof(MCVItem) * nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ MCVItem item = (MCVItem)rptr;
+ rptr += (sizeof(MCVItemData));
+
+ item->values = (Datum*)rptr;
+ rptr += (sizeof(Datum)*ndims);
+
+ item->isnull = (bool*)rptr;
+ rptr += (sizeof(bool) *ndims);
+
+ /* just point to the right place */
+ indexes = ITEM_INDEXES(tmp);
+
+ memcpy(item->isnull, ITEM_NULLS(tmp, ndims), sizeof(bool) * ndims);
+ memcpy(&item->frequency, ITEM_FREQUENCY(tmp, ndims), sizeof(double));
+
+#ifdef USE_ASSERT_CHECKING
+ for (j = 0; j < ndims; j++)
+ Assert(indexes[j] <= UINT16_MAX);
+#endif
+
+ /* translate the values */
+ for (j = 0; j < ndims; j++)
+ if (! item->isnull[j])
+ item->values[j] = values[j][indexes[j]];
+
+ mcvlist->items[i] = item;
+
+ tmp += ITEM_SIZE(ndims);
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+ }
+
+ /* check that we processed all the data */
+ Assert(tmp == (char*)data + VARSIZE_ANY(data));
+
+ /* release the temporary buffer */
+ pfree(buff);
+
+ return mcvlist;
+}
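+
+/*
+ * Round-trip sketch (illustrative only): serializing and then
+ * deserializing a list is expected to reproduce it, e.g.
+ *
+ *   bytea *data = serialize_mv_mcvlist(mcvlist, attrs, stats);
+ *   MCVList copy = deserialize_mv_mcvlist(data);
+ *   Assert(copy->nitems == mcvlist->nitems);
+ */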
+
+/*
+ * SRF with details about items of an MCV list:
+ *
+ * - item ID (0 .. nitems-1)
+ * - values (string array)
+ * - nulls (boolean array)
+ * - frequency (double precision)
+ *
+ * The input is the OID of the statistics, and no rows are returned if
+ * the statistics contains no MCV list.
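+ *
+ * Example invocation (with a hypothetical statistics OID):
+ *
+ *   SELECT * FROM pg_mv_mcv_items(12345);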
+ */
+PG_FUNCTION_INFO_V1(pg_mv_mcv_items);
+
+Datum
+pg_mv_mcv_items(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MCVList mcvlist;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ mcvlist = load_mv_mcvlist(PG_GETARG_OID(0));
+
+ funcctx->user_fctx = mcvlist;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = mcvlist->nitems;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /* build metadata needed later to produce tuples from raw C-strings */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MCVList mcvlist;
+ MCVItem item;
+
+ mcvlist = (MCVList)funcctx->user_fctx;
+
+ Assert(call_cntr < mcvlist->nitems);
+
+ item = mcvlist->items[call_cntr];
+
+ stakeys = find_mv_attnums(PG_GETARG_OID(0), &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple. This should
+ * be an array of C strings which will be processed later by the type
+ * input functions.
+ */
+ values = (char **) palloc(4 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+
+ /* frequency */
+ values[3] = (char *) palloc(64 * sizeof(char));
+
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * mcvlist->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * mcvlist->ndimensions);
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* item ID */
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ Datum val, valout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == mcvlist->ndimensions-1)
+ format = "%s, %s}";
+
+ if (item->isnull[i])
+ valout = CStringGetDatum("NULL");
+ else
+ {
+ val = item->values[i];
+ valout = FunctionCall1(&fmgrinfo[i], val);
+ }
+
+ snprintf(buff, 1024, format, values[1], DatumGetPointer(valout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], item->isnull[i] ? "t" : "f");
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[3], 64, "%f", item->frequency); /* frequency */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 8ce9c0e..2c22d31 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2109,8 +2109,9 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
- " deps_enabled,\n"
- " deps_built,\n"
+ " deps_enabled, mcv_enabled,\n"
+ " deps_built, mcv_built,\n"
+ " mcv_max_items,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
@@ -2128,6 +2129,8 @@ describeOneTableDetails(const char *schemaname,
printTableAddFooter(&cont, _("Statistics:"));
for (i = 0; i < tuples; i++)
{
+ bool first = true;
+
printfPQExpBuffer(&buf, " ");
/* statistics name (qualified with namespace) */
@@ -2137,10 +2140,22 @@ describeOneTableDetails(const char *schemaname,
/* options */
if (!strcmp(PQgetvalue(result, i, 4), "t"))
- appendPQExpBuffer(&buf, "(dependencies)");
+ {
+ appendPQExpBuffer(&buf, "(dependencies");
+ first = false;
+ }
+
+ if (!strcmp(PQgetvalue(result, i, 5), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", mcv");
+ else
+ appendPQExpBuffer(&buf, "(mcv");
+ first = false;
+ }
- appendPQExpBuffer(&buf, " ON (%s)",
- PQgetvalue(result, i, 6));
+ appendPQExpBuffer(&buf, ") ON (%s)",
+ PQgetvalue(result, i, 9));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index c74af47..3529b03 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -38,15 +38,21 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
+ bool mcv_enabled; /* build MCV list? */
+
+ /* MCV size */
+ int32 mcv_max_items; /* max MCV items */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
+ bool mcv_built; /* MCV list was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
+ bytea stamcv; /* MCV list (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -62,14 +68,18 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 8
+#define Natts_pg_mv_statistic 12
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
#define Anum_pg_mv_statistic_staowner 4
#define Anum_pg_mv_statistic_deps_enabled 5
-#define Anum_pg_mv_statistic_deps_built 6
-#define Anum_pg_mv_statistic_stakeys 7
-#define Anum_pg_mv_statistic_stadeps 8
+#define Anum_pg_mv_statistic_mcv_enabled 6
+#define Anum_pg_mv_statistic_mcv_max_items 7
+#define Anum_pg_mv_statistic_deps_built 8
+#define Anum_pg_mv_statistic_mcv_built 9
+#define Anum_pg_mv_statistic_stakeys 10
+#define Anum_pg_mv_statistic_stadeps 11
+#define Anum_pg_mv_statistic_stamcv 12
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index ff2d797..f8ceabf 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2670,6 +2670,10 @@ DATA(insert OID = 3998 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0
DESCR("multivariate stats: functional dependencies info");
DATA(insert OID = 3999 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
DESCR("multivariate stats: functional dependencies show");
+DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_mcvlist_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: MCV list info");
+DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
+DESCR("details about MCV list items");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index e10dcf1..2bcd582 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -653,9 +653,11 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
+ bool mcv_enabled; /* MCV list enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
+ bool mcv_built; /* MCV list built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index c6f45ab..ce7c3ad 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -52,30 +52,89 @@ typedef MVDependenciesData* MVDependencies;
#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
/*
+ * Multivariate MCV (most-common value) lists
+ *
+ * A straightforward extension of the per-column MCV list - i.e. a list
+ * (array) of combinations of attribute values, together with a frequency
+ * and null flags.
+ */
+typedef struct MCVItemData {
+ double frequency; /* frequency of this combination */
+ bool *isnull; /* flags of NULL values (up to 32 columns) */
+ Datum *values; /* variable-length (ndimensions) */
+} MCVItemData;
+
+typedef MCVItemData *MCVItem;
+
+/* multivariate MCV list - essentially an array of MCV items */
+typedef struct MCVListData {
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of MCV list (BASIC) */
+ uint32 ndimensions; /* number of dimensions */
+ uint32 nitems; /* number of MCV items in the array */
+ MCVItem *items; /* array of MCV items */
+} MCVListData;
+
+typedef MCVListData *MCVList;
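+
+/*
+ * Example: for statistics on columns (a, b), an MCVItem with
+ * values = {10, 20}, isnull = {false, false} and frequency = 0.01
+ * represents the combination (a = 10, b = 20), observed in roughly 1%
+ * of the sampled rows.
+ */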
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_MCV_MAGIC 0xE1A651C2 /* marks serialized bytea */
+#define MVSTAT_MCV_TYPE_BASIC 1 /* basic MCV list type */
+
+/*
+ * Limits used for mcv_max_items option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_MCVLIST_MIN_ITEMS, and we cannot
+ * have more than MVSTAT_MCVLIST_MAX_ITEMS items.
+ *
+ * This is just a boundary for the 'max' threshold - the actual list
+ * may of course contain fewer items than MVSTAT_MCVLIST_MIN_ITEMS.
+ */
+#define MVSTAT_MCVLIST_MIN_ITEMS 128 /* min items in MCV list */
+#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
*/
MVDependencies load_mv_dependencies(Oid mvoid);
+MCVList load_mv_mcvlist(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
+bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
+MCVList deserialize_mv_mcvlist(bytea * data);
+
+/*
+ * Returns index of the attribute number within the vector (i.e. a
+ * dimension within the stats).
+ */
+int mv_get_index(AttrNumber varattno, int2vector * stakeys);
+
+int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
/* FIXME this probably belongs somewhere else (not to operations stats) */
extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_mcvlist_items(PG_FUNCTION_ARGS);
MVDependencies
-build_mv_dependencies(int numrows, HeapTuple *rows,
- int2vector *attrs,
- VacAttrStats **stats);
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats);
+
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered);
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
- int natts, VacAttrStats **vacattrstats);
+ int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, int2vector *attrs);
+void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats);
#endif
diff --git a/src/test/regress/expected/mv_mcv.out b/src/test/regress/expected/mv_mcv.out
new file mode 100644
index 0000000..075320b
--- /dev/null
+++ b/src/test/regress/expected/mv_mcv.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s4 ON mcv_list (unknown_column) WITH (mcv);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s4 ON mcv_list (a) WITH (mcv);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s4 ON mcv_list (a, a) WITH (mcv);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s4 ON mcv_list (a, a, b) WITH (mcv);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing MCV statistics
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (dependencies, max_mcv_items=200);
+ERROR: option 'mcv' is required by other options(s)
+-- invalid mcv_max_items value / too low
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10);
+ERROR: max number of MCV items must be at least 128
+-- invalid mcv_max_items value / too high
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10000);
+ERROR: max number of MCV items is 8192
+-- correct command
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s5 ON mcv_list (a, b, c) WITH (mcv);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s6 ON mcv_list (a, b, c, d) WITH (mcv);
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1200
+(1 row)
+
+DROP TABLE mcv_list;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 06f2231..3d55ffe 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1373,7 +1373,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
s.staname,
s.stakeys AS attnums,
length(s.stadeps) AS depsbytes,
- pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
+ length(s.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 4f2ffb8..85d94f1 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,4 +112,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies
+test: mv_dependencies mv_mcv
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 097a04f..6584d73 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -163,3 +163,4 @@ test: xml
test: event_trigger
test: stats
test: mv_dependencies
+test: mv_mcv
diff --git a/src/test/regress/sql/mv_mcv.sql b/src/test/regress/sql/mv_mcv.sql
new file mode 100644
index 0000000..b31d32d
--- /dev/null
+++ b/src/test/regress/sql/mv_mcv.sql
@@ -0,0 +1,178 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s4 ON mcv_list (unknown_column) WITH (mcv);
+
+-- single column
+CREATE STATISTICS s4 ON mcv_list (a) WITH (mcv);
+
+-- single column, duplicated
+CREATE STATISTICS s4 ON mcv_list (a, a) WITH (mcv);
+
+-- two columns, one duplicated
+CREATE STATISTICS s4 ON mcv_list (a, a, b) WITH (mcv);
+
+-- unknown option
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (unknown_option);
+
+-- missing MCV statistics
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (dependencies, max_mcv_items=200);
+
+-- invalid mcv_max_items value / too low
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10);
+
+-- invalid mcv_max_items value / too high
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10000);
+
+-- correct command
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+
+DROP TABLE mcv_list;
+
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s5 ON mcv_list (a, b, c) WITH (mcv);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mcv_list;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s6 ON mcv_list (a, b, c, d) WITH (mcv);
+
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+DROP TABLE mcv_list;
--
2.5.0
0003-clause-reduction-using-functional-dependencies.patch
From 70cf8ed9f0d161a335be1e72a25f325261ea9e46 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 19:42:18 +0200
Subject: [PATCH 3/9] clause reduction using functional dependencies
During planning, use functional dependencies to decide which clauses to
skip during cardinality estimation. Initial and rather simplistic
implementation.
This only works with regular WHERE clauses, not with clauses used as
join conditions.
Note: The clause_is_mv_compatible() needs to identify the relation (so
that we can fetch the list of multivariate stats by OID).
planner_rt_fetch() seems like the appropriate way to get the relation
OID, but apparently it only works with simple vars. Maybe
examine_variable() would make this work with more complex vars too?
Includes regression tests analyzing functional dependencies (part of
ANALYZE) on several datasets (no dependencies, no transitive
dependencies, ...).
Checks that a query with conditions on two columns, where one (B) is
functionally dependent on the other one (A), correctly ignores the
clause on (B) and chooses bitmap index scan instead of plain index scan
(which is what happens otherwise, thanks to assumption of
independence).
Note: Functional dependencies only work with equality clauses, no
inequalities etc.
---
src/backend/optimizer/path/clausesel.c | 505 ++++++++++++++++-
src/backend/utils/mvstats/README.dependencies | 63 +--
src/backend/utils/mvstats/README.stats | 36 ++
src/backend/utils/mvstats/common.c | 25 +-
src/backend/utils/mvstats/common.h | 3 +
src/backend/utils/mvstats/dependencies.c | 775 +++++++++++++++++---------
src/include/utils/mvstats.h | 21 +-
src/test/regress/expected/mv_dependencies.out | 172 ++++++
src/test/regress/parallel_schedule | 3 +
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_dependencies.sql | 150 +++++
11 files changed, 1457 insertions(+), 297 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.stats
create mode 100644 src/test/regress/expected/mv_dependencies.out
create mode 100644 src/test/regress/sql/mv_dependencies.sql
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 02660c2..5ab7f15 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -14,14 +14,19 @@
*/
#include "postgres.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/plancat.h"
+#include "optimizer/var.h"
#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
+#include "utils/mvstats.h"
#include "utils/selfuncs.h"
+#include "utils/typcache.h"
/*
@@ -41,6 +46,25 @@ typedef struct RangeQueryClause
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
+#define MV_CLAUSE_TYPE_FDEP 0x01
+
+static bool clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum);
+
+static Bitmapset *collect_mv_attnums(List *clauses, Index relid);
+
+static int count_mv_attnums(List *clauses, Index relid);
+
+static int count_varnos(List *clauses, Index *relid);
+
+static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Index relid, List *stats);
+
+static bool has_stats(List *stats, int type);
+
+static List * find_stats(PlannerInfo *root, Index relid);
+
+static bool stats_type_matches(MVStatisticInfo *stat, int type);
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
@@ -60,7 +84,19 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
* subclauses. However, that's only right if the subclauses have independent
* probabilities, and in reality they are often NOT independent. So,
* we want to be smarter where we can.
-
+ *
+ * The first thing we try to do is apply multivariate statistics, in a way
+ * that intends to minimize the overhead when there are no multivariate stats
+ * on the relation. Thus we do several simple (and inexpensive) checks first,
+ * to verify that suitable multivariate statistics exist.
+ *
+ * If we identify suitable multivariate statistics, we try to apply them.
+ * Currently we only have (soft) functional dependencies, so we try to reduce
+ * the list of clauses.
+ *
+ * Then we remove the clauses estimated using multivariate stats, and process
+ * the rest of the clauses using the regular per-column stats.
+ *
* Currently, the only extra smarts we have is to recognize "range queries",
* such as "x > 34 AND x < 42". Clauses are recognized as possible range
* query components if they are restriction opclauses whose operators have
@@ -99,6 +135,22 @@ clauselist_selectivity(PlannerInfo *root,
RangeQueryClause *rqlist = NULL;
ListCell *l;
+ /* processing mv stats */
+ Oid relid = InvalidOid;
+
+ /* list of multivariate stats on the relation */
+ List *stats = NIL;
+
+ /*
+ * To fetch the statistics, we first need to determine the rel. Currently
+ * we only support estimates of simple restrictions with all Vars
+ * referencing a single baserel. However set_baserel_size_estimates() sets
+ * varRelid=0 so we have to actually inspect the clauses by pull_varnos
+ * and see if there's just a single varno referenced.
+ */
+ if ((count_varnos(clauses, &relid) == 1) && ((varRelid == 0) || (varRelid == relid)))
+ stats = find_stats(root, relid);
+
/*
* If there's exactly one clause, then no use in trying to match up pairs,
* so just go directly to clause_selectivity().
@@ -108,6 +160,24 @@ clauselist_selectivity(PlannerInfo *root,
varRelid, jointype, sjinfo);
/*
+ * Apply functional dependencies, but first check that there are some stats
+ * with functional dependencies built (by simply walking the stats list),
+ * and that there are at least two attributes referenced by clauses that
+ * may be reduced using functional dependencies.
+ *
+ * We would find that anyway when trying to actually apply the functional
+ * dependencies, but let's do the cheap checks first.
+ *
+ * After applying the functional dependencies we get the remaining clauses
+ * that need to be estimated by other types of stats (MCV, histograms etc).
+ */
+ if (has_stats(stats, MV_CLAUSE_TYPE_FDEP) &&
+ (count_mv_attnums(clauses, relid) >= 2))
+ {
+ clauses = clauselist_apply_dependencies(root, clauses, relid, stats);
+ }
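+
+ /*
+ * For example, with a functional dependency (a => b), the clause list
+ * (a = 1) AND (b = 1) is reduced to just (a = 1), and only the
+ * remaining clause is estimated by the per-column logic below.
+ */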
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -763,3 +833,436 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ */
+static Bitmapset *
+collect_mv_attnums(List *clauses, Index relid)
+{
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(l);
+
+ /* ignore the result for now - we only need the info */
+ if (clause_is_mv_compatible(clause, relid, &attnum))
+ attnums = bms_add_member(attnums, attnum);
+ }
+
+ /*
+ * If there are not at least two attributes referenced by the clause(s),
+ * we can throw everything out (as we'll revert to simple stats).
+ */
+ if (bms_num_members(attnums) <= 1)
+ {
+ if (attnums != NULL)
+ pfree(attnums);
+ attnums = NULL;
+ }
+
+ return attnums;
+}
+
+/*
+ * Count the number of attributes in clauses compatible with multivariate stats.
+ */
+static int
+count_mv_attnums(List *clauses, Index relid)
+{
+ int c;
+ Bitmapset *attnums = collect_mv_attnums(clauses, relid);
+
+ c = bms_num_members(attnums);
+
+ bms_free(attnums);
+
+ return c;
+}
+
+/*
+ * Count varnos referenced in the clauses, and if there's a single varno then
+ * return the index in 'relid'.
+ */
+static int
+count_varnos(List *clauses, Index *relid)
+{
+ int cnt;
+ Bitmapset *varnos = NULL;
+
+ varnos = pull_varnos((Node *) clauses);
+ cnt = bms_num_members(varnos);
+
+ /* if there's a single varno in the clauses, remember it */
+ if (bms_num_members(varnos) == 1)
+ *relid = bms_singleton_member(varnos);
+
+ bms_free(varnos);
+
+ return cnt;
+}
+
+typedef struct
+{
+ Index varno; /* relid we're interested in */
+ Bitmapset *varattnos; /* attnums referenced by the clauses */
+} mv_compatible_context;
+
+/*
+ * Recursive walker that checks compatibility of the clause with multivariate
+ * statistics, and collects attnums from the Vars.
+ *
+ * XXX The original idea was to combine this with expression_tree_walker, but
+ * I've been unable to make that work - seems that does not quite allow
+ * checking the structure. Hence the explicit calls to the walker.
+ */
+static bool
+mv_compatible_walker(Node *node, mv_compatible_context *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+
+ /* Pseudoconstants are not really interesting here. */
+ if (rinfo->pseudoconstant)
+ return true;
+
+ /* clauses referencing multiple varnos are incompatible */
+ if (bms_membership(rinfo->clause_relids) != BMS_SINGLETON)
+ return true;
+
+ /* check the clause inside the RestrictInfo */
+ return mv_compatible_walker((Node *) rinfo->clause, context);
+ }
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ /*
+ * The variable needs to reference the right relid (this might be
+ * unnecessary given the other checks, but let's be sure).
+ */
+ if (var->varno != context->varno)
+ return true;
+
+ /* Also skip system attributes (we don't allow stats on those). */
+ if (! AttrNumberIsForUserDefinedAttr(var->varattno))
+ return true;
+
+ /* Seems fine, so let's remember the attnum. */
+ context->varattnos = bms_add_member(context->varattnos, var->varattno);
+
+ return false;
+ }
+
+ /*
+ * And finally the operator expressions - we only allow simple expressions
+ * with two arguments, where one is a Var and the other is a constant, and
+ * it's a simple comparison (which we detect using the estimator function).
+ */
+ if (is_opclause(node))
+ {
+ OpExpr *expr = (OpExpr *) node;
+ Var *var;
+ bool varonleft = true;
+ bool ok;
+
+ /*
+ * Only expressions with two arguments are considered compatible.
+ *
+ * XXX Possibly unnecessary (can OpExpr have different arg count?).
+ */
+ if (list_length(expr->args) != 2)
+ return true;
+
+ /* see if it actually has the right shape (one Var, one Const) */
+ ok = (NumRelids((Node*)expr) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ /* unsupported structure (two variables or so) */
+ if (! ok)
+ return true;
+
+ /*
+ * If it's not an equality operator, just ignore the clause. Otherwise
+ * note the relid and attnum for the variable. This uses the function
+ * for estimating selectivity, not the operator directly (a bit
+ * awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_EQSEL:
+
+ /* equality conditions are compatible with all statistics */
+ break;
+
+ default:
+
+ /* unknown estimator */
+ return true;
+ }
+
+ var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+
+ return mv_compatible_walker((Node *) var, context);
+ }
+
+ /* Node not explicitly supported, so terminate */
+ return true;
+}
+
+/*
+ * Determines whether the clause is compatible with multivariate stats,
+ * and if it is, returns some additional information - varno (index
+ * into simple_rte_array) and a bitmap of attributes. This is then
+ * used to fetch related multivariate statistics.
+ *
+ * At this moment we only support basic conditions of the form
+ *
+ * variable = constant
+ *
+ * where the equality is determined by looking at the associated function
+ * for estimating selectivity (F_EQSEL), just like with the
+ * single-dimensional case.
+ *
+ * TODO Support 'OR clauses' - shouldn't be all that difficult to
+ * evaluate them using multivariate stats.
+ */
+static bool
+clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum)
+{
+ mv_compatible_context context;
+
+ context.varno = relid;
+ context.varattnos = NULL; /* no attnums */
+
+ if (mv_compatible_walker(clause, (void *) &context))
+ return false;
+
+ /* a compatible clause references exactly one attribute, so remember it */
+ *attnum = bms_singleton_member(context.varattnos);
+
+ return true;
+}
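+
+/*
+ * For example, (a = 1) is mv-compatible (a simple Var = Const equality on
+ * a single relation), while (a = b) or (a + 1 = 2) are not - the walker
+ * rejects clauses referencing two Vars or using non-equality operators.
+ */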
+
+
+/*
+ * Reduce clauses using functional dependencies
+ */
+static List*
+fdeps_reduce_clauses(List *clauses, Index relid, Bitmapset *reduced_attnums)
+{
+ ListCell *lc;
+ List *reduced_clauses = NIL;
+
+ foreach (lc, clauses)
+ {
+ AttrNumber attnum = InvalidAttrNumber;
+ Node *clause = (Node *) lfirst(lc);
+
+ /* keep clauses that are not compatible with functional dependencies */
+ if (! clause_is_mv_compatible(clause, relid, &attnum))
+ {
+ reduced_clauses = lappend(reduced_clauses, clause);
+ continue;
+ }
+
+ /* for equality clauses, only keep those not on reduced attributes */
+ if (! bms_is_member(attnum, reduced_attnums))
+ reduced_clauses = lappend(reduced_clauses, clause);
+ }
+
+ return reduced_clauses;
+}
+
+/*
+ * decide which attributes are redundant (for equality clauses)
+ *
+ * We try to apply all functional dependencies available, and for each one we
+ * check if it matches attnums from equality clauses, but only those not yet
+ * reduced.
+ *
+ * XXX Not sure if the order in which we apply the dependencies matters.
+ *
+ * XXX We do not combine functional dependencies from separate stats. That is
+ * if we have dependencies on [a,b] and [b,c], then we don't deduce
+ * a->c from a->b and b->c. Computing such transitive closure is a possible
+ * future improvement.
+ */
+static Bitmapset *
+fdeps_reduce_attnums(List *stats, Bitmapset *attnums)
+{
+ ListCell *lc;
+ Bitmapset *reduced = NULL;
+
+ foreach (lc, stats)
+ {
+ int i;
+ MVDependencies dependencies = NULL;
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ /* skip statistics without dependencies */
+ if (! stats_type_matches(info, MV_CLAUSE_TYPE_FDEP))
+ continue;
+
+ /* fetch and deserialize dependencies */
+ dependencies = load_mv_dependencies(info->mvoid);
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ int j;
+ bool matched = true;
+ MVDependency dep = dependencies->deps[i];
+
+ /* we don't bother to break the loop early (only a few attributes) */
+ for (j = 0; j < dep->nattributes; j++)
+ {
+ if (! bms_is_member(dep->attributes[j], attnums))
+ matched = false;
+
+ if (bms_is_member(dep->attributes[j], reduced))
+ matched = false;
+ }
+
+ /* if dependency applies, mark the last attribute as reduced */
+ if (matched)
+ reduced = bms_add_member(reduced,
+ dep->attributes[dep->nattributes-1]);
+ }
+ }
+
+ return reduced;
+}
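+
+/*
+ * For example, with the cyclic dependencies (1 => 2) and (2 => 1) and
+ * equality clauses on both attributes, only attnum 2 gets reduced (assuming
+ * the dependencies are examined in this order) - once an attribute is
+ * reduced, dependencies using it are skipped, which prevents reducing both
+ * sides of the cycle.
+ */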
+
+/*
+ * reduce list of equality clauses using soft functional dependencies
+ *
+ * We simply walk through the list of functional dependencies, and for each one
+ * we check whether the dependency 'matches' the clauses, i.e. if there's a
+ * clause matching the condition. If yes, we attempt to remove all clauses
+ * matching the implied part of the dependency from the list.
+ *
+ * This only reduces equality clauses, and ignores all the other types. We
+ * might extend it to handle IS NULL clauses in the future.
+ *
+ * We also assume the equality clauses are 'compatible'. For example we can't
+ * identify when the clauses use a mismatching zip code and city name. In such
+ * case the usual approach (product of selectivities) would produce a better
+ * estimate, although mostly by chance.
+ *
+ * The implementation needs to be careful about cyclic dependencies, e.g. when
+ *
+ * (a -> b) and (b -> a)
+ *
+ * at the same time, which means there's a 1:1 relationship between the columns.
+ * In this case we must not reduce clauses on both attributes at the same time.
+ *
+ * TODO Currently we only apply functional dependencies at the same level, but
+ * maybe we could transfer the clauses from upper levels to the subtrees?
+ * For example let's say we have (a->b) dependency, and condition
+ *
+ * (a=1) AND (b=2 OR c=3)
+ *
+ * Currently, we won't be able to perform any reduction, because we'll
+ * consider (a=1) and (b=2 OR c=3) independently. But maybe we could pass
+ * (a=1) into the other expression, and only check it against conditions
+ * of the functional dependencies?
+ *
+ * In this case we'd end up with
+ *
+ * (a=1)
+ *
+ * as we'd consider (b=2) implied thanks to the rule, rendering the whole
+ * OR clause valid.
+ */
+static List *
+clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Index relid, List *stats)
+{
+ Bitmapset *clause_attnums = NULL;
+ Bitmapset *reduced_attnums = NULL;
+
+ /*
+ * Is there at least one statistics with functional dependencies?
+ * If not, return the original clauses right away.
+ *
+ * XXX Isn't this a bit pointless, thanks to exactly the same check in
+ * clauselist_selectivity()? Can we trigger the condition here?
+ */
+ if (! has_stats(stats, MV_CLAUSE_TYPE_FDEP))
+ return clauses;
+
+ /* collect attnums from clauses compatible with dependencies (equality) */
+ clause_attnums = collect_mv_attnums(clauses, relid);
+
+ /* decide which attnums may be eliminated */
+ reduced_attnums = fdeps_reduce_attnums(stats, clause_attnums);
+
+ /*
+ * Walk through the clauses, and see which other clauses we may reduce.
+ */
+ clauses = fdeps_reduce_clauses(clauses, relid, reduced_attnums);
+
+ bms_free(clause_attnums);
+ bms_free(reduced_attnums);
+
+ return clauses;
+}
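+
+/*
+ * A worked example: with a dependency (1 => 2) and the clause list
+ * (a = 1) AND (b = 2) (where 'a' and 'b' have attnums 1 and 2), attnum 2
+ * is marked as reduced, the (b = 2) clause is dropped, and the remaining
+ * (a = 1) clause is estimated using regular per-column statistics.
+ */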
+
+/*
+ * Check whether the statistics matches at least one of the requested types.
+ */
+static bool
+stats_type_matches(MVStatisticInfo *stat, int type)
+{
+ if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
+ return true;
+
+ return false;
+}
+
+/*
+ * Check that there are stats with at least one of the requested types.
+ */
+static bool
+has_stats(List *stats, int type)
+{
+ ListCell *s;
+
+ foreach (s, stats)
+ {
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ /* terminate if we've found at least one matching statistics */
+ if (stats_type_matches(stat, type))
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Looks up stats for a given baserel.
+ */
+static List *
+find_stats(PlannerInfo *root, Index relid)
+{
+ Assert(root->simple_rel_array[relid] != NULL);
+
+ return root->simple_rel_array[relid]->mvstatlist;
+}
diff --git a/src/backend/utils/mvstats/README.dependencies b/src/backend/utils/mvstats/README.dependencies
index 1f96fbc..f248459 100644
--- a/src/backend/utils/mvstats/README.dependencies
+++ b/src/backend/utils/mvstats/README.dependencies
@@ -156,37 +156,24 @@ estimates - especially compared to histograms, that are quite bad in estimating
equality clauses.
-Limitations
------------
-
-Let's see the main liminations of functional dependencies, especially those
-related to the current implementation.
+Multi-column dependencies
+-------------------------
-The current implementation supports only dependencies between two columns, but
-this is merely a simplification of the initial implementation. It's certainly
-useful to mine for dependencies involving multiple columns on the 'left' side,
-i.e. a condition for the dependency. That is dependencies like (a,b -> c).
+The implementation supports dependencies with multiple columns on the left side
+(i.e. the condition of the dependency). The detection starts from dependencies
+a single condition, and then proceeds to higher condition counts.
-The implementation may/should be smart enough not to mine redundant conditions,
-e.g. (a->b) and (a,c -> b), because the latter is a trivial consequence of the
-former one (if values of 'a' determine 'b', adding another column won't change
-that relationship). The ANALYZE should first analyze 1:1 dependencies, then 2:1
-dependencies (and skip the already identified ones), etc.
+It also detects dependencies that are implied by already identified ones, and
+ignores them. For example if we know that (a->b) then we won't add (a,c->b) as
+this dependency is a trivial consequence of (a->b).
-For example the dependency
+For a more practical example, consider these two dependencies
(city name -> zip code)
-
-is much stronger, i.e. whenever it hold, then
-
(city name, state name -> zip code)
-holds too. But in case there are cities with the same name in different states,
-then only the latter dependency will be valid.
-
-Of course, there probably are cities with the same name within a single state,
-but hopefully this is relatively rare occurence (and thus we'll still detect
-the 'soft' dependency).
+We could say that the former dependency is stronger, because whenever it is
+valid, the latter one is valid too.
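+
+So if the (city name -> zip code) dependency is detected first, the detection
+skips (city name, state name -> zip code) as implied, and only stores the
+former dependency.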
Handling multiple columns on the right side of the dependency, is not necessary,
as those dependencies may be simply decomposed into a set of dependencies with
@@ -199,24 +186,22 @@ is exactly the same as
(a -> b) & (a -> c)
Of course, storing the first form may be more efficient thant storing multiple
-'simple' dependencies separately.
-
+'simple' dependencies separately. This is left as future work.
-TODO Support dependencies with multiple columns on left/right.
-TODO Investigate using histogram and MCV list to verify the dependencies.
+Future work
+-----------
-TODO Investigate statistical testing of the distribution (to decide whether it
- makes sense to build the histogram/MCV list).
+* Investigate using histogram and MCV list to verify the dependencies.
-TODO Using a min/max of selectivities would probably make more sense for the
- associated columns.
+* Investigate statistical testing of the distribution (to decide whether it
+ makes sense to build the histogram/MCV list).
-TODO Consider eliminating the implied columns from the histogram and MCV lists
- (but maybe that's not a good idea, because that'd make it impossible to use
- these stats for non-equality clauses and also it wouldn't be possible to
- use the stats for verification of the dependencies).
+* Consider eliminating the implied columns from the histogram and MCV lists
+ (but maybe that's not a good idea, because that'd make it impossible to use
+ these stats for non-equality clauses and also it wouldn't be possible to
+ use the stats for verification of the dependencies).
-TODO The reduction probably might be extended to also handle IS NULL clauses,
- assuming we fix the ANALYZE to properly handle NULL values. We however
- won't be able to reduce IS NOT NULL (unless I'm missing something).
+* The reduction probably might be extended to also handle IS NULL clauses,
+ assuming we fix the ANALYZE to properly handle NULL values. We however
+ won't be able to reduce IS NOT NULL (unless I'm missing something).
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
new file mode 100644
index 0000000..a38ea7b
--- /dev/null
+++ b/src/backend/utils/mvstats/README.stats
@@ -0,0 +1,36 @@
+Multivariate statistics
+=======================
+
+When estimating various quantities (e.g. condition selectivities) the default
+approach relies on the assumption of independence. In practice that's often
+not true, resulting in estimation errors.
+
+Multivariate stats track different types of dependencies between the columns,
+hopefully improving the estimates.
+
+Currently we only have one kind of multivariate statistics - soft functional
+dependencies, and we use it to improve estimates of equality clauses. See
+README.dependencies for details.
+
+
+Selectivity estimation
+----------------------
+
+When estimating selectivity, we aim to achieve several things:
+
+ (a) maximize the estimate accuracy
+
+ (b) minimize the overhead, especially when no suitable multivariate stats
+ exist (so if you are not using multivariate stats, there's no overhead)
+
+To this end, clauselist_selectivity() performs several inexpensive checks first,
+even attempting to do the more expensive estimation.
+
+ (1) check if there are multivariate stats on the relation
+
+ (2) check there are at least two attributes referenced by clauses compatible
+ with multivariate statistics (equality clauses for func. dependencies)
+
+ (3) perform reduction of equality clauses using func. dependencies
+
+ (4) estimate the reduced list of clauses using regular statistics
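+
+As a hypothetical example (the table and statistics names are made up), assume
+a statistics on (a,b) with a detected dependency (a => b):
+
+    CREATE STATISTICS s1 ON t (a, b) WITH (dependencies);
+    ANALYZE t;
+
+Then for a query such as
+
+    SELECT * FROM t WHERE a = 1 AND b = 2;
+
+step (3) reduces the clause list to just (a = 1), and step (4) estimates that
+using the single-column statistics on "a".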
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index a755c49..dcb7c78 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -84,7 +84,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
/*
* Analyze functional dependencies of columns.
*/
- deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (stat->deps_enabled)
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
/* store the histogram / MCV list in the catalog */
update_mv_stats(stat->mvoid, deps, attrs);
@@ -163,6 +164,7 @@ list_mv_stats(Oid relid)
info->mvoid = HeapTupleGetOid(htup);
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
result = lappend(result, info);
@@ -274,6 +276,7 @@ compare_scalars_partition(const void *a, const void *b, void *arg)
return ApplySortComparator(da, false, db, false, ssup);
}
+
/* initialize multi-dimensional sort */
MultiSortSupport
multi_sort_init(int ndims)
@@ -354,3 +357,23 @@ multi_sort_compare_dim(int dim, const SortItem *a, const SortItem *b,
b->values[dim], b->isnull[dim],
&mss->ssup[dim]);
}
+
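+/*
+ * Compare the SortItems on dimensions 'start' through 'end' (inclusive),
+ * returning the first non-zero comparison result, or 0 if the items are
+ * equal on all those dimensions.
+ */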
+int
+multi_sort_compare_dims(int start, int end,
+ const SortItem *a, const SortItem *b,
+ MultiSortSupport mss)
+{
+ int dim;
+
+ for (dim = start; dim <= end; dim++)
+ {
+ int r = ApplySortComparator(a->values[dim], a->isnull[dim],
+ b->values[dim], b->isnull[dim],
+ &mss->ssup[dim]);
+
+ if (r != 0)
+ return r;
+ }
+
+ return 0;
+}
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
index d96422d..a019ea6 100644
--- a/src/backend/utils/mvstats/common.h
+++ b/src/backend/utils/mvstats/common.h
@@ -70,6 +70,9 @@ int multi_sort_compare(const void *a, const void *b, void *arg);
int multi_sort_compare_dim(int dim, const SortItem *a,
const SortItem *b, MultiSortSupport mss);
+int multi_sort_compare_dims(int start, int end, const SortItem *a,
+ const SortItem *b, MultiSortSupport mss);
+
/* comparators, used when constructing multivariate stats */
int compare_scalars_simple(const void *a, const void *b, void *arg);
int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
index 2a064a0..412dc30 100644
--- a/src/backend/utils/mvstats/dependencies.c
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -17,293 +17,521 @@
#include "common.h"
#include "utils/lsyscache.h"
+/* internal state for generator of variations (k-permutations of n elements) */
+typedef struct VariationGeneratorData {
+
+ int k; /* size of the k-permutation */
+ int current; /* index of the next variation to return */
+
+ int nvariations; /* number of variations generated (size of array) */
+ int variations[1]; /* array of pre-built variations */
+
+} VariationGeneratorData;
+
+typedef VariationGeneratorData* VariationGenerator;
+
+/*
+ * generate all variations (k-permutations of n elements)
+ */
+static void
+generate_variations(VariationGenerator state,
+ int n, int maxlevel, int level, int *current)
+{
+ int i, j;
+
+ /* initialize */
+ if (level == 0)
+ {
+ current = (int*)palloc0(sizeof(int) * (maxlevel+1));
+ state->current = 0;
+ }
+
+ for (i = 0; i < n; i++)
+ {
+ /* check if the value is already used in the current variation */
+ bool found = false;
+ for (j = 0; j < level; j++)
+ {
+ if (current[j] == i)
+ {
+ found = true;
+ break;
+ }
+ }
+
+ /* already used, so try the next element */
+ if (found)
+ continue;
+
+ /* ok, we can use this element, so store it */
+ current[level] = i;
+
+ /* and check if we do have a complete variation of k elements */
+ if (level == maxlevel)
+ {
+ /* yep, store the variation */
+ Assert(state->current < state->nvariations);
+ memcpy(&state->variations[(state->k * state->current)], current,
+ sizeof(int) * (maxlevel+1));
+ state->current++;
+ }
+ else
+ /* nope, look for additional elements */
+ generate_variations(state, n, maxlevel, level+1, current);
+ }
+
+ if (level == 0)
+ pfree(current);
+}
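+
+/*
+ * For example, for n=3 elements and k=2, this produces the six variations
+ * (0,1), (0,2), (1,0), (1,2), (2,0), (2,1), in this order.
+ */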
+
/*
- * Detect functional dependencies between columns.
+ * initialize the generator of variations, and prebuild the variations
*
- * TODO This builds a complete set of dependencies, i.e. including transitive
- * dependencies - if we identify [A => B] and [B => C], we're likely to
- * identify [A => C] too. It might be better to keep only the minimal set
- * of dependencies, i.e. prune all the dependencies that we can recreate
- * by transivitity.
- *
- * There are two conceptual ways to do that:
- *
- * (a) generate all the rules, and then prune the rules that may be
- * recteated by combining other dependencies, or
- *
- * (b) performing the 'is combination of other dependencies' check before
- * actually doing the work
- *
- * The second option has the advantage that we don't really need to perform
- * the sort/count. It's not sufficient alone, though, because we may
- * discover the dependencies in the wrong order. For example we may find
+ * This pre-builds all the variations. We could also generate them in
+ * generator_next(), but this seems simpler.
+ */
+static VariationGenerator
+generator_init(int2vector *attrs, int k)
+{
+ int i;
+ int n = attrs->dim1;
+ int nvariations;
+ VariationGenerator state;
+
+ Assert((n >= k) && (k > 0));
+
+ /* compute the total number of variations as n!/(n-k)! */
+ nvariations = n;
+ for (i = 1; i < k; i++)
+ nvariations *= (n - i);
+
+ /* allocate the generator state as a single chunk of memory */
+ state = (VariationGenerator)palloc0(
+ offsetof(VariationGeneratorData, variations)
+ + (nvariations * k * sizeof(int))); /* variations */
+
+ state->nvariations = nvariations;
+ state->k = k;
+
+ /* now actually pre-generate all the variations */
+ generate_variations(state, n, (k-1), 0, NULL);
+
+ /* we expect to generate exactly the right number of variations */
+ Assert(state->nvariations == state->current);
+
+ /* reset the index */
+ state->current = 0;
+
+ return state;
+}
+
+/* free the generator state */
+static void
+generator_free(VariationGenerator state)
+{
+ /* we've allocated a single chunk, so just free it */
+ pfree(state);
+}
+
+/* generate the next variation */
+static int*
+generator_next(VariationGenerator state, int2vector *attrs)
+{
+ if (state->current == state->nvariations)
+ return NULL;
+
+ return &state->variations[state->k * state->current++];
+}
+
+/*
+ * check if the dependency is implied by existing dependencies
+ *
+ * A dependency is considered implied, if there exists a dependency with the
+ * same column on the right (the implied column), and a subset of the columns
+ * on the left side (the conditions). So for example if we have a dependency
+ *
+ * (a,b,c) -> d
*
- * (a -> b), (a -> c) and then (b -> c)
+ * then we are looking for these six dependencies
*
- * None of those dependencies is a combination of the already known ones,
- * yet (a -> C) is a combination of (a -> b) and (b -> c).
+ * (a) -> d
+ * (b) -> d
+ * (c) -> d
+ * (a,b) -> d
+ * (a,c) -> d
+ * (b,c) -> d
*
- *
- * FIXME Currently we simply replace NULL values with 0 and then handle is as
- * a regular value, but that groups NULL and actual 0 values. That's
- * clearly incorrect - we need to handle NULL values as a separate value.
+ * This does not detect transitive dependencies. For example if we have
+ *
+ * (a) -> b
+ * (b) -> c
+ *
+ * then obviously
+ *
+ * (a) -> c
+ *
+ * but this is not detected. Extending the method to handle transitive cases
+ * is future work.
*/
-MVDependencies
-build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
- VacAttrStats **stats)
+static bool
+dependency_is_implied(MVDependencies dependencies, int k, int *dependency,
+ int2vector * attrs)
{
- int i;
- int numattrs = attrs->dim1;
+ bool implied = false;
+ int i, j, l;
+ int *tmp;
- /* result */
- int ndeps = 0;
- MVDependencies dependencies = NULL;
- MultiSortSupport mss = multi_sort_init(2); /* 2 dimensions for now */
+ if (dependencies == NULL)
+ return false;
- /* TODO Maybe this should be somehow related to the number of
- * distinct values in the two columns we're currently analyzing.
- * Assuming the distribution is uniform, we can estimate the
- * average group size and use it as a threshold. Or something
- * like that. Seems better than a static approach.
- */
- int min_group_size = 3;
+ tmp = (int*)palloc0(sizeof(int) * k);
+
+ /* translate the indexes to actual attribute numbers */
+ for (i = 0; i < k; i++)
+ tmp[i] = attrs->values[dependency[i]];
+
+ /* search for an existing dependency with a subset of the conditions */
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ bool contained = true;
+ MVDependency dep = dependencies->deps[i];
+
+ /* does the last attribute match? */
+ if (tmp[k-1] != dep->attributes[dep->nattributes-1])
+ continue; /* nope, no need to check this dependency further */
+
+ /* are the conditions a superset of the existing dependency's conditions? */
+ for (j = 0; j < (dep->nattributes-1); j++)
+ {
+ bool found = false;
+
+ for (l = 0; l < (k-1); l++)
+ {
+ if (tmp[l] == dep->attributes[j])
+ {
+ found = true;
+ break;
+ }
+ }
+
+ /* we've found an attribute not included in the new dependency */
+ if (! found)
+ {
+ contained = false;
+ break;
+ }
+ }
+
+ /* we've found an existing dependency, trivially proving the new one */
+ if (contained)
+ {
+ implied = true;
+ break;
+ }
+ }
- /* dimension indexes we'll check for associations [a => b] */
- int dima, dimb;
+ pfree(tmp);
+
+ return implied;
+}
+
+/*
+ * validates functional dependency on the data
+ *
+ * The actual workhorse of detecting functional dependencies. Given a variation
+ * of k attributes, it checks whether the first (k-1) are sufficient to
+ * determine the last one.
+ */
+static bool
+dependency_is_valid(int numrows, HeapTuple *rows, int k, int * dependency,
+ VacAttrStats **stats, int2vector *attrs)
+{
+ int i, j;
+ int nvalues = numrows * k;
/*
- * We'll reuse the same array for all the 2-column combinations.
- *
- * It's possible to sort the sample rows directly, but this seemed
- * somehow simples / less error prone. Another option would be to
- * allocate the arrays for each SortItem separately, but that'd be
- * significant overhead (not just CPU, but especially memory bloat).
+ * XXX Maybe the threshold should be somehow related to the number of
+ * distinct values in the combination of columns we're analyzing.
+ * Assuming the distribution is uniform, we can estimate the average
+ * group size and use it as a threshold, similarly to what we do for
+ * MCV lists.
*/
- SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ int min_group_size = 3;
+
+ /* number of groups supporting / contradicting the dependency */
+ int n_supporting = 0;
+ int n_contradicting = 0;
+
+ /* counters valid within a group */
+ int group_size = 0;
+ int n_violations = 0;
- Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * 2);
- bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * 2);
+ int n_supporting_rows = 0;
+ int n_contradicting_rows = 0;
+ /* sort info for all the attribute columns */
+ MultiSortSupport mss = multi_sort_init(k);
+
+ /* data for the sort */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * nvalues);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * nvalues);
+
+ /* fix the pointers to values/isnull */
for (i = 0; i < numrows; i++)
{
- items[i].values = &values[i * 2];
- items[i].isnull = &isnull[i * 2];
+ items[i].values = &values[i * k];
+ items[i].isnull = &isnull[i * k];
}
- Assert(numattrs >= 2);
-
/*
- * Evaluate all possible combinations of [A => B], using a simple algorithm:
+ * Verify the dependency (a,b,...)->z, using a rather simple algorithm:
+ *
+ * (a) sort the data lexicographically
*
- * (a) sort the data by [A,B]
- * (b) split the data into groups by A (new group whenever a value changes)
- * (c) count different values in the B column (again, value changes)
+ * (b) split the data into groups by the first (k-1) columns
*
- * TODO It should be rather simple to merge [A => B] and [A => C] into
- * [A => B,C]. Just keep A constant, collect all the "implied" columns
- * and you're done.
+ * (c) for each group count different values in the last column
*/
- for (dima = 0; dima < numattrs; dima++)
+
+ /* prepare the sort functions for all k dimensions, and fill the SortItem array */
+ for (i = 0; i < k; i++)
{
- /* prepare the sort function for the first dimension */
- multi_sort_add_dimension(mss, 0, dima, stats);
+ multi_sort_add_dimension(mss, i, dependency[i], stats);
- for (dimb = 0; dimb < numattrs; dimb++)
+ /* accumulate the data for this dimension into the items array */
+ for (j = 0; j < numrows; j++)
{
- SortItem current;
-
- /* number of groups supporting / contradicting the dependency */
- int n_supporting = 0;
- int n_contradicting = 0;
-
- /* counters valid within a group */
- int group_size = 0;
- int n_violations = 0;
-
- int n_supporting_rows = 0;
- int n_contradicting_rows = 0;
-
- /* make sure the columns are different (A => A) */
- if (dima == dimb)
- continue;
-
- /* prepare the sort function for the second dimension */
- multi_sort_add_dimension(mss, 1, dimb, stats);
-
- /* reset the values and isnull flags */
- memset(values, 0, sizeof(Datum) * numrows * 2);
- memset(isnull, 0, sizeof(bool) * numrows * 2);
+ items[j].values[i]
+ = heap_getattr(rows[j], attrs->values[dependency[i]],
+ stats[dependency[i]]->tupDesc, &items[j].isnull[i]);
+ }
+ }
- /* accumulate all the data for both columns into an array and sort it */
- for (i = 0; i < numrows; i++)
- {
- items[i].values[0]
- = heap_getattr(rows[i], attrs->values[dima],
- stats[dima]->tupDesc, &items[i].isnull[0]);
+ /* sort the items so that we can detect the groups */
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
- items[i].values[1]
- = heap_getattr(rows[i], attrs->values[dimb],
- stats[dimb]->tupDesc, &items[i].isnull[1]);
- }
+ /*
+ * Walk through the sorted array, split it into groups according to the
+ * first (k-1) columns. If there's a single value in the last column, we
+ * count the group as 'supporting' the functional dependency. Otherwise we
+ * count it as contradicting.
+ *
+ * We also require a group to have a minimum number of rows to be considered
+ * useful for supporting the dependency. Contradicting groups may be of
+ * any size, though.
+ *
+ * XXX The minimum size requirement makes it impossible to identify the
+ * case when the columns are unique (or nearly unique), and therefore
+ * trivially functionally dependent.
+ */
- qsort_arg((void *) items, numrows, sizeof(SortItem),
- multi_sort_compare, mss);
+ /* start with the first row forming a group */
+ group_size = 1;
+ for (i = 1; i < numrows; i++)
+ {
+ /* end of the preceding group */
+ if (multi_sort_compare_dims(0, (k-2), &items[i-1], &items[i], mss) != 0)
+ {
/*
- * Walk through the array, split it into rows according to
- * the A value, and count distinct values in the other one.
- * If there's a single B value for the whole group, we count
- * it as supporting the association, otherwise we count it
- * as contradicting.
- *
- * Furthermore we require a group to have at least a certain
- * number of rows to be considered useful for supporting the
- * dependency. But when it's contradicting, use it always useful.
+ * If there are no contradicting rows, count the group as supporting,
+ * otherwise as contradicting.
*/
-
- /* start with values from the first row */
- current = items[0];
- group_size = 1;
-
- for (i = 1; i < numrows; i++)
- {
- /* end of the group */
- if (multi_sort_compare_dim(0, &items[i], ¤t, mss) != 0)
- {
- /*
- * If there are no contradicting rows, count it as
- * supporting (otherwise contradicting), but only if
- * the group is large enough.
- *
- * The requirement of a minimum group size makes it
- * impossible to identify [unique,unique] cases, but
- * that's probably a different case. This is more
- * about [zip => city] associations etc.
- *
- * If there are violations, count the group/rows as
- * a violation.
- *
- * It may ne neither, if the group is too small (does
- * not contain at least min_group_size rows).
- */
- if ((n_violations == 0) && (group_size >= min_group_size))
- {
- n_supporting += 1;
- n_supporting_rows += group_size;
- }
- else if (n_violations > 0)
- {
- n_contradicting += 1;
- n_contradicting_rows += group_size;
- }
-
- /* current values start a new group */
- n_violations = 0;
- group_size = 0;
- }
- /* mismatch of a B value is contradicting */
- else if (multi_sort_compare_dim(1, &items[i], ¤t, mss) != 0)
- {
- n_violations += 1;
- }
-
- current = items[i];
- group_size += 1;
- }
-
- /* handle the last group (just like above) */
if ((n_violations == 0) && (group_size >= min_group_size))
{
- n_supporting += 1;
+ n_supporting += 1;
n_supporting_rows += group_size;
}
- else if (n_violations)
+ else if (n_violations > 0)
{
- n_contradicting += 1;
+ n_contradicting += 1;
n_contradicting_rows += group_size;
}
- /*
- * See if the number of rows supporting the association is at least
- * 10x the number of rows violating the hypothetical dependency.
- *
- * TODO This is rather arbitrary limit - I guess it's possible to do
- * some math to come up with a better rule (e.g. testing a hypothesis
- * 'this is due to randomness'). We can create a contingency table
- * from the values and use it for testing. Possibly only when
- * there are no contradicting rows?
- *
- * TODO Also, if (a => b) and (b => a) at the same time, it pretty much
- * means there's a 1:1 relation (or one is a 'label'), making the
- * conditions rather redundant. Although it's possible that the
- * query uses incompatible combination of values.
- */
- if (n_supporting_rows > (n_contradicting_rows * 10))
- {
- if (dependencies == NULL)
- {
- dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
- dependencies->magic = MVSTAT_DEPS_MAGIC;
- }
- else
- dependencies = repalloc(dependencies, offsetof(MVDependenciesData, deps)
- + sizeof(MVDependency) * (dependencies->ndeps + 1));
+ /* current values start a new group */
+ n_violations = 0;
+ group_size = 0;
+ }
+ /* first columns match, but the last one does not (a violation) */
+ else if (multi_sort_compare_dims((k-1), (k-1), &items[i-1], &items[i], mss) != 0)
+ n_violations += 1;
- /* update the */
- dependencies->deps[ndeps] = (MVDependency)palloc0(sizeof(MVDependencyData));
- dependencies->deps[ndeps]->a = attrs->values[dima];
- dependencies->deps[ndeps]->b = attrs->values[dimb];
+ group_size += 1;
+ }
- dependencies->ndeps = (++ndeps);
- }
- }
+ /* handle the last group (just like above) */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
}
pfree(items);
pfree(values);
pfree(isnull);
- pfree(stats);
pfree(mss);
- return dependencies;
+ /*
+ * See if the number of rows supporting the association is at least 10x the
+ * number of rows violating the hypothetical dependency.
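+ *
+ * For example, with 500 rows in contradicting groups, more than 5000 rows
+ * would have to fall into supporting groups for the dependency to pass.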
+ */
+ return (n_supporting_rows > (n_contradicting_rows * 10));
}
/*
- * Store the dependencies into a bytea, so that it can be stored in the
- * pg_mv_statistic catalog.
+ * detects functional dependencies between groups of columns
+ *
+ * Generates all possible variations of the columns (ordered subsets) and
+ * checks if the last column is determined by the preceding ones. For example
+ * given 3 columns, there are 12 variations (6 on 2 columns, 6 on 3 columns):
+ *
+ * two columns three columns
+ * ----------- -------------
+ * (a) -> c (a,b) -> c
+ * (b) -> c (b,a) -> c
+ * (a) -> b (a,c) -> b
+ * (c) -> b (c,a) -> b
+ * (c) -> a (c,b) -> a
+ * (b) -> a (b,c) -> a
+ *
+ * Clearly some of the variations are redundant, as the order of columns on the
+ * left side does not matter. This is detected in dependency_is_implied, and
+ * those dependencies are ignored.
*
- * Currently this only supports simple two-column rules, and stores them
- * as a sequence of attnum pairs. In the future, this needs to be made
- * more complex to support multiple columns on both sides of the
- * implication (using AND on left, OR on right).
+ * We however do not detect that dependencies are transitively implied. For
+ * example given dependencies
+ *
+ * (a) -> b
+ * (b) -> c
+ *
+ * then
+ *
+ * (a) -> c
+ *
+ * is trivially implied. However we don't detect that and all three dependencies
+ * will get included in the resulting set. Eliminating such transitively implied
+ * dependencies is future work.
+ */
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int k;
+ int numattrs = attrs->dim1;
+
+ /* result */
+ MVDependencies dependencies = NULL;
+
+ Assert(numattrs >= 2);
+
+ /*
+ * We'll try to build functional dependencies starting from the smallest
+ * ones, covering just 2 columns, up to the largest ones, covering all
+ * columns included in the statistics. We start from the smallest ones
+ * because we want to be able to skip the already implied ones.
+ */
+ for (k = 2; k <= numattrs; k++)
+ {
+ int *dependency; /* array with k elements */
+
+ /* prepare a generator of variations */
+ VariationGenerator generator = generator_init(attrs, k);
+
+ /* generate all possible variations of k values (out of n) */
+ while ((dependency = generator_next(generator, attrs)))
+ {
+ MVDependency d;
+
+ /* skip dependencies that are already trivially implied */
+ if (dependency_is_implied(dependencies, k, dependency, attrs))
+ continue;
+
+ /* also skip dependencies that don't seem to be valid */
+ if (! dependency_is_valid(numrows, rows, k, dependency, stats, attrs))
+ continue;
+
+ d = (MVDependency)palloc0(offsetof(MVDependencyData, attributes)
+ + k * sizeof(int));
+
+ /* copy the dependency, but translate it to actual attnums */
+ d->nattributes = k;
+ for (i = 0; i < k; i++)
+ d->attributes[i] = attrs->values[dependency[i]];
+
+ /* initialize the list of dependencies */
+ if (dependencies == NULL)
+ {
+ dependencies
+ = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+
+ dependencies->magic = MVSTAT_DEPS_MAGIC;
+ dependencies->type = MVSTAT_DEPS_TYPE_BASIC;
+ dependencies->ndeps = 0;
+ }
+
+ dependencies->ndeps++;
+ dependencies = (MVDependencies)repalloc(dependencies,
+ offsetof(MVDependenciesData, deps)
+ + dependencies->ndeps * sizeof(MVDependency));
+
+ dependencies->deps[dependencies->ndeps-1] = d;
+ }
+
+ /* we're done with variations of k elements, so free the generator */
+ generator_free(generator);
+ }
+
+ return dependencies;
+}
+
+
+/*
+ * serialize list of dependencies into a bytea
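+ *
+ * The format (matching the code below) is the MVDependenciesData header
+ * (magic, type, ndeps), followed by the dependencies - for each one the
+ * number of attributes (int) and the attribute numbers (int16 array).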
*/
bytea *
serialize_mv_dependencies(MVDependencies dependencies)
{
int i;
+ bytea * output;
+ char *tmp;
- /* we need to store ndeps, and each needs 2 * int16 */
+ /* we need to store ndeps, with a number of attributes for each one */
Size len = VARHDRSZ + offsetof(MVDependenciesData, deps)
- + dependencies->ndeps * (sizeof(int16) * 2);
-
- bytea * output = (bytea*)palloc0(len);
+ + sizeof(int) * dependencies->ndeps;
- char * tmp = VARDATA(output);
+ /* and also include space for the actual attribute numbers */
+ for (i = 0; i < dependencies->ndeps; i++)
+ len += (sizeof(int16) * dependencies->deps[i]->nattributes);
+ output = (bytea*)palloc0(len);
SET_VARSIZE(output, len);
+ tmp = VARDATA(output);
+
/* first, store the number of dimensions / items */
memcpy(tmp, dependencies, offsetof(MVDependenciesData, deps));
tmp += offsetof(MVDependenciesData, deps);
- /* walk through the dependencies and copy both columns into the bytea */
+ /* store number of attributes and attribute numbers for each dependency */
for (i = 0; i < dependencies->ndeps; i++)
{
- memcpy(tmp, &(dependencies->deps[i]->a), sizeof(int16));
- tmp += sizeof(int16);
+ MVDependency d = dependencies->deps[i];
+
+ memcpy(tmp, &(d->nattributes), sizeof(int));
+ tmp += sizeof(int);
- memcpy(tmp, &(dependencies->deps[i]->b), sizeof(int16));
- tmp += sizeof(int16);
+ memcpy(tmp, d->attributes, sizeof(int16) * d->nattributes);
+ tmp += sizeof(int16) * d->nattributes;
+
+ Assert(tmp <= ((char*)output + len));
}
return output;
@@ -338,20 +566,21 @@ deserialize_mv_dependencies(bytea * data)
tmp += offsetof(MVDependenciesData, deps);
if (dependencies->magic != MVSTAT_DEPS_MAGIC)
- {
- pfree(dependencies);
- elog(WARNING, "not a MV Dependencies (magic number mismatch)");
- return NULL;
- }
+ elog(ERROR, "invalid dependency type %d (expected %dd)",
+ dependencies->type, MVSTAT_DEPS_MAGIC);
+
+ if (dependencies->type != MVSTAT_DEPS_TYPE_BASIC)
+ elog(ERROR, "invalid dependency type %d (expected %dd)",
+ dependencies->type, MVSTAT_DEPS_TYPE_BASIC);
Assert(dependencies->ndeps > 0);
- /* what bytea size do we expect for those parameters */
+ /* what minimum bytea size do we expect for those parameters */
expected_size = offsetof(MVDependenciesData,deps) +
- dependencies->ndeps * sizeof(int16) * 2;
+ dependencies->ndeps * (sizeof(int) + sizeof(int16) * 2);
- if (VARSIZE_ANY_EXHDR(data) != expected_size)
- elog(ERROR, "invalid dependencies size %ld (expected %ld)",
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid dependencies size %ld (expected at least %ld)",
VARSIZE_ANY_EXHDR(data), expected_size);
/* allocate space for the MCV items */
@@ -360,15 +589,35 @@ deserialize_mv_dependencies(bytea * data)
for (i = 0; i < dependencies->ndeps; i++)
{
- dependencies->deps[i] = (MVDependency)palloc0(sizeof(MVDependencyData));
+ int k;
+ MVDependency d;
+
+ /* number of attributes */
+ memcpy(&k, tmp, sizeof(int));
+ tmp += sizeof(int);
+
+ /* is the number of attributes valid? */
+ Assert((k >= 2) && (k <= MVSTATS_MAX_DIMENSIONS));
- memcpy(&(dependencies->deps[i]->a), tmp, sizeof(int16));
- tmp += sizeof(int16);
+ /* now that we know the number of attributes, allocate the dependency */
+ d = (MVDependency)palloc0(offsetof(MVDependencyData, attributes)
+ + k * sizeof(int));
- memcpy(&(dependencies->deps[i]->b), tmp, sizeof(int16));
- tmp += sizeof(int16);
+ d->nattributes = k;
+
+ /* copy attribute numbers */
+ memcpy(d->attributes, tmp, sizeof(int16) * d->nattributes);
+ tmp += sizeof(int16) * d->nattributes;
+
+ dependencies->deps[i] = d;
+
+ /* still within the bytea */
+ Assert(tmp <= ((char*)data + VARSIZE_ANY(data)));
}
+ /* we should have consumed the whole bytea exactly */
+ Assert(tmp == ((char*)data + VARSIZE_ANY(data)));
+
return dependencies;
}
@@ -392,46 +641,70 @@ pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(result));
}
-/* print the dependencies
- *
- * TODO Would be nice if this knew the actual column names (instead of
- * the attnums).
+/*
+ * print the dependencies
*
- * FIXME This is really ugly and does not really check the lengths and
- * strcpy/snprintf return values properly. Needs to be fixed.
+ * TODO Would be nice if this printed column names (instead of just attnums).
*/
Datum
pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
{
- int i = 0;
- bytea *data = PG_GETARG_BYTEA_P(0);
- char *result = NULL;
- int len = 0;
+ int i, j;
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ StringInfoData buf;
MVDependencies dependencies = deserialize_mv_dependencies(data);
if (dependencies == NULL)
PG_RETURN_NULL();
+ initStringInfo(&buf);
+
for (i = 0; i < dependencies->ndeps; i++)
{
MVDependency dependency = dependencies->deps[i];
- char buffer[128];
- int tmp = snprintf(buffer, 128, "%s%d => %d",
- ((i == 0) ? "" : ", "), dependency->a, dependency->b);
+ if (i > 0)
+ appendStringInfo(&buf, ", ");
- if (tmp < 127)
+ /* conditions */
+ appendStringInfoChar(&buf, '(');
+ for (j = 0; j < dependency->nattributes-1; j++)
{
- if (result == NULL)
- result = palloc0(len + tmp + 1);
- else
- result = repalloc(result, len + tmp + 1);
+ if (j > 0)
+ appendStringInfoChar(&buf, ',');
- strcpy(result + len, buffer);
- len += tmp;
+ appendStringInfo(&buf, "%d", dependency->attributes[j]);
}
+
+ /* the implied attribute */
+ appendStringInfo(&buf, ") => %d",
+ dependency->attributes[dependency->nattributes-1]);
}
- PG_RETURN_TEXT_P(cstring_to_text(result));
+ PG_RETURN_TEXT_P(cstring_to_text(buf.data));
+}
+
+MVDependencies
+load_mv_dependencies(Oid mvoid)
+{
+ bool isnull = false;
+ Datum deps;
+
+ /* fetch the pg_mv_statistic tuple for the given statistics OID */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->deps_enabled && mvstat->deps_built);
+#endif
+
+ deps = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stadeps, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_dependencies(DatumGetByteaP(deps));
}
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 7ebd961..c6f45ab 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -17,22 +17,31 @@
#include "fmgr.h"
#include "commands/vacuum.h"
+/*
+ * Degree of how much MCV item / histogram bucket matches a clause.
+ * This is then considered when computing the selectivity.
+ */
+#define MVSTATS_MATCH_NONE 0 /* no match at all */
+#define MVSTATS_MATCH_PARTIAL 1 /* partial match */
+#define MVSTATS_MATCH_FULL 2 /* full match */
#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
-/* An associative rule, tracking [a => b] dependency.
- *
- * TODO Make this work with multiple columns on both sides.
+
+/*
+ * Functional dependencies, tracking column-level relationships (values
+ * in one column determine values in another one).
*/
typedef struct MVDependencyData {
- int16 a;
- int16 b;
+ int nattributes; /* number of attributes */
+ int16 attributes[1]; /* attribute numbers */
} MVDependencyData;
typedef MVDependencyData* MVDependency;
typedef struct MVDependenciesData {
uint32 magic; /* magic constant marker */
+ uint32 type; /* type of MV Dependencies (BASIC) */
int32 ndeps; /* number of dependencies */
MVDependency deps[1]; /* XXX why not a pointer? */
} MVDependenciesData;
@@ -48,6 +57,8 @@ typedef MVDependenciesData* MVDependencies;
* stats specified using flags (or something like that).
*/
+MVDependencies load_mv_dependencies(Oid mvoid);
+
bytea * serialize_mv_dependencies(MVDependencies dependencies);
/* deserialization of stats (serialization is private to analyze) */
diff --git a/src/test/regress/expected/mv_dependencies.out b/src/test/regress/expected/mv_dependencies.out
new file mode 100644
index 0000000..e759997
--- /dev/null
+++ b/src/test/regress/expected/mv_dependencies.out
@@ -0,0 +1,172 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s1 ON functional_dependencies (unknown_column) WITH (dependencies);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s1 ON functional_dependencies (a) WITH (dependencies);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a,a) WITH (dependencies);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a, a, b) WITH (dependencies);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- correct command
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (dependencies);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+ QUERY PLAN
+---------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE functional_dependencies;
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s2 ON functional_dependencies (a, b, c) WITH (dependencies);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | 1 => 2, 1 => 3, 2 => 3
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+DROP TABLE functional_dependencies;
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s3 ON functional_dependencies (a, b, c, d) WITH (dependencies);
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+----------------------------------------
+ t | t | 2 => 1, 3 => 1, 3 => 2, 4 => 1, 4 => 2
+(1 row)
+
+DROP TABLE functional_dependencies;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index bec0316..4f2ffb8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -110,3 +110,6 @@ test: event_trigger
# run stats by itself because its delay may be insufficient under heavy load
test: stats
+
+# run tests of multivariate stats
+test: mv_dependencies
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 7e9b319..097a04f 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -162,3 +162,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: mv_dependencies
diff --git a/src/test/regress/sql/mv_dependencies.sql b/src/test/regress/sql/mv_dependencies.sql
new file mode 100644
index 0000000..48dea4d
--- /dev/null
+++ b/src/test/regress/sql/mv_dependencies.sql
@@ -0,0 +1,150 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s1 ON functional_dependencies (unknown_column) WITH (dependencies);
+
+-- single column
+CREATE STATISTICS s1 ON functional_dependencies (a) WITH (dependencies);
+
+-- single column, duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a,a) WITH (dependencies);
+
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a, a, b) WITH (dependencies);
+
+-- unknown option
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (unknown_option);
+
+-- correct command
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (dependencies);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+
+DROP TABLE functional_dependencies;
+
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s2 ON functional_dependencies (a, b, c) WITH (dependencies);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+
+DROP TABLE functional_dependencies;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s3 ON functional_dependencies (a, b, c, d) WITH (dependencies);
+
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+DROP TABLE functional_dependencies;
--
2.5.0
Attachment: 0002-shared-infrastructure-and-functional-dependencies.patch (text/x-patch)
From 2c719f145ab1f74e87b0da2c4cff67af9f4a8e50 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 19:51:48 +0100
Subject: [PATCH 2/9] shared infrastructure and functional dependencies
Basic infrastructure shared by all kinds of multivariate stats, most
importantly:
- adds a new system catalog (pg_mv_statistic)
- CREATE STATISTICS name ON table (columns) WITH (options)
- DROP STATISTICS name
- ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME
- implementation of functional dependencies (the simplest type of
multivariate statistics)
- building functional dependencies in ANALYZE
- updates regression tests (new catalog etc.)
This does not include any changes to the optimizer, i.e. it does not
influence query planning (that is the subject of follow-up patches).
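For illustration, the intended life cycle of such statistics looks
roughly like this (the table and statistics names are made up):

CREATE TABLE t (a INT, b INT);
CREATE STATISTICS s ON t (a, b) WITH (dependencies);
ANALYZE t;    -- builds the functional dependencies
ALTER STATISTICS s RENAME TO s2;
DROP STATISTICS s2;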
The current implementation requires a valid 'ltopr' for the columns, so
that we can sort the sample rows in various ways, both in this patch
and in other kinds of statistics. Maybe this restriction could be
relaxed in the future, requiring just 'eqopr' for stats that do not
sort the data (e.g. functional dependencies and MCV lists).
Maybe some of the stats (functional dependencies and MCV lists with
limited functionality) might be made to work with hashes of the values,
which is sufficient for equality comparisons. But the queries would
require the equality operator anyway, so it's not really a weaker
requirement. The hashes might reduce space requirements, though.
The algorithm detecting the dependencies is rather simple and probably
needs improvements, both to detect more complicated dependencies and to
validate the math.
The name 'functional dependencies' is more correct than 'association
rules', as it's exactly the term used in relational theory (esp. Normal
Forms) for this kind of column-level dependency.
The multivariate statistics are automatically removed in two situations:
(a) after a DROP TABLE (obviously)
(b) after ALTER TABLE ... DROP COLUMN, if the statistics would be left
defined on fewer than 2 remaining columns
If at least two columns remain, we keep the statistics but perform a
cleanup on the next ANALYZE: the dropped columns are removed from
stakeys, and the new statistics is built on the smaller set.
We can't do this rebuild at DROP COLUMN time, because that would either
leave us with invalid statistics, or force us to throw away statistics
we could still use. This lazy approach lets us keep using the statistics
even though some of the columns are dead.
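A sketch of the intended behavior (assuming statistics defined on three
columns of a table t):

CREATE STATISTICS s ON t (a, b, c) WITH (dependencies);
ALTER TABLE t DROP COLUMN c;  -- s is kept, next ANALYZE rebuilds it on (a, b)
ALTER TABLE t DROP COLUMN b;  -- s would cover a single column, so it's removed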
This also adds a simple list of statistics to \d in psql.
Statistics are schema objects, i.e. they are created within a schema by
using a qualified name (or in the default schema)
CREATE STATISTICS schema.statistics ON ...
and then dropped either by specifying the qualified name
DROP STATISTICS schema.statistics
or by searching through search_path (just like with other objects).
This also gets rid of the "(opt_)stats_name" definitions in gram.y and
replaces them with just "opt_any_name", although the optional case is
not really handled currently - there's no generated name yet (so we
should either drop the optional form or implement name generation).
I'm not entirely sure making statistics schema-specific is such a great
idea. Maybe they should be "global", but that does not seem right either
(e.g. it makes multi-tenant systems based on schemas more difficult to
manage, because tenants would interact).
---
doc/src/sgml/ref/allfiles.sgml | 3 +
doc/src/sgml/ref/alter_statistics.sgml | 115 +++++++
doc/src/sgml/ref/create_statistics.sgml | 198 ++++++++++++
doc/src/sgml/ref/drop_statistics.sgml | 91 ++++++
doc/src/sgml/reference.sgml | 2 +
src/backend/catalog/Makefile | 1 +
src/backend/catalog/aclchk.c | 27 ++
src/backend/catalog/dependency.c | 11 +-
src/backend/catalog/heap.c | 102 ++++++
src/backend/catalog/namespace.c | 51 +++
src/backend/catalog/objectaddress.c | 54 ++++
src/backend/catalog/system_views.sql | 11 +
src/backend/commands/Makefile | 6 +-
src/backend/commands/alter.c | 3 +
src/backend/commands/analyze.c | 21 ++
src/backend/commands/dropcmds.c | 4 +
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/statscmds.c | 277 ++++++++++++++++
src/backend/nodes/copyfuncs.c | 17 +
src/backend/nodes/outfuncs.c | 18 ++
src/backend/optimizer/util/plancat.c | 59 ++++
src/backend/parser/gram.y | 60 +++-
src/backend/tcop/utility.c | 14 +
src/backend/utils/Makefile | 2 +-
src/backend/utils/cache/relcache.c | 59 ++++
src/backend/utils/cache/syscache.c | 23 ++
src/backend/utils/mvstats/Makefile | 17 +
src/backend/utils/mvstats/README.dependencies | 222 +++++++++++++
src/backend/utils/mvstats/common.c | 356 +++++++++++++++++++++
src/backend/utils/mvstats/common.h | 75 +++++
src/backend/utils/mvstats/dependencies.c | 437 ++++++++++++++++++++++++++
src/bin/psql/describe.c | 44 +++
src/include/catalog/dependency.h | 5 +-
src/include/catalog/heap.h | 1 +
src/include/catalog/indexing.h | 7 +
src/include/catalog/namespace.h | 2 +
src/include/catalog/pg_mv_statistic.h | 75 +++++
src/include/catalog/pg_proc.h | 5 +
src/include/catalog/toasting.h | 1 +
src/include/commands/defrem.h | 4 +
src/include/nodes/nodes.h | 2 +
src/include/nodes/parsenodes.h | 12 +
src/include/nodes/relation.h | 28 ++
src/include/utils/acl.h | 1 +
src/include/utils/mvstats.h | 70 +++++
src/include/utils/rel.h | 4 +
src/include/utils/relcache.h | 1 +
src/include/utils/syscache.h | 2 +
src/test/regress/expected/object_address.out | 7 +-
src/test/regress/expected/rules.out | 9 +
src/test/regress/expected/sanity_check.out | 1 +
src/test/regress/sql/object_address.sql | 4 +-
52 files changed, 2613 insertions(+), 11 deletions(-)
create mode 100644 doc/src/sgml/ref/alter_statistics.sgml
create mode 100644 doc/src/sgml/ref/create_statistics.sgml
create mode 100644 doc/src/sgml/ref/drop_statistics.sgml
create mode 100644 src/backend/commands/statscmds.c
create mode 100644 src/backend/utils/mvstats/Makefile
create mode 100644 src/backend/utils/mvstats/README.dependencies
create mode 100644 src/backend/utils/mvstats/common.c
create mode 100644 src/backend/utils/mvstats/common.h
create mode 100644 src/backend/utils/mvstats/dependencies.c
create mode 100644 src/include/catalog/pg_mv_statistic.h
create mode 100644 src/include/utils/mvstats.h
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index bf95453..524ed83 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -32,6 +32,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY alterServer SYSTEM "alter_server.sgml">
<!ENTITY alterSequence SYSTEM "alter_sequence.sgml">
<!ENTITY alterSystem SYSTEM "alter_system.sgml">
+<!ENTITY alterStatistics SYSTEM "alter_statistics.sgml">
<!ENTITY alterTable SYSTEM "alter_table.sgml">
<!ENTITY alterTableSpace SYSTEM "alter_tablespace.sgml">
<!ENTITY alterTSConfig SYSTEM "alter_tsconfig.sgml">
@@ -76,6 +77,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createSchema SYSTEM "create_schema.sgml">
<!ENTITY createSequence SYSTEM "create_sequence.sgml">
<!ENTITY createServer SYSTEM "create_server.sgml">
+<!ENTITY createStatistics SYSTEM "create_statistics.sgml">
<!ENTITY createTable SYSTEM "create_table.sgml">
<!ENTITY createTableAs SYSTEM "create_table_as.sgml">
<!ENTITY createTableSpace SYSTEM "create_tablespace.sgml">
@@ -119,6 +121,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropSchema SYSTEM "drop_schema.sgml">
<!ENTITY dropSequence SYSTEM "drop_sequence.sgml">
<!ENTITY dropServer SYSTEM "drop_server.sgml">
+<!ENTITY dropStatistics SYSTEM "drop_statistics.sgml">
<!ENTITY dropTable SYSTEM "drop_table.sgml">
<!ENTITY dropTableSpace SYSTEM "drop_tablespace.sgml">
<!ENTITY dropTransform SYSTEM "drop_transform.sgml">
diff --git a/doc/src/sgml/ref/alter_statistics.sgml b/doc/src/sgml/ref/alter_statistics.sgml
new file mode 100644
index 0000000..aa421c0
--- /dev/null
+++ b/doc/src/sgml/ref/alter_statistics.sgml
@@ -0,0 +1,115 @@
+<!--
+doc/src/sgml/ref/alter_statistics.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-ALTERSTATISTICS">
+ <indexterm zone="sql-alterstatistics">
+ <primary>ALTER STATISTICS</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>ALTER STATISTICS</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>ALTER STATISTICS</refname>
+ <refpurpose>
+ change the definition of a multivariate statistics
+ </refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+ALTER STATISTICS <replaceable class="parameter">name</replaceable> OWNER TO { <replaceable class="PARAMETER">new_owner</replaceable> | CURRENT_USER | SESSION_USER }
+ALTER STATISTICS <replaceable class="parameter">name</replaceable> RENAME TO <replaceable class="parameter">new_name</replaceable>
+ALTER STATISTICS <replaceable class="parameter">name</replaceable> SET SCHEMA <replaceable class="parameter">new_schema</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>ALTER STATISTICS</command> changes the parameters of an existing
+ multivariate statistics. Any parameters not specifically set in the
+ <command>ALTER STATISTICS</command> command retain their prior settings.
+ </para>
+
+ <para>
+ You must own the statistics to use <command>ALTER STATISTICS</>.
+ To change a statistics' schema, you must also have <literal>CREATE</>
+ privilege on the new schema.
+ To alter the owner, you must also be a direct or indirect member of the new
+ owning role, and that role must have <literal>CREATE</literal> privilege on
+ the statistics' schema. (These restrictions enforce that altering the owner
+ doesn't do anything you couldn't do by dropping and recreating the statistics.
+ However, a superuser can alter ownership of any statistics anyway.)
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of a statistics to be altered.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">new_owner</replaceable></term>
+ <listitem>
+ <para>
+ The user name of the new owner of the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">new_name</replaceable></term>
+ <listitem>
+ <para>
+ The new name for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">new_schema</replaceable></term>
+ <listitem>
+ <para>
+ The new schema for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There's no <command>ALTER STATISTICS</command> command in the SQL standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createstatistics"></member>
+ <member><xref linkend="sql-dropstatistics"></member>
+ </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
new file mode 100644
index 0000000..ff09fa5
--- /dev/null
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -0,0 +1,198 @@
+<!--
+doc/src/sgml/ref/create_statistics.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-CREATESTATISTICS">
+ <indexterm zone="sql-createstatistics">
+ <primary>CREATE STATISTICS</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>CREATE STATISTICS</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>CREATE STATISTICS</refname>
+ <refpurpose>define a new statistics</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_name</replaceable> ON <replaceable class="PARAMETER">table_name</replaceable>
+    ( <replaceable class="PARAMETER">column_name</replaceable>, <replaceable class="PARAMETER">column_name</replaceable> [, ...] )
+    [ WITH ( <replaceable class="PARAMETER">statistics_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] ) ]
+</synopsis>
+
+ </refsynopsisdiv>
+
+ <refsect1 id="SQL-CREATESTATISTICS-description">
+ <title>Description</title>
+
+ <para>
+   <command>CREATE STATISTICS</command> will create a new multivariate
+   statistics on the table. The statistics will be created in the
+   current database and will be owned by the user issuing the command.
+ </para>
+
+ <para>
+   If a schema name is given (for example, <literal>CREATE STATISTICS
+   myschema.mystat ...</>) then the statistics is created in the specified
+   schema. Otherwise it is created in the current schema. The name of
+   the statistics must be distinct from the name of any other statistics
+   in the same schema.
+ </para>
+
+  <para>
+   To be able to create statistics, you must have <literal>USAGE</literal>
+   privilege on the types of all referenced columns.
+  </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>IF NOT EXISTS</></term>
+ <listitem>
+ <para>
+ Do not throw an error if a statistics with the same name already exists.
+ A notice is issued in this case. Note that there is no guarantee that
+ the existing statistics is anything like the one that would have been
+ created.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">statistics_name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the statistics to be created.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">table_name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the table the statistics should
+ be created on.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">column_name</replaceable></term>
+ <listitem>
+ <para>
+ The name of a column to be included in the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>WITH ( <replaceable class="PARAMETER">statistics_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] )</literal></term>
+ <listitem>
+      <para>
+       This clause specifies optional parameters for the statistics; see
+       <xref linkend="sql-createstatistics-parameters"> below.
+      </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <refsect2 id="SQL-CREATESTATISTICS-parameters">
+ <title id="SQL-CREATESTATISTICS-parameters-title">Statistics Parameters</title>
+
+ <indexterm zone="sql-createstatistics-parameters">
+ <primary>statistics parameters</primary>
+ </indexterm>
+
+ <para>
+ The <literal>WITH</> clause can specify <firstterm>statistics parameters</>
+ for statistics. The currently available parameters are listed below.
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>dependencies</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables functional dependencies for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </refsect2>
+ </refsect1>
+
+ <refsect1 id="SQL-CREATESTATISTICS-notes">
+ <title>Notes</title>
+
+ <para>
+ ...
+ </para>
+
+ </refsect1>
+
+
+ <refsect1 id="SQL-CREATESTATISTICS-examples">
+ <title>Examples</title>
+
+ <para>
+ Create table <structname>t1</> with two functionally dependent columns, i.e.
+   knowledge of a value in the first column is sufficient for determining the
+ value in the other column. Then functional dependencies are built on those
+ columns:
+
+<programlisting>
+CREATE TABLE t1 (
+ a int,
+ b int
+);
+
+INSERT INTO t1 SELECT i/100, i/500
+ FROM generate_series(1,1000000) s(i);
+
+CREATE STATISTICS s1 ON t1 (a, b) WITH (dependencies);
+
+ANALYZE t1;
+
+-- valid combination of values
+EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 1);
+
+-- invalid combination of values
+EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 2);
+</programlisting>
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There's no <command>CREATE STATISTICS</command> command in the SQL standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-alterstatistics"></member>
+ <member><xref linkend="sql-dropstatistics"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/ref/drop_statistics.sgml b/doc/src/sgml/ref/drop_statistics.sgml
new file mode 100644
index 0000000..dd9047a
--- /dev/null
+++ b/doc/src/sgml/ref/drop_statistics.sgml
@@ -0,0 +1,91 @@
+<!--
+doc/src/sgml/ref/drop_statistics.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-DROPSTATISTICS">
+ <indexterm zone="sql-dropstatistics">
+ <primary>DROP STATISTICS</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>DROP STATISTICS</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>DROP STATISTICS</refname>
+ <refpurpose>remove a statistics</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+DROP STATISTICS [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+   <command>DROP STATISTICS</command> removes statistics from the database.
+   Only the statistics owner, the schema owner, and a superuser can drop
+   statistics.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>IF EXISTS</literal></term>
+ <listitem>
+ <para>
+ Do not throw an error if the statistics does not exist. A notice is
+ issued in this case.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the statistics to drop.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </refsect1>
+
+ <refsect1>
+ <title>Examples</title>
+
+  <para>
+   Drop the statistics <literal>s1</literal> created in the example for
+   <command>CREATE STATISTICS</command>:
+  </para>
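+
+<programlisting>
+DROP STATISTICS s1;
+</programlisting>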
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There's no <command>DROP STATISTICS</command> command in the SQL standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-alterstatistics"></member>
+ <member><xref linkend="sql-createstatistics"></member>
+ </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 03020df..2b07b2d 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -104,6 +104,7 @@
&createSchema;
&createSequence;
&createServer;
+ &createStatistics;
&createTable;
&createTableAs;
&createTableSpace;
@@ -147,6 +148,7 @@
&dropSchema;
&dropSequence;
&dropServer;
+ &dropStatistics;
&dropTable;
&dropTableSpace;
&dropTSConfig;
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 25130ec..058b8a9 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -32,6 +32,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
+ pg_mv_statistic.h \
pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 0f3bc07..e21aacd 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -38,6 +38,7 @@
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
#include "catalog/pg_largeobject_metadata.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -5021,6 +5022,32 @@ pg_extension_ownercheck(Oid ext_oid, Oid roleid)
}
/*
+ * Ownership check for a multivariate statistics (specified by OID).
+ */
+bool
+pg_statistics_ownercheck(Oid stat_oid, Oid roleid)
+{
+ HeapTuple tuple;
+ Oid ownerId;
+
+ /* Superusers bypass all permission checking. */
+ if (superuser_arg(roleid))
+ return true;
+
+ tuple = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(stat_oid));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("statistics with OID %u does not exist", stat_oid)));
+
+ ownerId = ((Form_pg_mv_statistic) GETSTRUCT(tuple))->staowner;
+
+ ReleaseSysCache(tuple);
+
+ return has_privs_of_role(roleid, ownerId);
+}
+
+/*
* Check whether specified role has CREATEROLE privilege (or is a superuser)
*
* Note: roles do not have owners per se; instead we use this test in
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index c48e37b..8200454 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -40,6 +40,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -160,7 +161,8 @@ static const Oid object_classes[] = {
ExtensionRelationId, /* OCLASS_EXTENSION */
EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
PolicyRelationId, /* OCLASS_POLICY */
- TransformRelationId /* OCLASS_TRANSFORM */
+ TransformRelationId, /* OCLASS_TRANSFORM */
+ MvStatisticRelationId /* OCLASS_STATISTICS */
};
@@ -1272,6 +1274,10 @@ doDeletion(const ObjectAddress *object, int flags)
DropTransformById(object->objectId);
break;
+ case OCLASS_STATISTICS:
+ RemoveStatisticsById(object->objectId);
+ break;
+
default:
elog(ERROR, "unrecognized object class: %u",
object->classId);
@@ -2415,6 +2421,9 @@ getObjectClass(const ObjectAddress *object)
case TransformRelationId:
return OCLASS_TRANSFORM;
+
+ case MvStatisticRelationId:
+ return OCLASS_STATISTICS;
}
/* shouldn't get here */
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index e997b57..47ec8cc 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -47,6 +47,7 @@
#include "catalog/pg_constraint_fn.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_tablespace.h"
@@ -1613,7 +1614,10 @@ RemoveAttributeById(Oid relid, AttrNumber attnum)
heap_close(attr_rel, RowExclusiveLock);
if (attnum > 0)
+ {
RemoveStatistics(relid, attnum);
+ RemoveMVStatistics(relid, attnum);
+ }
relation_close(rel, NoLock);
}
@@ -1841,6 +1845,11 @@ heap_drop_with_catalog(Oid relid)
RemoveStatistics(relid, 0);
/*
+	 * delete multivariate statistics
+ */
+ RemoveMVStatistics(relid, 0);
+
+ /*
* delete attribute tuples
*/
DeleteAttributeTuples(relid);
@@ -2692,6 +2701,99 @@ RemoveStatistics(Oid relid, AttrNumber attnum)
/*
+ * RemoveMVStatistics --- remove entries in pg_mv_statistic for a rel
+ *
+ * If attnum is zero, remove all entries for rel; else remove only the one(s)
+ * for that column.
+ */
+void
+RemoveMVStatistics(Oid relid, AttrNumber attnum)
+{
+ Relation pgmvstatistic;
+ TupleDesc tupdesc = NULL;
+ SysScanDesc scan;
+ ScanKeyData key;
+ HeapTuple tuple;
+
+ /*
+ * When dropping a column, we'll drop statistics with a single
+ * remaining (undropped column). To do that, we need the tuple
+ * descriptor.
+ *
+ * We already have the relation locked (as we're running ALTER
+ * TABLE ... DROP COLUMN), so we'll just get the descriptor here.
+ */
+	if (attnum != 0)
+	{
+		Relation rel = relation_open(relid, NoLock);
+
+		/* multivariate stats are supported on tables and matviews */
+		if (rel->rd_rel->relkind == RELKIND_RELATION ||
+			rel->rd_rel->relkind == RELKIND_MATVIEW)
+			tupdesc = RelationGetDescr(rel);
+
+		relation_close(rel, NoLock);
+
+		/* unsupported relkind, so there can be no stats to remove */
+		if (tupdesc == NULL)
+			return;
+	}
+
+ pgmvstatistic = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ ScanKeyInit(&key,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ scan = systable_beginscan(pgmvstatistic,
+ MvStatisticRelidIndexId,
+ true, NULL, 1, &key);
+
+ /* we must loop even when attnum != 0, in case of inherited stats */
+ while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+ {
+ bool delete = true;
+
+ if (attnum != 0)
+ {
+ Datum adatum;
+ bool isnull;
+ int i;
+ int ncolumns = 0;
+ ArrayType *arr;
+ int16 *attnums;
+
+ /* get the columns */
+ adatum = SysCacheGetAttr(MVSTATOID, tuple,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+ attnums = (int16*)ARR_DATA_PTR(arr);
+
+ for (i = 0; i < ARR_DIMS(arr)[0]; i++)
+ {
+			/* count the column unless it has been / is being dropped */
+ if ((! tupdesc->attrs[attnums[i]-1]->attisdropped) &&
+ (attnums[i] != attnum))
+ ncolumns += 1;
+ }
+
+		/* delete the stats if fewer than two columns remain */
+ delete = (ncolumns < 2);
+ }
+
+ if (delete)
+ simple_heap_delete(pgmvstatistic, &tuple->t_self);
+ }
+
+ systable_endscan(scan);
+
+ heap_close(pgmvstatistic, RowExclusiveLock);
+}
+
+
+/*
* RelationTruncateIndexes - truncate all indexes associated
* with the heap relation to zero tuples.
*
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 446b2ac..dfd5bef 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -4201,3 +4201,54 @@ pg_is_other_temp_schema(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(isOtherTempNamespace(oid));
}
+
+Oid
+get_statistics_oid(List *names, bool missing_ok)
+{
+ char *schemaname;
+ char *stats_name;
+ Oid namespaceId;
+ Oid stats_oid = InvalidOid;
+ ListCell *l;
+
+ /* deconstruct the name list */
+ DeconstructQualifiedName(names, &schemaname, &stats_name);
+
+ if (schemaname)
+ {
+ /* use exact schema given */
+ namespaceId = LookupExplicitNamespace(schemaname, missing_ok);
+ if (missing_ok && !OidIsValid(namespaceId))
+ stats_oid = InvalidOid;
+ else
+ stats_oid = GetSysCacheOid2(MVSTATNAMENSP,
+ PointerGetDatum(stats_name),
+ ObjectIdGetDatum(namespaceId));
+ }
+ else
+ {
+ /* search for it in search path */
+ recomputeNamespacePath();
+
+ foreach(l, activeSearchPath)
+ {
+ namespaceId = lfirst_oid(l);
+
+ if (namespaceId == myTempNamespace)
+ continue; /* do not look in temp namespace */
+ stats_oid = GetSysCacheOid2(MVSTATNAMENSP,
+ PointerGetDatum(stats_name),
+ ObjectIdGetDatum(namespaceId));
+ if (OidIsValid(stats_oid))
+ break;
+ }
+ }
+
+ if (!OidIsValid(stats_oid) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("statistics \"%s\" does not exist",
+ NameListToString(names))));
+
+ return stats_oid;
+}
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index d2aaa6d..c13a569 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -39,6 +39,7 @@
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
#include "catalog/pg_largeobject_metadata.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_opfamily.h"
@@ -438,9 +439,22 @@ static const ObjectPropertyType ObjectProperty[] =
Anum_pg_type_typacl,
ACL_KIND_TYPE,
true
+ },
+ {
+ MvStatisticRelationId,
+ MvStatisticOidIndexId,
+ MVSTATOID,
+ MVSTATNAMENSP,
+ Anum_pg_mv_statistic_staname,
+ Anum_pg_mv_statistic_stanamespace,
+ Anum_pg_mv_statistic_staowner,
+ InvalidAttrNumber, /* no ACL (same as relation) */
+ -1, /* no ACL */
+ true
}
};
+
/*
* This struct maps the string object types as returned by
* getObjectTypeDescription into ObjType enum values. Note that some enum
@@ -640,6 +654,10 @@ static const struct object_type_map
/* OCLASS_TRANSFORM */
{
"transform", OBJECT_TRANSFORM
+ },
+ /* OBJECT_STATISTICS */
+ {
+ "statistics", OBJECT_STATISTICS
}
};
@@ -913,6 +931,11 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
address = get_object_address_defacl(objname, objargs,
missing_ok);
break;
+ case OBJECT_STATISTICS:
+ address.classId = MvStatisticRelationId;
+ address.objectId = get_statistics_oid(objname, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, in case it thinks elog might return */
@@ -2185,6 +2208,10 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser")));
break;
+ case OBJECT_STATISTICS:
+ if (!pg_statistics_ownercheck(address.objectId, roleid))
+ aclcheck_error_type(ACLCHECK_NOT_OWNER, address.objectId);
+ break;
default:
elog(ERROR, "unrecognized object type: %d",
(int) objtype);
@@ -3610,6 +3637,10 @@ getObjectTypeDescription(const ObjectAddress *object)
appendStringInfoString(&buffer, "transform");
break;
+ case OCLASS_STATISTICS:
+ appendStringInfoString(&buffer, "statistics");
+ break;
+
default:
appendStringInfo(&buffer, "unrecognized %u", object->classId);
break;
@@ -4566,6 +4597,29 @@ getObjectIdentityParts(const ObjectAddress *object,
}
break;
+ case OCLASS_STATISTICS:
+ {
+ HeapTuple tup;
+ Form_pg_mv_statistic formStatistic;
+ char *schema;
+
+ tup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for statistics %u",
+ object->objectId);
+ formStatistic = (Form_pg_mv_statistic) GETSTRUCT(tup);
+ schema = get_namespace_name_or_temp(formStatistic->stanamespace);
+ appendStringInfoString(&buffer,
+ quote_qualified_identifier(schema,
+ NameStr(formStatistic->staname)));
+ if (objname)
+ *objname = list_make2(schema,
+ pstrdup(NameStr(formStatistic->staname)));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 84aa061..31dbb2c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -158,6 +158,17 @@ CREATE VIEW pg_indexes AS
LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
+CREATE VIEW pg_mv_stats AS
+ SELECT
+ N.nspname AS schemaname,
+ C.relname AS tablename,
+ S.staname AS staname,
+ S.stakeys AS attnums,
+ length(S.stadeps) as depsbytes,
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
+ LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
+
CREATE VIEW pg_stats WITH (security_barrier) AS
SELECT
nspname AS schemaname,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index b1ac704..5151001 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -18,8 +18,8 @@ OBJS = aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
policy.o portalcmds.o prepare.o proclang.o \
- schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
- variable.o view.o
+ schemacmds.o seclabel.o sequence.o statscmds.o \
+ tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o \
+ user.o vacuum.o vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index 5af0f2f..89985499 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -359,6 +359,7 @@ ExecRenameStmt(RenameStmt *stmt)
case OBJECT_OPCLASS:
case OBJECT_OPFAMILY:
case OBJECT_LANGUAGE:
+ case OBJECT_STATISTICS:
case OBJECT_TSCONFIGURATION:
case OBJECT_TSDICTIONARY:
case OBJECT_TSPARSER:
@@ -437,6 +438,7 @@ ExecAlterObjectSchemaStmt(AlterObjectSchemaStmt *stmt,
case OBJECT_OPERATOR:
case OBJECT_OPCLASS:
case OBJECT_OPFAMILY:
+ case OBJECT_STATISTICS:
case OBJECT_TSCONFIGURATION:
case OBJECT_TSDICTIONARY:
case OBJECT_TSPARSER:
@@ -745,6 +747,7 @@ ExecAlterOwnerStmt(AlterOwnerStmt *stmt)
case OBJECT_OPERATOR:
case OBJECT_OPCLASS:
case OBJECT_OPFAMILY:
+ case OBJECT_STATISTICS:
case OBJECT_TABLESPACE:
case OBJECT_TSDICTIONARY:
case OBJECT_TSCONFIGURATION:
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8a5f07c..9087532 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -17,6 +17,7 @@
#include <math.h>
#include "access/multixact.h"
+#include "access/sysattr.h"
#include "access/transam.h"
#include "access/tupconvert.h"
#include "access/tuptoaster.h"
@@ -27,6 +28,7 @@
#include "catalog/indexing.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "commands/dbcommands.h"
#include "commands/tablecmds.h"
@@ -45,10 +47,13 @@
#include "storage/procarray.h"
#include "utils/acl.h"
#include "utils/attoptcache.h"
+#include "utils/builtins.h"
#include "utils/datum.h"
+#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/mvstats.h"
#include "utils/pg_rusage.h"
#include "utils/sampling.h"
#include "utils/sortsupport.h"
@@ -460,6 +465,19 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
* all analyzable columns. We use a lower bound of 100 rows to avoid
* possible overflow in Vitter's algorithm. (Note: that will also be the
* target in the corner case where there are no analyzable columns.)
+ *
+	 * FIXME This sample sizing is mostly OK when computing stats for
+	 * individual columns, but when computing multivariate stats
+	 * (histograms, MCV lists, ...) it's rather insufficient. For
+	 * stats on multiple columns / complex stats we need larger
+	 * samples, because we need to build more detailed stats (more
+	 * MCV items / histogram buckets) to get good accuracy. Maybe
+	 * samples proportional to the table size (say, 0.5% - 1%)
+	 * would be more appropriate than a fixed size. Also, this
+	 * should be bound to the requested statistics size - e.g. the
+	 * number of MCV items or histogram buckets should require
+	 * several sample rows per item/bucket (so the sample should
+	 * be k*size).
*/
targrows = 100;
for (i = 0; i < attr_cnt; i++)
@@ -562,6 +580,9 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
update_attstats(RelationGetRelid(Irel[ind]), false,
thisdata->attr_cnt, thisdata->vacattrstats);
}
+
+ /* Build multivariate stats (if there are any). */
+ build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
}
/*
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index 522027a..cd65b58 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -292,6 +292,10 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
msg = gettext_noop("schema \"%s\" does not exist, skipping");
name = NameListToString(objname);
break;
+ case OBJECT_STATISTICS:
+ msg = gettext_noop("statistics \"%s\" does not exist, skipping");
+ name = NameListToString(objname);
+ break;
case OBJECT_TSPARSER:
if (!schema_does_not_exist_skipping(objname, &msg, &name))
{
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 9e32f8d..09061bb 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -110,6 +110,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"SCHEMA", true},
{"SEQUENCE", true},
{"SERVER", true},
+ {"STATISTICS", true},
{"TABLE", true},
{"TABLESPACE", false},
{"TRANSFORM", true},
@@ -1106,6 +1107,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_RULE:
case OBJECT_SCHEMA:
case OBJECT_SEQUENCE:
+ case OBJECT_STATISTICS:
case OBJECT_TABCONSTRAINT:
case OBJECT_TABLE:
case OBJECT_TRANSFORM:
@@ -1167,6 +1169,7 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
case OCLASS_POLICY:
+ case OCLASS_STATISTICS:
return true;
}
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
new file mode 100644
index 0000000..f43b053
--- /dev/null
+++ b/src/backend/commands/statscmds.c
@@ -0,0 +1,277 @@
+/*-------------------------------------------------------------------------
+ *
+ * statscmds.c
+ * Commands for creating and altering multivariate statistics
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/statscmds.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
+#include "catalog/namespace.h"
+#include "catalog/pg_mv_statistic.h"
+#include "catalog/pg_namespace.h"
+#include "commands/defrem.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
+#include "utils/inval.h"
+#include "utils/memutils.h"
+#include "utils/mvstats.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+
+
+/* used for sorting the attnums in ExecCreateStatistics */
+static int compare_int16(const void *a, const void *b)
+{
+ return memcmp(a, b, sizeof(int16));
+}
+
+/*
+ * Implements the CREATE STATISTICS name ON table (columns) WITH (options)
+ *
+ * TODO Check that the types support sort, although maybe we can live
+ * without it (and only build MCV list / association rules).
+ *
+ * TODO This should probably check for duplicate stats (i.e. same
+ * keys, same options). Although maybe it's useful to have
+ * multiple stats on the same columns with different options
+ * (say, a detailed MCV-only stats for some queries, histogram
+ * for others, etc.)
+ */
+ObjectAddress
+CreateStatistics(CreateStatsStmt *stmt)
+{
+ int i, j;
+ ListCell *l;
+ int16 attnums[INDEX_MAX_KEYS];
+ int numcols = 0;
+ ObjectAddress address = InvalidObjectAddress;
+ char *namestr;
+ NameData staname;
+ Oid statoid;
+ Oid namespaceId;
+
+ HeapTuple htup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ int2vector *stakeys;
+ Relation mvstatrel;
+ Relation rel;
+ ObjectAddress parentobject, childobject;
+
+ /* by default build nothing */
+ bool build_dependencies = false;
+
+ Assert(IsA(stmt, CreateStatsStmt));
+
+ /* resolve the pieces of the name (namespace etc.) */
+ namespaceId = QualifiedNameGetCreationNamespace(stmt->defnames, &namestr);
+ namestrcpy(&staname, namestr);
+
+ /*
+ * If if_not_exists was given and the statistics already exists, bail out.
+ */
+ if (stmt->if_not_exists &&
+ SearchSysCacheExists2(MVSTATNAMENSP,
+ PointerGetDatum(&staname),
+ ObjectIdGetDatum(namespaceId)))
+ {
+ ereport(NOTICE,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("statistics \"%s\" already exists, skipping",
+ namestr)));
+ return InvalidObjectAddress;
+ }
+
+ rel = heap_openrv(stmt->relation, AccessExclusiveLock);
+
+ /* transform the column names to attnum values */
+
+ foreach(l, stmt->keys)
+ {
+ char *attname = strVal(lfirst(l));
+ HeapTuple atttuple;
+
+ atttuple = SearchSysCacheAttName(RelationGetRelid(rel), attname);
+
+ if (!HeapTupleIsValid(atttuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" referenced in statistics does not exist",
+ attname)));
+
+		/* more than MVSTATS_MAX_DIMENSIONS columns not allowed */
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("cannot have more than %d keys in a statistics",
+ MVSTATS_MAX_DIMENSIONS)));
+
+ attnums[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->attnum;
+ ReleaseSysCache(atttuple);
+ numcols++;
+ }
+
+ /*
+ * Check the lower bound (at least 2 columns), the upper bound was
+ * already checked in the loop.
+ */
+ if (numcols < 2)
+ ereport(ERROR,
+				(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("multivariate stats require 2 or more columns")));
+
+	/* look for duplicate columns */
+ for (i = 0; i < numcols; i++)
+ for (j = 0; j < numcols; j++)
+ if ((i != j) && (attnums[i] == attnums[j]))
+ ereport(ERROR,
+					(errcode(ERRCODE_DUPLICATE_COLUMN),
+ errmsg("duplicate column name in statistics definition")));
+
+ /* parse the statistics options */
+ foreach (l, stmt->options)
+ {
+ DefElem *opt = (DefElem*)lfirst(l);
+
+ if (strcmp(opt->defname, "dependencies") == 0)
+ build_dependencies = defGetBoolean(opt);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized STATISTICS option \"%s\"",
+ opt->defname)));
+ }
+
+ /* check that at least some statistics were requested */
+ if (! build_dependencies)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies) was requested")));
+
+ /* sort the attnums and build int2vector */
+ qsort(attnums, numcols, sizeof(int16), compare_int16);
+ stakeys = buildint2vector(attnums, numcols);
+
+ /*
+ * Okay, let's create the pg_mv_statistic entry.
+ */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ /* no stats collected yet, so just the keys */
+ values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
+ values[Anum_pg_mv_statistic_staname -1] = NameGetDatum(&staname);
+ values[Anum_pg_mv_statistic_stanamespace -1] = ObjectIdGetDatum(namespaceId);
+ values[Anum_pg_mv_statistic_staowner-1] = ObjectIdGetDatum(GetUserId());
+
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+
+ values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+
+ nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* insert the tuple into pg_mv_statistic */
+ mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ htup = heap_form_tuple(mvstatrel->rd_att, values, nulls);
+
+ simple_heap_insert(mvstatrel, htup);
+
+ CatalogUpdateIndexes(mvstatrel, htup);
+
+ statoid = HeapTupleGetOid(htup);
+
+ heap_freetuple(htup);
+
+
+ /*
+ * Store a dependency too, so that statistics are dropped on DROP TABLE
+ */
+ parentobject.classId = RelationRelationId;
+ parentobject.objectId = ObjectIdGetDatum(RelationGetRelid(rel));
+ parentobject.objectSubId = 0;
+ childobject.classId = MvStatisticRelationId;
+ childobject.objectId = statoid;
+ childobject.objectSubId = 0;
+
+ recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
+
+ /*
+ * Also record dependency on the schema (to drop statistics on DROP SCHEMA)
+ */
+ parentobject.classId = NamespaceRelationId;
+ parentobject.objectId = ObjectIdGetDatum(namespaceId);
+ parentobject.objectSubId = 0;
+ childobject.classId = MvStatisticRelationId;
+ childobject.objectId = statoid;
+ childobject.objectSubId = 0;
+
+ recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
+
+
+	heap_close(mvstatrel, RowExclusiveLock);
+
+	/*
+	 * Invalidate relcache so that others see the new statistics (do this
+	 * before closing the relation).
+	 */
+	CacheInvalidateRelcache(rel);
+
+	relation_close(rel, NoLock);
+
+ ObjectAddressSet(address, MvStatisticRelationId, statoid);
+
+ return address;
+}
+
+
+/*
+ * Guts of statistics removal: delete the pg_mv_statistic entry with the
+ * given OID. This implements DROP STATISTICS, and is also invoked via
+ * dependencies (e.g. on DROP TABLE).
+ */
+void
+RemoveStatisticsById(Oid statsOid)
+{
+ Relation relation;
+ Oid relid;
+ Relation rel;
+ HeapTuple tup;
+ Form_pg_mv_statistic mvstat;
+
+	/*
+	 * Delete the pg_mv_statistic tuple.
+	 */
+ relation = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ tup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(statsOid));
+ if (!HeapTupleIsValid(tup)) /* should not happen */
+ elog(ERROR, "cache lookup failed for statistics %u", statsOid);
+
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(tup);
+ relid = mvstat->starelid;
+
+ rel = heap_open(relid, AccessExclusiveLock);
+
+ simple_heap_delete(relation, &tup->t_self);
+
+ CacheInvalidateRelcache(rel);
+
+ ReleaseSysCache(tup);
+
+ heap_close(relation, RowExclusiveLock);
+ heap_close(rel, NoLock);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index df7c2fa..3b7c87f 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4124,6 +4124,20 @@ _copyAlterPolicyStmt(const AlterPolicyStmt *from)
return newnode;
}
+static CreateStatsStmt *
+_copyCreateStatsStmt(const CreateStatsStmt *from)
+{
+ CreateStatsStmt *newnode = makeNode(CreateStatsStmt);
+
+ COPY_NODE_FIELD(defnames);
+ COPY_NODE_FIELD(relation);
+ COPY_NODE_FIELD(keys);
+ COPY_NODE_FIELD(options);
+ COPY_SCALAR_FIELD(if_not_exists);
+
+ return newnode;
+}
+
/* ****************************************************************
* pg_list.h copy functions
* ****************************************************************
@@ -4999,6 +5013,9 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_CreateStatsStmt:
+ retval = _copyCreateStatsStmt(from);
+ break;
case T_FuncWithArgs:
retval = _copyFuncWithArgs(from);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index eb0fc1e..07206d7 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2153,6 +2153,21 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
}
static void
+_outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
+{
+ WRITE_NODE_TYPE("MVSTATISTICINFO");
+
+ /* NB: this isn't a complete set of fields */
+ WRITE_OID_FIELD(mvoid);
+
+ /* enabled statistics */
+ WRITE_BOOL_FIELD(deps_enabled);
+
+ /* built/available statistics */
+ WRITE_BOOL_FIELD(deps_built);
+}
+
+static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
/*
@@ -3636,6 +3651,9 @@ _outNode(StringInfo str, const void *obj)
case T_PlannerParamItem:
_outPlannerParamItem(str, obj);
break;
+ case T_MVStatisticInfo:
+ _outMVStatisticInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index ad715bb..7fb2088 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -28,6 +28,7 @@
#include "catalog/dependency.h"
#include "catalog/heap.h"
#include "catalog/pg_am.h"
+#include "catalog/pg_mv_statistic.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -40,7 +41,9 @@
#include "parser/parsetree.h"
#include "rewrite/rewriteManip.h"
#include "storage/bufmgr.h"
+#include "utils/builtins.h"
#include "utils/lsyscache.h"
+#include "utils/syscache.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
@@ -94,6 +97,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
Relation relation;
bool hasindex;
List *indexinfos = NIL;
+ List *stainfos = NIL;
/*
* We need not lock the relation since it was already locked, either by
@@ -387,6 +391,61 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
rel->indexlist = indexinfos;
+ if (true)
+ {
+ List *mvstatoidlist;
+ ListCell *l;
+
+ mvstatoidlist = RelationGetMVStatList(relation);
+
+ foreach(l, mvstatoidlist)
+ {
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ Oid mvoid = lfirst_oid(l);
+ Form_pg_mv_statistic mvstat;
+ MVStatisticInfo *info;
+
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ /* unavailable stats are not interesting for the planner */
+ if (mvstat->deps_built)
+ {
+ info = makeNode(MVStatisticInfo);
+
+ info->mvoid = mvoid;
+ info->rel = rel;
+
+ /* enabled statistics */
+ info->deps_enabled = mvstat->deps_enabled;
+
+ /* built/available statistics */
+ info->deps_built = mvstat->deps_built;
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ info->stakeys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+
+ stainfos = lcons(info, stainfos);
+ }
+
+ ReleaseSysCache(htup);
+ }
+
+ list_free(mvstatoidlist);
+ }
+
+ rel->mvstatlist = stainfos;
+
/* Grab foreign-table info using the relcache, while we have it */
if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
{
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b9aeb31..eed9927 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -241,7 +241,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
- CreateSchemaStmt CreateSeqStmt CreateStmt CreateTableSpaceStmt
+ CreateSchemaStmt CreateSeqStmt CreateStmt CreateStatsStmt CreateTableSpaceStmt
CreateFdwStmt CreateForeignServerStmt CreateForeignTableStmt
CreateAssertStmt CreateTransformStmt CreateTrigStmt CreateEventTrigStmt
CreateUserStmt CreateUserMappingStmt CreateRoleStmt CreatePolicyStmt
@@ -809,6 +809,7 @@ stmt :
| CreateSchemaStmt
| CreateSeqStmt
| CreateStmt
+ | CreateStatsStmt
| CreateTableSpaceStmt
| CreateTransformStmt
| CreateTrigStmt
@@ -3436,6 +3437,36 @@ OptConsTableSpace: USING INDEX TABLESPACE name { $$ = $4; }
ExistingIndex: USING INDEX index_name { $$ = $3; }
;
+/*****************************************************************************
+ *
+ * QUERY :
+ * CREATE STATISTICS stats_name ON relname (columns) WITH (options)
+ *
+ *****************************************************************************/
+
+
+CreateStatsStmt: CREATE STATISTICS any_name ON qualified_name '(' columnList ')' opt_reloptions
+ {
+ CreateStatsStmt *n = makeNode(CreateStatsStmt);
+ n->defnames = $3;
+ n->relation = $5;
+ n->keys = $7;
+ n->options = $9;
+ n->if_not_exists = false;
+ $$ = (Node *)n;
+ }
+ | CREATE STATISTICS IF_P NOT EXISTS any_name ON qualified_name '(' columnList ')' opt_reloptions
+ {
+ CreateStatsStmt *n = makeNode(CreateStatsStmt);
+ n->defnames = $6;
+ n->relation = $8;
+ n->keys = $10;
+ n->options = $12;
+ n->if_not_exists = true;
+ $$ = (Node *)n;
+ }
+ ;
+
/*****************************************************************************
*
@@ -5621,6 +5652,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH DICTIONARY { $$ = OBJECT_TSDICTIONARY; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
+ | STATISTICS { $$ = OBJECT_STATISTICS; }
;
any_name_list:
@@ -7995,6 +8027,15 @@ RenameStmt: ALTER AGGREGATE func_name aggr_args RENAME TO name
n->missing_ok = false;
$$ = (Node *)n;
}
+ | ALTER STATISTICS any_name RENAME TO name
+ {
+ RenameStmt *n = makeNode(RenameStmt);
+ n->renameType = OBJECT_STATISTICS;
+ n->object = $3;
+ n->newname = $6;
+ n->missing_ok = false;
+ $$ = (Node *)n;
+ }
;
opt_column: COLUMN { $$ = COLUMN; }
@@ -8231,6 +8272,15 @@ AlterObjectSchemaStmt:
n->missing_ok = false;
$$ = (Node *)n;
}
+ | ALTER STATISTICS any_name SET SCHEMA name
+ {
+ AlterObjectSchemaStmt *n = makeNode(AlterObjectSchemaStmt);
+ n->objectType = OBJECT_STATISTICS;
+ n->object = $3;
+ n->newschema = $6;
+ n->missing_ok = false;
+ $$ = (Node *)n;
+ }
;
/*****************************************************************************
@@ -8421,6 +8471,14 @@ AlterOwnerStmt: ALTER AGGREGATE func_name aggr_args OWNER TO RoleSpec
n->newowner = $7;
$$ = (Node *)n;
}
+ | ALTER STATISTICS name OWNER TO RoleSpec
+ {
+ AlterOwnerStmt *n = makeNode(AlterOwnerStmt);
+ n->objectType = OBJECT_STATISTICS;
+ n->object = list_make1(makeString($3));
+ n->newowner = $6;
+ $$ = (Node *)n;
+ }
;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 045f7f0..96b58f8 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1520,6 +1520,10 @@ ProcessUtilitySlow(Node *parsetree,
address = ExecSecLabelStmt((SecLabelStmt *) parsetree);
break;
+ case T_CreateStatsStmt: /* CREATE STATISTICS */
+ address = CreateStatistics((CreateStatsStmt *) parsetree);
+ break;
+
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(parsetree));
@@ -1878,6 +1882,9 @@ AlterObjectTypeCommandTag(ObjectType objtype)
case OBJECT_MATVIEW:
tag = "ALTER MATERIALIZED VIEW";
break;
+ case OBJECT_STATISTICS:
+ tag = "ALTER STATISTICS";
+ break;
default:
tag = "???";
break;
@@ -2160,6 +2167,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_TRANSFORM:
tag = "DROP TRANSFORM";
break;
+ case OBJECT_STATISTICS:
+ tag = "DROP STATISTICS";
+ break;
default:
tag = "???";
}
@@ -2527,6 +2537,10 @@ CreateCommandTag(Node *parsetree)
tag = "EXECUTE";
break;
+ case T_CreateStatsStmt:
+ tag = "CREATE STATISTICS";
+ break;
+
case T_DeallocateStmt:
{
DeallocateStmt *stmt = (DeallocateStmt *) parsetree;
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..eba0352 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,7 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr mvstats resowner sort time
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 130c06d..3bc4c8a 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -47,6 +47,7 @@
#include "catalog/pg_auth_members.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_database.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_proc.h"
@@ -3956,6 +3957,62 @@ RelationGetIndexList(Relation relation)
return result;
}
+
+List *
+RelationGetMVStatList(Relation relation)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result;
+ List *oldlist;
+ MemoryContext oldcxt;
+
+ /* Quick exit if we already computed the list. */
+ if (relation->rd_mvstatvalid != 0)
+ return list_copy(relation->rd_mvstatlist);
+
+ /*
+ * We build the list we intend to return (in the caller's context) while
+ * doing the scan. After successfully completing the scan, we copy that
+ * list into the relcache entry. This avoids cache-context memory leakage
+ * if we get some sort of error partway through.
+ */
+ result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(RelationGetRelid(relation)));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ /* TODO maybe include only already built statistics? */
+ result = insert_ordered_oid(result, HeapTupleGetOid(htup));
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* Now save a copy of the completed list in the relcache entry. */
+ oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+ oldlist = relation->rd_mvstatlist;
+ relation->rd_mvstatlist = list_copy(result);
+
+ relation->rd_mvstatvalid = true;
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Don't leak the old list, if there is one */
+ list_free(oldlist);
+
+ return result;
+}
+
/*
* insert_ordered_oid
* Insert a new Oid into a sorted list of Oids, preserving ordering
@@ -4920,6 +4977,8 @@ load_relcache_init_file(bool shared)
rel->rd_indexattr = NULL;
rel->rd_keyattr = NULL;
rel->rd_idattr = NULL;
+ rel->rd_mvstatvalid = false;
+ rel->rd_mvstatlist = NIL;
rel->rd_createSubid = InvalidSubTransactionId;
rel->rd_newRelfilenodeSubid = InvalidSubTransactionId;
rel->rd_amcache = NULL;
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 65ffe84..3c1bc4b 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -44,6 +44,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_language.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -502,6 +503,28 @@ static const struct cachedesc cacheinfo[] = {
},
4
},
+ {MvStatisticRelationId, /* MVSTATNAMENSP */
+ MvStatisticNameIndexId,
+ 2,
+ {
+ Anum_pg_mv_statistic_staname,
+ Anum_pg_mv_statistic_stanamespace,
+ 0,
+ 0
+ },
+ 4
+ },
+ {MvStatisticRelationId, /* MVSTATOID */
+ MvStatisticOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 4
+ },
{NamespaceRelationId, /* NAMESPACENAME */
NamespaceNameIndexId,
1,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
new file mode 100644
index 0000000..099f1ed
--- /dev/null
+++ b/src/backend/utils/mvstats/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/mvstats
+#
+# IDENTIFICATION
+# src/backend/utils/mvstats/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/mvstats
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = common.o dependencies.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.dependencies b/src/backend/utils/mvstats/README.dependencies
new file mode 100644
index 0000000..1f96fbc
--- /dev/null
+++ b/src/backend/utils/mvstats/README.dependencies
@@ -0,0 +1,222 @@
+Soft functional dependencies
+============================
+
+A type of multivariate statistics used to capture cases when one column (or
+possibly a combination of columns) determines values in another column. We may
+also say that one column implies the other one.
+
+A simple artificial example may be a table with two columns, created like this
+
+ CREATE TABLE t (a, b)
+ AS SELECT i, i/10 FROM generate_series(1,100000) s(i);
+
+Clearly, once we know the value of column 'a', the value of 'b' is trivially
+determined, as it's simply (a/10). A more practical example may be addresses,
+where (ZIP code -> city name), i.e. once we know the ZIP, we probably know the
+city it belongs to, as ZIP codes are usually assigned to one city. Larger cities
+may have multiple ZIP codes, so the dependency can't be reversed.
+
+Functional dependencies are a concept well described in relational theory,
+particularly in the definition of normalization and "normal forms". Wikipedia has a
+nice definition of a functional dependency [1]:
+
+ In a given table, an attribute Y is said to have a functional dependency on
+ a set of attributes X (written X -> Y) if and only if each X value is
+ associated with precisely one Y value. For example, in an "Employee" table
+ that includes the attributes "Employee ID" and "Employee Date of Birth", the
+ functional dependency {Employee ID} -> {Employee Date of Birth} would hold.
+ It follows from the previous two sentences that each {Employee ID} is
+ associated with precisely one {Employee Date of Birth}.
+
+ [1] http://en.wikipedia.org/wiki/Database_normalization
+
+Many datasets might be normalized not to contain such dependencies, but often
+it's not practical for various reasons. In some cases it's actually a conscious
+design choice to model the dataset in a denormalized way, either because of
+performance or to make querying easier.
+
+The functional dependencies are called 'soft' because the implementation is
+meant to allow a small number of rows contradicting the dependency. Many actual
+data sets contain some sort of errors, either because of data entry mistakes
+(user mistyping the ZIP code) or issues in generating the data (e.g. a ZIP code
+mistakenly assigned to two cities in different states). A strict implementation
+would ignore dependencies on such noisy data, rendering the approach unusable on
+such data sets.
+
+
+Mining dependencies (ANALYZE)
+-----------------------------
+
+The current build algorithm is rather simple - for each pair (a,b) of columns,
+the data are sorted lexicographically (first by 'a', then by 'b'). Then for each
+group (rows with the same 'a' value) we decide whether the group is neutral,
+supporting or contradicting the dependency (a->b).
+
+A group is considered neutral when it's too small - e.g. when there's a single
+row in the group, there can't possibly be multiple values in 'b'. For this
+reason we ignore groups smaller than a threshold (currently 3 rows).
+
+For sufficiently large groups (3 rows or more), we count the number of distinct
+values in 'b'. When there's a single 'b' value, the group is considered to
+support the dependency (a->b), otherwise it's considered to contradict it.
+
+At the end, we compare the number of rows in supporting and contradicting groups,
+and if there are at least 10x as many supporting rows, we consider the
+functional dependency to be valid.
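+
+For example, assume a sample of 30,000 rows where grouping by 'a' works out
+like this (numbers purely illustrative):
+
+ supporting: 290 groups, 28,300 rows (a single 'b' value per group)
+ contradicting: 8 groups, 1,400 rows (multiple 'b' values)
+ neutral: 300 rows (groups smaller than 3 rows)
+
+As 28,300 > 10 * 1,400, the dependency (a->b) is accepted.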
+
+
+This approach has the negative property that it's a bit fragile with respect
+to the sample - there may be data sets producing quite different results for
+each ANALYZE run, as even a single row may change the outcome of the final
+10x test.
+
+It was proposed to make the dependencies "fuzzy" - e.g. track some coefficient
+between [0,1] determining how much the dependency holds. That would however mean
+we have to keep all the dependencies, as eliminating them based on the value of
+the coefficient (e.g. throw away dependencies <= 0.5) would result in exactly
+the same fragility issues. This would also make it more complicated to combine
+dependencies. So this does not seem like a practical approach.
+
+A better approach might be to replace the constants (min_group_size=3 and 10x)
+with values somehow related to the particular data set.
+
+
+Clause reduction (planner/optimizer)
+------------------------------------
+
+Applying the functional dependencies is quite simple - given a list of equality
+clauses, check which clauses are redundant (i.e. implied by some other clause).
+For example given clause list
+
+ (a = 1) AND (b = 2) AND (c = 3)
+
+and the dependency (a->b), the list of clauses may be simplified to
+
+ (a = 1) AND (c = 3)
+
+Functional dependencies may only be applied to equality clauses; all other
+types of clauses are ignored. See clauselist_apply_dependencies() for details.
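+
+To illustrate the whole cycle (only a sketch - the CREATE STATISTICS syntax
+follows the regression tests in this patch, and the table/statistics names
+are made up):
+
+ CREATE TABLE t2 (a, b)
+ AS SELECT i/100, i/100 FROM generate_series(1,100000) s(i);
+
+ CREATE STATISTICS t2_deps ON t2(a,b) WITH (dependencies);
+ ANALYZE t2;
+
+ -- (b = 10) is implied by (a = 10), so after the reduction only the
+ -- (a = 10) clause contributes to the estimate
+ EXPLAIN SELECT * FROM t2 WHERE (a = 10) AND (b = 10);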
+
+
+Compatibility of clauses
+------------------------
+
+The reduction assumes the clauses really are redundant, and the value in the
+reduced clause (b=2) is the value determined by (a=1). If that's not the case
+and the values are "incompatible", the result will be an overestimate.
+
+This may happen for example when using conditions on ZIP and city name with
+mismatching values (ZIP for a different city), etc. In such a case the result
+set will be empty, but we'll estimate the selectivity using the ZIP condition.
+
+In this case the default estimation, based on the attribute value independence
+assumption, happens to work better, but mostly by chance.
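+
+For example (the table and values are purely illustrative):
+
+ SELECT * FROM addresses WHERE (zip = '10001') AND (city = 'Los Angeles');
+
+returns no rows (ZIP 10001 belongs to New York), yet with a (zip -> city)
+dependency the reduction drops the city clause and estimates the query as if
+it were (zip = '10001') alone.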
+
+
+Dependencies vs. MCV/histogram
+------------------------------
+
+In some cases the "compatibility" of the conditions might be verified using the
+other types of multivariate stats - MCV lists and histograms.
+
+For MCV lists the verification might be very simple - peek into the list to see
+whether any items match the clause on the 'a' column (e.g. ZIP code), and if
+such an item is found, check that the 'b' column matches the other clause. If it
+does not, the clauses are contradictory. If no such item is found, we can't
+really conclude anything, except maybe restricting the selectivity using the
+MCV data (e.g. using min/max selectivity, or something).
+
+With histograms, it might work similarly - we can't check the values directly
+(because histograms use buckets, unlike MCV lists, which store the actual
+values). So we can only look at the buckets matching the clauses - if those
+buckets have very low frequency, it probably means the two clauses are
+incompatible.
+
+It's unclear what 'low frequency' is, but if one of the clauses is implied
+(automatically true because of the other clause), then
+
+ selectivity[clause(A)] = selectivity[clause(A) & clause(B)]
+
+So we might compute selectivity of the first clause - for example using regular
+statistics. And then check if the selectivity computed from the histogram is
+about the same (or significantly lower).
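+
+For example (numbers purely illustrative):
+
+ selectivity[zip = '10001'] = 0.0010 (regular per-column stats)
+ selectivity[zip = '10001' & city = 'New York'] = 0.0009 (histogram)
+ selectivity[zip = '10001' & city = 'Los Angeles'] = 0.00001 (histogram)
+
+The first pair is roughly equal, suggesting compatible clauses; the second
+histogram estimate is far lower, suggesting the clauses are incompatible.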
+
+The problem is that histograms work well only when the data ordering matches the
+natural meaning. For values that serve as labels - like city names, ZIP codes,
+or generated IDs - histograms really don't work all that well. For example
+sorting cities by name won't match the sorting of ZIP codes, rendering the
+histogram unusable.
+
+So MCVs are probably going to work much better, because they don't really assume
+any sort of ordering. And they're probably more appropriate for label-like data.
+
+A good question however is why even use functional dependencies in such cases
+and not simply use the MCV/histogram instead. One reason is that the functional
+dependencies allow fallback to regular stats, and often produce more accurate
+estimates - especially compared to histograms, which are quite bad at
+estimating equality clauses.
+
+
+Limitations
+-----------
+
+Let's look at the main limitations of functional dependencies, especially those
+related to the current implementation.
+
+The current implementation supports only dependencies between two columns, but
+that is merely a simplification of the initial implementation. It's certainly
+useful to mine for dependencies involving multiple columns on the 'left' side,
+i.e. as the condition of the dependency - that is, dependencies like (a,b -> c).
+
+The implementation may/should be smart enough not to mine redundant dependencies,
+e.g. (a->b) and (a,c -> b), because the latter is a trivial consequence of the
+former one (if values of 'a' determine 'b', adding another column won't change
+that relationship). The ANALYZE should first analyze 1:1 dependencies, then 2:1
+dependencies (and skip the already identified ones), etc.
+
+For example the dependency
+
+ (city name -> zip code)
+
+is much stronger, i.e. whenever it holds, then
+
+ (city name, state name -> zip code)
+
+holds too. But in case there are cities with the same name in different states,
+then only the latter dependency will be valid.
+
+Of course, there probably are cities with the same name within a single state,
+but hopefully this is a relatively rare occurrence (and thus we'll still detect
+the 'soft' dependency).
+
+Handling multiple columns on the right side of the dependency is not necessary,
+as those dependencies may be simply decomposed into a set of dependencies with
+the same meaning, one for each column on the right side. For example
+
+ (a -> b,c)
+
+is exactly the same as
+
+ (a -> b) & (a -> c)
+
+Of course, storing the first form may be more efficient than storing multiple
+'simple' dependencies separately.
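+
+With the current serialization (a pair of int16 attnums per dependency), the
+two 'simple' dependencies take 2 * 4 = 8 bytes, whereas a combined (a -> b,c)
+entry could fit in 6 bytes - a purely illustrative comparison, as the combined
+format is not implemented.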
+
+
+TODO Support dependencies with multiple columns on left/right.
+
+TODO Investigate using histogram and MCV list to verify the dependencies.
+
+TODO Investigate statistical testing of the distribution (to decide whether it
+ makes sense to build the histogram/MCV list).
+
+TODO Using a min/max of selectivities would probably make more sense for the
+ associated columns.
+
+TODO Consider eliminating the implied columns from the histogram and MCV lists
+ (but maybe that's not a good idea, because that'd make it impossible to use
+ these stats for non-equality clauses and also it wouldn't be possible to
+ use the stats for verification of the dependencies).
+
+TODO The reduction probably might be extended to also handle IS NULL clauses,
+ assuming we fix the ANALYZE to properly handle NULL values. We however
+ won't be able to reduce IS NOT NULL (unless I'm missing something).
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
new file mode 100644
index 0000000..a755c49
--- /dev/null
+++ b/src/backend/utils/mvstats/common.c
@@ -0,0 +1,356 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.c
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+static List* list_mv_stats(Oid relid);
+
+
+/*
+ * Compute requested multivariate stats, using the rows sampled for the
+ * plain (single-column) stats.
+ *
+ * This fetches a list of stats from pg_mv_statistic, computes the stats
+ * and serializes them back into the catalog (as bytea values).
+ */
+void
+build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats)
+{
+ ListCell *lc;
+ List *mvstats;
+
+ TupleDesc tupdesc = RelationGetDescr(onerel);
+
+ /*
+ * Fetch defined MV groups from pg_mv_statistic, and then compute
+ * the MV statistics (histograms for now).
+ */
+ mvstats = list_mv_stats(RelationGetRelid(onerel));
+
+ foreach (lc, mvstats)
+ {
+ int j;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
+ MVDependencies deps = NULL;
+
+ VacAttrStats **stats = NULL;
+ int numatts = 0;
+
+ /* int2 vector of attnums the stats should be computed on */
+ int2vector * attrs = stat->stakeys;
+
+ /* see how many of the columns are not dropped */
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ numatts += 1;
+
+ /* if there are dropped attributes, build a filtered int2vector */
+ if (numatts != attrs->dim1)
+ {
+ int16 *tmp = palloc0(numatts * sizeof(int16));
+ int attnum = 0;
+
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ tmp[attnum++] = attrs->values[j];
+
+ pfree(attrs);
+ attrs = buildint2vector(tmp, numatts);
+ }
+
+ /* filter only the interesting vacattrstats records */
+ stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /* check allowed number of dimensions */
+ Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Analyze functional dependencies of columns.
+ */
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
+
+ /* store the histogram / MCV list in the catalog */
+ update_mv_stats(stat->mvoid, deps, attrs);
+ }
+}
+
+/*
+ * Lookup the VacAttrStats info for the selected columns, with indexes
+ * matching the attrs vector (to make it easy to work with when
+ * computing multivariate stats).
+ */
+static VacAttrStats **
+lookup_var_attr_stats(int2vector *attrs, int natts, VacAttrStats **vacattrstats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ VacAttrStats **stats = (VacAttrStats**)palloc0(numattrs * sizeof(VacAttrStats*));
+
+ /* lookup VacAttrStats info for the requested columns (same attnum) */
+ for (i = 0; i < numattrs; i++)
+ {
+ stats[i] = NULL;
+ for (j = 0; j < natts; j++)
+ {
+ if (attrs->values[i] == vacattrstats[j]->tupattnum)
+ {
+ stats[i] = vacattrstats[j];
+ break;
+ }
+ }
+
+ /*
+ * Check that we found the info, that the attnum matches and
+ * that there's the requested 'lt' operator and that the type
+ * is 'passed-by-value'.
+ */
+ Assert(stats[i] != NULL);
+ Assert(stats[i]->tupattnum == attrs->values[i]);
+
+ /* FIXME This is a rather ugly way to check for 'ltopr' (which
+ * is defined for 'scalar' attributes).
+ */
+ Assert(((StdAnalyzeData *)stats[i]->extra_data)->ltopr != InvalidOid);
+ }
+
+ return stats;
+}
+
+/*
+ * Fetch list of MV stats defined on a table, without the actual data
+ * for histograms, MCV lists etc.
+ */
+static List*
+list_mv_stats(Oid relid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ Form_pg_mv_statistic stats = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ info->mvoid = HeapTupleGetOid(htup);
+ info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_built = stats->deps_built;
+
+ result = lappend(result, info);
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO maybe save the list into relcache, as in RelationGetIndexList
+ * (which served as an inspiration for this one)? */
+
+ return result;
+}
+
+void
+update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+{
+ HeapTuple stup,
+ oldtup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ bool replaces[Natts_pg_mv_statistic];
+
+ Relation sd = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ memset(nulls, 1, Natts_pg_mv_statistic * sizeof(bool));
+ memset(replaces, 0, Natts_pg_mv_statistic * sizeof(bool));
+ memset(values, 0, Natts_pg_mv_statistic * sizeof(Datum));
+
+ /*
+ * Construct a new pg_mv_statistic tuple - replace only the dependencies,
+ * depending on whether they actually were computed.
+ */
+ if (dependencies != NULL)
+ {
+ nulls[Anum_pg_mv_statistic_stadeps -1] = false;
+ values[Anum_pg_mv_statistic_stadeps - 1]
+ = PointerGetDatum(serialize_mv_dependencies(dependencies));
+ }
+
+ /* always replace the value (either by bytea or NULL) */
+ replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* always change the availability flags */
+ nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_stakeys-1] = false;
+
+ /* use the new attnums, in case we removed some dropped ones */
+ replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_stakeys -1] = true;
+
+ values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
+
+ /* Is there already a pg_mv_statistic tuple for these statistics? */
+ oldtup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ if (HeapTupleIsValid(oldtup))
+ {
+ /* Yes, replace it */
+ stup = heap_modify_tuple(oldtup,
+ RelationGetDescr(sd),
+ values,
+ nulls,
+ replaces);
+ ReleaseSysCache(oldtup);
+ simple_heap_update(sd, &stup->t_self, stup);
+ }
+ else
+ elog(ERROR, "invalid pg_mv_statistic record (oid=%d)", mvoid);
+
+ /* update indexes too */
+ CatalogUpdateIndexes(sd, stup);
+
+ heap_freetuple(stup);
+
+ heap_close(sd, RowExclusiveLock);
+}
+
+/* multi-variate stats comparator */
+
+/*
+ * qsort_arg comparator for sorting Datums (MV stats)
+ *
+ * This does not maintain the tupnoLink array.
+ */
+int
+compare_scalars_simple(const void *a, const void *b, void *arg)
+{
+ Datum da = *(Datum*)a;
+ Datum db = *(Datum*)b;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting data when partitioning a MV bucket
+ */
+int
+compare_scalars_partition(const void *a, const void *b, void *arg)
+{
+ Datum da = ((ScalarItem*)a)->value;
+ Datum db = ((ScalarItem*)b)->value;
+ SortSupport ssup= (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/* initialize multi-dimensional sort */
+MultiSortSupport
+multi_sort_init(int ndims)
+{
+ MultiSortSupport mss;
+
+ Assert(ndims >= 2);
+
+ mss = (MultiSortSupport)palloc0(offsetof(MultiSortSupportData, ssup)
+ + sizeof(SortSupportData)*ndims);
+
+ mss->ndims = ndims;
+
+ return mss;
+}
+
+/*
+ * add sort info for dimension 'dim' (index into vacattrstats) to mss,
+ * at position 'sortdim'
+ */
+void
+multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats)
+{
+ /* first, lookup StdAnalyzeData for the dimension (attribute) */
+ SortSupportData ssup;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)vacattrstats[dim]->extra_data;
+
+ Assert(mss != NULL);
+ Assert(sortdim < mss->ndims);
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup);
+
+ mss->ssup[sortdim] = ssup;
+}
+
+/* compare all the dimensions in the selected order */
+int
+multi_sort_compare(const void *a, const void *b, void *arg)
+{
+ int i;
+ SortItem *ia = (SortItem*)a;
+ SortItem *ib = (SortItem*)b;
+
+ MultiSortSupport mss = (MultiSortSupport)arg;
+
+ for (i = 0; i < mss->ndims; i++)
+ {
+ int compare;
+
+ compare = ApplySortComparator(ia->values[i], ia->isnull[i],
+ ib->values[i], ib->isnull[i],
+ &mss->ssup[i]);
+
+ if (compare != 0)
+ return compare;
+
+ }
+
+ /* equal by default */
+ return 0;
+}
+
+/* compare selected dimension */
+int
+multi_sort_compare_dim(int dim, const SortItem *a, const SortItem *b,
+ MultiSortSupport mss)
+{
+ return ApplySortComparator(a->values[dim], a->isnull[dim],
+ b->values[dim], b->isnull[dim],
+ &mss->ssup[dim]);
+}
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
new file mode 100644
index 0000000..d96422d
--- /dev/null
+++ b/src/backend/utils/mvstats/common.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.h
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/sysattr.h"
+#include "access/tuptoaster.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_mv_statistic.h"
+#include "foreign/fdwapi.h"
+#include "postmaster/autovacuum.h"
+#include "storage/lmgr.h"
+#include "utils/builtins.h"
+#include "utils/datum.h"
+#include "utils/fmgroids.h"
+#include "utils/mvstats.h"
+#include "utils/sortsupport.h"
+#include "utils/syscache.h"
+
+
+/* FIXME private structure copied from analyze.c */
+
+typedef struct
+{
+ Oid eqopr; /* '=' operator for datatype, if any */
+ Oid eqfunc; /* and associated function */
+ Oid ltopr; /* '<' operator for datatype, if any */
+} StdAnalyzeData;
+
+typedef struct
+{
+ Datum value; /* a data value */
+ int tupno; /* position index for tuple it came from */
+} ScalarItem;
+
+/* multi-sort */
+typedef struct MultiSortSupportData {
+ int ndims; /* number of dimensions supported by the sort */
+ SortSupportData ssup[1]; /* sort support data for each dimension */
+} MultiSortSupportData;
+
+typedef MultiSortSupportData* MultiSortSupport;
+
+typedef struct SortItem {
+ Datum *values;
+ bool *isnull;
+} SortItem;
+
+MultiSortSupport multi_sort_init(int ndims);
+
+void multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats);
+
+int multi_sort_compare(const void *a, const void *b, void *arg);
+
+int multi_sort_compare_dim(int dim, const SortItem *a,
+ const SortItem *b, MultiSortSupport mss);
+
+/* comparators, used when constructing multivariate stats */
+int compare_scalars_simple(const void *a, const void *b, void *arg);
+int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
new file mode 100644
index 0000000..2a064a0
--- /dev/null
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -0,0 +1,437 @@
+/*-------------------------------------------------------------------------
+ *
+ * dependencies.c
+ * POSTGRES multivariate functional dependencies
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/dependencies.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Detect functional dependencies between columns.
+ *
+ * TODO This builds a complete set of dependencies, i.e. including transitive
+ * dependencies - if we identify [A => B] and [B => C], we're likely to
+ * identify [A => C] too. It might be better to keep only the minimal set
+ * of dependencies, i.e. prune all the dependencies that we can recreate
+ * by transitivity.
+ *
+ * There are two conceptual ways to do that:
+ *
+ * (a) generate all the rules, and then prune the rules that may be
+ * recreated by combining other dependencies, or
+ *
+ * (b) perform the 'is a combination of other dependencies' check before
+ * actually doing the work
+ *
+ * The second option has the advantage that we don't really need to perform
+ * the sort/count. It's not sufficient alone, though, because we may
+ * discover the dependencies in the wrong order. For example we may find
+ *
+ * (a -> b), (a -> c) and then (b -> c)
+ *
+ * None of those dependencies is a combination of the already known ones,
+ * yet (a -> c) is a combination of (a -> b) and (b -> c).
+ *
+ *
+ * FIXME Currently we simply replace NULL values with 0 and then handle them
+ * as regular values, but that lumps NULLs together with actual 0 values.
+ * That's clearly incorrect - we need to treat NULL as a separate value.
+ */
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ /* result */
+ int ndeps = 0;
+ MVDependencies dependencies = NULL;
+ MultiSortSupport mss = multi_sort_init(2); /* 2 dimensions for now */
+
+ /* TODO Maybe this should be somehow related to the number of
+ * distinct values in the two columns we're currently analyzing.
+ * Assuming the distribution is uniform, we can estimate the
+ * average group size and use it as a threshold. Or something
+ * like that. Seems better than a static approach.
+ */
+ int min_group_size = 3;
+
+ /* dimension indexes we'll check for associations [a => b] */
+ int dima, dimb;
+
+ /*
+ * We'll reuse the same array for all the 2-column combinations.
+ *
+ * It's possible to sort the sample rows directly, but this seemed
+ * somewhat simpler / less error-prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * 2);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * 2);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * 2];
+ items[i].isnull = &isnull[i * 2];
+ }
+
+ Assert(numattrs >= 2);
+
+ /*
+ * Evaluate all possible combinations of [A => B], using a simple algorithm:
+ *
+ * (a) sort the data by [A,B]
+ * (b) split the data into groups by A (new group whenever a value changes)
+ * (c) count different values in the B column (again, value changes)
+ *
+ * TODO It should be rather simple to merge [A => B] and [A => C] into
+ * [A => B,C]. Just keep A constant, collect all the "implied" columns
+ * and you're done.
+ */
+ for (dima = 0; dima < numattrs; dima++)
+ {
+ /* prepare the sort function for the first dimension */
+ multi_sort_add_dimension(mss, 0, dima, stats);
+
+ for (dimb = 0; dimb < numattrs; dimb++)
+ {
+ SortItem current;
+
+ /* number of groups supporting / contradicting the dependency */
+ int n_supporting = 0;
+ int n_contradicting = 0;
+
+ /* counters valid within a group */
+ int group_size = 0;
+ int n_violations = 0;
+
+ int n_supporting_rows = 0;
+ int n_contradicting_rows = 0;
+
+ /* make sure the columns are different (A => A) */
+ if (dima == dimb)
+ continue;
+
+ /* prepare the sort function for the second dimension */
+ multi_sort_add_dimension(mss, 1, dimb, stats);
+
+ /* reset the values and isnull flags */
+ memset(values, 0, sizeof(Datum) * numrows * 2);
+ memset(isnull, 0, sizeof(bool) * numrows * 2);
+
+ /* accumulate all the data for both columns into an array and sort it */
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values[0]
+ = heap_getattr(rows[i], attrs->values[dima],
+ stats[dima]->tupDesc, &items[i].isnull[0]);
+
+ items[i].values[1]
+ = heap_getattr(rows[i], attrs->values[dimb],
+ stats[dimb]->tupDesc, &items[i].isnull[1]);
+ }
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Walk through the array, split it into groups according to
+ * the A value, and count distinct values in the other one.
+ * If there's a single B value for the whole group, we count
+ * it as supporting the association, otherwise we count it
+ * as contradicting.
+ *
+ * Furthermore we require a group to have at least a certain
+ * number of rows to be considered useful for supporting the
+ * dependency. But a contradicting group counts regardless of its size.
+ */
+
+ /* start with values from the first row */
+ current = items[0];
+ group_size = 1;
+
+ for (i = 1; i < numrows; i++)
+ {
+ /* end of the group */
+ if (multi_sort_compare_dim(0, &items[i], &current, mss) != 0)
+ {
+ /*
+ * If there are no contradicting rows, count it as
+ * supporting (otherwise contradicting), but only if
+ * the group is large enough.
+ *
+ * The requirement of a minimum group size makes it
+ * impossible to identify [unique,unique] cases, but
+ * that's probably a different case. This is more
+ * about [zip => city] associations etc.
+ *
+ * If there are violations, count the group/rows as
+ * a violation.
+ *
+ * It may be neither, if the group is too small (does
+ * not contain at least min_group_size rows).
+ */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations > 0)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /* current values start a new group */
+ n_violations = 0;
+ group_size = 0;
+ }
+ /* mismatch of a B value is contradicting */
+ else if (multi_sort_compare_dim(1, &items[i], &current, mss) != 0)
+ {
+ n_violations += 1;
+ }
+
+ current = items[i];
+ group_size += 1;
+ }
+
+ /* handle the last group (just like above) */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /*
+ * See if the number of rows supporting the association is at least
+ * 10x the number of rows violating the hypothetical dependency.
+ *
+ * TODO This is a rather arbitrary limit - I guess it's possible to do
+ * some math to come up with a better rule (e.g. testing a hypothesis
+ * 'this is due to randomness'). We can create a contingency table
+ * from the values and use it for testing. Possibly only when
+ * there are no contradicting rows?
+ *
+ * TODO Also, if (a => b) and (b => a) at the same time, it pretty much
+ * means there's a 1:1 relation (or one is a 'label'), making the
+ * conditions rather redundant. Although it's possible that the
+ * query uses incompatible combination of values.
+ */
+ if (n_supporting_rows > (n_contradicting_rows * 10))
+ {
+ if (dependencies == NULL)
+ {
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+ dependencies->magic = MVSTAT_DEPS_MAGIC;
+ }
+ else
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData, deps)
+ + sizeof(MVDependency) * (dependencies->ndeps + 1));
+
+ /* add the new dependency to the list */
+ dependencies->deps[ndeps] = (MVDependency)palloc0(sizeof(MVDependencyData));
+ dependencies->deps[ndeps]->a = attrs->values[dima];
+ dependencies->deps[ndeps]->b = attrs->values[dimb];
+
+ dependencies->ndeps = (++ndeps);
+ }
+ }
+ }
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+ pfree(stats);
+ pfree(mss);
+
+ return dependencies;
+}
+
+/*
+ * Store the dependencies into a bytea, so that it can be stored in the
+ * pg_mv_statistic catalog.
+ *
+ * Currently this only supports simple two-column rules, and stores them
+ * as a sequence of attnum pairs. In the future, this needs to be made
+ * more complex to support multiple columns on both sides of the
+ * implication (using AND on left, OR on right).
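+ *
+ * For reference, the layout produced by the code below (after the varlena
+ * header) is simply:
+ *
+ * uint32 magic (MVSTAT_DEPS_MAGIC)
+ * int32 ndeps (number of dependencies)
+ * int16 a, b (one attnum pair per dependency, ndeps times)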
+ */
+bytea *
+serialize_mv_dependencies(MVDependencies dependencies)
+{
+ int i;
+
+ /* we need to store ndeps, and each needs 2 * int16 */
+ Size len = VARHDRSZ + offsetof(MVDependenciesData, deps)
+ + dependencies->ndeps * (sizeof(int16) * 2);
+
+ bytea * output = (bytea*)palloc0(len);
+
+ char * tmp = VARDATA(output);
+
+ SET_VARSIZE(output, len);
+
+ /* first, store the number of dimensions / items */
+ memcpy(tmp, dependencies, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ /* walk through the dependencies and copy both columns into the bytea */
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ memcpy(tmp, &(dependencies->deps[i]->a), sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(tmp, &(dependencies->deps[i]->b), sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return output;
+}
+
+/*
+ * Reads serialized dependencies into MVDependencies structure.
+ */
+MVDependencies
+deserialize_mv_dependencies(bytea * data)
+{
+ int i;
+ Size expected_size;
+ MVDependencies dependencies;
+ char *tmp;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVDependenciesData,deps))
+ elog(ERROR, "invalid MVDependencies size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVDependenciesData,deps));
+
+ /* read the MVDependencies header */
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(dependencies, tmp, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ if (dependencies->magic != MVSTAT_DEPS_MAGIC)
+ {
+ pfree(dependencies);
+ elog(WARNING, "not a MV Dependencies (magic number mismatch)");
+ return NULL;
+ }
+
+ Assert(dependencies->ndeps > 0);
+
+ /* what bytea size do we expect for those parameters */
+ expected_size = offsetof(MVDependenciesData,deps) +
+ dependencies->ndeps * sizeof(int16) * 2;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid dependencies size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* allocate space for the dependency pointers */
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData,deps)
+ + (dependencies->ndeps * sizeof(MVDependency)));
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ dependencies->deps[i] = (MVDependency)palloc0(sizeof(MVDependencyData));
+
+ memcpy(&(dependencies->deps[i]->a), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+
+ memcpy(&(dependencies->deps[i]->b), tmp, sizeof(int16));
+ tmp += sizeof(int16);
+ }
+
+ return dependencies;
+}
+
+/* print some basic info about dependencies (number of dependencies) */
+Datum
+pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ result = palloc0(128);
+ snprintf(result, 128, "dependencies=%d", dependencies->ndeps);
+
+ /* FIXME free the deserialized data (pfree is not enough) */
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/* print the dependencies
+ *
+ * TODO Would be nice if this knew the actual column names (instead of
+ * the attnums).
+ *
+ * FIXME This is really ugly and does not really check the lengths and
+ * strcpy/snprintf return values properly. Needs to be fixed.
+ */
+Datum
+pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
+{
+ int i = 0;
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result = NULL;
+ int len = 0;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ MVDependency dependency = dependencies->deps[i];
+ char buffer[128];
+
+ int tmp = snprintf(buffer, 128, "%s%d => %d",
+ ((i == 0) ? "" : ", "), dependency->a, dependency->b);
+
+ if (tmp < 127)
+ {
+ if (result == NULL)
+ result = palloc0(len + tmp + 1);
+ else
+ result = repalloc(result, len + tmp + 1);
+
+ strcpy(result + len, buffer);
+ len += tmp;
+ }
+ }
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index fd8dc91..8ce9c0e 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2104,6 +2104,50 @@ describeOneTableDetails(const char *schemaname,
PQclear(result);
}
+ /* print any multivariate statistics */
+ if (pset.sversion >= 90600)
+ {
+ printfPQExpBuffer(&buf,
+ "SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
+ " deps_enabled,\n"
+ " deps_built,\n"
+ " (SELECT string_agg(attname::text,', ')\n"
+ " FROM ((SELECT unnest(stakeys) AS attnum) s\n"
+ " JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
+ "FROM pg_mv_statistic stat WHERE starelid = '%s' ORDER BY 1;",
+ oid);
+
+ result = PSQLexec(buf.data);
+ if (!result)
+ goto error_return;
+ else
+ tuples = PQntuples(result);
+
+ if (tuples > 0)
+ {
+ printTableAddFooter(&cont, _("Statistics:"));
+ for (i = 0; i < tuples; i++)
+ {
+ printfPQExpBuffer(&buf, " ");
+
+ /* statistics name (qualified with namespace) */
+ appendPQExpBuffer(&buf, "\"%s.%s\" ",
+ PQgetvalue(result, i, 1),
+ PQgetvalue(result, i, 2));
+
+ /* options */
+ if (!strcmp(PQgetvalue(result, i, 4), "t"))
+ appendPQExpBuffer(&buf, "(dependencies)");
+
+ appendPQExpBuffer(&buf, " ON (%s)",
+ PQgetvalue(result, i, 6));
+
+ printTableAddFooter(&cont, buf.data);
+ }
+ }
+ PQclear(result);
+ }
+
/* print rules */
if (tableinfo.hasrules && tableinfo.relkind != 'm')
{
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 049bf9f..12211fe 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -153,10 +153,11 @@ typedef enum ObjectClass
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
OCLASS_POLICY, /* pg_policy */
- OCLASS_TRANSFORM /* pg_transform */
+ OCLASS_TRANSFORM, /* pg_transform */
+ OCLASS_STATISTICS /* pg_mv_statistics */
} ObjectClass;
-#define LAST_OCLASS OCLASS_TRANSFORM
+#define LAST_OCLASS OCLASS_STATISTICS
/* in dependency.c */
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index b80d8d8..5ae42f7 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -119,6 +119,7 @@ extern void RemoveAttrDefault(Oid relid, AttrNumber attnum,
DropBehavior behavior, bool complain, bool internal);
extern void RemoveAttrDefaultById(Oid attrdefId);
extern void RemoveStatistics(Oid relid, AttrNumber attnum);
+extern void RemoveMVStatistics(Oid relid, AttrNumber attnum);
extern Form_pg_attribute SystemAttributeDefinition(AttrNumber attno,
bool relhasoids);
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index ab2c1a8..a768bb5 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -173,6 +173,13 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
#define LargeObjectMetadataOidIndexId 2996
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_oid_index, 3380, on pg_mv_statistic using btree(oid oid_ops));
+#define MvStatisticOidIndexId 3380
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_name_index, 3997, on pg_mv_statistic using btree(staname name_ops, stanamespace oid_ops));
+#define MvStatisticNameIndexId 3997
+DECLARE_INDEX(pg_mv_statistic_relid_index, 3379, on pg_mv_statistic using btree(starelid oid_ops));
+#define MvStatisticRelidIndexId 3379
+
DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
#define NamespaceNameIndexId 2684
DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index 2ccb3a7..44cf9c6 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -137,6 +137,8 @@ extern Oid get_collation_oid(List *collname, bool missing_ok);
extern Oid get_conversion_oid(List *conname, bool missing_ok);
extern Oid FindDefaultConversionProc(int32 for_encoding, int32 to_encoding);
+extern Oid get_statistics_oid(List *names, bool missing_ok);
+
/* initialization & transaction cleanup code */
extern void InitializeSearchPath(void);
extern void AtEOXact_Namespace(bool isCommit, bool parallel);
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
new file mode 100644
index 0000000..c74af47
--- /dev/null
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_mv_statistic.h
+ * definition of the system "multivariate statistic" relation (pg_mv_statistic)
+ * along with the relation's initial contents.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_mv_statistic.h
+ *
+ * NOTES
+ * the genbki.pl script reads this file and generates .bki
+ * information from the DATA() statements.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_MV_STATISTIC_H
+#define PG_MV_STATISTIC_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_mv_statistic definition. cpp turns this into
+ * typedef struct FormData_pg_mv_statistic
+ * ----------------
+ */
+#define MvStatisticRelationId 3381
+
+CATALOG(pg_mv_statistic,3381)
+{
+ /* These fields form the unique key for the entry: */
+ Oid starelid; /* relation containing attributes */
+ NameData staname; /* statistics name */
+ Oid stanamespace; /* OID of namespace containing this statistics */
+ Oid staowner; /* statistics owner */
+
+ /* statistics requested to build */
+ bool deps_enabled; /* analyze dependencies? */
+
+ /* statistics that are available (if requested) */
+ bool deps_built; /* dependencies were built */
+
+ /* variable-length fields start here, but we allow direct access to stakeys */
+ int2vector stakeys; /* array of column keys */
+
+#ifdef CATALOG_VARLEN
+ bytea stadeps; /* dependencies (serialized) */
+#endif
+
+} FormData_pg_mv_statistic;
+
+/* ----------------
+ * Form_pg_mv_statistic corresponds to a pointer to a tuple with
+ * the format of pg_mv_statistic relation.
+ * ----------------
+ */
+typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
+
+/* ----------------
+ * compiler constants for pg_mv_statistic
+ * ----------------
+ */
+#define Natts_pg_mv_statistic 8
+#define Anum_pg_mv_statistic_starelid 1
+#define Anum_pg_mv_statistic_staname 2
+#define Anum_pg_mv_statistic_stanamespace 3
+#define Anum_pg_mv_statistic_staowner 4
+#define Anum_pg_mv_statistic_deps_enabled 5
+#define Anum_pg_mv_statistic_deps_built 6
+#define Anum_pg_mv_statistic_stakeys 7
+#define Anum_pg_mv_statistic_stadeps 8
+
+#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 5c71bce..ff2d797 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2666,6 +2666,11 @@ DESCR("current user privilege on any column by rel name");
DATA(insert OID = 3029 ( has_any_column_privilege PGNSP PGUID 12 10 0 0 0 f f f f t f s s 2 0 16 "26 25" _null_ _null_ _null_ _null_ _null_ has_any_column_privilege_id _null_ _null_ _null_ ));
DESCR("current user privilege on any column by rel oid");
+DATA(insert OID = 3998 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies info");
+DATA(insert OID = 3999 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies show");
+
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
DATA(insert OID = 1929 ( pg_stat_get_tuples_returned PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_tuples_returned _null_ _null_ _null_ ));
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index b7a38ce..a52096b 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -49,6 +49,7 @@ extern void BootstrapToastTable(char *relName,
DECLARE_TOAST(pg_attrdef, 2830, 2831);
DECLARE_TOAST(pg_constraint, 2832, 2833);
DECLARE_TOAST(pg_description, 2834, 2835);
+DECLARE_TOAST(pg_mv_statistic, 3577, 3578);
DECLARE_TOAST(pg_proc, 2836, 2837);
DECLARE_TOAST(pg_rewrite, 2838, 2839);
DECLARE_TOAST(pg_seclabel, 3598, 3599);
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 54f67e9..99a6a62 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -75,6 +75,10 @@ extern ObjectAddress DefineOperator(List *names, List *parameters);
extern void RemoveOperatorById(Oid operOid);
extern ObjectAddress AlterOperator(AlterOperatorStmt *stmt);
+/* commands/statscmds.c */
+extern ObjectAddress CreateStatistics(CreateStatsStmt *stmt);
+extern void RemoveStatisticsById(Oid statsOid);
+
/* commands/aggregatecmds.c */
extern ObjectAddress DefineAggregate(List *name, List *args, bool oldstyle,
List *parameters, const char *queryString);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fad9988..545b62a 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -266,6 +266,7 @@ typedef enum NodeTag
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
+ T_MVStatisticInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
@@ -401,6 +402,7 @@ typedef enum NodeTag
T_CreatePolicyStmt,
T_AlterPolicyStmt,
T_CreateTransformStmt,
+ T_CreateStatsStmt,
/*
* TAGS FOR PARSE TREE NODES (parsenodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2fd0629..e1807fb 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -601,6 +601,17 @@ typedef struct ColumnDef
int location; /* parse location, or -1 if none/unknown */
} ColumnDef;
+typedef struct CreateStatsStmt
+{
+ NodeTag type;
+ List *defnames; /* qualified name (list of Value strings) */
+ RangeVar *relation; /* relation to build statistics on */
+ List *keys; /* String nodes naming referenced column(s) */
+ List *options; /* list of DefElem nodes */
+ bool if_not_exists; /* just do nothing if statistics already exists? */
+} CreateStatsStmt;
+
+
/*
* TableLikeClause - CREATE TABLE ( ... LIKE ... ) clause
*/
@@ -1410,6 +1421,7 @@ typedef enum ObjectType
OBJECT_RULE,
OBJECT_SCHEMA,
OBJECT_SEQUENCE,
+ OBJECT_STATISTICS,
OBJECT_TABCONSTRAINT,
OBJECT_TABLE,
OBJECT_TABLESPACE,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 641728b..e10dcf1 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -539,6 +539,7 @@ typedef struct RelOptInfo
List *lateral_vars; /* LATERAL Vars and PHVs referenced by rel */
Relids lateral_referencers; /* rels that reference me laterally */
List *indexlist; /* list of IndexOptInfo */
+ List *mvstatlist; /* list of MVStatisticInfo */
BlockNumber pages; /* size estimates derived from pg_class */
double tuples;
double allvisfrac;
@@ -634,6 +635,33 @@ typedef struct IndexOptInfo
void (*amcostestimate) (); /* AM's cost estimator */
} IndexOptInfo;
+/*
+ * MVStatisticInfo
+ * Information about multivariate stats for planning/optimization
+ *
+ * This contains information about which columns are covered by the
+ * statistics (stakeys), which options were requested while adding the
+ * statistics (*_enabled), and which kinds of statistics were actually
+ * built and are available for the optimizer (*_built).
+ */
+typedef struct MVStatisticInfo
+{
+ NodeTag type;
+
+ Oid mvoid; /* OID of the statistics row */
+ RelOptInfo *rel; /* back-link to index's table */
+
+ /* enabled statistics */
+ bool deps_enabled; /* functional dependencies enabled */
+
+ /* built/available statistics */
+ bool deps_built; /* functional dependencies built */
+
+ /* columns in the statistics (attnums) */
+ int2vector *stakeys; /* attnums of the columns covered */
+
+} MVStatisticInfo;
+
/*
* EquivalenceClasses
diff --git a/src/include/utils/acl.h b/src/include/utils/acl.h
index 4e15a14..3e11253 100644
--- a/src/include/utils/acl.h
+++ b/src/include/utils/acl.h
@@ -330,6 +330,7 @@ extern bool pg_foreign_data_wrapper_ownercheck(Oid srv_oid, Oid roleid);
extern bool pg_foreign_server_ownercheck(Oid srv_oid, Oid roleid);
extern bool pg_event_trigger_ownercheck(Oid et_oid, Oid roleid);
extern bool pg_extension_ownercheck(Oid ext_oid, Oid roleid);
+extern bool pg_statistics_ownercheck(Oid stat_oid, Oid roleid);
extern bool has_createrole_privilege(Oid roleid);
extern bool has_bypassrls_privilege(Oid roleid);
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
new file mode 100644
index 0000000..7ebd961
--- /dev/null
+++ b/src/include/utils/mvstats.h
@@ -0,0 +1,70 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvstats.h
+ * Multivariate statistics and selectivity estimation functions.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/mvstats.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MVSTATS_H
+#define MVSTATS_H
+
+#include "fmgr.h"
+#include "commands/vacuum.h"
+
+
+#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
+
+/* An associative rule, tracking [a => b] dependency.
+ *
+ * TODO Make this work with multiple columns on both sides.
+ */
+typedef struct MVDependencyData {
+ int16 a;
+ int16 b;
+} MVDependencyData;
+
+typedef MVDependencyData* MVDependency;
+
+typedef struct MVDependenciesData {
+ uint32 magic; /* magic constant marker */
+ int32 ndeps; /* number of dependencies */
+ MVDependency deps[1]; /* XXX why not a pointer? */
+} MVDependenciesData;
+
+typedef MVDependenciesData* MVDependencies;
+
+#define MVSTAT_DEPS_MAGIC 0xB4549A2C /* marks serialized bytea */
+#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
+
+/*
+ * TODO Maybe fetching the histogram/MCV list separately is inefficient?
+ * Consider adding a single `fetch_stats` method, fetching all
+ * stats specified using flags (or something like that).
+ */
+
+bytea * serialize_mv_dependencies(MVDependencies dependencies);
+
+/* deserialization of stats (serialization is private to analyze) */
+MVDependencies deserialize_mv_dependencies(bytea * data);
+
+/* FIXME this probably belongs somewhere else (not to operations stats) */
+extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats);
+
+void update_mv_stats(Oid relid, MVDependencies dependencies, int2vector *attrs);
+
+#endif
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index f2bebf2..8771f9c 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -61,6 +61,7 @@ typedef struct RelationData
bool rd_isvalid; /* relcache entry is valid */
char rd_indexvalid; /* state of rd_indexlist: 0 = not valid, 1 =
* valid, 2 = temporarily forced */
+ bool rd_mvstatvalid; /* state of rd_mvstatlist: true/false */
/*
* rd_createSubid is the ID of the highest subtransaction the rel has
@@ -93,6 +94,9 @@ typedef struct RelationData
List *rd_indexlist; /* list of OIDs of indexes on relation */
Oid rd_oidindex; /* OID of unique index on OID, if any */
Oid rd_replidindex; /* OID of replica identity index, if any */
+
+ /* data managed by RelationGetMVStatList: */
+ List *rd_mvstatlist; /* list of OIDs of multivariate stats */
/* data managed by RelationGetIndexAttrBitmap: */
Bitmapset *rd_indexattr; /* identifies columns used in indexes */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 1b48304..9f03c8d 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -38,6 +38,7 @@ extern void RelationClose(Relation relation);
* Routines to compute/retrieve additional cached information
*/
extern List *RelationGetIndexList(Relation relation);
+extern List *RelationGetMVStatList(Relation relation);
extern Oid RelationGetOidIndex(Relation relation);
extern Oid RelationGetReplicaIndex(Relation relation);
extern List *RelationGetIndexExpressions(Relation relation);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 256615b..0e0658d 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -66,6 +66,8 @@ enum SysCacheIdentifier
INDEXRELID,
LANGNAME,
LANGOID,
+ MVSTATNAMENSP,
+ MVSTATOID,
NAMESPACENAME,
NAMESPACEOID,
OPERNAMENSP,
diff --git a/src/test/regress/expected/object_address.out b/src/test/regress/expected/object_address.out
index 75751be..eb60960 100644
--- a/src/test/regress/expected/object_address.out
+++ b/src/test/regress/expected/object_address.out
@@ -35,6 +35,7 @@ ALTER DEFAULT PRIVILEGES FOR ROLE regtest_addr_user REVOKE DELETE ON TABLES FROM
CREATE TRANSFORM FOR int LANGUAGE SQL (
FROM SQL WITH FUNCTION varchar_transform(internal),
TO SQL WITH FUNCTION int4recv(internal));
+CREATE STATISTICS addr_nsp.gentable_stat ON addr_nsp.gentable(a,b) WITH (dependencies);
-- test some error cases
SELECT pg_get_object_address('stone', '{}', '{}');
ERROR: unrecognized object type "stone"
@@ -373,7 +374,8 @@ WITH objects (type, name, args) AS (VALUES
-- extension
-- event trigger
('policy', '{addr_nsp, gentable, genpol}', '{}'),
- ('transform', '{int}', '{sql}')
+ ('transform', '{int}', '{sql}'),
+ ('statistics', '{addr_nsp, gentable_stat}', '{}')
)
SELECT (pg_identify_object(addr1.classid, addr1.objid, addr1.subobjid)).*,
-- test roundtrip through pg_identify_object_as_address
@@ -420,13 +422,14 @@ SELECT (pg_identify_object(addr1.classid, addr1.objid, addr1.subobjid)).*,
trigger | | | t on addr_nsp.gentable | t
operator family | pg_catalog | integer_ops | pg_catalog.integer_ops USING btree | t
policy | | | genpol on addr_nsp.gentable | t
+ statistics | addr_nsp | gentable_stat | addr_nsp.gentable_stat | t
collation | pg_catalog | "default" | pg_catalog."default" | t
transform | | | for integer on language sql | t
text search dictionary | addr_nsp | addr_ts_dict | addr_nsp.addr_ts_dict | t
text search parser | addr_nsp | addr_ts_prs | addr_nsp.addr_ts_prs | t
text search configuration | addr_nsp | addr_ts_conf | addr_nsp.addr_ts_conf | t
text search template | addr_nsp | addr_ts_temp | addr_nsp.addr_ts_temp | t
-(41 rows)
+(42 rows)
---
--- Cleanup resources
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 22ea06c..06f2231 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1368,6 +1368,15 @@ pg_matviews| SELECT n.nspname AS schemaname,
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
WHERE (c.relkind = 'm'::"char");
+pg_mv_stats| SELECT n.nspname AS schemaname,
+ c.relname AS tablename,
+ s.staname,
+ s.stakeys AS attnums,
+ length(s.stadeps) AS depsbytes,
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ FROM ((pg_mv_statistic s
+ JOIN pg_class c ON ((c.oid = s.starelid)))
+ LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
pg_policies| SELECT n.nspname AS schemaname,
c.relname AS tablename,
pol.polname AS policyname,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index eb0bc88..92a0d8a 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -113,6 +113,7 @@ pg_inherits|t
pg_language|t
pg_largeobject|t
pg_largeobject_metadata|t
+pg_mv_statistic|t
pg_namespace|t
pg_opclass|t
pg_operator|t
diff --git a/src/test/regress/sql/object_address.sql b/src/test/regress/sql/object_address.sql
index 68e7cb0..3775b28 100644
--- a/src/test/regress/sql/object_address.sql
+++ b/src/test/regress/sql/object_address.sql
@@ -39,6 +39,7 @@ ALTER DEFAULT PRIVILEGES FOR ROLE regtest_addr_user REVOKE DELETE ON TABLES FROM
CREATE TRANSFORM FOR int LANGUAGE SQL (
FROM SQL WITH FUNCTION varchar_transform(internal),
TO SQL WITH FUNCTION int4recv(internal));
+CREATE STATISTICS addr_nsp.gentable_stat ON addr_nsp.gentable(a,b) WITH (dependencies);
-- test some error cases
SELECT pg_get_object_address('stone', '{}', '{}');
@@ -166,7 +167,8 @@ WITH objects (type, name, args) AS (VALUES
-- extension
-- event trigger
('policy', '{addr_nsp, gentable, genpol}', '{}'),
- ('transform', '{int}', '{sql}')
+ ('transform', '{int}', '{sql}'),
+ ('statistics', '{addr_nsp, gentable_stat}', '{}')
)
SELECT (pg_identify_object(addr1.classid, addr1.objid, addr1.subobjid)).*,
-- test roundtrip through pg_identify_object_as_address
--
2.5.0
Attachment: 0009-fixup-of-regression-tests-plans-changes-by-group-by-.patch (text/x-patch)
From 771376751dfba0f469f6830c2a9eb545d1e25235 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Sun, 28 Feb 2016 21:16:40 +0100
Subject: [PATCH 9/9] fixup of regression tests (plans changes by group by
estimation)
---
src/test/regress/expected/join.out | 18 ++++++++++--------
src/test/regress/expected/subselect.out | 25 +++++++++++--------------
src/test/regress/expected/union.out | 16 ++++++++--------
3 files changed, 29 insertions(+), 30 deletions(-)
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index cafbc5e..151402d 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3965,18 +3965,20 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
explain (costs off)
select d.* from d left join (select distinct * from b) s
on d.a = s.id;
- QUERY PLAN
---------------------------------------
+ QUERY PLAN
+---------------------------------------------
Merge Right Join
- Merge Cond: (b.id = d.a)
- -> Unique
- -> Sort
- Sort Key: b.id, b.c_id
- -> Seq Scan on b
+ Merge Cond: (s.id = d.a)
+ -> Sort
+ Sort Key: s.id
+ -> Subquery Scan on s
+ -> HashAggregate
+ Group Key: b.id, b.c_id
+ -> Seq Scan on b
-> Sort
Sort Key: d.a
-> Seq Scan on d
-(9 rows)
+(11 rows)
-- check join removal works when uniqueness of the join condition is enforced
-- by a UNION
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index de64ca7..0fc93d9 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -807,27 +807,24 @@ select * from int4_tbl where
explain (verbose, costs off)
select * from int4_tbl o where (f1, f1) in
(select f1, generate_series(1,2) / 10 g from int4_tbl i group by f1);
- QUERY PLAN
-----------------------------------------------------------------------
- Hash Join
+ QUERY PLAN
+----------------------------------------------------------------
+ Hash Semi Join
Output: o.f1
Hash Cond: (o.f1 = "ANY_subquery".f1)
-> Seq Scan on public.int4_tbl o
Output: o.f1
-> Hash
Output: "ANY_subquery".f1, "ANY_subquery".g
- -> HashAggregate
+ -> Subquery Scan on "ANY_subquery"
Output: "ANY_subquery".f1, "ANY_subquery".g
- Group Key: "ANY_subquery".f1, "ANY_subquery".g
- -> Subquery Scan on "ANY_subquery"
- Output: "ANY_subquery".f1, "ANY_subquery".g
- Filter: ("ANY_subquery".f1 = "ANY_subquery".g)
- -> HashAggregate
- Output: i.f1, (generate_series(1, 2) / 10)
- Group Key: i.f1
- -> Seq Scan on public.int4_tbl i
- Output: i.f1
-(18 rows)
+ Filter: ("ANY_subquery".f1 = "ANY_subquery".g)
+ -> HashAggregate
+ Output: i.f1, (generate_series(1, 2) / 10)
+ Group Key: i.f1
+ -> Seq Scan on public.int4_tbl i
+ Output: i.f1
+(15 rows)
select * from int4_tbl o where (f1, f1) in
(select f1, generate_series(1,2) / 10 g from int4_tbl i group by f1);
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 016571b..f2e297e 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -263,16 +263,16 @@ ORDER BY 1;
SELECT q2 FROM int8_tbl INTERSECT SELECT q1 FROM int8_tbl;
q2
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
SELECT q2 FROM int8_tbl INTERSECT ALL SELECT q1 FROM int8_tbl;
q2
------------------
+ 123
4567890123456789
4567890123456789
- 123
(3 rows)
SELECT q2 FROM int8_tbl EXCEPT SELECT q1 FROM int8_tbl ORDER BY 1;
@@ -305,16 +305,16 @@ SELECT q1 FROM int8_tbl EXCEPT SELECT q2 FROM int8_tbl;
SELECT q1 FROM int8_tbl EXCEPT ALL SELECT q2 FROM int8_tbl;
q1
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
SELECT q1 FROM int8_tbl EXCEPT ALL SELECT DISTINCT q2 FROM int8_tbl;
q1
------------------
+ 123
4567890123456789
4567890123456789
- 123
(3 rows)
SELECT q1 FROM int8_tbl EXCEPT ALL SELECT q1 FROM int8_tbl FOR NO KEY UPDATE;
@@ -343,8 +343,8 @@ SELECT f1 FROM float8_tbl EXCEPT SELECT f1 FROM int4_tbl ORDER BY 1;
SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl;
q1
-------------------
- 4567890123456789
123
+ 4567890123456789
456
4567890123456789
123
@@ -355,15 +355,15 @@ SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FR
SELECT q1 FROM int8_tbl INTERSECT (((SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl)));
q1
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
(((SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl))) UNION ALL SELECT q2 FROM int8_tbl;
q1
-------------------
- 4567890123456789
123
+ 4567890123456789
456
4567890123456789
123
@@ -419,8 +419,8 @@ HINT: There is a column named "q2" in table "*SELECT* 2", but it cannot be refe
SELECT q1 FROM int8_tbl EXCEPT (((SELECT q2 FROM int8_tbl ORDER BY q2 LIMIT 1)));
q1
------------------
- 4567890123456789
123
+ 4567890123456789
(2 rows)
--
--
2.5.0
Attachment: 0001-teach-pull_-varno-varattno-_walker-about-RestrictInf.patch (text/x-patch)
From e073a4d8368a0b0c66cd2933e6a7a210c1b5c53f Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Tue, 28 Apr 2015 19:56:33 +0200
Subject: [PATCH 1/9] teach pull_(varno|varattno)_walker about RestrictInfo
otherwise pull_varnos fails when processing OR clauses
---
src/backend/optimizer/util/var.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index 292e1f4..9228a46 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -196,6 +196,13 @@ pull_varnos_walker(Node *node, pull_varnos_context *context)
context->sublevels_up--;
return result;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo*)node;
+ context->varnos = bms_add_members(context->varnos,
+ rinfo->clause_relids);
+ return false;
+ }
return expression_tree_walker(node, pull_varnos_walker,
(void *) context);
}
@@ -244,6 +251,15 @@ pull_varattnos_walker(Node *node, pull_varattnos_context *context)
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *)node;
+
+ return expression_tree_walker((Node*)rinfo->clause,
+ pull_varattnos_walker,
+ (void*) context);
+ }
+
/* Should not find an unplanned subquery */
Assert(!IsA(node, Query));
--
2.5.0
Instead of simply multiplying the ndistinct estimate with selectivity,
we instead use the formula for the expected number of distinct values
observed in 'k' rows when there are 'd' distinct values in the table:

    d * (1 - ((d - 1) / d)^k)
This is 'with replacements' which seems appropriate for the use, and it
mostly assumes uniform distribution of the distinct values. So if the
distribution is not uniform (e.g. there are very frequent groups) this
may be less accurate than the current algorithm in some cases, giving
over-estimates. But that's probably better than OOM.
---
src/backend/utils/adt/selfuncs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index f8d39aa..6eceedf 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3466,7 +3466,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	/*
 	 * Multiply by restriction selectivity.
 	 */
-	reldistinct *= rel->rows / rel->tuples;
+	reldistinct = reldistinct * (1 - powl((reldistinct - 1) / reldistinct, rel->rows));
Why did you change the "*=" style? I see no reason to change this.
reldistinct *= 1 - powl((reldistinct - 1) / reldistinct, rel->rows);
Looks better to me because it's shorter and cleaner.
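To illustrate the formula being discussed: each of the 'd' distinct
values is missed by one random row with probability (d-1)/d, so it is
missed by all 'k' rows (drawn with replacement) with probability
((d-1)/d)^k, which gives d * (1 - ((d-1)/d)^k) expected observed
values. A minimal standalone sketch of the computation (a hypothetical
helper for illustration, not code from the patch):

#include <math.h>
#include <stdio.h>

/*
 * Expected number of distinct values observed when sampling k rows
 * (with replacement) out of a population with d distinct values:
 * d * (1 - ((d - 1) / d)^k).
 */
static double
expected_distinct(double d, double k)
{
	return d * (1.0 - powl((d - 1.0) / d, k));
}

int
main(void)
{
	/* e.g. d = 1000 distinct values, k = 100 rows -> ~95.2 observed */
	printf("%f\n", expected_distinct(1000.0, 100.0));
	return 0;
}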
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
I apologize if this has already been discussed. I am new to this patch.
Attached is v15 of the patch series, fixing this and also doing quite a
few additional improvements:

* added some basic examples into the SGML documentation
* addressing the objectaddress omissions, as pointed out by Alvaro
* support for ALTER STATISTICS ... OWNER TO / RENAME / SET SCHEMA
* significant refactoring of MCV and histogram code, particularly
  serialization, deserialization and building
* reworking the functional dependencies to support more complex
  dependencies, with multiple columns as 'conditions'
* the reduction using functional dependencies is also significantly
  simplified (I decided to get rid of computing the transitive closure
  for now - it got too complex after the multi-condition dependencies,
  so I'll leave that for the future)
Are there any other missing parts in this work? I am asking because
I wonder whether you want to push this into 9.6 or rather 9.7.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Hello, I returned to this.
At Sun, 13 Mar 2016 22:59:38 +0100, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <1457906378.27231.10.camel@2ndquadrant.com>
Oh, yeah. There was an extra pfree().
Attached is v15 of the patch series, fixing this and also doing quite a
few additional improvements:

* added some basic examples into the SGML documentation
* addressing the objectaddress omissions, as pointed out by Alvaro
* support for ALTER STATISTICS ... OWNER TO / RENAME / SET SCHEMA
* significant refactoring of MCV and histogram code, particularly
  serialization, deserialization and building
* reworking the functional dependencies to support more complex
  dependencies, with multiple columns as 'conditions'
* the reduction using functional dependencies is also significantly
  simplified (I decided to get rid of computing the transitive closure
  for now - it got too complex after the multi-condition dependencies,
  so I'll leave that for the future)
Many trailing white spaces found.
0002
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
2014 should be 2016?
This patch defines a "magic" field for many structs, but in
PostgreSQL magic numbers seem to be used to identify files or buffer
pages. They wouldn't be needed unless you intend to dig out or
identify orphaned memory blocks of mvstats.
+ MVDependency deps[1]; /* XXX why not a pointer? */
MVDependency seems to be a pointer type.
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ ereport(ERROR,
and
+ Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));
seem to be contradicting.
.. Sorry, time is up..
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 03/16/2016 09:31 AM, Kyotaro HORIGUCHI wrote:
Hello, I returned to this.
At Sun, 13 Mar 2016 22:59:38 +0100, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <1457906378.27231.10.camel@2ndquadrant.com>
Oh, yeah. There was an extra pfree().
Attached is v15 of the patch series, fixing this and also doing quite a
few additional improvements:

* added some basic examples into the SGML documentation
* addressing the objectaddress omissions, as pointed out by Alvaro
* support for ALTER STATISTICS ... OWNER TO / RENAME / SET SCHEMA
* significant refactoring of MCV and histogram code, particularly
  serialization, deserialization and building
* reworking the functional dependencies to support more complex
  dependencies, with multiple columns as 'conditions'
* the reduction using functional dependencies is also significantly
  simplified (I decided to get rid of computing the transitive closure
  for now - it got too complex after the multi-condition dependencies,
  so I'll leave that for the future)

Many trailing white spaces found.
Sorry, haven't noticed that after one of the rebases. Fixed in the
attached v15 of the patch.
0002
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
2014 should be 2016?
Yes, the copyright info will need some tweaks. There are a few other files
with 2015, and I think the start should be the current year (and not 1996).
This patch defines a "magic" field for many structs, but in
PostgreSQL magic numbers seem to be used to identify files or buffer
pages. They wouldn't be needed unless you intend to dig out or
identify orphaned memory blocks of mvstats.

+ MVDependency deps[1]; /* XXX why not a pointer? */
MVDependency seems to be a pointer type.
Right, but we need an array of the structures here, so one way is to use
a pointer and the other is a variable-length field. I will remove
the comment; I think the structure is fine as is.
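To illustrate the trade-off discussed above (a sketch with illustrative
names, not the patch's actual definitions): the variable-length trailing
array lets the header and the array live in a single allocation, whereas
a plain pointer member would point to a separately allocated array:

#include <stddef.h>
#include <stdlib.h>

typedef struct MVDependencyData
{
	int		nattributes;	/* number of attributes in the dependency */
} MVDependencyData;

typedef MVDependencyData *MVDependency;

typedef struct MVDependenciesData
{
	int			 ndeps;		/* number of dependencies */
	MVDependency deps[1];	/* variable-length trailing array */
} MVDependenciesData;

static MVDependenciesData *
alloc_dependencies(int ndeps)
{
	/* a single allocation covers both the header and the array */
	MVDependenciesData *result =
		calloc(1, offsetof(MVDependenciesData, deps) +
				  ndeps * sizeof(MVDependency));

	if (result != NULL)
		result->ndeps = ndeps;

	return result;
}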
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ 	ereport(ERROR,

and

+ Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));

seem to be contradicting.
Nope, because the first check is in a loop where 'numcols' is used as an
index into an array with MVSTATS_MAX_DIMENSIONS elements.
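Condensed into a standalone sketch (illustrative code, not the patch
itself), the two checks are consistent because the ereport fires while
numcols still counts the columns collected so far, so the array index
can never exceed the last valid slot:

#include <assert.h>

#define MVSTATS_MAX_DIMENSIONS 8	/* illustrative value */

static int
collect_columns(const int *requested, int nrequested,
				short attnums[MVSTATS_MAX_DIMENSIONS])
{
	int		numcols = 0;
	int		i;

	for (i = 0; i < nrequested; i++)
	{
		/* rejects the (MAX+1)-th column before writing past the array */
		if (numcols >= MVSTATS_MAX_DIMENSIONS)
			return -1;			/* stands in for ereport(ERROR, ...) */
		attnums[numcols++] = (short) requested[i];
	}

	/* so any successfully built list satisfies the later Assert */
	assert(numcols <= MVSTATS_MAX_DIMENSIONS);
	return numcols;
}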
.. Sorry, time is up..
Thanks for the comments!
Attached is v15 of the patch, which also fixes one mistake - after
reworking the functional dependencies to support multiple columns on the
left side (as conditions), I failed to move it to the proper place in
the patch series. So 0002 built the dependencies in the old way and 0003
changed it to the new one. That was pointless and added another 20kB to
the patch, so v15 moves the new code to 0002.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-teach-pull_-varno-varattno-_walker-about-RestrictInf.patch (text/x-patch)
From 5de240e541a0893ed945b16ec1fe23522c00ae61 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Tue, 28 Apr 2015 19:56:33 +0200
Subject: [PATCH 1/9] teach pull_(varno|varattno)_walker about RestrictInfo
otherwise pull_varnos fails when processing OR clauses
---
src/backend/optimizer/util/var.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index 292e1f4..9228a46 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -196,6 +196,13 @@ pull_varnos_walker(Node *node, pull_varnos_context *context)
context->sublevels_up--;
return result;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo*)node;
+ context->varnos = bms_add_members(context->varnos,
+ rinfo->clause_relids);
+ return false;
+ }
return expression_tree_walker(node, pull_varnos_walker,
(void *) context);
}
@@ -244,6 +251,15 @@ pull_varattnos_walker(Node *node, pull_varattnos_context *context)
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *)node;
+
+ return expression_tree_walker((Node*)rinfo->clause,
+ pull_varattnos_walker,
+ (void*) context);
+ }
+
/* Should not find an unplanned subquery */
Assert(!IsA(node, Query));
--
2.5.0
0002-shared-infrastructure-and-functional-dependencies.patch (text/x-patch)
From 6cf5d3b456bb294bb033b9e1e2eb545cfd4c1739 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 19:51:48 +0100
Subject: [PATCH 2/9] shared infrastructure and functional dependencies
Basic infrastructure shared by all kinds of multivariate stats, most
importantly:
- adds a new system catalog (pg_mv_statistic)
- CREATE STATISTICS name ON table (columns) WITH (options)
- DROP STATISTICS name
- ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME
- implementation of functional dependencies (the simplest type of
multivariate statistics)
- building functional dependencies in ANALYZE
- updates existing regression tests (new catalog etc.)
- adds a new regression test for functional dependencies
This does not include any changes to the optimizer, i.e. it does not
influence the query planning (subject to follow-up patches).
The current implementation requires a valid 'ltopr' for the columns, so
that we can sort the sample rows in various ways, both in this patch
and other kinds of statistics. Maybe this restriction could be relaxed
in the future, requiring just 'eqopr' in case of stats not sorting the
data (e.g. functional dependencies and MCV lists).
Maybe some of the stats (functional dependencies and MCV list with
limited functionality) might be made to work with hashes of the values,
which is sufficient for equality comparisons. But the queries would
require the equality operator anyway, so it's not really a weaker
requirement. The hashes might reduce space requirements, though.
The algorithm detecting the dependencies is rather simple and probably
needs improvements, both to detect more complicated dependencies and
to validate the math.
The name 'functional dependencies' is more correct (than 'association
rules') as it's exactly the name used in relational theory (esp. Normal
Forms) for tracking column-level dependencies.
The multivariate statistics are automatically removed in two situations:
(a) after a DROP TABLE (obviously)
(b) after ALTER TABLE ... DROP COLUMN, if the statistics would be
left with fewer than 2 remaining columns
If at least two columns remain, we keep the
statistics but perform cleanup on the next ANALYZE. The dropped columns
are removed from stakeys, and the new statistics is built on the
smaller set.
We can't do this at DROP COLUMN, because that'd leave us with invalid
statistics, or we'd have to throw it away although we can still use it.
This lazy approach lets us use the statistics although some of the
columns are dead.
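A compact sketch of that drop decision (simplified from the
RemoveMVStatistics logic included below; illustrative types, not the
actual catalog-scanning code): a statistics entry survives the DROP
COLUMN only if at least two of its columns are still live:

#include <stdbool.h>

static bool
stats_still_useful(const short *stakeys, int nkeys,
				   short dropped_attnum, const bool *attisdropped)
{
	int		i;
	int		remaining = 0;

	for (i = 0; i < nkeys; i++)
	{
		/* ignore the column being dropped and previously dropped ones */
		if (stakeys[i] != dropped_attnum &&
			!attisdropped[stakeys[i] - 1])
			remaining++;
	}

	/* keep the statistics only if two or more columns remain */
	return (remaining >= 2);
}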
This also adds a simple list of statistics to \d in psql.
This means the statistics are created within a schema by using a
qualified name (or using the default schema)
CREATE STATISTICS schema.statistics ON ...
and then dropped by specifying qualified name
DROP STATISTICS schema.statistics
or searching through search_path (just like with other objects).
This also gets rid of the "(opt_)stats_name" definitions in gram.y and
instead replaces them with just "opt_any_name", although the optional
case is not really handled currently - there's no generated name yet
(so either we should drop it or implement it).
I'm not entirely sure making statistics schema-specific is such a great
idea. Maybe it should be "global", but that does not seem right (e.g.
it makes multi-tenant systems based on schemas more difficult to
manage, because tenants would interact).
---
doc/src/sgml/ref/allfiles.sgml | 3 +
doc/src/sgml/ref/alter_statistics.sgml | 115 +++++
doc/src/sgml/ref/create_statistics.sgml | 198 ++++++++
doc/src/sgml/ref/drop_statistics.sgml | 91 ++++
doc/src/sgml/reference.sgml | 2 +
src/backend/catalog/Makefile | 1 +
src/backend/catalog/aclchk.c | 27 +
src/backend/catalog/dependency.c | 11 +-
src/backend/catalog/heap.c | 102 ++++
src/backend/catalog/namespace.c | 51 ++
src/backend/catalog/objectaddress.c | 54 ++
src/backend/catalog/system_views.sql | 11 +
src/backend/commands/Makefile | 6 +-
src/backend/commands/alter.c | 3 +
src/backend/commands/analyze.c | 21 +
src/backend/commands/dropcmds.c | 4 +
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/statscmds.c | 277 +++++++++++
src/backend/nodes/copyfuncs.c | 17 +
src/backend/nodes/outfuncs.c | 18 +
src/backend/optimizer/util/plancat.c | 59 +++
src/backend/parser/gram.y | 60 ++-
src/backend/tcop/utility.c | 14 +
src/backend/utils/Makefile | 2 +-
src/backend/utils/cache/relcache.c | 59 +++
src/backend/utils/cache/syscache.c | 23 +
src/backend/utils/mvstats/Makefile | 17 +
src/backend/utils/mvstats/README.dependencies | 222 +++++++++
src/backend/utils/mvstats/common.c | 376 ++++++++++++++
src/backend/utils/mvstats/common.h | 78 +++
src/backend/utils/mvstats/dependencies.c | 686 ++++++++++++++++++++++++++
src/bin/psql/describe.c | 44 ++
src/include/catalog/dependency.h | 5 +-
src/include/catalog/heap.h | 1 +
src/include/catalog/indexing.h | 7 +
src/include/catalog/namespace.h | 2 +
src/include/catalog/pg_mv_statistic.h | 75 +++
src/include/catalog/pg_proc.h | 5 +
src/include/catalog/toasting.h | 1 +
src/include/commands/defrem.h | 4 +
src/include/nodes/nodes.h | 2 +
src/include/nodes/parsenodes.h | 12 +
src/include/nodes/relation.h | 28 ++
src/include/utils/acl.h | 1 +
src/include/utils/mvstats.h | 71 +++
src/include/utils/rel.h | 4 +
src/include/utils/relcache.h | 1 +
src/include/utils/syscache.h | 2 +
src/test/regress/expected/mv_dependencies.out | 150 ++++++
src/test/regress/expected/object_address.out | 7 +-
src/test/regress/expected/rules.out | 9 +
src/test/regress/expected/sanity_check.out | 1 +
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_dependencies.sql | 142 ++++++
src/test/regress/sql/object_address.sql | 4 +-
55 files changed, 3179 insertions(+), 11 deletions(-)
create mode 100644 doc/src/sgml/ref/alter_statistics.sgml
create mode 100644 doc/src/sgml/ref/create_statistics.sgml
create mode 100644 doc/src/sgml/ref/drop_statistics.sgml
create mode 100644 src/backend/commands/statscmds.c
create mode 100644 src/backend/utils/mvstats/Makefile
create mode 100644 src/backend/utils/mvstats/README.dependencies
create mode 100644 src/backend/utils/mvstats/common.c
create mode 100644 src/backend/utils/mvstats/common.h
create mode 100644 src/backend/utils/mvstats/dependencies.c
create mode 100644 src/include/catalog/pg_mv_statistic.h
create mode 100644 src/include/utils/mvstats.h
create mode 100644 src/test/regress/expected/mv_dependencies.out
create mode 100644 src/test/regress/sql/mv_dependencies.sql
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index bf95453..524ed83 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -32,6 +32,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY alterServer SYSTEM "alter_server.sgml">
<!ENTITY alterSequence SYSTEM "alter_sequence.sgml">
<!ENTITY alterSystem SYSTEM "alter_system.sgml">
+<!ENTITY alterStatistics SYSTEM "alter_statistics.sgml">
<!ENTITY alterTable SYSTEM "alter_table.sgml">
<!ENTITY alterTableSpace SYSTEM "alter_tablespace.sgml">
<!ENTITY alterTSConfig SYSTEM "alter_tsconfig.sgml">
@@ -76,6 +77,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createSchema SYSTEM "create_schema.sgml">
<!ENTITY createSequence SYSTEM "create_sequence.sgml">
<!ENTITY createServer SYSTEM "create_server.sgml">
+<!ENTITY createStatistics SYSTEM "create_statistics.sgml">
<!ENTITY createTable SYSTEM "create_table.sgml">
<!ENTITY createTableAs SYSTEM "create_table_as.sgml">
<!ENTITY createTableSpace SYSTEM "create_tablespace.sgml">
@@ -119,6 +121,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropSchema SYSTEM "drop_schema.sgml">
<!ENTITY dropSequence SYSTEM "drop_sequence.sgml">
<!ENTITY dropServer SYSTEM "drop_server.sgml">
+<!ENTITY dropStatistics SYSTEM "drop_statistics.sgml">
<!ENTITY dropTable SYSTEM "drop_table.sgml">
<!ENTITY dropTableSpace SYSTEM "drop_tablespace.sgml">
<!ENTITY dropTransform SYSTEM "drop_transform.sgml">
diff --git a/doc/src/sgml/ref/alter_statistics.sgml b/doc/src/sgml/ref/alter_statistics.sgml
new file mode 100644
index 0000000..aa421c0
--- /dev/null
+++ b/doc/src/sgml/ref/alter_statistics.sgml
@@ -0,0 +1,115 @@
+<!--
+doc/src/sgml/ref/alter_statistics.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-ALTERSTATISTICS">
+ <indexterm zone="sql-alterstatistics">
+ <primary>ALTER STATISTICS</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>ALTER STATISTICS</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>ALTER STATISTICS</refname>
+ <refpurpose>
+ change the definition of a multivariate statistics
+ </refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+ALTER STATISTICS <replaceable class="parameter">name</replaceable> OWNER TO { <replaceable class="PARAMETER">new_owner</replaceable> | CURRENT_USER | SESSION_USER }
+ALTER STATISTICS <replaceable class="parameter">name</replaceable> RENAME TO <replaceable class="parameter">new_name</replaceable>
+ALTER STATISTICS <replaceable class="parameter">name</replaceable> SET SCHEMA <replaceable class="parameter">new_schema</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>ALTER STATISTICS</command> changes the parameters of an existing
+ multivariate statistics. Any parameters not specifically set in the
+ <command>ALTER STATISTICS</command> command retain their prior settings.
+ </para>
+
+ <para>
+ You must own the statistics to use <command>ALTER STATISTICS</>.
+ To change a statistics' schema, you must also have <literal>CREATE</>
+ privilege on the new schema.
+ To alter the owner, you must also be a direct or indirect member of the new
+ owning role, and that role must have <literal>CREATE</literal> privilege on
+ the statistics' schema. (These restrictions enforce that altering the owner
+ doesn't do anything you couldn't do by dropping and recreating the statistics.
+ However, a superuser can alter ownership of any statistics anyway.)
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of a statistics to be altered.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">new_owner</replaceable></term>
+ <listitem>
+ <para>
+ The user name of the new owner of the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">new_name</replaceable></term>
+ <listitem>
+ <para>
+ The new name for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">new_schema</replaceable></term>
+ <listitem>
+ <para>
+ The new schema for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There's no <command>ALTER STATISTICS</command> command in the SQL standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createstatistics"></member>
+ <member><xref linkend="sql-dropstatistics"></member>
+ </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
new file mode 100644
index 0000000..ff09fa5
--- /dev/null
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -0,0 +1,198 @@
+<!--
+doc/src/sgml/ref/create_statistics.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-CREATESTATISTICS">
+ <indexterm zone="sql-createstatistics">
+ <primary>CREATE STATISTICS</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>CREATE STATISTICS</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>CREATE STATISTICS</refname>
+ <refpurpose>define a new statistics</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_name</replaceable> ON <replaceable class="PARAMETER">table_name</replaceable> ( [
+ { <replaceable class="PARAMETER">column_name</replaceable> } ] [, ...])
+[ WITH ( <replaceable class="PARAMETER">statistics_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] )
+</synopsis>
+
+ </refsynopsisdiv>
+
+ <refsect1 id="SQL-CREATESTATISTICS-description">
+ <title>Description</title>
+
+ <para>
+ <command>CREATE STATISTICS</command> will create a new multivariate
+ statistics on the table. The statistics will be created in the
+ current database. The statistics will be owned by the user issuing
+ the command.
+ </para>
+
+ <para>
+ If a schema name is given (for example, <literal>CREATE STATISTICS
+ myschema.mystat ...</>) then the statistics is created in the specified
+ schema. Otherwise it is created in the current schema. The name of
+ the statistics must be distinct from the name of any other statistics in the
+ same schema.
+ </para>
+
+ <para>
+ To be able to create statistics, you must have <literal>USAGE</literal>
+ privilege on the types of all the referenced columns.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>IF NOT EXISTS</></term>
+ <listitem>
+ <para>
+ Do not throw an error if a statistics with the same name already exists.
+ A notice is issued in this case. Note that there is no guarantee that
+ the existing statistics is anything like the one that would have been
+ created.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">statistics_name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the statistics to be created.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">table_name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the table the statistics should
+ be created on.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">column_name</replaceable></term>
+ <listitem>
+ <para>
+ The name of a column to be included in the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>WITH ( <replaceable class="PARAMETER">statistics_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] )</literal></term>
+ <listitem>
+ <para>
+ ...
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <refsect2 id="SQL-CREATESTATISTICS-parameters">
+ <title id="SQL-CREATESTATISTICS-parameters-title">Statistics Parameters</title>
+
+ <indexterm zone="sql-createstatistics-parameters">
+ <primary>statistics parameters</primary>
+ </indexterm>
+
+ <para>
+ The <literal>WITH</> clause can specify <firstterm>statistics parameters</>
+ for statistics. The currently available parameters are listed below.
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>dependencies</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables functional dependencies for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </refsect2>
+ </refsect1>
+
+ <refsect1 id="SQL-CREATESTATISTICS-notes">
+ <title>Notes</title>
+
+ <para>
+ ...
+ </para>
+
+ </refsect1>
+
+
+ <refsect1 id="SQL-CREATESTATISTICS-examples">
+ <title>Examples</title>
+
+ <para>
+ Create table <structname>t1</> with two functionally dependent columns, i.e.
+ knowledge of a value in the first column is sufficient for determining the
+ value in the other column. Then functional dependencies are built on those
+ columns:
+
+<programlisting>
+CREATE TABLE t1 (
+ a int,
+ b int
+);
+
+INSERT INTO t1 SELECT i/100, i/500
+ FROM generate_series(1,1000000) s(i);
+
+CREATE STATISTICS s1 ON t1 (a, b) WITH (dependencies);
+
+ANALYZE t1;
+
+-- valid combination of values
+EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 1);
+
+-- invalid combination of values
+EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 2);
+</programlisting>
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There's no <command>CREATE STATISTICS</command> command in the SQL standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-alterstatistics"></member>
+ <member><xref linkend="sql-dropstatistics"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/ref/drop_statistics.sgml b/doc/src/sgml/ref/drop_statistics.sgml
new file mode 100644
index 0000000..dd9047a
--- /dev/null
+++ b/doc/src/sgml/ref/drop_statistics.sgml
@@ -0,0 +1,91 @@
+<!--
+doc/src/sgml/ref/drop_statistics.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-DROPSTATISTICS">
+ <indexterm zone="sql-dropstatistics">
+ <primary>DROP STATISTICS</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>DROP STATISTICS</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>DROP STATISTICS</refname>
+ <refpurpose>remove a statistics</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+DROP STATISTICS [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...]
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>DROP STATISTICS</command> removes statistics from the database.
+ Only the statistics owner, the schema owner, and superuser can drop a
+ statistics.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>IF EXISTS</literal></term>
+ <listitem>
+ <para>
+ Do not throw an error if the statistics does not exist. A notice is
+ issued in this case.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="PARAMETER">name</replaceable></term>
+ <listitem>
+ <para>
+ The name (optionally schema-qualified) of the statistics to drop.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </refsect1>
+
+ <refsect1>
+ <title>Examples</title>
+
+ <para>
+ ...
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There's no <command>DROP STATISTICS</command> command in the SQL standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-alterstatistics"></member>
+ <member><xref linkend="sql-createstatistics"></member>
+ </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 03020df..2b07b2d 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -104,6 +104,7 @@
&createSchema;
&createSequence;
&createServer;
+ &createStatistics;
&createTable;
&createTableAs;
&createTableSpace;
@@ -147,6 +148,7 @@
&dropSchema;
&dropSequence;
&dropServer;
+ &dropStatistics;
&dropTable;
&dropTableSpace;
&dropTSConfig;
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 25130ec..058b8a9 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -32,6 +32,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
+ pg_mv_statistic.h \
pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 0f3bc07..e21aacd 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -38,6 +38,7 @@
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
#include "catalog/pg_largeobject_metadata.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -5021,6 +5022,32 @@ pg_extension_ownercheck(Oid ext_oid, Oid roleid)
}
/*
+ * Ownership check for a multivariate statistics (specified by OID).
+ */
+bool
+pg_statistics_ownercheck(Oid stat_oid, Oid roleid)
+{
+ HeapTuple tuple;
+ Oid ownerId;
+
+ /* Superusers bypass all permission checking. */
+ if (superuser_arg(roleid))
+ return true;
+
+ tuple = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(stat_oid));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("statistics with OID %u does not exist", stat_oid)));
+
+ ownerId = ((Form_pg_mv_statistic) GETSTRUCT(tuple))->staowner;
+
+ ReleaseSysCache(tuple);
+
+ return has_privs_of_role(roleid, ownerId);
+}
+
+/*
* Check whether specified role has CREATEROLE privilege (or is a superuser)
*
* Note: roles do not have owners per se; instead we use this test in
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index c48e37b..8200454 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -40,6 +40,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -160,7 +161,8 @@ static const Oid object_classes[] = {
ExtensionRelationId, /* OCLASS_EXTENSION */
EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
PolicyRelationId, /* OCLASS_POLICY */
- TransformRelationId /* OCLASS_TRANSFORM */
+ TransformRelationId, /* OCLASS_TRANSFORM */
+ MvStatisticRelationId /* OCLASS_STATISTICS */
};
@@ -1272,6 +1274,10 @@ doDeletion(const ObjectAddress *object, int flags)
DropTransformById(object->objectId);
break;
+ case OCLASS_STATISTICS:
+ RemoveStatisticsById(object->objectId);
+ break;
+
default:
elog(ERROR, "unrecognized object class: %u",
object->classId);
@@ -2415,6 +2421,9 @@ getObjectClass(const ObjectAddress *object)
case TransformRelationId:
return OCLASS_TRANSFORM;
+
+ case MvStatisticRelationId:
+ return OCLASS_STATISTICS;
}
/* shouldn't get here */
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index e997b57..47ec8cc 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -47,6 +47,7 @@
#include "catalog/pg_constraint_fn.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_inherits.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_tablespace.h"
@@ -1613,7 +1614,10 @@ RemoveAttributeById(Oid relid, AttrNumber attnum)
heap_close(attr_rel, RowExclusiveLock);
if (attnum > 0)
+ {
RemoveStatistics(relid, attnum);
+ RemoveMVStatistics(relid, attnum);
+ }
relation_close(rel, NoLock);
}
@@ -1841,6 +1845,11 @@ heap_drop_with_catalog(Oid relid)
RemoveStatistics(relid, 0);
/*
+ * delete multi-variate statistics
+ */
+ RemoveMVStatistics(relid, 0);
+
+ /*
* delete attribute tuples
*/
DeleteAttributeTuples(relid);
@@ -2692,6 +2701,99 @@ RemoveStatistics(Oid relid, AttrNumber attnum)
/*
+ * RemoveMVStatistics --- remove entries in pg_mv_statistic for a rel
+ *
+ * If attnum is zero, remove all entries for rel; else remove only the one(s)
+ * for that column.
+ */
+void
+RemoveMVStatistics(Oid relid, AttrNumber attnum)
+{
+ Relation pgmvstatistic;
+ TupleDesc tupdesc = NULL;
+ SysScanDesc scan;
+ ScanKeyData key;
+ HeapTuple tuple;
+
+ /*
+ * When dropping a column, we'll drop statistics with a single
+ * remaining (undropped column). To do that, we need the tuple
+ * descriptor.
+ *
+ * We already have the relation locked (as we're running ALTER
+ * TABLE ... DROP COLUMN), so we'll just get the descriptor here.
+ */
+ if (attnum != 0)
+ {
+ Relation rel = relation_open(relid, NoLock);
+
+ /* multivariate stats are supported on tables and matviews */
+ if (rel->rd_rel->relkind == RELKIND_RELATION ||
+ rel->rd_rel->relkind == RELKIND_MATVIEW)
+ tupdesc = RelationGetDescr(rel);
+
+ relation_close(rel, NoLock);
+ }
+
+ if (tupdesc == NULL)
+ return;
+
+ pgmvstatistic = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ ScanKeyInit(&key,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ scan = systable_beginscan(pgmvstatistic,
+ MvStatisticRelidIndexId,
+ true, NULL, 1, &key);
+
+ /* we must loop even when attnum != 0, in case of inherited stats */
+ while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+ {
+ bool delete = true;
+
+ if (attnum != 0)
+ {
+ Datum adatum;
+ bool isnull;
+ int i;
+ int ncolumns = 0;
+ ArrayType *arr;
+ int16 *attnums;
+
+ /* get the columns */
+ adatum = SysCacheGetAttr(MVSTATOID, tuple,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+ attnums = (int16*)ARR_DATA_PTR(arr);
+
+ for (i = 0; i < ARR_DIMS(arr)[0]; i++)
+ {
+ /* count the column unless it has been / is being dropped */
+ if ((! tupdesc->attrs[attnums[i]-1]->attisdropped) &&
+ (attnums[i] != attnum))
+ ncolumns += 1;
+ }
+
+ /* delete if there are less than two attributes */
+ delete = (ncolumns < 2);
+ }
+
+ if (delete)
+ simple_heap_delete(pgmvstatistic, &tuple->t_self);
+ }
+
+ systable_endscan(scan);
+
+ heap_close(pgmvstatistic, RowExclusiveLock);
+}
+
+
+/*
* RelationTruncateIndexes - truncate all indexes associated
* with the heap relation to zero tuples.
*
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 446b2ac..dfd5bef 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -4201,3 +4201,54 @@ pg_is_other_temp_schema(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(isOtherTempNamespace(oid));
}
+
+Oid
+get_statistics_oid(List *names, bool missing_ok)
+{
+ char *schemaname;
+ char *stats_name;
+ Oid namespaceId;
+ Oid stats_oid = InvalidOid;
+ ListCell *l;
+
+ /* deconstruct the name list */
+ DeconstructQualifiedName(names, &schemaname, &stats_name);
+
+ if (schemaname)
+ {
+ /* use exact schema given */
+ namespaceId = LookupExplicitNamespace(schemaname, missing_ok);
+ if (missing_ok && !OidIsValid(namespaceId))
+ stats_oid = InvalidOid;
+ else
+ stats_oid = GetSysCacheOid2(MVSTATNAMENSP,
+ PointerGetDatum(stats_name),
+ ObjectIdGetDatum(namespaceId));
+ }
+ else
+ {
+ /* search for it in search path */
+ recomputeNamespacePath();
+
+ foreach(l, activeSearchPath)
+ {
+ namespaceId = lfirst_oid(l);
+
+ if (namespaceId == myTempNamespace)
+ continue; /* do not look in temp namespace */
+ stats_oid = GetSysCacheOid2(MVSTATNAMENSP,
+ PointerGetDatum(stats_name),
+ ObjectIdGetDatum(namespaceId));
+ if (OidIsValid(stats_oid))
+ break;
+ }
+ }
+
+ if (!OidIsValid(stats_oid) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("statistics \"%s\" does not exist",
+ NameListToString(names))));
+
+ return stats_oid;
+}
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index d2aaa6d..c13a569 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -39,6 +39,7 @@
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
#include "catalog/pg_largeobject_metadata.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_opfamily.h"
@@ -438,9 +439,22 @@ static const ObjectPropertyType ObjectProperty[] =
Anum_pg_type_typacl,
ACL_KIND_TYPE,
true
+ },
+ {
+ MvStatisticRelationId,
+ MvStatisticOidIndexId,
+ MVSTATOID,
+ MVSTATNAMENSP,
+ Anum_pg_mv_statistic_staname,
+ Anum_pg_mv_statistic_stanamespace,
+ Anum_pg_mv_statistic_staowner,
+ InvalidAttrNumber, /* no ACL (same as relation) */
+ -1, /* no ACL */
+ true
}
};
+
/*
* This struct maps the string object types as returned by
* getObjectTypeDescription into ObjType enum values. Note that some enum
@@ -640,6 +654,10 @@ static const struct object_type_map
/* OCLASS_TRANSFORM */
{
"transform", OBJECT_TRANSFORM
+ },
+ /* OBJECT_STATISTICS */
+ {
+ "statistics", OBJECT_STATISTICS
}
};
@@ -913,6 +931,11 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
address = get_object_address_defacl(objname, objargs,
missing_ok);
break;
+ case OBJECT_STATISTICS:
+ address.classId = MvStatisticRelationId;
+ address.objectId = get_statistics_oid(objname, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, in case it thinks elog might return */
@@ -2185,6 +2208,10 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser")));
break;
+ case OBJECT_STATISTICS:
+ if (!pg_statistics_ownercheck(address.objectId, roleid))
+ aclcheck_error_type(ACLCHECK_NOT_OWNER, address.objectId);
+ break;
default:
elog(ERROR, "unrecognized object type: %d",
(int) objtype);
@@ -3610,6 +3637,10 @@ getObjectTypeDescription(const ObjectAddress *object)
appendStringInfoString(&buffer, "transform");
break;
+ case OCLASS_STATISTICS:
+ appendStringInfoString(&buffer, "statistics");
+ break;
+
default:
appendStringInfo(&buffer, "unrecognized %u", object->classId);
break;
@@ -4566,6 +4597,29 @@ getObjectIdentityParts(const ObjectAddress *object,
}
break;
+ case OCLASS_STATISTICS:
+ {
+ HeapTuple tup;
+ Form_pg_mv_statistic formStatistic;
+ char *schema;
+
+ tup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for statistics %u",
+ object->objectId);
+ formStatistic = (Form_pg_mv_statistic) GETSTRUCT(tup);
+ schema = get_namespace_name_or_temp(formStatistic->stanamespace);
+ appendStringInfoString(&buffer,
+ quote_qualified_identifier(schema,
+ NameStr(formStatistic->staname)));
+ if (objname)
+ *objname = list_make2(schema,
+ pstrdup(NameStr(formStatistic->staname)));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 84aa061..31dbb2c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -158,6 +158,17 @@ CREATE VIEW pg_indexes AS
LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
+CREATE VIEW pg_mv_stats AS
+ SELECT
+ N.nspname AS schemaname,
+ C.relname AS tablename,
+ S.staname AS staname,
+ S.stakeys AS attnums,
+ length(S.stadeps) as depsbytes,
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
+ LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
+
CREATE VIEW pg_stats WITH (security_barrier) AS
SELECT
nspname AS schemaname,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index b1ac704..5151001 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -18,8 +18,8 @@ OBJS = aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
policy.o portalcmds.o prepare.o proclang.o \
- schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
- variable.o view.o
+ schemacmds.o seclabel.o sequence.o statscmds.o \
+ tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o \
+ user.o vacuum.o vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index 5af0f2f..89985499 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -359,6 +359,7 @@ ExecRenameStmt(RenameStmt *stmt)
case OBJECT_OPCLASS:
case OBJECT_OPFAMILY:
case OBJECT_LANGUAGE:
+ case OBJECT_STATISTICS:
case OBJECT_TSCONFIGURATION:
case OBJECT_TSDICTIONARY:
case OBJECT_TSPARSER:
@@ -437,6 +438,7 @@ ExecAlterObjectSchemaStmt(AlterObjectSchemaStmt *stmt,
case OBJECT_OPERATOR:
case OBJECT_OPCLASS:
case OBJECT_OPFAMILY:
+ case OBJECT_STATISTICS:
case OBJECT_TSCONFIGURATION:
case OBJECT_TSDICTIONARY:
case OBJECT_TSPARSER:
@@ -745,6 +747,7 @@ ExecAlterOwnerStmt(AlterOwnerStmt *stmt)
case OBJECT_OPERATOR:
case OBJECT_OPCLASS:
case OBJECT_OPFAMILY:
+ case OBJECT_STATISTICS:
case OBJECT_TABLESPACE:
case OBJECT_TSDICTIONARY:
case OBJECT_TSCONFIGURATION:
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8a5f07c..9087532 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -17,6 +17,7 @@
#include <math.h>
#include "access/multixact.h"
+#include "access/sysattr.h"
#include "access/transam.h"
#include "access/tupconvert.h"
#include "access/tuptoaster.h"
@@ -27,6 +28,7 @@
#include "catalog/indexing.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "commands/dbcommands.h"
#include "commands/tablecmds.h"
@@ -45,10 +47,13 @@
#include "storage/procarray.h"
#include "utils/acl.h"
#include "utils/attoptcache.h"
+#include "utils/builtins.h"
#include "utils/datum.h"
+#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/mvstats.h"
#include "utils/pg_rusage.h"
#include "utils/sampling.h"
#include "utils/sortsupport.h"
@@ -460,6 +465,19 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
* all analyzable columns. We use a lower bound of 100 rows to avoid
* possible overflow in Vitter's algorithm. (Note: that will also be the
* target in the corner case where there are no analyzable columns.)
+ *
+ * FIXME This sample sizing is mostly OK when computing stats for
+ * individual columns, but when computing multivariate stats
+ * (histograms, mcv, ...) it's rather
+ * insufficient. For stats on multiple columns / complex stats
+ * we need larger sample sizes, because we need to build more
+ * detailed stats (more MCV items / histogram buckets) to get
+ * good accuracy. Maybe it'd be appropriate to use samples
+ * proportional to the table (say, 0.5% - 1%) instead of a
+ * fixed size might be more appropriate. Also, this should be
+ * bound to the requested statistics size - e.g. number of MCV
+ * items or histogram buckets should require several sample
+ * rows per item/bucket (so the sample should be k*size).
*/
targrows = 100;
for (i = 0; i < attr_cnt; i++)
@@ -562,6 +580,9 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
update_attstats(RelationGetRelid(Irel[ind]), false,
thisdata->attr_cnt, thisdata->vacattrstats);
}
+
+ /* Build multivariate stats (if there are any). */
+ build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
}
/*
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index 522027a..cd65b58 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -292,6 +292,10 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
msg = gettext_noop("schema \"%s\" does not exist, skipping");
name = NameListToString(objname);
break;
+ case OBJECT_STATISTICS:
+ msg = gettext_noop("statistics \"%s\" does not exist, skipping");
+ name = NameListToString(objname);
+ break;
case OBJECT_TSPARSER:
if (!schema_does_not_exist_skipping(objname, &msg, &name))
{
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 9e32f8d..09061bb 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -110,6 +110,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"SCHEMA", true},
{"SEQUENCE", true},
{"SERVER", true},
+ {"STATISTICS", true},
{"TABLE", true},
{"TABLESPACE", false},
{"TRANSFORM", true},
@@ -1106,6 +1107,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_RULE:
case OBJECT_SCHEMA:
case OBJECT_SEQUENCE:
+ case OBJECT_STATISTICS:
case OBJECT_TABCONSTRAINT:
case OBJECT_TABLE:
case OBJECT_TRANSFORM:
@@ -1167,6 +1169,7 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
case OCLASS_POLICY:
+ case OCLASS_STATISTICS:
return true;
}
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
new file mode 100644
index 0000000..f43b053
--- /dev/null
+++ b/src/backend/commands/statscmds.c
@@ -0,0 +1,277 @@
+/*-------------------------------------------------------------------------
+ *
+ * statscmds.c
+ * Commands for creating and altering multivariate statistics
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/statscmds.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
+#include "catalog/namespace.h"
+#include "catalog/pg_mv_statistic.h"
+#include "catalog/pg_namespace.h"
+#include "commands/defrem.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
+#include "utils/inval.h"
+#include "utils/memutils.h"
+#include "utils/mvstats.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+
+
+/* used for sorting the attnums in ExecCreateStatistics */
+static int compare_int16(const void *a, const void *b)
+{
+ return memcmp(a, b, sizeof(int16));
+}
+
+/*
+ * Implements the CREATE STATISTICS name ON table (columns) WITH (options)
+ *
+ * TODO Check that the types support sort, although maybe we can live
+ * without it (and only build MCV list / association rules).
+ *
+ * TODO This should probably check for duplicate stats (i.e. same
+ * keys, same options). Although maybe it's useful to have
+ * multiple stats on the same columns with different options
+ * (say, a detailed MCV-only stats for some queries, histogram
+ * for others, etc.)
+ */
+ObjectAddress
+CreateStatistics(CreateStatsStmt *stmt)
+{
+ int i, j;
+ ListCell *l;
+ int16 attnums[INDEX_MAX_KEYS];
+ int numcols = 0;
+ ObjectAddress address = InvalidObjectAddress;
+ char *namestr;
+ NameData staname;
+ Oid statoid;
+ Oid namespaceId;
+
+ HeapTuple htup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ int2vector *stakeys;
+ Relation mvstatrel;
+ Relation rel;
+ ObjectAddress parentobject, childobject;
+
+ /* by default build nothing */
+ bool build_dependencies = false;
+
+ Assert(IsA(stmt, CreateStatsStmt));
+
+ /* resolve the pieces of the name (namespace etc.) */
+ namespaceId = QualifiedNameGetCreationNamespace(stmt->defnames, &namestr);
+ namestrcpy(&staname, namestr);
+
+ /*
+ * If if_not_exists was given and the statistics already exists, bail out.
+ */
+ if (stmt->if_not_exists &&
+ SearchSysCacheExists2(MVSTATNAMENSP,
+ PointerGetDatum(&staname),
+ ObjectIdGetDatum(namespaceId)))
+ {
+ ereport(NOTICE,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("statistics \"%s\" already exists, skipping",
+ namestr)));
+ return InvalidObjectAddress;
+ }
+
+ rel = heap_openrv(stmt->relation, AccessExclusiveLock);
+
+ /* transform the column names to attnum values */
+
+ foreach(l, stmt->keys)
+ {
+ char *attname = strVal(lfirst(l));
+ HeapTuple atttuple;
+
+ atttuple = SearchSysCacheAttName(RelationGetRelid(rel), attname);
+
+ if (!HeapTupleIsValid(atttuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" referenced in statistics does not exist",
+ attname)));
+
+ /* more than MVSTATS_MAX_DIMENSIONS columns not allowed */
+ if (numcols >= MVSTATS_MAX_DIMENSIONS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("cannot have more than %d columns in statistics",
+ MVSTATS_MAX_DIMENSIONS)));
+
+ attnums[numcols] = ((Form_pg_attribute) GETSTRUCT(atttuple))->attnum;
+ ReleaseSysCache(atttuple);
+ numcols++;
+ }
+
+ /*
+ * Check the lower bound (at least 2 columns); the upper bound was
+ * already checked in the loop above.
+ */
+ if (numcols < 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("multivariate stats require 2 or more columns")));
+
+ /* look for duplicate columns */
+ for (i = 0; i < numcols; i++)
+ for (j = 0; j < i; j++)
+ if (attnums[i] == attnums[j])
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_COLUMN),
+ errmsg("duplicate column name in statistics definition")));
+
+ /* parse the statistics options */
+ foreach (l, stmt->options)
+ {
+ DefElem *opt = (DefElem*)lfirst(l);
+
+ if (strcmp(opt->defname, "dependencies") == 0)
+ build_dependencies = defGetBoolean(opt);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized STATISTICS option \"%s\"",
+ opt->defname)));
+ }
+
+ /* check that at least some statistics were requested */
+ if (! build_dependencies)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies) was requested")));
+
+ /* sort the attnums and build int2vector */
+ qsort(attnums, numcols, sizeof(int16), compare_int16);
+ stakeys = buildint2vector(attnums, numcols);
+
+ /*
+ * Okay, let's create the pg_mv_statistic entry.
+ */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ /* no stats collected yet, so just the keys */
+ values[Anum_pg_mv_statistic_starelid-1] = ObjectIdGetDatum(RelationGetRelid(rel));
+ values[Anum_pg_mv_statistic_staname -1] = NameGetDatum(&staname);
+ values[Anum_pg_mv_statistic_stanamespace -1] = ObjectIdGetDatum(namespaceId);
+ values[Anum_pg_mv_statistic_staowner-1] = ObjectIdGetDatum(GetUserId());
+
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
+
+ values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+
+ nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* insert the tuple into pg_mv_statistic */
+ mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ htup = heap_form_tuple(mvstatrel->rd_att, values, nulls);
+
+ simple_heap_insert(mvstatrel, htup);
+
+ CatalogUpdateIndexes(mvstatrel, htup);
+
+ statoid = HeapTupleGetOid(htup);
+
+ heap_freetuple(htup);
+
+
+ /*
+ * Store a dependency too, so that statistics are dropped on DROP TABLE
+ */
+ parentobject.classId = RelationRelationId;
+ parentobject.objectId = RelationGetRelid(rel);
+ parentobject.objectSubId = 0;
+ childobject.classId = MvStatisticRelationId;
+ childobject.objectId = statoid;
+ childobject.objectSubId = 0;
+
+ recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
+
+ /*
+ * Also record dependency on the schema (to drop statistics on DROP SCHEMA)
+ */
+ parentobject.classId = NamespaceRelationId;
+ parentobject.objectId = namespaceId;
+ parentobject.objectSubId = 0;
+ childobject.classId = MvStatisticRelationId;
+ childobject.objectId = statoid;
+ childobject.objectSubId = 0;
+
+ recordDependencyOn(&childobject, &parentobject, DEPENDENCY_AUTO);
+
+
+ heap_close(mvstatrel, RowExclusiveLock);
+
+ /*
+ * Invalidate relcache so that others see the new statistics (do this
+ * before releasing the relation).
+ */
+ CacheInvalidateRelcache(rel);
+
+ relation_close(rel, NoLock);
+
+ ObjectAddressSet(address, MvStatisticRelationId, statoid);
+
+ return address;
+}
+
+
+/*
+ * Implements DROP STATISTICS
+ *
+ * Removes the pg_mv_statistic entry with the given OID, and invalidates
+ * the relcache of the relation the statistics were defined on.
+ */
+void
+RemoveStatisticsById(Oid statsOid)
+{
+ Relation relation;
+ Oid relid;
+ Relation rel;
+ HeapTuple tup;
+ Form_pg_mv_statistic mvstat;
+
+ /*
+ * Delete the pg_mv_statistic tuple.
+ */
+ relation = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ tup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(statsOid));
+ if (!HeapTupleIsValid(tup)) /* should not happen */
+ elog(ERROR, "cache lookup failed for statistics %u", statsOid);
+
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(tup);
+ relid = mvstat->starelid;
+
+ rel = heap_open(relid, AccessExclusiveLock);
+
+ simple_heap_delete(relation, &tup->t_self);
+
+ CacheInvalidateRelcache(rel);
+
+ ReleaseSysCache(tup);
+
+ heap_close(relation, RowExclusiveLock);
+ heap_close(rel, NoLock);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index df7c2fa..3b7c87f 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4124,6 +4124,20 @@ _copyAlterPolicyStmt(const AlterPolicyStmt *from)
return newnode;
}
+static CreateStatsStmt *
+_copyCreateStatsStmt(const CreateStatsStmt *from)
+{
+ CreateStatsStmt *newnode = makeNode(CreateStatsStmt);
+
+ COPY_NODE_FIELD(defnames);
+ COPY_NODE_FIELD(relation);
+ COPY_NODE_FIELD(keys);
+ COPY_NODE_FIELD(options);
+ COPY_SCALAR_FIELD(if_not_exists);
+
+ return newnode;
+}
+
/* ****************************************************************
* pg_list.h copy functions
* ****************************************************************
@@ -4999,6 +5013,9 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_CreateStatsStmt:
+ retval = _copyCreateStatsStmt(from);
+ break;
case T_FuncWithArgs:
retval = _copyFuncWithArgs(from);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index eb0fc1e..07206d7 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2153,6 +2153,21 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
}
static void
+_outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
+{
+ WRITE_NODE_TYPE("MVSTATISTICINFO");
+
+ /* NB: this isn't a complete set of fields */
+ WRITE_OID_FIELD(mvoid);
+
+ /* enabled statistics */
+ WRITE_BOOL_FIELD(deps_enabled);
+
+ /* built/available statistics */
+ WRITE_BOOL_FIELD(deps_built);
+}
+
+static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
/*
@@ -3636,6 +3651,9 @@ _outNode(StringInfo str, const void *obj)
case T_PlannerParamItem:
_outPlannerParamItem(str, obj);
break;
+ case T_MVStatisticInfo:
+ _outMVStatisticInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index ad715bb..7fb2088 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -28,6 +28,7 @@
#include "catalog/dependency.h"
#include "catalog/heap.h"
#include "catalog/pg_am.h"
+#include "catalog/pg_mv_statistic.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -40,7 +41,9 @@
#include "parser/parsetree.h"
#include "rewrite/rewriteManip.h"
#include "storage/bufmgr.h"
+#include "utils/builtins.h"
#include "utils/lsyscache.h"
+#include "utils/syscache.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
@@ -94,6 +97,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
Relation relation;
bool hasindex;
List *indexinfos = NIL;
+ List *stainfos = NIL;
/*
* We need not lock the relation since it was already locked, either by
@@ -387,6 +391,61 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
rel->indexlist = indexinfos;
+ /* load info about multivariate statistics defined on this relation */
+ {
+ List *mvstatoidlist;
+ ListCell *l;
+
+ mvstatoidlist = RelationGetMVStatList(relation);
+
+ foreach(l, mvstatoidlist)
+ {
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ Oid mvoid = lfirst_oid(l);
+ Form_pg_mv_statistic mvstat;
+ MVStatisticInfo *info;
+
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ /* unavailable stats are not interesting for the planner */
+ if (mvstat->deps_built)
+ {
+ info = makeNode(MVStatisticInfo);
+
+ info->mvoid = mvoid;
+ info->rel = rel;
+
+ /* enabled statistics */
+ info->deps_enabled = mvstat->deps_enabled;
+
+ /* built/available statistics */
+ info->deps_built = mvstat->deps_built;
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ info->stakeys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+
+ stainfos = lcons(info, stainfos);
+ }
+
+ ReleaseSysCache(htup);
+ }
+
+ list_free(mvstatoidlist);
+ }
+
+ rel->mvstatlist = stainfos;
+
/* Grab foreign-table info using the relcache, while we have it */
if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
{
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b9aeb31..eed9927 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -241,7 +241,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
- CreateSchemaStmt CreateSeqStmt CreateStmt CreateTableSpaceStmt
+ CreateSchemaStmt CreateSeqStmt CreateStmt CreateStatsStmt CreateTableSpaceStmt
CreateFdwStmt CreateForeignServerStmt CreateForeignTableStmt
CreateAssertStmt CreateTransformStmt CreateTrigStmt CreateEventTrigStmt
CreateUserStmt CreateUserMappingStmt CreateRoleStmt CreatePolicyStmt
@@ -809,6 +809,7 @@ stmt :
| CreateSchemaStmt
| CreateSeqStmt
| CreateStmt
+ | CreateStatsStmt
| CreateTableSpaceStmt
| CreateTransformStmt
| CreateTrigStmt
@@ -3436,6 +3437,36 @@ OptConsTableSpace: USING INDEX TABLESPACE name { $$ = $4; }
ExistingIndex: USING INDEX index_name { $$ = $3; }
;
+/*****************************************************************************
+ *
+ * QUERY :
+ * CREATE STATISTICS stats_name ON relname (columns) WITH (options)
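+ *
+ * e.g. CREATE STATISTICS s1 ON t1 (a, b) WITH (dependencies)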
+ *
+ *****************************************************************************/
+
+
+CreateStatsStmt: CREATE STATISTICS any_name ON qualified_name '(' columnList ')' opt_reloptions
+ {
+ CreateStatsStmt *n = makeNode(CreateStatsStmt);
+ n->defnames = $3;
+ n->relation = $5;
+ n->keys = $7;
+ n->options = $9;
+ n->if_not_exists = false;
+ $$ = (Node *)n;
+ }
+ | CREATE STATISTICS IF_P NOT EXISTS any_name ON qualified_name '(' columnList ')' opt_reloptions
+ {
+ CreateStatsStmt *n = makeNode(CreateStatsStmt);
+ n->defnames = $6;
+ n->relation = $8;
+ n->keys = $10;
+ n->options = $12;
+ n->if_not_exists = true;
+ $$ = (Node *)n;
+ }
+ ;
+
/*****************************************************************************
*
@@ -5621,6 +5652,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH DICTIONARY { $$ = OBJECT_TSDICTIONARY; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
+ | STATISTICS { $$ = OBJECT_STATISTICS; }
;
any_name_list:
@@ -7995,6 +8027,15 @@ RenameStmt: ALTER AGGREGATE func_name aggr_args RENAME TO name
n->missing_ok = false;
$$ = (Node *)n;
}
+ | ALTER STATISTICS any_name RENAME TO name
+ {
+ RenameStmt *n = makeNode(RenameStmt);
+ n->renameType = OBJECT_STATISTICS;
+ n->object = $3;
+ n->newname = $6;
+ n->missing_ok = false;
+ $$ = (Node *)n;
+ }
;
opt_column: COLUMN { $$ = COLUMN; }
@@ -8231,6 +8272,15 @@ AlterObjectSchemaStmt:
n->missing_ok = false;
$$ = (Node *)n;
}
+ | ALTER STATISTICS any_name SET SCHEMA name
+ {
+ AlterObjectSchemaStmt *n = makeNode(AlterObjectSchemaStmt);
+ n->objectType = OBJECT_STATISTICS;
+ n->object = $3;
+ n->newschema = $6;
+ n->missing_ok = false;
+ $$ = (Node *)n;
+ }
;
/*****************************************************************************
@@ -8421,6 +8471,14 @@ AlterOwnerStmt: ALTER AGGREGATE func_name aggr_args OWNER TO RoleSpec
n->newowner = $7;
$$ = (Node *)n;
}
+ | ALTER STATISTICS name OWNER TO RoleSpec
+ {
+ AlterOwnerStmt *n = makeNode(AlterOwnerStmt);
+ n->objectType = OBJECT_STATISTICS;
+ n->object = list_make1(makeString($3));
+ n->newowner = $6;
+ $$ = (Node *)n;
+ }
;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 045f7f0..96b58f8 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1520,6 +1520,10 @@ ProcessUtilitySlow(Node *parsetree,
address = ExecSecLabelStmt((SecLabelStmt *) parsetree);
break;
+ case T_CreateStatsStmt: /* CREATE STATISTICS */
+ address = CreateStatistics((CreateStatsStmt *) parsetree);
+ break;
+
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(parsetree));
@@ -1878,6 +1882,9 @@ AlterObjectTypeCommandTag(ObjectType objtype)
case OBJECT_MATVIEW:
tag = "ALTER MATERIALIZED VIEW";
break;
+ case OBJECT_STATISTICS:
+ tag = "ALTER STATISTICS";
+ break;
default:
tag = "???";
break;
@@ -2160,6 +2167,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_TRANSFORM:
tag = "DROP TRANSFORM";
break;
+ case OBJECT_STATISTICS:
+ tag = "DROP STATISTICS";
+ break;
default:
tag = "???";
}
@@ -2527,6 +2537,10 @@ CreateCommandTag(Node *parsetree)
tag = "EXECUTE";
break;
+ case T_CreateStatsStmt:
+ tag = "CREATE STATISTICS";
+ break;
+
case T_DeallocateStmt:
{
DeallocateStmt *stmt = (DeallocateStmt *) parsetree;
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..eba0352 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,7 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr mvstats resowner sort time
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 130c06d..3bc4c8a 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -47,6 +47,7 @@
#include "catalog/pg_auth_members.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_database.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_proc.h"
@@ -3956,6 +3957,62 @@ RelationGetIndexList(Relation relation)
return result;
}
+
+List *
+RelationGetMVStatList(Relation relation)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result;
+ List *oldlist;
+ MemoryContext oldcxt;
+
+ /* Quick exit if we already computed the list. */
+ if (relation->rd_mvstatvalid)
+ return list_copy(relation->rd_mvstatlist);
+
+ /*
+ * We build the list we intend to return (in the caller's context) while
+ * doing the scan. After successfully completing the scan, we copy that
+ * list into the relcache entry. This avoids cache-context memory leakage
+ * if we get some sort of error partway through.
+ */
+ result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(RelationGetRelid(relation)));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ /* TODO maybe include only already built statistics? */
+ result = insert_ordered_oid(result, HeapTupleGetOid(htup));
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* Now save a copy of the completed list in the relcache entry. */
+ oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+ oldlist = relation->rd_mvstatlist;
+ relation->rd_mvstatlist = list_copy(result);
+
+ relation->rd_mvstatvalid = true;
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Don't leak the old list, if there is one */
+ list_free(oldlist);
+
+ return result;
+}
+
/*
* insert_ordered_oid
* Insert a new Oid into a sorted list of Oids, preserving ordering
@@ -4920,6 +4977,8 @@ load_relcache_init_file(bool shared)
rel->rd_indexattr = NULL;
rel->rd_keyattr = NULL;
rel->rd_idattr = NULL;
+ rel->rd_mvstatvalid = false;
+ rel->rd_mvstatlist = NIL;
rel->rd_createSubid = InvalidSubTransactionId;
rel->rd_newRelfilenodeSubid = InvalidSubTransactionId;
rel->rd_amcache = NULL;
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 65ffe84..3c1bc4b 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -44,6 +44,7 @@
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_language.h"
+#include "catalog/pg_mv_statistic.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
@@ -502,6 +503,28 @@ static const struct cachedesc cacheinfo[] = {
},
4
},
+ {MvStatisticRelationId, /* MVSTATNAMENSP */
+ MvStatisticNameIndexId,
+ 2,
+ {
+ Anum_pg_mv_statistic_staname,
+ Anum_pg_mv_statistic_stanamespace,
+ 0,
+ 0
+ },
+ 4
+ },
+ {MvStatisticRelationId, /* MVSTATOID */
+ MvStatisticOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 4
+ },
{NamespaceRelationId, /* NAMESPACENAME */
NamespaceNameIndexId,
1,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
new file mode 100644
index 0000000..099f1ed
--- /dev/null
+++ b/src/backend/utils/mvstats/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/mvstats
+#
+# IDENTIFICATION
+# src/backend/utils/mvstats/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/mvstats
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = common.o dependencies.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.dependencies b/src/backend/utils/mvstats/README.dependencies
new file mode 100644
index 0000000..1f96fbc
--- /dev/null
+++ b/src/backend/utils/mvstats/README.dependencies
@@ -0,0 +1,222 @@
+Soft functional dependencies
+============================
+
+A type of multivariate statistics used to capture cases when one column (or
+possibly a combination of columns) determines values in another column. We may
+also say that one column implies the other one.
+
+A simple artificial example may be a table with two columns, created like this
+
+ CREATE TABLE t (a INT, b INT)
+ AS SELECT i, i/10 FROM generate_series(1,100000) s(i);
+
+Clearly, once we know the value for column 'a' the value for 'b' is trivially
+determined, as it's simply (a/10). A more practical example may be addresses,
+where (ZIP code -> city name), i.e. once we know the ZIP, we probably know the
+city it belongs to, as ZIP codes are usually assigned to one city. Larger cities
+may have multiple ZIP codes, so the dependency can't be reversed.
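+
+With this patch, such a dependency can be mined by defining statistics on the
+columns and re-analyzing the table (the statistics/table names here are just
+examples):
+
+ CREATE STATISTICS s1 ON t (a, b) WITH (dependencies);
+ ANALYZE t;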
+
+Functional dependencies are a concept well described in relational theory,
+particularly in definition of normalization and "normal forms". Wikipedia has a
+nice definition of a functional dependency [1]:
+
+ In a given table, an attribute Y is said to have a functional dependency on
+ a set of attributes X (written X -> Y) if and only if each X value is
+ associated with precisely one Y value. For example, in an "Employee" table
+ that includes the attributes "Employee ID" and "Employee Date of Birth", the
+ functional dependency {Employee ID} -> {Employee Date of Birth} would hold.
+ It follows from the previous two sentences that each {Employee ID} is
+ associated with precisely one {Employee Date of Birth}.
+
+ [1] http://en.wikipedia.org/wiki/Database_normalization
+
+Many datasets might be normalized not to contain such dependencies, but often
+that's not practical, for various reasons. In some cases it's actually a
+conscious design choice to model the dataset in a denormalized way, either
+because of performance or to make querying easier.
+
+The functional dependencies are called 'soft' because the implementation is
+meant to tolerate a small number of rows contradicting the dependency. Many
+real data sets contain some errors, either because of data entry mistakes
+(user mistyping the ZIP code) or issues in generating the data (e.g. a ZIP code
+mistakenly assigned to two cities in different states). A strict implementation
+would ignore dependencies on such noisy data, rendering the approach unusable
+on such data sets.
+
+
+Mining dependencies (ANALYZE)
+-----------------------------
+
+The current build algorithm is rather simple - for each pair (a,b) of columns,
+the data are sorted lexicographically (first by 'a', then by 'b'). Then for each
+group (rows with the same 'a' value) we decide whether the group is neutral,
+supporting or contradicting the dependency (a->b).
+
+A group is considered neutral when it's too small - e.g. when there's a single
+row in the group, there can't possibly be multiple values in 'b'. For this
+reason we ignore groups smaller than a threshold (currently 3 rows).
+
+For sufficiently large groups (3 rows or more), we count the number of distinct
+values in 'b'. When there's a single 'b' value, the group is considered to
+support the dependency (a->b), otherwise it's considered to contradict it.
+
+At the end, we compare the number of rows in supporting and contradicting groups,
+and if there are at least 10x as many supporting rows, we consider the
+functional dependency to be valid.
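+
+For illustration, consider a hypothetical sample (the numbers are made up)
+where sorting by (a,b) yields these groups:
+
+ group (a=1): 50 rows, 1 distinct 'b' value  => supporting
+ group (a=2): 40 rows, 3 distinct 'b' values => contradicting
+ group (a=3): 2 rows                         => neutral (below threshold)
+
+Here the 50 supporting rows vs. 40 contradicting rows fail the 10x test
+(50 < 10 * 40), so the dependency (a->b) would be rejected.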
+
+
+A downside of this approach is that the algorithm is a bit fragile with respect
+to the sample - there may be data sets producing quite different results for
+each ANALYZE execution (as even a single row may change the outcome of the
+final 10x test).
+
+It was proposed to make the dependencies "fuzzy" - e.g. track some coefficient
+between [0,1] determining how much the dependency holds. That would however mean
+we have to keep all the dependencies, as eliminating them based on the value of
+the coefficient (e.g. throw away dependencies <= 0.5) would result in exactly
+the same fragility issues. This would also make it more complicated to combine
+dependencies. So this does not seem like a practical approach.
+
+A better approach might be to replace the constants (min_group_size=3 and 10x)
+with values somehow related to the particular data set.
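+
+One option (suggested by an XXX comment in dependency_is_valid) might be to
+derive the minimum group size from the estimated average group size, e.g.
+something like
+
+ min_group_size = Max(3, numrows / ndistinct(a))
+
+but that is only a sketch, not something the patch currently implements.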
+
+
+Clause reduction (planner/optimizer)
+------------------------------------
+
+Applying the functional dependencies is quite simple - given a list of equality
+clauses, check which clauses are redundant (i.e. implied by some other clause).
+For example given the clause list
+
+ (a = 1) AND (b = 2) AND (c = 3)
+
+and the dependency (a->b), the list of clauses may be simplified to
+
+ (a = 1) AND (c = 3)
+
+Functional dependencies may only be applied to equality clauses; all other
+types of clauses are ignored. See clauselist_apply_dependencies() for details.
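+
+In terms of selectivities, the reduction means the implied clause simply does
+not contribute to the estimate - with the dependency (a->b) and compatible
+values we get
+
+ selectivity[(a = 1) AND (b = 2)] = selectivity[(a = 1)]
+
+which is the same equality used below when discussing verification against
+MCV lists and histograms.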
+
+
+Compatibility of clauses
+------------------------
+
+The reduction assumes the clauses really are redundant, and that the value in
+the reduced clause (b=2) is the value determined by (a=1). If that's not the
+case and the values are "incompatible", the result will be an overestimate.
+
+This may happen for example when using conditions on ZIP and city name with
+mismatching values (ZIP for a different city), etc. In such a case the result
+set will be empty, but we'll estimate the selectivity using the ZIP condition.
+
+In this case the default estimate, based on the attribute value independence
+assumption, happens to work better - but mostly by chance.
+
+
+Dependencies vs. MCV/histogram
+------------------------------
+
+In some cases the "compatibility" of the conditions might be verified using the
+other types of multivariate stats - MCV lists and histograms.
+
+For MCV lists the verification might be very simple - peek into the list for
+items matching the clause on the 'a' column (e.g. ZIP code), and if such an
+item is found, check whether the 'b' column matches the other clause. If it
+does not, the clauses are contradictory. If no such item is found, we can't
+really conclude anything, except maybe restricting the selectivity using the
+MCV data (e.g. using min/max selectivity, or something along those lines).
+
+With histograms it might work similarly - we can't check the values directly
+(because histograms store buckets, not the actual values as MCV lists do), so
+we can only look at the buckets matching the clauses - if those buckets have
+very low frequency, it probably means the two clauses are incompatible.
+
+It's unclear what 'low frequency' is, but if one of the clauses is implied
+(automatically true because of the other clause), then
+
+ selectivity[clause(A)] = selectivity[clause(A) & clause(B)]
+
+So we might compute the selectivity of the first clause - for example using the
+regular statistics - and then check whether the selectivity computed from the
+histogram is about the same (or significantly lower).
+
+The problem is that histograms only work well when the data ordering matches
+the natural meaning. For values that serve as labels - like city names, ZIP
+codes, or even generated IDs - histograms really don't work all that well. For
+example sorting cities by name won't match the sorting of ZIP codes, rendering
+the histogram unusable.
+
+So MCVs are probably going to work much better, because they don't really
+assume any sort of ordering, and they're more appropriate for label-like data.
+
+A good question however is why even use functional dependencies in such cases,
+and not simply use the MCV list or histogram instead. One reason is that
+functional dependencies allow falling back to regular stats, and often produce
+more accurate estimates - especially compared to histograms, which are quite
+bad at estimating equality clauses.
+
+
+Limitations
+-----------
+
+Let's go through the main limitations of functional dependencies, especially
+those related to the current implementation.
+
+The current implementation only supports dependencies between two columns, but
+this is merely a simplification of the initial implementation. It's certainly
+useful to mine for dependencies involving multiple columns on the 'left' side,
+i.e. in the condition of the dependency - that is, dependencies like (a,b -> c).
+
+The implementation may/should be smart enough not to mine redundant
+dependencies, e.g. (a->b) and (a,c -> b), because the latter is a trivial
+consequence of the former (if values of 'a' determine 'b', adding another
+column won't change that relationship). ANALYZE should therefore first analyze
+1:1 dependencies, then 2:1 dependencies (skipping the already identified ones),
+etc.
+
+For example the dependency
+
+ (city name -> zip code)
+
+is much stronger, i.e. whenever it holds,
+
+ (city name, state name -> zip code)
+
+holds too. But when there are cities with the same name in different states,
+only the latter dependency will be valid.
+
+Of course, there probably are cities with the same name even within a single
+state, but hopefully that's a relatively rare occurrence (and thus we'll still
+detect the 'soft' dependency).
+
+Handling multiple columns on the right side of the dependency is not necessary,
+as such dependencies may simply be decomposed into a set of dependencies with
+the same meaning, one for each column on the right side. For example
+
+ (a -> b,c)
+
+is exactly the same as
+
+ (a -> b) & (a -> c)
+
+Of course, storing the first form may be more efficient than storing multiple
+'simple' dependencies separately.
+
+
+TODO Support dependencies with multiple columns on left/right.
+
+TODO Investigate using histogram and MCV list to verify the dependencies.
+
+TODO Investigate statistical testing of the distribution (to decide whether it
+ makes sense to build the histogram/MCV list).
+
+TODO Using a min/max of selectivities would probably make more sense for the
+ associated columns.
+
+TODO Consider eliminating the implied columns from the histogram and MCV lists
+ (but maybe that's not a good idea, because that'd make it impossible to use
+ these stats for non-equality clauses and also it wouldn't be possible to
+ use the stats for verification of the dependencies).
+
+TODO The reduction probably might be extended to also handle IS NULL clauses,
+ assuming we fix the ANALYZE to properly handle NULL values. We however
+ won't be able to reduce IS NOT NULL (unless I'm missing something).
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
new file mode 100644
index 0000000..82f2177
--- /dev/null
+++ b/src/backend/utils/mvstats/common.c
@@ -0,0 +1,376 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.c
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+
+static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
+ int natts, VacAttrStats **vacattrstats);
+
+static List* list_mv_stats(Oid relid);
+
+
+/*
+ * Compute requested multivariate stats, using the rows sampled for the
+ * plain (single-column) stats.
+ *
+ * This fetches a list of stats from pg_mv_statistic, computes the stats
+ * and serializes them back into the catalog (as bytea values).
+ */
+void
+build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats)
+{
+ ListCell *lc;
+ List *mvstats;
+
+ TupleDesc tupdesc = RelationGetDescr(onerel);
+
+ /*
+ * Fetch the defined MV stats from pg_mv_statistic, and then compute
+ * the multivariate statistics (functional dependencies for now).
+ */
+ mvstats = list_mv_stats(RelationGetRelid(onerel));
+
+ foreach (lc, mvstats)
+ {
+ int j;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
+ MVDependencies deps = NULL;
+
+ VacAttrStats **stats = NULL;
+ int numatts = 0;
+
+ /* int2 vector of attnums the stats should be computed on */
+ int2vector * attrs = stat->stakeys;
+
+ /* see how many of the columns are not dropped */
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ numatts += 1;
+
+ /* if there are dropped attributes, build a filtered int2vector */
+ if (numatts != attrs->dim1)
+ {
+ int16 *tmp = palloc0(numatts * sizeof(int16));
+ int attnum = 0;
+
+ for (j = 0; j < attrs->dim1; j++)
+ if (! tupdesc->attrs[attrs->values[j]-1]->attisdropped)
+ tmp[attnum++] = attrs->values[j];
+
+ pfree(attrs);
+ attrs = buildint2vector(tmp, numatts);
+ }
+
+ /* filter only the interesting vacattrstats records */
+ stats = lookup_var_attr_stats(attrs, natts, vacattrstats);
+
+ /* check allowed number of dimensions */
+ Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Analyze functional dependencies of columns.
+ */
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
+
+ /* store the serialized statistics (dependencies) in the catalog */
+ update_mv_stats(stat->mvoid, deps, attrs);
+ }
+}
+
+/*
+ * Lookup the VacAttrStats info for the selected columns, with indexes
+ * matching the attrs vector (to make it easy to work with when
+ * computing multivariate stats).
+ */
+static VacAttrStats **
+lookup_var_attr_stats(int2vector *attrs, int natts, VacAttrStats **vacattrstats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+ VacAttrStats **stats = (VacAttrStats**)palloc0(numattrs * sizeof(VacAttrStats*));
+
+ /* lookup VacAttrStats info for the requested columns (same attnum) */
+ for (i = 0; i < numattrs; i++)
+ {
+ stats[i] = NULL;
+ for (j = 0; j < natts; j++)
+ {
+ if (attrs->values[i] == vacattrstats[j]->tupattnum)
+ {
+ stats[i] = vacattrstats[j];
+ break;
+ }
+ }
+
+ /*
+ * Check that we found the info, that the attnum matches, and that
+ * the requested 'lt' operator exists (i.e. the attribute was
+ * analyzed as a scalar type).
+ */
+ Assert(stats[i] != NULL);
+ Assert(stats[i]->tupattnum == attrs->values[i]);
+
+ /* FIXME This is rather ugly way to check for 'ltopr' (which
+ * is defined for 'scalar' attributes).
+ */
+ Assert(((StdAnalyzeData *)stats[i]->extra_data)->ltopr != InvalidOid);
+ }
+
+ return stats;
+}
+
+/*
+ * Fetch list of MV stats defined on a table, without the actual data
+ * for histograms, MCV lists etc.
+ */
+static List*
+list_mv_stats(Oid relid)
+{
+ Relation indrel;
+ SysScanDesc indscan;
+ ScanKeyData skey;
+ HeapTuple htup;
+ List *result = NIL;
+
+ /* Prepare to scan pg_mv_statistic for entries having starelid = this rel. */
+ ScanKeyInit(&skey,
+ Anum_pg_mv_statistic_starelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+
+ indrel = heap_open(MvStatisticRelationId, AccessShareLock);
+ indscan = systable_beginscan(indrel, MvStatisticRelidIndexId, true,
+ NULL, 1, &skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ {
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ Form_pg_mv_statistic stats = (Form_pg_mv_statistic) GETSTRUCT(htup);
+
+ info->mvoid = HeapTupleGetOid(htup);
+ info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_built = stats->deps_built;
+
+ result = lappend(result, info);
+ }
+
+ systable_endscan(indscan);
+
+ heap_close(indrel, AccessShareLock);
+
+ /* TODO Maybe cache the list in the relcache, as RelationGetIndexList
+ * (which served as inspiration for this function) does. */
+
+ return result;
+}
+
+void
+update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+{
+ HeapTuple stup,
+ oldtup;
+ Datum values[Natts_pg_mv_statistic];
+ bool nulls[Natts_pg_mv_statistic];
+ bool replaces[Natts_pg_mv_statistic];
+
+ Relation sd = heap_open(MvStatisticRelationId, RowExclusiveLock);
+
+ memset(nulls, 1, Natts_pg_mv_statistic * sizeof(bool));
+ memset(replaces, 0, Natts_pg_mv_statistic * sizeof(bool));
+ memset(values, 0, Natts_pg_mv_statistic * sizeof(Datum));
+
+ /*
+ * Construct a new pg_mv_statistic tuple - replace only the dependencies
+ * value, depending on whether it actually was computed.
+ */
+ if (dependencies != NULL)
+ {
+ nulls[Anum_pg_mv_statistic_stadeps -1] = false;
+ values[Anum_pg_mv_statistic_stadeps - 1]
+ = PointerGetDatum(serialize_mv_dependencies(dependencies));
+ }
+
+ /* always replace the value (either by bytea or NULL) */
+ replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+
+ /* always change the availability flags */
+ nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_stakeys-1] = false;
+
+ /* use the new attnums, in case we removed some dropped ones */
+ replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_stakeys -1] = true;
+
+ values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
+
+ /* Is there already a pg_mv_statistic tuple for these statistics? */
+ oldtup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ if (HeapTupleIsValid(oldtup))
+ {
+ /* Yes, replace it */
+ stup = heap_modify_tuple(oldtup,
+ RelationGetDescr(sd),
+ values,
+ nulls,
+ replaces);
+ ReleaseSysCache(oldtup);
+ simple_heap_update(sd, &stup->t_self, stup);
+ }
+ else
+ elog(ERROR, "invalid pg_mv_statistic record (oid=%u)", mvoid);
+
+ /* update indexes too */
+ CatalogUpdateIndexes(sd, stup);
+
+ heap_freetuple(stup);
+
+ heap_close(sd, RowExclusiveLock);
+}
+
+/* multi-variate stats comparator */
+
+/*
+ * qsort_arg comparator for sorting Datums (MV stats)
+ *
+ * This does not maintain the tupnoLink array.
+ */
+int
+compare_scalars_simple(const void *a, const void *b, void *arg)
+{
+ Datum da = *(Datum*)a;
+ Datum db = *(Datum*)b;
+ SortSupport ssup = (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/*
+ * qsort_arg comparator for sorting data when partitioning a MV bucket
+ */
+int
+compare_scalars_partition(const void *a, const void *b, void *arg)
+{
+ Datum da = ((ScalarItem*)a)->value;
+ Datum db = ((ScalarItem*)b)->value;
+ SortSupport ssup = (SortSupport) arg;
+
+ return ApplySortComparator(da, false, db, false, ssup);
+}
+
+/* initialize multi-dimensional sort */
+MultiSortSupport
+multi_sort_init(int ndims)
+{
+ MultiSortSupport mss;
+
+ Assert(ndims >= 2);
+
+ mss = (MultiSortSupport)palloc0(offsetof(MultiSortSupportData, ssup)
+ + sizeof(SortSupportData)*ndims);
+
+ mss->ndims = ndims;
+
+ return mss;
+}
+
+/*
+ * add sort support for dimension 'dim' (index into vacattrstats) to mss,
+ * at position 'sortdim'
+ */
+ */
+void
+multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats)
+{
+ /* first, lookup StdAnalyzeData for the dimension (attribute) */
+ SortSupportData ssup;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)vacattrstats[dim]->extra_data;
+
+ Assert(mss != NULL);
+ Assert(sortdim < mss->ndims);
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup);
+
+ mss->ssup[sortdim] = ssup;
+}
+
+/* compare all the dimensions in the selected order */
+int
+multi_sort_compare(const void *a, const void *b, void *arg)
+{
+ int i;
+ SortItem *ia = (SortItem*)a;
+ SortItem *ib = (SortItem*)b;
+
+ MultiSortSupport mss = (MultiSortSupport)arg;
+
+ for (i = 0; i < mss->ndims; i++)
+ {
+ int compare;
+
+ compare = ApplySortComparator(ia->values[i], ia->isnull[i],
+ ib->values[i], ib->isnull[i],
+ &mss->ssup[i]);
+
+ if (compare != 0)
+ return compare;
+
+ }
+
+ /* equal by default */
+ return 0;
+}
+
+/* compare selected dimension */
+int
+multi_sort_compare_dim(int dim, const SortItem *a, const SortItem *b,
+ MultiSortSupport mss)
+{
+ return ApplySortComparator(a->values[dim], a->isnull[dim],
+ b->values[dim], b->isnull[dim],
+ &mss->ssup[dim]);
+}
+
+int
+multi_sort_compare_dims(int start, int end,
+ const SortItem *a, const SortItem *b,
+ MultiSortSupport mss)
+{
+ int dim;
+
+ for (dim = start; dim <= end; dim++)
+ {
+ int r = ApplySortComparator(a->values[dim], a->isnull[dim],
+ b->values[dim], b->isnull[dim],
+ &mss->ssup[dim]);
+
+ if (r != 0)
+ return r;
+ }
+
+ return 0;
+}
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
new file mode 100644
index 0000000..75b9c54
--- /dev/null
+++ b/src/backend/utils/mvstats/common.h
@@ -0,0 +1,78 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.h
+ * POSTGRES multivariate statistics
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/common.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/sysattr.h"
+#include "access/tuptoaster.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_mv_statistic.h"
+#include "foreign/fdwapi.h"
+#include "postmaster/autovacuum.h"
+#include "storage/lmgr.h"
+#include "utils/builtins.h"
+#include "utils/datum.h"
+#include "utils/fmgroids.h"
+#include "utils/mvstats.h"
+#include "utils/sortsupport.h"
+#include "utils/syscache.h"
+
+
+/* FIXME private structure copied from analyze.c */
+
+typedef struct
+{
+ Oid eqopr; /* '=' operator for datatype, if any */
+ Oid eqfunc; /* and associated function */
+ Oid ltopr; /* '<' operator for datatype, if any */
+} StdAnalyzeData;
+
+typedef struct
+{
+ Datum value; /* a data value */
+ int tupno; /* position index for tuple it came from */
+} ScalarItem;
+
+/* multi-sort */
+typedef struct MultiSortSupportData {
+ int ndims; /* number of dimensions supported by the sort */
+ SortSupportData ssup[1]; /* sort support data for each dimension */
+} MultiSortSupportData;
+
+typedef MultiSortSupportData* MultiSortSupport;
+
+typedef struct SortItem {
+ Datum *values;
+ bool *isnull;
+} SortItem;
+
+MultiSortSupport multi_sort_init(int ndims);
+
+void multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
+ int dim, VacAttrStats **vacattrstats);
+
+int multi_sort_compare(const void *a, const void *b, void *arg);
+
+int multi_sort_compare_dim(int dim, const SortItem *a,
+ const SortItem *b, MultiSortSupport mss);
+
+int multi_sort_compare_dims(int start, int end, const SortItem *a,
+ const SortItem *b, MultiSortSupport mss);
+
+/* comparators, used when constructing multivariate stats */
+int compare_scalars_simple(const void *a, const void *b, void *arg);
+int compare_scalars_partition(const void *a, const void *b, void *arg);
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
new file mode 100644
index 0000000..5437bdf
--- /dev/null
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -0,0 +1,686 @@
+/*-------------------------------------------------------------------------
+ *
+ * dependencies.c
+ * POSTGRES multivariate functional dependencies
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/dependencies.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+/* internal state for generator of variations (k-permutations of n elements) */
+typedef struct VariationGeneratorData {
+
+ int k; /* size of the k-permutation */
+ int current; /* index of the next variation to return */
+
+ int nvariations; /* number of variations generated (size of array) */
+ int variations[1]; /* array of pre-built variations */
+
+} VariationGeneratorData;
+
+typedef VariationGeneratorData* VariationGenerator;
+
+/*
+ * generate all variations (k-permutations of n elements)
+ */
+static void
+generate_variations(VariationGenerator state,
+ int n, int maxlevel, int level, int *current)
+{
+ int i, j;
+
+ /* initialize */
+ if (level == 0)
+ {
+ current = (int*)palloc0(sizeof(int) * (maxlevel+1));
+ state->current = 0;
+ }
+
+ for (i = 0; i < n; i++)
+ {
+ /* check if the value is already used in the current variation */
+ bool found = false;
+ for (j = 0; j < level; j++)
+ {
+ if (current[j] == i)
+ {
+ found = true;
+ break;
+ }
+ }
+
+ /* already used, so try the next element */
+ if (found)
+ continue;
+
+ /* ok, we can use this element, so store it */
+ current[level] = i;
+
+ /* and check if we do have a complete variation of k elements */
+ if (level == maxlevel)
+ {
+ /* yep, store the variation */
+ Assert(state->current < state->nvariations);
+ memcpy(&state->variations[(state->k * state->current)], current,
+ sizeof(int) * (maxlevel+1));
+ state->current++;
+ }
+ else
+ /* nope, look for additional elements */
+ generate_variations(state, n, maxlevel, level+1, current);
+ }
+
+ if (level == 0)
+ pfree(current);
+}
+
+/*
+ * initialize the generator of variations, and prebuild the variations
+ *
+ * This pre-builds all the variations. We could also generate them in
+ * generator_next(), but this seems simpler.
+ */
+static VariationGenerator
+generator_init(int2vector *attrs, int k)
+{
+ int i;
+ int n = attrs->dim1;
+ int nvariations;
+ VariationGenerator state;
+
+ Assert((n >= k) && (k > 0));
+
+ /* compute the total number of variations as n!/(n-k)! */
+ nvariations = n;
+ for (i = 1; i < k; i++)
+ nvariations *= (n - i);
+
+ /* allocate the generator state as a single chunk of memory */
+ state = (VariationGenerator)palloc0(
+ offsetof(VariationGeneratorData, variations)
+ + (nvariations * k * sizeof(int))); /* variations */
+
+ state->nvariations = nvariations;
+ state->k = k;
+
+ /* now actually pre-generate all the variations */
+ generate_variations(state, n, (k-1), 0, NULL);
+
+ /* we expect to generate exactly the right number of variations */
+ Assert(state->nvariations == state->current);
+
+ /* reset the index */
+ state->current = 0;
+
+ return state;
+}
+
+/* free the generator state */
+static void
+generator_free(VariationGenerator state)
+{
+ /* we've allocated a single chunk, so just free it */
+ pfree(state);
+}
+
+/* generate the next variation (or return NULL when all were produced) */
+static int*
+generator_next(VariationGenerator state, int2vector *attrs)
+{
+ if (state->current == state->nvariations)
+ return NULL;
+
+ return &state->variations[state->k * state->current++];
+}
+
+/*
+ * check if the dependency is implied by existing dependencies
+ *
+ * A dependency is considered implied, if there exists a dependency with the
+ * same column on the left, and a subset of columns on the right side. So for
+ * example if we have a dependency
+ *
+ * (a,b,c) -> d
+ *
+ * then we are looking for these six dependencies
+ *
+ * (a) -> d
+ * (b) -> d
+ * (c) -> d
+ * (a,b) -> d
+ * (a,c) -> d
+ * (b,c) -> d
+ *
+ * This does not detect transitive dependencies. For example if we have
+ *
+ * (a) -> b
+ * (b) -> c
+ *
+ * then obviously
+ *
+ * (a) -> c
+ *
+ * but this is not detected. Extending the method to handle transitive cases
+ * is future work.
+ */
+static bool
+dependency_is_implied(MVDependencies dependencies, int k, int *dependency,
+ int2vector * attrs)
+{
+ bool implied = false;
+ int i, j, l;
+ int *tmp;
+
+ if (dependencies == NULL)
+ return false;
+
+ tmp = (int*)palloc0(sizeof(int) * k);
+
+ /* translate the indexes to actual attribute numbers */
+ for (i = 0; i < k; i++)
+ tmp[i] = attrs->values[dependency[i]];
+
+ /* search for an existing dependency with a subset of the conditions */
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ bool contained = true;
+ MVDependency dep = dependencies->deps[i];
+
+ /* does the last attribute match? */
+ if (tmp[k-1] != dep->attributes[dep->nattributes-1])
+ continue; /* nope, no need to check this dependency further */
+
+ /* are the conditions superset of the existing dependency? */
+ for (j = 0; j < (dep->nattributes-1); j++)
+ {
+ bool found = false;
+
+ for (l = 0; l < (k-1); l++)
+ {
+ if (tmp[l] == dep->attributes[j])
+ {
+ found = true;
+ break;
+ }
+ }
+
+ /* we've found an attribute not included in the new dependency */
+ if (! found)
+ {
+ contained = false;
+ break;
+ }
+ }
+
+ /* we've found an existing dependency, trivially proving the new one */
+ if (contained)
+ {
+ implied = true;
+ break;
+ }
+ }
+
+ pfree(tmp);
+
+ return implied;
+}
+
+/*
+ * validates a functional dependency on the data
+ *
+ * The actual workhorse of detecting functional dependencies. Given a
+ * variation of k attributes, it checks whether the first (k-1) are
+ * sufficient to determine the last one.
+ */
+static bool
+dependency_is_valid(int numrows, HeapTuple *rows, int k, int * dependency,
+ VacAttrStats **stats, int2vector *attrs)
+{
+ int i, j;
+ int nvalues = numrows * k;
+
+ /*
+ * XXX Maybe the threshold should be somehow related to the number of
+ * distinct values in the combination of columns we're analyzing.
+ * Assuming the distribution is uniform, we can estimate the average
+ * group size and use it as a threshold, similarly to what we do for
+ * MCV lists.
+ */
+ int min_group_size = 3;
+
+ /* number of groups supporting / contradicting the dependency */
+ int n_supporting = 0;
+ int n_contradicting = 0;
+
+ /* counters valid within a group */
+ int group_size = 0;
+ int n_violations = 0;
+
+ int n_supporting_rows = 0;
+ int n_contradicting_rows = 0;
+
+ /* sort info for all the columns */
+ MultiSortSupport mss = multi_sort_init(k);
+
+ /* data for the sort */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * nvalues);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * nvalues);
+
+ /* fix the pointers to values/isnull */
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * k];
+ items[i].isnull = &isnull[i * k];
+ }
+
+ /*
+ * Verify the dependency (a,b,...)->z, using a rather simple algorithm:
+ *
+ * (a) sort the data lexicographically
+ *
+ * (b) split the data into groups by first (k-1) columns
+ *
+ * (c) for each group count different values in the last column
+ */
+
+ /* prepare the sort functions for all the dimensions, and fill the items */
+ for (i = 0; i < k; i++)
+ {
+ multi_sort_add_dimension(mss, i, dependency[i], stats);
+
+ /* accumulate the values for this column into the item array */
+ for (j = 0; j < numrows; j++)
+ {
+ items[j].values[i]
+ = heap_getattr(rows[j], attrs->values[dependency[i]],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+ }
+ }
+
+ /* sort the items so that we can detect the groups */
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /*
+ * Walk through the sorted array, split it into groups according to the first
+ * (k-1) columns. If there's a single value in the last column, we count
+ * the group as 'supporting' the functional dependency. Otherwise we count
+ * it as contradicting.
+ *
+ * We also require a group to have a minimum number of rows to be considered
+ * useful for supporting the dependency. Contradicting groups may be of
+ * any size, though.
+ *
+ * XXX The minimum size requirement makes it impossible to identify case
+ * when both columns are unique (or nearly unique), and therefore
+ * trivially functionally dependent.
+ */
+
+ /* start with the first row forming a group */
+ group_size = 1;
+
+ for (i = 1; i < numrows; i++)
+ {
+ /* end of the preceding group */
+ if (multi_sort_compare_dims(0, (k-2), &items[i-1], &items[i], mss) != 0)
+ {
+ /*
+ * If there were no contradicting rows and the group is large enough,
+ * count it as supporting; if there were any violations, count it as
+ * contradicting.
+ */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations > 0)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ /* current values start a new group */
+ n_violations = 0;
+ group_size = 0;
+ }
+ /* first columns match, but the last one does not (so contradicting) */
+ else if (multi_sort_compare_dims((k-1), (k-1), &items[i-1], &items[i], mss) != 0)
+ n_violations += 1;
+
+ group_size += 1;
+ }
+
+ /* handle the last group (just like above) */
+ if ((n_violations == 0) && (group_size >= min_group_size))
+ {
+ n_supporting += 1;
+ n_supporting_rows += group_size;
+ }
+ else if (n_violations)
+ {
+ n_contradicting += 1;
+ n_contradicting_rows += group_size;
+ }
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+ pfree(mss);
+
+ /*
+ * See if the number of rows supporting the association is at least 10x the
+ * number of rows violating the hypothetical dependency.
+ */
+ return (n_supporting_rows > (n_contradicting_rows * 10));
+}
+
+/*
+ * detects functional dependencies between groups of columns
+ *
+ * Generates all possible subsets of columns (variations) and checks if the
+ * last one is determined by the preceding ones. For example given 3 columns,
+ * there are 12 variations (6 for variations on 2 columns, 6 for 3 columns):
+ *
+ * two columns three columns
+ * ----------- -------------
+ * (a) -> c (a,b) -> c
+ * (b) -> c (b,a) -> c
+ * (a) -> b (a,c) -> b
+ * (c) -> b (c,a) -> b
+ * (c) -> a (c,b) -> a
+ * (b) -> a (b,c) -> a
+ *
+ * Clearly some of the variations are redundant, as the order of columns on the
+ * left side does not matter. This is detected in dependency_is_implied, and
+ * those dependencies are ignored.
+ *
+ * We however do not detect that dependencies are transitively implied. For
+ * example given dependencies
+ *
+ * (a) -> b
+ * (b) -> c
+ *
+ * then
+ *
+ * (a) -> c
+ *
+ * is trivially implied. However we don't detect that and all three dependencies
+ * will get included in the resulting set. Eliminating such transitively implied
+ * dependencies is future work.
+ */
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int k;
+ int numattrs = attrs->dim1;
+
+ /* result */
+ MVDependencies dependencies = NULL;
+
+ Assert(numattrs >= 2);
+
+ /*
+ * We'll try to build functional dependencies starting from the smallest
+ * ones, covering just 2 columns, up to the largest ones, covering all
+ * columns included in the statistics. We start from the smallest ones
+ * because we want to be able to skip the already implied ones.
+ */
+ for (k = 2; k <= numattrs; k++)
+ {
+ int *dependency; /* array with k elements */
+
+ /* prepare a generator of variations */
+ VariationGenerator generator = generator_init(attrs, k);
+
+ /* generate all possible variations of k values (out of n) */
+ while ((dependency = generator_next(generator, attrs)))
+ {
+ MVDependency d;
+
+ /* skip dependencies that are already trivially implied */
+ if (dependency_is_implied(dependencies, k, dependency, attrs))
+ continue;
+
+ /* also skip dependencies that don't seem to be valid */
+ if (! dependency_is_valid(numrows, rows, k, dependency, stats, attrs))
+ continue;
+
+ d = (MVDependency)palloc0(offsetof(MVDependencyData, attributes)
+ + k * sizeof(int));
+
+ /* copy the dependency, but translate it to actual attnums */
+ d->nattributes = k;
+ for (i = 0; i < k; i++)
+ d->attributes[i] = attrs->values[dependency[i]];
+
+ /* initialize the list of dependencies */
+ if (dependencies == NULL)
+ {
+ dependencies
+ = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+
+ dependencies->magic = MVSTAT_DEPS_MAGIC;
+ dependencies->type = MVSTAT_DEPS_TYPE_BASIC;
+ dependencies->ndeps = 0;
+ }
+
+ dependencies->ndeps++;
+ dependencies = (MVDependencies)repalloc(dependencies,
+ offsetof(MVDependenciesData, deps)
+ + dependencies->ndeps * sizeof(MVDependency));
+
+ dependencies->deps[dependencies->ndeps-1] = d;
+ }
+
+ /* we're done with variations of k elements, so free the generator */
+ generator_free(generator);
+ }
+
+ return dependencies;
+}
+
+
+/*
+ * serialize list of dependencies into a bytea
+ */
+bytea *
+serialize_mv_dependencies(MVDependencies dependencies)
+{
+ int i;
+ bytea * output;
+ char *tmp;
+
+ /* we need to store ndeps, with a number of attributes for each one */
+ Size len = VARHDRSZ + offsetof(MVDependenciesData, deps)
+ + sizeof(int) * dependencies->ndeps;
+
+ /* and also include space for the actual attribute numbers */
+ for (i = 0; i < dependencies->ndeps; i++)
+ len += (sizeof(int16) * dependencies->deps[i]->nattributes);
+
+ output = (bytea*)palloc0(len);
+ SET_VARSIZE(output, len);
+
+ tmp = VARDATA(output);
+
+ /* first, store the number of dimensions / items */
+ memcpy(tmp, dependencies, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ /* store number of attributes and attribute numbers for each dependency */
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ MVDependency d = dependencies->deps[i];
+
+ memcpy(tmp, &(d->nattributes), sizeof(int));
+ tmp += sizeof(int);
+
+ memcpy(tmp, d->attributes, sizeof(int16) * d->nattributes);
+ tmp += sizeof(int16) * d->nattributes;
+
+ Assert(tmp <= ((char*)output + len));
+ }
+
+ return output;
+}
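+
+/*
+ * For reference, the serialized layout produced above is (after the
+ * varlena header):
+ *
+ * magic (uint32), type (uint32), ndeps (int32)
+ * then, for each dependency:
+ * nattributes (int), attributes[nattributes] (int16 each)
+ */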
+
+/*
+ * Reads serialized dependencies into MVDependencies structure.
+ */
+MVDependencies
+deserialize_mv_dependencies(bytea * data)
+{
+ int i;
+ Size expected_size;
+ MVDependencies dependencies;
+ char *tmp;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVDependenciesData,deps))
+ elog(ERROR, "invalid MVDependencies size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVDependenciesData,deps));
+
+ /* read the MVDependencies header */
+ dependencies = (MVDependencies)palloc0(sizeof(MVDependenciesData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(dependencies, tmp, offsetof(MVDependenciesData, deps));
+ tmp += offsetof(MVDependenciesData, deps);
+
+ if (dependencies->magic != MVSTAT_DEPS_MAGIC)
+ elog(ERROR, "invalid dependency magic %u (expected %u)",
+ dependencies->magic, MVSTAT_DEPS_MAGIC);
+
+ if (dependencies->type != MVSTAT_DEPS_TYPE_BASIC)
+ elog(ERROR, "invalid dependency type %d (expected %dd)",
+ dependencies->type, MVSTAT_DEPS_TYPE_BASIC);
+
+ Assert(dependencies->ndeps > 0);
+
+ /* what minimum bytea size do we expect for those parameters */
+ expected_size = offsetof(MVDependenciesData,deps) +
+ dependencies->ndeps * (sizeof(int) + sizeof(int16) * 2);
+
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid dependencies size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* allocate space for the dependency pointers */
+ dependencies = repalloc(dependencies, offsetof(MVDependenciesData,deps)
+ + (dependencies->ndeps * sizeof(MVDependency)));
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ int k;
+ MVDependency d;
+
+ /* number of attributes */
+ memcpy(&k, tmp, sizeof(int));
+ tmp += sizeof(int);
+
+ /* is the number of attributes valid? */
+ Assert((k >= 2) && (k <= MVSTATS_MAX_DIMENSIONS));
+
+ /* now that we know the number of attributes, allocate the dependency */
+ d = (MVDependency)palloc0(offsetof(MVDependencyData, attributes)
+ + k * sizeof(int));
+
+ d->nattributes = k;
+
+ /* copy attribute numbers */
+ memcpy(d->attributes, tmp, sizeof(int16) * d->nattributes);
+ tmp += sizeof(int16) * d->nattributes;
+
+ dependencies->deps[i] = d;
+
+ /* still within the bytea */
+ Assert(tmp <= ((char*)data + VARSIZE_ANY(data)));
+ }
+
+ /* we should have consumed the whole bytea exactly */
+ Assert(tmp == ((char*)data + VARSIZE_ANY(data)));
+
+ return dependencies;
+}
+
+/* print some basic info about dependencies (number of dependencies) */
+Datum
+pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ result = palloc0(128);
+ snprintf(result, 128, "dependencies=%d", dependencies->ndeps);
+
+ /* FIXME free the deserialized data (pfree is not enough) */
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/*
+ * print the dependencies
+ *
+ * TODO Would be nice if this printed column names (instead of just attnums).
+ */
+Datum
+pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
+{
+ int i, j;
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ StringInfoData buf;
+
+ MVDependencies dependencies = deserialize_mv_dependencies(data);
+
+ if (dependencies == NULL)
+ PG_RETURN_NULL();
+
+ initStringInfo(&buf);
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ MVDependency dependency = dependencies->deps[i];
+
+ if (i > 0)
+ appendStringInfo(&buf, ", ");
+
+ /* conditions */
+ appendStringInfoChar(&buf, '(');
+ for (j = 0; j < dependency->nattributes-1; j++)
+ {
+ if (j > 0)
+ appendStringInfoChar(&buf, ',');
+
+ appendStringInfo(&buf, "%d", dependency->attributes[j]);
+ }
+
+ /* the implied attribute */
+ appendStringInfo(&buf, ") => %d",
+ dependency->attributes[dependency->nattributes-1]);
+ }
+
+ PG_RETURN_TEXT_P(cstring_to_text(buf.data));
+}
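+
+/*
+ * A usage sketch (mirroring the regression tests): both functions are
+ * intended to be called on pg_mv_statistic.stadeps, e.g.
+ *
+ * SELECT staname, pg_mv_stats_dependencies_info(stadeps),
+ * pg_mv_stats_dependencies_show(stadeps)
+ * FROM pg_mv_statistic;
+ */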
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index fd8dc91..8ce9c0e 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2104,6 +2104,50 @@ describeOneTableDetails(const char *schemaname,
PQclear(result);
}
+ /* print any multivariate statistics */
+ if (pset.sversion >= 90600)
+ {
+ printfPQExpBuffer(&buf,
+ "SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
+ " deps_enabled,\n"
+ " deps_built,\n"
+ " (SELECT string_agg(attname::text,', ')\n"
+ " FROM ((SELECT unnest(stakeys) AS attnum) s\n"
+ " JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
+ "FROM pg_mv_statistic stat WHERE starelid = '%s' ORDER BY 1;",
+ oid);
+
+ result = PSQLexec(buf.data);
+ if (!result)
+ goto error_return;
+ else
+ tuples = PQntuples(result);
+
+ if (tuples > 0)
+ {
+ printTableAddFooter(&cont, _("Statistics:"));
+ for (i = 0; i < tuples; i++)
+ {
+ printfPQExpBuffer(&buf, " ");
+
+ /* statistics name (qualified with namespace) */
+ appendPQExpBuffer(&buf, "\"%s.%s\" ",
+ PQgetvalue(result, i, 1),
+ PQgetvalue(result, i, 2));
+
+ /* options */
+ if (!strcmp(PQgetvalue(result, i, 4), "t"))
+ appendPQExpBuffer(&buf, "(dependencies)");
+
+ appendPQExpBuffer(&buf, " ON (%s)",
+ PQgetvalue(result, i, 6));
+
+ printTableAddFooter(&cont, buf.data);
+ }
+ }
+ PQclear(result);
+ }
+
/* print rules */
if (tableinfo.hasrules && tableinfo.relkind != 'm')
{
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 049bf9f..12211fe 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -153,10 +153,11 @@ typedef enum ObjectClass
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
OCLASS_POLICY, /* pg_policy */
- OCLASS_TRANSFORM /* pg_transform */
+ OCLASS_TRANSFORM, /* pg_transform */
+ OCLASS_STATISTICS /* pg_mv_statistics */
} ObjectClass;
-#define LAST_OCLASS OCLASS_TRANSFORM
+#define LAST_OCLASS OCLASS_STATISTICS
/* in dependency.c */
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index b80d8d8..5ae42f7 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -119,6 +119,7 @@ extern void RemoveAttrDefault(Oid relid, AttrNumber attnum,
DropBehavior behavior, bool complain, bool internal);
extern void RemoveAttrDefaultById(Oid attrdefId);
extern void RemoveStatistics(Oid relid, AttrNumber attnum);
+extern void RemoveMVStatistics(Oid relid, AttrNumber attnum);
extern Form_pg_attribute SystemAttributeDefinition(AttrNumber attno,
bool relhasoids);
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index ab2c1a8..a768bb5 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -173,6 +173,13 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
#define LargeObjectMetadataOidIndexId 2996
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_oid_index, 3380, on pg_mv_statistic using btree(oid oid_ops));
+#define MvStatisticOidIndexId 3380
+DECLARE_UNIQUE_INDEX(pg_mv_statistic_name_index, 3997, on pg_mv_statistic using btree(staname name_ops, stanamespace oid_ops));
+#define MvStatisticNameIndexId 3997
+DECLARE_INDEX(pg_mv_statistic_relid_index, 3379, on pg_mv_statistic using btree(starelid oid_ops));
+#define MvStatisticRelidIndexId 3379
+
DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
#define NamespaceNameIndexId 2684
DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index 2ccb3a7..44cf9c6 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -137,6 +137,8 @@ extern Oid get_collation_oid(List *collname, bool missing_ok);
extern Oid get_conversion_oid(List *conname, bool missing_ok);
extern Oid FindDefaultConversionProc(int32 for_encoding, int32 to_encoding);
+extern Oid get_statistics_oid(List *names, bool missing_ok);
+
/* initialization & transaction cleanup code */
extern void InitializeSearchPath(void);
extern void AtEOXact_Namespace(bool isCommit, bool parallel);
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
new file mode 100644
index 0000000..c74af47
--- /dev/null
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_mv_statistic.h
+ * definition of the system "multivariate statistic" relation (pg_mv_statistic)
+ * along with the relation's initial contents.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_mv_statistic.h
+ *
+ * NOTES
+ * the genbki.pl script reads this file and generates .bki
+ * information from the DATA() statements.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_MV_STATISTIC_H
+#define PG_MV_STATISTIC_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_mv_statistic definition. cpp turns this into
+ * typedef struct FormData_pg_mv_statistic
+ * ----------------
+ */
+#define MvStatisticRelationId 3381
+
+CATALOG(pg_mv_statistic,3381)
+{
+ /* These fields form the unique key for the entry: */
+ Oid starelid; /* relation containing attributes */
+ NameData staname; /* statistics name */
+ Oid stanamespace; /* OID of namespace containing this statistics */
+ Oid staowner; /* statistics owner */
+
+ /* statistics requested to build */
+ bool deps_enabled; /* analyze dependencies? */
+
+ /* statistics that are available (if requested) */
+ bool deps_built; /* dependencies were built */
+
+ /* variable-length fields start here, but we allow direct access to stakeys */
+ int2vector stakeys; /* array of column keys */
+
+#ifdef CATALOG_VARLEN
+ bytea stadeps; /* dependencies (serialized) */
+#endif
+
+} FormData_pg_mv_statistic;
+
+/* ----------------
+ * Form_pg_mv_statistic corresponds to a pointer to a tuple with
+ * the format of pg_mv_statistic relation.
+ * ----------------
+ */
+typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
+
+/* ----------------
+ * compiler constants for pg_mv_statistic
+ * ----------------
+ */
+#define Natts_pg_mv_statistic 8
+#define Anum_pg_mv_statistic_starelid 1
+#define Anum_pg_mv_statistic_staname 2
+#define Anum_pg_mv_statistic_stanamespace 3
+#define Anum_pg_mv_statistic_staowner 4
+#define Anum_pg_mv_statistic_deps_enabled 5
+#define Anum_pg_mv_statistic_deps_built 6
+#define Anum_pg_mv_statistic_stakeys 7
+#define Anum_pg_mv_statistic_stadeps 8
+
+#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index ceb8129..cdcbf95 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2666,6 +2666,11 @@ DESCR("current user privilege on any column by rel name");
DATA(insert OID = 3029 ( has_any_column_privilege PGNSP PGUID 12 10 0 0 0 f f f f t f s s 2 0 16 "26 25" _null_ _null_ _null_ _null_ _null_ has_any_column_privilege_id _null_ _null_ _null_ ));
DESCR("current user privilege on any column by rel oid");
+DATA(insert OID = 3998 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_info _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies info");
+DATA(insert OID = 3999 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
+DESCR("multivariate stats: functional dependencies show");
+
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
DATA(insert OID = 1929 ( pg_stat_get_tuples_returned PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_tuples_returned _null_ _null_ _null_ ));
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index b7a38ce..a52096b 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -49,6 +49,7 @@ extern void BootstrapToastTable(char *relName,
DECLARE_TOAST(pg_attrdef, 2830, 2831);
DECLARE_TOAST(pg_constraint, 2832, 2833);
DECLARE_TOAST(pg_description, 2834, 2835);
+DECLARE_TOAST(pg_mv_statistic, 3577, 3578);
DECLARE_TOAST(pg_proc, 2836, 2837);
DECLARE_TOAST(pg_rewrite, 2838, 2839);
DECLARE_TOAST(pg_seclabel, 3598, 3599);
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 54f67e9..99a6a62 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -75,6 +75,10 @@ extern ObjectAddress DefineOperator(List *names, List *parameters);
extern void RemoveOperatorById(Oid operOid);
extern ObjectAddress AlterOperator(AlterOperatorStmt *stmt);
+/* commands/statscmds.c */
+extern ObjectAddress CreateStatistics(CreateStatsStmt *stmt);
+extern void RemoveStatisticsById(Oid statsOid);
+
/* commands/aggregatecmds.c */
extern ObjectAddress DefineAggregate(List *name, List *args, bool oldstyle,
List *parameters, const char *queryString);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fad9988..545b62a 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -266,6 +266,7 @@ typedef enum NodeTag
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
+ T_MVStatisticInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
@@ -401,6 +402,7 @@ typedef enum NodeTag
T_CreatePolicyStmt,
T_AlterPolicyStmt,
T_CreateTransformStmt,
+ T_CreateStatsStmt,
/*
* TAGS FOR PARSE TREE NODES (parsenodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2fd0629..e1807fb 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -601,6 +601,17 @@ typedef struct ColumnDef
int location; /* parse location, or -1 if none/unknown */
} ColumnDef;
+typedef struct CreateStatsStmt
+{
+ NodeTag type;
+ List *defnames; /* qualified name (list of Value strings) */
+ RangeVar *relation; /* relation to build statistics on */
+ List *keys; /* String nodes naming referenced column(s) */
+ List *options; /* list of DefElem nodes */
+ bool if_not_exists; /* just do nothing if statistics already exists? */
+} CreateStatsStmt;
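+
+/*
+ * For example (as exercised by the regression tests), this node
+ * represents statements like
+ *
+ * CREATE STATISTICS s1 ON functional_dependencies (a, b, c)
+ * WITH (dependencies);
+ */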
+
+
/*
* TableLikeClause - CREATE TABLE ( ... LIKE ... ) clause
*/
@@ -1410,6 +1421,7 @@ typedef enum ObjectType
OBJECT_RULE,
OBJECT_SCHEMA,
OBJECT_SEQUENCE,
+ OBJECT_STATISTICS,
OBJECT_TABCONSTRAINT,
OBJECT_TABLE,
OBJECT_TABLESPACE,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index bdea72c..75c4752 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -541,6 +541,7 @@ typedef struct RelOptInfo
List *lateral_vars; /* LATERAL Vars and PHVs referenced by rel */
Relids lateral_referencers; /* rels that reference me laterally */
List *indexlist; /* list of IndexOptInfo */
+ List *mvstatlist; /* list of MVStatisticInfo */
BlockNumber pages; /* size estimates derived from pg_class */
double tuples;
double allvisfrac;
@@ -636,6 +637,33 @@ typedef struct IndexOptInfo
void (*amcostestimate) (); /* AM's cost estimator */
} IndexOptInfo;
+/*
+ * MVStatisticInfo
+ * Information about multivariate stats for planning/optimization
+ *
+ * This contains information about which columns are covered by the
+ * statistics (stakeys), which options were requested while adding the
+ * statistics (*_enabled), and which kinds of statistics were actually
+ * built and are available for the optimizer (*_built).
+ */
+typedef struct MVStatisticInfo
+{
+ NodeTag type;
+
+ Oid mvoid; /* OID of the statistics row */
+ RelOptInfo *rel; /* back-link to index's table */
+
+ /* enabled statistics */
+ bool deps_enabled; /* functional dependencies enabled */
+
+ /* built/available statistics */
+ bool deps_built; /* functional dependencies built */
+
+ /* columns in the statistics (attnums) */
+ int2vector *stakeys; /* attnums of the columns covered */
+
+} MVStatisticInfo;
+
/*
* EquivalenceClasses
diff --git a/src/include/utils/acl.h b/src/include/utils/acl.h
index 4e15a14..3e11253 100644
--- a/src/include/utils/acl.h
+++ b/src/include/utils/acl.h
@@ -330,6 +330,7 @@ extern bool pg_foreign_data_wrapper_ownercheck(Oid srv_oid, Oid roleid);
extern bool pg_foreign_server_ownercheck(Oid srv_oid, Oid roleid);
extern bool pg_event_trigger_ownercheck(Oid et_oid, Oid roleid);
extern bool pg_extension_ownercheck(Oid ext_oid, Oid roleid);
+extern bool pg_statistics_ownercheck(Oid stat_oid, Oid roleid);
extern bool has_createrole_privilege(Oid roleid);
extern bool has_bypassrls_privilege(Oid roleid);
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
new file mode 100644
index 0000000..7837bc0
--- /dev/null
+++ b/src/include/utils/mvstats.h
@@ -0,0 +1,71 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvstats.h
+ * Multivariate statistics and selectivity estimation functions.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/mvstats.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MVSTATS_H
+#define MVSTATS_H
+
+#include "fmgr.h"
+#include "commands/vacuum.h"
+
+
+#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
+
+/*
+ * Functional dependencies, tracking column-level relationships (values
+ * in one column determine values in another one).
+ */
+typedef struct MVDependencyData {
+ int nattributes; /* number of attributes */
+ int16 attributes[1]; /* attribute numbers */
+} MVDependencyData;
+
+typedef MVDependencyData* MVDependency;
+
+typedef struct MVDependenciesData {
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of MV Dependencies (BASIC) */
+ int32 ndeps; /* number of dependencies */
+ MVDependency deps[1]; /* XXX why not a pointer? */
+} MVDependenciesData;
+
+typedef MVDependenciesData* MVDependencies;
+
+#define MVSTAT_DEPS_MAGIC 0xB4549A2C /* marks serialized bytea */
+#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
+
+/*
+ * TODO Maybe fetching the histogram/MCV list separately is inefficient?
+ * Consider adding a single `fetch_stats` method, fetching all
+ * stats specified using flags (or something like that).
+ */
+
+bytea * serialize_mv_dependencies(MVDependencies dependencies);
+
+/* deserialization of stats (serialization is private to analyze) */
+MVDependencies deserialize_mv_dependencies(bytea * data);
+
+/* FIXME this probably belongs somewhere else (not to operations stats) */
+extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+
+MVDependencies
+build_mv_dependencies(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+ int natts, VacAttrStats **vacattrstats);
+
+void update_mv_stats(Oid relid, MVDependencies dependencies, int2vector *attrs);
+
+#endif
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index f2bebf2..8771f9c 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -61,6 +61,7 @@ typedef struct RelationData
bool rd_isvalid; /* relcache entry is valid */
char rd_indexvalid; /* state of rd_indexlist: 0 = not valid, 1 =
* valid, 2 = temporarily forced */
+ bool rd_mvstatvalid; /* state of rd_mvstatlist: true/false */
/*
* rd_createSubid is the ID of the highest subtransaction the rel has
@@ -93,6 +94,9 @@ typedef struct RelationData
List *rd_indexlist; /* list of OIDs of indexes on relation */
Oid rd_oidindex; /* OID of unique index on OID, if any */
Oid rd_replidindex; /* OID of replica identity index, if any */
+
+ /* data managed by RelationGetMVStatList: */
+ List *rd_mvstatlist; /* list of OIDs of multivariate stats */
/* data managed by RelationGetIndexAttrBitmap: */
Bitmapset *rd_indexattr; /* identifies columns used in indexes */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 1b48304..9f03c8d 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -38,6 +38,7 @@ extern void RelationClose(Relation relation);
* Routines to compute/retrieve additional cached information
*/
extern List *RelationGetIndexList(Relation relation);
+extern List *RelationGetMVStatList(Relation relation);
extern Oid RelationGetOidIndex(Relation relation);
extern Oid RelationGetReplicaIndex(Relation relation);
extern List *RelationGetIndexExpressions(Relation relation);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 256615b..0e0658d 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -66,6 +66,8 @@ enum SysCacheIdentifier
INDEXRELID,
LANGNAME,
LANGOID,
+ MVSTATNAMENSP,
+ MVSTATOID,
NAMESPACENAME,
NAMESPACEOID,
OPERNAMENSP,
diff --git a/src/test/regress/expected/mv_dependencies.out b/src/test/regress/expected/mv_dependencies.out
new file mode 100644
index 0000000..f54e1b7
--- /dev/null
+++ b/src/test/regress/expected/mv_dependencies.out
@@ -0,0 +1,150 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s1 ON functional_dependencies (unknown_column) WITH (dependencies);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s1 ON functional_dependencies (a) WITH (dependencies);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a,a) WITH (dependencies);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a, a, b) WITH (dependencies);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- correct command
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (dependencies);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | (1) => 2, (1) => 3, (2) => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | (1) => 2, (1) => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | (1) => 2, (1) => 3, (2) => 3
+(1 row)
+
+DROP TABLE functional_dependencies;
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s2 ON functional_dependencies (a, b, c) WITH (dependencies);
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | f |
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | (1) => 2, (1) => 3, (2) => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | (1) => 2, (1) => 3
+(1 row)
+
+TRUNCATE functional_dependencies;
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+-------------------------------
+ t | t | (1) => 2, (1) => 3, (2) => 3
+(1 row)
+
+DROP TABLE functional_dependencies;
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s3 ON functional_dependencies (a, b, c, d) WITH (dependencies);
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+ deps_enabled | deps_built | pg_mv_stats_dependencies_show
+--------------+------------+--------------------------------------------------
+ t | t | (2) => 1, (3) => 1, (3) => 2, (4) => 1, (4) => 2
+(1 row)
+
+DROP TABLE functional_dependencies;
diff --git a/src/test/regress/expected/object_address.out b/src/test/regress/expected/object_address.out
index 75751be..eb60960 100644
--- a/src/test/regress/expected/object_address.out
+++ b/src/test/regress/expected/object_address.out
@@ -35,6 +35,7 @@ ALTER DEFAULT PRIVILEGES FOR ROLE regtest_addr_user REVOKE DELETE ON TABLES FROM
CREATE TRANSFORM FOR int LANGUAGE SQL (
FROM SQL WITH FUNCTION varchar_transform(internal),
TO SQL WITH FUNCTION int4recv(internal));
+CREATE STATISTICS addr_nsp.gentable_stat ON addr_nsp.gentable(a,b) WITH (dependencies);
-- test some error cases
SELECT pg_get_object_address('stone', '{}', '{}');
ERROR: unrecognized object type "stone"
@@ -373,7 +374,8 @@ WITH objects (type, name, args) AS (VALUES
-- extension
-- event trigger
('policy', '{addr_nsp, gentable, genpol}', '{}'),
- ('transform', '{int}', '{sql}')
+ ('transform', '{int}', '{sql}'),
+ ('statistics', '{addr_nsp, gentable_stat}', '{}')
)
SELECT (pg_identify_object(addr1.classid, addr1.objid, addr1.subobjid)).*,
-- test roundtrip through pg_identify_object_as_address
@@ -420,13 +422,14 @@ SELECT (pg_identify_object(addr1.classid, addr1.objid, addr1.subobjid)).*,
trigger | | | t on addr_nsp.gentable | t
operator family | pg_catalog | integer_ops | pg_catalog.integer_ops USING btree | t
policy | | | genpol on addr_nsp.gentable | t
+ statistics | addr_nsp | gentable_stat | addr_nsp.gentable_stat | t
collation | pg_catalog | "default" | pg_catalog."default" | t
transform | | | for integer on language sql | t
text search dictionary | addr_nsp | addr_ts_dict | addr_nsp.addr_ts_dict | t
text search parser | addr_nsp | addr_ts_prs | addr_nsp.addr_ts_prs | t
text search configuration | addr_nsp | addr_ts_conf | addr_nsp.addr_ts_conf | t
text search template | addr_nsp | addr_ts_temp | addr_nsp.addr_ts_temp | t
-(41 rows)
+(42 rows)
---
--- Cleanup resources
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 22ea06c..06f2231 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1368,6 +1368,15 @@ pg_matviews| SELECT n.nspname AS schemaname,
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
WHERE (c.relkind = 'm'::"char");
+pg_mv_stats| SELECT n.nspname AS schemaname,
+ c.relname AS tablename,
+ s.staname,
+ s.stakeys AS attnums,
+ length(s.stadeps) AS depsbytes,
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ FROM ((pg_mv_statistic s
+ JOIN pg_class c ON ((c.oid = s.starelid)))
+ LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
pg_policies| SELECT n.nspname AS schemaname,
c.relname AS tablename,
pol.polname AS policyname,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index eb0bc88..92a0d8a 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -113,6 +113,7 @@ pg_inherits|t
pg_language|t
pg_largeobject|t
pg_largeobject_metadata|t
+pg_mv_statistic|t
pg_namespace|t
pg_opclass|t
pg_operator|t
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 7e9b319..097a04f 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -162,3 +162,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: mv_dependencies
diff --git a/src/test/regress/sql/mv_dependencies.sql b/src/test/regress/sql/mv_dependencies.sql
new file mode 100644
index 0000000..051633a
--- /dev/null
+++ b/src/test/regress/sql/mv_dependencies.sql
@@ -0,0 +1,142 @@
+-- data type passed by value
+CREATE TABLE functional_dependencies (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s1 ON functional_dependencies (unknown_column) WITH (dependencies);
+
+-- single column
+CREATE STATISTICS s1 ON functional_dependencies (a) WITH (dependencies);
+
+-- single column, duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a,a) WITH (dependencies);
+
+-- two columns, one duplicated
+CREATE STATISTICS s1 ON functional_dependencies (a, a, b) WITH (dependencies);
+
+-- unknown option
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (unknown_option);
+
+-- correct command
+CREATE STATISTICS s1 ON functional_dependencies (a, b, c) WITH (dependencies);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+DROP TABLE functional_dependencies;
+
+-- varlena type (text)
+CREATE TABLE functional_dependencies (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s2 ON functional_dependencies (a, b, c) WITH (dependencies);
+
+-- random data (no functional dependencies)
+INSERT INTO functional_dependencies
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c
+INSERT INTO functional_dependencies
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+TRUNCATE functional_dependencies;
+
+-- a => b, a => c, b => c
+INSERT INTO functional_dependencies
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+DROP TABLE functional_dependencies;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE functional_dependencies (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s3 ON functional_dependencies (a, b, c, d) WITH (dependencies);
+
+INSERT INTO functional_dependencies
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE functional_dependencies;
+
+SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
+ FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+
+DROP TABLE functional_dependencies;
diff --git a/src/test/regress/sql/object_address.sql b/src/test/regress/sql/object_address.sql
index 68e7cb0..3775b28 100644
--- a/src/test/regress/sql/object_address.sql
+++ b/src/test/regress/sql/object_address.sql
@@ -39,6 +39,7 @@ ALTER DEFAULT PRIVILEGES FOR ROLE regtest_addr_user REVOKE DELETE ON TABLES FROM
CREATE TRANSFORM FOR int LANGUAGE SQL (
FROM SQL WITH FUNCTION varchar_transform(internal),
TO SQL WITH FUNCTION int4recv(internal));
+CREATE STATISTICS addr_nsp.gentable_stat ON addr_nsp.gentable(a,b) WITH (dependencies);
-- test some error cases
SELECT pg_get_object_address('stone', '{}', '{}');
@@ -166,7 +167,8 @@ WITH objects (type, name, args) AS (VALUES
-- extension
-- event trigger
('policy', '{addr_nsp, gentable, genpol}', '{}'),
- ('transform', '{int}', '{sql}')
+ ('transform', '{int}', '{sql}'),
+ ('statistics', '{addr_nsp, gentable_stat}', '{}')
)
SELECT (pg_identify_object(addr1.classid, addr1.objid, addr1.subobjid)).*,
-- test roundtrip through pg_identify_object_as_address
--
2.5.0
0003-clause-reduction-using-functional-dependencies.patchtext/x-patch; name=0003-clause-reduction-using-functional-dependencies.patchDownload
From 6e3e16f46f93f045c137c070b48e387a470c3a08 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 19:42:18 +0200
Subject: [PATCH 3/9] clause reduction using functional dependencies
During planning, use functional dependencies to decide which clauses to
skip during cardinality estimation. Initial and rather simplistic
implementation.
This only works with regular WHERE clauses, not with clauses used as
join clauses.
Note: The clause_is_mv_compatible() needs to identify the relation (so
that we can fetch the list of multivariate stats by OID).
planner_rt_fetch() seems like the appropriate way to get the relation
OID, but apparently it only works with simple vars. Maybe
examine_variable() would make this work with more complex vars too?
Includes regression tests analyzing functional dependencies (part of
ANALYZE) on several datasets (no dependencies, no transitive
dependencies, ...).
Checks that a query with conditions on two columns, where one (B) is
functionally dependent on the other one (A), correctly ignores the
clause on (B) and chooses bitmap index scan instead of plain index scan
(which is what happens otherwise, thanks to assumption of
independence).
Note: Functional dependencies only work with equality clauses, no
inequalities etc.
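For example (a sketch of the behavior the tests check, with illustrative
table and statistics names):

    CREATE TABLE t (a INT, b INT);
    INSERT INTO t SELECT i/100, i/200 FROM generate_series(1,10000) s(i);
    CREATE STATISTICS s ON t (a, b) WITH (dependencies);
    ANALYZE t;

    -- with the dependency (a) -> b built, the estimate for
    --   SELECT * FROM t WHERE a = 10 AND b = 5
    -- ignores the clause on b instead of multiplying the selectivities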
---
src/backend/optimizer/path/clausesel.c | 505 +++++++++++++++++++++++++-
src/backend/utils/mvstats/README.dependencies | 63 ++--
src/backend/utils/mvstats/README.stats | 36 ++
src/backend/utils/mvstats/common.c | 5 +-
src/backend/utils/mvstats/dependencies.c | 24 ++
src/include/utils/mvstats.h | 3 +-
src/include/utils/rel.h | 2 +-
src/test/regress/expected/mv_dependencies.out | 24 ++
src/test/regress/parallel_schedule | 3 +
src/test/regress/sql/mv_dependencies.sql | 15 +
10 files changed, 637 insertions(+), 43 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.stats
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 02660c2..a3afdf5 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -14,14 +14,19 @@
*/
#include "postgres.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/plancat.h"
+#include "optimizer/var.h"
#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
+#include "utils/mvstats.h"
#include "utils/selfuncs.h"
+#include "utils/typcache.h"
/*
@@ -41,6 +46,25 @@ typedef struct RangeQueryClause
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
+#define MV_CLAUSE_TYPE_FDEP 0x01
+
+static bool clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum);
+
+static Bitmapset *collect_mv_attnums(List *clauses, Index relid);
+
+static int count_mv_attnums(List *clauses, Index relid);
+
+static int count_varnos(List *clauses, Index *relid);
+
+static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Index relid, List *stats);
+
+static bool has_stats(List *stats, int type);
+
+static List * find_stats(PlannerInfo *root, Index relid);
+
+static bool stats_type_matches(MVStatisticInfo *stat, int type);
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
@@ -60,7 +84,19 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
* subclauses. However, that's only right if the subclauses have independent
* probabilities, and in reality they are often NOT independent. So,
* we want to be smarter where we can.
-
+ *
+ * The first thing we try to do is apply multivariate statistics, in a way
+ * that minimizes the overhead when there are no multivariate stats on the
+ * relation. Thus we do several simple (and inexpensive) checks first, to
+ * verify that suitable multivariate statistics exist.
+ *
+ * If we find suitable multivariate statistics, we try to apply them.
+ * Currently we only have (soft) functional dependencies, so we try to reduce
+ * the list of clauses.
+ *
+ * Then we remove the clauses estimated using multivariate stats, and process
+ * the rest of the clauses using the regular per-column stats.
+ *
* Currently, the only extra smarts we have is to recognize "range queries",
* such as "x > 34 AND x < 42". Clauses are recognized as possible range
* query components if they are restriction opclauses whose operators have
@@ -99,6 +135,22 @@ clauselist_selectivity(PlannerInfo *root,
RangeQueryClause *rqlist = NULL;
ListCell *l;
+ /* processing mv stats */
+ Oid relid = InvalidOid;
+
+ /* list of multivariate stats on the relation */
+ List *stats = NIL;
+
+ /*
+ * To fetch the statistics, we first need to determine the rel. At this
+ * point we only support estimates of simple restrictions with all Vars
+ * referencing a single baserel. However, set_baserel_size_estimates()
+ * sets varRelid=0, so we have to actually inspect the clauses using
+ * pull_varnos and see if there's just a single varno referenced.
+ */
+ if ((count_varnos(clauses, &relid) == 1) && ((varRelid == 0) || (varRelid == relid)))
+ stats = find_stats(root, relid);
+
/*
* If there's exactly one clause, then no use in trying to match up pairs,
* so just go directly to clause_selectivity().
@@ -108,6 +160,24 @@ clauselist_selectivity(PlannerInfo *root,
varRelid, jointype, sjinfo);
/*
+ * Apply functional dependencies, but first check that there are some stats
+ * with functional dependencies built (by simply walking the stats list),
+ * and that there are two or more attributes referenced by clauses that
+ * may be reduced using functional dependencies.
+ *
+ * We would find that anyway when trying to actually apply the functional
+ * dependencies, but let's do the cheap checks first.
+ *
+ * After applying the functional dependencies we get the remaining clauses
+ * that need to be estimated by other types of stats (MCV, histograms etc).
+ */
+ if (has_stats(stats, MV_CLAUSE_TYPE_FDEP) &&
+ (count_mv_attnums(clauses, relid) >= 2))
+ {
+ clauses = clauselist_apply_dependencies(root, clauses, relid, stats);
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -763,3 +833,436 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ */
+static Bitmapset *
+collect_mv_attnums(List *clauses, Index relid)
+{
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ AttrNumber attnum;
+ Node *clause = (Node *) lfirst(l);
+
+ /* ignore the result for now - we only need the info */
+ if (clause_is_mv_compatible(clause, relid, &attnum))
+ attnums = bms_add_member(attnums, attnum);
+ }
+
+ /*
+ * If there are not at least two attributes referenced by the clause(s),
+ * we can throw everything out (as we'll revert to simple stats).
+ */
+ if (bms_num_members(attnums) <= 1)
+ {
+ if (attnums != NULL)
+ pfree(attnums);
+ attnums = NULL;
+ }
+
+ return attnums;
+}
+
+/*
+ * Count the number of attributes in clauses compatible with multivariate stats.
+ */
+static int
+count_mv_attnums(List *clauses, Index relid)
+{
+ int c;
+ Bitmapset *attnums = collect_mv_attnums(clauses, relid);
+
+ c = bms_num_members(attnums);
+
+ bms_free(attnums);
+
+ return c;
+}
+
+/*
+ * Count varnos referenced in the clauses, and if there's a single varno then
+ * return the index in 'relid'.
+ */
+static int
+count_varnos(List *clauses, Index *relid)
+{
+ int cnt;
+ Bitmapset *varnos = NULL;
+
+ varnos = pull_varnos((Node *) clauses);
+ cnt = bms_num_members(varnos);
+
+ /* if there's a single varno in the clauses, remember it */
+ if (bms_num_members(varnos) == 1)
+ *relid = bms_singleton_member(varnos);
+
+ bms_free(varnos);
+
+ return cnt;
+}
+
+typedef struct
+{
+ Index varno; /* relid we're interested in */
+ Bitmapset *varattnos; /* attnums referenced by the clauses */
+} mv_compatible_context;
+
+/*
+ * Recursive walker that checks compatibility of the clause with multivariate
+ * statistics, and collects attnums from the Vars.
+ *
+ * XXX The original idea was to combine this with expression_tree_walker, but
+ * I've been unable to make that work - it seems it does not quite allow
+ * checking the structure. Hence the explicit calls to the walker.
+ */
+static bool
+mv_compatible_walker(Node *node, mv_compatible_context *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+
+ /* Pseudoconstants are not really interesting here. */
+ if (rinfo->pseudoconstant)
+ return true;
+
+ /* clauses referencing multiple varnos are incompatible */
+ if (bms_membership(rinfo->clause_relids) != BMS_SINGLETON)
+ return true;
+
+ /* check the clause inside the RestrictInfo */
+ return mv_compatible_walker((Node*)rinfo->clause, (void *) context);
+ }
+
+ if (IsA(node, Var))
+ {
+ Var * var = (Var*)node;
+
+ /*
+ * Also, the variable needs to reference the right relid (this might be
+ * unnecessary given the other checks, but let's be sure).
+ */
+ if (var->varno != context->varno)
+ return true;
+
+ /* Also skip system attributes (we don't allow stats on those). */
+ if (! AttrNumberIsForUserDefinedAttr(var->varattno))
+ return true;
+
+ /* Seems fine, so let's remember the attnum. */
+ context->varattnos = bms_add_member(context->varattnos, var->varattno);
+
+ return false;
+ }
+
+ /*
+ * And finally the operator expressions - we only allow simple expressions
+ * with two arguments, where one is a Var and the other is a constant, and
+ * it's a simple comparison (which we detect using estimator function).
+ */
+ if (is_opclause(node))
+ {
+ OpExpr *expr = (OpExpr *) node;
+ Var *var;
+ bool varonleft = true;
+ bool ok;
+
+ /*
+ * Only expressions with two arguments are considered compatible.
+ *
+ * XXX Possibly unnecessary (can OpExpr have different arg count?).
+ */
+ if (list_length(expr->args) != 2)
+ return true;
+
+ /* see if it actually has the right shape (one Var, one pseudo-constant) */
+ ok = (NumRelids((Node*)expr) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ /* unsupported structure (two variables or so) */
+ if (! ok)
+ return true;
+
+ /*
+ * If it's not an equality operator, just ignore the clause. Otherwise
+ * note the relid and attnum for the variable. This uses the function
+ * for estimating selectivity, not the operator directly (a bit
+ * awkward, but well ...).
+ */
+ switch (get_oprrest(expr->opno))
+ {
+ case F_EQSEL:
+
+ /* equality conditions are compatible with all statistics */
+ break;
+
+ default:
+
+ /* unknown estimator */
+ return true;
+ }
+
+ var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+
+ return mv_compatible_walker((Node *) var, context);
+ }
+
+ /* Node not explicitly supported, so terminate */
+ return true;
+}
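+
+/*
+ * For example, under the rules above a clause like "a = 1" (or "1 = a")
+ * is compatible, while "a = b" (two Vars), "a < 1" (estimator other than
+ * F_EQSEL) and "(a = 1 OR b = 2)" (not a simple opclause) are not.
+ */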
+
+/*
+ * Determines whether the clause is compatible with multivariate stats,
+ * and if it is, returns some additional information - varno (index
+ * into simple_rte_array) and a bitmap of attributes. This is then
+ * used to fetch related multivariate statistics.
+ *
+ * At this moment we only support basic conditions of the form
+ *
+ * variable OP constant
+ *
+ * where OP is the equality operator (which is however determined by
+ * looking at the associated function for estimating selectivity, just
+ * like with the single-dimensional case).
+ *
+ * TODO Support 'OR clauses' - shouldn't be all that difficult to
+ * evaluate them using multivariate stats.
+ */
+static bool
+clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum)
+{
+ mv_compatible_context context;
+
+ context.varno = relid;
+ context.varattnos = NULL; /* no attnums */
+
+ if (mv_compatible_walker(clause, (void *) &context))
+ return false;
+
+ /* remember the newly collected attnums */
+ *attnum = bms_singleton_member(context.varattnos);
+
+ return true;
+}
+
+
+/*
+ * Reduce clauses using functional dependencies
+ */
+static List*
+fdeps_reduce_clauses(List *clauses, Index relid, Bitmapset *reduced_attnums)
+{
+ ListCell *lc;
+ List *reduced_clauses = NIL;
+
+ foreach (lc, clauses)
+ {
+ AttrNumber attnum = InvalidAttrNumber;
+ Node * clause = (Node*)lfirst(lc);
+
+ /* keep clauses that are not compatible with functional dependencies */
+ if (! clause_is_mv_compatible(clause, relid, &attnum))
+ reduced_clauses = lappend(reduced_clauses, clause);
+
+ /* for equality clauses, only keep those not on reduced attributes */
+ else if (! bms_is_member(attnum, reduced_attnums))
+ reduced_clauses = lappend(reduced_clauses, clause);
+ }
+
+ return reduced_clauses;
+}
+
+/*
+ * decide which attributes are redundant (for equality clauses)
+ *
+ * We try to apply all functional dependencies available, and for each one we
+ * check if it matches attnums from equality clauses, but only those not yet
+ * reduced.
+ *
+ * XXX Not sure if the order in which we apply the dependencies matters.
+ *
+ * XXX We do not combine functional dependencies from separate stats. That is
+ * if we have dependencies on [a,b] and [b,c], then we don't deduce
+ * a->c from a->b and b->c. Computing such transitive closure is a possible
+ * future improvement.
+ */
+static Bitmapset *
+fdeps_reduce_attnums(List *stats, Bitmapset *attnums)
+{
+ ListCell *lc;
+ Bitmapset *reduced = NULL;
+
+ foreach (lc, stats)
+ {
+ int i;
+ MVDependencies dependencies = NULL;
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ /* skip statistics without dependencies */
+ if (! stats_type_matches(info, MV_CLAUSE_TYPE_FDEP))
+ continue;
+
+ /* fetch and deserialize dependencies */
+ dependencies = load_mv_dependencies(info->mvoid);
+
+ for (i = 0; i < dependencies->ndeps; i++)
+ {
+ int j;
+ bool matched = true;
+ MVDependency dep = dependencies->deps[i];
+
+ /* we don't bother to break the loop early (only a few attributes) */
+ for (j = 0; j < dep->nattributes; j++)
+ {
+ if (! bms_is_member(dep->attributes[j], attnums))
+ matched = false;
+
+ if (bms_is_member(dep->attributes[j], reduced))
+ matched = false;
+ }
+
+ /* if dependency applies, mark the last attribute as reduced */
+ if (matched)
+ reduced = bms_add_member(reduced,
+ dep->attributes[dep->nattributes-1]);
+ }
+ }
+
+ return reduced;
+}
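+
+/*
+ * For example, given the cyclic dependencies (a) -> b and (b) -> a with
+ * equality clauses on both columns (and assuming (a) -> b is examined
+ * first): the first dependency marks 'b' as reduced, so when (b) -> a
+ * is examined, 'b' is already in 'reduced', the dependency does not
+ * match, and 'a' is kept.
+ */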
+
+/*
+ * reduce list of equality clauses using soft functional dependencies
+ *
+ * We simply walk through list of functional dependencies, and for each one we
+ * check whether the dependency 'matches' the clauses, i.e. if there's a clause
+ * matching the condition. If yes, we attempt to remove all clauses matching
+ * the implied part of the dependency from the list.
+ *
+ * This only reduces equality clauses, and ignores all the other types. We might
+ * extend it to handle IS NULL clauses in the future.
+ *
+ * We also assume the equality clauses are 'compatible'. For example we can't
+ * identify when the clauses use a mismatching zip code and city name. In such
+ * case the usual approach (product of selectivities) would produce a better
+ * estimate, although mostly by chance.
+ *
+ * The implementation needs to be careful about cyclic dependencies, e.g. when
+ *
+ * (a -> b) and (b -> a)
+ *
+ * at the same time, which means there's a 1:1 relationship between the columns.
+ * In this case we must not reduce clauses on both attributes at the same time.
+ *
+ * TODO Currently we only apply functional dependencies at the same level, but
+ * maybe we could transfer the clauses from upper levels to the subtrees?
+ * For example let's say we have (a->b) dependency, and condition
+ *
+ * (a=1) AND (b=2 OR c=3)
+ *
+ * Currently, we won't be able to perform any reduction, because we'll
+ * consider (a=1) and (b=2 OR c=3) independently. But maybe we could pass
+ * (a=1) into the other expression, and only check it against conditions
+ * of the functional dependencies?
+ *
+ * In this case we'd end up with
+ *
+ * (a=1)
+ *
+ * as we'd consider (b=2) implied thanks to the rule, rendering the whole
+ * OR clause valid.
+ */
+static List *
+clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
+ Index relid, List *stats)
+{
+ Bitmapset *clause_attnums = NULL;
+ Bitmapset *reduced_attnums = NULL;
+
+ /*
+ * Is there at least one statistics with functional dependencies?
+ * If not, return the original clauses right away.
+ *
+ * XXX Isn't this a bit pointless, thanks to exactly the same check in
+ * clauselist_selectivity()? Can we trigger the condition here?
+ */
+ if (! has_stats(stats, MV_CLAUSE_TYPE_FDEP))
+ return clauses;
+
+ /* collect attnums from clauses compatible with dependencies (equality) */
+ clause_attnums = collect_mv_attnums(clauses, relid);
+
+ /* decide which attnums may be eliminated */
+ reduced_attnums = fdeps_reduce_attnums(stats, clause_attnums);
+
+ /*
+ * Walk through the clauses, and see which other clauses we may reduce.
+ */
+ clauses = fdeps_reduce_clauses(clauses, relid, reduced_attnums);
+
+ bms_free(clause_attnums);
+ bms_free(reduced_attnums);
+
+ return clauses;
+}
+
+/*
+ * Check whether the given statistics match at least one of the requested types.
+ */
+static bool
+stats_type_matches(MVStatisticInfo *stat, int type)
+{
+ if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
+ return true;
+
+ return false;
+}
+
+/*
+ * Check that there are stats with at least one of the requested types.
+ */
+static bool
+has_stats(List *stats, int type)
+{
+ ListCell *s;
+
+ foreach (s, stats)
+ {
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ /* terminate if we've found at least one matching statistics */
+ if (stats_type_matches(stat, type))
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Looks up stats for a given baserel.
+ */
+static List *
+find_stats(PlannerInfo *root, Index relid)
+{
+ Assert(root->simple_rel_array[relid] != NULL);
+
+ return root->simple_rel_array[relid]->mvstatlist;
+}
diff --git a/src/backend/utils/mvstats/README.dependencies b/src/backend/utils/mvstats/README.dependencies
index 1f96fbc..f248459 100644
--- a/src/backend/utils/mvstats/README.dependencies
+++ b/src/backend/utils/mvstats/README.dependencies
@@ -156,37 +156,24 @@ estimates - especially compared to histograms, that are quite bad in estimating
equality clauses.
-Limitations
------------
-
-Let's see the main liminations of functional dependencies, especially those
-related to the current implementation.
+Multi-column dependencies
+-------------------------
-The current implementation supports only dependencies between two columns, but
-this is merely a simplification of the initial implementation. It's certainly
-useful to mine for dependencies involving multiple columns on the 'left' side,
-i.e. a condition for the dependency. That is dependencies like (a,b -> c).
+The implementation supports dependencies with multiple columns on the left side
+(i.e. condition of the dependency). The detection starts from dependencies with
+a single condition, and then proceeds to higher condition counts.
-The implementation may/should be smart enough not to mine redundant conditions,
-e.g. (a->b) and (a,c -> b), because the latter is a trivial consequence of the
-former one (if values of 'a' determine 'b', adding another column won't change
-that relationship). The ANALYZE should first analyze 1:1 dependencies, then 2:1
-dependencies (and skip the already identified ones), etc.
+It also detects dependencies that are implied by already identified ones, and
+ignores them. For example, if we know that (a->b) holds, we won't add (a,c->b),
+as that dependency is a trivial consequence of (a->b).
-For example the dependency
+For a more practical example, consider these two dependencies
(city name -> zip code)
-
-is much stronger, i.e. whenever it hold, then
-
(city name, state name -> zip code)
-holds too. But in case there are cities with the same name in different states,
-then only the latter dependency will be valid.
-
-Of course, there probably are cities with the same name within a single state,
-but hopefully this is relatively rare occurence (and thus we'll still detect
-the 'soft' dependency).
+We could say that the former dependency is stronger, because whenever it is
+valid, the latter dependency is valid too.
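+
+As a sketch, using the CREATE STATISTICS syntax from this patch series (the
+table and column names are hypothetical):
+
+  CREATE TABLE addresses (city_name text, state_name text, zip_code text);
+
+  CREATE STATISTICS addr_stats ON addresses (city_name, state_name, zip_code)
+    WITH (dependencies);
+
+  ANALYZE addresses;
+
+If ANALYZE detects (city_name -> zip_code), it won't also report the implied
+dependency (city_name, state_name -> zip_code).
+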
Handling multiple columns on the right side of the dependency, is not necessary,
as those dependencies may be simply decomposed into a set of dependencies with
@@ -199,24 +186,22 @@ is exactly the same as
(a -> b) & (a -> c)
Of course, storing the first form may be more efficient thant storing multiple
-'simple' dependencies separately.
-
+'simple' dependencies separately. This is left as future work.
-TODO Support dependencies with multiple columns on left/right.
-TODO Investigate using histogram and MCV list to verify the dependencies.
+Future work
+-----------
-TODO Investigate statistical testing of the distribution (to decide whether it
- makes sense to build the histogram/MCV list).
+* Investigate using histogram and MCV list to verify the dependencies.
-TODO Using a min/max of selectivities would probably make more sense for the
- associated columns.
+* Investigate statistical testing of the distribution (to decide whether it
+ makes sense to build the histogram/MCV list).
-TODO Consider eliminating the implied columns from the histogram and MCV lists
- (but maybe that's not a good idea, because that'd make it impossible to use
- these stats for non-equality clauses and also it wouldn't be possible to
- use the stats for verification of the dependencies).
+* Consider eliminating the implied columns from the histogram and MCV lists
+ (but maybe that's not a good idea, because that'd make it impossible to use
+ these stats for non-equality clauses and also it wouldn't be possible to
+ use the stats for verification of the dependencies).
-TODO The reduction probably might be extended to also handle IS NULL clauses,
- assuming we fix the ANALYZE to properly handle NULL values. We however
- won't be able to reduce IS NOT NULL (unless I'm missing something).
+* The reduction could probably be extended to also handle IS NULL clauses,
+  assuming we fix ANALYZE to properly handle NULL values. We won't, however,
+  be able to reduce IS NOT NULL (unless I'm missing something).
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
new file mode 100644
index 0000000..a38ea7b
--- /dev/null
+++ b/src/backend/utils/mvstats/README.stats
@@ -0,0 +1,36 @@
+Multivariate statistics
+=======================
+
+When estimating various quantities (e.g. condition selectivities) the default
+approach relies on the assumption of independence. In practice that's often
+not true, resulting in estimation errors.
+
+Multivariate stats track different types of dependencies between the columns,
+hopefully improving the estimates.
+
+Currently we only have one kind of multivariate statistics - soft functional
+dependencies, and we use it to improve estimates of equality clauses. See
+README.dependencies for details.
+
+
+Selectivity estimation
+----------------------
+
+When estimating selectivity, we aim to achieve several things:
+
+ (a) maximize the estimate accuracy
+
+ (b) minimize the overhead, especially when no suitable multivariate stats
+ exist (so if you are not using multivariate stats, there's no overhead)
+
+This clauselist_selectivity() performs several inexpensive checks first, before
+even attempting to do the more expensive estimation.
+
+ (1) check if there are multivariate stats on the relation
+
+ (2) check there are at least two attributes referenced by clauses compatible
+ with multivariate statistics (equality clauses for func. dependencies)
+
+ (3) perform reduction of equality clauses using func. dependencies
+
+ (4) estimate the reduced list of clauses using regular statistics
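+
+As a sketch of step (3): given statistics on (a,b) with a detected dependency
+(a -> b), the condition
+
+  WHERE (a = 1) AND (b = 2)
+
+is reduced to (a = 1), and step (4) then estimates the remaining clause using
+the regular per-column statistics.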
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index 82f2177..dcb7c78 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -84,7 +84,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
/*
* Analyze functional dependencies of columns.
*/
- deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (stat->deps_enabled)
+ deps = build_mv_dependencies(numrows, rows, attrs, stats);
/* store the histogram / MCV list in the catalog */
update_mv_stats(stat->mvoid, deps, attrs);
@@ -163,6 +164,7 @@ list_mv_stats(Oid relid)
info->mvoid = HeapTupleGetOid(htup);
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
+ info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
result = lappend(result, info);
@@ -274,6 +276,7 @@ compare_scalars_partition(const void *a, const void *b, void *arg)
return ApplySortComparator(da, false, db, false, ssup);
}
+
/* initialize multi-dimensional sort */
MultiSortSupport
multi_sort_init(int ndims)
diff --git a/src/backend/utils/mvstats/dependencies.c b/src/backend/utils/mvstats/dependencies.c
index 5437bdf..412dc30 100644
--- a/src/backend/utils/mvstats/dependencies.c
+++ b/src/backend/utils/mvstats/dependencies.c
@@ -684,3 +684,27 @@ pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(buf.data));
}
+
+MVDependencies
+load_mv_dependencies(Oid mvoid)
+{
+ bool isnull = false;
+ Datum deps;
+
+ /* Fetch the pg_mv_statistic tuple for the requested statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->deps_enabled && mvstat->deps_built);
+#endif
+
+ deps = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stadeps, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_dependencies(DatumGetByteaP(deps));
+}
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 7837bc0..ec55a09 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -17,7 +17,6 @@
#include "fmgr.h"
#include "commands/vacuum.h"
-
#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
/*
@@ -49,6 +48,8 @@ typedef MVDependenciesData* MVDependencies;
* stats specified using flags (or something like that).
*/
+MVDependencies load_mv_dependencies(Oid mvoid);
+
bytea * serialize_mv_dependencies(MVDependencies dependencies);
/* deserialization of stats (serialization is private to analyze) */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 8771f9c..d09ba25 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -94,7 +94,7 @@ typedef struct RelationData
List *rd_indexlist; /* list of OIDs of indexes on relation */
Oid rd_oidindex; /* OID of unique index on OID, if any */
Oid rd_replidindex; /* OID of replica identity index, if any */
-
+
/* data managed by RelationGetMVStatList: */
List *rd_mvstatlist; /* list of OIDs of multivariate stats */
diff --git a/src/test/regress/expected/mv_dependencies.out b/src/test/regress/expected/mv_dependencies.out
index f54e1b7..ee8a9b2 100644
--- a/src/test/regress/expected/mv_dependencies.out
+++ b/src/test/regress/expected/mv_dependencies.out
@@ -58,8 +58,10 @@ SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
TRUNCATE functional_dependencies;
-- a => b, a => c, b => c
+-- check explain (expect bitmap index scan, not plain index scan)
INSERT INTO functional_dependencies
SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
ANALYZE functional_dependencies;
SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
@@ -68,6 +70,16 @@ SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
t | t | (1) => 2, (1) => 3, (2) => 3
(1 row)
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+ QUERY PLAN
+---------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
DROP TABLE functional_dependencies;
-- varlena type (text)
CREATE TABLE functional_dependencies (
@@ -113,8 +125,10 @@ SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
TRUNCATE functional_dependencies;
-- a => b, a => c, b => c
+-- check explain (expect bitmap index scan, not plain index scan)
INSERT INTO functional_dependencies
SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
ANALYZE functional_dependencies;
SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
@@ -123,6 +137,16 @@ SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
t | t | (1) => 2, (1) => 3, (2) => 3
(1 row)
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on functional_dependencies
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on fdeps_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
DROP TABLE functional_dependencies;
-- NULL values (mix of int and text columns)
CREATE TABLE functional_dependencies (
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index bec0316..4f2ffb8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -110,3 +110,6 @@ test: event_trigger
# run stats by itself because its delay may be insufficient under heavy load
test: stats
+
+# run tests of multivariate stats
+test: mv_dependencies
diff --git a/src/test/regress/sql/mv_dependencies.sql b/src/test/regress/sql/mv_dependencies.sql
index 051633a..8ba72a4 100644
--- a/src/test/regress/sql/mv_dependencies.sql
+++ b/src/test/regress/sql/mv_dependencies.sql
@@ -56,13 +56,20 @@ SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
TRUNCATE functional_dependencies;
-- a => b, a => c, b => c
+-- check explain (expect bitmap index scan, not plain index scan)
INSERT INTO functional_dependencies
SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+
ANALYZE functional_dependencies;
SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = 10 AND b = 5;
+
DROP TABLE functional_dependencies;
-- varlena type (text)
@@ -99,6 +106,7 @@ TRUNCATE functional_dependencies;
-- a => b, a => c
INSERT INTO functional_dependencies
SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+
ANALYZE functional_dependencies;
SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
@@ -107,13 +115,20 @@ SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
TRUNCATE functional_dependencies;
-- a => b, a => c, b => c
+-- check explain (expect bitmap index scan, not plain index scan)
INSERT INTO functional_dependencies
SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+
+CREATE INDEX fdeps_idx ON functional_dependencies (a, b);
+
ANALYZE functional_dependencies;
SELECT deps_enabled, deps_built, pg_mv_stats_dependencies_show(stadeps)
FROM pg_mv_statistic WHERE starelid = 'functional_dependencies'::regclass;
+EXPLAIN (COSTS off)
+ SELECT * FROM functional_dependencies WHERE a = '10' AND b = '5';
+
DROP TABLE functional_dependencies;
-- NULL values (mix of int and text columns)
--
2.5.0
Attachment: 0004-multivariate-MCV-lists.patch (text/x-patch)
From 6aba7480c5a4fd56896bf1a2d320e19ea231225d Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Mon, 6 Apr 2015 16:52:15 +0200
Subject: [PATCH 4/9] multivariate MCV lists
- extends the pg_mv_statistic catalog (add 'mcv' fields)
- building the MCV lists during ANALYZE
- simple estimation while planning the queries
Includes regression tests, mostly mirroring the regression tests for
functional dependencies.
---
doc/src/sgml/ref/create_statistics.sgml | 43 ++
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/statscmds.c | 45 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 800 +++++++++++++++++++++-
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/README.mcv | 137 ++++
src/backend/utils/mvstats/README.stats | 89 ++-
src/backend/utils/mvstats/common.c | 133 +++-
src/backend/utils/mvstats/common.h | 15 +
src/backend/utils/mvstats/mcv.c | 1120 +++++++++++++++++++++++++++++++
src/bin/psql/describe.c | 25 +-
src/include/catalog/pg_mv_statistic.h | 18 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 77 ++-
src/test/regress/expected/mv_mcv.out | 207 ++++++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_mcv.sql | 178 +++++
22 files changed, 2847 insertions(+), 65 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.mcv
create mode 100644 src/backend/utils/mvstats/mcv.c
create mode 100644 src/test/regress/expected/mv_mcv.out
create mode 100644 src/test/regress/sql/mv_mcv.sql
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index ff09fa5..d6973e8 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -132,6 +132,24 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>max_mcv_items</> (<type>integer</>)</term>
+ <listitem>
+ <para>
+ Maximum number of MCV list items.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>mcv</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables building the MCV list for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect2>
@@ -177,6 +195,31 @@ EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 2);
</programlisting>
</para>
+ <para>
+ Create table <structname>t2</> with two perfectly correlated columns
+ (containing identical data), and an MCV list on those columns:
+
+<programlisting>
+CREATE TABLE t2 (
+ a int,
+ b int
+);
+
+INSERT INTO t2 SELECT mod(i,100), mod(i,100)
+ FROM generate_series(1,1000000) s(i);
+
+CREATE STATISTICS s2 ON t2 (a, b) WITH (mcv);
+
+ANALYZE t2;
+
+-- valid combination (found in MCV)
+EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1);
+
+-- invalid combination (not found in MCV)
+EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 2);
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 31dbb2c..5c40334 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -165,7 +165,9 @@ CREATE VIEW pg_mv_stats AS
S.staname AS staname,
S.stakeys AS attnums,
length(S.stadeps) as depsbytes,
- pg_mv_stats_dependencies_info(S.stadeps) as depsinfo
+ pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
+ length(S.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index f43b053..c480fbe 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -70,7 +70,13 @@ CreateStatistics(CreateStatsStmt *stmt)
ObjectAddress parentobject, childobject;
/* by default build nothing */
- bool build_dependencies = false;
+ bool build_dependencies = false,
+ build_mcv = false;
+
+ int32 max_mcv_items = -1;
+
+ /* options required because of other options */
+ bool require_mcv = false;
Assert(IsA(stmt, CreateStatsStmt));
@@ -146,6 +152,29 @@ CreateStatistics(CreateStatsStmt *stmt)
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "mcv") == 0)
+ build_mcv = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_mcv_items") == 0)
+ {
+ max_mcv_items = defGetInt32(opt);
+
+ /* this option requires 'mcv' to be enabled */
+ require_mcv = true;
+
+ /* sanity check */
+ if (max_mcv_items < MVSTAT_MCVLIST_MIN_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items must be at least %d",
+ MVSTAT_MCVLIST_MIN_ITEMS)));
+
+ else if (max_mcv_items > MVSTAT_MCVLIST_MAX_ITEMS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("max number of MCV items is %d",
+ MVSTAT_MCVLIST_MAX_ITEMS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -154,10 +183,16 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! build_dependencies)
+ if (! (build_dependencies || build_mcv))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("no statistics type (dependencies, mcv) was requested")));
+
+ /* now do some checking of the options */
+ if (require_mcv && (! build_mcv))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies) was requested")));
+ errmsg("option 'mcv' is required by other options(s)")));
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
@@ -178,8 +213,12 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(stakeys);
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
+ values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+
+ values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
+ nulls[Anum_pg_mv_statistic_stamcv -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 07206d7..333e24b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2162,9 +2162,11 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
+ WRITE_BOOL_FIELD(mcv_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
+ WRITE_BOOL_FIELD(mcv_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index a3afdf5..c16d559 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -15,6 +15,7 @@
#include "postgres.h"
#include "access/sysattr.h"
+#include "catalog/pg_collation.h"
#include "catalog/pg_operator.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
@@ -47,18 +48,39 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
#define MV_CLAUSE_TYPE_FDEP 0x01
+#define MV_CLAUSE_TYPE_MCV 0x02
-static bool clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum);
+static bool clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums,
+ int type);
-static Bitmapset *collect_mv_attnums(List *clauses, Index relid);
+static Bitmapset *collect_mv_attnums(List *clauses, Index relid, int type);
-static int count_mv_attnums(List *clauses, Index relid);
+static int count_mv_attnums(List *clauses, Index relid, int type);
static int count_varnos(List *clauses, Index *relid);
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Index relid, List *stats);
+static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
+
+static List *clauselist_mv_split(PlannerInfo *root, Index relid,
+ List *clauses, List **mvclauses,
+ MVStatisticInfo *mvstats, int types);
+
+static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
+
+static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats,
+ bool *fullmatch, Selectivity *lowsel);
+
+static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, Index relid);
@@ -66,6 +88,13 @@ static List * find_stats(PlannerInfo *root, Index relid);
static bool stats_type_matches(MVStatisticInfo *stat, int type);
+/* used for merging bitmaps - AND (min), OR (max) */
+#define MAX(x, y) (((x) > (y)) ? (x) : (y))
+#define MIN(x, y) (((x) < (y)) ? (x) : (y))
+
+#define UPDATE_RESULT(m,r,isor) \
+ (m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
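+
+/*
+ * Example (assuming MVSTATS_MATCH_NONE < MVSTATS_MATCH_FULL numerically):
+ * merging item state m = MATCH_FULL with result r = MATCH_NONE yields
+ * MATCH_NONE for AND-lists (MIN) and keeps MATCH_FULL for OR-lists (MAX).
+ */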
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -91,11 +120,13 @@ static bool stats_type_matches(MVStatisticInfo *stat, int type);
* to verify that suitable multivariate statistics exist.
*
* If we identify such multivariate statistics apply, we try to apply them.
- * Currently we only have (soft) functional dependencies, so we try to reduce
- * the list of clauses.
*
- * Then we remove the clauses estimated using multivariate stats, and process
- * the rest of the clauses using the regular per-column stats.
+ * First we try to reduce the list of clauses by applying (soft) functional
+ * dependencies, and then we try to estimate the selectivity of the reduced
+ * list of clauses using the multivariate MCV list.
+ *
+ * Finally we remove the portion of clauses estimated using multivariate stats,
+ * and process the rest of the clauses using the regular per-column stats.
*
* Currently, the only extra smarts we have is to recognize "range queries",
* such as "x > 34 AND x < 42". Clauses are recognized as possible range
@@ -172,12 +203,46 @@ clauselist_selectivity(PlannerInfo *root,
* that need to be estimated by other types of stats (MCV, histograms etc).
*/
if (has_stats(stats, MV_CLAUSE_TYPE_FDEP) &&
- (count_mv_attnums(clauses, relid) >= 2))
+ (count_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_FDEP) >= 2))
{
clauses = clauselist_apply_dependencies(root, clauses, relid, stats);
}
/*
+ * Check that there are statistics with MCV list or histogram, and also the
+ * number of attributes covered by these types of statistics.
+ *
+ * If there are no such stats or not enough attributes, don't waste time
+ * with the multivariate code and simply skip to estimation using the
+ * regular per-column stats.
+ */
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV) &&
+ (count_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV) >= 2))
+ {
+ /* collect attributes from the compatible conditions */
+ Bitmapset *mvattnums = collect_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV);
+
+ /* and search for the statistic covering the most attributes */
+ MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+
+ if (mvstat != NULL) /* we have matching stats */
+ {
+ /* clauses compatible with multi-variate stats */
+ List *mvclauses = NIL;
+
+ /* split the clauselist into regular and mv-clauses */
+ clauses = clauselist_mv_split(root, relid, clauses, &mvclauses,
+ mvstat, MV_CLAUSE_TYPE_MCV);
+
+ /* we've chosen the statistics to match the clauses */
+ Assert(mvclauses != NIL);
+
+ /* compute the multivariate stats */
+ s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ }
+ }
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -834,32 +899,93 @@ clause_selectivity(PlannerInfo *root,
return s1;
}
+
+/*
+ * estimate selectivity of clauses using a multivariate statistic
+ *
+ * Perform estimation of the clauses using an MCV list.
+ *
+ * This assumes all the clauses are compatible with the selected statistics
+ * (e.g. only reference columns covered by the statistics, use supported
+ * operator, etc.).
+ *
+ * TODO We may support some additional conditions, most importantly those
+ * matching multiple columns (e.g. "a = b" or "a < b").
+ *
+ * TODO Clamp the selectivity by min of the per-clause selectivities (i.e. the
+ * selectivity of the most restrictive clause), because that's the maximum
+ * we can ever get from an AND-ed list of clauses. This would probably
+ * prevent issues with hitting too many buckets and low-precision histograms.
+ *
+ * TODO We may remember the lowest frequency in the MCV list, and then later use
+ * it as an upper boundary for the selectivity (had there been a more
+ * frequent item, it'd be in the MCV list). This might improve cases with
+ * low-detail histograms.
+ *
+ * TODO We may also derive some additional boundaries for the selectivity from
+ * the MCV list, because
+ *
+ * (a) if we have a "full equality condition" (one equality condition on
+ * each column of the statistic) and we found a match in the MCV list,
+ * then this is the final selectivity (and pretty accurate),
+ *
+ * (b) if we have a "full equality condition" and we haven't found a match
+ * in the MCV list, then the selectivity is below the lowest frequency
+ * found in the MCV list,
+ *
+ * TODO When applying the clauses to the histogram/MCV list, we can do
+ * that from the most selective clauses first, because that'll
+ * eliminate the buckets/items sooner (so we'll be able to skip
+ * them without the more expensive inspection). But this
+ * requires really knowing the per-clause selectivities in advance,
+ * and that's not what we do now.
+ */
+static Selectivity
+clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+{
+ bool fullmatch = false;
+
+ /*
+ * Lowest frequency in the MCV list (may be used as an upper bound
+ * for full equality conditions that did not match any MCV item).
+ */
+ Selectivity mcv_low = 0.0;
+
+ /* TODO Evaluate simple 1D selectivities, use the smallest one as
+ * an upper bound, product as lower bound, and sort the
+ * clauses in ascending order by selectivity (to optimize the
+ * MCV/histogram evaluation).
+ */
+
+ /* Evaluate the MCV selectivity */
+ return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ &fullmatch, &mcv_low);
+}
+
/*
* Collect attributes from mv-compatible clauses.
*/
static Bitmapset *
-collect_mv_attnums(List *clauses, Index relid)
+collect_mv_attnums(List *clauses, Index relid, int types)
{
Bitmapset *attnums = NULL;
ListCell *l;
/*
- * Walk through the clauses and identify the ones we can estimate
- * using multivariate stats, and remember the relid/columns. We'll
- * then cross-check if we have suitable stats, and only if needed
- * we'll split the clauses into multivariate and regular lists.
+ * Walk through the clauses and identify the ones we can estimate using
+ * multivariate stats, and remember the relid/columns. We'll then
+ * cross-check if we have suitable stats, and only if needed we'll split
+ * the clauses into multivariate and regular lists.
*
- * For now we're only interested in RestrictInfo nodes with nested
- * OpExpr, using either a range or equality.
+ * For now we're only interested in RestrictInfo nodes with nested OpExpr,
+ * using either a range or equality.
*/
foreach (l, clauses)
{
- AttrNumber attnum;
Node *clause = (Node *) lfirst(l);
- /* ignore the result for now - we only need the info */
- if (clause_is_mv_compatible(clause, relid, &attnum))
- attnums = bms_add_member(attnums, attnum);
+ /* ignore the result here - we only need the attnums */
+ clause_is_mv_compatible(clause, relid, &attnums, types);
}
/*
@@ -880,10 +1006,10 @@ collect_mv_attnums(List *clauses, Index relid)
* Count the number of attributes in clauses compatible with multivariate stats.
*/
static int
-count_mv_attnums(List *clauses, Index relid)
+count_mv_attnums(List *clauses, Index relid, int type)
{
int c;
- Bitmapset *attnums = collect_mv_attnums(clauses, relid);
+ Bitmapset *attnums = collect_mv_attnums(clauses, relid, type);
c = bms_num_members(attnums);
@@ -913,9 +1039,183 @@ count_varnos(List *clauses, Index *relid)
return cnt;
}
+
+/*
+ * We're looking for statistics matching at least 2 attributes, referenced in
+ * clauses compatible with multivariate statistics. The current selection
+ * criterion is very simple - we choose the statistics referencing the most
+ * attributes.
+ *
+ * If there are multiple statistics referencing the same number of columns
+ * (from the clauses), the one with fewer source columns (as listed in the
+ * ADD STATISTICS when creating the statistics) wins. Else the first one wins.
+ *
+ * This is a very simple criterion, and it has several weaknesses:
+ *
+ * (a) does not consider the accuracy of the statistics
+ *
+ * If there are two histograms built on the same set of columns, but one
+ * has 100 buckets and the other one has 1000 buckets (thus likely
+ * providing better estimates), this is not currently considered.
+ *
+ * (b) does not consider the type of statistics
+ *
+ * If there are three statistics - one containing just a MCV list, another
+ * one with just a histogram and a third one with both, we treat them equally.
+ *
+ * (c) does not consider the number of clauses
+ *
+ * As explained, only the number of referenced attributes counts, so if
+ * there are multiple clauses on a single attribute, this still counts as
+ * a single attribute.
+ *
+ * (d) does not consider the type of condition
+ *
+ * Some clauses may work better with some statistics - for example equality
+ * clauses probably work better with MCV lists than with histograms. But
+ * IS [NOT] NULL conditions may often work better with histograms (thanks
+ * to NULL-buckets).
+ *
+ * So for example with five WHERE conditions
+ *
+ * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
+ *
+ * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be selected
+ * as it references the most columns.
+ *
+ * Once we have selected the multivariate statistics, we split the list of
+ * clauses into two parts - conditions that are compatible with the selected
+ * stats, and conditions that will be estimated using simple statistics.
+ *
+ * From the example above, conditions
+ *
+ * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
+ *
+ * will be estimated using the multivariate statistics (a,b,c,d) while the last
+ * condition (e = 1) will get estimated using the regular ones.
+ *
+ * There are various alternative selection criteria (e.g. counting conditions
+ * instead of just referenced attributes), but eventually the best option should
+ * be to combine multiple statistics. But that's much harder to do correctly.
+ *
+ * TODO Select multiple statistics and combine them when computing the estimate.
+ *
+ * TODO This will probably have to consider compatibility of clauses, because
+ * 'dependencies' will probably work only with equality clauses.
+ */
+static MVStatisticInfo *
+choose_mv_statistics(List *stats, Bitmapset *attnums)
+{
+ int i;
+ ListCell *lc;
+
+ MVStatisticInfo *choice = NULL;
+
+ int current_matches = 1; /* goal #1: maximize */
+ int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+
+ /*
+ * Walk through the list of statistics, and for each one count how many of
+ * the attributes referenced by the clauses (the 'attnums' bitmap) it covers.
+ */
+ foreach (lc, stats)
+ {
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ /* columns matching this statistics */
+ int matches = 0;
+
+ int2vector * attrs = info->stakeys;
+ int numattrs = attrs->dim1;
+
+ /* skip dependencies-only stats */
+ if (! info->mcv_built)
+ continue;
+
+ /* count clause attributes covered by this statistic */
+ for (i = 0; i < numattrs; i++)
+ if (bms_is_member(attrs->values[i], attnums))
+ matches++;
+
+ /*
+ * Use this statistic when it matches more attributes than the current
+ * choice, or the same number of attributes over fewer total columns.
+ */
+ if ((matches > current_matches) ||
+ ((matches == current_matches) && (current_dims > numattrs)))
+ {
+ choice = info;
+ current_matches = matches;
+ current_dims = numattrs;
+ }
+ }
+
+ return choice;
+}
+
+
+/*
+ * This splits the clauses list into two parts - one containing clauses that
+ * will be evaluated using the chosen statistics, and the remaining clauses
+ * (either not mv-compatible, or not covered by the chosen statistics).
+ */
+static List *
+clauselist_mv_split(PlannerInfo *root, Index relid,
+ List *clauses, List **mvclauses,
+ MVStatisticInfo *mvstats, int types)
+{
+ int i;
+ ListCell *l;
+ List *non_mvclauses = NIL;
+
+ /* FIXME is there a better way to get info on int2vector? */
+ int2vector * attrs = mvstats->stakeys;
+ int numattrs = mvstats->stakeys->dim1;
+
+ Bitmapset *mvattnums = NULL;
+
+ /* build bitmap of attributes, so we can do bms_is_subset later */
+ for (i = 0; i < numattrs; i++)
+ mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+
+ /* erase the list of mv-compatible clauses */
+ *mvclauses = NIL;
+
+ foreach (l, clauses)
+ {
+ bool match = false; /* by default not mv-compatible */
+ Bitmapset *attnums = NULL;
+ Node *clause = (Node *) lfirst(l);
+
+ if (clause_is_mv_compatible(clause, relid, &attnums, types))
+ {
+ /* are all the attributes part of the selected stats? */
+ if (bms_is_subset(attnums, mvattnums))
+ match = true;
+ }
+
+ /*
+ * The clause matches the selected stats, so add it to the list of
+ * mv-compatible clauses. Otherwise, keep it in the list of 'regular'
+ * clauses (that may be selected later).
+ */
+ if (match)
+ *mvclauses = lappend(*mvclauses, clause);
+ else
+ non_mvclauses = lappend(non_mvclauses, clause);
+ }
+
+ /*
+ * Return the remaining clauses, to be estimated using the regular
+ * per-column statistics.
+ */
+ return non_mvclauses;
+}
typedef struct
{
+ int types; /* types of statistics to consider */
Index varno; /* relid we're interested in */
Bitmapset *varattnos; /* attnums referenced by the clauses */
} mv_compatible_context;
@@ -950,6 +1250,49 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
return mv_compatible_walker((Node*)rinfo->clause, (void *) context);
}
+ if (or_clause(node) || and_clause(node) || not_clause(node))
+ {
+ /*
+ * AND/OR/NOT-clauses are supported if all sub-clauses are supported
+ *
+ * TODO We might support mixed case, where some of the clauses are
+ * supported and some are not, and treat all supported subclauses
+ * as a single clause, compute its selectivity using mv stats,
+ * and compute the total selectivity using the current algorithm.
+ *
+ * TODO For RestrictInfo above an OR-clause, we might use the orclause
+ * with nested RestrictInfo - we won't have to call pull_varnos()
+ * for each clause, saving time.
+ *
+ * TODO Perhaps this needs a bit more thought for functional
+ * dependencies? Those don't quite work for NOT cases.
+ */
+ BoolExpr *expr = (BoolExpr *) node;
+ ListCell *lc;
+
+ foreach (lc, expr->args)
+ {
+ if (mv_compatible_walker((Node *) lfirst(lc), context))
+ return true;
+ }
+
+ return false;
+ }
+
+ if (IsA(node, NullTest))
+ {
+ NullTest* nt = (NullTest*)node;
+
+ /*
+ * Only simple (Var IS NULL) expressions are supported for now. Maybe we could
+ * use examine_variable to fix this?
+ */
+ if (! IsA(nt->arg, Var))
+ return true;
+
+ return mv_compatible_walker((Node*)(nt->arg), context);
+ }
+
if (IsA(node, Var))
{
Var * var = (Var*)node;
@@ -1010,10 +1353,18 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
switch (get_oprrest(expr->opno))
{
case F_EQSEL:
-
/* equality conditions are compatible with all statistics */
break;
+ case F_SCALARLTSEL:
+ case F_SCALARGTSEL:
+
+ /* not compatible with functional dependencies */
+ if (! (context->types & MV_CLAUSE_TYPE_MCV))
+ return true; /* terminate */
+
+ break;
+
default:
/* unknown estimator */
@@ -1047,10 +1398,11 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
* evaluate them using multivariate stats.
*/
static bool
-clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum)
+clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums, int types)
{
mv_compatible_context context;
+ context.types = types;
context.varno = relid;
context.varattnos = NULL; /* no attnums */
@@ -1058,7 +1410,7 @@ clause_is_mv_compatible(Node *clause, Index relid, AttrNumber *attnum)
return false;
/* remember the newly collected attnums */
- *attnum = bms_singleton_member(context.varattnos);
+ *attnums = bms_add_members(*attnums, context.varattnos);
return true;
}
@@ -1075,15 +1427,15 @@ fdeps_reduce_clauses(List *clauses, Index relid, Bitmapset *reduced_attnums)
foreach (lc, clauses)
{
- AttrNumber attnum = InvalidAttrNumber;
+ Bitmapset *attnums = NULL;
Node * clause = (Node*)lfirst(lc);
/* ignore clauses that are not compatible with functional dependencies */
- if (! clause_is_mv_compatible(clause, relid, &attnum))
+ if (! clause_is_mv_compatible(clause, relid, &attnums, MV_CLAUSE_TYPE_FDEP))
reduced_clauses = lappend(reduced_clauses, clause);
/* for equality clauses, only keep those not on reduced attributes */
- if (! bms_is_member(attnum, reduced_attnums))
+ if (! bms_is_subset(attnums, reduced_attnums))
reduced_clauses = lappend(reduced_clauses, clause);
}
@@ -1208,7 +1560,7 @@ clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
return clauses;
/* collect attnums from clauses compatible with dependencies (equality) */
- clause_attnums = collect_mv_attnums(clauses, relid);
+ clause_attnums = collect_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_FDEP);
/* decide which attnums may be eliminated */
reduced_attnums = fdeps_reduce_attnums(stats, clause_attnums);
@@ -1233,6 +1585,9 @@ stats_type_matches(MVStatisticInfo *stat, int type)
if ((type & MV_CLAUSE_TYPE_FDEP) && stat->deps_built)
return true;
+ if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
+ return true;
+
return false;
}
@@ -1266,3 +1621,392 @@ find_stats(PlannerInfo *root, Index relid)
return root->simple_rel_array[relid]->mvstatlist;
}
+
+/*
+ * Estimate selectivity of clauses using a MCV list.
+ *
+ * If there's no MCV list for the stats, the function returns 0.0.
+ *
+ * While computing the estimate, the function checks whether all the
+ * columns were matched with an equality condition. If that's the case,
+ * we can skip processing the histogram, as there can be no rows in
+ * it with the same values - all the rows matching the condition are
+ * represented by the MCV item. This can only happen with equality
+ * on all the attributes.
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all items as 'match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the items
+ * 4) skip items that are already 'no match'
+ * 5) check clause for items that still match
+ * 6) sum frequencies for items to get selectivity
+ *
+ * The function also returns the frequency of the least frequent item
+ * in the MCV list, which may be useful for clamping the estimate from the
+ * histogram (all items not present in the MCV list are less frequent).
+ * This however seems useful only for cases with conditions on all
+ * attributes.
+ *
+ * TODO This only handles AND-ed clauses, but it might work for OR-ed
+ * lists too - it just needs to reverse the logic a bit. I.e. start
+ * with 'no match' for all items, and mark the items as a match
+ * as the clauses are processed (and skip items that are 'match').
+ */
+static Selectivity
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats, bool *fullmatch,
+ Selectivity *lowsel)
+{
+ int i;
+ Selectivity s = 0.0;
+ Selectivity u = 0.0;
+
+ MCVList mcvlist = NULL;
+ int nmatches = 0;
+
+ /* match/mismatch bitmap for each MCV item */
+ char * matches = NULL;
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 2);
+
+ /* there's no MCV list built yet */
+ if (! mvstats->mcv_built)
+ return 0.0;
+
+ mcvlist = load_mv_mcvlist(mvstats->mvoid);
+
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* by default all the MCV items match the clauses fully */
+ matches = palloc0(sizeof(char) * mcvlist->nitems);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
+
+ /* number of matching MCV items */
+ nmatches = mcvlist->nitems;
+
+ nmatches = update_match_bitmap_mcvlist(root, clauses,
+ mvstats->stakeys, mcvlist,
+ nmatches, matches,
+ lowsel, fullmatch, false);
+
+ /* sum frequencies for all the matching MCV items */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /* used to 'scale' for MCV lists not covering all tuples */
+ u += mcvlist->items[i]->frequency;
+
+ if (matches[i] != MVSTATS_MATCH_NONE)
+ s += mcvlist->items[i]->frequency;
+ }
+
+ pfree(matches);
+ pfree(mcvlist);
+
+ return s*u;
+}
+
+/*
+ * Evaluate clauses using the MCV list, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * TODO This works with 'bitmap' where each bit is represented as a char,
+ * which is slightly wasteful. Instead, we could use a regular
+ * bitmap, reducing the size to ~1/8. Another thing is merging the
+ * bitmaps using & and |, which might be faster than min/max.
+ */
+static int
+update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
+ int2vector *stakeys, MCVList mcvlist,
+ int nmatches, char * matches,
+ Selectivity *lowsel, bool *fullmatch,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ Bitmapset *eqmatches = NULL; /* attributes with equality matches */
+
+ /* The bitmap may be partially built. */
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mcvlist->nitems);
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+ Assert(mcvlist != NULL);
+ Assert(mcvlist->nitems > 0);
+
+ /* No further work needed - AND with no matches, or OR with all matching */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ return nmatches;
+
+ /*
+ * find the lowest frequency in the MCV list
+ *
+ * We need to do that here, because we do various tricks in the following
+ * code - skipping items already ruled out, etc.
+ *
+ * XXX A loop is necessary because the MCV list is not sorted by frequency.
+ */
+ *lowsel = 1.0;
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = mcvlist->items[i];
+
+ if (item->frequency < *lowsel)
+ *lowsel = item->frequency;
+ }
+
+ /*
+ * Loop through the list of clauses, and for each of them evaluate
+ * all the MCV items not yet eliminated by the preceding clauses.
+ */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* if there are no remaining matches possible, we can stop */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /* it's either OpClause, or NullTest */
+ if (is_opclause(clause))
+ {
+ OpExpr *expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+ FmgrInfo opproc;
+
+ /* get procedure computing operator selectivity */
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+
+ FmgrInfo gtproc;
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_GT_OPR);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->gt_opr), &gtproc);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ bool mismatch = false;
+ MCVItem item = mcvlist->items[i];
+
+ /*
+ * If there are no more matches (AND) or no remaining unmatched
+ * items (OR), we can stop processing this clause.
+ */
+ if (((nmatches == 0) && (! is_or)) ||
+ ((nmatches == mcvlist->nitems) && is_or))
+ break;
+
+ /*
+ * For AND-lists, we can also mark NULL items as 'no match' (and
+ * then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && item->isnull[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /* skip MCV items that were already ruled out */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ switch (oprrest)
+ {
+ case F_EQSEL:
+ /*
+ * We don't care about isgt in equality, because it does not
+ * matter whether it's (var = const) or (const = var).
+ */
+ mismatch = ! DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ if (! mismatch)
+ eqmatches = bms_add_member(eqmatches, idx);
+
+ break;
+
+ case F_SCALARLTSEL: /* column < constant */
+ case F_SCALARGTSEL: /* column > constant */
+
+ /*
+ * Evaluate the operator with swapped arguments (const op value),
+ * so a 'true' result means the MCV item does not match. With the
+ * Var on the right side of the operator (isgt), the result is
+ * inverted below.
+ */
+ mismatch = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ cst->constvalue,
+ item->values[idx]));
+
+ /* invert the result if isgt=true */
+ mismatch = (isgt) ? (! mismatch) : mismatch;
+ break;
+ }
+
+ /* XXX The conditions on matches[i] are not needed, as we
+ * skip MCV items that can't become true/false, depending
+ * on the current flag. See beginning of the loop over
+ * MCV items.
+ */
+
+ if ((is_or) && (matches[i] == MVSTATS_MATCH_NONE) && (! mismatch))
+ {
+ /* OR - was MATCH_NONE, but will be MATCH_FULL */
+ matches[i] = MVSTATS_MATCH_FULL;
+ ++nmatches;
+ continue;
+ }
+ else if ((! is_or) && (matches[i] == MVSTATS_MATCH_FULL) && mismatch)
+ {
+ /* AND - was MATCH_FULL, but will be MATCH_NONE */
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ continue;
+ }
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the MCV items and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining MCV items that might possibly match.
+ */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem item = mcvlist->items[i];
+
+ /* if there are no more matches, we can stop processing this clause */
+ if (nmatches == 0)
+ break;
+
+ /* skip MCV items that were already ruled out */
+ if (matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
+ /* if the clause mismatches the MCV item, set it as MATCH_NONE */
+ if (((expr->nulltesttype == IS_NULL) && (! item->isnull[idx])) ||
+ ((expr->nulltesttype == IS_NOT_NULL) && (item->isnull[idx])))
+ {
+ matches[i] = MVSTATS_MATCH_NONE;
+ --nmatches;
+ }
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each MCV item */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching MCV items */
+ or_nmatches = mcvlist->nitems;
+
+ /* by default none of the MCV items matches the clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the OR-clauses */
+ or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ stakeys, mcvlist,
+ or_nmatches, or_matches,
+ lowsel, fullmatch, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, MIN() semantics is used; for OR-merge,
+ * MAX() is used.
+ *
+ * FIXME this does not update nmatches to reflect the merged bitmap
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ {
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+ }
+
+ /*
+ * If all the columns were matched by equality, it's a full match.
+ * In this case at most a single MCV item can match the clauses
+ * (two matching items would have to contain exactly the same values).
+ */
+ *fullmatch = (bms_num_members(eqmatches) == mcvlist->ndimensions);
+
+ /* free the allocated pieces */
+ if (eqmatches)
+ pfree(eqmatches);
+
+ return nmatches;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 7fb2088..8394111 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -412,7 +412,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built)
+ if (mvstat->deps_built || mvstat->mcv_built)
{
info = makeNode(MVStatisticInfo);
@@ -421,9 +421,11 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
+ info->mcv_enabled = mvstat->mcv_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
+ info->mcv_built = mvstat->mcv_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 099f1ed..f9bf10c 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o
+OBJS = common.o dependencies.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.mcv b/src/backend/utils/mvstats/README.mcv
new file mode 100644
index 0000000..e93cfe4
--- /dev/null
+++ b/src/backend/utils/mvstats/README.mcv
@@ -0,0 +1,137 @@
+MCV lists
+=========
+
+Multivariate MCV (most-common values) lists are a straightforward extension of
+regular MCV lists, tracking the most frequent combinations of values for a
+group of attributes.
+
+This works particularly well for columns with a small number of distinct values,
+as the list may include all the combinations and approximate the distribution
+very accurately.
+
+For columns with a large number of distinct values (e.g. those with continuous
+domains), the list will only track the most frequent combinations. If the
+distribution is mostly uniform (all combinations about equally frequent), the
+MCV list will be empty.
+
+Estimates of some clauses (e.g. equality) based on MCV lists are more accurate
+than when using histograms.
+
+Also, MCV lists don't necessarily require sorting of the values (the fact that
+we use sorting when building them is an implementation detail), but even more
+importantly the ordering is not built into the approximation (while histograms
+are built on ordering). So MCV lists work well even for attributes where the
+ordering of the data type is disconnected from the meaning of the data. For
+example we know how to sort strings, but it's unlikely to make much sense for
+city names (or other label-like attributes).
+
+
+Selectivity estimation
+----------------------
+
+The estimation, implemented in clauselist_mv_selectivity_mcvlist(), is quite
+simple in principle - we need to identify MCV items matching all the clauses
+and sum frequencies of all those items.
+
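+For example, take a hypothetical MCV list on columns (a,b) with these items
+and frequencies:
+
+  (1, 1)    0.30
+  (1, 2)    0.10
+  (2, 2)    0.05
+
+The conditions WHERE (a = 1) AND (b < 3) match the first two items, so the
+estimate is 0.30 + 0.10 = 0.40.
+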
+Currently MCV lists support estimation of the following clause types:
+
+ (a) equality clauses WHERE (a = 1) AND (b = 2)
+ (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ (d) OR clauses WHERE (a < 1) OR (b >= 2)
+
+It's possible to add support for additional clauses, for example:
+
+ (e) multi-var clauses WHERE (a > b)
+
+and possibly others. These are tasks for the future, not yet implemented.
+
+
+Estimating equality clauses
+---------------------------
+
+When computing selectivity estimate for equality clauses
+
+ (a = 1) AND (b = 2)
+
+we can do this estimate pretty exactly assuming that two conditions are met:
+
+ (1) there's an equality condition on all attributes of the statistic
+
+ (2) we find a matching item in the MCV list
+
+In this case we know the MCV item represents all tuples matching the clauses,
+and the selectivity estimate is complete (i.e. we don't need to perform
+estimation using the histogram). This is what we call 'full match'.
+
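+For example, with statistics on (a,b), the clauses
+
+  WHERE (a = 1) AND (b = 1)
+
+form a full equality condition; finding an MCV item (1,1) means the item's
+frequency directly gives the final estimate.
+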
+When only (1) holds, but there's no matching MCV item, we don't know whether
+there are no such rows or they're just not very frequent. We can however use the
+frequency of the least frequent MCV item as an upper bound for the selectivity.
+
+For a combination of equality conditions (not full-match case) we can clamp the
+selectivity by the minimum of selectivities for each condition. For example if
+we know the number of distinct values for each column, we can use 1/ndistinct
+as a per-column estimate. Or rather 1/ndistinct + selectivity derived from the
+MCV list.
+
+We should probably also use only the 'residual ndistinct', excluding the items
+included in the MCV list (and likewise the residual frequency):
+
+ f = (1.0 - sum(MCV frequencies)) / (ndistinct - ndistinct(MCV list))
+
+but it's worth pointing out the ndistinct values are multi-variate for the
+columns referenced by the equality conditions.
+
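+To illustrate with made-up numbers: if the MCV items sum to 0.8 of the table,
+the (multivariate) ndistinct is 1000, and the MCV list covers 100 of those
+combinations, then
+
+  f = (1.0 - 0.8) / (1000 - 100) = 0.2 / 900 ~= 0.00022
+
+for each combination not present in the MCV list.
+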
+Note: Only the "full match" limit is currently implemented.
+
+
+Hashed MCV (not yet implemented)
+--------------------------------
+
+Regular MCV lists have to include actual values for each item, so if those items
+are large the list may be quite large. This is especially true for multi-variate
+MCV lists, although the current implementation partially mitigates this by
+de-duplicating the values before storing them on disk.
+
+It's possible to only store hashes (32-bit values) instead of the actual values,
+significantly reducing the space requirements. Obviously, this would only make
+the MCV lists useful for estimating equality conditions (assuming the 32-bit
+hashes make the collisions rare enough).
+
+This might also complicate matching the columns to available stats.
+
+
+TODO Consider implementing hashed MCV list, storing just 32-bit hashes instead
+ of the actual values. This type of MCV list will be useful only for
+ estimating equality clauses, and will reduce space requirements for large
+ varlena types (in such cases we usually only want equality anyway).
+
+TODO Currently there's no logic to consider building only a MCV list (and not
+ building the histogram at all), except for doing this decision manually in
+ ADD STATISTICS.
+
+
+Inspecting the MCV list
+-----------------------
+
+Inspecting the regular (per-attribute) MCV lists is trivial, as it's enough
+to select the columns from pg_stats - the data is encoded as anyarrays, so we
+simply get the text representation of the arrays.
+
+With multivariate MCV lists it's not that simple due to the possible mix of
+data types. It might be possible to produce a similar array-like
+representation, but that would unnecessarily complicate further processing and
+analysis of the MCV list. Instead, there's a set-returning function providing
+the values, frequencies etc.
+
+ SELECT * FROM pg_mv_mcv_items();
+
+It has a single input parameter:
+
+ oid - OID of the MCV list (pg_mv_statistic.staoid)
+
+and produces a table with these columns:
+
+ - item ID (0...nitems-1)
+ - values (string array)
+ - nulls only (boolean array)
+ - frequency (double precision)
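+
+For example, to inspect the MCV list built on a table 't' (a sketch, assuming
+a single statistics entry on that table):
+
+    SELECT m.* FROM pg_mv_statistic s,
+                    LATERAL pg_mv_mcv_items(s.oid) m
+     WHERE s.starelid = 't'::regclass;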
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index a38ea7b..5c5c59a 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -8,9 +8,50 @@ not true, resulting in estimation errors.
Multivariate stats track different types of dependencies between the columns,
hopefully improving the estimates.
-Currently we only have one kind of multivariate statistics - soft functional
-dependencies, and we use it to improve estimates of equality clauses. See
-README.dependencies for details.
+
+Types of statistics
+-------------------
+
+Currently we have only two kinds of multivariate statistics:
+
+ (a) soft functional dependencies (README.dependencies)
+
+ (b) MCV lists (README.mcv)
+
+
+Compatible clause types
+-----------------------
+
+Each type of statistics may be used to estimate some subset of clause types.
+
+ (a) functional dependencies - equality clauses (AND), possibly IS NULL
+
+ (b) MCV list - equality and inequality clauses, IS [NOT] NULL, AND/OR
+
+Currently only simple operator clauses (Var op Const) are supported, but it's
+possible to support more complex clause types, e.g. (Var op Var).
+
+
+Complex clauses
+---------------
+
+We also support estimating more complex clauses - essentially AND/OR clauses
+with (Var op Const) as leaves, as long as all the referenced attributes are
+covered by a single statistics.
+
+For example this condition
+
+ (a=1) AND ((b=2) OR ((c=3) AND (d=4)))
+
+may be estimated using statistics on (a,b,c,d). If we only have statistics on
+(b,c,d) we may estimate the second part, and estimate (a=1) using simple stats.
+
+If we only have statistics on (a,b,c) we can't apply it at all at this point,
+but it's worth pointing out that clauselist_selectivity() works recursively,
+so when handling the second part (the OR-clause) we'll be able to apply the
+statistics.
+
+Note: The multi-statistics estimation patch also makes it possible to pass some
+clauses as 'conditions' into the deeper parts of the expression tree.
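+
+For example (a sketch, using the CREATE STATISTICS syntax from the regression
+tests):
+
+    CREATE STATISTICS s1 ON t (a, b, c, d) WITH (mcv);
+    ANALYZE t;
+
+    SELECT * FROM t WHERE (a = 1) AND ((b = 2) OR ((c = 3) AND (d = 4)));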
Selectivity estimation
@@ -23,14 +64,48 @@ When estimating selectivity, we aim to achieve several things:
(b) minimize the overhead, especially when no suitable multivariate stats
exist (so if you are not using multivariate stats, there's no overhead)
-This clauselist_selectivity() performs several inexpensive checks first, before
+Thus clauselist_selectivity() performs several inexpensive checks first, before
even attempting to do the more expensive estimation.
(1) check if there are multivariate stats on the relation
- (2) check there are at least two attributes referenced by clauses compatible
- with multivariate statistics (equality clauses for func. dependencies)
+ (2) check that there are functional dependencies on the table, and that
+ there are at least two attributes referenced by compatible clauses
+ (equality clauses for func. dependencies)
(3) perform reduction of equality clauses using func. dependencies
- (4) estimate the reduced list of clauses using regular statistics
+ (4) check that there are multivariate MCV lists on the table, and that
+ there are at least two attributes referenced by compatible clauses
+ (equalities, inequalities, etc.)
+
+ (5) find the best multivariate statistics (matching the most conditions)
+ and use it to compute the estimate
+
+ (6) estimate the remaining clauses (not estimated using multivariate stats)
+ using the regular per-column statistics
+
+Whenever we find there are no suitable stats, we skip the expensive steps.
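+
+For example, with both functional dependencies and a MCV list built on (a,b),
+a query like
+
+    SELECT * FROM t WHERE (a = 1) AND (b = 2);
+
+may have its clause list reduced by steps (2)-(3) using the dependencies, with
+the remainder estimated by steps (4)-(6) using the MCV list and the regular
+per-column statistics.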
+
+
+Further (possibly crazy) ideas
+------------------------------
+
+Currently the clauses are only estimated using a single statistics, even if
+there are multiple candidate statistics - for example assume we have statistics
+on (a,b,c) and (b,c,d), and estimate conditions
+
+ (b = 1) AND (c = 2)
+
+Then both statistics may be used, but we only use one of them. Maybe we could
+compute estimates using all the candidate stats, and somehow aggregate them
+into the final estimate, e.g. using the average or median.
+
+Some stats may give better estimates than others, but it's very difficult to say
+in advance which stats are the best (it depends on the number of buckets, number
+of additional columns not referenced in the clauses, type of condition etc.).
+
+But of course, this may result in expensive estimation (CPU-wise).
+
+So we might add a GUC to choose between the simple (single-statistics) and the
+multi-statistics estimation, possibly with a table-level parameter (ALTER TABLE ...).
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index dcb7c78..4f5a842 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -16,12 +16,14 @@
#include "common.h"
+#include "utils/array.h"
+
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
- int natts, VacAttrStats **vacattrstats);
+ int natts,
+ VacAttrStats **vacattrstats);
static List* list_mv_stats(Oid relid);
-
/*
* Compute requested multivariate stats, using the rows sampled for the
* plain (single-column) stats.
@@ -49,6 +51,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int j;
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
+ MCVList mcvlist = NULL;
+ int numrows_filtered = 0;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -87,8 +91,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ /* build the MCV list */
+ if (stat->mcv_enabled)
+ mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, attrs);
+ update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
}
}
@@ -166,6 +174,8 @@ list_mv_stats(Oid relid)
info->stakeys = buildint2vector(stats->stakeys.values, stats->stakeys.dim1);
info->deps_enabled = stats->deps_enabled;
info->deps_built = stats->deps_built;
+ info->mcv_enabled = stats->mcv_enabled;
+ info->mcv_built = stats->mcv_built;
result = lappend(result, info);
}
@@ -180,8 +190,56 @@ list_mv_stats(Oid relid)
return result;
}
+
+/*
+ * Find the attnums of the MV statistics with the given mvoid (also sets *relid).
+ */
+int2vector*
+find_mv_attnums(Oid mvoid, Oid *relid)
+{
+ ArrayType *arr;
+ Datum adatum;
+ bool isnull;
+ HeapTuple htup;
+ int2vector *keys;
+
+ /* Fetch the pg_mv_statistic tuple for the given OID from the syscache. */
+ htup = SearchSysCache1(MVSTATOID,
+ ObjectIdGetDatum(mvoid));
+
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+ /* starelid */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_starelid, &isnull);
+ Assert(!isnull);
+
+ *relid = DatumGetObjectId(adatum);
+
+ /* stakeys */
+ adatum = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stakeys, &isnull);
+ Assert(!isnull);
+
+ arr = DatumGetArrayTypeP(adatum);
+
+ keys = buildint2vector((int16 *) ARR_DATA_PTR(arr),
+ ARR_DIMS(arr)[0]);
+ ReleaseSysCache(htup);
+
+ /* TODO Maybe save the list into the relcache, as in RelationGetIndexList
+ * (which was used as an inspiration for this one)? */
+
+ return keys;
+}
+
+
void
-update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
+update_mv_stats(Oid mvoid,
+ MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -206,18 +264,29 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
= PointerGetDatum(serialize_mv_dependencies(dependencies));
}
+ if (mcvlist != NULL)
+ {
+ bytea * data = serialize_mv_mcvlist(mcvlist, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stamcv -1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
+ replaces[Anum_pg_mv_statistic_stamcv -1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
+ nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
+ replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
+ values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
@@ -246,6 +315,21 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
heap_close(sd, RowExclusiveLock);
}
+
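+/*
+ * Return the index (dimension) of the attribute within the sorted stakeys
+ * vector, i.e. the number of keys smaller than varattno.
+ */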
+int
+mv_get_index(AttrNumber varattno, int2vector * stakeys)
+{
+ int i, idx = 0;
+ for (i = 0; i < stakeys->dim1; i++)
+ {
+ if (stakeys->values[i] < varattno)
+ idx += 1;
+ else
+ break;
+ }
+ return idx;
+}
+
/* multi-variate stats comparator */
/*
@@ -256,11 +340,15 @@ update_mv_stats(Oid mvoid, MVDependencies dependencies, int2vector *attrs)
int
compare_scalars_simple(const void *a, const void *b, void *arg)
{
- Datum da = *(Datum*)a;
- Datum db = *(Datum*)b;
- SortSupport ssup= (SortSupport) arg;
+ return compare_datums_simple(*(Datum*)a,
+ *(Datum*)b,
+ (SortSupport)arg);
+}
- return ApplySortComparator(da, false, db, false, ssup);
+int
+compare_datums_simple(Datum a, Datum b, SortSupport ssup)
+{
+ return ApplySortComparator(a, false, b, false, ssup);
}
/*
@@ -377,3 +465,32 @@ multi_sort_compare_dims(int start, int end,
return 0;
}
+
+/* simple counterpart to qsort_arg */
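+/*
+ * Expects 'base' to be sorted using the same comparator (and 'arg');
+ * returns a pointer to a matching element, or NULL if there is none.
+ */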
+void *
+bsearch_arg(const void *key, const void *base, size_t nmemb, size_t size,
+ int (*compar) (const void *, const void *, void *),
+ void *arg)
+{
+ size_t l, u, idx;
+ const void *p;
+ int comparison;
+
+ l = 0;
+ u = nmemb;
+ while (l < u)
+ {
+ idx = (l + u) / 2;
+ p = (void *) (((const char *) base) + (idx * size));
+ comparison = (*compar) (key, p, arg);
+
+ if (comparison < 0)
+ u = idx;
+ else if (comparison > 0)
+ l = idx + 1;
+ else
+ return (void *) p;
+ }
+
+ return NULL;
+}
diff --git a/src/backend/utils/mvstats/common.h b/src/backend/utils/mvstats/common.h
index 75b9c54..350760b 100644
--- a/src/backend/utils/mvstats/common.h
+++ b/src/backend/utils/mvstats/common.h
@@ -47,6 +47,14 @@ typedef struct
int tupno; /* position index for tuple it came from */
} ScalarItem;
+/* (de)serialization info */
+typedef struct DimensionInfo {
+ int nvalues; /* number of deduplicated values */
+ int nbytes; /* number of bytes (serialized) */
+ int typlen; /* pg_type.typlen */
+ bool typbyval; /* pg_type.typbyval */
+} DimensionInfo;
+
/* multi-sort */
typedef struct MultiSortSupportData {
int ndims; /* number of dimensions supported by the */
@@ -58,6 +66,7 @@ typedef MultiSortSupportData* MultiSortSupport;
typedef struct SortItem {
Datum *values;
bool *isnull;
+ int count;
} SortItem;
MultiSortSupport multi_sort_init(int ndims);
@@ -74,5 +83,11 @@ int multi_sort_compare_dims(int start, int end, const SortItem *a,
const SortItem *b, MultiSortSupport mss);
/* comparators, used when constructing multivariate stats */
+int compare_datums_simple(Datum a, Datum b, SortSupport ssup);
int compare_scalars_simple(const void *a, const void *b, void *arg);
int compare_scalars_partition(const void *a, const void *b, void *arg);
+
+void * bsearch_arg(const void *key, const void *base,
+ size_t nmemb, size_t size,
+ int (*compar) (const void *, const void *, void *),
+ void *arg);
diff --git a/src/backend/utils/mvstats/mcv.c b/src/backend/utils/mvstats/mcv.c
new file mode 100644
index 0000000..b300c1a
--- /dev/null
+++ b/src/backend/utils/mvstats/mcv.c
@@ -0,0 +1,1120 @@
+/*-------------------------------------------------------------------------
+ *
+ * mcv.c
+ * POSTGRES multivariate MCV lists
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mcv.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+
+/*
+ * Each serialized item needs to store (in this order):
+ *
+ * - indexes (ndim * sizeof(uint16))
+ * - null flags (ndim * sizeof(bool))
+ * - frequency (sizeof(double))
+ *
+ * So in total:
+ *
+ * ndim * (sizeof(uint16) + sizeof(bool)) + sizeof(double)
+ */
+#define ITEM_SIZE(ndims) \
+ (ndims * (sizeof(uint16) + sizeof(bool)) + sizeof(double))
+
+/* Macros for convenient access to parts of the serialized MCV item */
+#define ITEM_INDEXES(item) ((uint16*)item)
+#define ITEM_NULLS(item,ndims) ((bool*)(ITEM_INDEXES(item) + ndims))
+#define ITEM_FREQUENCY(item,ndims) ((double*)(ITEM_NULLS(item,ndims) + ndims))
+
+static MultiSortSupport build_mss(VacAttrStats **stats, int2vector *attrs);
+
+static SortItem *build_sorted_items(int numrows, HeapTuple *rows,
+ TupleDesc tdesc, MultiSortSupport mss,
+ int2vector *attrs);
+
+static SortItem *build_distinct_groups(int numrows, SortItem *items,
+ MultiSortSupport mss, int *ndistinct);
+
+static int count_distinct_groups(int numrows, SortItem *items,
+ MultiSortSupport mss);
+
+/*
+ * Builds MCV list from the set of sampled rows.
+ *
+ * The algorithm is quite simple:
+ *
+ * (1) sort the data (default collation, '<' for the data type)
+ *
+ * (2) count distinct groups, decide how many to keep
+ *
+ * (3) build the MCV list using the threshold determined in (2)
+ *
+ * (4) remove rows represented by the MCV from the sample
+ *
+ * The method also removes rows matching the MCV items from the input array,
+ * and passes the number of remaining rows (useful for building histograms)
+ * using the numrows_filtered parameter.
+ *
+ * FIXME Use max_mcv_items from ALTER TABLE ADD STATISTICS command.
+ *
+ * FIXME Single-dimensional MCV is sorted by frequency (descending). We should
+ * do that too, because when walking through the list we want to check
+ * the most frequent items first.
+ *
+ * TODO We're using Datum (8B) even for narrower data types (e.g. int4 or
+ * float4). Maybe we could save some space here, but the bytea compression
+ * should handle it just fine.
+ *
+ * TODO This probably should not use the ndistinct directly (as computed
+ * from the sample), but rather an estimate of the number of distinct
+ * values in the whole table, no?
+ */
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered)
+{
+ int i;
+ int numattrs = attrs->dim1;
+ int ndistinct = 0;
+ int mcv_threshold = 0;
+ int nitems = 0;
+
+ MCVList mcvlist = NULL;
+
+ /* comparator for all the columns */
+ MultiSortSupport mss = build_mss(stats, attrs);
+
+ /* sort the rows */
+ SortItem *items = build_sorted_items(numrows, rows, stats[0]->tupDesc,
+ mss, attrs);
+
+ /* transform the sorted rows into groups (sorted by frequency) */
+ SortItem *groups = build_distinct_groups(numrows, items, mss, &ndistinct);
+
+ /*
+ * Determine the minimum size of a group to be eligible for MCV list, and
+ * check how many groups actually pass that threshold. We use 1.25x the
+ * average group size, just like for regular statistics.
+ *
+ * But if we can fit all the distinct values in the MCV list (i.e. if there
+ * are fewer distinct groups than MVSTAT_MCVLIST_MAX_ITEMS), we'll require
+ * only 2 rows per group.
+ *
+ * FIXME This should really reference mcv_max_items (from catalog) instead
+ * of the constant MVSTAT_MCVLIST_MAX_ITEMS.
+ */
+ mcv_threshold = 1.25 * numrows / ndistinct;
+ mcv_threshold = (mcv_threshold < 4) ? 4 : mcv_threshold;
+
+ if (ndistinct <= MVSTAT_MCVLIST_MAX_ITEMS)
+ mcv_threshold = 2;
+
+ /* Walk through the groups and stop once we fall below the threshold. */
+ nitems = 0;
+ for (i = 0; i < ndistinct; i++)
+ {
+ if (groups[i].count < mcv_threshold)
+ break;
+
+ nitems++;
+ }
+
+ /* we know the number of MCV list items, so let's build the list */
+ if (nitems > 0)
+ {
+ /* allocate the MCV list structure, set parameters we know */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ mcvlist->magic = MVSTAT_MCV_MAGIC;
+ mcvlist->type = MVSTAT_MCV_TYPE_BASIC;
+ mcvlist->ndimensions = numattrs;
+ mcvlist->nitems = nitems;
+
+ /*
+ * Preallocate Datum/isnull arrays (not as a single chunk, as we will
+ * pass the result outside and thus it needs to be easy to pfree()).
+ *
+ * XXX Although we're the only ones dealing with this.
+ */
+ mcvlist->items = (MCVItem*)palloc0(sizeof(MCVItem)*nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ mcvlist->items[i] = (MCVItem)palloc0(sizeof(MCVItemData));
+ mcvlist->items[i]->values = (Datum*)palloc0(sizeof(Datum)*numattrs);
+ mcvlist->items[i]->isnull = (bool*)palloc0(sizeof(bool)*numattrs);
+ }
+
+ /* Copy the first chunk of groups into the result. */
+ for (i = 0; i < nitems; i++)
+ {
+ /* just pointer to the proper place in the list */
+ MCVItem item = mcvlist->items[i];
+
+ /* copy the values and null flags from the group */
+ memcpy(item->values, groups[i].values, sizeof(Datum) * numattrs);
+ memcpy(item->isnull, groups[i].isnull, sizeof(bool) * numattrs);
+
+ /* and finally the group frequency */
+ item->frequency = (double)groups[i].count / numrows;
+ }
+
+ /* make sure the loops are consistent */
+ Assert(nitems == mcvlist->nitems);
+
+ /*
+ * Remove the rows matching the MCV list (i.e. keep only rows that are
+ * not represented by the MCV list). We will first sort the groups
+ * by the keys (not by count) and then use binary search.
+ */
+ if (nitems < ndistinct)
+ {
+ int i, j;
+ int nfiltered = 0;
+
+ /* used for the searches */
+ SortItem key;
+
+ /* we'll fill this with data from the rows */
+ key.values = (Datum*)palloc0(numattrs * sizeof(Datum));
+ key.isnull = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /*
+ * Sort the groups for bsearch_arg (but only the items that actually
+ * made it to the MCV list).
+ */
+ qsort_arg((void *) groups, nitems, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /* walk through the tuples, compare the values to MCV items */
+ for (i = 0; i < numrows; i++)
+ {
+ /* collect the key values from the row */
+ for (j = 0; j < numattrs; j++)
+ key.values[j]
+ = heap_getattr(rows[i], attrs->values[j],
+ stats[j]->tupDesc, &key.isnull[j]);
+
+ /* if not included in the MCV list, keep it in the array */
+ if (bsearch_arg(&key, groups, nitems, sizeof(SortItem),
+ multi_sort_compare, mss) == NULL)
+ rows[nfiltered++] = rows[i];
+ }
+
+ /* remember how many rows we actually kept */
+ *numrows_filtered = nfiltered;
+
+ /* free all the data used here */
+ pfree(key.values);
+ pfree(key.isnull);
+ }
+ else
+ /* the MCV list covers all the rows */
+ *numrows_filtered = 0;
+ }
+
+ pfree(items);
+ pfree(groups);
+
+ return mcvlist;
+}
+
+/* build MultiSortSupport for the attributes passed in attrs */
+static MultiSortSupport
+build_mss(VacAttrStats **stats, int2vector *attrs)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ /* Sort by multiple columns (using array of SortSupport) */
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /* prepare the sort functions for all the attributes */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ return mss;
+}
+
+/* build sorted array of SortItem with values from rows */
+static SortItem *
+build_sorted_items(int numrows, HeapTuple *rows, TupleDesc tdesc,
+ MultiSortSupport mss, int2vector *attrs)
+{
+ int i, j, len;
+ int numattrs = attrs->dim1;
+ int nvalues = numrows * numattrs;
+
+ /*
+ * We won't allocate the arrays for each item independently, but in one large
+ * chunk and then just set the pointers.
+ */
+ SortItem *items;
+ Datum *values;
+ bool *isnull;
+ char *ptr;
+
+ /* Compute the total amount of memory we need (both items and values). */
+ len = numrows * sizeof(SortItem) + nvalues * (sizeof(Datum) + sizeof(bool));
+
+ /* Allocate the memory and split it into the pieces. */
+ ptr = palloc0(len);
+
+ /* items to sort */
+ items = (SortItem*)ptr;
+ ptr += numrows * sizeof(SortItem);
+
+ /* values and null flags */
+ values = (Datum*)ptr;
+ ptr += nvalues * sizeof(Datum);
+
+ isnull = (bool*)ptr;
+ ptr += nvalues * sizeof(bool);
+
+ /* make sure we consumed the whole buffer exactly */
+ Assert((ptr - (char*)items) == len);
+
+ /* fix the pointers to Datum and bool arrays */
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+
+ /* load the values/null flags from sample rows */
+ for (j = 0; j < numattrs; j++)
+ {
+ items[i].values[j] = heap_getattr(rows[i],
+ attrs->values[j], /* attnum */
+ tdesc,
+ &items[i].isnull[j]); /* isnull */
+ }
+ }
+
+ /* do the sort, using the multi-sort */
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ return items;
+}
+
+/* count distinct combinations of SortItems in the array */
+static int
+count_distinct_groups(int numrows, SortItem *items, MultiSortSupport mss)
+{
+ int i;
+ int ndistinct;
+
+ ndistinct = 1;
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ ndistinct += 1;
+
+ return ndistinct;
+}
+
+/* compares frequencies of the SortItem entries (in descending order) */
+static int
+compare_sort_item_count(const void *a, const void *b)
+{
+ SortItem *ia = (SortItem *)a;
+ SortItem *ib = (SortItem *)b;
+
+ if (ia->count == ib->count)
+ return 0;
+ else if (ia->count > ib->count)
+ return -1;
+
+ return 1;
+}
+
+/* builds SortItems for distinct groups and counts the matching items */
+static SortItem *
+build_distinct_groups(int numrows, SortItem *items, MultiSortSupport mss,
+ int *ndistinct)
+{
+ int i, j;
+ int ngroups = count_distinct_groups(numrows, items, mss);
+
+ SortItem *groups = (SortItem*)palloc0(ngroups * sizeof(SortItem));
+
+ j = 0;
+ groups[0] = items[0];
+ groups[0].count = 1;
+
+ for (i = 1; i < numrows; i++)
+ {
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ groups[++j] = items[i];
+
+ groups[j].count++;
+ }
+
+ pg_qsort((void *) groups, ngroups, sizeof(SortItem),
+ compare_sort_item_count);
+
+ *ndistinct = ngroups;
+ return groups;
+}
+
+
+/* fetch the MCV list (as a bytea) from the pg_mv_statistic catalog */
+MCVList
+load_mv_mcvlist(Oid mvoid)
+{
+ bool isnull = false;
+ Datum mcvlist;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* Fetch the pg_mv_statistic tuple for the given OID from the syscache. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->mcv_enabled && mvstat->mcv_built);
+#endif
+
+ mcvlist = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stamcv, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_mcvlist(DatumGetByteaP(mcvlist));
+}
+
+/* print some basic info about the MCV list
+ *
+ * TODO Add info about what part of the table this covers.
+ */
+Datum
+pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MCVList mcvlist = deserialize_mv_mcvlist(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nitems=%d", mcvlist->nitems);
+
+ pfree(mcvlist);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/*
+ * serialize MCV list into a bytea value
+ *
+ *
+ * The basic algorithm is simple:
+ *
+ * (1) perform deduplication (for each attribute separately)
+ * (a) collect all (non-NULL) attribute values from all MCV items
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all MCV list items
+ * (a) replace values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, because we may be mixing
+ * different datatypes, with different sort operators, etc.
+ *
+ * We'll use uint16 values for the indexes in step (3), as we don't allow more
+ * than 8k MCV items (see max_mcv_items), although that's a mostly arbitrary
+ * limit - we might increase it to 65k and still fit into uint16.
+ *
+ * We don't really expect the serialization to save as much space as for
+ * histograms, because we are not doing any bucket splits (which is the source
+ * of high redundancy in histograms).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into a single char
+ * (or a longer type) instead of using an array of bool items.
+ */
+bytea *
+serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i, j;
+ int ndims = mcvlist->ndimensions;
+ int itemsize = ITEM_SIZE(ndims);
+
+ SortSupport ssup;
+ DimensionInfo *info;
+
+ Size total_length;
+
+ /* allocate just once */
+ char *item = palloc0(itemsize);
+
+ /* serialized items (indexes into arrays, etc.) */
+ bytea *output;
+ char *data = NULL;
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /*
+ * We'll include some rudimentary information about the attributes (type
+ * length, etc.), so that we don't have to look them up while deserializing
+ * the MCV list.
+ */
+ info = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data for all attributes included in the MCV list */
+ ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for all attributes */
+ for (i = 0; i < ndims; i++)
+ {
+ int ndistinct;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* copy important info about the data type (length, by-value) */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /* allocate space for values in the attribute and collect them */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * mcvlist->nitems);
+
+ for (j = 0; j < mcvlist->nitems; j++)
+ {
+ /* skip NULL values - we don't need to serialize them */
+ if (mcvlist->items[j]->isnull[i])
+ continue;
+
+ values[i][counts[i]] = mcvlist->items[j]->values[i];
+ counts[i] += 1;
+ }
+
+ /* there are just NULL values in this dimension, we're done */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate the data */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but keep the
+ * ordering (so that we can do bsearch later). We know there's at least
+ * one item as (counts[i] != 0), so we can skip the first element.
+ */
+ ndistinct = 1; /* number of distinct values */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if the value is the same as the previous one, we can skip it */
+ if (! compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]))
+ continue;
+
+ values[i][ndistinct] = values[i][j];
+ ndistinct += 1;
+ }
+
+ /* we must not exceed UINT16_MAX, as we use uint16 indexes */
+ Assert(ndistinct <= UINT16_MAX);
+
+ /*
+ * Store additional info about the attribute - number of deduplicated
+ * values, and also size of the serialized data. For fixed-length data
+ * types this is trivial to compute, for varwidth types we need to
+ * actually walk the array and sum the sizes.
+ */
+ info[i].nvalues = ndistinct;
+
+ if (info[i].typlen > 0) /* fixed-length data types */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1) /* varlena */
+ {
+ info[i].nbytes = 0;
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2) /* cstring */
+ {
+ info[i].nbytes = 0;
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j]));
+ }
+
+ /* we know (count>0) so there must be some data */
+ Assert(info[i].nbytes > 0);
+ }
+
+ /*
+ * Now we can finally compute how much space we'll actually need for the
+ * serialized MCV list, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nitems (4B)
+ * - info (ndim * sizeof(DimensionInfo)
+ * - arrays of values for each dimension
+ * - serialized items (nitems * itemsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and then we
+ * will place all the data (values + indexes).
+ */
+ total_length = (sizeof(int32) + offsetof(MCVListData, items)
+ + ndims * sizeof(DimensionInfo)
+ + mcvlist->nitems * itemsize);
+
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 1MB */
+ if (total_length > (1024 * 1024))
+ elog(ERROR, "serialized MCV list exceeds 1MB (%ld)", total_length);
+
+ /* allocate space for the serialized MCV list, set header fields */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* 'data' points to the current position in the output buffer */
+ data = VARDATA(output);
+
+ /* MCV list header (number of items, ...) */
+ memcpy(data, mcvlist, offsetof(MCVListData, items));
+ data += offsetof(MCVListData, items);
+
+ /* information about the attributes */
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* now serialize the deduplicated values for all attributes */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data; /* remember the starting point */
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ Datum v = values[i][j];
+
+ if (info[i].typbyval) /* passed by value */
+ {
+ memcpy(data, &v, info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen > 0) /* passed by reference */
+ {
+ memcpy(data, DatumGetPointer(v), info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1) /* varlena */
+ {
+ memcpy(data, DatumGetPointer(v), VARSIZE_ANY(v));
+ data += VARSIZE_ANY(v);
+ }
+ else if (info[i].typlen == -2) /* cstring */
+ {
+ memcpy(data, DatumGetPointer(v), strlen(DatumGetPointer(v))+1);
+ data += strlen(DatumGetPointer(v)) + 1; /* terminator */
+ }
+ }
+
+ /* make sure we got exactly the amount of data we expected */
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* finally serialize the items, with uint16 indexes instead of the values */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ MCVItem mcvitem = mcvlist->items[i];
+
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - itemsize);
+
+ /* reset the item (we only allocate it once and reuse it) */
+ memset(item, 0, itemsize);
+
+ for (j = 0; j < ndims; j++)
+ {
+ Datum *v = NULL;
+
+ /* do the lookup only for non-NULL values */
+ if (mcvlist->items[i]->isnull[j])
+ continue;
+
+ v = (Datum*)bsearch_arg(&mcvitem->values[j], values[j],
+ info[j].nvalues, sizeof(Datum),
+ compare_scalars_simple, &ssup[j]);
+
+ Assert(v != NULL); /* serialization or deduplication error */
+
+ /* compute index within the array */
+ ITEM_INDEXES(item)[j] = (v - values[j]);
+
+ /* check the index is within expected bounds */
+ Assert(ITEM_INDEXES(item)[j] >= 0);
+ Assert(ITEM_INDEXES(item)[j] < info[j].nvalues);
+ }
+
+ /* copy NULL and frequency flags into the item */
+ memcpy(ITEM_NULLS(item, ndims), mcvitem->isnull, sizeof(bool) * ndims);
+ memcpy(ITEM_FREQUENCY(item, ndims), &mcvitem->frequency, sizeof(double));
+
+ /* copy the serialized item into the array */
+ memcpy(data, item, itemsize);
+
+ data += itemsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ return output;
+}
+
+/*
+ * deserialize MCV list from the varlena value
+ *
+ *
+ * We deserialize the MCV list fully, because we don't expect there to be a lot
+ * of duplicate values. But perhaps we should keep the MCV list in serialized
+ * form, just like histograms.
+ */
+MCVList deserialize_mv_mcvlist(bytea * data)
+{
+ int i, j;
+ Size expected_size;
+ MCVList mcvlist;
+ char *tmp;
+
+ int ndims, nitems, itemsize;
+ DimensionInfo *info = NULL;
+
+ uint16 *indexes = NULL;
+ Datum **values = NULL;
+
+ /* local allocation buffer (used only for deserialization) */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ /* buffer used for the result */
+ int rbufflen;
+ char *rbuff;
+ char *rptr;
+
+ if (data == NULL)
+ return NULL;
+
+ /* we can't deserialize the MCV if there's not even a complete header */
+ expected_size = offsetof(MCVListData,items);
+
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid MCV Size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MCVListData,items));
+
+ /* read the MCV list header */
+ mcvlist = (MCVList)palloc0(sizeof(MCVListData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform further sanity checks */
+ memcpy(mcvlist, tmp, offsetof(MCVListData,items));
+ tmp += offsetof(MCVListData,items);
+
+ if (mcvlist->magic != MVSTAT_MCV_MAGIC)
+ elog(ERROR, "invalid MCV magic %d (expected %dd)",
+ mcvlist->magic, MVSTAT_MCV_MAGIC);
+
+ if (mcvlist->type != MVSTAT_MCV_TYPE_BASIC)
+ elog(ERROR, "invalid MCV type %d (expected %dd)",
+ mcvlist->type, MVSTAT_MCV_TYPE_BASIC);
+
+ nitems = mcvlist->nitems;
+ ndims = mcvlist->ndimensions;
+ itemsize = ITEM_SIZE(ndims);
+
+ Assert((nitems > 0) && (nitems <= MVSTAT_MCVLIST_MAX_ITEMS));
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Check amount of data including DimensionInfo for all dimensions and
+ * also the serialized items (including uint16 indexes). Also, walk
+ * through the dimension information and add it to the sum.
+ */
+ expected_size += ndims * sizeof(DimensionInfo) +
+ (nitems * itemsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid MCV size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ {
+ Assert(info[i].nvalues >= 0);
+ Assert(info[i].nbytes >= 0);
+
+ expected_size += info[i].nbytes;
+ }
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid MCV size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /*
+ * Allocate one large chunk of memory for the intermediate data, needed
+ * only for deserializing the MCV list (and allocate densely to minimize
+ * the palloc overhead).
+ *
+ * Let's see how much space we'll actually need, and also include space
+ * for the array with pointers.
+ */
+ bufflen = sizeof(Datum*) * ndims; /* space for pointers */
+
+ for (i = 0; i < ndims; i++)
+ /* for full-size byval types, we reuse the serialized value */
+ if (! (info[i].typbyval && info[i].typlen == sizeof(Datum)))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+
+ buff = palloc0(bufflen);
+ ptr = buff;
+
+ values = (Datum**)buff;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * XXX This uses pointers to the original data array (the types not passed
+ * by value), so when someone frees the memory, e.g. by doing something
+ * like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mcv_list(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. Should copy the pieces.
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum - simply reuse the array */
+ if (info[i].typlen == sizeof(Datum))
+ {
+ values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ memcpy(&values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ }
+ else
+ {
+ /* all the other types need a chunk of the buffer */
+ values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ /* passed by reference, but fixed length (name, tid, ...) */
+ if (info[i].typlen > 0)
+ {
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ /* we should have exhausted the buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ /* allocate space for all the MCV items in a single piece */
+ rbufflen = (sizeof(MCVItem) + sizeof(MCVItemData) +
+ sizeof(Datum)*ndims + sizeof(bool)*ndims) * nitems;
+
+ rbuff = palloc0(rbufflen);
+ rptr = rbuff;
+
+ mcvlist->items = (MCVItem*)rbuff;
+ rptr += (sizeof(MCVItem) * nitems);
+
+ for (i = 0; i < nitems; i++)
+ {
+ MCVItem item = (MCVItem)rptr;
+ rptr += (sizeof(MCVItemData));
+
+ item->values = (Datum*)rptr;
+ rptr += (sizeof(Datum)*ndims);
+
+ item->isnull = (bool*)rptr;
+ rptr += (sizeof(bool) *ndims);
+
+ /* just point to the right place */
+ indexes = ITEM_INDEXES(tmp);
+
+ memcpy(item->isnull, ITEM_NULLS(tmp, ndims), sizeof(bool) * ndims);
+ memcpy(&item->frequency, ITEM_FREQUENCY(tmp, ndims), sizeof(double));
+
+#ifdef USE_ASSERT_CHECKING
+ for (j = 0; j < ndims; j++)
+ Assert(indexes[j] <= UINT16_MAX);
+#endif
+
+ /* translate the values */
+ for (j = 0; j < ndims; j++)
+ if (! item->isnull[j])
+ item->values[j] = values[j][indexes[j]];
+
+ mcvlist->items[i] = item;
+
+ tmp += ITEM_SIZE(ndims);
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+ }
+
+ /* check that we processed all the data */
+ Assert(tmp == (char*)data + VARSIZE_ANY(data));
+
+ /* release the temporary buffer */
+ pfree(buff);
+
+ return mcvlist;
+}
+
+/*
+ * SRF with details about items of a multivariate MCV list:
+ *
+ * - item ID (0...nitems-1)
+ * - values (string array)
+ * - nulls only (boolean array)
+ * - frequency (double precision)
+ *
+ * The input is the OID of the statistics, and no rows are returned if the
+ * statistics contains no MCV list.
+ */
+PG_FUNCTION_INFO_V1(pg_mv_mcv_items);
+
+Datum
+pg_mv_mcv_items(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MCVList mcvlist;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ mcvlist = load_mv_mcvlist(PG_GETARG_OID(0));
+
+ funcctx->user_fctx = mcvlist;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = mcvlist->nitems;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /* build metadata needed later to produce tuples from raw C-strings */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+
+ char *buff = palloc0(1024);
+ char *format;
+
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MCVList mcvlist;
+ MCVItem item;
+
+ mcvlist = (MCVList)funcctx->user_fctx;
+
+ Assert(call_cntr < mcvlist->nitems);
+
+ item = mcvlist->items[call_cntr];
+
+ stakeys = find_mv_attnums(PG_GETARG_OID(0), &relid);
+
+ /*
+ * Prepare a values array for building the returned tuple. This should
+ * be an array of C strings which will be processed later by the type
+ * input functions.
+ */
+ values = (char **) palloc(4 * sizeof(char *));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ /* arrays */
+ values[1] = (char *) palloc0(1024 * sizeof(char));
+ values[2] = (char *) palloc0(1024 * sizeof(char));
+
+ /* frequency */
+ values[3] = (char *) palloc(64 * sizeof(char));
+
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * mcvlist->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * mcvlist->ndimensions);
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* item ID */
+
+ for (i = 0; i < mcvlist->ndimensions; i++)
+ {
+ Datum val, valout;
+
+ format = "%s, %s";
+ if (i == 0)
+ format = "{%s%s";
+ else if (i == mcvlist->ndimensions-1)
+ format = "%s, %s}";
+
+ if (item->isnull[i])
+ valout = CStringGetDatum("NULL");
+ else
+ {
+ val = item->values[i];
+ valout = FunctionCall1(&fmgrinfo[i], val);
+ }
+
+ snprintf(buff, 1024, format, values[1], DatumGetPointer(valout));
+ strncpy(values[1], buff, 1023);
+ buff[0] = '\0';
+
+ snprintf(buff, 1024, format, values[2], item->isnull[i] ? "t" : "f");
+ strncpy(values[2], buff, 1023);
+ buff[0] = '\0';
+ }
+
+ snprintf(values[3], 64, "%f", item->frequency); /* frequency */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[1]);
+ pfree(values[2]);
+ pfree(values[3]);
+
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 8ce9c0e..2c22d31 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2109,8 +2109,9 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
- " deps_enabled,\n"
- " deps_built,\n"
+ " deps_enabled, mcv_enabled,\n"
+ " deps_built, mcv_built,\n"
+ " mcv_max_items,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
@@ -2128,6 +2129,8 @@ describeOneTableDetails(const char *schemaname,
printTableAddFooter(&cont, _("Statistics:"));
for (i = 0; i < tuples; i++)
{
+ bool first = true;
+
printfPQExpBuffer(&buf, " ");
/* statistics name (qualified with namespace) */
@@ -2137,10 +2140,22 @@ describeOneTableDetails(const char *schemaname,
/* options */
if (!strcmp(PQgetvalue(result, i, 4), "t"))
- appendPQExpBuffer(&buf, "(dependencies)");
+ {
+ appendPQExpBuffer(&buf, "(dependencies");
+ first = false;
+ }
+
+ if (!strcmp(PQgetvalue(result, i, 5), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", mcv");
+ else
+ appendPQExpBuffer(&buf, "(mcv");
+ first = false;
+ }
- appendPQExpBuffer(&buf, " ON (%s)",
- PQgetvalue(result, i, 6));
+ appendPQExpBuffer(&buf, ") ON (%s)",
+ PQgetvalue(result, i, 9));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index c74af47..3529b03 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -38,15 +38,21 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
+ bool mcv_enabled; /* build MCV list? */
+
+ /* MCV size */
+ int32 mcv_max_items; /* max MCV items */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
+ bool mcv_built; /* MCV list was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
+ bytea stamcv; /* MCV list (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -62,14 +68,18 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 8
+#define Natts_pg_mv_statistic 12
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
#define Anum_pg_mv_statistic_staowner 4
#define Anum_pg_mv_statistic_deps_enabled 5
-#define Anum_pg_mv_statistic_deps_built 6
-#define Anum_pg_mv_statistic_stakeys 7
-#define Anum_pg_mv_statistic_stadeps 8
+#define Anum_pg_mv_statistic_mcv_enabled 6
+#define Anum_pg_mv_statistic_mcv_max_items 7
+#define Anum_pg_mv_statistic_deps_built 8
+#define Anum_pg_mv_statistic_mcv_built 9
+#define Anum_pg_mv_statistic_stakeys 10
+#define Anum_pg_mv_statistic_stadeps 11
+#define Anum_pg_mv_statistic_stamcv 12
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index cdcbf95..5640dc1 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2670,6 +2670,10 @@ DATA(insert OID = 3998 ( pg_mv_stats_dependencies_info PGNSP PGUID 12 1 0 0
DESCR("multivariate stats: functional dependencies info");
DATA(insert OID = 3999 ( pg_mv_stats_dependencies_show PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_dependencies_show _null_ _null_ _null_ ));
DESCR("multivariate stats: functional dependencies show");
+DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_mcvlist_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: MCV list info");
+DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
+DESCR("details about MCV list items");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 75c4752..f52884a 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -655,9 +655,11 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
+ bool mcv_enabled; /* MCV list enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
+ bool mcv_built; /* MCV list built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index ec55a09..b2643ec 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -17,6 +17,14 @@
#include "fmgr.h"
#include "commands/vacuum.h"
+/*
+ * Degree to which an MCV item matches a clause.
+ * This is then used when computing the selectivity.
+ */
+#define MVSTATS_MATCH_NONE 0 /* no match at all */
+#define MVSTATS_MATCH_PARTIAL 1 /* partial match */
+#define MVSTATS_MATCH_FULL 2 /* full match */
+
#define MVSTATS_MAX_DIMENSIONS 8 /* max number of attributes */
/*
@@ -43,30 +51,89 @@ typedef MVDependenciesData* MVDependencies;
#define MVSTAT_DEPS_TYPE_BASIC 1 /* basic dependencies type */
/*
+ * Multivariate MCV (most-common value) lists
+ *
+ * A straight-forward extension of MCV items - i.e. a list (array) of
+ * combinations of attribute values, together with a frequency and
+ * null flags.
+ */
+typedef struct MCVItemData {
+ double frequency; /* frequency of this combination */
+ bool *isnull; /* flags marking NULL values (ndimensions) */
+ Datum *values; /* variable-length (ndimensions) */
+} MCVItemData;
+
+typedef MCVItemData *MCVItem;
+
+/* multivariate MCV list - essentially an array of MCV items */
+typedef struct MCVListData {
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of MCV list (BASIC) */
+ uint32 ndimensions; /* number of dimensions */
+ uint32 nitems; /* number of MCV items in the array */
+ MCVItem *items; /* array of MCV items */
+} MCVListData;
+
+typedef MCVListData *MCVList;
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_MCV_MAGIC 0xE1A651C2 /* marks serialized bytea */
+#define MVSTAT_MCV_TYPE_BASIC 1 /* basic MCV list type */
+
+/*
+ * Limits used for mcv_max_items option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_MCVLIST_MIN_ITEMS, and we cannot
+ * have more than MVSTAT_MCVLIST_MAX_ITEMS items.
+ *
+ * This is just a boundary for the 'max' threshold - the actual list
+ * may of course contain fewer items than MVSTAT_MCVLIST_MIN_ITEMS.
+ */
+#define MVSTAT_MCVLIST_MIN_ITEMS 128 /* min items in MCV list */
+#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
*/
MVDependencies load_mv_dependencies(Oid mvoid);
+MCVList load_mv_mcvlist(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
+bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
+MCVList deserialize_mv_mcvlist(bytea * data);
+
+/*
+ * Returns index of the attribute number within the vector (i.e. a
+ * dimension within the stats).
+ */
+int mv_get_index(AttrNumber varattno, int2vector * stakeys);
+
+int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
/* FIXME this probably belongs somewhere else (not to operations stats) */
extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_mcv_items(PG_FUNCTION_ARGS);
MVDependencies
-build_mv_dependencies(int numrows, HeapTuple *rows,
- int2vector *attrs,
- VacAttrStats **stats);
+build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats);
+
+MCVList
+build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int *numrows_filtered);
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
- int natts, VacAttrStats **vacattrstats);
+ int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, int2vector *attrs);
+void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+ int2vector *attrs, VacAttrStats **stats);
#endif
diff --git a/src/test/regress/expected/mv_mcv.out b/src/test/regress/expected/mv_mcv.out
new file mode 100644
index 0000000..075320b
--- /dev/null
+++ b/src/test/regress/expected/mv_mcv.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s4 ON mcv_list (unknown_column) WITH (mcv);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s4 ON mcv_list (a) WITH (mcv);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s4 ON mcv_list (a, a) WITH (mcv);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s4 ON mcv_list (a, a, b) WITH (mcv);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing MCV statistics
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (dependencies, max_mcv_items=200);
+ERROR: option 'mcv' is required by other options(s)
+-- invalid mcv_max_items value / too low
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10);
+ERROR: max number of MCV items must be at least 128
+-- invalid mcv_max_items value / too high
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10000);
+ERROR: max number of MCV items is 8192
+-- correct command
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s5 ON mcv_list (a, b, c) WITH (mcv);
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | f |
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1000
+(1 row)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mcv_list;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=100
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mcv_list
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on mcv_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mcv_list;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s6 ON mcv_list (a, b, c, d) WITH (mcv);
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+ mcv_enabled | mcv_built | pg_mv_stats_mcvlist_info
+-------------+-----------+--------------------------
+ t | t | nitems=1200
+(1 row)
+
+DROP TABLE mcv_list;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 06f2231..3d55ffe 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1373,7 +1373,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
s.staname,
s.stakeys AS attnums,
length(s.stadeps) AS depsbytes,
- pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo
+ pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
+ length(s.stamcv) AS mcvbytes,
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 4f2ffb8..85d94f1 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,4 +112,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies
+test: mv_dependencies mv_mcv
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 097a04f..6584d73 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -163,3 +163,4 @@ test: xml
test: event_trigger
test: stats
test: mv_dependencies
+test: mv_mcv
diff --git a/src/test/regress/sql/mv_mcv.sql b/src/test/regress/sql/mv_mcv.sql
new file mode 100644
index 0000000..b31d32d
--- /dev/null
+++ b/src/test/regress/sql/mv_mcv.sql
@@ -0,0 +1,178 @@
+-- data type passed by value
+CREATE TABLE mcv_list (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s4 ON mcv_list (unknown_column) WITH (mcv);
+
+-- single column
+CREATE STATISTICS s4 ON mcv_list (a) WITH (mcv);
+
+-- single column, duplicated
+CREATE STATISTICS s4 ON mcv_list (a, a) WITH (mcv);
+
+-- two columns, one duplicated
+CREATE STATISTICS s4 ON mcv_list (a, a, b) WITH (mcv);
+
+-- unknown option
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (unknown_option);
+
+-- missing MCV statistics
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (dependencies, max_mcv_items=200);
+
+-- invalid max_mcv_items value / too low
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10);
+
+-- invalid max_mcv_items value / too high
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv, max_mcv_items=10000);
+
+-- correct command
+CREATE STATISTICS s4 ON mcv_list (a, b, c) WITH (mcv);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = 10 AND b = 5;
+
+DROP TABLE mcv_list;
+
+-- varlena type (text)
+CREATE TABLE mcv_list (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s5 ON mcv_list (a, b, c) WITH (mcv);
+
+-- random data
+INSERT INTO mcv_list
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c, b => c
+INSERT INTO mcv_list
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- a => b, a => c
+INSERT INTO mcv_list
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mcv_list
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX mcv_idx ON mcv_list (a, b);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a = '10' AND b = '5';
+
+TRUNCATE mcv_list;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mcv_list
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mcv_list WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mcv_list;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mcv_list (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s6 ON mcv_list (a, b, c, d) WITH (mcv);
+
+INSERT INTO mcv_list
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mcv_list;
+
+SELECT mcv_enabled, mcv_built, pg_mv_stats_mcvlist_info(stamcv)
+ FROM pg_mv_statistic WHERE starelid = 'mcv_list'::regclass;
+
+DROP TABLE mcv_list;
--
2.5.0
Attachment: 0005-multivariate-histograms.patch (text/x-patch)
From eb184e590bc0d1e41b9bf69d7cdc6d09e28daac3 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 11 Jan 2015 20:18:24 +0100
Subject: [PATCH 5/9] multivariate histograms
- extends the pg_mv_statistic catalog (add 'hist' fields)
- building the histograms during ANALYZE
- simple estimation while planning the queries
Includes regression tests mostly equal to those for functional
dependencies / MCV lists.
---
doc/src/sgml/ref/create_statistics.sgml | 44 +
src/backend/catalog/system_views.sql | 4 +-
src/backend/commands/statscmds.c | 44 +-
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/clausesel.c | 584 +++++++-
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/README.histogram | 299 ++++
src/backend/utils/mvstats/README.stats | 2 +
src/backend/utils/mvstats/common.c | 37 +-
src/backend/utils/mvstats/histogram.c | 2023 ++++++++++++++++++++++++++++
src/bin/psql/describe.c | 17 +-
src/include/catalog/pg_mv_statistic.h | 24 +-
src/include/catalog/pg_proc.h | 4 +
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 138 +-
src/test/regress/expected/mv_histogram.out | 207 +++
src/test/regress/expected/rules.out | 4 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/mv_histogram.sql | 176 +++
21 files changed, 3576 insertions(+), 44 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.histogram
create mode 100644 src/backend/utils/mvstats/histogram.c
create mode 100644 src/test/regress/expected/mv_histogram.out
create mode 100644 src/test/regress/sql/mv_histogram.sql
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index d6973e8..f7336fd 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -133,6 +133,24 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</varlistentry>
<varlistentry>
+ <term><literal>histogram</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables histogram for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>max_buckets</> (<type>integer</>)</term>
+ <listitem>
+ <para>
+ Maximum number of histogram buckets.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>max_mcv_items</> (<type>integer</>)</term>
<listitem>
<para>
@@ -220,6 +238,32 @@ EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 2);
</programlisting>
</para>
+ <para>
+ Create table <structname>t3</> with two strongly correlated columns, and
+ a histogram on those two columns:
+
+<programlisting>
+CREATE TABLE t3 (
+ a float,
+ b float
+);
+
+INSERT INTO t3 SELECT mod(i,1000), mod(i,1000) + 50 * (r - 0.5) FROM (
+ SELECT i, random() r FROM generate_series(1,1000000) s(i)
+ ) foo;
+
+CREATE STATISTICS s3 ON t3 (a, b) WITH (histogram);
+
+ANALYZE t3;
+
+-- small overlap
+EXPLAIN ANALYZE SELECT * FROM t3 WHERE (a < 500) AND (b > 500);
+
+-- no overlap
+EXPLAIN ANALYZE SELECT * FROM t3 WHERE (a < 400) AND (b > 600);
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5c40334..b151db1 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -167,7 +167,9 @@ CREATE VIEW pg_mv_stats AS
length(S.stadeps) as depsbytes,
pg_mv_stats_dependencies_info(S.stadeps) as depsinfo,
length(S.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo,
+ length(S.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index c480fbe..e0b085f 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -71,12 +71,15 @@ CreateStatistics(CreateStatsStmt *stmt)
/* by default build nothing */
bool build_dependencies = false,
- build_mcv = false;
+ build_mcv = false,
+ build_histogram = false;
- int32 max_mcv_items = -1;
+ int32 max_buckets = -1,
+ max_mcv_items = -1;
/* options required because of other options */
- bool require_mcv = false;
+ bool require_mcv = false,
+ require_histogram = false;
Assert(IsA(stmt, CreateStatsStmt));
@@ -175,6 +178,29 @@ CreateStatistics(CreateStatsStmt *stmt)
MVSTAT_MCVLIST_MAX_ITEMS)));
}
+ else if (strcmp(opt->defname, "histogram") == 0)
+ build_histogram = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "max_buckets") == 0)
+ {
+ max_buckets = defGetInt32(opt);
+
+ /* this option requires 'histogram' to be enabled */
+ require_histogram = true;
+
+ /* sanity check */
+ if (max_buckets < MVSTAT_HIST_MIN_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("minimum number of buckets is %d",
+ MVSTAT_HIST_MIN_BUCKETS)));
+
+ else if (max_buckets > MVSTAT_HIST_MAX_BUCKETS)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("maximum number of buckets is %d",
+ MVSTAT_HIST_MAX_BUCKETS)));
+
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -183,10 +209,10 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv))
+ if (! (build_dependencies || build_mcv || build_histogram))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -194,6 +220,11 @@ CreateStatistics(CreateStatsStmt *stmt)
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("option 'mcv' is required by other options(s)")));
+ if (require_histogram && (! build_histogram))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option 'histogram' is required by other options(s)")));
+
/* sort the attnums and build int2vector */
qsort(attnums, numcols, sizeof(int16), compare_int16);
stakeys = buildint2vector(attnums, numcols);
@@ -214,11 +245,14 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
+ values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
+ values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
+ nulls[Anum_pg_mv_statistic_stahist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 333e24b..9172f21 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2163,10 +2163,12 @@ _outMVStatisticInfo(StringInfo str, const MVStatisticInfo *node)
/* enabled statistics */
WRITE_BOOL_FIELD(deps_enabled);
WRITE_BOOL_FIELD(mcv_enabled);
+ WRITE_BOOL_FIELD(hist_enabled);
/* built/available statistics */
WRITE_BOOL_FIELD(deps_built);
WRITE_BOOL_FIELD(mcv_built);
+ WRITE_BOOL_FIELD(hist_built);
}
static void
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index c16d559..fe96a73 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -49,6 +49,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
+#define MV_CLAUSE_TYPE_HIST 0x04
static bool clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums,
int type);
@@ -74,6 +75,8 @@ static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
List *clauses, MVStatisticInfo *mvstats,
bool *fullmatch, Selectivity *lowsel);
+static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
+ List *clauses, MVStatisticInfo *mvstats);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -81,6 +84,12 @@ static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
Selectivity *lowsel, bool *fullmatch,
bool is_or);
+static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, Index relid);
@@ -95,6 +104,7 @@ static bool stats_type_matches(MVStatisticInfo *stat, int type);
#define UPDATE_RESULT(m,r,isor) \
(m) = (isor) ? (MAX(m,r)) : (MIN(m,r))
+
/****************************************************************************
* ROUTINES TO COMPUTE SELECTIVITIES
****************************************************************************/
@@ -123,7 +133,7 @@ static bool stats_type_matches(MVStatisticInfo *stat, int type);
*
* First we try to reduce the list of clauses by applying (soft) functional
* dependencies, and then we try to estimate the selectivity of the reduced
- * list of clauses using the multivariate MCV list.
+ * list of clauses using the multivariate MCV list and histograms.
*
* Finally we remove the portion of clauses estimated using multivariate stats,
* and process the rest of the clauses using the regular per-column stats.
@@ -216,11 +226,13 @@ clauselist_selectivity(PlannerInfo *root,
* with the multivariate code and simply skip to estimation using the
* regular per-column stats.
*/
- if (has_stats(stats, MV_CLAUSE_TYPE_MCV) &&
- (count_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV) >= 2))
+ if (has_stats(stats, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) &&
+ (count_mv_attnums(clauses, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2))
{
/* collect attributes from the compatible conditions */
- Bitmapset *mvattnums = collect_mv_attnums(clauses, relid, MV_CLAUSE_TYPE_MCV);
+ Bitmapset *mvattnums = collect_mv_attnums(clauses, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
/* and search for the statistic covering the most attributes */
MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
@@ -232,7 +244,7 @@ clauselist_selectivity(PlannerInfo *root,
/* split the clauselist into regular and mv-clauses */
clauses = clauselist_mv_split(root, relid, clauses, &mvclauses,
- mvstat, MV_CLAUSE_TYPE_MCV);
+ mvstat, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
/* we've chosen the histogram to match the clauses */
Assert(mvclauses != NIL);
@@ -944,6 +956,7 @@ static Selectivity
clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
{
bool fullmatch = false;
+ Selectivity s1 = 0.0, s2 = 0.0;
/*
* Lowest frequency in the MCV list (may be used as an upper bound
@@ -957,9 +970,24 @@ clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvs
* MCV/histogram evaluation).
*/
- /* Evaluate the MCV selectivity */
- return clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
+ /* Evaluate the MCV first. */
+ s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
&fullmatch, &mcv_low);
+
+ /*
+ * If we got a full equality match on the MCV list, we're done (and
+ * the estimate is pretty good).
+ */
+ if (fullmatch && (s1 > 0.0))
+ return s1;
+
+ /* TODO if (fullmatch) without matching MCV item, use the mcv_low
+ * selectivity as upper bound */
+
+ s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+
+ /* TODO clamp to <= 1.0 (or more strictly, when possible) */
+ return s1 + s2;
}
/*
@@ -1039,7 +1067,7 @@ count_varnos(List *clauses, Index *relid)
return cnt;
}
-
+
/*
* We're looking for statistics matching at least 2 attributes, referenced in
* clauses compatible with multivariate statistics. The current selection
@@ -1129,7 +1157,7 @@ choose_mv_statistics(List *stats, Bitmapset *attnums)
int numattrs = attrs->dim1;
/* skip dependencies-only stats */
- if (! info->mcv_built)
+ if (! (info->mcv_built || info->hist_built))
continue;
/* count columns covered by the histogram */
@@ -1251,7 +1279,7 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
}
if (or_clause(node) || and_clause(node) || not_clause(node))
- {
+ {
/*
* AND/OR/NOT-clauses are supported if all sub-clauses are supported
*
@@ -1277,10 +1305,10 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
}
return false;
- }
+ }
if (IsA(node, NullTest))
- {
+ {
NullTest* nt = (NullTest*)node;
/*
@@ -1360,9 +1388,9 @@ mv_compatible_walker(Node *node, mv_compatible_context *context)
case F_SCALARGTSEL:
/* not compatible with functional dependencies */
- if (! (context->types & MV_CLAUSE_TYPE_MCV))
+ if (! (context->types & (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST)))
return true; /* terminate */
-
+
break;
default:
@@ -1588,6 +1616,9 @@ stats_type_matches(MVStatisticInfo *stat, int type)
if ((type & MV_CLAUSE_TYPE_MCV) && stat->mcv_built)
return true;
+ if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
+ return true;
+
return false;
}
@@ -1606,6 +1637,9 @@ has_stats(List *stats, int type)
/* terminate if we've found at least one matching statistics */
if (stats_type_matches(stat, type))
return true;
+
+ if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
+ return true;
}
return false;
@@ -2010,3 +2044,525 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
return nmatches;
}
+
+/*
+ * Estimate selectivity of clauses using a histogram.
+ *
+ * If there's no histogram for the stats, the function returns 0.0.
+ *
+ * The general idea of this method is similar to how MCV lists are
+ * processed, except that this introduces the concept of a partial
+ * match (MCV only works with full match / mismatch).
+ *
+ * The algorithm works like this:
+ *
+ * 1) mark all buckets as 'full match'
+ * 2) walk through all the clauses
+ * 3) for a particular clause, walk through all the buckets
+ * 4) skip buckets that are already 'no match'
+ * 5) check clause for buckets that still match (at least partially)
+ * 6) sum frequencies for buckets to get selectivity
+ *
+ * Unlike MCV lists, histograms have a concept of a partial match. In
+ * that case we use 1/2 the bucket, to minimize the average error. The
+ * MV histograms are usually less detailed than the per-column ones,
+ * meaning the sum is often quite high (thanks to combining a lot of
+ * "partially hit" buckets).
+ *
+ * Maybe we could use per-bucket information with number of distinct
+ * values it contains (for each dimension), and then use that to correct
+ * the estimate (so with 10 distinct values, we'd use 1/10 of the bucket
+ * frequency). We might also scale the value depending on the actual
+ * ndistinct estimate (not just the values observed in the sample).
+ *
+ * Another option would be to multiply the selectivities, i.e. if we get
+ * 'partial match' for a bucket for multiple conditions, we might use
+ * 0.5^k (where k is the number of conditions), instead of 0.5. This
+ * probably does not minimize the average error, though.
+ *
+ * TODO This might use a similar shortcut to MCV lists - count buckets
+ * marked as partial/full match, and terminate once this drops to 0.
+ * Not sure if it's really worth it - for MCV lists a situation like
+ * this is not uncommon, but for histograms it's not that clear.
+ */
+static Selectivity
+clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
+ MVStatisticInfo *mvstats)
+{
+ int i;
+ Selectivity s = 0.0;
+ Selectivity u = 0.0;
+
+ int nmatches = 0;
+ char *matches = NULL;
+
+ MVSerializedHistogram mvhist = NULL;
+
+ /* there's no histogram */
+ if (! mvstats->hist_built)
+ return 0.0;
+
+ /* load the serialized histogram from the catalog */
+ mvhist = load_mv_histogram(mvstats->mvoid);
+
+ Assert (mvhist != NULL);
+ Assert (clauses != NIL);
+ Assert (list_length(clauses) >= 2);
+
+ /*
+ * Bitmap of bucket matches (mismatch, partial, full). By default
+ * all buckets fully match (and we'll eliminate them).
+ */
+ matches = palloc0(sizeof(char) * mvhist->nbuckets);
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+
+ nmatches = mvhist->nbuckets;
+
+ /* build the match bitmap */
+ update_match_bitmap_histogram(root, clauses,
+ mvstats->stakeys, mvhist,
+ nmatches, matches, false);
+
+ /* now, walk through the buckets and sum the selectivities */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * Find out what part of the data is covered by the histogram,
+ * so that we can 'scale' the selectivity properly (e.g. when
+ * only 50% of the sample got into the histogram, and the rest
+ * is in an MCV list).
+ *
+ * TODO This might be handled by keeping a global "frequency"
+ * for the whole histogram, which might save us some time
+ * spent accessing the not-matching part of the histogram.
+ * Although it's likely in a cache, so it's very fast.
+ */
+ u += mvhist->buckets[i]->ntuples;
+
+ if (matches[i] == MVSTATS_MATCH_FULL)
+ s += mvhist->buckets[i]->ntuples;
+ else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+ s += 0.5 * mvhist->buckets[i]->ntuples;
+ }
+
+#ifdef DEBUG_MVHIST
+ debug_histogram_matches(mvhist, matches);
+#endif
+
+ /* release the allocated bitmap and deserialized histogram */
+ pfree(matches);
+ pfree(mvhist);
+
+ return s * u;
+}
+
+/* cached result of bucket boundary comparison for a single dimension */
+
+#define HIST_CACHE_NOT_FOUND 0x00
+#define HIST_CACHE_FALSE 0x01
+#define HIST_CACHE_TRUE 0x03
+#define HIST_CACHE_MASK 0x02
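+/*
+ * Each cache entry uses two bits: bit 0 records that the comparator was
+ * already called for the value, bit 1 stores the cached result, so
+ * ((cache & HIST_CACHE_MASK) >> 1) recovers the cached boolean.
+ */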
+
+static char
+bucket_contains_value(FmgrInfo ltproc, Datum constvalue,
+ Datum min_value, Datum max_value,
+ int min_index, int max_index,
+ bool min_include, bool max_include,
+ char * callcache)
+{
+ bool a, b;
+
+ char min_cached = callcache[min_index];
+ char max_cached = callcache[max_index];
+
+ /*
+ * First some quick checks on equality - if any of the boundaries equals,
+ * we have a partial match (so no need to call the comparator).
+ */
+ if (((min_value == constvalue) && (min_include)) ||
+ ((max_value == constvalue) && (max_include)))
+ return MVSTATS_MATCH_PARTIAL;
+
+ /* Keep the values 0/1 because of the XOR at the end. */
+ a = ((min_cached & HIST_CACHE_MASK) >> 1);
+ b = ((max_cached & HIST_CACHE_MASK) >> 1);
+
+ /*
+ * If the result for the bucket lower bound is not in the cache, evaluate
+ * the function and store the result in the cache.
+ */
+ if (! min_cached)
+ {
+ a = DatumGetBool(FunctionCall2Coll(<proc,
+ DEFAULT_COLLATION_OID,
+ constvalue, min_value));
+ /* remember the result */
+ callcache[min_index] = (a) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ /* And do the same for the upper bound. */
+ if (! max_cached)
+ {
+ b = DatumGetBool(FunctionCall2Coll(<proc,
+ DEFAULT_COLLATION_OID,
+ constvalue, max_value));
+ /* remember the result */
+ callcache[max_index] = (b) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ return (a ^ b) ? MVSTATS_MATCH_PARTIAL : MVSTATS_MATCH_NONE;
+}
+
+static char
+bucket_is_smaller_than_value(FmgrInfo opproc, Datum constvalue,
+ Datum min_value, Datum max_value,
+ int min_index, int max_index,
+ bool min_include, bool max_include,
+ char * callcache, bool isgt)
+{
+ char min_cached = callcache[min_index];
+ char max_cached = callcache[max_index];
+
+ /* Keep the values 0/1 because of the XOR at the end. */
+ bool a = ((min_cached & HIST_CACHE_MASK) >> 1);
+ bool b = ((max_cached & HIST_CACHE_MASK) >> 1);
+
+ if (! min_cached)
+ {
+ a = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ min_value,
+ constvalue));
+ /* remember the result */
+ callcache[min_index] = (a) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ if (! max_cached)
+ {
+ b = DatumGetBool(FunctionCall2Coll(&opproc,
+ DEFAULT_COLLATION_OID,
+ max_value,
+ constvalue));
+ /* remember the result */
+ callcache[max_index] = (b) ? HIST_CACHE_TRUE : HIST_CACHE_FALSE;
+ }
+
+ /*
+ * Now, we need to combine both results into the final answer, and we need
+ * to be careful about the 'isgt' variable which kinda inverts the meaning.
+ *
+ * First, we handle the case when each boundary returns different results.
+ * In that case the outcome can only be 'partial' match.
+ */
+ if (a != b)
+ return MVSTATS_MATCH_PARTIAL;
+
+ /*
+ * When the results are the same, then it depends on the 'isgt' value. There
+ * are four options:
+ *
+ * isgt=false a=b=true => full match
+ * isgt=false a=b=false => empty
+ * isgt=true a=b=true => empty
+ * isgt=true a=b=false => full match
+ *
+ * We'll cheat a bit, because we know that (a=b) so we'll use just one of them.
+ */
+ if (isgt)
+ return (!a) ? MVSTATS_MATCH_FULL : MVSTATS_MATCH_NONE;
+ else
+ return ( a) ? MVSTATS_MATCH_FULL : MVSTATS_MATCH_NONE;
+}
+
+/*
+ * Evaluate clauses using the histogram, and update the match bitmap.
+ *
+ * The bitmap may be already partially set, so this is really a way to
+ * combine results of several clause lists - either when computing
+ * conditional probability P(A|B) or a combination of AND/OR clauses.
+ *
+ * Note: This is not a simple bitmap in the sense that there are more
+ * than two possible values for each item - no match, partial
+ * match and full match. So we need 2 bits per item.
+ *
+ * TODO This works with 'bitmap' where each item is represented as a
+ * char, which is slightly wasteful. Instead, we could use a bitmap
+ * with 2 bits per item, reducing the size to ~1/4. By using values
+ * 0, 1 and 3 (instead of 0, 1 and 2), the operations (merging etc.)
+ * might be performed just like for simple bitmap by using & and |,
+ * which might be faster than min/max.
+ */
+static int
+update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char * matches,
+ bool is_or)
+{
+ int i;
+ ListCell * l;
+
+ /*
+ * Used for caching function call results, so each deduplicated value is evaluated once.
+ *
+ * We know we may have up to (2 * nbuckets) values per dimension. It's
+ * probably overkill, but let's allocate that once for all clauses,
+ * to minimize overhead.
+ *
+ * Also, we only need two bits per value, but this allocates a byte
+ * per value. Might be worth optimizing.
+ *
+ * 0x00 - not yet called
+ * 0x01 - called, result is 'false'
+ * 0x03 - called, result is 'true'
+ */
+ char *callcache = palloc(2 * mvhist->nbuckets);
+
+ Assert(mvhist != NULL);
+ Assert(mvhist->nbuckets > 0);
+ Assert(nmatches >= 0);
+ Assert(nmatches <= mvhist->nbuckets);
+
+ Assert(clauses != NIL);
+ Assert(list_length(clauses) >= 1);
+
+ /* loop through the clauses and do the estimation */
+ foreach (l, clauses)
+ {
+ Node * clause = (Node*)lfirst(l);
+
+ /* if it's a RestrictInfo, then extract the clause */
+ if (IsA(clause, RestrictInfo))
+ clause = (Node*)((RestrictInfo*)clause)->clause;
+
+ /* it's either OpClause, or NullTest */
+ if (is_opclause(clause))
+ {
+ OpExpr * expr = (OpExpr*)clause;
+ bool varonleft = true;
+ bool ok;
+
+ FmgrInfo opproc; /* operator */
+ fmgr_info(get_opcode(expr->opno), &opproc);
+
+ /* reset the cache (per clause) */
+ memset(callcache, 0, 2 * mvhist->nbuckets);
+
+ ok = (NumRelids(clause) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ if (ok)
+ {
+ FmgrInfo ltproc;
+ RegProcedure oprrest = get_oprrest(expr->opno);
+
+ Var * var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
+ Const * cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
+ bool isgt = (! varonleft);
+
+ TypeCacheEntry *typecache
+ = lookup_type_cache(var->vartype, TYPECACHE_LT_OPR);
+
+ /* lookup dimension for the attribute */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ fmgr_info(get_opcode(typecache->lt_opr), <proc);
+
+ /*
+ * Check this for all buckets that still have "true" in the bitmap
+ *
+ * We already know the clauses use suitable operators (because that's
+ * how we filtered them).
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ char res = MVSTATS_MATCH_NONE;
+
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /* histogram boundaries */
+ Datum minval, maxval;
+ bool mininclude, maxinclude;
+ int minidx, maxidx;
+
+ /*
+ * For AND-lists, we can also mark NULL buckets as 'no match'
+ * (and then skip them). For OR-lists this is not possible.
+ */
+ if ((! is_or) && bucket->nullsonly[idx])
+ matches[i] = MVSTATS_MATCH_NONE;
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match).
+ * We can't really do anything about the MATCH_PARTIAL buckets.
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* lookup the values and cache of function calls */
+ minidx = bucket->min[idx];
+ maxidx = bucket->max[idx];
+
+ minval = mvhist->values[idx][bucket->min[idx]];
+ maxval = mvhist->values[idx][bucket->max[idx]];
+
+ mininclude = bucket->min_inclusive[idx];
+ maxinclude = bucket->max_inclusive[idx];
+
+ /*
+ * TODO Maybe it's possible to add here a similar optimization
+ * as for the MCV lists:
+ *
+ * (nmatches == 0) && AND-list => all eliminated (FALSE)
+ * (nmatches == N) && OR-list => all eliminated (TRUE)
+ *
+ * But it's more complex because of the partial matches.
+ */
+
+ /*
+ * If it's not a "<" or ">" or "=" operator, just ignore the
+ * clause. Otherwise note the relid and attnum for the variable.
+ *
+ * TODO I'm really unsure the handling of 'isgt' flag (that is, clauses
+ * with reverse order of variable/constant) is correct. I wouldn't
+ * be surprised if there was some mixup. Using the lt/gt operators
+ * instead of messing with the opproc could make it simpler.
+ * It would however be using a different operator than the query,
+ * although it's not any shadier than using the selectivity function
+ * as is done currently.
+ */
+ switch (oprrest)
+ {
+ case F_SCALARLTSEL: /* Var < Const */
+ case F_SCALARGTSEL: /* Var > Const */
+
+ res = bucket_is_smaller_than_value(opproc, cst->constvalue,
+ minval, maxval,
+ minidx, maxidx,
+ mininclude, maxinclude,
+ callcache, isgt);
+ break;
+
+ case F_EQSEL:
+
+ /*
+ * We only check whether the value is within the bucket, using the
+ * lt operator, and we also check for equality with the boundaries.
+ */
+
+ res = bucket_contains_value(ltproc, cst->constvalue,
+ minval, maxval,
+ minidx, maxidx,
+ mininclude, maxinclude,
+ callcache);
+ break;
+ }
+
+ UPDATE_RESULT(matches[i], res, is_or);
+
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest * expr = (NullTest*)clause;
+ Var * var = (Var*)(expr->arg);
+
+ /* FIXME proper matching attribute to dimension */
+ int idx = mv_get_index(var->varattno, stakeys);
+
+ /*
+ * Walk through the buckets and evaluate the current clause. We can
+ * skip items that were already ruled out, and terminate if there are
+ * no remaining buckets that might possibly match.
+ */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ /*
+ * Skip buckets that were already eliminated - this is important
+ * considering how we update the info (we only lower the match)
+ */
+ if ((! is_or) && (matches[i] == MVSTATS_MATCH_NONE))
+ continue;
+ else if (is_or && (matches[i] == MVSTATS_MATCH_FULL))
+ continue;
+
+ /* if the clause mismatches the bucket, set it as MATCH_NONE */
+ if ((expr->nulltesttype == IS_NULL)
+ && (! bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+
+ else if ((expr->nulltesttype == IS_NOT_NULL) &&
+ (bucket->nullsonly[idx]))
+ UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
+ }
+ }
+ else if (or_clause(clause) || and_clause(clause))
+ {
+ /* AND/OR clause, with all clauses compatible with the selected MV stat */
+
+ int i;
+ BoolExpr *orclause = ((BoolExpr*)clause);
+ List *orclauses = orclause->args;
+
+ /* match/mismatch bitmap for each bucket */
+ int or_nmatches = 0;
+ char * or_matches = NULL;
+
+ Assert(orclauses != NIL);
+ Assert(list_length(orclauses) >= 2);
+
+ /* number of matching buckets */
+ or_nmatches = mvhist->nbuckets;
+
+ /* by default none of the buckets matches the clauses */
+ or_matches = palloc0(sizeof(char) * or_nmatches);
+
+ if (or_clause(clause))
+ {
+ /* OR clauses assume nothing matches, initially */
+ memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
+ or_nmatches = 0;
+ }
+ else
+ {
+ /* AND clauses assume everything matches, initially */
+ memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
+ }
+
+ /* build the match bitmap for the OR-clauses */
+ or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ stakeys, mvhist,
+ or_nmatches, or_matches, or_clause(clause));
+
+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ /*
+ * To AND-merge the bitmaps, a MIN() semantics is used.
+ * For OR-merge, use MAX().
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }
+
+ pfree(or_matches);
+
+ }
+ else
+ elog(ERROR, "unknown clause type: %d", clause->type);
+ }
+
+ /* free the call cache */
+ pfree(callcache);
+
+ return nmatches;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8394111..2519249 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -412,7 +412,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built || mvstat->mcv_built)
+ if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built)
{
info = makeNode(MVStatisticInfo);
@@ -422,10 +422,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* enabled statistics */
info->deps_enabled = mvstat->deps_enabled;
info->mcv_enabled = mvstat->mcv_enabled;
+ info->hist_enabled = mvstat->hist_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
info->mcv_built = mvstat->mcv_built;
+ info->hist_built = mvstat->hist_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index f9bf10c..9dbb3b6 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o mcv.o
+OBJS = common.o dependencies.o histogram.o mcv.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.histogram b/src/backend/utils/mvstats/README.histogram
new file mode 100644
index 0000000..a182fa3
--- /dev/null
+++ b/src/backend/utils/mvstats/README.histogram
@@ -0,0 +1,299 @@
+Multivariate histograms
+=======================
+
+Histograms on individual attributes consist of buckets represented by ranges,
+covering the domain of the attribute. That is, each bucket is a [min,max]
+interval, and contains all values in this range. The histogram is built in such
+a way that all buckets have about the same frequency.
+
+Multivariate histograms are an extension into n-dimensional space - the buckets
+are n-dimensional intervals (i.e. n-dimensional rectangles), covering the domain
+of the combination of attributes. That is, each bucket has a vector of lower
+and upper boundaries, denoted min[i] and max[i] (where i = 1..n).
+
+In addition to the boundaries, each bucket tracks additional info:
+
+ * frequency (fraction of tuples in the bucket)
+ * whether the boundaries are inclusive or exclusive
+ * whether the dimension contains only NULL values
+ * number of distinct values in each dimension (for building only)
+
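+For example, in a histogram on two columns (a, b), a bucket with min = [0, 100]
+and max = [10, 200] represents the rectangle 0 <= a <= 10 AND 100 <= b <= 200
+(with the exact edge behavior given by the inclusive flags).
+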
+It's possible that in the future we'll have multiple histogram types, with
+different features. We do however expect all the types to share the same
+representation (buckets as ranges) and only differ in how we build them.
+
+The current implementation builds non-overlapping buckets, but that may not be
+true for other histogram types, so the code should not rely on this assumption.
+There are interesting types of histograms (or algorithms) with overlapping
+buckets.
+
+When used on low-cardinality data, histograms usually perform considerably worse
+than MCV lists (which are a good fit for this kind of data). This is especially
+true on label-like values, where the ordering of the values is mostly unrelated
+to the meaning of the data, as proper ordering is crucial for histograms.
+
+On high-cardinality data the histograms are usually a better choice, because MCV
+lists can't represent the distribution accurately enough.
+
+
+Selectivity estimation
+----------------------
+
+The estimation is implemented in clauselist_mv_selectivity_histogram(), and
+works very similarly to clauselist_mv_selectivity_mcvlist().
+
+The main difference is that while MCV lists support exact matches, histograms
+often result in approximate matches - e.g. with equality we can only say if
+the constant would be part of the bucket, but not whether it really is there
+or what fraction of the bucket it corresponds to. In this case we rely on
+some defaults just like in the per-column histograms.
+
+The current implementation uses histograms to estimate these types of clauses
+(think of WHERE conditions):
+
+ (a) equality clauses WHERE (a = 1) AND (b = 2)
+ (b) inequality clauses WHERE (a < 1) AND (b >= 2)
+ (c) NULL clauses WHERE (a IS NULL) AND (b IS NOT NULL)
+ (d) OR-clauses WHERE (a = 1) OR (b = 2)
+
+Similarly to MCV lists, it's possible to add support for additional types of
+clauses, for example:
+
+ (e) multi-var clauses WHERE (a > b)
+
+and so on. These are tasks for the future, not yet implemented.
+
+
+When evaluating a clause on a bucket, we may get one of three results:
+
+ (a) FULL_MATCH - The bucket definitely matches the clause.
+
+ (b) PARTIAL_MATCH - The bucket matches the clause, but not necessarily all
+ the tuples it represents.
+
+ (c) NO_MATCH - The bucket definitely does not match the clause.
+
+This may be illustrated using a range [1, 5], which is essentially a 1-D bucket.
+With clause
+
+ WHERE (a < 10) => FULL_MATCH (all range values are below
+ 10, so the whole bucket matches)
+
+ WHERE (a < 3) => PARTIAL_MATCH (there may be values matching
+ the clause, but we don't know how many)
+
+ WHERE (a < 0) => NO_MATCH (the whole range is above 0, so
+ no values from the bucket can match)
+
+Some clauses may produce only some of those results - for example equality
+clauses may never produce FULL_MATCH as we always hit only part of the bucket
+(we can't match both boundaries at the same time). This results in less accurate
+estimates compared to MCV lists, where we can hit an MCV item exactly (there's
+no PARTIAL match in MCV).
+
+There are also clauses that may not produce any PARTIAL_MATCH results. A nice
+example of that is the 'IS [NOT] NULL' clause, which either matches the bucket
+completely (FULL_MATCH) or not at all (NO_MATCH), thanks to how the NULL-buckets
+are constructed.
+
+Computing the total selectivity estimate is trivial - simply sum selectivities
+from all the FULL_MATCH and PARTIAL_MATCH buckets (but for buckets marked with
+PARTIAL_MATCH, multiply the frequency by 0.5 to minimize the average error).
+
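+For example (with illustrative frequencies), one FULL_MATCH bucket with
+frequency 0.20 and two PARTIAL_MATCH buckets with frequencies 0.10 and 0.04
+give the estimate
+
+    0.20 + 0.5 * 0.10 + 0.5 * 0.04 = 0.27
+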
+
+Building a histogram
+---------------------
+
+The algorithm of building a histogram in general is quite simple:
+
+ (a) create an initial bucket (containing all sample rows)
+
+ (b) create NULL buckets (by splitting the initial bucket)
+
+ (c) repeat
+
+ (1) choose bucket to split next
+
+ (2) terminate if no bucket that might be split found, or if we've
+ reached the maximum number of buckets (16384)
+
+ (3) choose dimension to partition the bucket by
+
+ (4) partition the bucket by the selected dimension
+
+The main complexity is hidden in steps (c.1) and (c.3), i.e. how we choose the
+bucket and dimension for the split, as discussed in the next section.
+
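+In pseudo-code, the loop in step (c) might look like this (a sketch, not the
+actual implementation - choose_split_dimension() stands in for the logic
+described in the next section):
+
+    while (nbuckets < max_buckets)
+    {
+        bucket = select_bucket_to_partition(nbuckets, buckets);
+
+        if (bucket == NULL)     /* no bucket can be split further */
+            break;
+
+        dim = choose_split_dimension(bucket);   /* "longest" dimension */
+
+        buckets[nbuckets++] = partition_bucket(bucket, dim);
+    }
+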
+
+Partitioning criteria
+---------------------
+
+Similarly to one-dimensional histograms, we want to produce buckets with roughly
+the same frequency.
+
+We also need to produce "regular" buckets, because buckets with one dimension
+much longer than the others are very likely to match a lot of conditions (which
+increases error, even if the bucket frequency is very low).
+
+This is especially important when handling OR-clauses, because in that case each
+clause may add buckets independently. With AND-clauses all the clauses have to
+match each bucket, which makes this issue somewhat less concerning.
+
+To achieve this, we choose the largest bucket (containing the most sample rows),
+but we only choose buckets that can actually be split (have at least 3 different
+combinations of values).
+
+Then we choose the "longest" dimension of the bucket, which is computed by using
+the distinct values in the sample as a measure.
+
+For details see functions select_bucket_to_partition() and partition_bucket(),
+which also includes further discussion.
+
+
+The current limit on number of buckets (16384) is mostly arbitrary, but chosen
+so that it guarantees we don't exceed the number of distinct values indexable by
+uint16 in any of the dimensions. In practice we could handle more buckets as we
+index each dimension separately and the splits should use the dimensions evenly.
+
+Also, histograms this large (with 16k values in multiple dimensions) would be
+quite expensive to build and process, so the 16k limit is rather reasonable.
+
+The actual number of buckets is also related to statistics target, because we
+require MIN_BUCKET_ROWS (10) tuples per bucket before a split, so we can't have
+more than (2 * 300 * target / 10) buckets. For the default target (100) this
+evaluates to ~6k.
+
+
+NULL handling (create_null_buckets)
+-----------------------------------
+
+When building histograms on a single attribute, we first filter out NULL values.
+In the multivariate case, we can't really do that because the rows may contain
+a mix of NULL and non-NULL values in different columns (so we can't simply
+filter all of them out).
+
+For this reason, the histograms are built so that in each bucket, each dimension
+contains either only NULL or only non-NULL values. Building the NULL-buckets
+happens as the first step in the build, by the create_null_buckets() function.
+The number of NULL buckets, as produced by this function, has a clear upper
+boundary (2^N) where N is the number of dimensions (attributes the histogram is
+built on). Or rather 2^K where K is the number of attributes that are not marked
+as not-NULL.
+
+The buckets with NULL dimensions are then subject to the same build algorithm
+(i.e. may be split into smaller buckets) just like any other bucket, but may
+only be split by non-NULL dimension.
+
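+For example, with two nullable columns, the initial bucket may be split into up
+to four NULL-buckets, one per NULL pattern: (NULL, NULL), (NULL, non-NULL),
+(non-NULL, NULL) and (non-NULL, non-NULL).
+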
+
+Serialization
+-------------
+
+To store the histogram in the pg_mv_statistic table, it is serialized into a
+more efficient form. We also use this representation during estimation, i.e.
+we don't fully deserialize the histogram.
+
+For example the boundary values are deduplicated to minimize the required space.
+How much redundancy is there, actually? Let's assume there are no NULL values,
+so we start with a single bucket - in that case we have 2*N boundaries. Each
+time we split a bucket we introduce one new value (in the "middle" of one of
+the dimensions), and keep boundaries for all the other dimensions. So after K
+splits, we have up to
+
+ 2*N + K
+
+unique boundary values (we may have fewer values, if the same value is used for
+several splits). But after K splits we do have (K+1) buckets, so
+
+ (K+1) * 2 * N
+
+boundary values. Using e.g. N=4 and K=999, we arrive at these numbers:
+
+ 2*N + K = 1007
+ (K+1) * 2 * N = 8000
+
+which means a lot of redundancy. It's somewhat counter-intuitive that the number
+of distinct values does not really depend on the number of dimensions (except
+for the initial bucket, but that's negligible compared to the total).
+
+By deduplicating the values and replacing them with 16-bit indexes (uint16), we
+reduce the required space to
+
+ 1007 * 8 + 8000 * 2 ~= 24kB
+
+which is significantly less than 64kB required for the 'raw' histogram (assuming
+the values are 8B).
+
+While the bytea compression (pglz) might achieve the same reduction of space,
+the deduplicated representation is used to optimize the estimation by caching
+results of function calls for already visited values. This significantly
+reduces the number of calls to (often quite expensive) operators.
+
+Note: Of course, this reasoning only holds for histograms built by the algorithm
+that simply splits the buckets in half. Other histograms types (e.g. containing
+overlapping buckets) may behave differently and require different serialization.
+
+Serialized histograms are marked with 'magic' constant, to make it easier to
+check the bytea value really is a serialized histogram.
+
+
+varlena compression
+-------------------
+
+This serialization may however disable automatic varlena compression, because
+the array of unique values is placed at the beginning of the serialized form.
+That is exactly the chunk used by pglz to check if the data is compressible,
+and it will probably decide it's not very compressible. This is similar to the
+issue we had with JSONB initially.
+
+Maybe storing the buckets first would make it work, as the buckets may be more
+compressible.
+
+On the other hand the serialization is actually a context-aware compression,
+usually compressing to ~30% (or even less, with large data types). So the lack
+of additional pglz compression may be acceptable.
+
+
+Deserialization
+---------------
+
+The deserialization is not a perfect inverse of the serialization, as we keep
+the deduplicated arrays. This reduces the amount of memory and also allows
+optimizations during estimation (e.g. we can cache results for the distinct
+values, saving expensive function calls).
+
+
+Inspecting the histogram
+------------------------
+
+Inspecting the regular (per-attribute) histograms is trivial, as it's enough
+to select the columns from pg_stats - the data is encoded as anyarray, so we
+simply get the text representation of the array.
+
+With multivariate histograms it's not that simple due to the possible mix of
+data types in the histogram. It might be possible to produce similar array-like
+text representation, but that'd unnecessarily complicate further processing
+and analysis of the histogram. Instead, there's a set-returning function (SRF)
+that provides access to lower/upper boundaries, frequencies etc.
+
+ SELECT * FROM pg_mv_histogram_buckets();
+
+It has two input parameters:
+
+ oid - OID of the histogram (pg_mv_statistic.staoid)
+ otype - type of output
+
+and produces a table with these columns:
+
+ - bucket ID (0...nbuckets-1)
+ - lower bucket boundaries (string array)
+ - upper bucket boundaries (string array)
+ - nulls only dimensions (boolean array)
+ - lower boundary inclusive (boolean array)
+ - upper boundary inclusive (boolean array)
+ - frequency (double precision)
+
+The 'otype' accepts three values, determining what will be returned in the
+lower/upper boundary arrays:
+
+ - 0 - values stored in the histogram, encoded as text
+ - 1 - indexes into the deduplicated arrays
+ - 2 - indexes into the deduplicated arrays, scaled to [0,1]
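+
+For example, to list the buckets with boundary values encoded as text
+(otype = 0), substituting the OID of the statistics row:
+
+    SELECT * FROM pg_mv_histogram_buckets(<staoid>, 0);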
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index 5c5c59a..3e4f4d1 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -18,6 +18,8 @@ Currently we only have two kinds of multivariate statistics
(b) MCV lists (README.mcv)
+ (c) multivariate histograms (README.histogram)
+
Compatible clause types
-----------------------
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index 4f5a842..f6d1074 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -13,11 +13,11 @@
*
*-------------------------------------------------------------------------
*/
+#include "postgres.h"
+#include "utils/array.h"
#include "common.h"
-#include "utils/array.h"
-
static VacAttrStats ** lookup_var_attr_stats(int2vector *attrs,
int natts,
VacAttrStats **vacattrstats);
@@ -52,7 +52,8 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(lc);
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
- int numrows_filtered = 0;
+ MVHistogram histogram = NULL;
+ int numrows_filtered = numrows;
VacAttrStats **stats = NULL;
int numatts = 0;
@@ -95,8 +96,12 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
+ /* build a multivariate histogram on the columns */
+ if ((numrows_filtered > 0) && (stat->hist_enabled))
+ histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
+
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, mcvlist, attrs, stats);
+ update_mv_stats(stat->mvoid, deps, mcvlist, histogram, attrs, stats);
}
}
@@ -176,6 +181,8 @@ list_mv_stats(Oid relid)
info->deps_built = stats->deps_built;
info->mcv_enabled = stats->mcv_enabled;
info->mcv_built = stats->mcv_built;
+ info->hist_enabled = stats->hist_enabled;
+ info->hist_built = stats->hist_built;
result = lappend(result, info);
}
@@ -190,7 +197,6 @@ list_mv_stats(Oid relid)
return result;
}
-
/*
* Find attnims of MV stats using the mvoid.
*/
@@ -236,9 +242,16 @@ find_mv_attnums(Oid mvoid, Oid *relid)
}
+/*
+ * FIXME This adds statistics, but we need to drop statistics when the
+ * table is dropped. Not sure what to do when a column is dropped.
+ * Either we can (a) remove all stats on that column, (b) remove
+ * the column from defined stats and force rebuild, (c) remove the
+ * column on next ANALYZE. Or maybe something else?
+ */
void
update_mv_stats(Oid mvoid,
- MVDependencies dependencies, MCVList mcvlist,
+ MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
@@ -271,22 +284,34 @@ update_mv_stats(Oid mvoid,
values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data);
}
+ if (histogram != NULL)
+ {
+ bytea * data = serialize_mv_histogram(histogram, attrs, stats);
+ nulls[Anum_pg_mv_statistic_stahist-1] = (data == NULL);
+ values[Anum_pg_mv_statistic_stahist - 1]
+ = PointerGetDatum(data);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
+ replaces[Anum_pg_mv_statistic_stahist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
+ nulls[Anum_pg_mv_statistic_hist_built-1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_hist_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
+ values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
diff --git a/src/backend/utils/mvstats/histogram.c b/src/backend/utils/mvstats/histogram.c
new file mode 100644
index 0000000..4bf7ec6
--- /dev/null
+++ b/src/backend/utils/mvstats/histogram.c
@@ -0,0 +1,2023 @@
+/*-------------------------------------------------------------------------
+ *
+ * histogram.c
+ * POSTGRES multivariate histograms
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/histogram.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+
+#include "utils/lsyscache.h"
+
+#include "common.h"
+#include <math.h>
+
+
+static MVBucket create_initial_mv_bucket(int numrows, HeapTuple *rows,
+ int2vector *attrs,
+ VacAttrStats **stats);
+
+static MVBucket select_bucket_to_partition(int nbuckets, MVBucket * buckets);
+
+static MVBucket partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues);
+
+static MVBucket copy_mv_bucket(MVBucket bucket, uint32 ndimensions);
+
+static void update_bucket_ndistinct(MVBucket bucket, int2vector *attrs,
+ VacAttrStats ** stats);
+
+static void update_dimension_ndistinct(MVBucket bucket, int dimension,
+ int2vector *attrs,
+ VacAttrStats ** stats,
+ bool update_boundaries);
+
+static void create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats);
+
+static Datum * build_ndistinct(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int i, int *nvals);
+
+/*
+ * Each serialized bucket needs to store (in this order):
+ *
+ * - number of tuples (float)
+ * - min inclusive flags (ndim * sizeof(bool))
+ * - max inclusive flags (ndim * sizeof(bool))
+ * - null dimension flags (ndim * sizeof(bool))
+ * - min boundary indexes (ndim * sizeof(uint16))
+ * - max boundary indexes (ndim * sizeof(uint16))
+ *
+ * So in total:
+ *
+ * ndim * (2 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float)
+ */
+#define BUCKET_SIZE(ndims) \
+ ((ndims) * (2 * sizeof(uint16) + 3 * sizeof(bool)) + sizeof(float))
+
+/* pointers into a flat serialized bucket of BUCKET_SIZE(n) bytes */
+#define BUCKET_NTUPLES(b) (*(float*)b)
+#define BUCKET_MIN_INCL(b,n) ((bool*)(b + sizeof(float)))
+#define BUCKET_MAX_INCL(b,n) (BUCKET_MIN_INCL(b,n) + n)
+#define BUCKET_NULLS_ONLY(b,n) (BUCKET_MAX_INCL(b,n) + n)
+#define BUCKET_MIN_INDEXES(b,n) ((uint16*)(BUCKET_NULLS_ONLY(b,n) + n))
+#define BUCKET_MAX_INDEXES(b,n) ((BUCKET_MIN_INDEXES(b,n) + n))
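+
+/*
+ * For illustration, with ndims = 2 the flat layout works out like this
+ * (a sketch, assuming 4-byte float, 1-byte bool and 2-byte uint16):
+ *
+ *   offset  0: ntuples        (float, 4 bytes)
+ *   offset  4: min inclusive  (2 x bool)
+ *   offset  6: max inclusive  (2 x bool)
+ *   offset  8: nulls only     (2 x bool)
+ *   offset 10: min indexes    (2 x uint16)
+ *   offset 14: max indexes    (2 x uint16)
+ *
+ * i.e. BUCKET_SIZE(2) = 2 * (2 * 2 + 3 * 1) + 4 = 18 bytes.
+ */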
+
+/* can't split bucket with less than 10 rows */
+#define MIN_BUCKET_ROWS 10
+
+/*
+ * Data used while building the histogram.
+ */
+typedef struct HistogramBuildData {
+
+ float ndistinct; /* frequency of distinct values */
+
+ HeapTuple *rows; /* array of sample rows */
+ uint32 numrows; /* number of sample rows (array size) */
+
+ /*
+ * Number of distinct values in each dimension. This is used when
+ * building the histogram (and is not serialized/deserialized).
+ */
+ uint32 *ndistincts;
+
+} HistogramBuildData;
+
+typedef HistogramBuildData *HistogramBuild;
+
+/*
+ * Build a multivariate histogram from the sample rows.
+ *
+ * The build algorithm is iterative - initially a single bucket containing all
+ * the sample rows is formed, and then repeatedly split into smaller buckets.
+ * In each step the largest bucket (in some sense) is chosen to be split next.
+ *
+ * The criteria for selecting the largest bucket (and the dimension for the
+ * split) need to be elaborate enough to produce buckets of roughly the same
+ * size, and of reasonably regular shape (i.e. not very long in one dimension).
+ *
+ * The current algorithm works like this:
+ *
+ * build NULL-buckets (create_null_buckets)
+ *
+ * while [maximum number of buckets not reached]
+ *
+ * choose bucket to partition (largest bucket)
+ * if no bucket to partition
+ * terminate the algorithm
+ *
+ * choose bucket dimension to partition (largest dimension)
+ * split the bucket into two buckets
+ *
+ * See the discussion at select_bucket_to_partition and partition_bucket for
+ * more details about the algorithm.
+ */
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total)
+{
+ int i;
+ int numattrs = attrs->dim1;
+
+ int *ndistvalues;
+ Datum **distvalues;
+
+ MVHistogram histogram;
+
+ HeapTuple * rows_copy = (HeapTuple*)palloc0(numrows * sizeof(HeapTuple));
+ memcpy(rows_copy, rows, sizeof(HeapTuple) * numrows);
+
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /* build histogram header */
+
+ histogram = (MVHistogram)palloc0(sizeof(MVHistogramData));
+
+ histogram->magic = MVSTAT_HIST_MAGIC;
+ histogram->type = MVSTAT_HIST_TYPE_BASIC;
+
+ histogram->nbuckets = 1;
+ histogram->ndimensions = numattrs;
+
+ /* allocate the maximum number of buckets right away (cheaper than repeated repalloc for a short-lived object) */
+ histogram->buckets
+ = (MVBucket*)palloc0(MVSTAT_HIST_MAX_BUCKETS * sizeof(MVBucket));
+
+ /* create the initial bucket, covering the whole sample set */
+ histogram->buckets[0]
+ = create_initial_mv_bucket(numrows, rows_copy, attrs, stats);
+
+ /*
+ * Collect info on distinct values in each dimension (used later to select
+ * dimension to partition).
+ */
+ ndistvalues = (int*)palloc0(sizeof(int) * numattrs);
+ distvalues = (Datum**)palloc0(sizeof(Datum*) * numattrs);
+
+ for (i = 0; i < numattrs; i++)
+ distvalues[i] = build_ndistinct(numrows, rows, attrs, stats, i,
+ &ndistvalues[i]);
+
+ /*
+ * Split the initial bucket into buckets that don't mix NULL and non-NULL
+ * values in a single dimension.
+ */
+ create_null_buckets(histogram, 0, attrs, stats);
+
+ /*
+ * Do the actual histogram build - select a bucket and split it.
+ *
+ * FIXME This should use the max_buckets specified in CREATE STATISTICS.
+ */
+ while (histogram->nbuckets < MVSTAT_HIST_MAX_BUCKETS)
+ {
+ MVBucket bucket = select_bucket_to_partition(histogram->nbuckets,
+ histogram->buckets);
+
+ /* no buckets eligible for partitioning */
+ if (bucket == NULL)
+ break;
+
+ /* we modify the bucket in-place and add one new bucket */
+ histogram->buckets[histogram->nbuckets++]
+ = partition_bucket(bucket, attrs, stats, ndistvalues, distvalues);
+ }
+
+ /* finalize the histogram build - compute the frequencies etc. */
+ for (i = 0; i < histogram->nbuckets; i++)
+ {
+ HistogramBuild build_data
+ = ((HistogramBuild)histogram->buckets[i]->build_data);
+
+ /*
+ * The frequency has to be computed from the whole sample, in case some
+ * of the rows were used for MCV.
+ *
+ * XXX Perhaps this should simply compute the frequency with respect
+ * to the local row count, and then factor in the MCV list later.
+ *
+ * FIXME The name 'ntuples' is a bit misleading for a frequency.
+ */
+ histogram->buckets[i]->ntuples
+ = (build_data->numrows * 1.0) / numrows_total;
+ }
+
+ return histogram;
+}
+
+/* build array of distinct values for a single attribute */
+static Datum *
+build_ndistinct(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int i, int *nvals)
+{
+ int j;
+ int nvalues,
+ ndistinct;
+ Datum *values,
+ *distvalues;
+
+ SortSupportData ssup;
+ StdAnalyzeData *mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ nvalues = 0;
+ values = (Datum*)palloc0(sizeof(Datum) * numrows);
+
+ /* collect values from the sample rows, ignore NULLs */
+ for (j = 0; j < numrows; j++)
+ {
+ Datum value;
+ bool isnull;
+
+ /* fetch the value of the attribute (NULL values are skipped below) */
+ value = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &isnull);
+
+ if (isnull)
+ continue;
+
+ values[nvalues++] = value;
+ }
+
+ /* if no non-NULL values were found, free the memory and terminate */
+ if (nvalues == 0)
+ {
+ pfree(values);
+ return NULL;
+ }
+
+ /* sort the array of values using the SortSupport */
+ qsort_arg((void *) values, nvalues, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /* count the distinct values first, and allocate just enough memory */
+ ndistinct = 1;
+ for (j = 1; j < nvalues; j++)
+ if (compare_scalars_simple(&values[j], &values[j-1], &ssup) != 0)
+ ndistinct += 1;
+
+ distvalues = (Datum*)palloc0(sizeof(Datum) * ndistinct);
+
+ /* now collect distinct values into the array */
+ distvalues[0] = values[0];
+ ndistinct = 1;
+
+ for (j = 1; j < nvalues; j++)
+ {
+ if (compare_scalars_simple(&values[j], &values[j-1], &ssup) != 0)
+ {
+ distvalues[ndistinct] = values[j];
+ ndistinct += 1;
+ }
+ }
+
+ pfree(values);
+
+ *nvals = ndistinct;
+ return distvalues;
+}
+
+/* fetch the histogram (as a bytea) from the pg_mv_statistic catalog */
+MVSerializedHistogram
+load_mv_histogram(Oid mvoid)
+{
+ bool isnull = false;
+ Datum histogram;
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat;
+#endif
+
+ /* Fetch the pg_mv_statistic tuple for the given statistics OID. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+ if (! HeapTupleIsValid(htup))
+ return NULL;
+
+#ifdef USE_ASSERT_CHECKING
+ mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->hist_enabled && mvstat->hist_built);
+#endif
+
+ histogram = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_stahist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return deserialize_mv_histogram(DatumGetByteaP(histogram));
+}
+
+/* print some basic info about the histogram */
+Datum
+pg_mv_stats_histogram_info(PG_FUNCTION_ARGS)
+{
+ bytea *data = PG_GETARG_BYTEA_P(0);
+ char *result;
+
+ MVSerializedHistogram hist = deserialize_mv_histogram(data);
+
+ result = palloc0(128);
+ snprintf(result, 128, "nbuckets=%d", hist->nbuckets);
+
+ PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/*
+ * Serialize the MV histogram into a bytea value. The basic algorithm is quite
+ * simple, and mostly mimics the MCV serialization:
+ *
+ * (1) perform deduplication for each attribute (separately)
+ *
+ * (a) collect all (non-NULL) attribute values from all buckets
+ * (b) sort the data (using 'lt' from VacAttrStats)
+ * (c) remove duplicate values from the array
+ *
+ * (2) serialize the arrays into a bytea value
+ *
+ * (3) process all buckets
+ *
+ * (a) replace min/max values with indexes into the arrays
+ *
+ * Each attribute has to be processed separately, as we're mixing different
+ * datatypes, and we need to use the right operators to compare/sort them.
+ * We're also mixing pass-by-value and pass-by-ref types, and so on.
+ *
+ *
+ * FIXME This probably leaks memory, or at least uses it inefficiently
+ * (many small palloc() calls instead of a large one).
+ *
+ * TODO Consider packing boolean flags (NULL) for each item into 'char'
+ * or a longer type (instead of using an array of bool items).
+ */
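+
+/*
+ * For instance (made-up values), if dimension 0 has bucket boundaries
+ * {1, 5, 5, 8, 1, 8}, the deduplication produces the sorted array
+ * {1, 5, 8}, and each bucket then stores a uint16 index (0, 1 or 2)
+ * instead of the full Datum value for that boundary.
+ */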
+bytea *
+serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i = 0, j = 0;
+ Size total_length = 0;
+
+ bytea *output = NULL;
+ char *data = NULL;
+
+ DimensionInfo *info;
+ SortSupport ssup;
+
+ int nbuckets = histogram->nbuckets;
+ int ndims = histogram->ndimensions;
+
+ /* allocated for serialized bucket data */
+ int bucketsize = BUCKET_SIZE(ndims);
+ char *bucket = palloc0(bucketsize);
+
+ /* values per dimension (and number of non-NULL values) */
+ Datum **values = (Datum**)palloc0(sizeof(Datum*) * ndims);
+ int *counts = (int*)palloc0(sizeof(int) * ndims);
+
+ /* info about dimensions (for deserialize) */
+ info = (DimensionInfo *)palloc0(sizeof(DimensionInfo)*ndims);
+
+ /* sort support data */
+ ssup = (SortSupport)palloc0(sizeof(SortSupportData)*ndims);
+
+ /* collect and deduplicate values for each dimension separately */
+ for (i = 0; i < ndims; i++)
+ {
+ int count;
+ StdAnalyzeData *tmp = (StdAnalyzeData *)stats[i]->extra_data;
+
+ /* keep important info about the data type */
+ info[i].typlen = stats[i]->attrtype->typlen;
+ info[i].typbyval = stats[i]->attrtype->typbyval;
+
+ /*
+ * Allocate space for all min/max values, including NULLs (we won't use
+ * them, but we don't know how many are there), and then collect all
+ * non-NULL values.
+ */
+ values[i] = (Datum*)palloc0(sizeof(Datum) * nbuckets * 2);
+
+ for (j = 0; j < histogram->nbuckets; j++)
+ {
+ /* skip buckets where this dimension is NULL-only */
+ if (! histogram->buckets[j]->nullsonly[i])
+ {
+ values[i][counts[i]] = histogram->buckets[j]->min[i];
+ counts[i] += 1;
+
+ values[i][counts[i]] = histogram->buckets[j]->max[i];
+ counts[i] += 1;
+ }
+ }
+
+ /* there are just NULL values in this dimension */
+ if (counts[i] == 0)
+ continue;
+
+ /* sort and deduplicate */
+ ssup[i].ssup_cxt = CurrentMemoryContext;
+ ssup[i].ssup_collation = DEFAULT_COLLATION_OID;
+ ssup[i].ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(tmp->ltopr, &ssup[i]);
+
+ qsort_arg(values[i], counts[i], sizeof(Datum),
+ compare_scalars_simple, &ssup[i]);
+
+ /*
+ * Walk through the array and eliminate duplicate values, but
+ * keep the ordering (so that we can do bsearch later). We know
+ * there's at least 1 item, so we can skip the first element.
+ */
+ count = 1; /* number of deduplicated items */
+ for (j = 1; j < counts[i]; j++)
+ {
+ /* if it's different from the previous value, we need to keep it */
+ if (compare_datums_simple(values[i][j-1], values[i][j], &ssup[i]) != 0)
+ {
+ /* XXX: not needed if (count == j) */
+ values[i][count] = values[i][j];
+ count += 1;
+ }
+ }
+
+ /* make sure we fit into uint16 */
+ Assert(count <= UINT16_MAX);
+
+ /* keep info about the deduplicated count */
+ info[i].nvalues = count;
+
+ /* compute size of the serialized data */
+ if (info[i].typlen > 0)
+ /* byval or byref, but with fixed length (name, tid, ...) */
+ info[i].nbytes = info[i].nvalues * info[i].typlen;
+ else if (info[i].typlen == -1)
+ /* varlena, so just use VARSIZE_ANY */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += VARSIZE_ANY(values[i][j]);
+ else if (info[i].typlen == -2)
+ /* cstring, so simply strlen */
+ for (j = 0; j < info[i].nvalues; j++)
+ info[i].nbytes += strlen(DatumGetPointer(values[i][j]));
+ else
+ elog(ERROR, "unknown data type typbyval=%d typlen=%d",
+ info[i].typbyval, info[i].typlen);
+ }
+
+ /*
+ * Now we finally know how much space we'll need for the serialized
+ * histogram, as it contains these fields:
+ *
+ * - length (4B) for varlena
+ * - magic (4B)
+ * - type (4B)
+ * - ndimensions (4B)
+ * - nbuckets (4B)
+ * - info (ndim * sizeof(DimensionInfo))
+ * - arrays of values for each dimension
+ * - serialized buckets (nbuckets * bucketsize)
+ *
+ * So the 'header' size is 20B + ndim * sizeof(DimensionInfo) and
+ * then we'll place the data (and buckets).
+ */
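+ /*
+ * A worked example (made-up numbers, not from an actual run): a 2-D
+ * histogram on two int4 columns, with 1024 buckets and about 1500
+ * deduplicated boundary values per dimension, needs roughly
+ *
+ *   20 + 2 * sizeof(DimensionInfo)    -- varlena length + header fields
+ *   + 2 * (1500 * 4)                  -- deduplicated value arrays
+ *   + 1024 * BUCKET_SIZE(2)           -- serialized buckets
+ *
+ * which is about 30kB, well below the 1MB limit enforced below.
+ */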
+ total_length = (sizeof(int32) + offsetof(MVHistogramData, buckets)
+ + ndims * sizeof(DimensionInfo)
+ + nbuckets * bucketsize);
+
+ /* account for the deduplicated data */
+ for (i = 0; i < ndims; i++)
+ total_length += info[i].nbytes;
+
+ /* enforce arbitrary limit of 1MB */
+ if (total_length > (1024 * 1024))
+ elog(ERROR, "serialized histogram exceeds 1MB (%ld > %d)",
+ total_length, (1024 * 1024));
+
+ /* allocate space for the serialized histogram list, set header */
+ output = (bytea*)palloc0(total_length);
+ SET_VARSIZE(output, total_length);
+
+ /* we'll use 'data' to keep track of the place to write data */
+ data = VARDATA(output);
+
+ memcpy(data, histogram, offsetof(MVHistogramData, buckets));
+ data += offsetof(MVHistogramData, buckets);
+
+ memcpy(data, info, sizeof(DimensionInfo) * ndims);
+ data += sizeof(DimensionInfo) * ndims;
+
+ /* serialize the deduplicated values for all attributes */
+ for (i = 0; i < ndims; i++)
+ {
+#ifdef USE_ASSERT_CHECKING
+ char *tmp = data;
+#endif
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ Datum v = values[i][j];
+
+ if (info[i].typbyval) /* passed by value */
+ {
+ memcpy(data, &v, info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen > 0) /* passed by reference */
+ {
+ memcpy(data, DatumGetPointer(v), info[i].typlen);
+ data += info[i].typlen;
+ }
+ else if (info[i].typlen == -1) /* varlena */
+ {
+ memcpy(data, DatumGetPointer(v), VARSIZE_ANY(v));
+ data += VARSIZE_ANY(values[i][j]);
+ }
+ else if (info[i].typlen == -2) /* cstring */
+ {
+ memcpy(data, DatumGetPointer(v), strlen(DatumGetPointer(v))+1);
+ data += strlen(DatumGetPointer(v)) + 1;
+ }
+ }
+
+ /* make sure we got exactly the amount of data we expected */
+ Assert((data - tmp) == info[i].nbytes);
+ }
+
+ /* finally serialize the items, with uint16 indexes instead of the values */
+ for (i = 0; i < nbuckets; i++)
+ {
+ /* don't write beyond the allocated space */
+ Assert(data <= (char*)output + total_length - bucketsize);
+
+ /* reset the values for each item */
+ memset(bucket, 0, bucketsize);
+
+ BUCKET_NTUPLES(bucket) = histogram->buckets[i]->ntuples;
+
+ for (j = 0; j < ndims; j++)
+ {
+ /* do the lookup only for non-NULL values */
+ if (! histogram->buckets[i]->nullsonly[j])
+ {
+ uint16 idx;
+ Datum * v = NULL;
+
+ /* min boundary */
+ v = (Datum*)bsearch_arg(&histogram->buckets[i]->min[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ compare_scalars_simple, &ssup[j]);
+
+ Assert(v != NULL); /* serialization or deduplication error */
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MIN_INDEXES(bucket, ndims)[j] = idx;
+
+ /* max boundary */
+ v = (Datum*)bsearch_arg(&histogram->buckets[i]->max[j],
+ values[j], info[j].nvalues, sizeof(Datum),
+ compare_scalars_simple, &ssup[j]);
+
+ Assert(v != NULL); /* serialization or deduplication error */
+
+ /* compute index within the array */
+ idx = (v - values[j]);
+
+ Assert((idx >= 0) && (idx < info[j].nvalues));
+
+ BUCKET_MAX_INDEXES(bucket, ndims)[j] = idx;
+ }
+ }
+
+ /* copy flags (nulls, min/max inclusive) */
+ memcpy(BUCKET_NULLS_ONLY(bucket, ndims),
+ histogram->buckets[i]->nullsonly, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MIN_INCL(bucket, ndims),
+ histogram->buckets[i]->min_inclusive, sizeof(bool) * ndims);
+
+ memcpy(BUCKET_MAX_INCL(bucket, ndims),
+ histogram->buckets[i]->max_inclusive, sizeof(bool) * ndims);
+
+ /* copy the item into the array */
+ memcpy(data, bucket, bucketsize);
+
+ data += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((data - (char*)output) == total_length);
+
+ /* free the values/counts arrays here */
+ pfree(counts);
+ pfree(info);
+ pfree(ssup);
+
+ for (i = 0; i < ndims; i++)
+ pfree(values[i]);
+
+ pfree(values);
+
+ return output;
+}
+
+/*
+ * Returns histogram in a partially-serialized form (keeps the boundary values
+ * deduplicated, so that it's possible to optimize the estimation part by
+ * caching function call results between buckets etc.).
+ */
+MVSerializedHistogram
+deserialize_mv_histogram(bytea * data)
+{
+ int i = 0, j = 0;
+
+ Size expected_size;
+ char *tmp = NULL;
+
+ MVSerializedHistogram histogram;
+ DimensionInfo *info;
+
+ int nbuckets;
+ int ndims;
+ int bucketsize;
+
+ /* temporary deserialization buffer */
+ int bufflen;
+ char *buff;
+ char *ptr;
+
+ if (data == NULL)
+ return NULL;
+
+ if (VARSIZE_ANY_EXHDR(data) < offsetof(MVSerializedHistogramData,buckets))
+ elog(ERROR, "invalid histogram size %ld (expected at least %ld)",
+ VARSIZE_ANY_EXHDR(data), offsetof(MVSerializedHistogramData,buckets));
+
+ /* read the histogram header */
+ histogram
+ = (MVSerializedHistogram)palloc(sizeof(MVSerializedHistogramData));
+
+ /* initialize pointer to the data part (skip the varlena header) */
+ tmp = VARDATA(data);
+
+ /* get the header and perform basic sanity checks */
+ memcpy(histogram, tmp, offsetof(MVSerializedHistogramData, buckets));
+ tmp += offsetof(MVSerializedHistogramData, buckets);
+
+ if (histogram->magic != MVSTAT_HIST_MAGIC)
+ elog(ERROR, "invalid histogram magic %d (expected %dd)",
+ histogram->magic, MVSTAT_HIST_MAGIC);
+
+ if (histogram->type != MVSTAT_HIST_TYPE_BASIC)
+ elog(ERROR, "invalid histogram type %d (expected %dd)",
+ histogram->type, MVSTAT_HIST_TYPE_BASIC);
+
+ nbuckets = histogram->nbuckets;
+ ndims = histogram->ndimensions;
+ bucketsize = BUCKET_SIZE(ndims);
+
+ Assert((nbuckets > 0) && (nbuckets <= MVSTAT_HIST_MAX_BUCKETS));
+ Assert((ndims >= 2) && (ndims <= MVSTATS_MAX_DIMENSIONS));
+
+ /*
+ * Compute the size we expect for these parameters. It's incomplete at
+ * this point, as we still have to add the value array sizes (from the
+ * DimensionInfo records).
+ */
+ expected_size = offsetof(MVSerializedHistogramData,buckets) +
+ ndims * sizeof(DimensionInfo) +
+ (nbuckets * bucketsize);
+
+ /* check that we have at least the DimensionInfo records */
+ if (VARSIZE_ANY_EXHDR(data) < expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ info = (DimensionInfo*)(tmp);
+ tmp += ndims * sizeof(DimensionInfo);
+
+ /* account for the value arrays */
+ for (i = 0; i < ndims; i++)
+ expected_size += info[i].nbytes;
+
+ if (VARSIZE_ANY_EXHDR(data) != expected_size)
+ elog(ERROR, "invalid histogram size %ld (expected %ld)",
+ VARSIZE_ANY_EXHDR(data), expected_size);
+
+ /* looks OK - not corrupted or something */
+
+ /* a single buffer for all the values and counts */
+ bufflen = (sizeof(int) + sizeof(Datum*)) * ndims;
+
+ for (i = 0; i < ndims; i++)
+ /* no extra space needed for byval types with length matching Datum */
+ if (! (info[i].typbyval && (info[i].typlen == sizeof(Datum))))
+ bufflen += (sizeof(Datum) * info[i].nvalues);
+
+ /* also, include space for the result, tracking the buckets */
+ bufflen += nbuckets * (
+ sizeof(MVSerializedBucket) + /* bucket pointer */
+ sizeof(MVSerializedBucketData)); /* bucket data */
+
+ buff = palloc0(bufflen);
+ ptr = buff;
+
+ histogram->nvalues = (int*)ptr;
+ ptr += (sizeof(int) * ndims);
+
+ histogram->values = (Datum**)ptr;
+ ptr += (sizeof(Datum*) * ndims);
+
+ /*
+ * FIXME This uses pointers to the original data array (the types
+ * not passed by value), so when someone frees the memory,
+ * e.g. by doing something like this:
+ *
+ * bytea * data = ... fetch the data from catalog ...
+ * MCVList mcvlist = deserialize_mcv_list(data);
+ * pfree(data);
+ *
+ * then 'mcvlist' references the freed memory. This needs to
+ * copy the pieces.
+ *
+ * TODO same as in MCV deserialization / consider moving to common.c
+ */
+ for (i = 0; i < ndims; i++)
+ {
+ histogram->nvalues[i] = info[i].nvalues;
+
+ if (info[i].typbyval)
+ {
+ /* passed by value / Datum - simply reuse the array */
+ if (info[i].typlen == sizeof(Datum))
+ {
+ histogram->values[i] = (Datum*)tmp;
+ tmp += info[i].nbytes;
+ }
+ else
+ {
+ histogram->values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* copy the value out of the serialized array */
+ memcpy(&histogram->values[i][j], tmp, info[i].typlen);
+ tmp += info[i].typlen;
+ }
+ }
+ }
+ else
+ {
+ /* all the other types need a chunk of the buffer */
+ histogram->values[i] = (Datum*)ptr;
+ ptr += (sizeof(Datum) * info[i].nvalues);
+
+ if (info[i].typlen > 0)
+ {
+ /* passed by reference, but fixed length (name, tid, ...) */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += info[i].typlen;
+ }
+ }
+ else if (info[i].typlen == -1)
+ {
+ /* varlena */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += VARSIZE_ANY(tmp);
+ }
+ }
+ else if (info[i].typlen == -2)
+ {
+ /* cstring */
+ for (j = 0; j < info[i].nvalues; j++)
+ {
+ /* just point into the array */
+ histogram->values[i][j] = PointerGetDatum(tmp);
+ tmp += (strlen(tmp) + 1); /* don't forget the \0 */
+ }
+ }
+ }
+ }
+
+ histogram->buckets = (MVSerializedBucket*)ptr;
+ ptr += (sizeof(MVSerializedBucket) * nbuckets);
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ MVSerializedBucket bucket = (MVSerializedBucket)ptr;
+ ptr += sizeof(MVSerializedBucketData);
+
+ bucket->ntuples = BUCKET_NTUPLES(tmp);
+ bucket->nullsonly = BUCKET_NULLS_ONLY(tmp, ndims);
+ bucket->min_inclusive = BUCKET_MIN_INCL(tmp, ndims);
+ bucket->max_inclusive = BUCKET_MAX_INCL(tmp, ndims);
+
+ bucket->min = BUCKET_MIN_INDEXES(tmp, ndims);
+ bucket->max = BUCKET_MAX_INDEXES(tmp, ndims);
+
+ histogram->buckets[i] = bucket;
+
+ Assert(tmp <= (char*)data + VARSIZE_ANY(data));
+
+ tmp += bucketsize;
+ }
+
+ /* at this point we expect to match the total_length exactly */
+ Assert((tmp - VARDATA(data)) == expected_size);
+
+ /* we should exhaust the output buffer exactly */
+ Assert((ptr - buff) == bufflen);
+
+ return histogram;
+}
+
+/*
+ * Build the initial bucket, which will be then split into smaller ones.
+ */
+static MVBucket
+create_initial_mv_bucket(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats)
+{
+ int i;
+ int numattrs = attrs->dim1;
+ HistogramBuild data = NULL;
+
+ /* TODO allocate bucket as a single piece, including all the fields. */
+ MVBucket bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+
+ Assert(numrows > 0);
+ Assert(rows != NULL);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /* allocate the per-dimension arrays */
+
+ /* flags for null-only dimensions */
+ bucket->nullsonly = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ bucket->min_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+ bucket->max_inclusive = (bool*)palloc0(numattrs * sizeof(bool));
+
+ /* lower/upper boundaries */
+ bucket->min = (Datum*)palloc0(numattrs * sizeof(Datum));
+ bucket->max = (Datum*)palloc0(numattrs * sizeof(Datum));
+
+ /* build-data */
+ data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /* number of distinct values (per dimension) */
+ data->ndistincts = (uint32*)palloc0(numattrs * sizeof(uint32));
+
+ /* all the sample rows fall into the initial bucket */
+ data->numrows = numrows;
+ data->rows = rows;
+
+ bucket->build_data = data;
+
+ /*
+ * Update the number of distinct combinations in the bucket (which we use
+ * when selecting the bucket to partition), and then the number of distinct
+ * values for each dimension (which we use when choosing which dimension
+ * to split).
+ */
+ update_bucket_ndistinct(bucket, attrs, stats);
+
+ /* Update ndistinct (and also set min/max) for all dimensions. */
+ for (i = 0; i < numattrs; i++)
+ update_dimension_ndistinct(bucket, i, attrs, stats, true);
+
+ return bucket;
+}
+
+/*
+ * Choose the bucket to partition next.
+ *
+ * The current criterion is rather simple, chosen so that the algorithm produces
+ * buckets with about equal frequency and regular size. We select the bucket
+ * with the highest number of distinct values, and then split it by the longest
+ * dimension.
+ *
+ * The distinct values are uniformly mapped to [0,1] interval, and this is used
+ * to compute length of the value range.
+ *
+ * NOTE: This is not the same array used for deduplication, as this contains
+ * values for all the tuples from the sample, not just the boundary values.
+ *
+ * Returns either pointer to the bucket selected to be partitioned, or NULL if
+ * there are no buckets that may be split (e.g. if all buckets are too small
+ * or contain too few distinct values).
+ *
+ *
+ * Tricky example
+ * --------------
+ *
+ * Consider this table:
+ *
+ * CREATE TABLE t AS SELECT i AS a, i AS b
+ * FROM generate_series(1,1000000) s(i);
+ *
+ * CREATE STATISTICS s1 ON t (a,b) WITH (histogram);
+ *
+ * ANALYZE t;
+ *
+ * It's a very specific (and perhaps artificial) example, because every bucket
+ * always has exactly the same number of distinct values in all dimensions,
+ * which makes the partitioning tricky.
+ *
+ * Then:
+ *
+ * SELECT * FROM t WHERE (a < 100) AND (b < 100);
+ *
+ * is estimated to return ~120 rows, while in reality it returns only 99.
+ *
+ * QUERY PLAN
+ * -------------------------------------------------------------
+ * Seq Scan on t (cost=0.00..19425.00 rows=117 width=8)
+ * (actual time=0.129..82.776 rows=99 loops=1)
+ * Filter: ((a < 100) AND (b < 100))
+ * Rows Removed by Filter: 999901
+ * Planning time: 1.286 ms
+ * Execution time: 82.984 ms
+ * (5 rows)
+ *
+ * So this estimate is reasonably close. Let's change the query to OR clause:
+ *
+ * SELECT * FROM t WHERE (a < 100) OR (b < 100);
+ *
+ * QUERY PLAN
+ * -------------------------------------------------------------
+ * Seq Scan on t (cost=0.00..19425.00 rows=8100 width=8)
+ * (actual time=0.145..99.910 rows=99 loops=1)
+ * Filter: ((a < 100) OR (b < 100))
+ * Rows Removed by Filter: 999901
+ * Planning time: 1.578 ms
+ * Execution time: 100.132 ms
+ * (5 rows)
+ *
+ * That's clearly a much worse estimate. This happens because the histogram
+ * contains buckets like this:
+ *
+ * bucket 592 [3 30310] [30134 30593] => [0.000233]
+ *
+ * i.e. the length of "a" dimension is (30310-3)=30307, while the length of "b"
+ * is (30593-30134)=459. So the "b" dimension is much narrower than "a".
+ * Of course, there are also buckets where "b" is the wider dimension.
+ *
+ * This is partially mitigated by selecting the "longest" dimension but that
+ * only happens after we already selected the bucket. So if we never select the
+ * bucket, this optimization does not apply.
+ *
+ * The other reason why this particular example behaves so poorly is the way
+ * we actually split the selected bucket. We do attempt to divide the bucket
+ * into two parts containing about the same number of tuples, but that does
+ * not work too well when most of the tuples are squashed on one side of the
+ * bucket.
+ *
+ * For example, for columns with data on the diagonal (i.e. when a=b), we end
+ * up with a narrow bucket on the diagonal and a huge bucket covering the
+ * remaining part (with much lower density).
+ *
+ * So perhaps we need two partitioning strategies - one aiming to split buckets
+ * with high frequency (number of sampled rows), the other aiming to split
+ * "large" buckets. And alternating between them, somehow.
+ *
+ * TODO Consider using similar lower boundary for row count as for simple
+ * histograms, i.e. 300 tuples per bucket.
+ */
+static MVBucket
+select_bucket_to_partition(int nbuckets, MVBucket * buckets)
+{
+ int i;
+ int numrows = 0;
+ MVBucket bucket = NULL;
+
+ for (i = 0; i < nbuckets; i++)
+ {
+ HistogramBuild data = (HistogramBuild)buckets[i]->build_data;
+
+ /* if the number of rows is higher, use this bucket */
+ if ((data->ndistinct > 2) &&
+ (data->numrows > numrows) &&
+ (data->numrows >= MIN_BUCKET_ROWS)) {
+ bucket = buckets[i];
+ numrows = data->numrows;
+ }
+ }
+
+ /* may be NULL if there are no buckets eligible for partitioning */
+ return bucket;
+}
+
+/*
+ * A simple bucket partitioning implementation - we choose the longest bucket
+ * dimension, measured using the array of distinct values built at the very
+ * beginning of the build.
+ *
+ * We map all the distinct values to a [0,1] interval, uniformly distributed,
+ * and then use this to measure length. It's essentially a number of distinct
+ * values within the range, normalized to [0,1].
+ *
+ * Then we choose a 'middle' value splitting the bucket into two parts with
+ * roughly the same frequency.
+ *
+ * This splits the bucket by tweaking the existing one, and returning the new
+ * bucket (essentially shrinking the existing one in-place and returning the
+ * other "half" as a new bucket). The caller is responsible for adding the new
+ * bucket into the list of buckets.
+ *
+ * There are multiple histogram options, centered around the partitioning
+ * criteria, specifying both how to choose a bucket and the dimension most in
+ * need of a split. For a nice summary and general overview, see "rK-Hist : an
+ * R-Tree based histogram for multi-dimensional selectivity estimation" thesis
+ * by J. A. Lopez, Concordia University, p.34-37 (and possibly p. 32-34 for
+ * explanation of the terms).
+ *
+ * It requires care to prevent splitting only one dimension and not splitting
+ * another one at all (which might happen easily in case of strongly dependent
+ * columns - e.g. y=x). The current algorithm minimizes this, but may still
+ * happen for perfectly dependent examples (when all the dimensions have equal
+ * length, the first one will be selected).
+ *
+ * TODO Should probably consider statistics target for the columns (e.g.
+ * to split dimensions with higher statistics target more frequently).
+ */
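+
+/*
+ * A small illustration of the length measure (hypothetical numbers): if a
+ * dimension has 1000 distinct values in the whole sample, and the bucket
+ * min/max boundaries bsearch to indexes 200 and 700 in the sorted array of
+ * distinct values, the normalized length of that dimension is
+ * (700 - 200) / 1000.0 = 0.5, regardless of the actual datum values or
+ * their datatype.
+ */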
+static MVBucket
+partition_bucket(MVBucket bucket, int2vector *attrs,
+ VacAttrStats **stats,
+ int *ndistvalues, Datum **distvalues)
+{
+ int i;
+ int dimension;
+ int numattrs = attrs->dim1;
+
+ Datum split_value;
+ MVBucket new_bucket;
+ HistogramBuild new_data;
+
+ /* needed for sort, when looking for the split value */
+ bool isNull;
+ int nvalues = 0;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ StdAnalyzeData * mystats = NULL;
+ ScalarItem * values = (ScalarItem*)palloc0(data->numrows * sizeof(ScalarItem));
+ SortSupportData ssup;
+
+ int nrows = 1; /* number of rows below current value */
+ double delta;
+
+ /* needed when splitting the values */
+ HeapTuple * oldrows = data->rows;
+ int oldnrows = data->numrows;
+
+ /*
+ * We can't split buckets with a single distinct value (this also
+ * disqualifies NULL-only dimensions). Also, there have to be multiple
+ * sample rows (otherwise there couldn't be multiple distinct values).
+ */
+ Assert(data->ndistinct > 1);
+ Assert(data->numrows > 1);
+ Assert((numattrs >= 2) && (numattrs <= MVSTATS_MAX_DIMENSIONS));
+
+ /* Look for the next dimension to split. */
+ delta = 0.0;
+ dimension = -1;
+
+ for (i = 0; i < numattrs; i++)
+ {
+ Datum *a, *b;
+
+ mystats = (StdAnalyzeData *) stats[i]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ /* can't split NULL-only dimension */
+ if (bucket->nullsonly[i])
+ continue;
+
+ /* can't split dimension with a single ndistinct value */
+ if (data->ndistincts[i] <= 1)
+ continue;
+
+ /* search for min boundary in the distinct list */
+ a = (Datum*)bsearch_arg(&bucket->min[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), compare_scalars_simple, &ssup);
+
+ b = (Datum*)bsearch_arg(&bucket->max[i],
+ distvalues[i], ndistvalues[i],
+ sizeof(Datum), compare_scalars_simple, &ssup);
+
+ /* if this dimension is 'longer', select it for partitioning */
+ if (((b-a)*1.0 / ndistvalues[i]) > delta)
+ {
+ delta = ((b-a)*1.0 / ndistvalues[i]);
+ dimension = i;
+ }
+ }
+
+ /*
+ * If we haven't found a dimension here, we've done something
+ * wrong in select_bucket_to_partition.
+ */
+ Assert(dimension != -1);
+
+ /*
+ * Walk through the selected dimension, collect and sort the values and
+ * then choose the value to use as the new boundary.
+ */
+ mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* initialize sort support, etc. */
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (i = 0; i < data->numrows; i++)
+ {
+ /* remember the index of the sample row, to make the partitioning simpler */
+ values[nvalues].value = heap_getattr(data->rows[i], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+ values[nvalues].tupno = i;
+
+ /* no NULL values allowed here (we never split null-only dimension) */
+ Assert(!isNull);
+
+ nvalues++;
+ }
+
+ /* sort the array of values */
+ qsort_arg((void *) values, nvalues, sizeof(ScalarItem),
+ compare_scalars_partition, (void *) &ssup);
+
+ /*
+ * We know there are data->ndistincts[dimension] distinct values in this
+ * dimension, and we want to split the bucket in half, so walk through the
+ * array and stop once we see (ndistinct/2) values.
+ *
+ * We always choose the "next" value, i.e. (n/2+1)-th distinct value, and
+ * use it as an exclusive upper boundary (and inclusive lower boundary).
+ *
+ * TODO Maybe we should use "average" of the two middle distinct values
+ * (at least for even distinct counts), but that would require being
+ * able to do an average (which does not work for non-numeric types).
+ *
+ * TODO Another option is to look for a split that'd give about 50% tuples
+ * (not distinct values) in each partition. That might work better
+ * when there are a few very frequent values, and many rare ones.
+ */
+ delta = fabs(data->numrows);
+ split_value = values[0].value;
+
+ for (i = 1; i < data->numrows; i++)
+ {
+ if (values[i].value != values[i-1].value)
+ {
+ /* are we closer to splitting the bucket in half? */
+ if (fabs(i - data->numrows/2.0) < delta)
+ {
+ /* let's assume we'll use this value for the split */
+ split_value = values[i].value;
+ delta = fabs(i - data->numrows/2.0);
+ nrows = i;
+ }
+ }
+ }
+
+ Assert(nrows > 0);
+ Assert(nrows < data->numrows);
+
+ /* create the new bucket as an (incomplete) copy of the one being partitioned */
+ new_bucket = copy_mv_bucket(bucket, numattrs);
+ new_data = (HistogramBuild)new_bucket->build_data;
+
+ /*
+ * Do the actual split of the chosen dimension, using the split value as the
+ * upper bound for the existing bucket, and lower bound for the new one.
+ */
+ bucket->max[dimension] = split_value;
+ new_bucket->min[dimension] = split_value;
+
+ bucket->max_inclusive[dimension] = false;
+ new_bucket->min_inclusive[dimension] = true;
+
+ /*
+ * Redistribute the sample tuples using the 'ScalarItem->tupno' index. We
+ * know 'nrows' rows should remain in the original bucket and the rest goes
+ * to the new one.
+ */
+
+ data->rows = (HeapTuple*)palloc0(nrows * sizeof(HeapTuple));
+ new_data->rows = (HeapTuple*)palloc0((oldnrows - nrows) * sizeof(HeapTuple));
+
+ data->numrows = nrows;
+ new_data->numrows = (oldnrows - nrows);
+
+ /*
+ * The first nrows should go to the first bucket, the rest should go to the
+ * new one. Use the tupno field to get the actual HeapTuple row from the
+ * original array of sample rows.
+ */
+ for (i = 0; i < nrows; i++)
+ memcpy(&data->rows[i], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ for (i = nrows; i < oldnrows; i++)
+ memcpy(&new_data->rows[i-nrows], &oldrows[values[i].tupno], sizeof(HeapTuple));
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(new_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each partition.
+ */
+ for (i = 0; i < numattrs; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(new_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+ pfree(values);
+
+ return new_bucket;
+}
+
+/*
+ * Copy a histogram bucket. The copy does not include the build-time data, i.e.
+ * sampled rows etc.
+ */
+static MVBucket
+copy_mv_bucket(MVBucket bucket, uint32 ndimensions)
+{
+ /* TODO allocate as a single piece (including all the fields) */
+ MVBucket new_bucket = (MVBucket)palloc0(sizeof(MVBucketData));
+ HistogramBuild data = (HistogramBuild)palloc0(sizeof(HistogramBuildData));
+
+ /* Copy only the attributes that will stay the same after the split, and
+ * we'll recompute the rest after the split. */
+
+ /* allocate the per-dimension arrays */
+ new_bucket->nullsonly = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* inclusiveness boundaries - lower/upper bounds */
+ new_bucket->min_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+ new_bucket->max_inclusive = (bool*)palloc0(ndimensions * sizeof(bool));
+
+ /* lower/upper boundaries */
+ new_bucket->min = (Datum*)palloc0(ndimensions * sizeof(Datum));
+ new_bucket->max = (Datum*)palloc0(ndimensions * sizeof(Datum));
+
+ /* copy data */
+ memcpy(new_bucket->nullsonly, bucket->nullsonly, ndimensions * sizeof(bool));
+
+ memcpy(new_bucket->min_inclusive, bucket->min_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->min, bucket->min, ndimensions*sizeof(Datum));
+
+ memcpy(new_bucket->max_inclusive, bucket->max_inclusive, ndimensions*sizeof(bool));
+ memcpy(new_bucket->max, bucket->max, ndimensions*sizeof(Datum));
+
+ /* allocate and copy the interesting part of the build data */
+ data->ndistincts = (uint32*)palloc0(ndimensions * sizeof(uint32));
+
+ new_bucket->build_data = data;
+
+ return new_bucket;
+}
+
+/*
+ * Counts the number of distinct value combinations in the bucket. The values
+ * are collected into an array of sort items, sorted using a multi-column
+ * comparator, and then adjacent items are compared to count the distinct
+ * combinations.
+ */
+static void
+update_bucket_ndistinct(MVBucket bucket, int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int numattrs = attrs->dim1;
+
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ int numrows = data->numrows;
+
+ MultiSortSupport mss = multi_sort_init(numattrs);
+
+ /*
+ * XXX We could collect this elsewhere, while already walking through
+ * all the attributes, to avoid calling heap_getattr twice per value.
+ */
+ SortItem *items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+ Datum *values = (Datum*)palloc0(numrows * sizeof(Datum) * numattrs);
+ bool *isnull = (bool*)palloc0(numrows * sizeof(bool) * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ /* add all the dimensions to the sort */
+ for (i = 0; i < numattrs; i++)
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* collect the values */
+ for (i = 0; i < numrows; i++)
+ for (j = 0; j < numattrs; j++)
+ items[i].values[j]
+ = heap_getattr(data->rows[i], attrs->values[j],
+ stats[j]->tupDesc, &items[i].isnull[j]);
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ data->ndistinct = 1;
+
+ for (i = 1; i < numrows; i++)
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ data->ndistinct += 1;
+
+ pfree(items);
+ pfree(values);
+ pfree(isnull);
+}
+
+/*
+ * Count distinct values per bucket dimension.
+ */
+static void
+update_dimension_ndistinct(MVBucket bucket, int dimension, int2vector *attrs,
+ VacAttrStats ** stats, bool update_boundaries)
+{
+ int j;
+ int nvalues = 0;
+ bool isNull;
+ HistogramBuild data = (HistogramBuild)bucket->build_data;
+ Datum * values = (Datum*)palloc0(data->numrows * sizeof(Datum));
+ SortSupportData ssup;
+
+ StdAnalyzeData * mystats = (StdAnalyzeData *) stats[dimension]->extra_data;
+
+ /* we may already know this is a NULL-only dimension */
+ if (bucket->nullsonly[dimension])
+ data->ndistincts[dimension] = 1;
+
+ memset(&ssup, 0, sizeof(ssup));
+ ssup.ssup_cxt = CurrentMemoryContext;
+
+ /* We always use the default collation for statistics */
+ ssup.ssup_collation = DEFAULT_COLLATION_OID;
+ ssup.ssup_nulls_first = false;
+
+ PrepareSortSupportFromOrderingOp(mystats->ltopr, &ssup);
+
+ for (j = 0; j < data->numrows; j++)
+ {
+ values[nvalues] = heap_getattr(data->rows[j], attrs->values[dimension],
+ stats[dimension]->tupDesc, &isNull);
+
+ /* ignore NULL values */
+ if (! isNull)
+ nvalues++;
+ }
+
+ /* there's always at least 1 distinct value (may be NULL) */
+ data->ndistincts[dimension] = 1;
+
+ /*
+ * If there are only NULL values in the column, mark the dimension as
+ * NULL-only and bail out (the caller continues with the next dimension).
+ */
+ if (nvalues == 0)
+ {
+ pfree(values);
+ bucket->nullsonly[dimension] = true;
+ return;
+ }
+
+ /* sort the array of (pass-by-value) Datum values */
+ qsort_arg((void *) values, nvalues, sizeof(Datum),
+ compare_scalars_simple, (void *) &ssup);
+
+ /*
+ * Update min/max boundaries to the smallest bounding box. Generally, this
+ * needs to be done only when constructing the initial bucket.
+ */
+ if (update_boundaries)
+ {
+ /* store the min/max values */
+ bucket->min[dimension] = values[0];
+ bucket->min_inclusive[dimension] = true;
+
+ bucket->max[dimension] = values[nvalues-1];
+ bucket->max_inclusive[dimension] = true;
+ }
+
+ /*
+ * Walk through the array and count distinct values by comparing
+ * succeeding values.
+ *
+ * FIXME This only works for pass-by-value types (i.e. not VARCHARs
+ * etc.). Although thanks to the deduplication it might work
+ * even for those types (equal values will get the same item
+ * in the deduplicated array).
+ */
+ for (j = 1; j < nvalues; j++) {
+ if (values[j] != values[j-1])
+ data->ndistincts[dimension] += 1;
+ }
+
+ pfree(values);
+}
+
+/*
+ * A properly built histogram must not contain buckets mixing NULL and non-NULL
+ * values in a single dimension. Each dimension may either be marked as
+ * NULL-only (and thus contain only NULL values), or it must not contain
+ * any NULL values at all.
+ *
+ * Therefore, if the sample contains NULL values in any of the columns, it's
+ * necessary to build those NULL-buckets. This is done in an iterative way
+ * using this algorithm, operating on a single bucket:
+ *
+ * (1) Check that all dimensions are well-formed (not mixing NULL and
+ * non-NULL values).
+ *
+ * (2) If all dimensions are well-formed, terminate.
+ *
+ * (3) If the dimension contains only NULL values, but is not marked as
+ * NULL-only, mark it as NULL-only and run the algorithm again (on
+ * this bucket).
+ *
+ * (4) If the dimension mixes NULL and non-NULL values, split the bucket
+ * into two parts - one with NULL values, one with non-NULL values
+ * (replacing the current one). Then run the algorithm on both buckets.
+ *
+ * This is executed in a recursive manner, but the number of executions should
+ * be quite low - limited by the number of NULL-buckets. Also, in each branch
+ * the number of nested calls is limited by the number of dimensions
+ * (attributes) of the histogram.
+ *
+ * At the end, there should be buckets with no mixed dimensions. The number of
+ * buckets produced by this algorithm is rather limited - with N dimensions,
+ * there may be only 2^N such buckets (each dimension may be either NULL or
+ * non-NULL). So with 8 dimensions (current value of MVSTATS_MAX_DIMENSIONS)
+ * there may be only 256 such buckets.
+ *
+ * After this, a 'regular' bucket-split algorithm shall run, further optimizing
+ * the histogram.
+ */
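+
+/*
+ * For example (hypothetical sample), with two columns (a,b) that both
+ * contain NULL values, the initial bucket ends up split into up to four
+ * buckets:
+ *
+ *   (a non-NULL,  b non-NULL)
+ *   (a non-NULL,  b NULL-only)
+ *   (a NULL-only, b non-NULL)
+ *   (a NULL-only, b NULL-only)
+ *
+ * where combinations not present in the sample are simply not created.
+ */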
+static void
+create_null_buckets(MVHistogram histogram, int bucket_idx,
+ int2vector *attrs, VacAttrStats ** stats)
+{
+ int i, j;
+ int null_dim = -1;
+ int null_count = 0;
+ bool null_found = false;
+ MVBucket bucket, null_bucket;
+ int null_idx, curr_idx;
+ HistogramBuild data, null_data;
+
+ /* remember original values from the bucket */
+ int numrows;
+ HeapTuple *oldrows = NULL;
+
+ Assert(bucket_idx < histogram->nbuckets);
+ Assert(histogram->ndimensions == attrs->dim1);
+
+ bucket = histogram->buckets[bucket_idx];
+ data = (HistogramBuild)bucket->build_data;
+
+ numrows = data->numrows;
+ oldrows = data->rows;
+
+ /*
+ * Walk through all rows / dimensions, and stop once we find NULL in a
+ * dimension not yet marked as NULL-only.
+ */
+ for (i = 0; i < data->numrows; i++)
+ {
+ /*
+ * FIXME We don't need to start from the first attribute here - we can
+ * start from the last known dimension.
+ */
+ for (j = 0; j < histogram->ndimensions; j++)
+ {
+ /* Is this a NULL-only dimension? If yes, skip. */
+ if (bucket->nullsonly[j])
+ continue;
+
+ /* found a NULL in that dimension? */
+ if (heap_attisnull(data->rows[i], attrs->values[j]))
+ {
+ null_found = true;
+ null_dim = j;
+ break;
+ }
+ }
+
+ /* terminate if we found attribute with NULL values */
+ if (null_found)
+ break;
+ }
+
+ /* no regular dimension contains NULL values => we're done */
+ if (! null_found)
+ return;
+
+ /* walk through the rows again, count NULL values in 'null_dim' */
+ for (i = 0; i < data->numrows; i++)
+ {
+ if (heap_attisnull(data->rows[i], attrs->values[null_dim]))
+ null_count += 1;
+ }
+
+ Assert(null_count <= data->numrows);
+
+ /*
+ * If (null_count == numrows) the dimension already is NULL-only, but is
+ * not yet marked like that. It's enough to mark it and repeat the process
+ * recursively (until we run out of dimensions).
+ */
+ if (null_count == data->numrows)
+ {
+ bucket->nullsonly[null_dim] = true;
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+ return;
+ }
+
+ /*
+ * We have to split the bucket into two - one with NULL values in the
+ * dimension, one with non-NULL values. We don't need to sort the data or
+ * anything, but otherwise it's similar to what partition_bucket() does.
+ */
+
+ /* create bucket with NULL-only dimension 'dim' */
+ null_bucket = copy_mv_bucket(bucket, histogram->ndimensions);
+ null_data = (HistogramBuild)null_bucket->build_data;
+
+ /* remember the current array info */
+ oldrows = data->rows;
+ numrows = data->numrows;
+
+ /* we'll keep non-NULL values in the current bucket */
+ data->numrows = (numrows - null_count);
+ data->rows
+ = (HeapTuple*)palloc0(data->numrows * sizeof(HeapTuple));
+
+ /* and the NULL values will go to the new one */
+ null_data->numrows = null_count;
+ null_data->rows
+ = (HeapTuple*)palloc0(null_data->numrows * sizeof(HeapTuple));
+
+ /* mark the dimension as NULL-only (in the new bucket) */
+ null_bucket->nullsonly[null_dim] = true;
+
+ /* walk through the sample rows and distribute them accordingly */
+ null_idx = 0;
+ curr_idx = 0;
+ for (i = 0; i < numrows; i++)
+ {
+ if (heap_attisnull(oldrows[i], attrs->values[null_dim]))
+ /* NULL => copy to the new bucket */
+ memcpy(&null_data->rows[null_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ else
+ memcpy(&data->rows[curr_idx++], &oldrows[i],
+ sizeof(HeapTuple));
+ }
+
+ /* update ndistinct values for the buckets (total and per dimension) */
+ update_bucket_ndistinct(bucket, attrs, stats);
+ update_bucket_ndistinct(null_bucket, attrs, stats);
+
+ /*
+ * TODO We don't need to do this for the dimension we used for split,
+ * because we know how many distinct values went to each bucket (NULL
+ * is not a value, so NULL buckets get 0, and the other bucket got all
+ * the distinct values).
+ */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ update_dimension_ndistinct(bucket, i, attrs, stats, false);
+ update_dimension_ndistinct(null_bucket, i, attrs, stats, false);
+ }
+
+ pfree(oldrows);
+
+ /* add the NULL bucket to the histogram */
+ histogram->buckets[histogram->nbuckets++] = null_bucket;
+
+ /*
+ * And now run the function recursively on both buckets (the new
+ * one first, because the call may change number of buckets, and
+ * it's used as an index).
+ */
+ create_null_buckets(histogram, (histogram->nbuckets-1), attrs, stats);
+ create_null_buckets(histogram, bucket_idx, attrs, stats);
+}
+
+/*
+ * SRF with details about buckets of a histogram:
+ *
+ * - bucket ID (0 .. nbuckets-1)
+ * - min values (string array)
+ * - max values (string array)
+ * - nulls only (boolean array)
+ * - min inclusive flags (boolean array)
+ * - max inclusive flags (boolean array)
+ * - frequency (double precision)
+ *
+ * The input is the OID of the statistics, and there are no rows returned if the
+ * statistics contains no histogram (or if there's no statistics for the OID).
+ *
+ * The second parameter (type) determines what values will be returned
+ * in the (minvals,maxvals). There are three possible values:
+ *
+ * 0 (actual values)
+ * -----------------
+ * - prints actual values
+ * - using the output function of the data type (as string)
+ * - handy for investigating the histogram
+ *
+ * 1 (distinct index)
+ * ------------------
+ * - prints index of the distinct value (into the serialized array)
+ * - makes it easier to spot neighbor buckets, etc.
+ * - handy for plotting the histogram
+ *
+ * 2 (normalized distinct index)
+ * -----------------------------
+ * - prints index of the distinct value, but normalized into [0,1]
+ * - similar to 1, but shows how 'long' the bucket range is
+ * - handy for plotting the histogram
+ *
+ * When plotting the histogram, be careful as the (1) and (2) options skew the
+ * lengths by distributing the distinct values uniformly. For data types
+ * without a clear meaning of 'distance' (e.g. strings) that is not a big deal,
+ * but for numbers it may be confusing.
+ */
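+
+/*
+ * Example usage (16385 is a hypothetical statistics OID; use the OID of an
+ * actual pg_mv_statistic row with a built histogram):
+ *
+ *   SELECT * FROM pg_mv_histogram_buckets(16385, 0);
+ */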
+PG_FUNCTION_INFO_V1(pg_mv_histogram_buckets);
+
+#define OUTPUT_FORMAT_RAW 0
+#define OUTPUT_FORMAT_INDEXES 1
+#define OUTPUT_FORMAT_DISTINCT 2
+
+Datum
+pg_mv_histogram_buckets(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ int call_cntr;
+ int max_calls;
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+
+ Oid mvoid = PG_GETARG_OID(0);
+ int otype = PG_GETARG_INT32(1);
+
+ if ((otype < 0) || (otype > 2))
+ elog(ERROR, "invalid output type specified");
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ MemoryContext oldcontext;
+ MVSerializedHistogram histogram;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* switch to memory context appropriate for multiple function calls */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ histogram = load_mv_histogram(mvoid);
+
+ funcctx->user_fctx = histogram;
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = 0;
+ if (funcctx->user_fctx != NULL)
+ funcctx->max_calls = histogram->nbuckets;
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+
+ /*
+ * generate attribute metadata needed later to produce tuples
+ * from raw C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ /* stuff done on every call of the function */
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ attinmeta = funcctx->attinmeta;
+
+ if (call_cntr < max_calls) /* do when there is more left to send */
+ {
+ char **values;
+ HeapTuple tuple;
+ Datum result;
+ int2vector *stakeys;
+ Oid relid;
+ double bucket_volume = 1.0;
+ StringInfo bufs;
+
+ char *format;
+ int i;
+
+ Oid *outfuncs;
+ FmgrInfo *fmgrinfo;
+
+ MVSerializedHistogram histogram;
+ MVSerializedBucket bucket;
+
+ histogram = (MVSerializedHistogram)funcctx->user_fctx;
+
+ Assert(call_cntr < histogram->nbuckets);
+
+ bucket = histogram->buckets[call_cntr];
+
+ stakeys = find_mv_attnums(mvoid, &relid);
+
+ /*
+ * The scalar values will be formatted directly, using snprintf.
+ *
+ * The 'array' values will be formatted through StringInfo.
+ */
+ values = (char **) palloc0(9 * sizeof(char *));
+ bufs = (StringInfo) palloc0(9 * sizeof(StringInfoData));
+
+ values[0] = (char *) palloc(64 * sizeof(char));
+
+ initStringInfo(&bufs[1]); /* lower boundaries */
+ initStringInfo(&bufs[2]); /* upper boundaries */
+ initStringInfo(&bufs[3]); /* nulls-only */
+ initStringInfo(&bufs[4]); /* lower inclusive */
+ initStringInfo(&bufs[5]); /* upper inclusive */
+
+ values[6] = (char *) palloc(64 * sizeof(char));
+ values[7] = (char *) palloc(64 * sizeof(char));
+ values[8] = (char *) palloc(64 * sizeof(char));
+
+ /* we need to do this only when printing the actual values */
+ outfuncs = (Oid*)palloc0(sizeof(Oid) * histogram->ndimensions);
+ fmgrinfo = (FmgrInfo*)palloc0(sizeof(FmgrInfo) * histogram->ndimensions);
+
+ /*
+ * lookup output functions for all histogram dimensions
+ *
+ * XXX This might be done in the first call and stored in user_fctx.
+ */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ bool isvarlena;
+
+ getTypeOutputInfo(get_atttype(relid, stakeys->values[i]),
+ &outfuncs[i], &isvarlena);
+
+ fmgr_info(outfuncs[i], &fmgrinfo[i]);
+ }
+
+ snprintf(values[0], 64, "%d", call_cntr); /* bucket ID */
+
+ /* format the arrays of lower/upper boundaries according to otype */
+ for (i = 0; i < histogram->ndimensions; i++)
+ {
+ Datum *vals = histogram->values[i];
+
+ uint16 minidx = bucket->min[i];
+ uint16 maxidx = bucket->max[i];
+
+ /*
+ * Compute bucket volume, using the number of distinct values in
+ * each dimension as the measure.
+ *
+ * XXX Not really sure what to do for NULL dimensions here, so let's
+ * simply count them as '1'.
+ */
+ bucket_volume
+ *= (double)(maxidx - minidx + 1) / (histogram->nvalues[i]-1);
+
+ if (i == 0)
+ format = "{%s"; /* first dimension */
+ else if (i < (histogram->ndimensions - 1))
+ format = ", %s"; /* middle dimensions */
+ else
+ format = ", %s}"; /* last dimension */
+
+ appendStringInfo(&bufs[3], format, bucket->nullsonly[i] ? "t" : "f");
+ appendStringInfo(&bufs[4], format, bucket->min_inclusive[i] ? "t" : "f");
+ appendStringInfo(&bufs[5], format, bucket->max_inclusive[i] ? "t" : "f");
+
+ /* for NULL-only dimension, simply put there the NULL and continue */
+ if (bucket->nullsonly[i])
+ {
+ if (i == 0)
+ format = "{%s";
+ else if (i < (histogram->ndimensions - 1))
+ format = ", %s";
+ else
+ format = ", %s}";
+
+ appendStringInfo(&bufs[1], format, "NULL");
+ appendStringInfo(&bufs[2], format, "NULL");
+
+ continue;
+ }
+
+ /* otherwise we really need to format the value */
+ switch (otype)
+ {
+ case OUTPUT_FORMAT_RAW: /* actual boundary values */
+
+ if (i == 0)
+ format = "{%s";
+ else if (i < (histogram->ndimensions - 1))
+ format = ", %s";
+ else
+ format = ", %s}";
+
+ /* OutputFunctionCall() returns the C string expected by %s */
+ appendStringInfo(&bufs[1], format,
+ OutputFunctionCall(&fmgrinfo[i], vals[minidx]));
+
+ appendStringInfo(&bufs[2], format,
+ OutputFunctionCall(&fmgrinfo[i], vals[maxidx]));
+
+ break;
+
+ case OUTPUT_FORMAT_INDEXES: /* indexes into deduplicated arrays */
+
+ if (i == 0)
+ format = "{%d";
+ else if (i < (histogram->ndimensions - 1))
+ format = ", %d";
+ else
+ format = ", %d}";
+
+ appendStringInfo(&bufs[1], format, minidx);
+
+ appendStringInfo(&bufs[2], format, maxidx);
+
+ break;
+
+ case OUTPUT_FORMAT_DISTINCT: /* distinct arrays as measure */
+
+ if (i == 0)
+ format = "{%f";
+ else if (i < (histogram->ndimensions - 1))
+ format = ", %f";
+ else
+ format = ", %f}";
+
+ appendStringInfo(&bufs[1], format,
+ (minidx * 1.0 / (histogram->nvalues[i]-1)));
+
+ appendStringInfo(&bufs[2], format,
+ (maxidx * 1.0 / (histogram->nvalues[i]-1)));
+
+ break;
+
+ default:
+ elog(ERROR, "unknown output type: %d", otype);
+ }
+ }
+
+ values[1] = bufs[1].data;
+ values[2] = bufs[2].data;
+ values[3] = bufs[3].data;
+ values[4] = bufs[4].data;
+ values[5] = bufs[5].data;
+
+ snprintf(values[6], 64, "%f", bucket->ntuples); /* frequency */
+ snprintf(values[7], 64, "%f", bucket->ntuples / bucket_volume); /* density */
+ snprintf(values[8], 64, "%f", bucket_volume); /* volume (as a fraction) */
+
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ result = HeapTupleGetDatum(tuple);
+
+ /* clean up (this is not really necessary) */
+ pfree(values[0]);
+ pfree(values[6]);
+ pfree(values[7]);
+ pfree(values[8]);
+
+ resetStringInfo(&bufs[1]);
+ resetStringInfo(&bufs[2]);
+ resetStringInfo(&bufs[3]);
+ resetStringInfo(&bufs[4]);
+ resetStringInfo(&bufs[5]);
+
+ pfree(bufs);
+ pfree(values);
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else /* do when there is no more left */
+ {
+ SRF_RETURN_DONE(funcctx);
+ }
+}
+
+#ifdef DEBUG_MVHIST
+/*
+ * prints debugging info about matched histogram buckets (full/partial)
+ *
+ * XXX Currently works only for INT data type.
+ */
+void
+debug_histogram_matches(MVSerializedHistogram mvhist, char *matches)
+{
+ int i, j;
+
+ float ffull = 0, fpartial = 0;
+ int nfull = 0, npartial = 0;
+
+ StringInfoData buf;
+
+ initStringInfo(&buf);
+
+ for (i = 0; i < mvhist->nbuckets; i++)
+ {
+ MVSerializedBucket bucket = mvhist->buckets[i];
+
+ if (! matches[i])
+ continue;
+
+ /* increment the counters */
+ nfull += (matches[i] == MVSTATS_MATCH_FULL) ? 1 : 0;
+ npartial += (matches[i] == MVSTATS_MATCH_PARTIAL) ? 1 : 0;
+
+ /* and also update the frequencies */
+ ffull += (matches[i] == MVSTATS_MATCH_FULL) ? bucket->ntuples : 0;
+ fpartial += (matches[i] == MVSTATS_MATCH_PARTIAL) ? bucket->ntuples : 0;
+
+ resetStringInfo(&buf);
+
+ /* build ranges for all the dimensions */
+ for (j = 0; j < mvhist->ndimensions; j++)
+ {
+ appendStringInfo(&buf, "[%d %d]",
+ DatumGetInt32(mvhist->values[j][bucket->min[j]]),
+ DatumGetInt32(mvhist->values[j][bucket->max[j]]));
+ }
+
+ elog(WARNING, "bucket %d %s => %d [%f]", i, buf.data, matches[i], bucket->ntuples);
+ }
+
+ elog(WARNING, "full=%f partial=%f (%f)", ffull, fpartial, (ffull + 0.5 * fpartial));
+}
+#endif
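FWIW to play with the histogram from SQL, something like this should work
(just a sketch - 's7' is the statistics name used in the regression tests
below, and otype=0 is assumed to be OUTPUT_FORMAT_RAW, i.e. actual
boundary values):

    SELECT index, minvals, maxvals, frequency, density, bucket_volume
      FROM pg_mv_histogram_buckets(
             (SELECT oid FROM pg_mv_statistic WHERE staname = 's7'), 0);

The otype parameter switches between raw boundary values, indexes into
the deduplicated arrays, and the distinct-count measure (see the switch
in pg_mv_histogram_buckets above).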
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 2c22d31..b693f36 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2109,9 +2109,9 @@ describeOneTableDetails(const char *schemaname,
{
printfPQExpBuffer(&buf,
"SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
- " deps_enabled, mcv_enabled,\n"
- " deps_built, mcv_built,\n"
- " mcv_max_items,\n"
+ " deps_enabled, mcv_enabled, hist_enabled,\n"
+ " deps_built, mcv_built, hist_built,\n"
+ " mcv_max_items, hist_max_buckets,\n"
" (SELECT string_agg(attname::text,', ')\n"
" FROM ((SELECT unnest(stakeys) AS attnum) s\n"
" JOIN pg_attribute a ON (starelid = a.attrelid and a.attnum = s.attnum))) AS attnums\n"
@@ -2154,8 +2154,17 @@ describeOneTableDetails(const char *schemaname,
first = false;
}
+ if (!strcmp(PQgetvalue(result, i, 6), "t"))
+ {
+ if (! first)
+ appendPQExpBuffer(&buf, ", histogram");
+ else
+ appendPQExpBuffer(&buf, "(histogram");
+ first = false;
+ }
+
appendPQExpBuffer(&buf, ") ON (%s)",
- PQgetvalue(result, i, 9));
+ PQgetvalue(result, i, 12));
printTableAddFooter(&cont, buf.data);
}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index 3529b03..7020772 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -39,13 +39,16 @@ CATALOG(pg_mv_statistic,3381)
/* statistics requested to build */
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
+ bool hist_enabled; /* build histogram? */
- /* MCV size */
+ /* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
+ int32 hist_max_buckets; /* max histogram buckets */
/* statistics that are available (if requested) */
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
+ bool hist_built; /* histogram was built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -53,6 +56,7 @@ CATALOG(pg_mv_statistic,3381)
#ifdef CATALOG_VARLEN
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
+ bytea stahist; /* MV histogram (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -68,18 +72,22 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 12
+#define Natts_pg_mv_statistic 16
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
#define Anum_pg_mv_statistic_staowner 4
#define Anum_pg_mv_statistic_deps_enabled 5
#define Anum_pg_mv_statistic_mcv_enabled 6
-#define Anum_pg_mv_statistic_mcv_max_items 7
-#define Anum_pg_mv_statistic_deps_built 8
-#define Anum_pg_mv_statistic_mcv_built 9
-#define Anum_pg_mv_statistic_stakeys 10
-#define Anum_pg_mv_statistic_stadeps 11
-#define Anum_pg_mv_statistic_stamcv 12
+#define Anum_pg_mv_statistic_hist_enabled 7
+#define Anum_pg_mv_statistic_mcv_max_items 8
+#define Anum_pg_mv_statistic_hist_max_buckets 9
+#define Anum_pg_mv_statistic_deps_built 10
+#define Anum_pg_mv_statistic_mcv_built 11
+#define Anum_pg_mv_statistic_hist_built 12
+#define Anum_pg_mv_statistic_stakeys 13
+#define Anum_pg_mv_statistic_stadeps 14
+#define Anum_pg_mv_statistic_stamcv 15
+#define Anum_pg_mv_statistic_stahist 16
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 5640dc1..f2f735d 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2674,6 +2674,10 @@ DATA(insert OID = 3376 ( pg_mv_stats_mcvlist_info PGNSP PGUID 12 1 0 0 0 f f f
DESCR("multi-variate statistics: MCV list info");
DATA(insert OID = 3373 ( pg_mv_mcv_items PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 1 0 2249 "26" "{26,23,1009,1000,701}" "{i,o,o,o,o}" "{oid,index,values,nulls,frequency}" _null_ _null_ pg_mv_mcv_items _null_ _null_ _null_ ));
DESCR("details about MCV list items");
+DATA(insert OID = 3375 ( pg_mv_stats_histogram_info PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 25 "17" _null_ _null_ _null_ _null_ _null_ pg_mv_stats_histogram_info _null_ _null_ _null_ ));
+DESCR("multi-variate statistics: histogram info");
+DATA(insert OID = 3374 ( pg_mv_histogram_buckets PGNSP PGUID 12 1 1000 0 0 f f f f t t i s 2 0 2249 "26 23" "{26,23,23,1009,1009,1000,1000,1000,701,701,701}" "{i,i,o,o,o,o,o,o,o,o,o}" "{oid,otype,index,minvals,maxvals,nullsonly,mininclusive,maxinclusive,frequency,density,bucket_volume}" _null_ _null_ pg_mv_histogram_buckets _null_ _null_ _null_ ));
+DESCR("details about histogram buckets");
DATA(insert OID = 1928 ( pg_stat_get_numscans PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_numscans _null_ _null_ _null_ ));
DESCR("statistics: number of scans done for table/index");
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index f52884a..84be0ce 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -656,10 +656,12 @@ typedef struct MVStatisticInfo
/* enabled statistics */
bool deps_enabled; /* functional dependencies enabled */
bool mcv_enabled; /* MCV list enabled */
+ bool hist_enabled; /* histogram enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
bool mcv_built; /* MCV list built */
+ bool hist_built; /* histogram built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index b2643ec..777c7da 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -18,7 +18,7 @@
#include "commands/vacuum.h"
/*
- * Degree of how much MCV item matches a clause.
+ * Degree of how much MCV item / histogram bucket matches a clause.
* This is then considered when computing the selectivity.
*/
#define MVSTATS_MATCH_NONE 0 /* no match at all */
@@ -92,6 +92,123 @@ typedef MCVListData *MCVList;
#define MVSTAT_MCVLIST_MAX_ITEMS 8192 /* max items in MCV list */
/*
+ * Multivariate histograms
+ */
+typedef struct MVBucketData {
+
+ /* Frequency of this bucket. */
+ float ntuples; /* frequency of tuples in this bucket */
+
+ /*
+ * Information about dimensions being NULL-only. Not yet used.
+ */
+ bool *nullsonly;
+
+ /* lower boundaries - values and information about the inequalities */
+ Datum *min;
+ bool *min_inclusive;
+
+ /* upper boundaries - values and information about the inequalities */
+ Datum *max;
+ bool *max_inclusive;
+
+ /* used when building the histogram (not serialized/deserialized) */
+ void *build_data;
+
+} MVBucketData;
+
+typedef MVBucketData *MVBucket;
+
+
+typedef struct MVHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ MVBucket *buckets; /* array of buckets */
+
+} MVHistogramData;
+
+typedef MVHistogramData *MVHistogram;
+
+/*
+ * Histogram in a partially serialized form, with deduplicated boundary
+ * values etc.
+ *
+ * TODO add more detailed description here
+ */
+
+typedef struct MVSerializedBucketData {
+
+ /* Frequency of this bucket. */
+ float ntuples; /* frequency of tuples in this bucket */
+
+ /*
+ * Information about dimensions being NULL-only. Not yet used.
+ */
+ bool *nullsonly;
+
+ /* indexes of lower boundaries + inclusive/exclusive flags */
+ uint16 *min;
+ bool *min_inclusive;
+
+ /* indexes of upper boundaries + inclusive/exclusive flags */
+ uint16 *max;
+ bool *max_inclusive;
+
+} MVSerializedBucketData;
+
+typedef MVSerializedBucketData *MVSerializedBucket;
+
+typedef struct MVSerializedHistogramData {
+
+ uint32 magic; /* magic constant marker */
+ uint32 type; /* type of histogram (BASIC) */
+ uint32 nbuckets; /* number of buckets (buckets array) */
+ uint32 ndimensions; /* number of dimensions */
+
+ /*
+ * Keep this in sync with MVHistogramData, because deserialization
+ * relies on the fields being at the same offsets.
+ */
+ MVSerializedBucket *buckets; /* array of buckets */
+
+ /*
+ * serialized boundary values, one array per dimension, deduplicated
+ * (the min/max indexes point into these arrays)
+ */
+ int *nvalues;
+ Datum **values;
+
+} MVSerializedHistogramData;
+
+typedef MVSerializedHistogramData *MVSerializedHistogram;
+
+
+/* used to flag stats serialized to bytea */
+#define MVSTAT_HIST_MAGIC 0x7F8C5670 /* marks serialized bytea */
+#define MVSTAT_HIST_TYPE_BASIC 1 /* basic histogram type */
+
+/*
+ * Limits used for max_buckets option, i.e. we're always guaranteed
+ * to have space for at least MVSTAT_HIST_MIN_BUCKETS, and we cannot
+ * have more than MVSTAT_HIST_MAX_BUCKETS buckets.
+ *
+ * This is just a boundary for the 'max' threshold - the actual
+ * histogram may use fewer buckets than MVSTAT_HIST_MAX_BUCKETS.
+ *
+ * TODO The MVSTAT_HIST_MIN_BUCKETS should be related to the number of
+ * attributes (MVSTATS_MAX_DIMENSIONS) because of NULL-buckets.
+ * There should be at least 2^N buckets, otherwise we may be unable
+ * to build the NULL buckets.
+ */
+#define MVSTAT_HIST_MIN_BUCKETS 128 /* min number of buckets */
+#define MVSTAT_HIST_MAX_BUCKETS 16384 /* max number of buckets */
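+
+/*
+ * For example, 3 dimensions give 2^3 = 8 combinations of NULL and
+ * non-NULL dimensions, so up to 8 buckets may be needed just for the
+ * NULL-buckets; the current minimum (128 = 2^7) is thus enough for
+ * up to 7 dimensions.
+ */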
+
+/*
* TODO Maybe fetching the histogram/MCV list separately is inefficient?
* Consider adding a single `fetch_stats` method, fetching all
* stats specified using flags (or something like that).
@@ -99,20 +216,25 @@ typedef MCVListData *MCVList;
MVDependencies load_mv_dependencies(Oid mvoid);
MCVList load_mv_mcvlist(Oid mvoid);
+MVSerializedHistogram load_mv_histogram(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
VacAttrStats **stats);
+bytea * serialize_mv_histogram(MVHistogram histogram, int2vector *attrs,
+ VacAttrStats **stats);
/* deserialization of stats (serialization is private to analyze) */
MVDependencies deserialize_mv_dependencies(bytea * data);
MCVList deserialize_mv_mcvlist(bytea * data);
+MVSerializedHistogram deserialize_mv_histogram(bytea * data);
/*
* Returns index of the attribute number within the vector (i.e. a
* dimension within the stats).
*/
int mv_get_index(AttrNumber varattno, int2vector * stakeys);
int2vector* find_mv_attnums(Oid mvoid, Oid *relid);
@@ -121,6 +243,8 @@ extern Datum pg_mv_stats_dependencies_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_dependencies_show(PG_FUNCTION_ARGS);
extern Datum pg_mv_stats_mcvlist_info(PG_FUNCTION_ARGS);
extern Datum pg_mv_mcvlist_items(PG_FUNCTION_ARGS);
+extern Datum pg_mv_stats_histogram_info(PG_FUNCTION_ARGS);
+extern Datum pg_mv_histogram_buckets(PG_FUNCTION_ARGS);
MVDependencies
build_mv_dependencies(int numrows, HeapTuple *rows, int2vector *attrs,
@@ -130,10 +254,20 @@ MCVList
build_mv_mcvlist(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int *numrows_filtered);
+MVHistogram
+build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
+ VacAttrStats **stats, int numrows_total);
+
void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
-void update_mv_stats(Oid relid, MVDependencies dependencies, MCVList mcvlist,
+void update_mv_stats(Oid relid, MVDependencies dependencies,
+ MCVList mcvlist, MVHistogram histogram,
int2vector *attrs, VacAttrStats **stats);
+#ifdef DEBUG_MVHIST
+extern void debug_histogram_matches(MVSerializedHistogram mvhist, char *matches);
+#endif
+
+
#endif
diff --git a/src/test/regress/expected/mv_histogram.out b/src/test/regress/expected/mv_histogram.out
new file mode 100644
index 0000000..e830816
--- /dev/null
+++ b/src/test/regress/expected/mv_histogram.out
@@ -0,0 +1,207 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+-- unknown column
+CREATE STATISTICS s7 ON mv_histogram (unknown_column) WITH (histogram);
+ERROR: column "unknown_column" referenced in statistics does not exist
+-- single column
+CREATE STATISTICS s7 ON mv_histogram (a) WITH (histogram);
+ERROR: multivariate stats require 2 or more columns
+-- single column, duplicated
+CREATE STATISTICS s7 ON mv_histogram (a, a) WITH (histogram);
+ERROR: duplicate column name in statistics definition
+-- two columns, one duplicated
+CREATE STATISTICS s7 ON mv_histogram (a, a, b) WITH (histogram);
+ERROR: duplicate column name in statistics definition
+-- unknown option
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (unknown_option);
+ERROR: unrecognized STATISTICS option "unknown_option"
+-- missing histogram statistics
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (dependencies, max_buckets=200);
+ERROR: option 'histogram' is required by other options(s)
+-- invalid max_buckets value / too low
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=10);
+ERROR: minimum number of buckets is 128
+-- invalid max_buckets value / too high
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=100000);
+ERROR: maximum number of buckets is 16384
+-- correct command
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (histogram);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+ QUERY PLAN
+--------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = 10) AND (b = 5))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = 10) AND (b = 5))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+CREATE STATISTICS s8 ON mv_histogram (a, b, c) WITH (histogram);
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+ QUERY PLAN
+------------------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a = '10'::text) AND (b = '5'::text))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a = '10'::text) AND (b = '5'::text))
+(4 rows)
+
+TRUNCATE mv_histogram;
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+ QUERY PLAN
+---------------------------------------------------
+ Bitmap Heap Scan on mv_histogram
+ Recheck Cond: ((a IS NULL) AND (b IS NULL))
+ -> Bitmap Index Scan on hist_idx
+ Index Cond: ((a IS NULL) AND (b IS NULL))
+(4 rows)
+
+DROP TABLE mv_histogram;
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+CREATE STATISTICS s9 ON mv_histogram (a, b, c, d) WITH (histogram);
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+ hist_enabled | hist_built
+--------------+------------
+ t | t
+(1 row)
+
+DROP TABLE mv_histogram;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 3d55ffe..528ac36 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1375,7 +1375,9 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
length(s.stadeps) AS depsbytes,
pg_mv_stats_dependencies_info(s.stadeps) AS depsinfo,
length(s.stamcv) AS mcvbytes,
- pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo
+ pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo,
+ length(s.stahist) AS histbytes,
+ pg_mv_stats_histogram_info(s.stahist) AS histinfo
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 85d94f1..a885235 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,4 +112,4 @@ test: event_trigger
test: stats
# run tests of multivariate stats
-test: mv_dependencies mv_mcv
+test: mv_dependencies mv_mcv mv_histogram
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 6584d73..2efdcd7 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -164,3 +164,4 @@ test: event_trigger
test: stats
test: mv_dependencies
test: mv_mcv
+test: mv_histogram
diff --git a/src/test/regress/sql/mv_histogram.sql b/src/test/regress/sql/mv_histogram.sql
new file mode 100644
index 0000000..27c2510
--- /dev/null
+++ b/src/test/regress/sql/mv_histogram.sql
@@ -0,0 +1,176 @@
+-- data type passed by value
+CREATE TABLE mv_histogram (
+ a INT,
+ b INT,
+ c INT
+);
+
+-- unknown column
+CREATE STATISTICS s7 ON mv_histogram (unknown_column) WITH (histogram);
+
+-- single column
+CREATE STATISTICS s7 ON mv_histogram (a) WITH (histogram);
+
+-- single column, duplicated
+CREATE STATISTICS s7 ON mv_histogram (a, a) WITH (histogram);
+
+-- two columns, one duplicated
+CREATE STATISTICS s7 ON mv_histogram (a, a, b) WITH (histogram);
+
+-- unknown option
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (unknown_option);
+
+-- missing histogram statistics
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (dependencies, max_buckets=200);
+
+-- invalid max_buckets value / too low
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=10);
+
+-- invalid max_buckets value / too high
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (mcv, max_buckets=100000);
+
+-- correct command
+CREATE STATISTICS s7 ON mv_histogram (a, b, c) WITH (histogram);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = 10 AND b = 5;
+
+DROP TABLE mv_histogram;
+
+-- varlena type (text)
+CREATE TABLE mv_histogram (
+ a TEXT,
+ b TEXT,
+ c TEXT
+);
+
+CREATE STATISTICS s8 ON mv_histogram (a, b, c) WITH (histogram);
+
+-- random data (no functional dependencies)
+INSERT INTO mv_histogram
+ SELECT mod(i, 111), mod(i, 123), mod(i, 23) FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c, b => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/100, i/200 FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- a => b, a => c
+INSERT INTO mv_histogram
+ SELECT i/10, i/150, i/200 FROM generate_series(1,10000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan)
+INSERT INTO mv_histogram
+ SELECT i/10000, i/20000, i/40000 FROM generate_series(1,1000000) s(i);
+CREATE INDEX hist_idx ON mv_histogram (a, b);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a = '10' AND b = '5';
+
+TRUNCATE mv_histogram;
+
+-- check explain (expect bitmap index scan, not plain index scan) with NULLs
+INSERT INTO mv_histogram
+ SELECT
+ (CASE WHEN i/10000 = 0 THEN NULL ELSE i/10000 END),
+ (CASE WHEN i/20000 = 0 THEN NULL ELSE i/20000 END),
+ (CASE WHEN i/40000 = 0 THEN NULL ELSE i/40000 END)
+ FROM generate_series(1,1000000) s(i);
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+EXPLAIN (COSTS off)
+ SELECT * FROM mv_histogram WHERE a IS NULL AND b IS NULL;
+
+DROP TABLE mv_histogram;
+
+-- NULL values (mix of int and text columns)
+CREATE TABLE mv_histogram (
+ a INT,
+ b TEXT,
+ c INT,
+ d TEXT
+);
+
+CREATE STATISTICS s9 ON mv_histogram (a, b, c, d) WITH (histogram);
+
+INSERT INTO mv_histogram
+ SELECT
+ mod(i, 100),
+ (CASE WHEN mod(i, 200) = 0 THEN NULL ELSE mod(i,200) END),
+ mod(i, 400),
+ (CASE WHEN mod(i, 300) = 0 THEN NULL ELSE mod(i,600) END)
+ FROM generate_series(1,10000) s(i);
+
+ANALYZE mv_histogram;
+
+SELECT hist_enabled, hist_built
+ FROM pg_mv_statistic WHERE starelid = 'mv_histogram'::regclass;
+
+DROP TABLE mv_histogram;
--
2.5.0
Attachment: 0006-multi-statistics-estimation.patch (text/x-patch)
From 6a965d339eca0b8573dbc709dc30eb9ac3c95e02 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Fri, 6 Feb 2015 01:42:38 +0100
Subject: [PATCH 6/9] multi-statistics estimation
The general idea is that a probability (which is what selectivity is)
can be split into a product of conditional probabilities like this:
P(A & B & C) = P(A & B) * P(C|A & B)
If we assume that C and B are conditionally independent given A, the
last factor may be simplified like this
P(A & B & C) = P(A & B) * P(C|A)
so we only need probabilities on [A,B] and [C,A] to compute the original
probability.
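A worked example with made-up numbers: say a statistics on [A,B] gives
P(A & B) = 0.005, and C is perfectly implied by A, so P(C|A) = 1.0. Then

    P(A & B & C) = P(A & B) * P(C|A) = 0.005 * 1.0 = 0.005

whereas multiplying in P(C) = 0.01 under the independence assumption
would yield 0.00005, a 100x underestimate.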
The implementation works in the other direction, though. We know what
probability P(A & B & C) we need to compute, and also what statistics
are available.
So we search for a combination of statistics, covering the clauses in
an optimal way (most clauses covered, most dependencies exploited).
There are two possible approaches - exhaustive and greedy. The
exhaustive one walks through all permutations of stats using dynamic
programming, so it's guaranteed to find the optimal solution, but it
soon gets very slow as it's roughly O(N!). The dynamic programming may
improve that a bit, but it's still far too expensive for large numbers
of statistics (on a single table).
The greedy algorithm is very simple - at every step it picks the locally
best statistics. That may not guarantee the best solution globally (but maybe
it does?), but it only needs N steps to find the solution, so it's very
fast (processing the selected stats is usually way more expensive).
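As a hypothetical illustration of the difference: with clauses on columns
a, b, c and d, and statistics S1 on (a,b), S2 on (b,c), S3 on (c,d), all
three candidates initially cover two attributes. Starting with S1, the
next greedy step picks S3 (two new attributes vs. one for S2), and two
statistics cover everything. Starting with S2, however, S1 and S3 each
add only one new attribute, so all three statistics end up being used.
At least with this simple scoring, the greedy result depends on
tie-breaking and is not always globally optimal.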
There's a GUC for selecting the search algorithm
mvstat_search = {'greedy', 'exhaustive'}
The default value is 'greedy' as that's much safer (with respect to
runtime). See choose_mv_statistics().
Once we have found a sequence of statistics, we apply them to the
clauses using the conditional probabilities. We process the selected
stats one by one, and for each we select the estimated clauses and
conditions. See clauselist_selectivity() for more details.
Limitations
-----------
It's still true that each clause at a given level has to be covered by
a single MV statistics. So with this query
WHERE (clause1) AND (clause2) AND (clause3 OR clause4)
each parenthesized clause has to be covered by a single multivariate
statistics.
Clauses not covered by a single statistics at this level will be passed
to clause_selectivity() but this will treat them as a collection of
simpler clauses (connected by AND or OR), and the clauses from the
previous level will be used as conditions.
So using the same example, the last clause will be passed to
clause_selectivity() with 'clause1' and 'clause2' as conditions, and it
will be processed using multivariate stats if possible.
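For example (hypothetical table and columns), with a statistics on (a,b)
only:

    SELECT * FROM t
     WHERE (a = 1) AND (b = 2) AND (c = 3 OR d = 4);

the first two clauses are estimated using the (a,b) statistics, while
(c = 3 OR d = 4) is not covered by any single statistics and gets passed
to clause_selectivity() with (a = 1) and (b = 2) as conditions - there it
falls back to the regular per-column estimates, as no statistics covers
(c,d) in this example.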
The other limitation is that all the expressions have to be
mv-compatible, i.e. there can't be a mix of expressions. If this is
violated, the clause may be passed to the next level (just like with
list of clauses not covered by a single statistics), which splits that
into clauses handled by multivariate stats and clauses handled by
regular statistics.
rework clauselist_selectivity_or to handle OR-clauses correctly
We might invent a completely new set of functions here, resembling
clauselist_selectivity but adapting the ideas to OR-clauses.
But luckily we know that each OR-clause
(a OR b OR c)
may be rewritten as an equivalent AND-clause using negation:
NOT ((NOT a) AND (NOT b) AND (NOT c))
And that's something we can pass to clauselist_selectivity.
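In terms of selectivities this is

    s(a OR b OR c) = 1 - s((NOT a) AND (NOT b) AND (NOT c))

so the OR-clause estimate is simply 1.0 minus the selectivity that
clauselist_selectivity computes for the negated arguments, which is
exactly what the new clauselist_selectivity_or() does.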
---
contrib/file_fdw/file_fdw.c | 3 +-
contrib/postgres_fdw/postgres_fdw.c | 11 +-
src/backend/optimizer/path/clausesel.c | 2030 ++++++++++++++++++++++++++------
src/backend/optimizer/path/costsize.c | 23 +-
src/backend/optimizer/util/orclauses.c | 4 +-
src/backend/utils/adt/selfuncs.c | 17 +-
src/backend/utils/misc/guc.c | 20 +
src/backend/utils/mvstats/README.stats | 166 +++
src/include/optimizer/cost.h | 6 +-
src/include/utils/mvstats.h | 8 +
10 files changed, 1916 insertions(+), 372 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index dc035d7..8f11b7a 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -969,7 +969,8 @@ estimate_size(PlannerInfo *root, RelOptInfo *baserel,
baserel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index aa745f2..d89f9e3 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -500,7 +500,8 @@ postgresGetForeignRelSize(PlannerInfo *root,
fpinfo->local_conds,
baserel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
@@ -2136,7 +2137,8 @@ estimate_path_cost_size(PlannerInfo *root,
local_param_join_conds,
foreignrel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
local_sel *= fpinfo->local_conds_sel;
rows = clamp_row_est(rows * local_sel);
@@ -3661,7 +3663,8 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
fpinfo->local_conds,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
/*
@@ -3680,7 +3683,7 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
*/
fpinfo->joinclause_sel = clauselist_selectivity(root, fpinfo->joinclauses,
0, fpinfo->jointype,
- extra->sjinfo);
+ extra->sjinfo, NIL);
}
fpinfo->server = GetForeignServer(joinrel->serverid);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index fe96a73..14e3444 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -29,6 +29,8 @@
#include "utils/selfuncs.h"
#include "utils/typcache.h"
+#include "miscadmin.h"
+
/*
* Data structure for accumulating info about possible range-query
@@ -44,6 +46,13 @@ typedef struct RangeQueryClause
Selectivity hibound; /* Selectivity of a var < something clause */
} RangeQueryClause;
+static Selectivity clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
+
static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
bool varonleft, bool isLTsel, Selectivity s2);
@@ -60,23 +69,25 @@ static int count_mv_attnums(List *clauses, Index relid, int type);
static int count_varnos(List *clauses, Index *relid);
+static List *clauses_matching_statistic(List **clauses, MVStatisticInfo *statistic,
+ Index relid, int types, bool remove);
+
static List *clauselist_apply_dependencies(PlannerInfo *root, List *clauses,
Index relid, List *stats);
-static MVStatisticInfo *choose_mv_statistics(List *mvstats, Bitmapset *attnums);
-
-static List *clauselist_mv_split(PlannerInfo *root, Index relid,
- List *clauses, List **mvclauses,
- MVStatisticInfo *mvstats, int types);
-
static Selectivity clauselist_mv_selectivity(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats, List *clauses,
+ List *conditions, bool is_or);
static Selectivity clauselist_mv_selectivity_mcvlist(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats,
- bool *fullmatch, Selectivity *lowsel);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or, bool *fullmatch,
+ Selectivity *lowsel);
static Selectivity clauselist_mv_selectivity_histogram(PlannerInfo *root,
- List *clauses, MVStatisticInfo *mvstats);
+ MVStatisticInfo *mvstats,
+ List *clauses, List *conditions,
+ bool is_or);
static int update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
int2vector *stakeys, MCVList mcvlist,
@@ -90,12 +101,33 @@ static int update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
int nmatches, char * matches,
bool is_or);
+/*
+ * Describes a combination of multiple statistics to cover attributes
+ * referenced by the clauses. The array 'stats' (with nstats elements)
+ * lists the statistics (in the order in which they are applied); the
+ * counters track how many clauses and conditions the solution covers.
+ *
+ * choose_mv_statistics_exhaustive() uses this to track both the current
+ * and the best solution, while walking through the space of possible
+ * combinations.
+ */
+typedef struct mv_solution_t {
+ int nclauses; /* number of clauses covered */
+ int nconditions; /* number of conditions covered */
+ int nstats; /* number of stats applied */
+ int *stats; /* stats (in the apply order) */
+} mv_solution_t;
+
+static List *choose_mv_statistics(PlannerInfo *root, Index relid,
+ List *mvstats, List *clauses, List *conditions);
+
static bool has_stats(List *stats, int type);
static List * find_stats(PlannerInfo *root, Index relid);
static bool stats_type_matches(MVStatisticInfo *stat, int type);
+int mvstat_search_type = MVSTAT_SEARCH_GREEDY;
/* used for merging bitmaps - AND (min), OR (max) */
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
@@ -170,14 +202,15 @@ clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 1.0;
RangeQueryClause *rqlist = NULL;
ListCell *l;
/* processing mv stats */
- Oid relid = InvalidOid;
+ Index relid = InvalidOid;
/* list of multivariate stats on the relation */
List *stats = NIL;
@@ -193,12 +226,13 @@ clauselist_selectivity(PlannerInfo *root,
stats = find_stats(root, relid);
/*
- * If there's exactly one clause, then no use in trying to match up pairs,
- * so just go directly to clause_selectivity().
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, or matching multivariate statistics, so just go directly
+ * to clause_selectivity().
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
- varRelid, jointype, sjinfo);
+ varRelid, jointype, sjinfo, conditions);
/*
* Apply functional dependencies, but first check that there are some stats
@@ -230,31 +264,100 @@ clauselist_selectivity(PlannerInfo *root,
(count_mv_attnums(clauses, relid,
MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2))
{
- /* collect attributes from the compatible conditions */
- Bitmapset *mvattnums = collect_mv_attnums(clauses, relid,
- MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+ ListCell *s;
+
+ /*
+ * Copy the conditions we got from the upper part of the expression tree
+ * so that we can add local conditions to it (we need to keep the
+ * original list intact, for sibling expressions - other expressions
+ * at the same level).
+ */
+ List *conditions_local = list_copy(conditions);
+
+ /* find the best combination of statistics */
+ List *solution = choose_mv_statistics(root, relid, stats,
+ clauses, conditions);
- /* and search for the statistic covering the most attributes */
- MVStatisticInfo *mvstat = choose_mv_statistics(stats, mvattnums);
+ /* FIXME we must not scribble over the original list */
+ if (solution)
+ clauses = list_copy(clauses);
- if (mvstat != NULL) /* we have a matching stats */
+ /*
+ * We have a good solution, which is merely a list of statistics that
+ * we need to apply. We'll apply the statistics one by one (in the order
+ * as they appear in the list), and for each statistic we'll
+ *
+ * (1) find clauses compatible with the statistic (and remove them
+ * from the list)
+ *
+ * (2) find local conditions compatible with the statistic
+ *
+ * (3) do the estimation P(clauses | conditions)
+ *
+ * (4) append the estimated clauses to local conditions
+ *
+ * thus continuously growing the list of local conditions.
+ */
+ foreach (s, solution)
{
- /* clauses compatible with multi-variate stats */
- List *mvclauses = NIL;
+ MVStatisticInfo *mvstat = (MVStatisticInfo *)lfirst(s);
- /* split the clauselist into regular and mv-clauses */
- clauses = clauselist_mv_split(root, relid, clauses, &mvclauses,
- mvstat, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
+ /* clauses compatible with the statistic we're applying right now */
+ List *stat_clauses = NIL;
+ List *stat_conditions = NIL;
- /* we've chosen the histogram to match the clauses */
- Assert(mvclauses != NIL);
+ /*
+ * Find clauses and conditions matching the statistic - the clauses
+ * need to be removed from the list, while conditions should remain
+ * there (so that we can apply them repeatedly).
+ */
+ stat_clauses
+ = clauses_matching_statistic(&clauses, mvstat, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST,
+ true);
+
+ stat_conditions
+ = clauses_matching_statistic(&conditions_local, mvstat, relid,
+ MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST,
+ false);
+
+ /*
+ * If we got no clauses to estimate, we've done something wrong,
+ * either during the optimization, when detecting compatible clauses, or
+ * somewhere else.
+ *
+ * Also, we need at least two attributes in clauses and conditions.
+ */
+ Assert(stat_clauses != NIL);
+ Assert(count_mv_attnums(list_union(stat_clauses, stat_conditions),
+ relid, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2);
/* compute the multivariate stats */
- s1 *= clauselist_mv_selectivity(root, mvclauses, mvstat);
+ s1 *= clauselist_mv_selectivity(root, mvstat,
+ stat_clauses, stat_conditions,
+ false); /* AND */
+
+ /*
+ * Add the new clauses to the local conditions, so that we can use
+ * them for the subsequent statistics. We only add the clauses,
+ * because the conditions are already there (or should be).
+ */
+ conditions_local = list_concat(conditions_local, stat_clauses);
}
+
+ /* from now on, work only with the 'local' list of conditions */
+ conditions = conditions_local;
}
/*
+ * If there's exactly one clause, then no use in trying to match up
+ * pairs, so just go directly to clause_selectivity().
+ */
+ if (list_length(clauses) == 1)
+ return s1 * clause_selectivity(root, (Node *) linitial(clauses),
+ varRelid, jointype, sjinfo, conditions);
+
+ /*
* Initial scan over clauses. Anything that doesn't look like a potential
* rangequery clause gets multiplied into s1 and forgotten. Anything that
* does gets inserted into an rqlist entry.
@@ -266,7 +369,8 @@ clauselist_selectivity(PlannerInfo *root,
Selectivity s2;
/* Always compute the selectivity using clause_selectivity */
- s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo);
+ s2 = clause_selectivity(root, clause, varRelid, jointype, sjinfo,
+ conditions);
/*
* Check for being passed a RestrictInfo.
@@ -425,6 +529,55 @@ clauselist_selectivity(PlannerInfo *root,
}
/*
+ * Similar to clauselist_selectivity(), but for OR-clauses. We can't simply use
+ * the same multi-statistic estimation logic as for AND-clauses, at least not
+ * directly, because there are a few key differences:
+ *
+ * - functional dependencies don't really apply to OR-clauses
+ *
+ * - clauselist_selectivity() is based on decomposing the selectivity into
+ * a sequence of conditional probabilities (selectivities), but that can
+ * be done only for AND-clauses
+ *
+ * We might invent a similar infrastructure for optimizing OR-clauses, doing
+ * something similar to what clause_selectivity does for AND-clauses, but
+ * luckily we know that each OR-clause
+ *
+ * (a OR b OR c)
+ *
+ * may be rewritten as an equivalent AND-clause (by De Morgan's laws)
+ * by using negation:
+ *
+ * NOT ((NOT a) AND (NOT b) AND (NOT c))
+ *
+ * And that's something we can pass to clauselist_selectivity and let it do
+ * all the heavy lifting.
+ */
+static Selectivity
+clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
+{
+ List *args = NIL;
+ ListCell *l;
+ Expr *expr;
+
+ /* build arguments for the AND-clause by negating args of the OR-clause */
+ foreach (l, clauses)
+ args = lappend(args, makeBoolExpr(NOT_EXPR, list_make1(lfirst(l)), -1));
+
+ /* and then build the AND-clause over the negated args */
+ expr = makeBoolExpr(AND_EXPR, args, -1);
+
+ /* instead of constructing NOT expression, just do (1.0 - s) */
+ return 1.0 - clauselist_selectivity(root, list_make1(expr), varRelid,
+ jointype, sjinfo, conditions);
+}
+
+/*
* addRangeClause --- add a new range clause for clauselist_selectivity
*
* Here is where we try to match up pairs of range-query clauses
@@ -631,7 +784,8 @@ clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo)
+ SpecialJoinInfo *sjinfo,
+ List *conditions)
{
Selectivity s1 = 0.5; /* default for any unhandled clause type */
RestrictInfo *rinfo = NULL;
@@ -751,7 +905,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) get_notclausearg((Expr *) clause),
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (and_clause(clause))
{
@@ -760,29 +915,18 @@ clause_selectivity(PlannerInfo *root,
((BoolExpr *) clause)->args,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (or_clause(clause))
{
- /*
- * Selectivities for an OR clause are computed as s1+s2 - s1*s2 to
- * account for the probable overlap of selected tuple sets.
- *
- * XXX is this too conservative?
- */
- ListCell *arg;
-
- s1 = 0.0;
- foreach(arg, ((BoolExpr *) clause)->args)
- {
- Selectivity s2 = clause_selectivity(root,
- (Node *) lfirst(arg),
- varRelid,
- jointype,
- sjinfo);
-
- s1 = s1 + s2 - s1 * s2;
- }
+ /* just call clauselist_selectivity_or() */
+ s1 = clauselist_selectivity_or(root,
+ ((BoolExpr *) clause)->args,
+ varRelid,
+ jointype,
+ sjinfo,
+ conditions);
}
else if (is_opclause(clause) || IsA(clause, DistinctExpr))
{
@@ -872,7 +1016,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((RelabelType *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else if (IsA(clause, CoerceToDomain))
{
@@ -881,7 +1026,8 @@ clause_selectivity(PlannerInfo *root,
(Node *) ((CoerceToDomain *) clause)->arg,
varRelid,
jointype,
- sjinfo);
+ sjinfo,
+ conditions);
}
else
{
@@ -945,300 +1091,1395 @@ clause_selectivity(PlannerInfo *root,
* in the MCV list, then the selectivity is below the lowest frequency
* found in the MCV list,
*
- * TODO When applying the clauses to the histogram/MCV list, we can do
- * that from the most selective clauses first, because that'll
- * eliminate the buckets/items sooner (so we'll be able to skip
- * them without inspection, which is more expensive). But this
- * requires really knowing the per-clause selectivities in advance,
- * and that's not what we do now.
+ * TODO When applying the clauses to the histogram/MCV list, we can do that from
+ * the most selective clauses first, because that'll eliminate the
+ * buckets/items sooner (so we'll be able to skip them without inspection,
+ * which is more expensive). But this requires really knowing the
+ * per-clause selectivities in advance, and that's not what we do now.
+ *
*/
static Selectivity
-clauselist_mv_selectivity(PlannerInfo *root, List *clauses, MVStatisticInfo *mvstats)
+clauselist_mv_selectivity(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
bool fullmatch = false;
Selectivity s1 = 0.0, s2 = 0.0;
- /*
- * Lowest frequency in the MCV list (may be used as an upper bound
- * for full equality conditions that did not match any MCV item).
- */
- Selectivity mcv_low = 0.0;
+ /*
+ * Lowest frequency in the MCV list (may be used as an upper bound
+ * for full equality conditions that did not match any MCV item).
+ */
+ Selectivity mcv_low = 0.0;
+
+ /* TODO Evaluate simple 1D selectivities, use the smallest one as
+ * an upper bound, product as lower bound, and sort the
+ * clauses in ascending order by selectivity (to optimize the
+ * MCV/histogram evaluation).
+ */
+
+ /* Evaluate the MCV first. */
+ s1 = clauselist_mv_selectivity_mcvlist(root, mvstats,
+ clauses, conditions, is_or,
+ &fullmatch, &mcv_low);
+
+ /*
+ * If we got a full equality match on the MCV list, we're done (and
+ * the estimate is pretty good).
+ */
+ if (fullmatch && (s1 > 0.0))
+ return s1;
+
+ /* TODO if (fullmatch) without matching MCV item, use the mcv_low
+ * selectivity as upper bound */
+
+ s2 = clauselist_mv_selectivity_histogram(root, mvstats,
+ clauses, conditions, is_or);
+
+ /* TODO clamp to <= 1.0 (or more strictly, when possible) */
+ return s1 + s2;
+}
+
+/*
+ * Pull varattnos from the clauses, similarly to pull_varattnos() but:
+ *
+ * (a) only get attributes for a particular relation (relid)
+ * (b) ignore system attributes (we can't build stats on them anyway)
+ *
+ * This makes it possible to directly compare the result with attnum
+ * values from pg_attribute etc.
+ */
+static Bitmapset *
+get_varattnos(Node * node, Index relid)
+{
+ int k;
+ Bitmapset *varattnos = NULL;
+ Bitmapset *result = NULL;
+
+ /* get the varattnos */
+ pull_varattnos(node, relid, &varattnos);
+
+ k = -1;
+ while ((k = bms_next_member(varattnos, k)) >= 0)
+ {
+ if (k + FirstLowInvalidHeapAttributeNumber > 0)
+ result
+ = bms_add_member(result,
+ k + FirstLowInvalidHeapAttributeNumber);
+ }
+
+ bms_free(varattnos);
+
+ return result;
+}
+
+/*
+ * Collect attributes from mv-compatible clauses.
+ */
+static Bitmapset *
+collect_mv_attnums(List *clauses, Index relid, int types)
+{
+ Bitmapset *attnums = NULL;
+ ListCell *l;
+
+ /*
+ * Walk through the clauses and identify the ones we can estimate
+ * using multivariate stats, and remember the relid/columns. We'll
+ * then cross-check if we have suitable stats, and only if needed
+ * we'll split the clauses into multivariate and regular lists.
+ *
+ * For now we're only interested in RestrictInfo nodes with nested
+ * OpExpr, using either a range or equality.
+ */
+ foreach (l, clauses)
+ {
+ Node *clause = (Node *) lfirst(l);
+
+ /* ignore the result here - we only need the attnums */
+ clause_is_mv_compatible(clause, relid, &attnums, types);
+ }
+
+ /*
+ * If there are not at least two attributes referenced by the clause(s),
+ * we can throw everything out (as we'll revert to simple stats).
+ */
+ if (bms_num_members(attnums) <= 1)
+ {
+ bms_free(attnums);
+ attnums = NULL;
+ }
+
+ return attnums;
+}
+
+/*
+ * Count the number of attributes in clauses compatible with multivariate stats.
+ */
+static int
+count_mv_attnums(List *clauses, Index relid, int type)
+{
+ int c;
+ Bitmapset *attnums = collect_mv_attnums(clauses, relid, type);
+
+ c = bms_num_members(attnums);
+
+ bms_free(attnums);
+
+ return c;
+}
+
+/*
+ * Count varnos referenced in the clauses, and if there's a single varno then
+ * return the index in 'relid'.
+ */
+static int
+count_varnos(List *clauses, Index *relid)
+{
+ int cnt;
+ Bitmapset *varnos = NULL;
+
+ varnos = pull_varnos((Node *) clauses);
+ cnt = bms_num_members(varnos);
+
+ /* if there's a single varno in the clauses, remember it */
+ if (bms_num_members(varnos) == 1)
+ *relid = bms_singleton_member(varnos);
+
+ bms_free(varnos);
+
+ return cnt;
+}
+
+static List *
+clauses_matching_statistic(List **clauses, MVStatisticInfo *statistic,
+ Index relid, int types, bool remove)
+{
+ int i;
+ Bitmapset *stat_attnums = NULL;
+ List *matching_clauses = NIL;
+ ListCell *lc;
+
+ /* build attnum bitmapset for this statistics */
+ for (i = 0; i < statistic->stakeys->dim1; i++)
+ stat_attnums = bms_add_member(stat_attnums,
+ statistic->stakeys->values[i]);
+
+ /*
+ * We can't use foreach here, because we may need to remove some of the
+ * clauses if (remove=true).
+ */
+ lc = list_head(*clauses);
+ while (lc)
+ {
+ Node *clause = (Node*)lfirst(lc);
+ Bitmapset *attnums = NULL;
+
+ /* must advance lc before list_delete possibly pfree's it */
+ lc = lnext(lc);
+
+ /*
+ * skip clauses that are not compatible with stats (just leave them
+ * in the original list)
+ *
+ * XXX Perhaps this should check what stats are actually available in
+ * the statistics (not a big deal now, because MCV and histograms
+ * handle the same types of conditions).
+ */
+ if (! clause_is_mv_compatible(clause, relid, &attnums, types))
+ {
+ bms_free(attnums);
+ continue;
+ }
+
+ /* if the clause is covered by the statistic, add it to the list */
+ if (bms_is_subset(attnums, stat_attnums))
+ {
+ matching_clauses = lappend(matching_clauses, clause);
+
+ /* if remove=true, remove the matching item from the main list */
+ if (remove)
+ *clauses = list_delete_ptr(*clauses, clause);
+ }
+
+ bms_free(attnums);
+ }
+
+ bms_free(stat_attnums);
+
+ return matching_clauses;
+}
+
+/*
+ * Selects the best combination of multivariate statistics, in an exhaustive
+ * way, where 'best' means:
+ *
+ * (a) covering the most attributes (referenced by clauses)
+ * (b) using the least number of multivariate stats
+ * (c) using the most conditions to exploit dependency
+ *
+ * Don't call this directly but through choose_mv_statistics(), which does some
+ * additional tricks to minimize the runtime.
+ *
+ *
+ * Algorithm
+ * ---------
+ * The algorithm is a recursive implementation of backtracking, with maximum
+ * depth equal to the number of multi-variate statistics available on the table.
+ * It actually explores all valid combinations of stats.
+ *
+ * Whenever it considers adding the next statistics, the clauses it matches are
+ * divided into 'conditions' (clauses already matched by at least one previous
+ * statistics) and clauses that are estimated.
+ *
+ * Then several checks are performed:
+ *
+ * (a) The statistics covers at least 2 columns, referenced in the estimated
+ * clauses (otherwise multi-variate stats are useless).
+ *
+ * (b) The statistics covers at least 1 new column, i.e. a column not referenced
+ * by the already used stats (and the new column has to be referenced by
+ * the clauses, of course). Otherwise the statistics would not add any new
+ * information.
+ *
+ * There are some other sanity checks (e.g. stats must not be used twice etc.).
+ *
+ *
+ * Weaknesses
+ * ----------
+ * The current implementation uses a rather simple optimality criterion, so it
+ * may not make the best choice when
+ *
+ * (a) There may be multiple solutions with the same number of covered
+ * attributes and number of statistics (e.g. the same solution but with
+ * statistics in a different order). It's unclear which solution in the best
+ * one - in a sense all of them are equal.
+ *
+ * TODO It might be possible to compute estimate for each of those solutions,
+ * and then combine them to get the final estimate (e.g. by using average
+ * or median).
+ *
+ * (b) Does not consider that some types of stats are a better match for some
+ * types of clauses (e.g. MCV list is generally a better match for equality
+ * conditions than a histogram).
+ *
+ * But maybe this is pointless - generally, each column is either a label
+ * (it's not important whether because of the data type or how it's used),
+ * or a value with ordering that makes sense. So either a MCV list is more
+ * appropriate (labels) or a histogram (values with orderings).
+ *
+ * Not sure what to do with statistics on columns mixing both types of data
+ * (some columns would work best with MCVs, some with histograms). Maybe we
+ * could invent a new type of statistics combining MCV list and histogram
+ * (keeping a small histogram for each MCV item, and a separate histogram
+ * for values not on the MCV list).
+ *
+ * TODO The algorithm should probably count number of Vars (not just attnums)
+ * when computing the 'score' of each solution. Computing the ratio of
+ * (num of all vars) / (num of condition vars) as a measure of how well
+ * the solution uses conditions might be useful.
+ */
+static void
+choose_mv_statistics_exhaustive(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ /* this may run for a long time, so let's make it interruptible */
+ CHECK_FOR_INTERRUPTS();
+
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ /*
+ * Now try to apply each statistics matching at least two attributes,
+ * unless it was already used in one of the previous steps.
+ */
+ for (i = 0; i < nmvstats; i++)
+ {
+ int c;
+
+ int ncovered_clauses = 0; /* number of covered clauses */
+ int ncovered_conditions = 0; /* number of covered conditions */
+ int nattnums = 0; /* number of covered attributes */
+
+ Bitmapset *all_attnums = NULL;
+
+ /* skip statistics that were already used or eliminated */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /*
+ * See if we have clauses covered by this statistics, but not
+ * yet covered by any of the preceding ones.
+ */
+ for (c = 0; c < nclauses; c++)
+ {
+ bool covered = false;
+ Bitmapset *clause_attnums = clauses_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this clause is not covered by this statistics, we can't
+ * use the statistics to estimate it at all.
+ */
+ if (! cover_map[i * nclauses + c])
+ continue;
+
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add the
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
+
+ /* let's see if it's covered by any of the previous stats */
+ for (j = 0; j < step; j++)
+ {
+ /* already covered by the previous stats */
+ if (cover_map[current->stats[j] * nclauses + c])
+ covered = true;
+
+ if (covered)
+ break;
+ }
+
+ /* if already covered, continue with the next clause */
+ if (covered)
+ {
+ ncovered_conditions += 1;
+ continue;
+ }
+
+ /*
+ * OK, this clause is covered by this statistics (and not by
+ * any of the previous ones)
+ */
+ ncovered_clauses += 1;
+ }
+
+ /* can't have more new clauses than original clauses */
+ Assert(nclauses >= ncovered_clauses);
+ Assert(ncovered_clauses >= 0); /* mostly paranoia */
+
+ nattnums = bms_num_members(all_attnums);
+
+ /* free all the bitmapsets - we don't need them anymore */
+ bms_free(all_attnums);
+
+ all_attnums = NULL;
+
+ /*
+ * Now do the same for conditions - count those covered by this
+ * statistics, and collect attnums from all of them.
+ */
+ for (c = 0; c < nconditions; c++)
+ {
+ Bitmapset *clause_attnums = conditions_attnums[c];
+ Bitmapset *tmp = NULL;
+
+ /*
+ * If this condition is not covered by this statistics, we
+ * can't use the statistics to evaluate it at all.
+ */
+ if (! condition_map[i * nconditions + c])
+ continue;
+
+ /* count this as a condition */
+ ncovered_conditions += 1;
+
+ /*
+ * Now we know we'll use this clause - either as a condition
+ * or as a new clause (the estimated one). So let's add the
+ * attributes to the attnums from all the clauses usable with
+ * this statistics.
+ */
+ tmp = bms_union(all_attnums, clause_attnums);
+
+ /* free the old bitmap */
+ bms_free(all_attnums);
+ all_attnums = tmp;
+ }
+
+ /*
+ * Let's mark the statistics as 'ruled out' - either we'll use
+ * it (and proceed to the next step), or it's incompatible.
+ */
+ ruled_out[i] = step;
+
+ /*
+ * Skip this statistics if there are no usable clauses (i.e. if all
+ * the matching clauses are already covered by some of the previous
+ * stats).
+ *
+ * Similarly, if the usable clauses reference only a single
+ * attribute, multivariate statistics are pointless.
+ */
+ if ((ncovered_clauses == 0) || (nattnums < 2))
+ continue;
+
+ /*
+ * TODO Not sure if it's possible to add a clause referencing only
+ * attributes already covered by the previous stats (i.e. a clause
+ * introducing only a new dependency, not a new attribute).
+ * Couldn't come up with an example, though. Might be worth
+ * adding an assert.
+ */
+
+ /*
+ * got a suitable statistics - let's update the current solution,
+ * maybe use it as the best solution
+ */
+ current->nclauses += ncovered_clauses;
+ current->nconditions += ncovered_conditions;
+ current->nstats += 1;
+ current->stats[step] = i;
+
+ /*
+ * We can never cover more clauses, or use more stats than we
+ * actually have at the beginning.
+ */
+ Assert(nclauses >= current->nclauses);
+ Assert(nmvstats >= current->nstats);
+ Assert(step < nmvstats);
+
+ if (*best == NULL)
+ {
+ *best = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ (*best)->nstats = 0;
+ (*best)->nclauses = 0;
+ (*best)->nconditions = 0;
+ }
+
+ /*
+ * See if it's better than the current 'best' solution - more covered
+ * clauses win, and on a tie we prefer fewer statistics (goal (b) in
+ * the header comment, matching the greedy search).
+ */
+ if ((current->nclauses > (*best)->nclauses) ||
+ ((current->nclauses == (*best)->nclauses) &&
+ ((current->nstats < (*best)->nstats))))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+
+ /*
+ * Recurse to the next step only if there are still unused
+ * statistics left to add.
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_exhaustive(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= ncovered_clauses;
+ current->nconditions -= ncovered_conditions;
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[i] = -1;
+
+ Assert(current->nclauses >= 0);
+ Assert(current->nstats >= 0);
+ }
+
+ /* reset all statistics as 'incompatible' in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+}
+
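+/*
+ * A worked example (hypothetical, for illustration): with statistics
+ * S1 on (a,b) and S2 on (b,c), and clauses (a=1 AND b=1) and
+ * (b=1 AND c=1), the backtracking explores [S1], [S1,S2], [S2] and
+ * [S2,S1], and keeps [S1,S2] - the first solution covering both
+ * clauses with two statistics.
+ */
+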
+/*
+ * Greedy search for a multivariate solution - a sequence of statistics covering
+ * the clauses. This chooses the "best" statistics at each step, so the
+ * resulting solution may not be the best solution globally, but this produces
+ * the solution in only N steps (where N is the number of statistics), while
+ * the exhaustive approach may have to walk through ~N! combinations (although
+ * some of those are terminated early).
+ *
+ * See the comments at choose_mv_statistics_exhaustive() as this does the same
+ * thing (but in a different way).
+ *
+ * Don't call this directly, but through choose_mv_statistics().
+ *
+ * TODO There are probably other metrics we might use - e.g. using number of
+ * columns (num_cond_columns / num_cov_columns), which might work better
+ * with a mix of simple and complex clauses.
+ *
+ * TODO Also the choice at the very first step should be handled in a special
+ * way, because there will be 0 conditions at that moment, so there needs
+ * to be some other criterion - e.g. using the simplest (or most complex?)
+ * clause might be a good idea.
+ *
+ * TODO We might also select multiple stats using different criteria, and branch
+ * the search. This is however tricky, because if we choose k statistics at
+ * each step, we get k^N branches to walk through (with N steps). That's
+ not really good with a large number of stats (though still better
+ than an exhaustive search).
+ */
+static void
+choose_mv_statistics_greedy(PlannerInfo *root, int step,
+ int nmvstats, MVStatisticInfo *mvstats, Bitmapset ** stats_attnums,
+ int nclauses, Node ** clauses, Bitmapset ** clauses_attnums,
+ int nconditions, Node ** conditions, Bitmapset ** conditions_attnums,
+ bool *cover_map, bool *condition_map, int *ruled_out,
+ mv_solution_t *current, mv_solution_t **best)
+{
+ int i, j;
+ int best_stat = -1;
+ double gain, max_gain = -1.0;
+
+ /*
+ * Bitmap tracking which clauses are already covered (by the previous
+ * statistics) and may thus serve only as a condition in this step.
+ */
+ bool *covered_clauses = (bool*)palloc0(nclauses * sizeof(bool));
+
+ /*
+ * Number of clauses and columns covered by each statistics - this
+ * includes both conditions and clauses covered by the statistics for
+ * the first time. The number of columns may count some columns
+ * repeatedly - if a column is shared by multiple clauses, it will
+ * be counted once for each clause (covered by the statistics).
+ * So with two clauses [(a=1 OR b=2),(a<2 OR c>1)] the column "a"
+ * will be counted twice (if both clauses are covered).
+ *
+ * The values for ruled-out statistics (that can't be applied) are
+ * not computed, because that'd be pointless.
+ */
+ int *num_cov_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cov_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Same as above, but this only includes clauses that are already
+ * covered by the previous stats (and the current one).
+ */
+ int *num_cond_clauses = (int*)palloc0(sizeof(int) * nmvstats);
+ int *num_cond_columns = (int*)palloc0(sizeof(int) * nmvstats);
+
+ /*
+ * Number of attributes for each clause.
+ *
+ * TODO Might be computed in choose_mv_statistics() and then passed
+ * here, but then the function would not have the same signature
+ * as _exhaustive().
+ */
+ int *attnum_counts = (int*)palloc0(sizeof(int) * nclauses);
+ int *attnum_cond_counts = (int*)palloc0(sizeof(int) * nconditions);
+
+ CHECK_FOR_INTERRUPTS();
+
+ Assert(best != NULL);
+ Assert((step == 0 && current == NULL) || (step > 0 && current != NULL));
+
+ /* compute attributes (columns) for each clause */
+ for (i = 0; i < nclauses; i++)
+ attnum_counts[i] = bms_num_members(clauses_attnums[i]);
+
+ /* compute attributes (columns) for each condition */
+ for (i = 0; i < nconditions; i++)
+ attnum_cond_counts[i] = bms_num_members(conditions_attnums[i]);
+
+ /* see which clauses are already covered at this point (by previous stats) */
+ for (i = 0; i < step; i++)
+ for (j = 0; j < nclauses; j++)
+ covered_clauses[j] |= (cover_map[current->stats[i] * nclauses + j]);
+
+ /* which remaining statistics covers most clauses / uses most conditions? */
+ for (i = 0; i < nmvstats; i++)
+ {
+ Bitmapset *attnums_covered = NULL;
+ Bitmapset *attnums_conditions = NULL;
+
+ /* skip stats that are already ruled out (either used or inapplicable) */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* count covered clauses and conditions (for the statistics) */
+ for (j = 0; j < nclauses; j++)
+ {
+ if (cover_map[i * nclauses + j])
+ {
+ Bitmapset *attnums_new
+ = bms_union(attnums_covered, clauses_attnums[j]);
+
+ /* get rid of the old bitmap and keep the unified result */
+ bms_free(attnums_covered);
+ attnums_covered = attnums_new;
+
+ num_cov_clauses[i] += 1;
+ num_cov_columns[i] += attnum_counts[j];
+
+ /* is the clause already covered (i.e. a condition)? */
+ if (covered_clauses[j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_counts[j];
+ attnums_new = bms_union(attnums_conditions,
+ clauses_attnums[j]);
+
+ bms_free(attnums_conditions);
+ attnums_conditions = attnums_new;
+ }
+ }
+ }
+
+ /* if all covered clauses are covered by prev stats (thus conditions) */
+ if (num_cov_clauses[i] == num_cond_clauses[i])
+ ruled_out[i] = step;
+
+ /* same if there are no new attributes */
+ else if (bms_num_members(attnums_conditions) == bms_num_members(attnums_covered))
+ ruled_out[i] = step;
+
+ bms_free(attnums_covered);
+ bms_free(attnums_conditions);
+
+ /* if the statistics is inapplicable, try the next one */
+ if (ruled_out[i] != -1)
+ continue;
+
+ /* now let's walk through conditions and count the covered */
+ for (j = 0; j < nconditions; j++)
+ {
+ if (condition_map[i * nconditions + j])
+ {
+ num_cond_clauses[i] += 1;
+ num_cond_columns[i] += attnum_cond_counts[j];
+ }
+ }
+
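+ /*
+ * Example (illustrative numbers): a statistics whose covered clauses
+ * reference 4 columns in total, 3 of them from clauses that are
+ * already covered (i.e. conditions), gets gain 3/4 = 0.75 and is
+ * preferred over a statistics covering only new clauses (gain 0.0).
+ */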
+ /* otherwise see if this statistics improves the gain metric */
+ gain = num_cond_columns[i] / (double)num_cov_columns[i];
+
+ if (gain > max_gain)
+ {
+ max_gain = gain;
+ best_stat = i;
+ }
+ }
+
+ /*
+ * Have we found a suitable statistics? Add it to the solution and
+ * try next step.
+ */
+ if (best_stat != -1)
+ {
+ /* mark the statistics, so that we skip it in next steps */
+ ruled_out[best_stat] = step;
+
+ /* allocate current solution if necessary */
+ if (current == NULL)
+ {
+ current = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ current->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ current->nstats = 0;
+ current->nclauses = 0;
+ current->nconditions = 0;
+ }
+
+ current->nclauses += num_cov_clauses[best_stat];
+ current->nconditions += num_cond_clauses[best_stat];
+ current->stats[step] = best_stat;
+ current->nstats++;
+
+ if (*best == NULL)
+ {
+ (*best) = (mv_solution_t*)palloc0(sizeof(mv_solution_t));
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+
+ (*best)->stats = (int*)palloc0(sizeof(int)*nmvstats);
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ else
+ {
+ /* see if this is a better solution */
+ double current_gain = (double)current->nconditions / current->nclauses;
+ double best_gain = (double)(*best)->nconditions / (*best)->nclauses;
+
+ if ((current_gain > best_gain) ||
+ ((current_gain == best_gain) && (current->nstats < (*best)->nstats)))
+ {
+ (*best)->nstats = current->nstats;
+ (*best)->nclauses = current->nclauses;
+ (*best)->nconditions = current->nconditions;
+ memcpy((*best)->stats, current->stats, nmvstats * sizeof(int));
+ }
+ }
+
+ /*
+ * Recurse to the next step only if there are still unused
+ * statistics left to add.
+ */
+ if ((step + 1) < nmvstats)
+ choose_mv_statistics_greedy(root, step+1,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses, clauses_attnums,
+ nconditions, conditions, conditions_attnums,
+ cover_map, condition_map, ruled_out,
+ current, best);
+
+ /* reset the last step */
+ current->nclauses -= num_cov_clauses[best_stat];
+ current->nconditions -= num_cond_clauses[best_stat];
+ current->nstats -= 1;
+ current->stats[step] = 0;
+
+ /* mark the statistics as usable again */
+ ruled_out[best_stat] = -1;
+ }
+
+ /* reset all statistics eliminated in this step */
+ for (i = 0; i < nmvstats; i++)
+ if (ruled_out[i] == step)
+ ruled_out[i] = -1;
+
+ /* free everything allocated in this step */
+ pfree(covered_clauses);
+ pfree(attnum_counts);
+ pfree(attnum_cond_counts);
+ pfree(num_cov_clauses);
+ pfree(num_cov_columns);
+ pfree(num_cond_clauses);
+ pfree(num_cond_columns);
+}
+
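+/*
+ * Note on complexity: the greedy recursion adds exactly one statistics per
+ * step, so with N statistics and C clauses it does at most N steps, each
+ * scanning the remaining statistics and clauses - roughly O(N^2 * C) work
+ * in total, instead of the ~N! combinations of the exhaustive search.
+ */
+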
+/*
+ * Remove clauses not covered by any of the available statistics
+ *
+ * This helps us to reduce the amount of work done in choose_mv_statistics()
+ * by not having to deal with clauses that can't possibly be useful.
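+ *
+ * For example (hypothetical): with a single statistics on (a,b,c), a
+ * clause (a = 1 AND c = 2) is kept (all its attributes are covered),
+ * while (a = 1 AND d = 2) is filtered out, as no statistics covers "d".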
+ */
+static List *
+filter_clauses(PlannerInfo *root, Index relid, int type,
+ List *stats, List *clauses, Bitmapset **attnums)
+{
+ ListCell *c;
+ ListCell *s;
+
+ /* results (list of compatible clauses, attnums) */
+ List *rclauses = NIL;
+
+ foreach (c, clauses)
+ {
+ Node *clause = (Node*)lfirst(c);
+ Bitmapset *clause_attnums = NULL;
+
+ /*
+ * We do assume that thanks to previous checks, we should not run into
+ * clauses that are incompatible with multivariate stats here. We also
+ * need to collect the attnums for the clause.
+ *
+ * XXX Maybe turn this into an assert?
+ */
+ if (! clause_is_mv_compatible(clause, relid, &clause_attnums, type))
+ elog(ERROR, "should not get non-mv-compatible clause");
+
+ /* Is there a multivariate statistics covering the clause? */
+ foreach (s, stats)
+ {
+ int k, matches = 0;
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ /* skip statistics not matching the required type */
+ if (! stats_type_matches(stat, type))
+ continue;
+
+ /*
+ * see if all clause attributes are covered by the statistic
+ *
+ * We'll do that in the opposite direction, i.e. we'll see how many
+ * attributes of the statistic are referenced in the clause, and then
+ * compare the counts.
+ */
+ for (k = 0; k < stat->stakeys->dim1; k++)
+ if (bms_is_member(stat->stakeys->values[k], clause_attnums))
+ matches += 1;
+
+ /*
+ * If the number of matches is equal to attributes referenced by the
+ * clause, then the clause is covered by the statistic.
+ */
+ if (bms_num_members(clause_attnums) == matches)
+ {
+ *attnums = bms_union(*attnums, clause_attnums);
+ rclauses = lappend(rclauses, clause);
+ break;
+ }
+ }
+
+ bms_free(clause_attnums);
+ }
+
+ /* we can't have more compatible clauses than source clauses */
+ Assert(list_length(clauses) >= list_length(rclauses));
+
+ return rclauses;
+}
+
+/*
+ * Remove statistics not covering any new clauses
+ *
+ * Statistics not covering any new clauses (conditions don't count) are not
+ * really useful, so let's ignore them. Also, we need the statistics to
+ * reference at least two different attributes (both in conditions and clauses
+ * combined), and at least one of them in the clauses alone.
+ *
+ * This check might be made more strict by checking against individual clauses,
+ * because by using the bitmapsets of all attnums we may actually use attnums
+ * from clauses that are not covered by the statistics. For example, we may
+ * have a condition
+ *
+ * (a=1 AND b=2)
+ *
+ * and a new clause
+ *
+ * (c=1 AND d=1)
+ *
+ * With only bitmapsets, statistics on [b,c] will pass through this (assuming
+ * there are some statistics covering both clauses).
+ *
+ * Parameters:
+ *
+ * stats - list of statistics to filter
+ * new_attnums - attnums referenced in new clauses
+ * all_attnums - attnums referenced by conditions and new clauses combined
+ *
+ * Returns filtered list of statistics.
+ *
+ * TODO Do the more strict check, i.e. walk through individual clauses and
+ * conditions and only use those covered by the statistics.
+ */
+static List *
+filter_stats(List *stats, Bitmapset *new_attnums, Bitmapset *all_attnums)
+{
+ ListCell *s;
+ List *stats_filtered = NIL;
+
+ foreach (s, stats)
+ {
+ int k;
+ int matches_new = 0,
+ matches_all = 0;
+
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(s);
+
+ /* see how many attributes the statistics covers */
+ for (k = 0; k < stat->stakeys->dim1; k++)
+ {
+ /* attributes from new clauses */
+ if (bms_is_member(stat->stakeys->values[k], new_attnums))
+ matches_new += 1;
+
+ /* attributes from conditions and new clauses combined */
+ if (bms_is_member(stat->stakeys->values[k], all_attnums))
+ matches_all += 1;
+ }
+
+ /* check we have enough attributes for this statistics */
+ if ((matches_new >= 1) && (matches_all >= 2))
+ stats_filtered = lappend(stats_filtered, stat);
+ }
+
+ /* we can't have more useful stats than we had originally */
+ Assert(list_length(stats) >= list_length(stats_filtered));
+
+ return stats_filtered;
+}
+
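+/*
+ * Build an array of statistics from a list (returning its size through
+ * nmvstats) - an array is easier to work with when marking entries as
+ * redundant or ruled out.
+ */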
+static MVStatisticInfo *
+make_stats_array(List *stats, int *nmvstats)
+{
+ int i;
+ ListCell *l;
+
+ MVStatisticInfo *mvstats = NULL;
+ *nmvstats = list_length(stats);
- /* TODO Evaluate simple 1D selectivities, use the smallest one as
- * an upper bound, product as lower bound, and sort the
- * clauses in ascending order by selectivity (to optimize the
- * MCV/histogram evaluation).
- */
+ mvstats
+ = (MVStatisticInfo*)palloc0((*nmvstats) * sizeof(MVStatisticInfo));
- /* Evaluate the MCV first. */
- s1 = clauselist_mv_selectivity_mcvlist(root, clauses, mvstats,
- &fullmatch, &mcv_low);
+ i = 0;
+ foreach (l, stats)
+ {
+ MVStatisticInfo *stat = (MVStatisticInfo *)lfirst(l);
+ memcpy(&mvstats[i++], stat, sizeof(MVStatisticInfo));
+ }
- /*
- * If we got a full equality match on the MCV list, we're done (and
- * the estimate is pretty good).
- */
- if (fullmatch && (s1 > 0.0))
- return s1;
+ return mvstats;
+}
- /* TODO if (fullmatch) without matching MCV item, use the mcv_low
- * selectivity as upper bound */
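+/*
+ * Build an array of attnum bitmapsets, one per statistics, so that the
+ * coverage checks can use cheap bms_is_subset() calls.
+ */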
+static Bitmapset **
+make_stats_attnums(MVStatisticInfo *mvstats, int nmvstats)
+{
+ int i, j;
+ Bitmapset **stats_attnums = NULL;
- s2 = clauselist_mv_selectivity_histogram(root, clauses, mvstats);
+ Assert(nmvstats > 0);
- /* TODO clamp to <= 1.0 (or more strictly, when possible) */
- return s1 + s2;
+ /* build bitmaps of attnums for the stats (easier to compare) */
+ stats_attnums = (Bitmapset **)palloc0(nmvstats * sizeof(Bitmapset*));
+
+ for (i = 0; i < nmvstats; i++)
+ for (j = 0; j < mvstats[i].stakeys->dim1; j++)
+ stats_attnums[i]
+ = bms_add_member(stats_attnums[i],
+ mvstats[i].stakeys->values[j]);
+
+ return stats_attnums;
}
+
/*
- * Collect attributes from mv-compatible clauses.
+ * Remove redundant statistics
+ *
+ * If there are multiple statistics covering the same set of columns (counting
+ * only those referenced by clauses and conditions), we can apply one of those
+ * anyway and further reduce the size of the optimization problem.
+ *
+ * Thus when redundant stats are detected, we keep the smaller one (the one with
+ * fewer columns), based on the assumption that it's more accurate and also
+ * faster to process. That may be untrue for two reasons - first, the accuracy
+ * really depends on number of buckets/MCV items, not the number of columns.
+ * Second, some types of statistics may work better for certain types of clauses
+ * (e.g. MCV lists for equality conditions) etc.
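+ *
+ * Example (hypothetical): with clauses referencing only columns (a,b),
+ * statistics on (a,b) and on (a,b,c) intersect the referenced columns
+ * in the same set {a,b}, so one of them is redundant - we keep the
+ * smaller one, (a,b).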
*/
-static Bitmapset *
-collect_mv_attnums(List *clauses, Index relid, int types)
+static List*
+filter_redundant_stats(List *stats, List *clauses, List *conditions)
{
- Bitmapset *attnums = NULL;
- ListCell *l;
+ int i, j, nmvstats;
+
+ MVStatisticInfo *mvstats;
+ bool *redundant;
+ Bitmapset **stats_attnums;
+ Bitmapset *varattnos;
+ Index relid;
+
+ Assert(list_length(stats) > 0);
+ Assert(list_length(clauses) > 0);
+
+ /*
+ * We'll convert the list of statistics into an array now, because
+ * the reduction of redundant statistics is easier to do that way
+ * (we can mark previous stats as redundant, etc.).
+ */
+ mvstats = make_stats_array(stats, &nmvstats);
+ stats_attnums = make_stats_attnums(mvstats, nmvstats);
+
+ /* by default, none of the stats is redundant (so palloc0) */
+ redundant = palloc0(nmvstats * sizeof(bool));
+
+ /*
+ * We only expect a single relid here, and also we should get the
+ * same relid from clauses and conditions (but we get it from
+ * clauses, because those are certainly non-empty).
+ */
+ relid = bms_singleton_member(pull_varnos((Node*)clauses));
/*
- * Walk through the clauses and identify the ones we can estimate using
- * multivariate stats, and remember the relid/columns. We'll then
- * cross-check if we have suitable stats, and only if needed we'll split
- * the clauses into multivariate and regular lists.
+ * Get the varattnos from both conditions and clauses.
+ *
+ * This skips system attributes, although that should be impossible
+ * thanks to previous filtering out of incompatible clauses.
*
- * For now we're only interested in RestrictInfo nodes with nested OpExpr,
- * using either a range or equality.
+ * XXX Is that really true?
*/
- foreach (l, clauses)
+ varattnos = bms_union(get_varattnos((Node*)clauses, relid),
+ get_varattnos((Node*)conditions, relid));
+
+ for (i = 1; i < nmvstats; i++)
{
- Node *clause = (Node *) lfirst(l);
+ /* intersect with current statistics */
+ Bitmapset *curr = bms_intersect(stats_attnums[i], varattnos);
- /* ignore the result here - we only need the attnums */
- clause_is_mv_compatible(clause, relid, &attnums, types);
+ /* walk through 'previous' stats and check redundancy */
+ for (j = 0; j < i; j++)
+ {
+ /* intersect with current statistics */
+ Bitmapset *prev;
+
+ /* skip stats already identified as redundant */
+ if (redundant[j])
+ continue;
+
+ prev = bms_intersect(stats_attnums[j], varattnos);
+
+ switch (bms_subset_compare(curr, prev))
+ {
+ case BMS_EQUAL:
+ /*
+ * Use the smaller one (hopefully more accurate).
+ * If both have the same size, use the first one.
+ */
+ if (mvstats[i].stakeys->dim1 >= mvstats[j].stakeys->dim1)
+ redundant[i] = TRUE;
+ else
+ redundant[j] = TRUE;
+
+ break;
+
+ case BMS_SUBSET1: /* curr is subset of prev */
+ redundant[i] = TRUE;
+ break;
+
+ case BMS_SUBSET2: /* prev is subset of curr */
+ redundant[j] = TRUE;
+ break;
+
+ case BMS_DIFFERENT:
+ /* do nothing - keep both stats */
+ break;
+ }
+
+ bms_free(prev);
+ }
+
+ bms_free(curr);
}
- /*
- * If there are not at least two attributes referenced by the clause(s),
- * we can throw everything out (as we'll revert to simple stats).
- */
- if (bms_num_members(attnums) <= 1)
+ /* can't reduce all statistics (at least one has to remain) */
+ Assert(nmvstats > 0);
+
+ /* now, let's remove the redundant statistics and rebuild the list */
+ list_free(stats);
+ stats = NIL;
+
+ for (i = 0; i < nmvstats; i++)
{
- if (attnums != NULL)
- pfree(attnums);
- attnums = NULL;
+ MVStatisticInfo *info;
+
+ pfree(stats_attnums[i]);
+
+ if (redundant[i])
+ continue;
+
+ info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[i], sizeof(MVStatisticInfo));
+
+ stats = lappend(stats, info);
}
- return attnums;
+ pfree(mvstats);
+ pfree(stats_attnums);
+ pfree(redundant);
+
+ return stats;
}
-/*
- * Count the number of attributes in clauses compatible with multivariate stats.
- */
-static int
-count_mv_attnums(List *clauses, Index relid, int type)
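+/*
+ * Flatten a list of clauses into a plain array (returning the number of
+ * clauses through nclauses).
+ */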
+static Node**
+make_clauses_array(List *clauses, int *nclauses)
{
- int c;
- Bitmapset *attnums = collect_mv_attnums(clauses, relid, type);
+ int i;
+ ListCell *l;
- c = bms_num_members(attnums);
+ Node** clauses_array;
- bms_free(attnums);
+ *nclauses = list_length(clauses);
+ clauses_array = (Node **)palloc0((*nclauses) * sizeof(Node *));
- return c;
+ i = 0;
+ foreach (l, clauses)
+ clauses_array[i++] = (Node *)lfirst(l);
+
+ *nclauses = i;
+
+ return clauses_array;
}
-/*
- * Count varnos referenced in the clauses, and if there's a single varno then
- * return the index in 'relid'.
- */
-static int
-count_varnos(List *clauses, Index *relid)
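+/*
+ * Build an array of attnum bitmapsets, one per clause (all the clauses
+ * are expected to be mv-compatible at this point).
+ */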
+static Bitmapset **
+make_clauses_attnums(PlannerInfo *root, Index relid,
+ int type, Node **clauses, int nclauses)
{
- int cnt;
- Bitmapset *varnos = NULL;
+ int i;
+ Bitmapset **clauses_attnums
+ = (Bitmapset **)palloc0(nclauses * sizeof(Bitmapset *));
- varnos = pull_varnos((Node *) clauses);
- cnt = bms_num_members(varnos);
+ for (i = 0; i < nclauses; i++)
+ {
+ Bitmapset * attnums = NULL;
- /* if there's a single varno in the clauses, remember it */
- if (bms_num_members(varnos) == 1)
- *relid = bms_singleton_member(varnos);
+ if (! clause_is_mv_compatible(clauses[i], relid, &attnums, type))
+ elog(ERROR, "should not get non-mv-compatible clause");
- bms_free(varnos);
+ clauses_attnums[i] = attnums;
+ }
- return cnt;
+ return clauses_attnums;
+}
+
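+/*
+ * Build a 2D boolean map (nmvstats x nclauses, row-major) where the
+ * entry [i,j] says whether all attributes of clause j are covered by
+ * statistics i.
+ */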
+static bool*
+make_cover_map(Bitmapset **stats_attnums, int nmvstats,
+ Bitmapset **clauses_attnums, int nclauses)
+{
+ int i, j;
+ bool *cover_map = (bool*)palloc0(nclauses * nmvstats * sizeof(bool));
+
+ for (i = 0; i < nmvstats; i++)
+ for (j = 0; j < nclauses; j++)
+ cover_map[i * nclauses + j]
+ = bms_is_subset(clauses_attnums[j], stats_attnums[i]);
+
+ return cover_map;
}
/*
- * We're looking for statistics matching at least 2 attributes, referenced in
- * clauses compatible with multivariate statistics. The current selection
- * criteria is very simple - we choose the statistics referencing the most
- * attributes.
- *
- * If there are multiple statistics referencing the same number of columns
- * (from the clauses), the one with less source columns (as listed in the
- * ADD STATISTICS when creating the statistics) wins. Else the first one wins.
- *
- * This is a very simple criteria, and has several weaknesses:
- *
- * (a) does not consider the accuracy of the statistics
+ * Choose the combination of statistics optimal for estimation of a particular
+ * clause list.
*
- * If there are two histograms built on the same set of columns, but one
- * has 100 buckets and the other one has 1000 buckets (thus likely
- * providing better estimates), this is not currently considered.
+ * This only handles the 'preparation' phase shared by the exhaustive and greedy
+ * implementations (see the functions above), mostly trying to reduce the size
+ * of the problem (eliminating clauses/statistics that can't possibly be used in
+ * the solution).
*
- * (b) does not consider the type of statistics
+ * It also precomputes bitmaps for attributes covered by clauses and statistics,
+ * so that we don't need to do that over and over in the actual optimizations
+ * (as it's both CPU and memory intensive).
*
- * If there are three statistics - one containing just a MCV list, another
- * one with just a histogram and a third one with both, we treat them equally.
*
- * (c) does not consider the number of clauses
+ * TODO Another way to make the optimization problems smaller might be splitting
+ * the statistics into several disjoint subsets, i.e. if we can split the
+ * graph of statistics (after the elimination) into multiple components
+ * (so that stats in different components share no attributes), we can do
+ * the optimization for each component separately.
*
- * As explained, only the number of referenced attributes counts, so if
- * there are multiple clauses on a single attribute, this still counts as
- * a single attribute.
- *
- * (d) does not consider type of condition
- *
- * Some clauses may work better with some statistics - for example equality
- * clauses probably work better with MCV lists than with histograms. But
- * IS [NOT] NULL conditions may often work better with histograms (thanks
- * to NULL-buckets).
- *
- * So for example with five WHERE conditions
- *
- * WHERE (a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1)
- *
- * and statistics on (a,b), (a,b,e) and (a,b,c,d), the last one will be selected
- * as it references the most columns.
- *
- * Once we have selected the multivariate statistics, we split the list of
- * clauses into two parts - conditions that are compatible with the selected
- * stats, and conditions are estimated using simple statistics.
- *
- * From the example above, conditions
- *
- * (a = 1) AND (b = 1) AND (c = 1) AND (d = 1)
- *
- * will be estimated using the multivariate statistics (a,b,c,d) while the last
- * condition (e = 1) will get estimated using the regular ones.
- *
- * There are various alternative selection criteria (e.g. counting conditions
- * instead of just referenced attributes), but eventually the best option should
- * be to combine multiple statistics. But that's much harder to do correctly.
- *
- * TODO Select multiple statistics and combine them when computing the estimate.
- *
- * TODO This will probably have to consider compatibility of clauses, because
- * 'dependencies' will probably work only with equality clauses.
+ * TODO If we could compute what is a "perfect solution" maybe we could
+ * terminate the search after reaching ~90% of it? Say, if we knew that we
+ * can cover 10 clauses and reuse 8 dependencies, maybe covering 9 clauses
+ * and 7 dependencies would be OK?
*/
-static MVStatisticInfo *
-choose_mv_statistics(List *stats, Bitmapset *attnums)
+static List*
+choose_mv_statistics(PlannerInfo *root, Index relid, List *stats,
+ List *clauses, List *conditions)
{
int i;
- ListCell *lc;
+ mv_solution_t *best = NULL;
+ List *result = NIL;
+
+ int nmvstats;
+ MVStatisticInfo *mvstats;
- MVStatisticInfo *choice = NULL;
+ /* we only work with MCV lists and histograms here */
+ int type = (MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST);
- int current_matches = 1; /* goal #1: maximize */
- int current_dims = (MVSTATS_MAX_DIMENSIONS+1); /* goal #2: minimize */
+ bool *clause_cover_map = NULL,
+ *condition_cover_map = NULL;
+ int *ruled_out = NULL;
+
+ /* build bitmapsets for all stats and clauses */
+ Bitmapset **stats_attnums;
+ Bitmapset **clauses_attnums;
+ Bitmapset **conditions_attnums;
+
+ int nclauses, nconditions;
+ Node ** clauses_array;
+ Node ** conditions_array;
+
+ /* copy lists, so that we can free them during elimination easily */
+ clauses = list_copy(clauses);
+ conditions = list_copy(conditions);
+ stats = list_copy(stats);
/*
- * Walk through the statistics (simple array with nmvstats elements) and for
- * each one count the referenced attributes (encoded in the 'attnums' bitmap).
+ * Reduce the optimization problem size as much as possible.
+ *
+ * Eliminate clauses and conditions not covered by any statistics,
+ * or statistics not matching at least two attributes (one of them
+ * has to be in a regular clause).
+ *
+ * It's possible that removing a statistics in one iteration
+ * eliminates a clause in the next one, so we repeat this until an
+ * iteration eliminates no clauses or stats.
+ *
+ * This can only happen after eliminating a statistics - clauses are
+ * eliminated first, so statistics always reflect that.
*/
- foreach (lc, stats)
+ while (true)
{
- MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+ List *tmp;
- /* columns matching this statistics */
- int matches = 0;
+ Bitmapset *compatible_attnums = NULL;
+ Bitmapset *condition_attnums = NULL;
+ Bitmapset *all_attnums = NULL;
- int2vector * attrs = info->stakeys;
- int numattrs = attrs->dim1;
-
- /* skip dependencies-only stats */
- if (! (info->mcv_built || info->hist_built))
- continue;
+ /*
+ * Clauses
+ *
+ * Walk through clauses and keep only those covered by at least
+ * one of the statistics we still have. We'll also keep info
+ * about attnums in clauses (without conditions) so that we can
+ * ignore stats covering just conditions (which is pointless).
+ */
+ tmp = filter_clauses(root, relid, type,
+ stats, clauses, &compatible_attnums);
- /* count columns covered by the histogram */
- for (i = 0; i < numattrs; i++)
- if (bms_is_member(attrs->values[i], attnums))
- matches++;
+ /* discard the original list */
+ list_free(clauses);
+ clauses = tmp;
/*
- * Use this statistics when it improves the number of matches or
- * when it matches the same number of attributes but is smaller.
+ * Conditions
+ *
+ * Walk through conditions and keep only those covered by at least
+ * one of the statistics we still have. Also, collect a bitmap of
+ * attributes so that we can make sure we add at least one new
+ * attribute (by comparing with clauses).
*/
- if ((matches > current_matches) ||
- ((matches == current_matches) && (current_dims > numattrs)))
+ if (conditions != NIL)
{
- choice = info;
- current_matches = matches;
- current_dims = numattrs;
+ tmp = filter_clauses(root, relid, type,
+ stats, conditions, &condition_attnums);
+
+ /* discard the original list */
+ list_free(conditions);
+ conditions = tmp;
}
- }
- return choice;
-}
+ /* get a union of attnums (from conditions and new clauses) */
+ all_attnums = bms_union(compatible_attnums, condition_attnums);
+ /*
+ * Statistics
+ *
+ * Walk through statistics and only keep those covering at least
+ * one new attribute (excluding conditions) and at least two
+ * attributes in clauses and conditions combined.
+ */
+ tmp = filter_stats(stats, compatible_attnums, all_attnums);
-/*
- * This splits the clauses list into two parts - one containing clauses that
- * will be evaluated using the chosen statistics, and the remaining clauses
- * (either non-mvcompatible, or not related to the histogram).
- */
-static List *
-clauselist_mv_split(PlannerInfo *root, Index relid,
- List *clauses, List **mvclauses,
- MVStatisticInfo *mvstats, int types)
-{
- int i;
- ListCell *l;
- List *non_mvclauses = NIL;
+ /* if we've not eliminated anything, terminate */
+ if (list_length(stats) == list_length(tmp))
+ break;
- /* FIXME is there a better way to get info on int2vector? */
- int2vector * attrs = mvstats->stakeys;
- int numattrs = mvstats->stakeys->dim1;
+ /* work only with filtered statistics from now */
+ list_free(stats);
+ stats = tmp;
+ }
- Bitmapset *mvattnums = NULL;
+ /* only do the optimization if we have clauses/statistics */
+ if ((list_length(stats) == 0) || (list_length(clauses) == 0))
+ return NIL;
- /* build bitmap of attributes, so we can do bms_is_subset later */
- for (i = 0; i < numattrs; i++)
- mvattnums = bms_add_member(mvattnums, attrs->values[i]);
+ /* remove redundant stats (stats covered by another stats) */
+ stats = filter_redundant_stats(stats, clauses, conditions);
- /* erase the list of mv-compatible clauses */
- *mvclauses = NIL;
+ /*
+ * TODO We should sort the stats to make the order deterministic,
+ * otherwise we may get different estimates on different
+ * executions - if there are multiple "equally good" solutions,
+ * we'll keep the first solution we see.
+ *
+ * Sorting by OID probably is not the right solution though,
+ * because we'd like it to be reproducible, irrespective
+ * of the order of the ADD STATISTICS commands.
+ * So maybe statkeys?
+ */
+ mvstats = make_stats_array(stats, &nmvstats);
+ stats_attnums = make_stats_attnums(mvstats, nmvstats);
- foreach (l, clauses)
- {
- bool match = false; /* by default not mv-compatible */
- Bitmapset *attnums = NULL;
- Node *clause = (Node *) lfirst(l);
+ /* collect clauses and bitmaps of their attnums */
+ clauses_array = make_clauses_array(clauses, &nclauses);
+ clauses_attnums = make_clauses_attnums(root, relid, type,
+ clauses_array, nclauses);
+
+ /* collect conditions and bitmap of attnums */
+ conditions_array = make_clauses_array(conditions, &nconditions);
+ conditions_attnums = make_clauses_attnums(root, relid, type,
+ conditions_array, nconditions);
- if (clause_is_mv_compatible(clause, relid, &attnums, types))
+ /*
+ * Build bitmaps with info about which clauses/conditions are
+ * covered by each statistics (so that we don't need to call the
+ * bms_is_subset over and over again).
+ */
+ clause_cover_map = make_cover_map(stats_attnums, nmvstats,
+ clauses_attnums, nclauses);
+
+ condition_cover_map = make_cover_map(stats_attnums, nmvstats,
+ conditions_attnums, nconditions);
+
+ ruled_out = (int*)palloc0(nmvstats * sizeof(int));
+
+ /* no stats are ruled out by default */
+ for (i = 0; i < nmvstats; i++)
+ ruled_out[i] = -1;
+
+ /* do the optimization itself */
+ if (mvstat_search_type == MVSTAT_SEARCH_EXHAUSTIVE)
+ choose_mv_statistics_exhaustive(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+ else
+ choose_mv_statistics_greedy(root, 0,
+ nmvstats, mvstats, stats_attnums,
+ nclauses, clauses_array, clauses_attnums,
+ nconditions, conditions_array, conditions_attnums,
+ clause_cover_map, condition_cover_map,
+ ruled_out, NULL, &best);
+
+ /* create a list of statistics from the array */
+ if (best != NULL)
+ {
+ for (i = 0; i < best->nstats; i++)
{
- /* are all the attributes part of the selected stats? */
- if (bms_is_subset(attnums, mvattnums))
- match = true;
+ MVStatisticInfo *info = makeNode(MVStatisticInfo);
+ memcpy(info, &mvstats[best->stats[i]], sizeof(MVStatisticInfo));
+ result = lappend(result, info);
}
- /*
- * The clause matches the selected stats, so put it to the list of
- * mv-compatible clauses. Otherwise, keep it in the list of 'regular'
- * clauses (that may be selected later).
- */
- if (match)
- *mvclauses = lappend(*mvclauses, clause);
- else
- non_mvclauses = lappend(non_mvclauses, clause);
+ pfree(best);
}
- /*
- * Perform regular estimation using the clauses incompatible with the chosen
- * histogram (or MV stats in general).
- */
- return non_mvclauses;
+ /* cleanup (maybe leave it up to the memory context?) */
+ for (i = 0; i < nmvstats; i++)
+ bms_free(stats_attnums[i]);
+
+ for (i = 0; i < nclauses; i++)
+ bms_free(clauses_attnums[i]);
+
+ for (i = 0; i < nconditions; i++)
+ bms_free(conditions_attnums[i]);
+
+ pfree(stats_attnums);
+ pfree(clauses_attnums);
+ pfree(conditions_attnums);
+
+ pfree(clauses_array);
+ pfree(conditions_array);
+ pfree(clause_cover_map);
+ pfree(condition_cover_map);
+ pfree(ruled_out);
+ pfree(mvstats);
+
+ list_free(clauses);
+ list_free(conditions);
+ list_free(stats);
+ return result;
}
typedef struct
@@ -1637,9 +2878,6 @@ has_stats(List *stats, int type)
/* terminate if we've found at least one matching statistics */
if (stats_type_matches(stat, type))
return true;
-
- if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
- return true;
}
return false;
@@ -1689,22 +2927,26 @@ find_stats(PlannerInfo *root, Index relid)
* as the clauses are processed (and skip items that are 'match').
*/
static Selectivity
-clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats, bool *fullmatch,
- Selectivity *lowsel)
+clauselist_mv_selectivity_mcvlist(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or,
+ bool *fullmatch, Selectivity *lowsel)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
MCVList mcvlist = NULL;
+
int nmatches = 0;
+ int nconditions = 0;
/* match/mismatch bitmap for each MCV item */
char * matches = NULL;
+ char * condition_matches = NULL;
Assert(clauses != NIL);
- Assert(list_length(clauses) >= 2);
+ Assert(list_length(clauses) >= 1);
/* there's no MCV list built yet */
if (! mvstats->mcv_built)
@@ -1715,32 +2957,85 @@ clauselist_mv_selectivity_mcvlist(PlannerInfo *root, List *clauses,
Assert(mcvlist != NULL);
Assert(mcvlist->nitems > 0);
- /* by default all the MCV items match the clauses fully */
- matches = palloc0(sizeof(char) * mcvlist->nitems);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
-
/* number of matching MCV items */
nmatches = mcvlist->nitems;
+ nconditions = mcvlist->nitems;
+
+ /*
+ * Bitmap of MCV item matches (mismatch, partial, full).
+ *
+ * For AND clauses all items match initially (and we'll eliminate them).
+ * For OR clauses no items match initially (and we'll add them).
+ *
+ * We only need to do the memset for AND clauses (for OR clauses
+ * it's already set correctly by the palloc0).
+ */
+ matches = palloc0(sizeof(char) * nmatches);
+
+ if (! is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+
+ /* Conditions are treated as an AND clause, so all items match by default. */
+ condition_matches = palloc0(sizeof(char) * nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+ /*
+ * build the match bitmap for the conditions (conditions are always
+ * connected by AND)
+ */
+ if (conditions != NIL)
+ nconditions = update_match_bitmap_mcvlist(root, conditions,
+ mvstats->stakeys, mcvlist,
+ nconditions, condition_matches,
+ lowsel, fullmatch, false);
+
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all MCV items, even those
+ * ruled out by the conditions. The final result should be the
+ * same, but skipping them might be faster.
+ */
nmatches = update_match_bitmap_mcvlist(root, clauses,
mvstats->stakeys, mcvlist,
- nmatches, matches,
- lowsel, fullmatch, false);
+ ((is_or) ? 0 : nmatches), matches,
+ lowsel, fullmatch, is_or);
/* sum frequencies for all the matching MCV items */
for (i = 0; i < mcvlist->nitems; i++)
{
- /* used to 'scale' for MCV lists not covering all tuples */
+ /*
+ * Find out what part of the data is covered by the MCV list,
+ * so that we can 'scale' the selectivity properly (e.g. when
+ * only 50% of the sample items got into the MCV, and the rest
+ * is either in a histogram, or not covered by stats).
+ *
+ * TODO This might be handled by keeping a global "frequency"
+ * for the whole list, which might save us a bit of time
+ * spent on accessing the not-matching part of the MCV list.
+ * Although it's likely in a cache, so it's very fast.
+ */
u += mcvlist->items[i]->frequency;
+ /* skip MCV items not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+
if (matches[i] != MVSTATS_MATCH_NONE)
s += mcvlist->items[i]->frequency;
+
+ t += mcvlist->items[i]->frequency;
}
pfree(matches);
+ pfree(condition_matches);
pfree(mcvlist);
- return s*u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
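+ /*
+ * Scale the conditional match ratio (s / t) by the fraction of rows
+ * covered by the MCV list (u). With illustrative numbers u = 0.8,
+ * t = 0.4 and s = 0.1 this gives (0.1 / 0.4) * 0.8 = 0.2.
+ */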
+ return (s / t) * u;
}
/*
@@ -1971,64 +3266,57 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
}
}
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/* AND/OR clause, with all clauses compatible with the selected MV stat */
int i;
- BoolExpr *orclause = ((BoolExpr*)clause);
- List *orclauses = orclause->args;
+ List *tmp_clauses = ((BoolExpr*)clause)->args;
/* match/mismatch bitmap for each MCV item */
- int or_nmatches = 0;
- char * or_matches = NULL;
+ int tmp_nmatches = 0;
+ char * tmp_matches = NULL;
- Assert(orclauses != NIL);
- Assert(list_length(orclauses) >= 2);
+ Assert(tmp_clauses != NIL);
+ Assert((list_length(tmp_clauses) >= 2) || (not_clause(clause) && (list_length(tmp_clauses)==1)));
/* number of matching MCV items */
- or_nmatches = mcvlist->nitems;
+ tmp_nmatches = (or_clause(clause)) ? 0 : mcvlist->nitems;
/* by default none of the MCV items matches the clauses */
- or_matches = palloc0(sizeof(char) * or_nmatches);
+ tmp_matches = palloc0(sizeof(char) * mcvlist->nitems);
- if (or_clause(clause))
- {
- /* OR clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
- or_nmatches = 0;
- }
- else
- {
- /* AND clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
- }
+ /* AND (and NOT) clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mcvlist->nitems);
/* build the match bitmap for the OR-clauses */
- or_nmatches = update_match_bitmap_mcvlist(root, orclauses,
+ tmp_nmatches = update_match_bitmap_mcvlist(root, tmp_clauses,
stakeys, mcvlist,
- or_nmatches, or_matches,
+ tmp_nmatches, tmp_matches,
lowsel, fullmatch, or_clause(clause));
/* merge the bitmap into the existing one*/
for (i = 0; i < mcvlist->nitems; i++)
{
+ /* if this is a NOT clause, we need to invert the results first */
+ if (not_clause(clause))
+ tmp_matches[i] = (MVSTATS_MATCH_FULL - tmp_matches[i]);
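+ /* (i.e. full and none swap; a partial match stays partial) */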
+
/*
* To AND-merge the bitmaps, a MIN() semantics is used.
* For OR-merge, use MAX().
*
* FIXME this does not decrease the number of matches
*/
- UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
}
- pfree(or_matches);
+ pfree(tmp_matches);
}
else
- {
elog(ERROR, "unknown clause type: %d", clause->type);
- }
}
/*
@@ -2086,15 +3374,18 @@ update_match_bitmap_mcvlist(PlannerInfo *root, List *clauses,
* this is not uncommon, but for histograms it's not that clear.
*/
static Selectivity
-clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
- MVStatisticInfo *mvstats)
+clauselist_mv_selectivity_histogram(PlannerInfo *root, MVStatisticInfo *mvstats,
+ List *clauses, List *conditions, bool is_or)
{
int i;
Selectivity s = 0.0;
+ Selectivity t = 0.0;
Selectivity u = 0.0;
int nmatches = 0;
+ int nconditions = 0;
char *matches = NULL;
+ char *condition_matches = NULL;
MVSerializedHistogram mvhist = NULL;
@@ -2107,25 +3398,55 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
Assert (mvhist != NULL);
Assert (clauses != NIL);
- Assert (list_length(clauses) >= 2);
+ Assert (list_length(clauses) >= 1);
+
+ nmatches = mvhist->nbuckets;
+ nconditions = mvhist->nbuckets;
/*
- * Bitmap of bucket matches (mismatch, partial, full). by default
- * all buckets fully match (and we'll eliminate them).
+ * Bitmap of bucket matches (mismatch, partial, full).
+ *
+ * For AND clauses all buckets match (and we'll eliminate them).
+ * For OR clauses no buckets match (and we'll add them).
+ *
+ * We only need to do the memset for AND clauses (for OR clauses
+ * it's already set correctly by the palloc0).
*/
- matches = palloc0(sizeof(char) * mvhist->nbuckets);
- memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
+ matches = palloc0(sizeof(char) * nmatches);
- nmatches = mvhist->nbuckets;
+ if (! is_or) /* AND-clause */
+ memset(matches, MVSTATS_MATCH_FULL, sizeof(char)*nmatches);
+
+ /* Conditions are treated as an AND clause, so all buckets match by default. */
+ condition_matches = palloc0(sizeof(char)*nconditions);
+ memset(condition_matches, MVSTATS_MATCH_FULL, sizeof(char)*nconditions);
+
+ /*
+ * build the match bitmap for the conditions (conditions are always
+ * connected by AND)
+ */
+ if (conditions != NIL)
+ update_match_bitmap_histogram(root, conditions,
+ mvstats->stakeys, mvhist,
+ nconditions, condition_matches, false);
- /* build the match bitmap */
+ /*
+ * build the match bitmap for the estimated clauses
+ *
+ * TODO This evaluates the clauses for all buckets, even those
+ * ruled out by the conditions. The final result should be
+ * the same, but skipping them might be faster.
+ */
update_match_bitmap_histogram(root, clauses,
mvstats->stakeys, mvhist,
- nmatches, matches, false);
+ ((is_or) ? 0 : nmatches), matches,
+ is_or);
/* now, walk through the buckets and sum the selectivities */
for (i = 0; i < mvhist->nbuckets; i++)
{
+ float coeff = 1.0;
+
/*
* Find out what part of the data is covered by the histogram,
* so that we can 'scale' the selectivity properly (e.g. when
@@ -2139,10 +3460,23 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
*/
u += mvhist->buckets[i]->ntuples;
+ /* skip buckets not matching the conditions */
+ if (condition_matches[i] == MVSTATS_MATCH_NONE)
+ continue;
+ else if (condition_matches[i] == MVSTATS_MATCH_PARTIAL)
+ coeff = 0.5;
+
+ t += coeff * mvhist->buckets[i]->ntuples;
+
if (matches[i] == MVSTATS_MATCH_FULL)
- s += mvhist->buckets[i]->ntuples;
+ s += coeff * mvhist->buckets[i]->ntuples;
else if (matches[i] == MVSTATS_MATCH_PARTIAL)
- s += 0.5 * mvhist->buckets[i]->ntuples;
+ /*
+ * TODO If both conditions and clauses match partially, this
+ * will use a 0.25 match - not sure that's the right
+ * solution, but it seems about right.
+ */
+ s += coeff * 0.5 * mvhist->buckets[i]->ntuples;
}
#ifdef DEBUG_MVHIST
@@ -2151,9 +3485,14 @@ clauselist_mv_selectivity_histogram(PlannerInfo *root, List *clauses,
/* release the allocated bitmap and deserialized histogram */
pfree(matches);
+ pfree(condition_matches);
pfree(mvhist);
- return s * u;
+ /* no condition matches */
+ if (t == 0.0)
+ return (Selectivity)0.0;
+
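+ /* as in the MCV case, scale the conditional match (s / t) by the
+ * fraction of rows covered by the histogram (u) */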
+ return (s / t) * u;
}
/* cached result of bucket boundary comparison for a single dimension */
@@ -2344,7 +3683,7 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
FmgrInfo opproc; /* operator */
fmgr_info(get_opcode(expr->opno), &opproc);
-
+
/* reset the cache (per clause) */
memset(callcache, 0, mvhist->nbuckets);
@@ -2504,64 +3843,57 @@ update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
UPDATE_RESULT(matches[i], MVSTATS_MATCH_NONE, is_or);
}
}
- else if (or_clause(clause) || and_clause(clause))
+ else if (or_clause(clause) || and_clause(clause) || not_clause(clause))
{
/* AND/OR clause, with all clauses compatible with the selected MV stat */
int i;
- BoolExpr *orclause = ((BoolExpr*)clause);
- List *orclauses = orclause->args;
+ List *tmp_clauses = ((BoolExpr*)clause)->args;
/* match/mismatch bitmap for each bucket */
- int or_nmatches = 0;
- char * or_matches = NULL;
+ int tmp_nmatches = 0;
+ char * tmp_matches = NULL;
- Assert(orclauses != NIL);
- Assert(list_length(orclauses) >= 2);
+ Assert(tmp_clauses != NIL);
+ Assert((list_length(tmp_clauses) >= 2) || (not_clause(clause) && (list_length(tmp_clauses)==1)));
/* number of matching buckets */
- or_nmatches = mvhist->nbuckets;
+ tmp_nmatches = (or_clause(clause)) ? 0 : mvhist->nbuckets;
- /* by default none of the buckets matches the clauses */
- or_matches = palloc0(sizeof(char) * or_nmatches);
+ /* by default none of the buckets matches the clauses (OR clause) */
+ tmp_matches = palloc0(sizeof(char) * mvhist->nbuckets);
- if (or_clause(clause))
- {
- /* OR clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_NONE, sizeof(char)*or_nmatches);
- or_nmatches = 0;
- }
- else
- {
- /* AND clauses assume nothing matches, initially */
- memset(or_matches, MVSTATS_MATCH_FULL, sizeof(char)*or_nmatches);
- }
+ /* but AND (and NOT) clauses assume everything matches, initially */
+ if (! or_clause(clause))
+ memset(tmp_matches, MVSTATS_MATCH_FULL, sizeof(char)*mvhist->nbuckets);
/* build the match bitmap for the OR-clauses */
- or_nmatches = update_match_bitmap_histogram(root, orclauses,
+ tmp_nmatches = update_match_bitmap_histogram(root, tmp_clauses,
stakeys, mvhist,
- or_nmatches, or_matches, or_clause(clause));
+ tmp_nmatches, tmp_matches, or_clause(clause));
/* merge the bitmap into the existing one*/
for (i = 0; i < mvhist->nbuckets; i++)
{
+ /* if this is a NOT clause, we need to invert the results first */
+ if (not_clause(clause))
+ tmp_matches[i] = (MVSTATS_MATCH_FULL - tmp_matches[i]);
+
/*
* To AND-merge the bitmaps, a MIN() semantics is used.
* For OR-merge, use MAX().
*
* FIXME this does not decrease the number of matches
*/
- UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ UPDATE_RESULT(matches[i], tmp_matches[i], is_or);
}
- pfree(or_matches);
-
+ pfree(tmp_matches);
}
else
elog(ERROR, "unknown clause type: %d", clause->type);
}
- /* free the call cache */
pfree(callcache);
return nmatches;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5350329..57214e0 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3518,7 +3518,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/*
* Also get the normal inner-join selectivity of the join clauses.
@@ -3541,7 +3542,8 @@ compute_semi_anti_join_factors(PlannerInfo *root,
joinquals,
0,
JOIN_INNER,
- &norm_sjinfo);
+ &norm_sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
if (jointype == JOIN_ANTI)
@@ -3708,7 +3710,7 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
Node *qual = (Node *) lfirst(l);
/* Note that clause_selectivity will be able to cache its result */
- selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo);
+ selec *= clause_selectivity(root, qual, 0, JOIN_INNER, &sjinfo, NIL);
}
/* Apply it to the input relation sizes */
@@ -3744,7 +3746,8 @@ set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
rel->baserestrictinfo,
0,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
rel->rows = clamp_row_est(nrows);
@@ -3781,7 +3784,8 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
allclauses,
rel->relid, /* do not use 0! */
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
nrows = clamp_row_est(nrows);
/* For safety, make sure result is not more than the base estimate */
if (nrows > rel->rows)
@@ -3919,12 +3923,14 @@ calc_joinrel_size_estimate(PlannerInfo *root,
joinquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = clauselist_selectivity(root,
pushedquals,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
/* Avoid leaking a lot of ListCells */
list_free(joinquals);
@@ -3936,7 +3942,8 @@ calc_joinrel_size_estimate(PlannerInfo *root,
restrictlist,
0,
jointype,
- sjinfo);
+ sjinfo,
+ NIL);
pselec = 0.0; /* not used, keep compiler quiet */
}
diff --git a/src/backend/optimizer/util/orclauses.c b/src/backend/optimizer/util/orclauses.c
index ea831f5..6299e75 100644
--- a/src/backend/optimizer/util/orclauses.c
+++ b/src/backend/optimizer/util/orclauses.c
@@ -280,7 +280,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
* saving work later.)
*/
or_selec = clause_selectivity(root, (Node *) or_rinfo,
- 0, JOIN_INNER, NULL);
+ 0, JOIN_INNER, NULL, NIL);
/*
* The clause is only worth adding to the query if it rejects a useful
@@ -342,7 +342,7 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel,
/* Compute inner-join size */
orig_selec = clause_selectivity(root, (Node *) join_or_rinfo,
- 0, JOIN_INNER, &sjinfo);
+ 0, JOIN_INNER, &sjinfo, NIL);
/* And hack cached selectivity so join size remains the same */
join_or_rinfo->norm_selec = orig_selec / or_selec;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d396ef1..805d633 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1627,13 +1627,15 @@ booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg,
case IS_NOT_FALSE:
selec = (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
case IS_FALSE:
case IS_NOT_TRUE:
selec = 1.0 - (double) clause_selectivity(root, arg,
varRelid,
- jointype, sjinfo);
+ jointype, sjinfo,
+ NIL);
break;
default:
elog(ERROR, "unrecognized booltesttype: %d",
@@ -6260,7 +6262,8 @@ genericcostestimate(PlannerInfo *root,
indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/*
* If caller didn't give us an estimate, estimate the number of index
@@ -6580,7 +6583,8 @@ btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
btreeSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
numIndexTuples = btreeSelectivity * index->rel->tuples;
/*
@@ -7331,7 +7335,8 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexSelectivity = clauselist_selectivity(root, selectivityQuals,
index->rel->relid,
JOIN_INNER,
- NULL);
+ NULL,
+ NIL);
/* fetch estimated page cost for tablespace containing index */
get_tablespace_page_costs(index->reltablespace,
@@ -7561,7 +7566,7 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexSelectivity =
clauselist_selectivity(root, indexQuals,
path->indexinfo->rel->relid,
- JOIN_INNER, NULL);
+ JOIN_INNER, NULL, NIL);
*indexCorrelation = 1;
/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index edcafce..b7aabed 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -75,6 +75,7 @@
#include "utils/bytea.h"
#include "utils/guc_tables.h"
#include "utils/memutils.h"
+#include "utils/mvstats.h"
#include "utils/pg_locale.h"
#include "utils/plancache.h"
#include "utils/portal.h"
@@ -393,6 +394,15 @@ static const struct config_enum_entry force_parallel_mode_options[] = {
};
/*
+ * Search algorithm for multivariate stats.
+ */
+static const struct config_enum_entry mvstat_search_options[] = {
+ {"greedy", MVSTAT_SEARCH_GREEDY, false},
+ {"exhaustive", MVSTAT_SEARCH_EXHAUSTIVE, false},
+ {NULL, 0, false}
+};
+
+/*
* Options for enum values stored in other modules
*/
extern const struct config_enum_entry wal_level_options[];
@@ -3743,6 +3753,16 @@ static struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"mvstat_search", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Sets the algorithm used for combining multivariate stats."),
+ NULL
+ },
+ &mvstat_search_type,
+ MVSTAT_SEARCH_GREEDY, mvstat_search_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index 3e4f4d1..d404914 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -90,6 +90,137 @@ even attempting to do the more expensive estimation.
Whenever we find there are no suitable stats, we skip the expensive steps.
+Combining multiple statistics
+-----------------------------
+
+When estimating selectivity of a list of clauses, there may exist no statistics
+covering all of them. If there are multiple statistics, each covering some
+subset of the attributes, the optimizer needs to figure out which of those
+statistics to apply.
+
+When the statistics do not overlap, the solution is trivial - we can simply
+split the groups of conditions by the matching statistics, and then multiply the
+selectivities. For example assume multivariate statistics on (b,c) and (d,e),
+and a condition like this:
+
+ (a=1) AND (b=2) AND (c=3) AND (d=4) AND (e=5)
+
+Then (a=1) is not covered by any of the statistics, so will be estimated using
+the regular per-column statistics. The two conditions ((b=2) AND (c=3)) will be
+estimated using the (b,c) statistics, and ((d=4) AND (e=5)) will be estimated
+using (d,e) statistics. The resulting selectivities are then multiplied together.
+
+Now, what if the statistics overlap? For example assume the same condition as
+above, but let's say we have statistics on (a,b,c) and (a,c,d,e). What then?
+
+As selectivity is just a probability that the condition holds for a random row,
+we can write the selectivity like this:
+
+ P(a=1 & b=2 & c=3 & d=4 & e=5)
+
+and we can rewrite it using conditional probability like this
+
+ P(a=1 & b=2 & c=3) * P(d=4 & e=5 | a=1 & b=2 & c=3)
+
+Notice that the first part already matches the (a,b,c) statistics. If we assume
+that columns that are not referenced by the same statistics are independent, we
+may rewrite the second half like this
+
+ P(d=4 & e=5 | a=1 & b=2 & c=3) = P(d=4 & e=5 | a=1 & c=3)
+
+which corresponds to the statistics on (a,c,d,e).
+
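+So, under this independence assumption, the whole condition may be estimated
+as
+
+    P(a=1 & b=2 & c=3) * P(d=4 & e=5 | a=1 & c=3)
+
+where the first term is computed using the (a,b,c) statistics and the second
+one using the (a,c,d,e) statistics, with the clauses on 'a' and 'c' reused as
+conditions.
+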
+If there are multiple statistics defined on a table, it's not difficult to come
+up with examples where there are multiple ways to combine them to cover a list
+of clauses. We need a way to find the best combination of statistics.
+
+This is the purpose of choose_mv_statistics(). It searches through the possible
+combinations of statistics, and selects the combination that
+
+ (a) covers the most clauses of the list
+
+ (b) reuses the maximum number of clauses as conditions
+ (in conditional probabilities)
+
+While criterion (a) seems natural, (b) may seem a bit awkward at first. The
+idea is that conditions are a way of transferring information about
+dependencies between the statistics.
+
+There are two alternative implementations of choose_mv_statistics() - greedy
+and exhaustive. The exhaustive one actually searches through all possible
+combinations of statistics, and for larger numbers of statistics may get quite
+expensive (unsurprisingly, it has exponential cost). The greedy one terminates
+in less than K steps (where K is the number of clauses), and in each step
+chooses the next best statistics. I've been unable to come up with an example
+where those two approaches would produce different combinations.
+
+It's possible to choose the algorithm using the mvstat_search GUC, with either
+'greedy' or 'exhaustive' values (default is 'greedy'):
+
+    SET mvstat_search = 'exhaustive';
+
+Note: This is meant mostly for experimentation. I do expect we'll choose one of
+the algorithms and remove the GUC before commit.
+
+
+Limitations of combining statistics
+-----------------------------------
+
+As described in the section 'Combining multiple statistics', the current approach
+is based on transferring information between statistics by means of conditional
+probabilities. This is a relatively cheap and efficient approach, but it is
+based on two assumptions:
+
+ (1) The overlap between the statistics needs to be sufficiently large, i.e.
+ there needs to be enough columns shared by the statistics to transfer
+ information about dependencies between the remaining columns.
+
+ (2) The query needs to include sufficient clauses on the shared columns.
+
+How a violation of those assumptions may be a problem can be illustrated by
+a simple example. Assume a table with three columns (a,b,c) containing exactly
+the same values, and statistics on (a,b) and (b,c):
+
+ CREATE TABLE test AS SELECT i AS a, i AS b, i AS c
+ FROM generate_series(1,1000) s(i);
+
+ CREATE STATISTICS s1 ON test (a,b) WITH (mcv);
+ CREATE STATISTICS s2 ON test (b,c) WITH (mcv);
+
+ ANALYZE test;
+
+First, let's estimate this query:
+
+ SELECT * FROM test WHERE (a < 10) AND (c < 10);
+
+Clearly, there are no conditions on 'b' (which is the only column shared by the
+two statistics), so we'll end up with an estimate based on the assumption of
+independence:
+
+ P(a < 10) * P(c < 10) = 0.01 * 0.01 = 0.0001
+
+This is a significant under-estimate, as the actual selectivity is about 0.01.
+
+But let's estimate another query:
+
+ SELECT * FROM test WHERE (a < 10) AND (b < 500) AND (c < 10);
+
+In this case, the estimate may be computed for example like this:
+
+ P[(a < 10) & (b < 500) & (c < 10)]
+ = P[(a < 10) & (b < 500)] * P[(c < 10) | (a < 10) & (b < 500)]
+ = P[(a < 10) & (b < 500)] * P[(c < 10) | (b < 500)]
+
+The trouble is that the probability P(c < 10 | b < 500) evaluates to 0.02: we
+have assumed (a) and (c) are independent (there is no statistic covering both
+columns), and the condition on (b) does not transfer a sufficient amount of
+information between the two statistics.
+
+Currently, the only solution is to build statistics on all three columns, but
+see the 'combining statistics using convolution' section for ideas on how to
+improve this.
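+
+For example (using the CREATE STATISTICS syntax from this patch series):
+
+    CREATE STATISTICS s3 ON test (a,b,c) WITH (mcv);
+    ANALYZE test;
+
+With a single statistics object covering all three columns no combining is
+needed, so the limitations described above do not apply.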
+
+
Further (possibly crazy) ideas
------------------------------
@@ -111,3 +242,38 @@ But of course, this may result in expensive estimation (CPU-wise).
So we might add a GUC to choose between a simple (single statistics) and thus
multi-statistic estimation, possibly table-level parameter (ALTER TABLE ...).
+
+
+Combining stats using convolution
+---------------------------------
+
+The current approach for combining statistics is based on conditional
+probabilities, and thus only works when the query includes conditions on the
+overlapping parts of the statistics. There may be other ways to combine
+statistics, relaxing this requirement.
+
+Let's assume two histograms H1 and H2 - then combining them might work about
+like this:
+
+
+ for (buckets of H1, satisfying local conditions)
+ {
+ for (buckets of H2, overlapping with H1 bucket)
+ {
+ mark H2 bucket as 'valid'
+ }
+ }
+
+ s1 = s2 = 0.0
+ for (buckets of H2 marked as valid)
+ {
+ s1 += frequency
+
+ if (bucket satisfies local conditions)
+ s2 += frequency
+ }
+
+ s = (s2 / s1) /* final selectivity estimate */
+
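+(Here s2/s1 is essentially the conditional selectivity of the local H2
+conditions given the restriction implied by H1, so presumably it would then
+be combined with the selectivity derived from H1 itself.)
+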
+However this may quickly get non-trivial, e.g. when combining two statistics
+of different types (histogram vs. MCV).
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index fea2bb7..33f5a1b 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -192,11 +192,13 @@ extern Selectivity clauselist_selectivity(PlannerInfo *root,
List *clauses,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
extern Selectivity clause_selectivity(PlannerInfo *root,
Node *clause,
int varRelid,
JoinType jointype,
- SpecialJoinInfo *sjinfo);
+ SpecialJoinInfo *sjinfo,
+ List *conditions);
#endif /* COST_H */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 777c7da..2b67772 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -17,6 +17,14 @@
#include "fmgr.h"
#include "commands/vacuum.h"
+typedef enum MVStatSearchType
+{
+ MVSTAT_SEARCH_EXHAUSTIVE, /* exhaustive search */
+ MVSTAT_SEARCH_GREEDY /* greedy search */
+} MVStatSearchType;
+
+extern int mvstat_search_type;
+
/*
* Degree of how much MCV item / histogram bucket matches a clause.
* This is then considered when computing the selectivity.
--
2.5.0
Attachment: 0007-multivariate-ndistinct-coefficients.patch (text/x-patch)
From 238d8994c85d8b64bd898604b6ad1219850a5a26 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Wed, 23 Dec 2015 02:07:58 +0100
Subject: [PATCH 7/9] multivariate ndistinct coefficients
---
doc/src/sgml/ref/create_statistics.sgml | 9 ++
src/backend/catalog/system_views.sql | 3 +-
src/backend/commands/analyze.c | 2 +-
src/backend/commands/statscmds.c | 11 +-
src/backend/optimizer/path/clausesel.c | 4 +
src/backend/optimizer/util/plancat.c | 4 +-
src/backend/utils/adt/selfuncs.c | 93 +++++++++++++++-
src/backend/utils/mvstats/Makefile | 2 +-
src/backend/utils/mvstats/README.ndistinct | 83 ++++++++++++++
src/backend/utils/mvstats/README.stats | 2 +
src/backend/utils/mvstats/common.c | 23 +++-
src/backend/utils/mvstats/mvdist.c | 171 +++++++++++++++++++++++++++++
src/include/catalog/pg_mv_statistic.h | 26 +++--
src/include/nodes/relation.h | 2 +
src/include/utils/mvstats.h | 9 +-
src/test/regress/expected/rules.out | 3 +-
16 files changed, 424 insertions(+), 23 deletions(-)
create mode 100644 src/backend/utils/mvstats/README.ndistinct
create mode 100644 src/backend/utils/mvstats/mvdist.c
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index f7336fd..80e472f 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -168,6 +168,15 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>ndistinct</> (<type>boolean</>)</term>
+ <listitem>
+ <para>
+ Enables ndistinct coefficients for the statistics.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect2>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b151db1..8d2b435 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -169,7 +169,8 @@ CREATE VIEW pg_mv_stats AS
length(S.stamcv) AS mcvbytes,
pg_mv_stats_mcvlist_info(S.stamcv) AS mcvinfo,
length(S.stahist) AS histbytes,
- pg_mv_stats_histogram_info(S.stahist) AS histinfo
+ pg_mv_stats_histogram_info(S.stahist) AS histinfo,
+ standcoeff AS ndcoeff
FROM (pg_mv_statistic S JOIN pg_class C ON (C.oid = S.starelid))
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace);
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 9087532..c29f1be 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -582,7 +582,7 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params,
}
/* Build multivariate stats (if there are any). */
- build_mv_stats(onerel, numrows, rows, attr_cnt, vacattrstats);
+ build_mv_stats(onerel, totalrows, numrows, rows, attr_cnt, vacattrstats);
}
/*
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index e0b085f..a7c569d 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -72,7 +72,8 @@ CreateStatistics(CreateStatsStmt *stmt)
/* by default build nothing */
bool build_dependencies = false,
build_mcv = false,
- build_histogram = false;
+ build_histogram = false,
+ build_ndistinct = false;
int32 max_buckets = -1,
max_mcv_items = -1;
@@ -155,6 +156,8 @@ CreateStatistics(CreateStatsStmt *stmt)
if (strcmp(opt->defname, "dependencies") == 0)
build_dependencies = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "ndistinct") == 0)
+ build_ndistinct = defGetBoolean(opt);
else if (strcmp(opt->defname, "mcv") == 0)
build_mcv = defGetBoolean(opt);
else if (strcmp(opt->defname, "max_mcv_items") == 0)
@@ -209,10 +212,10 @@ CreateStatistics(CreateStatsStmt *stmt)
}
/* check that at least some statistics were requested */
- if (! (build_dependencies || build_mcv || build_histogram))
+ if (! (build_dependencies || build_mcv || build_histogram || build_ndistinct))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("no statistics type (dependencies, mcv, histogram) was requested")));
+ errmsg("no statistics type (dependencies, mcv, histogram, ndistinct) was requested")));
/* now do some checking of the options */
if (require_mcv && (! build_mcv))
@@ -246,6 +249,7 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_mv_statistic_deps_enabled -1] = BoolGetDatum(build_dependencies);
values[Anum_pg_mv_statistic_mcv_enabled -1] = BoolGetDatum(build_mcv);
values[Anum_pg_mv_statistic_hist_enabled -1] = BoolGetDatum(build_histogram);
+ values[Anum_pg_mv_statistic_ndist_enabled-1] = BoolGetDatum(build_ndistinct);
values[Anum_pg_mv_statistic_mcv_max_items -1] = Int32GetDatum(max_mcv_items);
values[Anum_pg_mv_statistic_hist_max_buckets -1] = Int32GetDatum(max_buckets);
@@ -253,6 +257,7 @@ CreateStatistics(CreateStatsStmt *stmt)
nulls[Anum_pg_mv_statistic_stadeps -1] = true;
nulls[Anum_pg_mv_statistic_stamcv -1] = true;
nulls[Anum_pg_mv_statistic_stahist -1] = true;
+ nulls[Anum_pg_mv_statistic_standist -1] = true;
/* insert the tuple into pg_mv_statistic */
mvstatrel = heap_open(MvStatisticRelationId, RowExclusiveLock);
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 14e3444..63baa73 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -59,6 +59,7 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause,
#define MV_CLAUSE_TYPE_FDEP 0x01
#define MV_CLAUSE_TYPE_MCV 0x02
#define MV_CLAUSE_TYPE_HIST 0x04
+#define MV_CLAUSE_TYPE_NDIST 0x08
static bool clause_is_mv_compatible(Node *clause, Index relid, Bitmapset **attnums,
int type);
@@ -2860,6 +2861,9 @@ stats_type_matches(MVStatisticInfo *stat, int type)
if ((type & MV_CLAUSE_TYPE_HIST) && stat->hist_built)
return true;
+ if ((type & MV_CLAUSE_TYPE_NDIST) && stat->ndist_built)
+ return true;
+
return false;
}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 2519249..3741b7a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -412,7 +412,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
/* unavailable stats are not interesting for the planner */
- if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built)
+ if (mvstat->deps_built || mvstat->mcv_built || mvstat->hist_built || mvstat->ndist_built)
{
info = makeNode(MVStatisticInfo);
@@ -423,11 +423,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
info->deps_enabled = mvstat->deps_enabled;
info->mcv_enabled = mvstat->mcv_enabled;
info->hist_enabled = mvstat->hist_enabled;
+ info->ndist_enabled = mvstat->ndist_enabled;
/* built/available statistics */
info->deps_built = mvstat->deps_built;
info->mcv_built = mvstat->mcv_built;
info->hist_built = mvstat->hist_built;
+ info->ndist_built = mvstat->ndist_built;
/* stakeys */
adatum = SysCacheGetAttr(MVSTATOID, htup,
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 805d633..f8d39aa 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -132,6 +132,7 @@
#include "utils/fmgroids.h"
#include "utils/index_selfuncs.h"
#include "utils/lsyscache.h"
+#include "utils/mvstats.h"
#include "utils/nabstime.h"
#include "utils/pg_locale.h"
#include "utils/rel.h"
@@ -206,6 +207,7 @@ static Const *string_to_const(const char *str, Oid datatype);
static Const *string_to_bytea_const(const char *str, size_t str_len);
static List *add_predicate_to_quals(IndexOptInfo *index, List *indexQuals);
+static Oid find_ndistinct_coeff(PlannerInfo *root, RelOptInfo *rel, List *varinfos);
/*
* eqsel - Selectivity of "=" for any data types.
@@ -3423,12 +3425,26 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
* don't know by how much. We should never clamp to less than the
* largest ndistinct value for any of the Vars, though, since
* there will surely be at least that many groups.
+ *
+ * However we don't need to do this if we have ndistinct stats on
+ * the columns - in that case we can simply use the coefficient
+ * to get the (probably way more accurate) estimate.
+ *
+ * XXX Probably needs refactoring (mixing the clamp and the coeff
+ * like this is not nice).
*/
double clamp = rel->tuples;
+ double coeff = 1.0;
if (relvarcount > 1)
{
- clamp *= 0.1;
+ Oid oid = find_ndistinct_coeff(root, rel, varinfos);
+
+ if (oid != InvalidOid)
+ coeff = load_mv_ndistinct(oid);
+ else
+ clamp *= 0.1;
+
if (clamp < relmaxndistinct)
{
clamp = relmaxndistinct;
@@ -3437,6 +3453,13 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
clamp = rel->tuples;
}
}
+
+ /*
+ * Apply ndistinct coefficient from multivar stats (we must do this
+ * before clamping the estimate in any way).
+ */
+ reldistinct /= coeff;
+
if (reldistinct > clamp)
reldistinct = clamp;
@@ -7583,3 +7606,71 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
/* XXX what about pages_per_range? */
}
+
+/*
+ * Find applicable ndistinct statistics and compute the coefficient to
+ * correct the estimate (simply a product of per-column ndistincts).
+ *
+ * Currently we only look for a perfect match, i.e. a single ndistinct
+ * estimate exactly matching all the columns of the statistics.
+ */
+static Oid
+find_ndistinct_coeff(PlannerInfo *root, RelOptInfo *rel, List *varinfos)
+{
+ ListCell *lc;
+ Bitmapset *attnums = NULL;
+ VariableStatData vardata;
+
+ foreach(lc, varinfos)
+ {
+ GroupVarInfo *varinfo = (GroupVarInfo *) lfirst(lc);
+
+ if (varinfo->rel != rel)
+ continue;
+
+ /* FIXME handle general expressions, not only plain Vars */
+
+ /*
+ * examine the variable (or expression) so that we know which
+ * attribute we're dealing with - we need this for matching the
+ * ndistinct coefficient
+ *
+ * FIXME we could probably remember this from estimate_num_groups
+ */
+ examine_variable(root, varinfo->var, 0, &vardata);
+
+ if (HeapTupleIsValid(vardata.statsTuple))
+ {
+ Form_pg_statistic stats
+ = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
+
+ attnums = bms_add_member(attnums, stats->staattnum);
+
+ ReleaseVariableStats(vardata);
+ }
+ }
+
+ /* look for a matching ndistinct statistics */
+ foreach (lc, rel->mvstatlist)
+ {
+ int i;
+ MVStatisticInfo *info = (MVStatisticInfo *)lfirst(lc);
+
+ /* skip statistics without ndistinct coefficient built */
+ if (!info->ndist_built)
+ continue;
+
+ /* only exact matches for now (same set of columns) */
+ if (bms_num_members(attnums) != info->stakeys->dim1)
+ continue;
+
+ /* check that all columns of the statistics match */
+ for (i = 0; i < info->stakeys->dim1; i++)
+ if (!bms_is_member(info->stakeys->values[i], attnums))
+ break;
+
+ /* some column does not match, so try the next statistics */
+ if (i < info->stakeys->dim1)
+ continue;
+
+ return info->mvoid;
+ }
+
+ return InvalidOid;
+}
diff --git a/src/backend/utils/mvstats/Makefile b/src/backend/utils/mvstats/Makefile
index 9dbb3b6..d4b88e9 100644
--- a/src/backend/utils/mvstats/Makefile
+++ b/src/backend/utils/mvstats/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mvstats
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = common.o dependencies.o histogram.o mcv.o
+OBJS = common.o dependencies.o histogram.o mcv.o mvdist.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mvstats/README.ndistinct b/src/backend/utils/mvstats/README.ndistinct
new file mode 100644
index 0000000..32d1624
--- /dev/null
+++ b/src/backend/utils/mvstats/README.ndistinct
@@ -0,0 +1,83 @@
+ndistinct coefficients
+======================
+
+Estimating number of distinct groups in a combination of columns is tricky,
+and the estimation error is often significant. By ndistinct coefficient we
+mean a ratio
+
+ q = ndistinct(a) * ndistinct(b) / ndistinct(a,b)
+
+where 'a' and 'b' are columns, ndistinct(a) is (an estimate of) a number of
+distinct values in column 'a'. And ndistinct(a,b) is the same thing for the
+pair of columns.
+
+The meaning of the coefficient may be illustrated by answering the following
+question: Given a combination of columns (a,b), how many distinct values of 'b'
+match a chosen value of 'a', on average?
+
+Let's assume we know ndistinct(a) and ndistinct(a,b). Then the answer to the
+question clearly is
+
+ ndistinct(a,b) / ndistinct(a)
+
+and by using 'q' we may rewrite this as
+
+ ndistinct(b) / q
+
+so 'q' may be considered as a correction factor of the ndistinct estimate given
+a condition on one of the columns.
+
+This may be generalized to a combination of 'n' columns
+
+ [ndistinct(c1) * ... * ndistinct(cn)] / ndistinct(c1, ..., cn)
+
+and the meaning is very similar, except that we need to use conditions on (n-1)
+of the columns.
+
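+For example, if columns 'a' and 'b' always contain exactly the same value
+(say 1000 distinct values each), then ndistinct(a,b) = 1000 and
+
+    q = (1000 * 1000) / 1000 = 1000
+
+while for two independent columns ndistinct(a,b) may be as high as
+1000 * 1000 (given enough rows), making q close to 1.
+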
+
+Selectivity estimation
+----------------------
+
+As explained in the previous paragraph, ndistinct coefficients may be used to
+estimate cardinality of a column, given some apriori knowledge. Let's assume
+we need to estimate selectivity of a condition
+
+ (a=1) AND (b=2)
+
+which we can expand like this
+
+ P(a=1 & b=2) = P(a=1) * P(b=2 | a=1)
+
+Let's also assume that the distributions are uniform, i.e. that
+
+ P(a=1) = 1/ndistinct(a)
+ P(b=2) = 1/ndistinct(b)
+ P(a=1 & b=2) = 1/ndistinct(a,b)
+
+ P(b=2 | a=1) = ndistinct(a) / ndistinct(a,b)
+
+which may be rewritten like
+
+ P(b=2 | a=1)
+ = ndistinct(a) / ndistinct(a,b)
+ = (1/ndistinct(b)) * [(ndistinct(a) * ndistinct(b)) / ndistinct(a,b)]
+ = (1/ndistinct(b)) * q
+
+and therefore
+
+ P(a=1 & b=2) = (1/ndistinct(a)) * (1/ndistinct(b)) * q
+
+This also illustrates 'q' as a correction coefficient.
+
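+Continuing the example above (a = b, 1000 distinct values each, q = 1000):
+
+    P(a=1 & b=1) = (1/1000) * (1/1000) * 1000 = 1/1000
+
+which matches the intuition that the second clause adds no new information.
+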
+It also explains why we store the coefficient and not simply ndistinct(a,b).
+This way we can estimate the individual clauses and then correct the
+estimate by multiplying the result with 'q' - we don't have to mess with
+the ndistinct estimates at all.
+
+Naturally, as the coefficient is derived from ndistinct(a,b), it may also be
+used to estimate GROUP BY clauses on the combination of columns, replacing the
+existing heuristics in estimate_num_groups().
+
+Note: Currently only the GROUP BY estimation is implemented. It's a bit unclear
+how to implement the clause estimation when there are other statistics (esp.
+MCV lists and/or functional dependencies) available.
diff --git a/src/backend/utils/mvstats/README.stats b/src/backend/utils/mvstats/README.stats
index d404914..6d4b09b 100644
--- a/src/backend/utils/mvstats/README.stats
+++ b/src/backend/utils/mvstats/README.stats
@@ -20,6 +20,8 @@ Currently we only have two kinds of multivariate statistics
(c) multivariate histograms (README.histogram)
+ (d) ndistinct coefficients
+
Compatible clause types
-----------------------
diff --git a/src/backend/utils/mvstats/common.c b/src/backend/utils/mvstats/common.c
index f6d1074..d34d072 100644
--- a/src/backend/utils/mvstats/common.c
+++ b/src/backend/utils/mvstats/common.c
@@ -32,7 +32,8 @@ static List* list_mv_stats(Oid relid);
* and serializes them back into the catalog (as bytea values).
*/
void
-build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+build_mv_stats(Relation onerel, double totalrows,
+ int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats)
{
ListCell *lc;
@@ -53,6 +54,7 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
MVDependencies deps = NULL;
MCVList mcvlist = NULL;
MVHistogram histogram = NULL;
+ double ndist = -1;
int numrows_filtered = numrows;
VacAttrStats **stats = NULL;
@@ -92,6 +94,9 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
if (stat->deps_enabled)
deps = build_mv_dependencies(numrows, rows, attrs, stats);
+ if (stat->ndist_enabled)
+ ndist = build_mv_ndistinct(totalrows, numrows, rows, attrs, stats);
+
/* build the MCV list */
if (stat->mcv_enabled)
mcvlist = build_mv_mcvlist(numrows, rows, attrs, stats, &numrows_filtered);
@@ -101,7 +106,7 @@ build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
histogram = build_mv_histogram(numrows_filtered, rows, attrs, stats, numrows);
/* store the histogram / MCV list in the catalog */
- update_mv_stats(stat->mvoid, deps, mcvlist, histogram, attrs, stats);
+ update_mv_stats(stat->mvoid, deps, mcvlist, histogram, ndist, attrs, stats);
}
}
@@ -183,6 +188,8 @@ list_mv_stats(Oid relid)
info->mcv_built = stats->mcv_built;
info->hist_enabled = stats->hist_enabled;
info->hist_built = stats->hist_built;
+ info->ndist_enabled = stats->ndist_enabled;
+ info->ndist_built = stats->ndist_built;
result = lappend(result, info);
}
@@ -252,7 +259,7 @@ find_mv_attnums(Oid mvoid, Oid *relid)
void
update_mv_stats(Oid mvoid,
MVDependencies dependencies, MCVList mcvlist, MVHistogram histogram,
- int2vector *attrs, VacAttrStats **stats)
+ double ndistcoeff, int2vector *attrs, VacAttrStats **stats)
{
HeapTuple stup,
oldtup;
@@ -292,26 +299,36 @@ update_mv_stats(Oid mvoid,
= PointerGetDatum(data);
}
+ if (ndistcoeff > 1.0)
+ {
+ nulls[Anum_pg_mv_statistic_standist -1] = false;
+ values[Anum_pg_mv_statistic_standist-1] = Float8GetDatum(ndistcoeff);
+ }
+
/* always replace the value (either by bytea or NULL) */
replaces[Anum_pg_mv_statistic_stadeps -1] = true;
replaces[Anum_pg_mv_statistic_stamcv -1] = true;
replaces[Anum_pg_mv_statistic_stahist-1] = true;
+ replaces[Anum_pg_mv_statistic_standist-1] = true;
/* always change the availability flags */
nulls[Anum_pg_mv_statistic_deps_built -1] = false;
nulls[Anum_pg_mv_statistic_mcv_built -1] = false;
nulls[Anum_pg_mv_statistic_hist_built-1] = false;
+ nulls[Anum_pg_mv_statistic_ndist_built-1] = false;
nulls[Anum_pg_mv_statistic_stakeys-1] = false;
/* use the new attnums, in case we removed some dropped ones */
replaces[Anum_pg_mv_statistic_deps_built-1] = true;
replaces[Anum_pg_mv_statistic_mcv_built -1] = true;
+ replaces[Anum_pg_mv_statistic_ndist_built-1] = true;
replaces[Anum_pg_mv_statistic_hist_built -1] = true;
replaces[Anum_pg_mv_statistic_stakeys -1] = true;
values[Anum_pg_mv_statistic_deps_built-1] = BoolGetDatum(dependencies != NULL);
values[Anum_pg_mv_statistic_mcv_built -1] = BoolGetDatum(mcvlist != NULL);
values[Anum_pg_mv_statistic_hist_built -1] = BoolGetDatum(histogram != NULL);
+ values[Anum_pg_mv_statistic_ndist_built-1] = BoolGetDatum(ndistcoeff > 1.0);
values[Anum_pg_mv_statistic_stakeys -1] = PointerGetDatum(attrs);
/* Is there already a pg_mv_statistic tuple for this attribute? */
diff --git a/src/backend/utils/mvstats/mvdist.c b/src/backend/utils/mvstats/mvdist.c
new file mode 100644
index 0000000..59b8358
--- /dev/null
+++ b/src/backend/utils/mvstats/mvdist.c
@@ -0,0 +1,171 @@
+/*-------------------------------------------------------------------------
+ *
+ * mvdist.c
+ * POSTGRES multivariate distinct coefficients
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/mvstats/mvdist.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include <math.h>
+
+#include "common.h"
+#include "utils/lsyscache.h"
+
+static double estimate_ndistinct(double totalrows, int numrows, int d, int f1);
+
+/*
+ * Compute ndistinct coefficient for the combination of attributes. This
+ * computes the ndistinct estimate using the same estimator used in analyze.c
+ * and then computes the coefficient.
+ */
+double
+build_mv_ndistinct(double totalrows, int numrows, HeapTuple *rows,
+ int2vector *attrs, VacAttrStats **stats)
+{
+ int i, j;
+ int f1, cnt, d;
+ int nmultiple = 0, summultiple = 0;
+ int numattrs = attrs->dim1;
+ MultiSortSupport mss = multi_sort_init(numattrs);
+ double ndistcoeff;
+
+ /*
+ * It's possible to sort the sample rows directly, but this seemed
+ * somewhat simpler / less error-prone. Another option would be to
+ * allocate the arrays for each SortItem separately, but that'd be
+ * significant overhead (not just CPU, but especially memory bloat).
+ */
+ SortItem * items = (SortItem*)palloc0(numrows * sizeof(SortItem));
+
+ Datum *values = (Datum*)palloc0(sizeof(Datum) * numrows * numattrs);
+ bool *isnull = (bool*)palloc0(sizeof(bool) * numrows * numattrs);
+
+ for (i = 0; i < numrows; i++)
+ {
+ items[i].values = &values[i * numattrs];
+ items[i].isnull = &isnull[i * numattrs];
+ }
+
+ Assert(numattrs >= 2);
+
+ for (i = 0; i < numattrs; i++)
+ {
+ /* prepare the sort function for this dimension */
+ multi_sort_add_dimension(mss, i, i, stats);
+
+ /* accumulate all the data into the array and sort it */
+ for (j = 0; j < numrows; j++)
+ {
+ items[j].values[i]
+ = heap_getattr(rows[j], attrs->values[i],
+ stats[i]->tupDesc, &items[j].isnull[i]);
+ }
+ }
+
+ qsort_arg((void *) items, numrows, sizeof(SortItem),
+ multi_sort_compare, mss);
+
+ /* count number of distinct combinations */
+
+ f1 = 0;
+ cnt = 1;
+ d = 1;
+ for (i = 1; i < numrows; i++)
+ {
+ if (multi_sort_compare(&items[i], &items[i-1], mss) != 0)
+ {
+ if (cnt == 1)
+ f1 += 1;
+ else
+ {
+ nmultiple += 1;
+ summultiple += cnt;
+ }
+
+ d++;
+ cnt = 0;
+ }
+
+ cnt += 1;
+ }
+
+ if (cnt == 1)
+ f1 += 1;
+ else
+ {
+ nmultiple += 1;
+ summultiple += cnt;
+ }
+
+ ndistcoeff = 1 / estimate_ndistinct(totalrows, numrows, d, f1);
+
+ /*
+ * now count distinct values for each attribute and incrementally
+ * compute ndistinct(a,b) / (ndistinct(a) * ndistinct(b))
+ *
+ * FIXME Probably need to handle cases when one of the ndistinct
+ * estimates is negative, and also check that the combined
+ * ndistinct is greater than any of those partial values.
+ */
+ for (i = 0; i < numattrs; i++)
+ ndistcoeff *= stats[i]->stadistinct;
+
+ return ndistcoeff;
+}
+
+double
+load_mv_ndistinct(Oid mvoid)
+{
+ bool isnull = false;
+ Datum deps;
+
+ /* Fetch the pg_mv_statistic tuple for this statistics object. */
+ HeapTuple htup = SearchSysCache1(MVSTATOID, ObjectIdGetDatum(mvoid));
+
+#ifdef USE_ASSERT_CHECKING
+ Form_pg_mv_statistic mvstat = (Form_pg_mv_statistic) GETSTRUCT(htup);
+ Assert(mvstat->ndist_enabled && mvstat->ndist_built);
+#endif
+
+ deps = SysCacheGetAttr(MVSTATOID, htup,
+ Anum_pg_mv_statistic_standist, &isnull);
+
+ Assert(!isnull);
+
+ ReleaseSysCache(htup);
+
+ return DatumGetFloat8(deps);
+}
+
+/* The Duj1 estimator (already used in analyze.c). */
+static double
+estimate_ndistinct(double totalrows, int numrows, int d, int f1)
+{
+ double numer,
+ denom,
+ ndistinct;
+
+ numer = (double) numrows *(double) d;
+
+ denom = (double) (numrows - f1) +
+ (double) f1 * (double) numrows / totalrows;
+
+ ndistinct = numer / denom;
+
+ /* Clamp to sane range in case of roundoff error */
+ if (ndistinct < (double) d)
+ ndistinct = (double) d;
+
+ if (ndistinct > totalrows)
+ ndistinct = totalrows;
+
+ return floor(ndistinct + 0.5);
+}
diff --git a/src/include/catalog/pg_mv_statistic.h b/src/include/catalog/pg_mv_statistic.h
index 7020772..e46cc6b 100644
--- a/src/include/catalog/pg_mv_statistic.h
+++ b/src/include/catalog/pg_mv_statistic.h
@@ -40,6 +40,7 @@ CATALOG(pg_mv_statistic,3381)
bool deps_enabled; /* analyze dependencies? */
bool mcv_enabled; /* build MCV list? */
bool hist_enabled; /* build histogram? */
+ bool ndist_enabled; /* build ndist coefficient? */
/* histogram / MCV size */
int32 mcv_max_items; /* max MCV items */
@@ -49,6 +50,7 @@ CATALOG(pg_mv_statistic,3381)
bool deps_built; /* dependencies were built */
bool mcv_built; /* MCV list was built */
bool hist_built; /* histogram was built */
+ bool ndist_built; /* ndistinct coeff built */
/* variable-length fields start here, but we allow direct access to stakeys */
int2vector stakeys; /* array of column keys */
@@ -57,6 +59,7 @@ CATALOG(pg_mv_statistic,3381)
bytea stadeps; /* dependencies (serialized) */
bytea stamcv; /* MCV list (serialized) */
bytea stahist; /* MV histogram (serialized) */
+ float8 standcoeff; /* ndistinct coeff (serialized) */
#endif
} FormData_pg_mv_statistic;
@@ -72,7 +75,7 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
* compiler constants for pg_mv_statistic
* ----------------
*/
-#define Natts_pg_mv_statistic 16
+#define Natts_pg_mv_statistic 19
#define Anum_pg_mv_statistic_starelid 1
#define Anum_pg_mv_statistic_staname 2
#define Anum_pg_mv_statistic_stanamespace 3
@@ -80,14 +83,17 @@ typedef FormData_pg_mv_statistic *Form_pg_mv_statistic;
#define Anum_pg_mv_statistic_deps_enabled 5
#define Anum_pg_mv_statistic_mcv_enabled 6
#define Anum_pg_mv_statistic_hist_enabled 7
-#define Anum_pg_mv_statistic_mcv_max_items 8
-#define Anum_pg_mv_statistic_hist_max_buckets 9
-#define Anum_pg_mv_statistic_deps_built 10
-#define Anum_pg_mv_statistic_mcv_built 11
-#define Anum_pg_mv_statistic_hist_built 12
-#define Anum_pg_mv_statistic_stakeys 13
-#define Anum_pg_mv_statistic_stadeps 14
-#define Anum_pg_mv_statistic_stamcv 15
-#define Anum_pg_mv_statistic_stahist 16
+#define Anum_pg_mv_statistic_ndist_enabled 8
+#define Anum_pg_mv_statistic_mcv_max_items 9
+#define Anum_pg_mv_statistic_hist_max_buckets 10
+#define Anum_pg_mv_statistic_deps_built 11
+#define Anum_pg_mv_statistic_mcv_built 12
+#define Anum_pg_mv_statistic_hist_built 13
+#define Anum_pg_mv_statistic_ndist_built 14
+#define Anum_pg_mv_statistic_stakeys 15
+#define Anum_pg_mv_statistic_stadeps 16
+#define Anum_pg_mv_statistic_stamcv 17
+#define Anum_pg_mv_statistic_stahist 18
+#define Anum_pg_mv_statistic_standist 19
#endif /* PG_MV_STATISTIC_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 84be0ce..ba587da 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -657,11 +657,13 @@ typedef struct MVStatisticInfo
bool deps_enabled; /* functional dependencies enabled */
bool mcv_enabled; /* MCV list enabled */
bool hist_enabled; /* histogram enabled */
+ bool ndist_enabled; /* ndistinct coefficient enabled */
/* built/available statistics */
bool deps_built; /* functional dependencies built */
bool mcv_built; /* MCV list built */
bool hist_built; /* histogram built */
+ bool ndist_built; /* ndistinct coefficient built */
/* columns in the statistics (attnums) */
int2vector *stakeys; /* attnums of the columns covered */
diff --git a/src/include/utils/mvstats.h b/src/include/utils/mvstats.h
index 2b67772..67ed2f8 100644
--- a/src/include/utils/mvstats.h
+++ b/src/include/utils/mvstats.h
@@ -225,6 +225,7 @@ typedef MVSerializedHistogramData *MVSerializedHistogram;
MVDependencies load_mv_dependencies(Oid mvoid);
MCVList load_mv_mcvlist(Oid mvoid);
MVSerializedHistogram load_mv_histogram(Oid mvoid);
+double load_mv_ndistinct(Oid mvoid);
bytea * serialize_mv_dependencies(MVDependencies dependencies);
bytea * serialize_mv_mcvlist(MCVList mcvlist, int2vector *attrs,
@@ -266,11 +267,17 @@ MVHistogram
build_mv_histogram(int numrows, HeapTuple *rows, int2vector *attrs,
VacAttrStats **stats, int numrows_total);
-void build_mv_stats(Relation onerel, int numrows, HeapTuple *rows,
+double
+build_mv_ndistinct(double totalrows, int numrows, HeapTuple *rows,
+ int2vector *attrs, VacAttrStats **stats);
+
+void build_mv_stats(Relation onerel, double totalrows,
+ int numrows, HeapTuple *rows,
int natts, VacAttrStats **vacattrstats);
void update_mv_stats(Oid relid, MVDependencies dependencies,
MCVList mcvlist, MVHistogram histogram,
+ double ndistcoeff,
int2vector *attrs, VacAttrStats **stats);
#ifdef DEBUG_MVHIST
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 528ac36..7a914da 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1377,7 +1377,8 @@ pg_mv_stats| SELECT n.nspname AS schemaname,
length(s.stamcv) AS mcvbytes,
pg_mv_stats_mcvlist_info(s.stamcv) AS mcvinfo,
length(s.stahist) AS histbytes,
- pg_mv_stats_histogram_info(s.stahist) AS histinfo
+ pg_mv_stats_histogram_info(s.stahist) AS histinfo,
+ s.standcoeff AS ndcoeff
FROM ((pg_mv_statistic s
JOIN pg_class c ON ((c.oid = s.starelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
--
2.5.0
Attachment: 0008-change-how-we-apply-selectivity-to-number-of-groups-.patch (text/x-patch)
From c968f05c26ecfa9344a8a9c9209bd755fa4ddf7b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Tue, 26 Jan 2016 18:14:33 +0100
Subject: [PATCH 8/9] change how we apply selectivity to number of groups
estimate
Instead of simply multiplying the ndistinct estimate by the selectivity,
we instead use the formula for the expected number of distinct values
observed in 'k' rows when there are 'd' distinct values in the bin
d * (1 - ((d - 1) / d)^k)
This is 'with replacement', which seems appropriate here, and it
mostly assumes a uniform distribution of the distinct values. So if the
distribution is not uniform (e.g. there are very frequent groups) this
may be less accurate than the current algorithm in some cases, giving
over-estimates. But that's probably better than OOM.
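For example, with d = 100 distinct values in a bin and k = 10 sampled rows,
the formula gives 100 * (1 - (99/100)^10) ~= 9.6 expected distinct values,
i.e. close to k; for k = 1000 it approaches d itself.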
---
src/backend/utils/adt/selfuncs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index f8d39aa..76be0e3 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3464,9 +3464,9 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
reldistinct = clamp;
/*
- * Multiply by restriction selectivity.
+ * Estimate the number of distinct values observed in rel->rows.
*/
- reldistinct *= rel->rows / rel->tuples;
+ reldistinct *= (1 - powl(1 - rel->rows/rel->tuples, rel->tuples/reldistinct));
/*
* Update estimate of total distinct groups.
--
2.5.0
Attachment: 0009-fixup-of-regression-tests-plans-changes-by-group-by-.patch (text/x-patch)
From 29ea451f45fa5b8891ebde195551180f2841826d Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@pgaddict.com>
Date: Sun, 28 Feb 2016 21:16:40 +0100
Subject: [PATCH 9/9] fixup of regression tests (plans changes by group by
estimation)
---
src/test/regress/expected/subselect.out | 25 +++++++++++--------------
1 file changed, 11 insertions(+), 14 deletions(-)
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index de64ca7..0fc93d9 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -807,27 +807,24 @@ select * from int4_tbl where
explain (verbose, costs off)
select * from int4_tbl o where (f1, f1) in
(select f1, generate_series(1,2) / 10 g from int4_tbl i group by f1);
- QUERY PLAN
-----------------------------------------------------------------------
- Hash Join
+ QUERY PLAN
+----------------------------------------------------------------
+ Hash Semi Join
Output: o.f1
Hash Cond: (o.f1 = "ANY_subquery".f1)
-> Seq Scan on public.int4_tbl o
Output: o.f1
-> Hash
Output: "ANY_subquery".f1, "ANY_subquery".g
- -> HashAggregate
+ -> Subquery Scan on "ANY_subquery"
Output: "ANY_subquery".f1, "ANY_subquery".g
- Group Key: "ANY_subquery".f1, "ANY_subquery".g
- -> Subquery Scan on "ANY_subquery"
- Output: "ANY_subquery".f1, "ANY_subquery".g
- Filter: ("ANY_subquery".f1 = "ANY_subquery".g)
- -> HashAggregate
- Output: i.f1, (generate_series(1, 2) / 10)
- Group Key: i.f1
- -> Seq Scan on public.int4_tbl i
- Output: i.f1
-(18 rows)
+ Filter: ("ANY_subquery".f1 = "ANY_subquery".g)
+ -> HashAggregate
+ Output: i.f1, (generate_series(1, 2) / 10)
+ Group Key: i.f1
+ -> Seq Scan on public.int4_tbl i
+ Output: i.f1
+(15 rows)
select * from int4_tbl o where (f1, f1) in
(select f1, generate_series(1,2) / 10 g from int4_tbl i group by f1);
--
2.5.0
Hi,
On 03/16/2016 03:58 AM, Tatsuo Ishii wrote:
> I apologize if it's already been discussed. I am new to this patch.
>
>> Attached is v15 of the patch series, fixing this and also doing quite a
>> few additional improvements:
>>
>> * added some basic examples into the SGML documentation
>> * addressing the objectaddress omissions, as pointed out by Alvaro
>> * support for ALTER STATISTICS ... OWNER TO / RENAME / SET SCHEMA
>> * significant refactoring of MCV and histogram code, particularly
>>   serialization, deserialization and building
>> * reworking the functional dependencies to support more complex
>>   dependencies, with multiple columns as 'conditions'
>> * the reduction using functional dependencies is also significantly
>>   simplified (I decided to get rid of computing the transitive closure
>>   for now - it got too complex after the multi-condition dependencies,
>>   so I'll leave that for the future)
>
> Do you have any other missing parts in this work? I am asking
> because I wonder if you want to push this into 9.6 or rather 9.7.
I think the first few parts of the patch series, namely:
* shared infrastructure (0002)
* functional dependencies (0003)
* MCV lists (0004)
* histograms (0005)
might make it into 9.6. I believe the code for building and storing the
different kinds of stats is reasonably solid. What probably needs more
thorough review are the changes in clauselist_selectivity(), but the
code in these parts is reasonably simple as it only supports using a
single multivariate statistics object per relation.
The part (0006) that allows using multiple statistics (i.e. selects
which of the available stats to use and in what order) is probably the
most complex part of the whole patch, and I myself do have some
questions about some aspects of it. I don't think this part is likely to
get into 9.6 at this point (although it'd be nice if we managed to do that).
I can also imagine moving the ndistinct pieces forward, in front of 0006,
if that helps getting them into 9.6. There's a bit more work on making
them more flexible, though, to allow handling subsets of columns
(currently we need a perfect match), as sketched below.
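A rough sketch of what the perfect-match restriction means, using the
CREATE STATISTICS syntax from the attached patches:

    CREATE STATISTICS s ON t (a,b,c) WITH (ndistinct);

    -- GROUP BY a, b, c can use the ndistinct coefficient, but
    -- GROUP BY a, b currently cannot, because only statistics
    -- exactly matching the grouped columns are considered.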
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
>> Many trailing white spaces found.
>
> Sorry, haven't noticed that after one of the rebases. Fixed in the
> attached v15 of the patch.

There are still a few trailing spaces:
/home/t-ishii/0002-shared-infrastructure-and-functional-dependencies.patch:3792: trailing whitespace.
/home/t-ishii/0004-multivariate-MCV-lists.patch:471: trailing whitespace.
/home/t-ishii/0004-multivariate-MCV-lists.patch:656: space before tab in indent.
{
/home/t-ishii/0004-multivariate-MCV-lists.patch:682: space before tab in indent.
}
/home/t-ishii/0004-multivariate-MCV-lists.patch:685: space before tab in indent.
{
/home/t-ishii/0004-multivariate-MCV-lists.patch:715: trailing whitespace.
/home/t-ishii/0006-multi-statistics-estimation.patch:2513: trailing whitespace.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
On 03/21/2016 12:00 AM, Tatsuo Ishii wrote:
>>> Many trailing white spaces found.
>>
>> Sorry, haven't noticed that after one of the rebases. Fixed in the
>> attached v15 of the patch.
>
> There are still a few trailing spaces:
>
> /home/t-ishii/0002-shared-infrastructure-and-functional-dependencies.patch:3792: trailing whitespace.
> /home/t-ishii/0004-multivariate-MCV-lists.patch:471: trailing whitespace.
> /home/t-ishii/0004-multivariate-MCV-lists.patch:656: space before tab in indent.
> {
> /home/t-ishii/0004-multivariate-MCV-lists.patch:682: space before tab in indent.
> }
> /home/t-ishii/0004-multivariate-MCV-lists.patch:685: space before tab in indent.
> {
> /home/t-ishii/0004-multivariate-MCV-lists.patch:715: trailing whitespace.
> /home/t-ishii/0006-multi-statistics-estimation.patch:2513: trailing whitespace.
>
> Best regards,
D'oh. Thanks for reporting. Attached is v16, hopefully fixing the few
remaining whitespace issues.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
multivariate-stats-v16.tgz (application/x-compressed-tar)